Is there something better than unsafePerformIO for this....?
I've so far avoided ever needing unsafePerformIO
, but this might have to change today.... I would like to see if the community agrees, or if someone has a better solution.
I have a library which needs to use some config data stored in a bunch of files. This data is guaranteed static (during the run), but needs to be in files that can (on very rare occasions) be edited by an end user who can not compile Haskell programs. (The details are uninportant, but think of "/etc/mime.types" as a pretty good approximation. It is a large almost static data file used throughout many programs).
If this weren't a library I would just use the IO monad.... But because it is a library which is called throughout my code, it literally forces a bubbling up of the IO monad through pretty much everything I have written in multiple modules! Although I need to do a one time read of the data files, this low level call is effetively pure, so this is a pretty unacceptable outcome.
FYI, I plan to also wrap the call in unsafeInterleaveIO, so that only files that are needed will be loaded. My code will look something like this....
dataDir="<path to files>"
datafiles::[FilePath]
datafiles =
unsafePerformIO $
unsafeInterleaveIO $
map (dataDir </>)
<$> filter (not . ("." `isPrefixOf`))
<$> getDirectoryContents dataDir
fileData::[String]
fileData = unsafePerformIO $ unsafeInterleaveIO $ sequence $ readFile <$> datafiles
Given that the data read is referentially transparent, I am pretty sure that unsafePerformIO is safe (this has been discussed in many place, such as "Use of unsafePerformIO appropriate?"). Still, though, if there is a better way, I would love to hear about it.
UPDATE-
In response to Anupam's comment....
There are two reasons why I can't break up the lib into IO and non IO parts.
First, the amount of data is large, and I don't want to read it all into memory at once. Remember that IO is always read strictly.... This is the reason that I need to put in the unsafeInterleaveIO
call, to make it lazy. IMHO, once you use unsafeInterleaveIO
, you might as well use unsafePerformIO
, as the risk is already there.
Second, breaking out the IO specific parts just substitutes the bubbling up of the IO monad with the bubbling up of the IO read code, as well as the passing around of the data (I might actually choose to pass around the data using the state monad anyway, so it really isn't an improvement to substitute the IO monad for the state monad everywhere). This wouldn't be so bad if the low level function itself wasn't effectively pure (ie- think of my /etc/mime.types example above, and imagine a Haskell extensionToMimeType
function, which is basically pure, but needs to get the database data from the file.... Suddenly everything from low to high in the stack needs to call or pass through a readMimeData::IO String
. Why should each main
even need to care about the library choice of a submodule many levels deep?).
I agree with Anupam Jain, you would be better off reading these data files at a somewhat higher level, in IO, and then passing the data in them through the rest of your program purely.
You could, for example, put the functions that need the results of fileData
into Reader [String]
, so that they can just ask for the results as needed (or some Reader Config
, where Config
holds these strings and whatever else you need).
A sketch of what I'm suggesting follows:
type AppResult = String
fileData :: IO [String]
fileData = undefined -- read the files
myApp :: String -> Reader [String] AppResult
myApp s = do
files <- ask
return undefined -- do whatever with s and config
main = do
config <- fileData
return $ runReader (myApp "test") config
I gather that you don't want to read all the data at once, because that would be costly. And maybe you don't really know up-front what files you will need to load, so loading all of them at the start would be wasteful.
Here's an attempt at a solution. It requires you to work inside a free monad and relegate the side-effecting operations to an interpreter. Some preliminary imports:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.ByteString as B
import Data.Monoid
import Data.List
import Data.Functor.Compose
import Control.Applicative
import Control.Monad
import Control.Monad.Free
import System.IO
We define a functor for the free monad. It will offer a value p
do the interpreter and continue the computation after receiving a value b
:
type LazyLoad p b = Compose ((,) p) ((->) b)
A convenience function to request the loading of a file:
lazyLoad :: FilePath -> Free (LazyLoad FilePath B.ByteString) B.ByteString
lazyLoad path = liftF $ Compose (path,id)
A dummy interpreter function that reads "file contents" from stdin
:
interpret :: Free (LazyLoad FilePath B.ByteString) a -> IO a
interpret = iterM $ (Compose (path,next)) -> do
putStrLn $ "Enter the contents for file " <> path <> ":"
B.hGetLine stdin >>= next
Some silly example functions:
someComp :: B.ByteString -> B.ByteString
someComp b = "[" <> b <> "]"
takesAwhile :: Int
takesAwhile = foldl' (+) 0 $ take 400000000 $ intersperse (negate 1) $ repeat 1
An example program:
main :: IO ()
main = do
r <- interpret $ do
r1 <- someComp <$> lazyLoad "file1"
r2 <- return takesAwhile
if (r2 == 1)
then return r1
else someComp <$> lazyLoad "file2"
putStrLn . show $ r
When executed, this program will request a line, spend some time computing takesAwhile
and only then request another line.
If want to allow different kinds of "requests", this solution could be extended with something like Data types à la carte so that each function only needs to know about about the precise effects it requires.
If you are content with allowing only one type of request, you could also use Client
s and Server
s from Pipes.Core instead of the free monad.