Compiling very large constants with GHC
Today I asked GHC to compile an 8MB Haskell source file. GHC thought about it for about 6 minutes, swallowing almost 2GB of RAM, and then finally gave up with an out-of-memory error.
[As an aside, I'm glad GHC had the good sense to abort rather than floor my whole PC.]
Basically I've got a program that reads a text file, does some fancy parsing, builds a data structure and then uses show to dump it into a file. Rather than include the whole parser and the source data in my final application, I'd like to include the generated data as a compile-time constant. By adding some extra stuff to the output from show, you can make it a valid Haskell module. But GHC apparently doesn't enjoy compiling multi-MB source files.
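For concreteness, here is a minimal sketch of that generator step. It assumes the constant is a nested Map String (Map String Int) and uses a made-up helper name, renderModule; the question doesn't show the real code. One detail matters: Data.Map's Show instance prints fromList [...] unqualified, so the generated module has to import fromList unqualified.

```haskell
import qualified Data.Map as Map

-- Hypothetical generator: wraps the output of 'show' in just enough
-- boilerplate to form a module exporting one constant. Data.Map's Show
-- instance prints "fromList [...]", so the generated module must bring an
-- unqualified 'fromList' into scope.
renderModule
  :: Show a
  => String  -- module name, e.g. "GeneratedData"
  -> String  -- name of the exported constant, e.g. "table"
  -> String  -- its type, written as Haskell source
  -> a       -- the value to embed
  -> String
renderModule modName name ty value = unlines
  [ "module " ++ modName ++ " (" ++ name ++ ") where"
  , "import Data.Map (Map, fromList)"
  , name ++ " :: " ++ ty
  , name ++ " = " ++ show value
  ]

-- Example value standing in for the real parsed data (an assumption).
exampleTable :: Map.Map String (Map.Map String Int)
exampleTable = Map.fromList [("a", Map.fromList [("x", 1)])]
```

Calling writeFile "GeneratedData.hs" (renderModule "GeneratedData" "table" "Map String (Map String Int)" exampleTable) yields exactly the kind of module described here: the whole value ends up on a single line, which becomes multi-megabyte when the data is large.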
(The weirdest part is, if you just read the data back, it actually doesn't take much time or memory. Strange, considering that both String I/O and read are supposedly very inefficient...)
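The runtime alternative looks roughly like this. Again, a two-level Map String (Map String Int) is assumed as a stand-in for the real structure, and saveTable, loadTable and lookupCell are illustrative names.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Assumed shape of the data (the question only says "a nested Data.Map").
type Table = Map String (Map String Int)

-- Dump the structure with 'show', as the generator program does.
saveTable :: FilePath -> Table -> IO ()
saveTable path = writeFile path . show

-- Parse it back at run time with 'read' instead of compiling it into the
-- program. Map's Read instance accepts the "fromList [...]" form.
loadTable :: FilePath -> IO Table
loadTable path = do
  contents <- readFile path
  let table = read contents
  -- Evaluating the result forces the complete parse, so a malformed file
  -- fails here rather than at some later lookup.
  table `seq` pure table

-- Example query against the loaded data.
lookupCell :: String -> String -> Table -> Maybe Int
lookupCell row col table = Map.lookup row table >>= Map.lookup col
```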
I vaguely recall that other people have had trouble getting GHC to compile huge files in the past. FWIW, I tried using -O0, which sped up the crash but did not prevent it. So what is the best way to include large compile-time constants in a Haskell program?
(In my case, the constant is just a nested Data.Map with some interesting labels.)
Initially I thought GHC might just be unhappy at reading a module consisting of one line that's eight million characters long. (!!) Something to do with the layout rule or such. Or perhaps the deeply nested expressions upset it. But I tried making each subexpression a top-level identifier, and that was no help. (Adding explicit type signatures to each one did appear to make the compiler slightly happier, however.) Is there anything else I might try to make the compiler's job simpler?
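A generator for the split-up variant might look like the sketch below. It assumes the same nested-map shape and uses hypothetical names (renderSplit, part0, part1, ...). It only illustrates what was tried, not a fix: as noted, splitting the expression did not stop GHC from running out of memory.

```haskell
import Data.List (intercalate)
import Data.Map (Map)
import qualified Data.Map as Map

-- Hypothetical generator for the "one top-level binding per subexpression"
-- variant: each inner map becomes its own binding with an explicit type
-- signature, and the outer map only refers to those names.
renderSplit :: String -> Map String (Map String Int) -> String
renderSplit modName outer = unlines $
  [ "module " ++ modName ++ " (table) where"
  , "import Data.Map (Map, fromList)"
  ]
  ++ concatMap binding parts
  ++ [ "table :: Map String (Map String Int)"
     , "table = fromList [" ++ intercalate "," (map entry parts) ++ "]"
     ]
  where
    -- Number the entries so every generated binding gets a unique name.
    parts = zip [0 :: Int ..] (Map.toList outer)
    partName i = "part" ++ show i
    binding (i, (_, inner)) =
      [ partName i ++ " :: Map String Int"
      , partName i ++ " = " ++ show inner
      ]
    entry (i, (key, _)) = "(" ++ show key ++ "," ++ partName i ++ ")"
```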
In the end, I was able to make the data-structure I'm actually trying to store much smaller. (Like, 300KB.) This made GHC far happier. (And the final application much faster.) But for future reference, I'd be interested to know what the best way to approach this is.
Your best bet is probably to compile a string representation of your value into the executable. For a clean way to do this, see my answer to a previous question.
To use it, simply store your expression in myExpression.exp and do read [litFile|myExpression.exp|] with the QuasiQuotes extension enabled, and the expression will be "stored as a string literal" in the executable.
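The linked answer isn't reproduced here, so the following is only a guess at what a litFile quasiquoter could look like, built from Template Haskell's QuasiQuoter and quoteFile. The names lit and previewLit are made up. The point is that the file's text becomes a String literal rather than Haskell source that GHC has to type-check.

```haskell
import Language.Haskell.TH (Exp (LitE), Lit (StringL), runQ)
import Language.Haskell.TH.Quote (QuasiQuoter (..), quoteFile)

-- Hypothetical stand-in for the litFile quasiquoter from the linked answer
-- (its actual code is not shown here). The quoted text is spliced in as an
-- ordinary string literal, so GHC never type-checks the expression inside
-- it; 'read' parses it when the program runs.
lit :: QuasiQuoter
lit = QuasiQuoter
  { quoteExp  = pure . LitE . StringL
  , quotePat  = const (fail "lit: only usable in expressions")
  , quoteType = const (fail "lit: only usable in expressions")
  , quoteDec  = const (fail "lit: only usable in expressions")
  }

-- 'quoteFile' treats the quoted text as a file path and feeds the file's
-- contents to 'lit'.
litFile :: QuasiQuoter
litFile = quoteFile lit

-- Run the quoter outside a splice to inspect the generated expression.
previewLit :: String -> IO Exp
previewLit = runQ . quoteExp lit
```

Because of Template Haskell's stage restriction, lit and litFile must live in a different module from the one that uses them, for example table = read [litFile|myExpression.exp|] :: Map String (Map String Int) with QuasiQuotes enabled.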
I tried doing something similar for storing actual constants, but it fails for the same reason that embedding the value in a .hs file would. My attempt was:
Verbatim.hs:
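module Verbatim where
import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Language.Haskell.Meta.Parse  -- parseExp, from the haskell-src-meta package
-- Parse the quoted text as a Haskell expression and splice it in.
readExp :: String -> Q Exp
readExp = either fail return . parseExp
-- Expression-only quasiquoter; the other contexts report a clear error.
verbatim :: QuasiQuoter
verbatim = QuasiQuoter
  { quoteExp  = readExp
  , quotePat  = const (fail "verbatim: expressions only")
  , quoteType = const (fail "verbatim: expressions only")
  , quoteDec  = const (fail "verbatim: expressions only")
  }
-- Same quasiquoter, but the quoted text names a file whose contents are parsed.
verbatimFile :: QuasiQuoter
verbatimFile = quoteFile verbatim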
Test program:
{-# LANGUAGE QuasiQuotes #-}
module Main (main) where
import Verbatim
main :: IO ()
main = print [verbatimFile|test.exp|]
This program works for small test.exp files, but already fails at about 2 MiB on this computer.
There's a simple solution: your literal should have type ByteString. See https://github.com/litherum/publicsuffixlist/pull/1 for details.
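The PR's code isn't reproduced here. As an assumption about what "a literal of type ByteString" can look like, the sketch below embeds a file's bytes as a primitive string literal, the same trick the file-embed package's embedFile uses. The names bsToExp and embedBS are hypothetical.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafePackAddressLen)
import Language.Haskell.TH (Exp (..), Lit (..), Q, runIO)
import Language.Haskell.TH.Syntax (addDependentFile)
import System.IO.Unsafe (unsafePerformIO)

-- Build an expression of type BS.ByteString whose bytes sit in a primitive
-- string literal (an Addr#). GHC emits that as raw data in the object file
-- instead of a large expression it has to type-check and optimise.
bsToExp :: BS.ByteString -> Exp
bsToExp bs =
  VarE 'unsafePerformIO `AppE`
    (VarE 'unsafePackAddressLen
       `AppE` LitE (IntegerL (fromIntegral (BS.length bs)))
       `AppE` LitE (StringPrimL (BS.unpack bs)))

-- Hypothetical splice helper: $(embedBS "table.txt") has type BS.ByteString.
embedBS :: FilePath -> Q Exp
embedBS path = do
  addDependentFile path  -- rebuild the using module when the file changes
  bsToExp <$> runIO (BS.readFile path)
```

In another module (stage restriction again), rawTable = $(embedBS "table.txt") gives a BS.ByteString. For the nested Map in the question you would still decode it at startup, for example with read applied to Data.ByteString.Char8.unpack rawTable, so the parse cost moves from GHC to run time.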