Haskell type vs. newtype with respect to type safety

2018-06-15 05:50:11

I know newtype is more often compared to data in Haskell, but I'm posing this comparison from more of a design point-of-view than as a technical problem.

In imperative/OO languages, there is the anti-pattern "primitive obsession", where the prolific use of primitive types reduces the type safety of a program and introduces accidental interchangeability of same-typed values that are intended for different purposes. For example, many things can be a String, but it would be nice if a compiler could know, statically, which we mean to be a name and which we mean to be the city in an address.

So how often do Haskell programmers employ newtype to give type distinctions to otherwise primitive values? The use of type introduces an alias and makes a program's semantics more readable, but it doesn't prevent accidental interchanges of values. As I learn Haskell, I notice that the type system is as powerful as any I have come across. Therefore, I would think this is a natural and common practice, but I haven't seen much, if any, discussion of the use of newtype in this light.
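To make the difference concrete, here is a minimal sketch. All the names in it (NameAlias, CityAlias, Name, City, greet and so on) are hypothetical and only serve the illustration.

```haskell
-- Hypothetical names for illustration only.

-- With plain aliases, both of these are just String to the compiler.
type NameAlias = String
type CityAlias = String

greetAlias :: NameAlias -> String
greetAlias name = "Hello, " ++ name

-- Compiles even though a city is passed where a name is expected.
mixedUp :: String
mixedUp = greetAlias ("Berlin" :: CityAlias)

-- With newtypes, the two are distinct types.
newtype Name = Name String
newtype City = City String

greet :: Name -> String
greet (Name name) = "Hello, " ++ name

greeting :: String
greeting = greet (Name "Ada")

-- The following would be rejected at compile time:
-- bad = greet (City "Berlin")
```

The alias version type-checks the mix-up silently, while the newtype version turns it into a compile error.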

Of course a lot of programmers do things differently, but is this at all common in Haskell?

The main uses for newtypes are:

For defining alternative instances for types.

Documentation.

Data/format correctness assurance.
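As a rough sketch of the first use (alternative instances), a wrapper can carry a different Ord instance from the type it wraps. Descending and sortDescending are hypothetical names, similar in spirit to Down from Data.Ord.

```haskell
import Data.List (sort)

-- Hypothetical wrapper for illustration.
newtype Descending a = Descending a
  deriving (Show, Eq)

-- Same representation as 'a', but compared in reverse order.
instance Ord a => Ord (Descending a) where
  compare (Descending x) (Descending y) = compare y x

-- Sort by wrapping, using the alternative instance, and unwrapping again.
sortDescending :: Ord a => [a] -> [a]
sortDescending xs = [x | Descending x <- sort (map Descending xs)]
```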

I'm working on an application right now in which I use newtypes extensively. newtypes in Haskell are a purely compile-time concept. For example, with the unwrapper below, unFilename (Filename "x") compiles to the same code as "x". There is absolutely zero run-time cost, whereas a data type does carry one. This makes newtype a very nice way to achieve the goals listed above.

-- | A file name (not a file path).
newtype Filename = Filename { unFilename :: String }
    deriving (Show,Eq)

I don't want to accidentally treat this as a file path. It's not a file path. It's the name of a conceptual file somewhere in the database.

It's very important for algorithms to refer to the right thing, and newtypes help with this. It's also very important for security. For example, consider file uploads to a web application. I have these types:

-- | A sanitized (safe) filename.
newtype SanitizedFilename = 
  SanitizedFilename { unSafe :: String } deriving Show

-- | Unique, sanitized filename.
newtype UniqueFilename =
  UniqueFilename { unUnique :: SanitizedFilename } deriving Show

-- | An uploaded file.
data File = File {
   file_name     :: String         -- ^ Uploaded file.
  ,file_location :: UniqueFilename -- ^ Saved location.
  ,file_type     :: String         -- ^ File type.
  } deriving (Show)

Suppose I have this function which cleans a filename from a file that's been uploaded:

import Data.Char (isDigit, isLetter)

-- | Sanitize a filename for saving to upload directory.
sanitizeFilename :: String            -- ^ Arbitrary filename.
                 -> SanitizedFilename -- ^ Sanitized filename.
sanitizeFilename = SanitizedFilename . filter ok where
  ok c = isDigit c || isLetter c || elem c "-_."

Now from that I generate a unique filename:

-- | Generate a unique filename.
uniqueFilename :: SanitizedFilename -- ^ Sanitized filename.
               -> IO UniqueFilename -- ^ Unique filename.

It's dangerous to generate a unique filename from an arbitrary filename; it should be sanitized first. Because a UniqueFilename can only be built from a SanitizedFilename, a unique filename is always safe by extension. I can save the file to disk now and put that filename in my database if I want to.
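The body of uniqueFilename isn't shown above. Purely as an assumption, one way it could be written is to prefix the sanitized name with a number from Data.Unique. That is only unique within a single run of the process, so a real application would more likely consult the database or the disk.

```haskell
import Data.Char (isDigit, isLetter)
import Data.Unique (hashUnique, newUnique)

newtype SanitizedFilename =
  SanitizedFilename { unSafe :: String } deriving Show

newtype UniqueFilename =
  UniqueFilename { unUnique :: SanitizedFilename } deriving Show

sanitizeFilename :: String -> SanitizedFilename
sanitizeFilename = SanitizedFilename . filter ok where
  ok c = isDigit c || isLetter c || elem c "-_."

-- Assumed implementation, not the original one: prefix a per-process number.
-- hashUnique values are distinct in practice but not formally guaranteed.
uniqueFilename :: SanitizedFilename -> IO UniqueFilename
uniqueFilename (SanitizedFilename name) = do
  u <- newUnique
  let prefixed = show (hashUnique u) ++ "-" ++ name
  return (UniqueFilename (SanitizedFilename prefixed))

-- Rejected at compile time, because a String is not a SanitizedFilename:
-- bad = uniqueFilename "some raw upload name"
```

The only route from a raw String to a UniqueFilename goes through sanitizeFilename, which is the whole point of the two wrappers.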

But it can also be annoying to have to wrap/unwrap a lot. In the long run, I see it as worth it especially for avoiding value mismatches. ViewPatterns help somewhat:

-- | Get the form fields for a form.
formFields :: ConferenceId -> Controller [Field]
formFields (unConferenceId -> cid) = getFields where
   ... code using cid ..

Maybe you'll say that unwrapping it in a function is a problem: what if you pass cid to a function wrongly? That's not an issue, because all functions using a conference id will take the ConferenceId type. What emerges is a sort of function-to-function contract system that is enforced at compile time. Pretty nice. So yeah, I use it as often as I can, especially in big systems.
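For a version of the view-pattern trick that compiles on its own, here is a small sketch. The ConferenceId definition is assumed (an Int wrapper), and describeConference is a hypothetical stand-in for the Controller code, which isn't shown.

```haskell
{-# LANGUAGE ViewPatterns #-}

-- Assumed definition; the real ConferenceId is not shown in the answer.
newtype ConferenceId = ConferenceId { unConferenceId :: Int }
  deriving (Show, Eq)

-- The view pattern applies unConferenceId to the argument and binds the result.
describeConference :: ConferenceId -> String
describeConference (unConferenceId -> cid) = "conference #" ++ show cid

-- Equivalent without the extension: match on the constructor directly.
describeConference' :: ConferenceId -> String
describeConference' (ConferenceId cid) = "conference #" ++ show cid
```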

I think this is mostly a matter of the situation.

Consider pathnames. The standard prelude has "type FilePath = String" because, as a matter of convenience, you want to have access to all the string and list operations. If you had "newtype FilePath = FilePath String" then you would need filePathLength, filePathMap and so on, or else you would forever be using conversion functions.

On the other hand, consider SQL queries. SQL injection is a common security hole, so it makes sense to have something like

newtype Query = Query String

and then add extra functions that will convert a string into a query (or query fragment) by escaping quote characters, or fill in blanks in a template in the same way. That way you can't accidentally convert a user parameter into a query without going through the quote escaping function.
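A minimal sketch of that idea follows. The function names (rawSql, quoted, <+>, findUser) are made up for illustration and don't come from any real database library. The quote doubling is deliberately simplistic, and a real driver's parameterized queries would be the safer choice.

```haskell
-- Hypothetical API for illustration. In a real module, export Query without
-- its constructor so that these functions are the only way to build one.
newtype Query = Query String
  deriving (Show, Eq)

-- Trusted, programmer-written SQL text.
rawSql :: String -> Query
rawSql = Query

-- Untrusted text becomes a quoted SQL string literal with each ' doubled.
quoted :: String -> Query
quoted s = Query ("'" ++ concatMap escape s ++ "'")
  where
    escape '\'' = "''"
    escape c    = [c]

-- Join fragments; both sides must already be Query values.
(<+>) :: Query -> Query -> Query
Query a <+> Query b = Query (a ++ b)

findUser :: String -> Query
findUser name = rawSql "SELECT * FROM users WHERE name = " <+> quoted name
```

Because <+> only accepts Query values, a bare user-supplied String cannot be spliced into a query without going through quoted.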

For simple X = Y declarations, type is documentation and newtype is type checking; this is why newtype is usually compared to data rather than to type.

I fairly frequently use newtype for just the purpose you describe: ensuring that something which is stored (and often manipulated) in the same way as another type is not confused with something else. In that way it works just like a slightly more efficient data declaration; there's no particular reason to choose one over the other. Note that with GHC's GeneralizedNewtypeDeriving extension, a newtype can automatically derive classes such as Num, allowing your temperatures or yen to be added and subtracted just as you can with the Ints or whatever lies beneath them. One wants to be a little bit careful with this, however; typically one doesn't multiply a temperature by another temperature!
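Here is a small sketch of what that looks like. Yen, Celsius, warmBy and total are hypothetical names. Yen derives Num through the extension, while Celsius deliberately does not and exposes only the one operation that makes sense.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Hypothetical types for illustration.
newtype Yen = Yen Int
  deriving (Show, Eq, Ord, Num)

newtype Celsius = Celsius Double
  deriving (Show, Eq, Ord)

-- Expose only the operation that makes sense instead of deriving Num.
warmBy :: Double -> Celsius -> Celsius
warmBy delta (Celsius t) = Celsius (t + delta)

-- Uses the derived Num instance.
total :: Yen
total = Yen 300 + Yen 200

-- Rejected at compile time, because Yen and Celsius do not mix:
-- bad = Yen 300 + Celsius 21.5
```

Note that deriving Num also makes Yen 300 * Yen 200 type-check, which is the kind of thing to be careful about.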

For an idea of how often these things are used: in one reasonably large project I'm working on right now, I have about 122 uses of data, 39 uses of newtype, and 96 uses of type.

But the ratio, as far as "simple" types are concerned, is a bit closer than that demonstrates, because 32 of those 96 uses of type are actually aliases for function types, such as

type PlotDataGen t = PlotSeries t -> [String]

You'll note two extra complexities here: first, it's actually a function type, not just a simple X = Y alias, and second, it's parameterized: PlotDataGen is a type constructor that I apply to another type to create a new type, such as PlotDataGen (Int,Double). When you start doing this kind of thing, type is no longer just documentation, but is actually a function, though at the type level rather than the data level.
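As a sketch of how such an alias gets applied: PlotSeries isn't shown in this answer, so the definition below is an assumed stand-in, and pointLines is a hypothetical generator.

```haskell
-- Assumed stand-in; the real PlotSeries is not shown.
newtype PlotSeries t = PlotSeries [t]

type PlotDataGen t = PlotSeries t -> [String]

-- The signature expands to PlotSeries (Int, Double) -> [String].
pointLines :: PlotDataGen (Int, Double)
pointLines (PlotSeries points) = [show x ++ " " ++ show y | (x, y) <- points]
```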

newtype is occasionally used where type can't be, such as where a recursive type definition is necessary, but I find this to be reasonably rare. So it looks like, on this particular project at least, about 40% of my "primitive" type definitions are newtypes and 60% are type aliases. Several of the newtype definitions used to be type aliases, and were definitely converted for the exact reasons you mentioned.
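The recursive case can be sketched like this. Stream, countFrom and takeStream are hypothetical names chosen for the example.

```haskell
-- Rejected by GHC as a cycle in type synonym declarations:
-- type Stream a = (a, Stream a)

-- A newtype gives the recursion a constructor to go through.
newtype Stream a = Stream (a, Stream a)

-- An infinite stream, built lazily.
countFrom :: Int -> Stream Int
countFrom n = Stream (n, countFrom (n + 1))

-- Take the first k elements of a stream as a list.
takeStream :: Int -> Stream a -> [a]
takeStream k (Stream (x, rest))
  | k <= 0    = []
  | otherwise = x : takeStream (k - 1) rest
```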

So in short, yes, this is a frequent idiom.
