How I learned Haskell

- January 27, 2017
Kwang's Haskell Blog - How I learned Haskell

How I learned Haskell

Posted on January 27, 2017 by Kwang Yul Seo
Tags: Haskell

Happy lunar new year! Today I would like to share my experience of learning Haskell. It’s been a really long journey but a really wonderful one.

First Encounter

Back in 2000, I was an undergraduate student majoring in computer science. At that time, I was into system programming and enjoyed learning low-level implementation details of operating systems and system applications. My primary language was C and programmed all programming assignments in C. I was proud that I understood how my code was compiled and executed on the machine.

One day my friend told me about Haskell. I wasn’t interested in functional programming at that time, but I became curious because he enthusiastically persuaded me to learn Haskell. I ended up reading two books on Haskell.

Both books were really nice and I learned I could program in an entirely different way. However, I wasn’t sure if Haskell could solve the real-world problems I wanted to solve. So my interest in Haskell stopped there.

Dark age

My first job was mainly about porting and optimizing Java Virtual Machine for embedded systems. My company licensed Sun’s CDC JVM and I was responsible for maintaining it.

It was 2002 and Linux was still a luxury for embedded systems. RTOSes such as pSOS and VxWorks were popular on STBs and I ported JVM to these OSes. These RTOSes didn’t have distinctions between kernel and user space and an application was linked statically with the kernel and ran as a single (kernel) process application on the device.

The implication was profound. I had no safety guarantee provided by modern operating systems. A bug in an application could corrupt the kernel data and crash the entire system. Moreover, because there were dozens of threads competing for shared resources, race conditions and dead locks were a common place. It took hours or even days to find and fix a trivial bug.

The situation was much better when debugging an application written in Java. Thanks to the safety guarantee of Java, certain types of bugs were impossible. A Java program can’t corrupt memory and crash the system. Dead locks are reported systematically by the JVM. It was relatively fun to fix a bug in Java applications.

These experiences motivated me to find systematic ways of preventing bugs and led me to read Benjamin C. Pierce’s Types and Programming Languages. It was the best computer science text book I’ve ever read. I understood why types were important in statically typed functional languages like Haskell! If universities had used this book as a undergraduate PL textbook, many of the confusions and misunderstandings about dynamic vs static typing would have disappeared.

Stuck again

By fully understanding the merits of type systems, I started to learn Haskell in a different perspective. I read A Gentle Introduction To Haskell and many tutorials on monads. It wasn’t very hard to understand specific instances of monads such as Reader, Writer, State, List and Maybe. But I couldn’t figure out how they were related. I managed to write simple applications in Haskell, but wasn’t confident that I could use Haskell productively because I couldn’t fully understand one of the core ideas of Haskell.

The Challenges of Multi-core Programming

In the meantime, I changed my career and founded a tech start-up in 2008. I built mobile web browsers for the embedded systems. I created a port of WebKit and hacked various components of WebKit to speed up the performance. The primary means for optimization was to leverage the multi-core CPU and GPU.

WebKit performs lots of tasks concurrently but it is mostly single-threaded. Loading a page does not benefit much from having a multi-core CPU. So I offloaded some tasks to separate threads but I only gained marginal performance benefits in exchange for largely increased complexity. I learned a lesson that I must pay very high costs of complexity to get small benefits of performance boost. Considering the already complex nature of WebKit, I ended up abandoning most of performance optimizations to keep the complexity under control.

While struggling to squeeze performance out of WebKit, I learned Haskell again to get some insights on parallel programming because Haskell was the only programming language which natively supported STM(Software Transaction Memory). Simon Marlow’s Parallel and Concurrent Programming in Haskell helped me understand how Haskell supported parallel and concurrent programming. Though I learned many valuable lessons from the book, I also felt that the lazy nature of Haskell didn’t go well with parallel programming.

Reunion

I have spent more than 10 years of my career on embedded systems and increasingly got frustrated with the tools available. So I decided to teach myself Haskell again and use it at work. This time I started to read classic papers on functional programming and Haskell.

Philip Wadler’s Monads for functional programming clicked my mind and I finally became enlightened. The paper is really well written, but I don’t think I could understand Monad just because I read the paper. Years of trials and errors were necessary to understand abstract concepts like monad. It was the most exciting moment of my long journey to Haskell.

Once I understood how I could learn abstractions, the rest was easy. Now I don’t get discouraged just because I don’t understand abstractions at first glance. It takes time and practice to understand abstract things. I also realized that monad was just the beginning. There exist many Haskell idioms that require other abstract concepts such as applicative functor, arrow, profunctor and so on.

Here is the list of papers I found most enlightening when learning Haskell. I also recommend you read any paper with “Functional Pearl” attached to it.

Back to Real-World

I was confident that Haskell was a good programming language and I was looking for opportunities to use Haskell in production. Bryan O’Sullivan, Don Stewart, and John Goerzen’s Real World Haskell was a good reference in this direction. It showed how I could use Haskell to do my daily work such as networking, system programming, databases and web programming.

Finally, I started to read real-world Haskell code available on the hackage and realized that the Haskell I know was different from the Haskell that is actually used. Real world Haskell uses lots of GHC extensions which makes me feel it is an entirely different language. A typical Haskell module starts with:

{-# LANGUAGE CPP                        #-}
{-# LANGUAGE FlexibleContexts           #-}
{-# LANGUAGE ConstraintKinds            #-}
{-# LANGUAGE FlexibleInstances          #-}
{-# LANGUAGE FunctionalDependencies     #-}
{-# LANGUAGE OverloadedStrings          #-}
{-# LANGUAGE QuasiQuotes                #-}
{-# LANGUAGE RecordWildCards            #-}
{-# LANGUAGE TupleSections              #-}
{-# LANGUAGE TypeFamilies               #-}
{-# LANGUAGE RankNTypes                 #-}
{-# LANGUAGE DeriveDataTypeable         #-}

It seems sticking to Haskell 98 or 2000 does not have much practical sense because many Haskell packages already use many GHC extensions. So I learned them too. 24 Days of GHC Extensions was a really great way of learning this topic.

I like the approach of Yesod Web Framework Book which explains the GHC extensions used in the library before explaining how to use the library. This is often the first step to learn a new library for many Haskell programmers. For example, you can’t use Servant unless you understand DataKinds and TypeOperators. So I encourage Haskell library authors to write more about the GHC extensions they use.

I also found that some packages are essential to use Haskell in practice.

  • String type has problems. You need either text or bytestring for efficient string data processing.
  • Lazy IO looks nice, but does not work well in practice. To process streaming data properly you need either pipes or conduit.
  • You will need a custom monad or monad transformer for your application sooner or later. Either mtl or transformers is required.
  • JSON is a really universal data exchange format these days. aeson will help you here.
  • QuickCheck is a bonus you get from using Haskell!

Haskell in production

I founded a small Haskell shop this year and started to use Haskell in production. I realized that using Haskell in production was, surprisingly, easier than learning Haskell. It took me more than 10 years to learn Haskell, but I felt confident that I could use Haskell in production only after a few months of experiments.

There is one thing I would like to emphasize. Using Haskell does not mean that you must understand all the dependencies you use. Haskell programmers tend to care much about the implementation details of their dependencies because Haskell makes it so easy to understand the meaning of programs with types and equational reasoning. But in my opinion, this is a blessed curse.

That’s not how civilization works. You can drive a car without understanding how engines work. Python or JavaScript programmers do not care about the implementation details of their dependencies because it is simply impossible. Haskell is no exception. Time and money are limited. Please don’t spend too much time on understanding things. Spend more time on building things. Be practical.

Fortunately, some library authors provide a high-level overview of their library. Type-level Web APIs with Servant is a great example. It explains the core concepts and the implementation techniques of the library without involving accidental complexities of implementation details. I would love to see more papers like this.

Tools and Libraries

Stackage and the Stack are indispensable tools for using Haskell in production. All the hard work of FP Complete gave me confidence that Haskell was production-ready. The Haskell ecosystem is not small anymore. There are multiple competing web frameworks such as Yesod, Scotty, Snap, Happstack and Servant. Qualities of these packages are all good.

If you write web servers in Haskell, all the packages you need such as web servers, web frameworks, logging packages, database drivers are already available. I use Servant, persistent and esqueleto for my server. So far, everything works fine.

Haskell Community

Haskell community is relatively small compared to other major languages, but I am often surprised by the quality of feedback I get from the community. Haskell is a great language, but the community is even greater. That’s the biggest reason why I love programming in Haskell.

My journey to Haskell is still going on.

Read more

Type inference algorithms of Haskell-like languages

- December 31, 2016
Kwang's Haskell Blog - Type inference algorithms of Haskell-like languages

Type inference algorithms of Haskell-like languages

Posted on December 31, 2016 by Kwang Yul Seo

I collected papers on the type inference algorithms used by Haskell-like languages.

Haskell

Haskell supports advanced type system features such as GADTs, type classes and type families. The current type checker implemented by GHC is described in OutsideIn(X): Modular type inference with local assumptions.

PureScript

The type checker of PureScript is inspired by the following papers. It supports type classes, row polymorphism, higher kinded polymorphism (rank N types). There are no soundness proofs yet.

Elm

Elm’s type checker is an implementation of Pottier and Rem’s The Essence of ML Type Inference with two extensions:

There is no support for type classes or higher kinded polymorphism yet.

Read more

Learn Haskell to be a better programmer

- June 1, 2016
Kwang's Haskell Blog - Learn Haskell to be a better programmer

Learn Haskell to be a better programmer

Posted on June 1, 2016 by Kwang Yul Seo

Haskell is notorious for being hard to learn. Pure functions, lazy evaluation, Haskell type system are just start.

To use Haskell effectively, it is necessary to learn the abstract concepts borrowed from the category theory such as Functor, Applicative Functor and Monad. You need to understand these concepts throughly because most of Haskell code is written using these abstract non-sense.

Of course, it is still possible to write IO code and use Maybe, Either and List types without understanding Monad. But then why do you want to learn Haskell anyway? If you avoid learning these concepts, you can’t learn much from Haskell. It would be much beneficial to learn other more practical languages.

Before explaining why you should learn Haskell, let’s ask a question. What’s the essence of programming?

Programming is basically instructing the computer to some labor. For example, “Load the value at memory x into the register, add 1 to it, and store the value back to the memory” is a program.

But a program is not just a sequence of instructions. It is a solution to our real problem. What makes programming interesting is that the problems we solve as a programmer is much bigger in size than simply loading a value from the memory and doing some arithmetic.

So programming is to divide a big problem that can’t be solved at once into many small problems, solve them independently, and compose the resulting small programs into a program that solves the original problem. In other words, the essence of programming is recomposition after decomposition. See Bartosz Milewski’s Category: The Essence of Composition.

Here comes the most important property of programming, which is called composability. We need to solve many complex problems which are similar but not exactly same. If we can compose small reusable programs into a new program which solves the new problem, the productivity of a programmer will be dramatically increased.

The changes of programming paradigm in the history can be explained as our continuous endeavor to enhance the composability. For example, the shift from assembly programming with goto to structure programming emphasizing subroutine and loop was necessary as the problem size increases. We desperately needed a better way to compose programs.

But as the complexity of problem drastically increased again in 80-90s, we needed a new programming paradigm called object-oriented programming. Classes and objects, encapsulation and information hiding were another endeavor to improve the composability of programs.

Now in 2010s, functional programming is gaining attention. The complexity of problems we have today is enormous and we need new tools and concepts to cope with ever increasing complexity. Classes are not enough. We need more composability.

Haskell provides new tools and concepts which can help organize code better. Concepts like Functor, Applicative Functor, Monad, Arrow and Lens all provide new means to compose programs. See Gabriel Gonzalez’s The category design pattern.

In fact, you already know some of these concepts. For example, ES6’s Promise, C#’s null propagation operator, Python’s list comprehension all share the same monadic structure. But you probably never noticed the common structure lying behind these different language features. After you learn Haskell, you will begin to see the common structure you’ve never imagined before.

In summary, the essence of programming is composition. Haskell provides new tools and concepts to compose programs. Learning Haskell improves your code organizational skills and make you prepared to handle more complex problems. Learn you a Haskell for great good!

Read more

Evaluation Strategy: Haskell vs Scala

- March 2, 2014
Kwang's Haskell Blog - Evaluation Strategy: Haskell vs Scala

Evaluation Strategy: Haskell vs Scala

Posted on March 2, 2014 by Kwang Yul Seo

Haskell is a non-strict language, and GHC uses a strategy called laziness which combines non-strictness and sharing for efficiency.

Thus, you can easily implement const which never uses the second argument.

const x y = x

With this definition, it is okay to pass undefined as the second argument of const because y is not never evaluated. But in Haskell, you can also make an argument strict using the BangPatterns GHC extension.

const x !y = x

Interestingly, the situation is reversed in Scala whose default evaluation strategy is strict.

def const(x: Int, y:Int) = x

You can make an argument non-strict by putting the => symbol between the variable name and the type.

def const(x: Int, y: => Int) = x
Read more

Learning Agda to be a better Haskell programmer

- February 21, 2014
Kwang's Haskell Blog - Learning Agda to be a better Haskell programmer

Learning Agda to be a better Haskell programmer

Posted on February 21, 2014 by Kwang Yul Seo

In my previous post Learning Prolog to be a better Haskell programmer, I advocated learning Prolog is quite helpful to get more intuitions on Haskell type-level programming.

I think a good next step is to learn a dependent typed programming language such as Agda or Epigram. As learning Haskell is a good way to develop oneself as a better Java programmer, learning a dependent typed programming language is a good way to develop oneself as a better Haskell programmer.

Among many dependent typed programming languages, I recommend Agda simply because its surface syntax is quite similar to that of Haskell. Because dependent typed programming languages in general are not mature enough to perform day-to-day programming task and most of them are more or less equivalent in powers, choosing a syntactically familiar language helps you understand more advanced type system behind the syntax.

As the name “dependent type” implies, the biggest difference lies in the type system. While Haskell’s type system strictly splits values and types, Agda blurs the distinction between types and values. Type level programming in Haskell with type families or functional dependencies is esoteric at best, but type level programming in Agda is a norm.

For example, it is possible to define a type of lists of a certain length. In this setting, it is a type error to pass an empty list to head.

data Vec (A : Set) : Nat -> Set where
     [] : Vec A zero
     _::_ : {n : Nat} -> A -> Vec A n -> Vec A (suc n)
 
head : {A : Set}{n : Nat} -> Vec A (suc n) -> A
head (x :: xs) = x

Please refer to Dependently Typed Programming in Agda and Daniel Peebles’s introduction on Agda for more information on Agda.

Read more

Learning Prolog to be a better Haskell programmer

- February 17, 2014
Kwang's Haskell Blog - Learning Prolog to be a better Haskell programmer

Learning Prolog to be a better Haskell programmer

Posted on February 17, 2014 by Kwang Yul Seo

While learning some advanced topics of the Haskell type system, I found type level programming is reminiscent of logic programming.

For example, Fun with Functional Dependencies shows how to implement insertion sort using functional dependencies in a programming style similar to Prolog. Fun with Type Functions also shows a similar example using type families.

This similarity is not a coincidence because of the correspondence between a logic system and a type system, which is known as Curry-Howard isomorphism.

Lesson: Learn Prolog to be a better Haskell programmer!

References

There is a discussion on the Haskell-cafe.

Read more

Data.Map vs Data.IntMap

- February 12, 2014
Kwang's Haskell Blog - Data.Map vs Data.IntMap

Data.Map vs Data.IntMap

Posted on February 12, 2014 by Kwang Yul Seo

A map is the one of most widely used data structures in many applications. Thus, many language runtimes provide an efficient implementation of a map. In a purely functional programming language, map is usually implemented as a balanced binary tree. Haskell is no exception here and the implementation of Haskell’s Data.Map is based on size balanced binary trees described in

  • Stephen Adams, “Efficient sets: a balancing act”, Journal of Functional Programming 3(4):553-562, October 1993, .
  • J. Nievergelt and E.M. Reingold, “Binary search trees of bounded balance”, SIAM journal of computing 2(1), March 1973.

Data.Map is parameterized over key and value types, so that you can use any type you want as long as key is an instance of Ord type class. So, for example, you can use Int as the key type and store any type you want.

However, Haskell also provides a special version Data.IntMap for Int key. It seems redundant at first, but Data.IntMap is different from Data.Map in that it supports efficient merging of two maps. The implementation of Data.IntMap is described in

  • Chris Okasaki and Andy Gill, “Fast Mergeable Integer Maps”, Workshop on ML, September 1998, pages 77-86,
  • D.R. Morrison, “/PATRICIA — Practical Algorithm To Retrieve Information Coded In Alphanumeric/”, Journal of the ACM, 15(4), October 1968, pages 514-534.

The author of Data.IntMap mentions that insertions and deletions of Data.IntMap when compared to a generic size-balanced map implementation are also much faster. This observation suggests that we should use Data.IntMap whenever possible whether or not we need union or intersection of twp maps.

Read more

Record wildcards

- February 10, 2014
Kwang's Haskell Blog - Record wildcards

Record wildcards

Posted on February 10, 2014 by Kwang Yul Seo

Haskell record syntax is a bit verbose. For records with many fields, it is tiresome to write each field individually in a record pattern, as in

data C = C {a :: Int, b :: Int, c :: Int, d :: Int}

f (C {a = 1, b = b, c = c, d = d}) = b + c + d

Record wildcard syntax lets us use .. in a record pattern, which simplifies pattern f=f to f. The above pattern can be rewritten with record wildcards syntax

f (C {a = 1, ..}) = b + c + d

This simple example does not show the merit of record wildcards vividly. Let’s see a real world example. hs-java is a package written by Ilya V. Portnov, which provides data types for Java .class files format and functions to assemble/disassemble Java bytecode.

The datatype for a JVM class file is Class, which has many fields as in

data Class stage = Class {
  magic :: Word32,                         -- ^ Magic value: 0xCAFEBABE
  minorVersion :: Word16,
  majorVersion :: Word16,
  constsPoolSize :: Word16,                -- ^ Number of items in constants pool
  constsPool :: Pool stage,                -- ^ Constants pool itself
  accessFlags :: AccessFlags stage,        -- ^ See @JVM.Types.AccessFlag@
  thisClass :: Link stage B.ByteString,    -- ^ Constants pool item index for this class
  superClass :: Link stage B.ByteString,   -- ^ --/-- for super class, zero for java.lang.Object
  interfacesCount :: Word16,               -- ^ Number of implemented interfaces
  interfaces :: [Link stage B.ByteString], -- ^ Constants pool item indexes for implemented interfaces
  classFieldsCount :: Word16,              -- ^ Number of class fileds
  classFields :: [Field stage],            -- ^ Class fields
  classMethodsCount :: Word16,             -- ^ Number of class methods
  classMethods :: [Method stage],          -- ^ Class methods
  classAttributesCount :: Word16,          -- ^ Number of class attributes
  classAttributes :: Attributes stage      -- ^ Class attributes
  }

It is declared as an instance of Binary class for serialization. Its put method uses the record wildcards syntax not to repeat field names as in the following:

instance Binary (Class File) where
  put (Class {..}) = do
    put magic
    put minorVersion
    put majorVersion
    putPool constsPool
    put accessFlags
    put thisClass
    put superClass
    put interfacesCount
    forM_ interfaces put
    put classFieldsCount
    forM_ classFields put
    put classMethodsCount
    forM_ classMethods put
    put classAttributesCount
    forM_ (attributesList classAttributes) put

You can see the real difference by comparing this with a more verbose version which does not use record wildcards.

instance Binary (Class File) where
  put (Class {magic=magic, minorVersion=minorVersion, majorVersion=majorVersion, constsPool=constsPool, accessFlags=accessFlags, thisCla    ss=thisClass, superClass=superClass, interfacesCount=interfacesCount, interfaces=interfaces, classFieldsCount=classFieldsCount, classFie    lds=classFields, classMethodsCount=classMethodsCount, classMethods=classMethods, classAttributesCount=classAttributesCount, classAttributes=classAttributes}) = do
 ...
Read more

Multi-line strings in Haskell

- February 6, 2014
Kwang's Haskell Blog - Multi-line strings in Haskell

Multi-line strings in Haskell

Posted on February 6, 2014 by Kwang Yul Seo

Haskell supports multi-line string literals in several ways.

unlines

unlines joins lines, after appending a terminating newline to each.

multi = unlines ["line1", "line2", "line3"]

Multi-line string literal

We should escape it using \ and then another \ where the string starts again.

multi = "line1\
\line2\
\line3"

Quasiquotation

The raw-strings-qq package provides a quasiquoter for raw string literals. In addition to supporting multi-line string, it does not recognize escape sequences. So we don’t need to add \ as in multi-line string literals. {-# LANGUAGE QuasiQuotes #-} is required to use this feature.

{-# LANGUAGE QuasiQuotes #-}
import Text.RawString.QQ
multi = [r|line1
line2
line3|]

I prefer quasiquotation because I use multi-line string literals for HTML/XML fragments and it is laborious to escape all special characters such as quotes.

{-# LANGUAGE QuasiQuotes #-}
import Text.RawString.QQ
 
multiline :: String
multiline = [r|<HTML>
<HEAD>
<TITLE>Auto-generated html formated source</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
</HEAD>
<BODY LINK="800080" BGCOLOR="#ffffff">
<P> </P>
<PRE>|]

There are other quasi-quote packages such as string-qq, string-quote and interpolatedstring-qq.

Read more

Default values of Haskell Types

- February 6, 2014
Kwang's Haskell Blog - Default values of Haskell Types

Default values of Haskell Types

Posted on February 6, 2014 by Kwang Yul Seo

Java types have default values. If you don’t explicitly assign a value in a field declaration, it gets its default value of the given type. For example, the default value of int is 0 and the default value of boolean is false.

On the contrary, Haskell types do not provide default values. This is natural because Haskell does not destructively update a variable initialized with the the default value.

However, it is handy to have a default value in some cases such as a record with many fields. Haskell libraries often provide a default value for such a record for the ease of construction. For example, llvm-general-pure provides defaultModule, which is the default value of Module. The data constructor Module has 4 fields:

  • moduleName :: String
  • moduleDataLayout :: Maybe DataLayout – a DataLayout, if specified, must match that of the eventual code generator
  • moduleTargetTriple :: Maybe String
  • moduleDefinitions :: [Definition]

Using defaultModule, you don’t need to supply all these fields. You can construct a Module value using Haskell record update syntax as in the following:

defaultModule { moduleName="mymodule" }

The data-default package provides a type class Default, which is useful for this purpose. If a given type is an instance of Default, you can get its default value using def method. Instances are provided for (), Set, Map, Int, Integer, Float, Double, and many others.

Prelude Data.Default> def :: Int
0
Prelude Data.Default> def :: [a]
[]
Prelude Data.Default> def :: Double
0.0
Read more

Top 30 Haskell packages with most reverse dependencies

- February 3, 2014
Kwang's Haskell Blog - Top 30 Haskell packages with most reverse dependencies

Top 30 Haskell packages with most reverse dependencies

Posted on February 3, 2014 by Kwang Yul Seo

Reverse Dependencies shows the reverse dependencies of each Haskell package (those packages that depend upon it). I sorted the list to check the packages with most reverse dependencies. Here are top 30 Haskell packages among 2348 packages.

  1. base 5961
  2. containers 2110
  3. bytestring 2018
  4. mtl 1664
  5. directory 1068
  6. text 979
  7. transformers 971
  8. filepath 934
  9. time 723
  10. array 696
  11. process 628
  12. QuickCheck 621
  13. network 581
  14. parsec 577
  15. random 537
  16. HUnit 491
  17. template-haskell 488
  18. vector 429
  19. binary 399
  20. test-framework 379
  21. utf8-string 334
  22. unix 331
  23. old-locale 325
  24. deepseq 323
  25. pretty 288
  26. stm 274
  27. test-framework-quickcheck2 270
  28. test-framework-hunit 264
  29. attoparsec 263
  30. aeson 254

Though the list does not show the importance of each package, there is high chance that you need to use at least one of these packages when you write your own package. So to become a seasoned Haskell programmer, you need to get accustomed to these packages.

Read more

Data.Typeable and Data.Dynamic in Haskell

- February 3, 2014
Kwang's Haskell Blog - Data.Typeable and Data.Dynamic in Haskell

Data.Typeable and Data.Dynamic in Haskell

Posted on February 3, 2014 by Kwang Yul Seo

This article is written in literate Haskell.

Data.Typeable is a way to implement dynamic (delayed) type checking in Haskell using a universal type.

For example, you can implement a heterogenous list in Haskell. toDyn converts any Typeable instance into Dynamic which is similar to Java Object type. Any type that is an instance of Typeable class can be wrapped with Dynamic type.

> import Data.Dynamic
> import Data.Maybe
> hlist :: [Dynamic]
> hlist = [ toDyn ("string" :: String)
>         , toDyn (7 :: Int)
>         , toDyn (pi :: Double)
>         , toDyn 'x'
>         , toDyn (((), Just "foo") :: ((), Maybe String))
>         ]
> dyn :: Dynamic
> dyn = hlist !! 1

To be precise, hlist is not actually a heterogenous list from the point of Haskell type system. It is just a homogenous list of Dynamic. The chapter 20, “Untyped Means Uni-Typed” of Harper’s textbook also emphasizes this observation: dynamic types (with typeable representations) are statically typed languages with only one type.

You can convert a Dynamic object back into an ordinary Haskell value using fromDynamic. Type checking is dynamic because it is delayed to runtime.

> v :: Int
> v = case fromDynamic dyn of
>         Nothing -> error "Type mismatch"
>         Just x  -> x

You can make any type Typeable by adding deriving Data.Typeable. In GHC, you need to turn on -XDeriveDataTypeable option to make GHC automatically derive the instance for you.

The Data.Typeable class is used primarily for generic programming in the “Scrap Your Boilerplate (SYB)” style. I will write more on this later.

References

Read more

Implementing Union-Find algorithms in Haskell

- January 30, 2014
Kwang's Haskell Blog - Implementing Union-Find algorithms in Haskell

Implementing Union-Find algorithms in Haskell

Posted on January 30, 2014 by Kwang Yul Seo

The union/find algorithm makes critical use of updatable states. Its efficiency relies on the set representations being simplified each time the structure is examined. Purely functional languages, which lack updatable states seem inherently inefficient to implement the algorithm.

Fortunately, we can implement the algorithm efficiently in Haskell with the help of ST monad which provides support for strict state threads. It is described in the PLDI ’94 paper by John Launchbury and Simon Peyton Jones Lazy Functional State Threads.

data UnionFind s = UnionFind {
    ids:: STUArray s Int Int
  , szs:: STUArray s Int Int
  }
 
newUnionFind :: Int -> ST s (UnionFind s)
newUnionFind n = liftM2 UnionFind (newListArray (0, n-1) [0..n-1]) (newArray (0, n-1) 1)
 
find :: (UnionFind s) -> Int -> Int -> ST s Bool
find uf p q = liftM2 (==) (root uf p) (root uf q)
 
root :: (UnionFind s) -> Int -> ST s Int
root uf i = do
    id <- readArray (ids uf) i
    if (id /= i)
        then do
            gpid <- readArray (ids uf) id
            writeArray (ids uf) i gpid
            root uf id
        else return i
 
unite :: (UnionFind s) -> Int -> Int -> ST s ()
unite uf p q = do
    i <- root uf p
    j <- root uf q
    szi <- readArray (szs uf) i
    szj <- readArray (szs uf) j
    if (szi < szj)
        then do
            writeArray (ids uf) i j
            writeArray (szs uf) j (szi + szj)
        else do
            writeArray (ids uf) j i
            writeArray (szs uf) i (szj + szi)

The code above implements the weighted quick-union with path compression specified in Union-Find Algorithms. You can see that the code is almost the same to the Java code. Because ST monad uses mutable memory internally, the behavior is also the same to that of imperative languages.

You can run the code with runST function.

main = print $ runST $ do
    uf <- newUnionFind 10
    unite uf 3 4 -- 0, 1, 2, {3, 4}, 5, 6, 7, 8, 9
    unite uf 4 9 -- 0, 1, 2, {3, 4, 9}, 5, 6, 7, 8
    unite uf 8 0 -- {0, 8}, 1, 2, {3, 4, 9}, 5, 6, 7, 8
    unite uf 2 3 -- {0, 8}, 1, {2, 3, 4, 9}, 5, 6, 7
    unite uf 5 6 -- {0, 8}, 1, {2, 3, 4, 9}, {5, 6}, 7
    unite uf 5 9 -- {0, 8}, 1, {2, 3, 4, 5, 6, 9}, 7
    unite uf 7 3 -- {0, 8}, 1, {2, 3, 4, 5, 6, 7, 9}
    unite uf 4 8 -- 1, {0, 2, 3, 4, 5, 6, 7, 8, 9}
    find uf 1 2 -- False

The complete source code is here.

Read more

Calling C library functions dynamically in Haskell

- January 28, 2014
Kwang's Haskell Blog - Calling C library functions dynamically in Haskell

Calling C library functions dynamically in Haskell

Posted on January 28, 2014 by Kwang Yul Seo
Tags: Haskell, FFI, C

Haskell FFI is used to call functions from C and for C to call Haskell functions. A classic example from Haskell Wiki page FFI Introduction is to call a C library function, sin from Haskell. In this method, you have to declare the type of C function you want to call using foreign import ccall.

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C
 
foreign import ccall "sin" c_sin :: CDouble -> CDouble
  
sin1 :: Double -> Double
sin1 d = realToFrac (c_sin (realToFrac d))

What if you don’t know the type of a C function you want to call until you run your program (e.g., JIT compilation). You can’t use foreign import ccall because you don’t know the function type statically. In this case, you can use the Haskell binding for libffi, a Portable Foreign Function Interface Library.

import System.Posix.DynamicLinker
import Foreign.C
import Foreign.Ptr
import Foreign.LibFFI
 
sin2 :: Double -> IO Double
sin2 d = do
    sin <- dlsym Default "sin"
    ret <- callFFI sin retCDouble [argCDouble (realToFrac d)]
    return $ realToFrac ret

Now you don’t need to declare it using foreign import ccall. You can just pass the return type and the list of argument types to callFFI function.

So far so good, but there is one problem with this approach. sin2 is no longer a pure function. The return type must be IO Double because calling callFFI function requires IO monad. If you call a impure C function with side effects, this is okay though.

callFFI :: FunPtr a -> RetType b -> [Arg] -> IO b
Read more

Using LLVM in Haskell

- January 27, 2014
Kwang's Haskell Blog - Using LLVM in Haskell

Using LLVM in Haskell

Posted on January 27, 2014 by Kwang Yul Seo
Tags: Haskell, LLVM

Stephen Diehl’s Implementing a JIT Compiled Language with Haskell and LLVM is a good introductory material on how to use LLVM in Haskell. It explains how to build a compiler for a toy language, Kaleidoscope in Haskell.

Because LLVM is a compiler infrastructure written in C++, we need a Haskell binding for LLVM. The tutorial uses llvm-general. llvm-general depends on another package named llvm-general-pure which provides a pure Haskell AST (no FFI needed) for LLVM IR. BTW, Bryan O’Sullivan’s LLVM binding is deprecated in favor of llvm-general.

Read more

GHC typecheck build (no codegen)

- January 27, 2014
Kwang's Haskell Blog - GHC typecheck build (no codegen)

GHC typecheck build (no codegen)

Posted on January 27, 2014 by Kwang Yul Seo
Tags: Haskell, GHC, build

When programming in Haskell, I rebuild my program as frequently as possible to check if there is any type error. This is a good practice because Haskell finds most of bugs just by type checking.

However, as a program gets bigger, the longer it takes to build the program. This is problematic for a large project. Fortunately, GHC provides -fno-code option. When this option is given, GHC omits code generation and only performs type checking.

In combination with -fforce-recomp, one can force typecheck build with the following command:

cabal build --ghc-options="-fforce-recomp -fno-code"
Read more

Switching from monads to applicative functors

- January 26, 2014
Kwang's Haskell Blog - Switching from monads to applicative functors

Switching from monads to applicative functors

Posted on January 26, 2014 by Kwang Yul Seo

Many applications of monads actually do not require monads but only applicative functors. Monads allow us to run actions depending on the results of earlier actions, but not all applications of monads need this extended functionality.

It is better to use applicative functors where possible because there are some advantages of applicative functors. Applicative functor on the Haskell Wiki mentions two:

  • Code that uses only on the Applicative interface are more general than ones uses the Monad interface, because there are more applicative functors than monads. The ZipList is an applicative functor on lists, where liftA2 is implemented by zipWith. It is a typical example of an applicative functor that is not a monad.

  • Programming with Applicative has a more applicative/functional feel. Especially for newbies, it may encourage functional style even when programming with effects. Monad programming with do notation encourages a more sequential & imperative style.

There is another advantage. Applicative functors do not need special transformers because they can be combined in a generic way.

But there is a problem. It is usually not easy to decide if we need monads or applicative functors up front. You ambitiously start with applicative functors and find later that you actually needed monads. Sad!

Here is my tip. I start with monads but use only return, ap, liftM, liftM2, … instead of do, >>=. The most common pattern is

do x <- fx
   y <- fy
   return (g x y)

This can be rewritten as liftM2 g fx fy. Once you are sure that you need only those monad methods, you can mechanically switch from monads to applicative functors using the following translation table:

  • import Control.Monad -> import Control.Applicative
  • return -> pure
  • ap -> -> (<*>)
  • liftM -> liftA or (<$>)
  • liftM2 -> liftA2
  • (Monad m =>) -> (Applicative f =>)
Read more

Using string literal for symbols

- January 24, 2014
Kwang's Haskell Blog - Using string literal for symbols

Using string literal for symbols

Posted on January 24, 2014 by Kwang Yul Seo

In writing a compiler or an interpreter, it is common to define a symbol or identifier type which is distinct from String to make it more type-safe.

newtype Symbol = Symbol String

Now String and Symbol are distinct types and we can’t accidentally mix two. But it also means that we have to use Symbol constructor whenever we need to create a symbol from a string literal.

Symbol "main"

You can avoid this labor using the GHC extension, OverloadedStrings. By making Symbol an instance of IsString, GHC automatically coerces a string literal into a symbol.

{-# LANGUAGE OverloadedStrings #-}
import Data.String

instance IsString Symbol where
    fromString = Symbol . fromString

s :: Symbol
s = "hello"
Read more

Type Checking with Phantom Type

- January 20, 2014
Kwang's Haskell Blog - Type Checking with Phantom Type

Type Checking with Phantom Type

Posted on January 20, 2014 by Kwang Yul Seo

In a toy programming language compiler I’ve written recently, the AST for Program has a type variable a which is not used on the right-hand side of its definition.

data Program a =
    PDefs [Def]
    deriving (Eq, Show)

Here Program is a phantom type because the type parameter a does not appear after the = sign.

But why do we need a type variable if it won’t be used anyway? You can see the hint from the type signature of the type checker:

check :: (Program NoTypeAnnotation) -> Either TypeError (Program TypeAnnotation)

Once a program passes type checking, its type changes from Program NoTypeAnnotation to Program TypeAnnotation. A type variable a is used to differentiate a program with and without type annotations.

Later compiler phases explicitly require its argument type to be Program TypeAnnotation, so I can’t accidentally pass a program which hasn’t been type-checked yet. For example, desugarProgram requires its argument to be of type Program TypeAnnotation.

desugarProgram :: Program TypeAnnotation -> ProgramC

NoTypeAnnotation and TypeAnnotation are declared as empty types because their values are never needed. EmptyDataDecls language extension allows us to not specify any constructors as in the following:

data NoTypeAnnotation
data TypeAnnotation
Read more

Applicative parsing

- January 16, 2014
Kwang's Haskell Blog - Applicative parsing

Applicative parsing

Posted on January 16, 2014 by Kwang Yul Seo

While implementing a parser for a toy imperative language, I used a parser combinator library Parsec, which supports both applicative parsing and monadic parsing. I first wrote the parser in monadic style and then changed it to applicative style because it gave me a more succinct parser.

This article explains the difference between these two styles and discusses the benefits of applicative parsing over monadic parsing.

Here a simple C like program fragment:

int max(int a, b)
{
    if (a > b)
        return a;
    else
        return b;
}

The following is a snippet of the AST.

data Def = DFun Type Id [Arg] [Stm]

data Stm =
    ...
    | SIfElse Exp Stm Stm

The parser for function definitions is compact. DFun is a pure function which takes 4 arguments. For each argument, we need to perform an effectful computation, which is parsing. So I used ($) to lift a pure value to the effectful world and applied effectful arguments using (*).

def :: Parser Def
def = DFun <$> typ <*> ident <*> parens (commaSep arg) <*> braces (many stm)

The same parser can be written in monadic style as in the following. You can see that we need to create more bindings to name intermediate results.

def :: Parser Def
def = do
    t <- typ
    id <- ident
    args <- parens (commaSep arg)
    stms <- braces (many stm)
    return $ Dfun t id args stms

Let’s look at another example. ifElseStm is a parser which parses an if-else statement into a STM instance with SIfElse data constructor. Here (*>) is used to sequencing actions, discarding the result of the first operand because we don’t need to store “if” and “else” keywords in the AST.

ifElseStm :: Parser Stm
ifElseStm =  SIfElse <$> (reserved "if" *> parens exp) <*> stm <*> (reserved "else" *> stm)

The same parser also can be written in monadic style.

ifElseStm :: Parser Stm
ifElseStm =  do
    reserved "if"
    condExp <- exp
    thenStm <- stm
    reserved "else"
    elseStm <- stm
    return $ SIfElse condExp thenStm elseStm

In monadic parsing, (>>) is used to discard the result of an effectful computation. This parser also creates a lot of bindings to name intermediate results.

From these two examples, we can see the stylistic difference between applicative parsing and monadic parsing. Applicative parsing is more succinct in this particular example. This is not always true, so I usually try both options to find the better one.

However, the difference is not just on the style. The real difference is in how sequential composition is handled. Monad is more powerful than applicative functor, but that’s the exact reason why we can reason about monad less than applicative functor. This difference in turn gives applicative parsing better performance optimization chances. What are the benefits of applicative parsing over monadic parsing? explains this point well.

For more in-depth understanding of applicative programming, refer to FUNCTIONAL PEARL: Applicative programming with effects.

Read more

The power of lazy evaluation

- January 15, 2014
Kwang's Haskell Blog - The power of lazy evaluation

The power of lazy evaluation

Posted on January 15, 2014 by Kwang Yul Seo

How to replace failure by a list of successes is a classic paper which shows the power of lazy evaluation with respect to exception handling, backtracking and pattern matching. It says we don’t need special language constructs because these features can be emulated only with lazy evaluation.

The idea is simple. Each term that may raise an exception of backtrack is replaced by a term that returns a list of values; [] for failure and a singleton list for success.

  • (failure) -> []
  • (success) -> [v]

A term that might return many values through backtracking is replaced by a term that returns a list of those values.

  • (backtracking) -> [v1, v2, …]

There are two ways of combining terms that may raise exceptions or backtrack.

“or” combination is a way of combining two terms so that evaluation of the resulting term succeeds whenever evaluation of the first term or the second term succeeds.

“or” combination can be implemented with list append, (++).

“and” combination is a way of combining two terms so that evaluation of the resulting term succeeds whenever evaluation of the first term and the second term succeeds.

“and” combination can be implemented with cartesian product.

Let’s look at an example. assoc is a function that looks up entries in an association list. Given a list of pairs xys and a value x, the call assoc xys x returns y such that the pair (x, y) is in the list xys. If there is no such y, the call should raise an exception. assoc can be written in Haskell using list comprehension:

assoc xys x = [y | (x', y) <- xys, x' == x]

Here are the results of some experiments on assoc.

  • assoc [(“a”, 1), (“a”, 3)] “a” = [1,3]
  • assoc [(“a”, 1), (“b”, 2)] “b” = [2]
  • assoc [(“a”, 1), (“b”, 2)] “c” = [0]

“Or” combinator (assoc xys1 x) ? (assoic xys2 x) can be written

assoc xys1 x ++ assoc xys2 x

and “and” combinator (assoc xys x) + (assoic xys2 x) can be written

[y1 + y2 | [y1 <- assoic xys1 x, y2 <- assoc xys2 x]

Backtracking usually requires special language features such as coroutines or generators because evaluation of an expression needs to be suspended after it has returned one value, and then may be resumed later if more values are needed. Lazy evaluation provides this property without any additional language features. That is, evaluation of a list is already performed only when more elements are needed.

Read more

Overloaded string literals

- January 15, 2014
Kwang's Haskell Blog - Overloaded string literals

Overloaded string literals

Posted on January 15, 2014 by Kwang Yul Seo

In Haskell, the type of a string literal "Hello World" is always String which is defined as [Char] though there are other textual data types such as ByteString and Text. To put it another way, string literals are monomorphic.

GHC provides a language extension called OverloadedStrings. When enabled, literal strings have the type IsString a => a. IsString moudle is defined in Data.String module of base package:

class IsString a where
    fromString :: String -> a

ByteString and Text are examples of IsString instances, so you can declare the type of string literals as ByteString or Text when OverloadedStrings is enabled.

{-# LANGUAGE OverloadedStrings #-}
a :: Text
a = "Hello World"

b :: ByteString
b = "Hello World"

Of course, String is also an instance of IsString. So you can declare the type of a string literal as String as usual.

c :: String
c = "Hello World"
Read more

Difference List for Writer Monad

- January 14, 2014
Kwang's Haskell Blog - Difference List for Writer Monad

Difference List for Writer Monad

Posted on January 14, 2014 by Kwang Yul Seo

In writing a compiler in Haskell, it is conventional to use the Writer monad to accumulate emitted code.

The class declaration of MonadWriter is

class (Monoid w, Monad m) => MonadWriter w m | m -> w where

Here w must be a Monoid instance, so that MonadWriter can combine the outputs of the subcomputations using mappend.

A naive implementation of a compiler can use MonadWriter [String] to accumulate emitted code. Haskell list is an instance of Monoid and its mappend function is implemented with (++) which appends two lists.

This looks okay at first, but it is not an efficient way to use the Writer monad because the time complexity of (++) is O(n) on the length of the first operand. Because the first operand gets bigger as MonadWriter accumulates more code, the time complexity of MonadWriter [a] becomes quadratic.

A Difference list is the rescue because it is a Monoid instance which supports O(1) append and snoc operations on lists. So we can use MonadWriter (DList Instruction) instead of MonadWriter [Instruction] to accumulate emitted code and convert it to a list using DList.toList.

Wikipedia explains the term difference list as in the following:

In the second approach, difference lists are implemented as single-argument functions, which take a list as argument and prepend to that list. As a consequence, concatenation of difference lists of the second type is implemented essentially as function composition, which is O(1). However, of course the list still has to be constructed eventually (assuming all of its elements are needed), which is plainly at least O(n).

There is a chapter on Real World Haskell about the difference list: Chapter 13. Data Structures. Learn you Haskell for a Great Good also provides a chapter on the difference list: For a Few Monads More.

Read more

DoAndIfThenElse language extension

- January 13, 2014
Kwang's Haskell Blog - DoAndIfThenElse language extension

DoAndIfThenElse language extension

Posted on January 13, 2014 by Kwang Yul Seo

Have you ever encountered “Unexpected semi-colons in conditional” errors while building your project with Cabal and wondered why? This blog post explains about this puzzling error.

Here is a simple program which prints the given command line arguments.

import System.Environment
 
main = do
    args <- getArgs
    if length args < 1 then
        putStrLn "Usage: DoAndIfThenElse [args]"
    else
        putStrLn $ concat args

You can build this Haskell program with ghc –make:

ghc --make DoAndIfThenElse.hs
[1 of 1] Compiling Main             ( DoAndIfThenElse.hs, DoAndIfThenElse.o )
Linking DoAndIfThenElse ...

Okay. Then let’s create a Cabal build script and build this program with Cabal.

name:            DoAndIfThenElse
version:         0.0.1
cabal-version:   >= 1.8
build-type:      Simple
 
executable DoAndIfThenElse
  hs-source-dirs:    src
  main-is:           DoAndIfThenElse.hs
  build-depends:     base

The source code is exactly the same, but now GHC suddenly complains about the unexpected semi-colons we never inserted anyway.

src/DoAndIfThenElse.hs:5:8:
    Unexpected semi-colons in conditional:
        if length args < 1 then putStrLn
                                  "Usage: DoAndIfThenElse [args]"; else putStrLn $ concat args
    Perhaps you meant to use -XDoAndIfThenElse?

You can fix this problem by adding DoAndIfThenElse language pragma at the beginning:

{-# LANGUAGE DoAndIfThenElse #-}

Or you can fix it by changing the indentation of then and else of if expression.

if length args < 1
    then putStrLn "Usage: DoAndIfThenElse [args]"
    else putStrLn $ concat args

So the problem is on the indentation. You have to keep then and else at deeper indentation levels than the if block they belong.

ghc –make is okay because GHC automatically turns on the syntax extension DoAndIfThenElse. However, Cabal is more picky, so you have to turn it on manually either at the top of your code files, or in your Cabal files.

There is also a StackOverflow question on this issue, Unexpected semi-colons in conditional.

Read more

Parsing arithmetic expressions with Parsec

- January 7, 2014
Kwang's Haskell Blog - Parsing arithmetic expressions with Parsec

Parsing arithmetic expressions with Parsec

Posted on January 7, 2014 by Kwang Yul Seo
Tags: Haskell, Parsec

Parsec provides Text.Parsec.Expr module to parse expressions. buildExpressionParser creates an expression parser from the given operator table.

For example, let’s assume that we want to parse a simple arithmetic expression such as 1+2*3-4/2. Here * and / has higher precedence than +, - and all operators are left-associative. We can represent the grammar in BNF Converter notation.

EAdd. Exp ::= Exp "+" Exp1 ;
ESub. Exp ::= Exp "-" Exp1 ;
EMul. Exp1 ::= Exp1 "*" Exp2 ;
EDiv. Exp1 ::= Exp1 "/" Exp2 ;
EInt. Exp2 ::= Integer ;

coercions Exp 2 ;

Here’s our AST for the arithmetic expression.

data Exp = EAdd Exp Exp
         | ESub Exp Exp
         | EMul Exp Exp
         | EDiv Exp Exp
         | EInt Integer

We can build an expression parser, expr as in the following. We specify operators with their precedences and associativities in the operator table. Operators with higher precedence come first in the operator table. So * and / has higher precedence than + and - in this example.

term =  parens expr
    <|> EInt <$> natural

table = [ [binary "*" EMul AssocLeft, binary "/" EDiv AssocLeft ]
        , [binary "+" EAdd AssocLeft, binary "-" ESub AssocLeft ]
        ]

binary  name fun assoc = Infix   (do { reservedOp name; return fun }) assoc

expr :: Parser Exp
expr = buildExpressionParser table term

Text.Parsec.Expr can handle prefix and postfix operators too. We can use the following helper functions to specify these operators.

prefix  name fun       = Prefix (do { reservedOp name; return fun })
postfix name fun       = Postfix (do { reservedOp name; return fun })

See the calculator example for the complete code.

Read more

Defects of von Neumann languages

- December 26, 2013
Kwang's Haskell Blog - Defects of von Neumann languages

Defects of von Neumann languages

Posted on December 26, 2013 by Kwang Yul Seo

John Backus, the inventor of Fortran criticized the defects of von Neumann languages (aka, imperative languages) in his 1977 Turing Award Lecture. Most of his points still remain valid until today!

The defects can be summarized:

  • close coupling of semantics to state transitions
  • division of programming into a world of expressions and a world of statements
  • inability to effectively use powerful combining forms for building new programs from existing ones
  • lack of mathematical properties for reasoning about programs

He described conventional programming languages as fat and flabby because each successive programming language adds more and more features with little cleaning up. Even though most conventional languages are similar as they simply add higher level constructs to the same underlying von Neumann computing model, learning a new language still takes long time due to many subtle corner cases in their semantics.

He evaluated von Neumann model with 4 criteria:

  • Foundations: complex, bulky, not useful compared to Turing machines, various automata, Church’s lambda calculus or Curry’s system of combinators
  • History sensitivity: have storage, are history sensitive
  • Type of semantics: state transition with complex states (not simple as in automata)
  • Clarity and conceptual usefulness of programs: programs can be moderately clear, are not very useful conceptually

What’s the von Neumann computer? It has three parts: CPU, a store and a connecting tube that can transmit a single word between the CPU and the store. Programming on this machine is to change the contents of the store by repeatedly transferring single words between CPU and the store through the tube. So he called this tube the von Neumann bottleneck.

By the von Neumann bottleneck, he does not mean the limited bandwidth of the tunnel (bus). The real problem is that it becomes an intellectual bottleneck that keeps us tied to word-at-a-time thinking instead of larger conceptual thinking. The assignment statement in von Neumann languages corresponds to the von Neumann bottleneck which keeps us thinking in word-at-a-time terms in much the same way the computer’s bottleneck does.

So he proposed Functional Programming (FP) system as an alternative to von Neumann languages. FP is a functional programming language in point-free style. Please read the original paper for the details of the language. If you are interested in the implementation of FP, FP is my Haksell implementation of FP.

Read more