Passing Curiosity: Posts tagged functional programming

Yow! Conference, Sydney 2013

2017-01-02T00:00:00Z

It’s the beginning of a new year so I’m cleaning out some files in my drafts directory. This post was started on December 13, 2013.

~40 speakers
~440 attendees
three cities

YOW! LambdaJam in May was excellent and this was pretty great too. The YOW! people seem to put on great conferences.

Day one

Jeff Hawkins on machine intelligence

The day kicked off with Jeff Hawkins (of Palm and Handspring fame) giving a keynote in which he described the neurologically-inspired approach to machine intelligence being developed by his current company (Grok Solutions) and others. The basis of this approach is in building learning systems with many of the properties of biological intelligence (universality, robustness, etc.) by modelling them on the operation of neural structures in the neocortex.

One of the key points was the use of representations which enable data storage and processing in ways which are efficient and accurate enough for machine intelligence. In particular, the use of sparse distributed representations (SDR) is key to the model of intelligence described. Dense representations (such as ASCII) use a very small number of bits to represent particular states but each bit is devoid of semantic information: the state of “bit 3” in an ASCII character conveys no useful information. An SDR uses many more bits, each representing a particular feature in the learning domain (e.g. a property of objects or a word in a corpus); as such, most bits in a particular SDR instance will be 0 (hence the “sparse” in the name).

SDRs have several properties which make them useful for learning tasks: similar objects have similar representations; they allow sub-sampling without losing all meaning; they behave well with union/membership and other set operations (an SDR is, in some sense, similar to a Bloom filter). According to Jeff:

“All intelligent machines will be based on sparse distributed representations.”
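The properties listed above can be illustrated with a toy model. This is only a sketch under a strong simplifying assumption: an SDR is modelled as the set of indices of its active bits, and the names (`SDR`, `overlap`, `probablyIn`, `subSample`) and the bit patterns are made up for illustration. None of it comes from NuPIC or the talk.

```haskell
import qualified Data.IntSet as IS

-- | Toy model (an assumption, not NuPIC's representation): an SDR is the set
-- of indices of its active bits.
type SDR = IS.IntSet

fromBits :: [Int] -> SDR
fromBits = IS.fromList

-- | Number of shared active bits, used as a crude similarity measure.
overlap :: SDR -> SDR -> Int
overlap a b = IS.size (IS.intersection a b)

-- | Union of several SDRs into a single representation.
unionAll :: [SDR] -> SDR
unionAll = IS.unions

-- | Bloom-filter-like membership: true when every active bit of the first
-- SDR is set in the union. False positives are possible.
probablyIn :: SDR -> SDR -> Bool
probablyIn x u = x `IS.isSubsetOf` u

-- | Keep every n-th active bit (in ascending order), discarding the rest.
subSample :: Int -> SDR -> SDR
subSample n =
    IS.fromList . map snd . filter ((== 0) . (`mod` n) . fst)
        . zip [0 :: Int ..] . IS.toAscList

main :: IO ()
main = do
    let cat = fromBits [3, 17, 42, 100, 256, 900]
        dog = fromBits [3, 17, 42, 101, 300, 900]
        car = fromBits [5, 60, 77, 512, 640, 1000]
        u   = unionAll [cat, car]
    print (overlap cat dog)               -- 4: similar things share bits
    print (overlap cat car)               -- 0: unrelated things do not
    print (cat `probablyIn` u)            -- True
    print (dog `probablyIn` u)            -- False
    print (overlap (subSample 2 cat) dog) -- 2: some meaning survives
```

Even after throwing away half of the bits, the sub-sampled "cat" still overlaps with "dog" and not with "car", which is the sense in which sub-sampling does not lose all meaning.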

The cortical learning algorithm developed by Grok Systems and implemented in the Numenta Platform for Intelligent Computing open source project (GPLv3) builds on these ideas and implements a learning system modelled on a cortical region to learn about “normal” inputs and then predict and detect anomalies from streaming input. Jeff described two applications in which this software has been deployed: detecting anomalies in server metrics, and natural language processing.

The first example (built by Grok Systems and included in the NuPIC open source project) is used to monitor metrics from resources in Amazon Web Services and to detect anomalies in their behaviour. This approach can identify conditions which traditional (and, it must be said, much, much simpler) threshold-based approaches cannot.

The second example – developed by CEPT Systems – derives SDRs of words from Wikipedia pages and then deploys these SDRs in particular learning problems. This can be used to demonstrate the set-like properties of SDRs: sdr(apple) - sdr(fruit) = sdr(computer). A CLA trained on inputs like “ANIMAL VERB OBJECT” was able to make sensible predictions for new inputs it hadn’t seen before, including “fox” and “eat” yielding “rodent”.

This was a pretty great talk and got the conference off to a great start!

Charles Nutter on language engineering for the JVM

In the second session I saw Charles Nutter’s talk “Beyond JVM” in which he discussed the engineering issues which face JVM-targeting languages like JRuby. Charles discussed some of the pros and cons for targeting the JVM (many of the pros are also cons) and then jumped into four of the key challenges faced by the JRuby project: startup time, native interoperability, language performance, and the lack of flexibility in the JVM (the big ball of C++).

Charles discussed a number of ways to improve JVM and application startup time: tweaking JVM flags helps, but can be fragile in the face of different JVMs and JVM version changes, and typically impacts later performance; keeping persistent JVM instances (using tools like Nailgun) can cause problems cleaning up resources (memory leaks, background threads, etc.); pre-loading JVMs with tools like Drip can improve performance while avoiding the cleanup problems of persistent JVMs.

The problem of native interoperability is a complex one with a range of solutions. The traditional approach uses JNI, which is horrible: you write code for both your intention (“I want to call getpid()”) and how to implement it. The JNR project provides a real foreign function interface on the JVM, structured into a number of layers: jffi provides platform-specific FFI functionality; jnr-ffi defines structures, etc. to interface with jffi; jnr-posix exposes a range of POSIX APIs (the ones JRuby has needed so far); jnr-constants defines a range of constants as defined on the host platform; and jnr-enxio implements Java NIO for arbitrary file descriptors (allowing a range of I/O functionality which can’t otherwise be expressed on the JVM). JNR generates code which is as direct as possible for each particular case, resulting in very low overheads for each call.

One of the key motivations for JRuby is language performance. While the JVM specification made mention of non-Java languages, it didn’t go out of its way to actually support them. The relatively new invokedynamic bytecode allows language implementers to customise invocation mechanisms to suit the specifics of their language. The JVM will cache and optimise the results of dynamic invocations as normal. This can result in plain Ruby code run on JRuby being faster than using a native extension under CRuby (in a red-black tree benchmark).

Finally, Charles discussed approaches that language implementors can use to deal with the inflexibility of the JVM internals. The Graal project allows language implementors to customise the way that their implementations are optimised and emit the ASM/HotSpot intermediate representation appropriate for the particular language’s constructs. Truffle, a framework built on top of Graal, allows you to implement an interpreter for your language (structured and annotated in a particular way) and to automatically derive a JIT for it. (This sounds a little like the second Futamura projection to me.)

This talk was very well presented and very informative. If I’d known it was “about” JRuby I probably wouldn’t have gone but I’m glad I did!

Julien Verlaguet on Facebook’s static typing for PHP

Julien Verlaguet is an engineer at Facebook and spoke about the work they’ve done to improve on the PHP language with HHVM and “Hack” - a statically typed version of PHP which was the primary subject of the talk.

Unlike Facebook’s earlier attempts at improving the deployment and runtime story for PHP (the HipHop compiler translated PHP code into C++ which was compiled into a native binary), HHVM is a fairly traditional virtual machine with a JIT. The HHVM blog has a bunch of interesting posts about the development of both the VM and the JIT; go read it!

HHVM supports two source languages: normal PHP and Hack. Hack (the code name might change) is a statically typed variant of PHP which is compatible with PHP, uses the same run-time representations within the VM and was designed for incremental adoption (a necessity when dealing with massive codebases like Facebook.com).

The static typing for Hack requires that the programmer add type annotations to class members, function parameters and return values; it infers all other types. The types supported include the basic types built into PHP, collections and generics. It also distinguishes the types of nullable and non-nullable values. PHP was not designed for type checking, so the type checker must make several allowances. The most interesting is, perhaps, the delay of type unification to call sites rather than function definitions.

The Hack type checker is implemented as a daemon which listens for file system events on the code base and communicates with a client to “run” a check and present errors. The errors are designed to give specific, useful feedback to the programmer including references to each annotation which resulted in the error (“it tells a story”). The checker is also able to output coloured “coverage” style reports of code showing which code is checked/unchecked.

Conversion of existing PHP to Hack has happened in two ways: organic adoption by developers as they and their teams take up Hack; and automatic conversion using tools to analyse, refactor and monitor changes in the code base. This includes support for “soft” conversions, which are monitored but not enforced until they are known to be accurate.

Hack and HHVM sound like great improvements over PHP. I never got around to trying HPHP before it went away but perhaps I’ll give HHVM a go.

Kevlin Henney deconstructed the SOLID principles

Kevlin Henney

I’m not really one for talks about methodologies and such, but Kevlin’s talk “the SOLID Design Principles Deconstructed” was entertaining and not a little informative.

Gilad Bracha on Dart and Newspeak

Gilad Bracha is an engineer at Google where he works on Dart. He spoke about Dart and Newspeak.

Joe Albahari on concurrency in .NET

Joe Albahari spoke about concurrency in C# 5.

Scott Hanselman on the web platform

Scott Hanselman works on Azure and ASP.NET for Microsoft.

Day Two

Functional programming in industry

Kornelis Sietsma, Michael Neale and Jed Wesley-Smith gave a set of three talks about the adoption and use of functional programming languages at three different companies.

Jay Fields on adopting Clojure

Jay Fields

Daniel Spiewak on modules and the expression problem

Daniel Spiewak

Stewart Gleadow on mobile apps and their APIs

Stewart Gleadow

Sponsors and Exhibitors

Sponsors include Suncorp, DiUS, ThoughtWorks, Mashery,

Some ScalaCheck generators fail too much

2015-12-29T00:00:00Z

Most of the code I’ll be working with in my new job (BTW blog: I have a new job) is written in Scala and uses property based testing with ScalaCheck. Yesterday I ran into a problem with an existing test suite that suddenly began failing with too many discarded tests:

[info] FormattersSpec
[info]   Formatters are invertible for:
[info]     + Mapping
[info]     + Identifier
[info]
[error]     x Metadata
[error]  Gave up after only 39 passed tests. 197 tests were discarded. (FormattersSpec.scala:11)

This test generates random Metadata values and makes sure that they can be serialised and deserialised correctly (i.e. values can be round-tripped). The property being tested is identical in each case; only the Arbitrary, Serialise, and Deserialise instances vary. The truly odd thing is that the pertinent code looks like this:

case class Identifier(name: String)
case class Metadata (id: Identifier, maps: Set[Identifier])

implicit val ArbIdentifier = Arbitrary(
  for {
    name  <- arbitrary[String]
  } yield Identifier(name)
)

implicit val ArbMetadata = Arbitrary(
  for {
    identifier <- arbitrary[Identifier]
    mappings   <- arbitrary[Set[Identifier]]
  } yield Metadata(identifier, mappings)
)

My first step was redefining a few related Arbitrary instances to avoid using suchThat (which discards invalid values) but this didn’t fix the problem. Eventually I tried redefining ArbMetadata like this:

implicit val ArbMetadata = Arbitrary(
  for {
    identifier <- arbitrary[Identifier]
    mappings   <- Gen.const(Set.empty[Identifier])
  } yield Metadata(identifier, mappings)
)

and the problem went away. Trying to use arbitrary[Set[Identifier]] in various ways in the Scala REPL confirmed that it is the problem; we can easily generate as large a List[Identifier] as we like, but a Set[Identifier] fails fairly frequently:

// This always yields a Some: generating the List[Identifier] never fails.
Gen.listOfN(100, arbitrary[Identifier]).map(_.length).sample
// This sometimes yields a Some and sometimes None: generating the sets can fail.
Gen.listOfN(100, arbitrary[Set[Identifier]]).map(_.length).sample

It appears as though whatever mechanism is used by arbitrary[Set[_]] to construct the sets, it fails when the generator for the value type returns duplicate elements. You can confirm this easily by trying arbitrary[Set[Unit]]; any Gen[Unit] has no choice but to return the single value of type Unit (or to fail) and, as expected, this almost never succeeds. Replacing the problematic arbitrary[Set[Identifier]] in the original code with arbitrary[Seq[Identifier]].map(_.toSet) resolves the issue: constructing a set from a list of possibly duplicate Identifiers always works.

After a bit of reading in the ScalaCheck source code it seems as though the root cause of this problem is some instance of CanBuildFrom[Set[_], A, Set[A]] but I’ve no idea how to go about figuring out which one or why it’s broken. In any case, I now know a bit more about working with Scala.

For more information, see the ScalaCheck issue #89.

Testing multiple GHC versions on Travis CI

2015-04-04T00:00:00Z

GHC 7.10.1 was recently released and, if nothing else, that means there’s a new version of the base library so a lot of developers will need to bump the upper bound specified in their cabal files. Having multiple versions of the compiler installed isn’t all that difficult, but actually building and testing a cabal package with multiple compiler versions is pretty tedious. It’s easy enough to do on Travis CI though and with just a little bit of cargo-culting, you too can be extending your set of supported GHC versions.

Multiple versions of GHC on Travis CI

Travis CI, for those who aren’t familiar with it, is a continuous integration service with pretty tight integration with GitHub. Using it is pretty straightforward: you add a YAML file to your repository describing how to test your project and then you turn it on. Using matrix variables in your YAML file you can specify multiple values for various aspects of your build process and Travis CI will run your job multiple times - once for each combination of values. The standard approach to testing with multiple GHC versions on Travis CI uses this to specify which versions of GHC and Cabal to install and use in the build; specify four versions and your job gets run four times. Magic!

The .travis.yml file you use to do this is a little more complex than the usual one saying “make my Haskell go!”, but you can generally just copy it around from project to project. I edited my .travis.yml file slightly to tweak the way Travis CI sends me email and to select the versions of GHC that I care about and now every build for that project automatically covers all the cases I care about.

So what?

My edit-distance-vector package is very simple: it’s one module with 166 lines (including comments and white space) and 100 lines of tests (again, including comments and white space). Here are the issues picked up by testing with four versions of GHC:

Obviously, the version bounds on the base library need to be broadened. I’ve used base >=4.5 && <4.9 now.
Next I learned that the Sum type didn’t have a Num instance in earlier versions. This means that constants like 1 can’t have types like Sum Int so I’ve just applied the Sum constructor manually: Sum 1.
Then I learned that importing a module while hiding something that it doesn’t export used to be an error (it is now a warning).
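The Sum issue is easy to reproduce in isolation. The following is a minimal sketch, not code from edit-distance-vector; the `count` function is a hypothetical example chosen only to show the explicit constructor in use.

```haskell
import Data.Monoid (Sum (..))

-- | Count the elements of a list by combining one @Sum 1@ per element.
-- (Hypothetical helper for illustration only.)
--
-- Writing the constructor explicitly works whether or not 'Sum' has a 'Num'
-- instance. A bare literal such as @1 :: Sum Int@ needs that instance and so
-- fails to compile against versions of base which lack it.
count :: [a] -> Int
count xs = getSum (mconcat (map (const (Sum 1)) xs))

main :: IO ()
main = print (count "hello")  -- 5
```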

The commit fixing these issues is pretty trivial but made the library usable in a wider range of environments. Yay!

I think I’ll be using this by default in new Haskell repositories; when my code doesn’t work with some version of GHC I’d like it to be because I decided to do it, not just that I didn’t know.

Released some Haskell packages

2015-04-03T00:00:00Z

edit-distance-vector

The edit-distance-vector package is a small library for calculating the optimal edit script and cost to transform one sequence of values into another. The implementation uses the Wagner-Fischer algorithm and the rather fun constructN function.

I have a draft blog post on the way about the details of this package but until that’s done you’ll have to make do with the documentation.
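In the meantime, here is a rough sketch of how Wagner-Fischer can be expressed with constructN from the vector package. It is not the package's code: it computes only the cost (no edit script), assumes unit costs for insertion, deletion and substitution, and the name `editDistance` is made up for this example.

```haskell
import qualified Data.Vector as V

-- | Edit distance with unit costs (an assumption of this sketch), computed by
-- filling the Wagner-Fischer table in row-major order.
editDistance :: Eq a => [a] -> [a] -> Int
editDistance xs ys = V.last table
  where
    s = V.fromList xs
    t = V.fromList ys
    m = V.length s
    n = V.length t
    width = n + 1

    -- constructN builds the vector one element at a time; the function is
    -- given the prefix built so far, so every earlier cell can be consulted.
    table = V.constructN ((m + 1) * width) cell

    cell prev =
        let k      = V.length prev          -- index of the cell being built
            (i, j) = k `divMod` width       -- row and column of that cell
            at r c = prev V.! (r * width + c)
            subst  = if s V.! (i - 1) == t V.! (j - 1) then 0 else 1
        in case (i, j) of
            (0, _) -> j                     -- insert j elements
            (_, 0) -> i                     -- delete i elements
            _      -> minimum
                [ at (i - 1) j + 1          -- deletion
                , at i (j - 1) + 1          -- insertion
                , at (i - 1) (j - 1) + subst
                ]

main :: IO ()
main = print (editDistance "kitten" "sitting")  -- 3
```

Each cell only ever looks at cells with a smaller row-major index, which is exactly what constructN makes available, so no mutable array or explicit laziness is needed.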

aeson-diff

The aeson-diff package includes a library and two command-line programs for extracting the differences between two JSON documents and for applying these changes. The commands are:

aeson-diff which compares two JSON documents and generates a patch describing the differences between them; and
aeson-patch which takes a JSON document and updates it according to a patch.

I find the aeson-diff command quite useful for comparing different versions of the JSON documents spewed out by several systems I have to deal with at work.

Reading from processes safely in Haskell

2015-03-04T00:00:00Z

Quite a bit of my work lately has been implementing HTTP interfaces to existing systems. In a few cases this required invoking existing command-line tools and parsing their output. The naive approach to invoking a process in Haskell and reading its output goes something like this:

import System.Exit
import System.IO
import System.Process

main :: IO ()
main = do
    let p = (shell "cat /usr/share/dict/words")
            { std_in  = Inherit
            , std_out = CreatePipe
            , std_err = Inherit
            }
    (Nothing, Just out, Nothing, ph) <- createProcess p
    -- Wait for the process to exit *before* reading any of its output.
    ec <- waitForProcess ph
    case ec of
        ExitSuccess   -> hGetContents out >>= print
        ExitFailure _ -> error "Bad things happened. :-("

There is a potential problem in this code: we wait until the process has terminated before reading from the Handle, allowing its output to accumulate in the pipe buffer managed by the operating system in the meantime. This buffer has a fixed size on most systems (this is a good thing!); when it fills up, the writing process will go to sleep until the reader has consumed some data and freed some buffer space to hold the next write. Alas, the reader (the Haskell code above) is sleeping, waiting for the writer to terminate; and the writer is sleeping, waiting for the reader to read. This is a deadlock!

The solution is to do the Right Thing (tm) and take care of any buffering behaviour we want ourselves. Thankfully this is pretty straightforward and it’s the sort of code you generally only need to write once. The very simplest case – reading from a process with a single output Handle – looks like this:

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Monoid ((<>))
import System.Exit (ExitCode)
import System.IO (Handle)
import System.Process (ProcessHandle, getProcessExitCode)

gatherOutput :: ProcessHandle -> Handle -> IO (ExitCode, ByteString)
gatherOutput ph h = work mempty
  where
    work acc = do
        -- Read any outstanding input.
        bs <- BS.hGetNonBlocking h (64 * 1024)
        let acc' = acc <> bs
        -- Check on the process.
        s <- getProcessExitCode ph
        -- Exit or loop.
        case s of
            Nothing -> work acc'
            Just ec -> do
                -- Get any last bit written between the read and the status
                -- check.
                rest <- BS.hGetContents h
                return (ec, acc' <> rest)

This is essentially a loop which reads some input from the Handle (possibly an empty string), checks to see if the process has terminated, and either returns the accumulated input or loops again. Extending this to gather the output of two handles (like stderr and stdout) is relatively straightforward.
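For the single-handle case there is also a simpler alternative to the polling loop: read the output strictly until end-of-file and only then wait for the exit code. This is a different technique from gatherOutput, sketched here under the assumption that there is only one pipe to drain (with two pipes, draining one to completion can still block the writer on the other). The names `drainThenWait` and `runShell` are made up for this example.

```haskell
import qualified Data.ByteString as BS
import System.Exit (ExitCode (..))
import System.IO (Handle)
import System.Process

-- | Drain the handle to end-of-file, then collect the exit code. The strict
-- 'BS.hGetContents' keeps reading (so the pipe buffer cannot stay full) and
-- returns once the writer has closed its end of the pipe.
drainThenWait :: ProcessHandle -> Handle -> IO (ExitCode, BS.ByteString)
drainThenWait ph h = do
    out <- BS.hGetContents h
    ec  <- waitForProcess ph
    return (ec, out)

-- | Run a shell command and capture its standard output.
runShell :: String -> IO (ExitCode, BS.ByteString)
runShell cmd = do
    (_, Just out, _, ph) <- createProcess (shell cmd) { std_out = CreatePipe }
    drainThenWait ph out

main :: IO ()
main = do
    -- Output far larger than a typical pipe buffer.
    (ec, bs) <- runShell "seq 1 100000"
    print (ec, BS.length bs)
```

The trade-off is that this version blocks in the read rather than polling, so it cannot interleave other work (such as reading a second handle) without additional threads.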

Multiple JSON encodings in Haskell

2015-02-12T00:00:00Z

I’m currently working on a small RESTful API to control a system with a command-line interface. The command produces JSON output but it’s not really ideal to expose in an API. This post describes the approach I took to supporting two different JSON encodings for the same set of data types – one for communicating with API clients and another for communicating with the upstream system.

I’ll start with some data types to represent the data my API manages. In this post I’ll use the example of a painting robot. The robot can carry several colours of paint but can only paint with one “active” colour at a time. Here are some data types to represent these details:

newtype ColourName = ColourName { unColourName :: Text }
  deriving (Eq, Show)

data Colour = Colour
    { colourName :: ColourName
    , colourRGB :: (Word8, Word8, Word8)
    }
  deriving (Eq, Show)

newtype RobotName = RobotName { unRobotName :: Text }
  deriving (Eq, Show)

data Robot = Robot
    { robotName :: RobotName
    , robotActiveColour :: ColourName
    , robotAvailableColours :: [Colour]
    }

JSON for the API clients

The JSON encoding of Robot that I’d like to provide to API clients is pretty straightforward:

{ "name" : "Rosie the robot"
, "activeColour" : "red"
, "availableColours" :
    { "red"   : { "R": 255, "G":   0, "B":   0}
    , "green" : { "R":   0, "G": 255, "B":   0}
    , "blue"  : { "R":   0, "G":   0, "B": 255}
    }
}

The Haskell code to parse this JSON using aeson is straightforward too (though please note that the instances derived for the newtype are only safe to use within a larger JSON structure as they result in bare JSON strings, not objects or arrays):

deriving instance FromJSON ColourName
deriving instance ToJSON ColourName

instance FromJSON [Colour] where
    parseJSON (Object v) = mapM (uncurry colour) $ HM.toList v
      where
        colour name (Object o) = Colour
                <$> parseJSON (String name)
                <*> ((,,) <$> o .: "R" <*> o .: "G" <*> o .: "B")
        colour _ _ = fail "Colour must be a JSON object"
    parseJSON _ = fail "Colours must be a JSON object"

deriving instance FromJSON RobotName
deriving instance ToJSON RobotName

instance FromJSON Robot where
    parseJSON (Object v) = Robot
        <$> v .: "name"
        <*> v .: "activeColour"
        <*> v .: "availableColours"
    parseJSON _ = fail "Robot must be a JSON object"

To talk to the upstream system I’ll use the process library to execute a command which produces JSON on its standard output. A simple function to invoke a command, parse the JSON, and return the value (or an error) keeps the boilerplate contained:

shellOutJSON
    :: (MonadError String m, MonadIO m, FromJSON a)
    => String
    -> [String]
    -> m a
shellOutJSON cmd args = do
    -- Execute the command.
    (exit_code, out, _err) <- liftIO $ readProcessWithExitCode cmd args ""

    -- Check it succeeded.
    output <- case exit_code of
        ExitSuccess -> return $ BS.pack out
        ExitFailure err -> throwError $
            "Could not execute command: error " <> show err

    -- Decode the JSON.
    case eitherDecode output of
        Left e -> throwError $ "Error decoding JSON: " <> e
        Right v -> return v

It’s important to note that the call site is responsible for fixing the type a of value to be parsed from the JSON. This means that shellOutJSON will happily attempt to parse the JSON into any type you ask it to (so long as it has a FromJSON instance), whether or not you should expect the command to produce such JSON. The obvious potential problem – a caller asking for data in the wrong format – occurred twice in a dozen lines of code in my current project.

JSON for the upstream system

The second JSON encoding is the one used to communicate with the command-line application. The main difference from the API encoding is that it represents the active colour by adding a status property to each colour; exactly one of them is active and the rest are available. Rosie the robot looks like this:

{ "name" : "Rosie the robot"
, "colours" :
    { "red"   : { "R": 255, "G":   0, "B":   0, "status": "active"}
    , "green" : { "R":   0, "G": 255, "B":   0, "status": "available"}
    , "blue"  : { "R":   0, "G":   0, "B": 255, "status": "available"}
    }
}

This structure is great if you are using the data to output a nice table for a human to read but not so great in an API.

This additional format could be implemented with new data types to represent robots and colours and a few conversion functions (probably using the excellent lens package) to represent the weirdly formatted versions of our types. Or I could keep the same data types but create a newtype wrapper around each of them with new FromJSON instances implementing the new format.

Instead I’ll add a “wrapper” type with which to distinguish a normal Robot from one which should be formatted for the upstream system.

data Upstream a = Upstream { unwrapUpstream :: a }

This new type doesn’t “do” anything, it just tags the value it wraps and lets me distinguish a Robot from an Upstream Robot, which should be formatted for the API and the upstream system respectively. (This is not strictly true: it does take up memory and does cost an additional pointer dereference to traverse.) With the new Upstream type I can write a second FromJSON instance for each of my types.
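To see the idea in isolation, here is a small self-contained sketch using aeson. The `Switch` type and both of its JSON encodings are hypothetical stand-ins invented for this example; only the `Upstream` wrapper itself is taken from the post.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as BL

-- The wrapper from the post: it only tags the value it wraps.
data Upstream a = Upstream { unwrapUpstream :: a }

-- | Hypothetical stand-in for the real data types.
data Switch = Switch { switchName :: String, switchOn :: Bool }
  deriving (Eq, Show)

-- API encoding (assumed for illustration): {"name": ..., "on": true}
instance FromJSON Switch where
    parseJSON = withObject "Switch" $ \o ->
        Switch <$> o .: "name" <*> o .: "on"

-- Upstream encoding (assumed for illustration): {"name": ..., "status": "active"}
instance FromJSON (Upstream Switch) where
    parseJSON = withObject "Upstream Switch" $ \o -> do
        name   <- o .: "name"
        status <- o .: "status"
        return (Upstream (Switch name (status == ("active" :: String))))

main :: IO ()
main = do
    let api = BL.pack "{\"name\":\"lamp\",\"on\":true}"
        up  = BL.pack "{\"name\":\"lamp\",\"status\":\"active\"}"
    -- The type requested at the call site selects the encoding.
    print (eitherDecode api :: Either String Switch)
    print (unwrapUpstream <$> (eitherDecode up :: Either String (Upstream Switch)))
```

Both calls produce the same Switch value; the only difference is which instance the requested type selects.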

If there is no special upstream format for a type the new instance can just call the existing instance and stuff the result in an Upstream wrapper:

instance FromJSON (Upstream [Colour]) where
    parseJSON j = Upstream <$> parseJSON j

When the upstream encoding and the API encoding do differ, I write a FromJSON instance in exactly the same way I normally would (making sure to use the Upstream version of any other FromJSON instances I use):

instance FromJSON (Upstream Robot) where
    parseJSON (Object v) = Upstream <$>
        (Robot
            <$>  v .: "name"
            <*> (v .: "colours" >>= activeColours >>= exactlyOne)
            <*> (unwrapUpstream <$> v .: "colours"))
      where
        -- Parse a JSON object of colours into a list of 'ColourName's which
        -- have @status == "active"@.
        activeColours :: Value -> Parser [ColourName]
        activeColours (Object o) = (fmap fst . filter snd) <$>
            mapM (uncurry colour) (HM.toList o)
        activeColours _ = fail "Colours must be a JSON object."

        -- Given a name and a JSON value, parse a pair containing the name and
        -- whether the colour has @status == "active"@.
        colour :: Text -> Value -> Parser (ColourName, Bool)
        colour name (Object o) = (,)
            <$> parseJSON (String name)
            <*> ((String "active" ==) <$> (o .: "status"))
        colour _ _ = fail "Colour must be a JSON object."

    parseJSON _ = fail "Robot must be a JSON object"

-- | Parser to check that a list contains exactly one value.
exactlyOne :: [a] -> Parser a
exactlyOne [] = fail "Missing value"
exactlyOne [a] = pure a
exactlyOne _ = fail "More than one value"

With all these instances written I can update shellOutJSON to use the Upstream instances when it interacts with the command-line program. Two small changes – adding Upstream to the FromJSON constraint and the “success” pattern match – are enough to ensure that all communication with the upstream system uses the Upstream JSON encoding:

shellOutJSON
    :: (MonadError String m, MonadIO m, FromJSON (Upstream a))
    => String
    -> [String]
    -> m a
shellOutJSON cmd args = do
    -- Execute the command.
    (exit_code, out, _err) <- liftIO $ readProcessWithExitCode cmd args ""

    -- Check it succeeded.
    output <- case exit_code of
        ExitSuccess -> return $ BS.pack out
        ExitFailure err -> throwError $
            "Could not execute command: error " <> show err

    -- Decode the JSON, unwrapping the Upstream value on success.
    case eitherDecode output of
        Left e -> throwError $ "Error decoding JSON: " <> e
        Right (Upstream v) -> return v

Now any call to shellOutJSON will automatically parse using the correct JSON encoding and any existing code using shellOutJSON doesn’t have to change. Even better, any call which needs a type without an Upstream instance of FromJSON will result in a type error at compile time:

lib/Server.hs:115:5:
    Could not deduce (FromJSON (Upstream Colour))
      arising from a use of ‘shellOutJSON’
    from the context (MonadError String m, MonadIO m)
      bound by the type signature for
                 getColour :: (MonadError String m, MonadIO m) =>
                                      ColourName -> m Colour
      at lib/Server.hs:(109,8)-(112,18)
    In a stmt of a 'do' block: shellOutJSON cmd ["colour", "list", colour_name]
    In the expression:
      shellOutJSON cmd ["colour", "list", colour_name]
    In an equation for ‘getColour’:
        getColour name
          = do { let colour_name = T.unpack $ unColourName name
                 shellOutJSON cmd ["colour", ....] }

The second line of the error tells you exactly what’s missing: the compiler can’t find a FromJSON instance for Upstream Colour.

Conclusion

By using a “wrapper” type like Upstream a I reduced the amount of code I need to write and maintain (in particular, there’s no converting back and forth between Colour and WeirdlyFormattedColour data types). The values of my various types are clearly still related and Upstream is completely agnostic to the type being wrapped – an Upstream Robot is just a Robot inside an Upstream and neither the Robot nor the Upstream cares about the other part at all.

Making the wrapper parametric like this (as opposed to, for example, creating a different newtype wrapper around each of the particular types) makes it possible to write code which – like the modified shellOutJSON – doesn’t care about what is being wrapped, just that it is wrapped.

Adding and removing the Upstream wrapper at the system boundary minimises the amount of code which can incorrectly use the wrong representation and, in particular, makes it impossible for these bugs to happen in the many places I use shellOutJSON. This forces me to define wrapped FromJSON instances for all the types, even the ones that use the same JSON representation, but this is a price I’m willing to pay for an interface that makes a class of errors impossible.

Using this approach in my current project made the code shorter, simpler (in terms of number of data types and functions defined), fixed two “wrong format” bugs, and made it impossible to reintroduce them.

Haskell at Work

2015-01-28T00:00:00Z

I recently (nearly 7 months ago) started work at Anchor Hosting as a software developer in the engineering department. Anchor is a hosting company, and its engineering group works on a variety of different things like time-series data storage and analysis, APIs and automation, business applications, and development tools.

I’ll describe some of our development tooling in this post and leave discussion of the “real” systems we work on for a later post.

Building and deploying

Given we’re a Haskell shop, we’ve needed to figure out how to build, deploy, and manage systems written in Haskell. I’m still not sold on the whole containerisation mania which seems to be sweeping certain parts of the IT world (and certainly not for general purpose multi-tenancy). Nevertheless, we’re finding Docker quite useful.

We have a Docker image containing GHC and cabal configured to use the Stackage package set, with a bunch of frequently used Haskell packages already installed. This image is built automatically using a small set of scripts.

Most of the systems we build operate as services (HTTP servers, agents on message queues, etc.) and we “package” them as Docker images too. We have another set of scripts which use the Haskell image to build a cabal package, extract the artefacts, and stuff them into a new Docker image. This approach results in an image which is significantly smaller than it otherwise would be.

Both sets of scripts can be used manually but they are also used in Jenkins jobs. Like everything mentioned so far Jenkins and its builders all run in Docker too, so we also have some scripts to build Jenkins Docker images.

All of these Docker images are run on CoreOS servers hosted on Anchor OpenCloud, Anchor’s new OpenStack deployment. Generally, we run each service as a Docker container managed by a systemd unit with its configuration and data files (such as they are) mounted in from the host file system. Each systemd unit deletes any old container and pulls the latest image before starting the service, so upgrading an instance is easy: just restart the systemd unit.

Development tools

Members of the team use different operating systems (various flavours of BSD, Linux, and Mac OS X) and editors (vim, emacs, Sublime Text), and have different opinions and habits about coding style, etc. To help improve the consistency and, hopefully, quality of our code, we developed git-vogue. git-vogue runs as a pre-commit hook (currently for git, but it can be extended) and runs a range of checks over the modified or, optionally, all files in the repository. This isn’t perfect but has helped improve our code quite considerably.

FP-Syd, October 2013

2013-10-16T00:00:00Z

A brief mention of linux.conf.au 2014 and how we should all take a look at the available programme and see if we want to go.

Erik’s ICFP roundup

The International Conference on Functional Programming (ICFP for short) is a three-day core conference and is collocated with a number of related events. ICFP 2013 was in Boston and a number of FP-Syd regulars presented and/or attended.

Erik was a long-time LCA attendee but ICFP has supplanted it as his “must go” conference. I hope to make the same switch in 2014!

The Haskell Implementers Workshop covers the internals of Haskell implementations which, these days, means GHC to a very large extent. It covers a lot of interesting techniques, with a particular focus on compilers. Erik mentioned work on the non-safety of generalised newtype deriving; using Hermit (a dynamic/guided optimisation framework) to optimise scrap-your-boilerplate code; and Habit (a strict Haskell dialect for OS programming).

The Commercial Users of Functional Programming (CUFP) workshop was, reportedly, a bit boring, but I’ve liked the few CUFP 2013 YouTube videos I’ve watched so far. YMMV.

The Haskell Symposium was a main draw (for Erik). Highlights which Erik found worth mentioning and I found worth noting down include:

Oleg asked difficult questions of a lot of speakers. I wonder what would happen if he asked an easy one?
Effects seemed something of a hot topic.
Demonstrations of Liquid Haskell (which sounds pretty great), and a Javascript backend for GHC.

Intel are developing a research compiler which uses GHC’s front-end to compile to Core and then uses their own backend. It does loop vectorisation, but currently only gives better performance on a few benchmarks.
The third iteration of the I/O manager for GHC. Multithreaded, influenced by Kazu Yamamoto’s work on Warp and mighttpd. Benchmarks against Nginx seem very good; Warp with multiple cores sees extremely good speedups (contra Nginx).

The main event – ICFP – is an academic conference and a lot of the content will fly straight over the head of many a “working programmer”. Some of the highlights included:

A few talks on vectorisation (w/ SIMD from Intel, stream fusion, etc.) and optimisation (for GPUs, etc.)
A few talks on dependent types.
Tactics in Coq are untyped; one talk discussed an approach to typed tactic programming in Coq. Sounds especially interesting now that there is a “Coq fight” in the FP-Syd calendar for next year!
People who didn’t attend are encouraged to watch the video of the “fun with semi-rings” talk. I haven’t been able to find it, though.
One talk described a useful-sounding approach to parsing context free grammars with a divide-and-conquer approach, allowing partial and parallel parsing.
Simon Peyton-Jones discussed the new curriculum for secondary computer science education in the United Kingdom.
An extension or two to System F: System Fc (explicit kind equality) and System Fi (type indices). (Anyone who can understand System F shouldn’t have a problem reading the System Fi paper.)
Constrained monad problem (which, apparently, Oleg said was crap?). Paper on solving a problem which occurs when using Monad but they should have used Applicative. Seems as though they mostly wanted the do syntactic sugar; see also idiom brackets and the attempt to generalise the Monad sugar.
“Querying ordered graphs.” Three words which sound interesting, but I’ve no idea why I wrote them down.
Also: experience reports! Someone took a Scheme compiler from 4-5 to 25 passes (“nanopass”) and, at the same time, also added a good colouring register allocator. Apparently one of these changes made it better.

Other events:

Talking about a benchmark/framework to compare approaches to generic programming at the Workshop on Generic Programming.
Brent Yorgey doing animations with diagrams at the Workshop on Functional Art, Music, Modeling and Design.
Chordify is a system (written in Haskell) to analyse recordings and generate chord transcripts. It’s not perfect but gives pretty good approximations.

Ben talking about Data Flow Fusion

Ben Lippmeier – an FP-Syd regular – presented a paper at ICFP and reprised that presentation back in Sydney for those of us who weren’t in Boston. He described an approach using data flow to guide the compilation of programs using stream fusion.

The running example processes a list of points: adding 1 to each, keeping those above 0, and also finding the maximum.

Doing stream fusion

map f = unstream . mapS f . stream
filter f = unstream . filterS f . stream

-- RULE to remove (stream . unstream)
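To make the two definitions above concrete, here is a minimal sketch of list-based stream fusion. The `Step` and `Stream` types and the exact shape of the rewrite rule are assumptions for illustration (they follow the usual textbook presentation, not the code from the talk); only `map`, `filter`, `mapS`, `filterS`, `stream` and `unstream` are names from the notes.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Prelude hiding (map, filter)

-- One step of a stream: finished, skip an element, or yield one.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function plus an initial state (assumed encoding).
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list into a stream whose state is the remaining list.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs
{-# INLINE [1] stream #-}

-- Run a stream to completion, building a list.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'
{-# INLINE [1] unstream #-}

-- Non-recursive stream transformers: these are what actually fuse.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

map :: (a -> b) -> [a] -> [b]
map f = unstream . mapS f . stream

filter :: (a -> Bool) -> [a] -> [a]
filter p = unstream . filterS p . stream

-- The rule that removes the intermediate list between two combinators.
{-# RULES
"stream/unstream" forall s. stream (unstream s) = s
  #-}
```

With optimisation on, `map f . filter p` should inline to `unstream . mapS f . stream . unstream . filterS p . stream`, at which point the rule can delete the middle `stream . unstream` pair and leave a single loop.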

The example computes (vec3, n). We can’t float vec3 because it’s used both in the result and in the computation of n, so we get two loops.

**1** -> 2 -> **3** -> 4
                |      |
              (    ,    )

zipWithX tends to use X+1 loop counters for stream fusion. There are only 8 registers to use on some platforms.

Data Flow Fusion

Slight manual refactor

Split filter into two combinators: flag, which produces True or False for each member, and pack, which does the filtering.
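As a rough illustration of that split, here is a list-based sketch. The real combinators work on series rather than lists, and the signatures and the name `filter'` below are assumptions made for the example.

```haskell
-- flag: compute a Bool for each element, without dropping anything.
flag :: (a -> Bool) -> [a] -> [Bool]
flag p = map p

-- pack: keep exactly the elements whose flag is True.
pack :: [Bool] -> [a] -> [a]
pack flags xs = [x | (True, x) <- zip flags xs]

-- The original filter is recovered by composing the two.
filter' :: (a -> Bool) -> [a] -> [a]
filter' p xs = pack (flag p xs) xs
```

The point of the split is that `flag` preserves the length of its input, so the only operation that changes the number of elements is `pack`.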

Extract the data flow graph

This code generates the data flow graph.

fun vec1 (\s1 -> 
  let s2    = map (+ 1) s1
      flags = map (> 0) s2
  in mkSel flags (\sel ->
  let s3   = pack sel s2
      vec3 = create s3
      n    = fold max 0 s3
  in (vec3, n)))

vec1 :: Vector Int
s1 :: Series k1 Int
s2 :: Series k1 Int
flags :: Series k1 Bool
sel :: Sel k1 k2
s3 :: Series k2 Int

Series has a phantom type variable which helps keep track of the code which can be fused into a single loop.

We learn that k1 >= k2

With the flow graph (annotated with operations, etc.), throw away the source.

Schedule the graph into an abstract loop nest

Abstract loop nest:

loop k1 {

  start: ....
  
  body: ....
  
  inner: ...
  
  end: ...

} yields ...

Start at the front of the data flow graph and add elements of the graph to the nested abstract loop.

Operations go into different places in the nested abstract loop. A fold, for example, allocates an accumulator in start, updates it somewhere within body and reads it in end.

Extract implementation from abstract loop nest.

Translate the various bits and pieces of the abstract loop nest data structure into different Haskell combinators.
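For the running example, the result should behave like the hand-written single loop below. This is only a sketch of the target: it uses lists instead of vectors, and the name `process` is made up for the illustration, not part of the system described.

```haskell
-- One pass over the input: add 1, keep the positives, track the maximum.
process :: [Int] -> ([Int], Int)
process = go [] 0
  where
    -- "start": acc (kept elements, reversed) and m (running maximum)
    -- are initialised by the call `go [] 0` above.
    -- "end": read the accumulators once the input is exhausted.
    go acc m [] = (reverse acc, m)
    -- "body": the work done for each element of the input.
    go acc m (x : xs)
      | y > 0     = go (y : acc) (max m y) xs
      | otherwise = go acc m xs
      where
        y = x + 1
```

Both results come out of the same traversal, which is what the two-loop stream fusion version could not achieve.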

Implementation

GHC plugin which grabs Core, does data flow compilation and generates Core to give back to GHC.

Some issues in current implementation where LLVM doesn’t realise that writing to the output doesn’t need to reload the start and length numbers.

If your program is a first-order (argument functions take scalars, not series), non-recursive, synchronous, finite data flow program using our combinators, then by construction your program will be compiled correctly by this system.

Liam on CDSL

Liam O’Connor works for NICTA. Instead of talking about something he recently learned, he’s talking about work: CDSL - a restricted functional language for file system verification.

Trying to establish a formal proof of the correctness of a file system driver in an operating system.

Already have an architecture for this sort of problem (from seL4):

Abstract spec - high-level, nondeterministic (followed by an “interesting” proof relating it to the low-level spec, ~15% of the proof effort)
Low level spec - purely functional (followed by a “largely boring” proof relating it to the C implementation, ~30% of the proof effort)
C implementation - efficient.

The remaining ~55% is showing that the other proofs don’t do something stupid: proving that all the invariants hold.

Ignoring the kernel proper, architecture support, and drivers (another NICTA project), the largest part of the Linux kernel is the fs/ directory; 31 different file systems were supported by the kernel running on some random NICTA server.

There are lots of file systems with, one assumes, quite a lot of common functionality and infrastructure. The goal of the project is not to make a cathedral of a single verified file system, more a factory for churning out numerous file systems. The approach is to use a DSL to generate the low-level spec, proof and implementation. High-level spec and proof are done by hand, so generated outputs need to be readable.

Should

establish key verification properties
compete with efficient C code (imperative, destructive updates, etc.)
be expressive enough to write a file system

But:

doesn’t need to express everything in a file system. Hand-written components could be plugged in to the DSL (and, hopefully, re-used).

Simply-typed lambda calculus

Simply-typed lambda calculus is strongly normalising (you can’t write general recursion, e.g. the Y combinator).

First-order language: lambdas go away, use let binding and restrict to defining top-level functions. Added structural rules for mixing, weakening, ?

Need to do memory management which is safe, expressive (no pass by value, we need the heap), no GC (you’d have to verify it, introduce latency, etc.)

Automatic memory management (GC) is too big a burden. Many static automatic memory management schemes are inefficient or unsafe.

What about manual memory management?

let x = allocateData ()
    x' = updateData x
    _ = free x
in x'

But this is terrible! Unsafe, inefficient, etc.

So have a linear type system, throwing away weakening, etc. This forces every value to be used exactly once (you can’t alloc and then not use the result, as that doesn’t discharge the new fact). The typing rules require that introduction and elimination be paired.

Linear types mean that the elimination operations (e.g. updateData) are the last to access terms, so they can do destructive updates.

Two interpretations of these semantics:

value semantics: pass by value, no heap, immutability, reasoning.
update semantics: heap, updates, deallocates, implementation.

Linear types allow for both.

But sometimes you want non-linear, pass-by-value (arithmetic operations, etc.):

Unboxed types, ints, small structs
Functions themselves

Allow structural rules (dereliction and contraction) for certain types only. So now we have T_{.} and T_{#} (unboxed and value types).

Buffer interface

make : () -> .Buf
free : .Buf -> ()
length : .Buf -> (#U32, .Buf)

serialise : (.Obj, .Buf) -> (.Obj, .Buf)
deserialise : .Buf -> (.Obj, .Buf)

Non-linear “look but don’t touch” references with *:

make : () -> .Buf
free : .Buf -> ()

length : *Buf -> #U32
serialise : (*Obj, .Buf) -> .Buf
deserialise : *Buf -> .Obj

Use let! construct which is like let but we mark specific variables as read-only within the let clauses and back to linear in the in.

But this is unsafe (read-only can escape the let). Could use regions, but choose not to unless it’s required.

Linear typing breaks some control flow:

let x = alloc ()
in if cond
   then update(x)
   else x

Loops

Hardest, most annoying part of the formalisation of the language.

Built-in loop combinators, map, fold, with, for.

let sum = for (x,y) in fold(arr) with 0
              do (x + y)

let arr', sum = for (x,y) in map(arr) with 0
                    do (x * 2, x + y)

Alas, this is unsafe. Double free, etc. But you can restrict linear types in the loop expression. Then you have to make any required linear types into accumulator parameters.

Error handling

The return-code convention used in languages like C is pretty bad. Instead, separate statements and expressions.

Statements have three types:

s : \({\bar T_{s}}\)
s : \(\text{fails}\ {\bar T_{f}}\)
s : \({\bar T_{?}}\ \text{fails}\ {\bar T_{?}}\)

Type of if then else is \(T_{t} \sqcup T_{e}\) (the least upper bound). Lattice join, subtype, etc.

Make let and let! only handle success cases. Force sub-expressions to handle potential errors. Type system forces you to handle your errors and the linear type system forces you to free your resources.

Types

Product and sum types (implemented as structs and tagged unions).

Accessing members of linear records is problematic as you use the record multiple times:

let sum = operation(x.field1, x.field2)

Instead use an open/close structure.

FP-Syd, August 2013

2013-08-28T00:00:00Z

Here are some notes from the August 2013 meeting of the FP-Syd functional programming group.

Julian Gamble on Simulation Testing in Datomic

Julian Gamble ([@juliansgamble][] on Twitter) gave his first FP-Syd talk with an introduction to simulation testing using [Datomic][].

[@juliansgamble]: http://twitter.com/juliansgamble
[Datomic]: http://www.datomic.com

Plug: He’s writing a book called Clojure Recipes which is due out in January 2014.

Simulant – the subject of the talk – is a simulation testing framework built on the Datomic database.

Many types of testing (in something resembling order of popularity):

Unit testing
User acceptance testing
Performance testing
Simulation testing

Simulation testing uses modeling and simulation to “test” systems which are too complex for linear models like unit testing. Generations of simulations:

High school solving maths problems
Stock analysts modelling and analysing companies
Analytics driven audits simulating systems for comparison.
Business scenarios predicting responses to, e.g., market crashes.

Most of these can be done on a piece of paper or on a single machine, but systems which aren’t amenable to such approaches are becoming more common.

Chris Okasaki’s book Purely Functional Data Structures popularised the use of purely functional approaches to data structures through sharing.

Datomic is “a database as a value”. Or, put another way, a database as a persistent data structure. This makes state management easier for, e.g., reproducing problems for bug fixing.

Built on a pluggable storage system. Uses a Java-native store locally, can use Amazon DynamoDB. Writing is done through a single transactor process with querying done directly from the data store.

Simulant is a framework which uses Datomic to help to distribute and scale simulation testing. It assumes that you’ll be modelling agents and actions – which are stored in the Simulant schema – with additional model details stored in your own schema. It uses git too, to keep track of versions of the simulation as it changes over time.

Process

Develop a Datomic schema for your model. This will be used to record the generic details of the simulation – the actions performed by the agents – and the domain specific details.
Set the model parameters (stocks/prices, etc. or ants/food/world size)
Make statistical assertions about the system. These will be verified against the data recorded during the simulation.

There are more details to this, but they flew past and I couldn’t get them down.

Why Datomic?

Being persistent (in the “persistent data structures” sense), Datomic makes it far easier to review old data from older simulations, add additional statistical assertions, etc. without having to jump through the many and varied hoops you’d need for, e.g., a relational database.

I’m not sure how true a comparison this is, given that Datomic forces all writes to the database through the single transactor. A similar architecture with a relational database could quite easily use a single transactor to enforce timestamp consistency on data being recorded. I must be missing something.

Applicability

Non-trivial system with multiple agents.
Datomic’s “database as a value” thing.
Where you have statistical assertions to be evaluated.

Shane Stephens on Web Animations

Shane works on the web animations specification for the W3C, which unifies SVG and CSS animations on the web.

The web animations specification defines a Javascript API which looks something like this:

new Animation(
	document.getElementById('hello'),
	[ {"left" : "200px"},
	  {"left" : "400px", "height" : "100px"}],
	1
);

This talk isn’t about “generating a functional API for web animations” but he thought it was two weeks ago. He tried to generate bindings, but failed. Instead, it’s a discussion about the attempt and the result.

I think there might be animations of yak shaving involved.

Haskell to JS compilers

There are quite a few functional languages which target Javascript and they all, in Shane’s opinion, hate the web.

Utrecht Haskell Compiler JavaScript backend

The UHC Javascript backend has little documentation, claims to “compile most of Hackage” and provides an FFI to interact with “native” Javascript code.

The barrier between Haskell and Javascript is the problem. Everything on the web “platform” is exposed with APIs in Javascript. Having UHC-JS generate a blob of HTML and CSS and Javascript stuff makes it pretty hard to compose with other web-ish things.

There’s a big impedance mismatch between Haskell and Javascript.

Elm

Elm is a functional reactive programming language which compiles to Javascript. Lots of documentation, an online editor, and it already has animations.

But Elm is another “replace the world” abstraction.

Roy

Roy has a much saner approach, largely just syntactic sugar around Javascript:

Javascript functions are available
Roy types are almost Javascript “types”

But no ADTs, etc. Because JS is pretty shitty with no recursion, etc.

krazy

So with no “good” existing languages he started his own language called krazy.

The current implementation is a PEG parser and interpreter in Javascript.
Functional types are Javascript types (lists, for example, really are Javascript arrays).
Supports ADTs, HOFs, pattern matching, etc.
JS interop “constrained” by type assertions.
Will probably add records with optional, structural typing.

Animations

Back to the web animations API.

The web animations specification has side-effect free constructors for animations, effects, timing groups, etc.

This could be exposed to library authors and used as an interface or to generate an interface automatically? I’m not sure.

Thomas Sewell on learnings about SAT

Survey: who can name an NP-complete problem?

NP-complete problems can be solved in polynomial time by a non-deterministic machine, and their solutions can be checked in polynomial time by a deterministic machine. In essence, they are very hard to solve but easy to check.

Circuit satisfiability can be encoded in SAT.

The SAT problem attempts to assign values to the logical variables in a formula in conjunctive normal form and produces either a set of assignments (if the formula is satisfiable) or “no” (if there is no satisfying assignment).

The DPLL algorithm is pretty naive and does lots of backtracking.
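A minimal sketch of DPLL, to show where the backtracking comes from. The representation (literals as non-zero `Int`s, clauses as lists) and all the names are assumptions chosen for the example; a real solver looks nothing like this.

```haskell
import Control.Applicative ((<|>))

type Literal = Int        -- n means variable n is true, -n means it is false
type Clause  = [Literal]  -- a disjunction of literals
type Formula = [Clause]   -- a conjunction of clauses (CNF)

-- Assume literal l is true: drop the clauses it satisfies and remove
-- the now-false literal (negate l) from the remaining clauses.
assign :: Literal -> Formula -> Formula
assign l cs = [filter (/= negate l) c | c <- cs, l `notElem` c]

-- Checking a candidate model is the easy direction.
satisfies :: [Literal] -> Formula -> Bool
satisfies model = all (any (`elem` model))

-- Return the literals made true by some satisfying assignment, if any.
dpll :: Formula -> Maybe [Literal]
dpll [] = Just []                        -- no clauses left: satisfied
dpll cs
  | any null cs = Nothing                -- empty clause: conflict
  | (l : _) <- [u | [u] <- cs] = try l   -- unit clause: forced choice
  | otherwise = case cs of
      ((l : _) : _) -> try l <|> try (negate l)  -- guess, then backtrack
      _             -> Nothing           -- unreachable: no empty clauses here
  where
    try l = (l :) <$> dpll (assign l cs)
```

Every `try l <|> try (negate l)` is a decision point. When the first branch fails, everything learned inside it is thrown away, which is the waste that CDCL addresses.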

The CDCL algorithm – discovered in the 90s – increased the size of viable problems to millions of variables. Instead of having to “re-learn” the same pieces of information repeatedly when backtracking, the Conflict Driven Clause Learning algorithm tracks the “cause” of each clause you learn and, when a contradiction is derived, it learns the inverse of its parents.

E.g.

If we reach a contradiction and the parents are \(x_{1}\), \(\neg x_{2}\), \(x_{12}\), then we need to learn \(\neg x_{1} \vee x_{2} \vee \neg x_{12}\): at least one of the assumptions is false, so the negation of their conjunction must hold.

Learnings

Competitions - progress
Fast propagation - a modern SAT solver needs a very efficient implementation of the propagation algorithm.
Locality - solvers make decisions “near” previous decisions. Need a heuristic to find “nearby” variables for choice.
Phases - alternate between phases focussed on SAT and un-SAT phases.
Pruning - prune the database of clauses periodically to speed propagation.
Glue - Not sure what this means?
Rewriting - preprocessing the problem into an equisatisfiable problem. Makes the problem “better”; works well as a first step. Useful on problems like CPU problems.

Lots of problems have nice and/or useful SAT encodings.

NP-complete problems were, in the not too distant past, primarily useful as a polite “no” for managers. (You can’t have your cake and eat it too.)

SAT with Proofs

Some solvers produce a resolution proof.

A Reverse Unit Propagation (RUP) proof is a series of clauses, each of which can be learned by unit propagation only. The conflict clauses of a CDCL solver, in the order they are learned, form a RUP proof.
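A sketch of what checking such a proof involves, assuming the same toy representation of literals as non-zero `Int`s. The function names are invented for the example. A clause passes the check if asserting the negation of each of its literals leads to a conflict using unit propagation alone.

```haskell
type Literal = Int        -- n means variable n is true, -n means it is false
type Clause  = [Literal]

-- Assume literal l is true: drop satisfied clauses, shrink the others.
assign :: Literal -> [Clause] -> [Clause]
assign l cs = [filter (/= negate l) c | c <- cs, l `notElem` c]

-- Run unit propagation until nothing changes; True if it derives
-- the empty clause (a conflict).
conflicts :: [Clause] -> Bool
conflicts cs
  | any null cs = True
  | (l : _) <- [u | [u] <- cs] = conflicts (assign l cs)
  | otherwise = False

-- c is RUP with respect to cs if negating every literal of c and
-- propagating units is enough to reach a conflict.
isRup :: [Clause] -> Clause -> Bool
isRup cs c = conflicts (cs ++ [[negate l] | l <- c])

-- A refutation is a list of clauses, each RUP with respect to the
-- formula plus the clauses before it, ending with the empty clause.
checkProof :: [Clause] -> [Clause] -> Bool
checkProof _  []         = False
checkProof db (c : rest) = isRup db c && (null c || checkProof (c : db) rest)
```

For the unsatisfiable formula `[[1,2],[-1,2],[1,-2],[-1,-2]]`, the two-step proof `[[2], []]` is accepted by `checkProof`. No search is needed at any point, which is why such proofs are cheap to check compared with finding them.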

DRUP adds clause deletion, to speed up unit propagation.

Having useful proofs with rewriting is complex. Checking a SAT proof for a rewritten problem is tricky: dealing with the rewriting (incorporating it into the proof and validating it) is often as complex as the SAT problem itself.

Motivations

Have some SMT proofs and would love to check them in HOL4 or Isabelle/HOL. Satisfiability Modulo Theories (SMT) incorporates SAT as part of it. HOL4 and Isabelle/HOL are highly trusted but very slow. Using SMT/SAT to solve a problem quickly and Isabelle/HOL to replay and verify the result should result in a fast, trusted proof.

There are SAT replay tools that do this sort of thing, but they were all pretty or extremely slow. Turns out millions of variables are hard in more traditional tools.

FP-Syd, July 2013

2013-07-24T00:00:00Z

Accelerate with foreign functions

More on Accelerate. Looking at using it with other GPGPU frameworks. Two distinct problems:

Calling CUDA C programs from Accelerate.
Calling Accelerate from CUDA C programs.

Calling CUDA C from Accelerate

Smooth Life is Conway’s Game of Life generalised to continuous domains.

Lots of variants (magic numbers). This was smooth life “L”.

Gliders can move in any direction.

Relies on FFT as part of its implementation. Either write your own FFT in Accelerate or use the cuFFT library.

The cuFFT library wants pointers to GPU memory as parameters to its function, but Accelerate is high-level (no pointers into GPU memory). So added a new operation:

foreignAcc :: (Arrays arr, Arrays res, Foreign ff)
           => ff arr res -- ^ The foreign code
           -> (Acc arr -> Acc res) -- ^ The pure equivalent
           -> Acc arr -- ^ The input array
           -> Acc res

Each backend needs to provide its own instance of Foreign, which is a subclass of Typeable2 to avoid the expression problem (so that backends can use their own types for the foreign stuff).

Use an “abstract” monad CIO, which is like IO but has a few new operations: allocateArray, devicePtrsOfArray, peekArray, pokeArray. Here “abstract” is just “private parts are private”.

doFFT :: Acc (Array DIM2 Complex)
      -> Acc (Array DIM2 Complex)
doFFT arr = foreignAcc (CuForeign foreignFFT)
                       pureFFT
                       arr
    where
      pureFFT arr = ... a slow, pure Accelerate FFT ...
      foreignFFT arr = ...

You can nest calls to foreignAcc in the pure branch to offer implementations for multiple backends.

Calling Accelerate programs from C

foreignAccModule is a piece of Template Haskell magic which, when a module is compiled, generates a C header file for the module.

Additional Template Haskell functions exportAfun1, exportAfun2, etc. are used to export specific functions (the dotp function in this example).

Generates two C functions from dotp:

dotp_compile(...) compiles an Accelerate program (the dotp program).
dotp_run(...) executes a compiled Accelerate program (the dotp program).

How Mark writes Haskell

Demo of how “I” write Haskell.

If you use vim you are bad.

Programming has two cultures:

Tools-oriented cultures like Java, with lots of IDEs, etc.
Language-oriented cultures like Haskell.

ghc-mod is a tool for Haskell; it supports emacs and vim. It does recompilation and testing on save, and integrates with Hoogle (inserting module import statements).

There’s a Ruby tool called guard: a DSL to watch file system changes and do various things, such as passing code through the compiler for fast feedback about compiler errors. Run it in a window beside emacs and see things happen as you save!

The doctest library allows you to write QuickCheck properties in Haddock comments.

Similar: hdevtools for vim. Has a persistent server.

Ben talking about DDC

Note to self: the way you’ve captured the typing rules in this section is truly horrible. Please remember LaTeX and figure out how you want to render it with Pandoc & Hakyll.

Pushing to make DDC be something like LLVM for functional programming languages; a generally applicable core language.

Typing application

\[\large\frac{ \Gamma \vdash M :: t_{1} \rightarrow t_{2} \qquad \Gamma \vdash N :: t_{1} }{ \Gamma \vdash M N :: t_{2} }\]

Evaluation

M reduces to a value (an abstraction).
N reduces to a value
Substitute N into M.
M[N/x] is new value to reduce.
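The four stages can be seen in a small call-by-value evaluator for the untyped lambda calculus. This is a sketch with made-up names (`Term`, `subst`, `eval`), and the substitution assumes the argument is a closed term, so it makes no attempt to avoid variable capture.

```haskell
data Term
  = Var String
  | Lam String Term
  | App Term Term
  deriving (Eq, Show)

-- subst x n m computes m[n/x]. Assumes n is closed (no free variables),
-- so no renaming is needed to avoid capture.
subst :: String -> Term -> Term -> Term
subst x n (Var y)
  | x == y    = n
  | otherwise = Var y
subst x n (Lam y b)
  | x == y    = Lam y b                 -- x is shadowed by the binder
  | otherwise = Lam y (subst x n b)
subst x n (App f a) = App (subst x n f) (subst x n a)

eval :: Term -> Term
eval (App m n) =
  case eval m of                        -- 1. reduce M to a value
    Lam x body ->
      let v = eval n                    -- 2. reduce N to a value
      in v `seq` eval (subst x v body)  -- 3. substitute, 4. reduce the result
    m' -> App m' (eval n)               -- stuck: M is not an abstraction
eval t = t                              -- variables and abstractions are values
```

Each numbered comment is a separate piece of work that can diverge or, in a language with effects, do something observable, yet the plain typing rule for application says nothing about any of them.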

Is there something in the typing rule which represents all four of these stages? Not really. (From me: should it be linear logic?)

Perhaps adding effects to types.

\[\large\frac{ \Gamma \vdash M :: t_{1} \xrightarrow{e_{3}} t_{2} ; e_{1} \qquad \Gamma \vdash N :: t_{1} ; e_{2} }{ \Gamma \vdash M N :: t_{2} ; e_{1} \vee e_{2} \vee e_{3} }\]

Where \(\vee\) is a lattice join:

\(e_{1} \vee e_{2} = e_{2} \vee e_{1}\)

\(e_{1} \vee e_{1} = e_{1}\)
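One simple model of such a join is set union over atomic effect names. The sketch below is an assumption for illustration (`Effect`, `effect`, `bottom` and `lub` are invented names, not DDC's representation); the empty set stands for the pure effect.

```haskell
import Data.List (nub, sort)

-- An effect is a set of atomic effect names, stored sorted and
-- without duplicates so that equal sets compare equal.
newtype Effect = Effect [String]
  deriving (Eq, Show)

-- Smart constructor that establishes the sorted, duplicate-free form.
effect :: [String] -> Effect
effect = Effect . sort . nub

-- The bottom element: no effects at all, i.e. pure.
bottom :: Effect
bottom = Effect []

-- Lattice join (least upper bound): the union of the two sets.
lub :: Effect -> Effect -> Effect
lub (Effect a) (Effect b) = effect (a ++ b)
```

Under this model the two laws above hold by construction: union does not care about argument order, and joining a set with itself adds nothing.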

Now there’s something in the typing rule which represents each phase:

Is e1
Is e2
Is e3?
Is the join of them?

Monads

People who think Haskell is cool might have heard of this thing called a monad.

return :: a -> m a
bind :: m a -> (a -> m b) -> m b

There’s no good way to abstract over monads: each m must be a single monad. You can use monad transformers, but they suck.

An effect system allows you to write something like

\[\large foo :: Int \xrightarrow{State \vee IO} Int\]

instead of choosing between options like:

\[\large foo :: Int \rightarrow \text{State Int}\]

\[\large foo :: Int \rightarrow \text{IO Int}\]

\[\large foo :: Int \rightarrow \text{StateT s IO Int}\]

Impact on kinds

Adding effects forces us to change the kind of the ->. In Haskell:

(->) :: * -> * -> *

In an effectful language, every arrow has an effect component:

(->) :: * -> * -> Effect -> *

Could maybe (and Ben’s PhD thesis does) add a Pure effect to non-effectful arrows.

It also changes the nature of application: it is no longer just substitution, as the “other stuff” gets mixed in.

Value-only languages

What if we remove substitution (can only apply values to values):

\[\large\frac{ \Gamma \vdash v_{1} :: t_{1} \xrightarrow{e_{1}} t_{2} ; \bot \qquad \Gamma \vdash v_{2} :: t_{1} ; \bot }{ \Gamma \vdash v_{1} v_{2} :: t_{2} ; e_{1} }\]

(Note, if you haven’t already, that an effect of \(\bot\) is pure.)

So:

\[\large foo :: Int \rightarrow \text{S (State}\vee\text{IO) Int}\]

Here S is a suspended computation which, when invoked, will perform some actions and return an integer.

Introducing a suspended computation:

\[\large\frac{ \Gamma \vdash M :: t_{1} ; e_{1} }{ \Gamma \vdash \text{suspend}\ M :: S\ e_{1}\ t_{1} ; \bot }\]

Running a suspended computation:

\[\large\frac{ \Gamma \vdash M :: S\ e_{1}\ t ; e_{2} }{ \Gamma \vdash \text{run}\ M :: t ; e_{2} \vee e_{1} }\]

Also need:

\[\large\frac{ \Gamma, x : t_1 \vdash M :: t_2 ; \bot }{ \Gamma \vdash (\lambda (x:t_1) . M) :: t_1 \rightarrow t_2 ; \bot }\]

\[\large\frac{ x : t_1 \in \Gamma }{ \Gamma \vdash x :: t_1 ; \bot }\]

And application:

\[\large\frac{ \Gamma \vdash M :: t_1 \rightarrow t_2 ; e_1 \qquad \Gamma \vdash N :: t_1 ; e_2 }{ \Gamma \vdash M N :: t_2 ; e_1 \vee e_2 }\]

The claim is that this system is better than both the original effect system, which forces us to add an effect to the arrow kind (our arrow keeps the original pure type), and the Haskell approach with monads.

Use suspend within a lambda abstraction.

\(\lambda (x : T_{1}) .\ \text{suspend} \ldots\)

It’s obvious where to insert suspend as abstraction bodies must be pure.

suspend and run are syntactic; generated by compiler for source language.

Q&A

Constructive logics have judgements like this (M terminates and proves A is true):

\[\large\Gamma \vdash M :: \text{A true}\]

Also have a judgement like this (if M terminates, it proves A):

\[\large\Gamma \vdash M :: \text{A lax}\]

And:

\[\large\frac{ \Gamma \vdash M :: \text{A lax} }{ \Gamma \vdash \text{box}\ M :: \Box \text{A true} }\]

(See PL summer school video?)

See also the Haskell Symposium paper (papers!) doing extensible effects in Haskell.

Yow! Conference

There’s a potential for members to get a discounted ticket (group rate). Talk to Jed about it.