FP-Syd, June 2013 - Passing Curiosity

FP-Syd in June 2013 had talks about implementing cellular automata in Haskll (comparing the Repa and Accelerate array processing libraries) and about distributed data structures and systems. I get the feeling there was something else, but I didn’t write notes on it.

Cellular Automata

Tran gave an experience report using array computation libraries in Haskell (Repa and Accelerate) to implement cellular automata. “Falling sand” game: simulate gravity and “alchemical” interactions between elements.

First step is in simulating gravity. Dealing with falling blocks, randomising block. Using “block CA”, with blocks defined by grids (2x2 cells) which alternate between time steps (red grid, then blue grid); this allows you to implement gravity using a single rule [: ] -> [..].

Repa and Accelerate both have concept of stencil convolutions. Use them to implement CA rules. Stencil has a shape (the neighbourhood) and a fold-ish function to process each cell.

Phrase the problem in terms of array computation.
Repa: slap Repa functions onto standard Haskell code.
Accelerate: EDSL means you can’t do lots of Haskell stuff (little things like pattern matching).

Repa has Gloss integration. Hmm.

Code is on GitHub. Called falling-turnip?

Conflict-free replicated data types, consensus protocols and the cloud

Andrew Frederick Cowie

twitter.com/afcowie

AfC on #haskell

Cloudy stuff means there’s never only one of things these days; everything is distributed (or will, hopefully, need to be distributed).

Streaming I/O: iteratee, conduits, io-streams, pipes all provide abstractions for processing data in a single thread in a single process.

Pipes explicitly talks about clients and servers (ends of the pipelines) but this is all just structuring computations within a single thread. This doesn’t really help us do anything interesting to build distributed systems.

AWS regions and availability zones; no SLA unless your app spans availability zones.

CAP theorem: consistency, availability and resilience to network partition.

Two generals: attack succeeds iff both attack at the same time. No solution can guarantee coordination in the face of unreliable messaging.

Jepsen blog posts about testing distributed databases.

Computers are slow.

Author mention “CRDTs” a few times (difficult to Google; did you mean “credit”?); original meant something like Convergent and Commutative Replicated Data Types, now Conflict-free Replicated Data Types. Paper in 2007 from INRIA. Is it possible to arrange things so that distributed states will always converge? Yes, joint semi-lattices.

Paper proposes a protocol and implementations of various abstractions on top of it: counters, registers, sets (grow only - G-Set, add & remove - 2P-Set, observer-remove - OR-Set), graphs.

Global invariants can’t be enforced over the whole system without synchronisation; eventual consistency allows you to accept changes which, after merge, break the invariant.

Consensus algorithms

Paxos algorithm (Lamport. 1998. The Part-Time Parliament), peer-to-peer consensus algorithm; most people don’t bother trying to implement a full peer-to-peer consensus system.

ZooKeeper elects a leader.

Raft. There is just one leader and the point of the algorithm is to ensure that the leader’s log is the most correct. Leaders have terms? Consensus is something like “3 nodes agree”.

Ceph - large distributed file system. Lots of nodes, three of them are special “monitors” which maintain the cluster map. It’s pretty hard to build a file system if your objects (disk blocks) change underneath you; Ceph have focussed on the consistency needed to build a file system.

Rather than maintain an index (which doesn’t scale), Ceph places blocks according to a layout algorithm. CRUSH(). Paxos to elect monitors.

All this stuff needs very fast interconnect, i.e. a single data centre.

Amazon’s SQS:

Reliable delivery of all messages.
Delivery is not ordered. (Not much of a “queue” is it?)
Delivery is guaranteed at least once.

We’re now back where we started: workers that recieve messages need to be able to figure out whether the message needs processing (at least once), etc.

Queues like this are good for signalling, but solutions like CRDTs will help manage the state we still need to track.

Conclusion

Idempotence is everything, see FP for salvation.

Q&A

Make state an Albion group, commutative monoid and this stuff comes largely for free.

Cloud Haskell duplicating Erlang model, but has tight coupling in the wrong places, needs same ABI (version) on all nodes; are we on our way back to a single data centre?

NoSQL vs RDBMS. Now settling down to small transactional world and larger non-transactional world.

Taking functional approaches (git’s similarity persistent data structures, log journaled databases, etc.)

This post was published on June 26, 2013 and last modified on January 26, 2024. It is tagged with: event, haskell, fp-syd, meetup, functional programming, cellular automata, array programming, distributed systems.