<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Passing Curiosity: Posts tagged distributed systems</title>
    <link href="https://passingcuriosity.com/tags/distributed-systems/distributed-systems.xml" rel="self" />
    <link href="https://passingcuriosity.com" />
    <id>https://passingcuriosity.com/tags/distributed-systems/distributed-systems.xml</id>
    <author>
        <name>Thomas Sutton</name>
        
        <email>me@thomas-sutton.id.au</email>
        
    </author>
    <updated>2018-11-15T00:00:00Z</updated>
    <entry>
    <title>Work allocation in Kafka Streams</title>
    <link href="https://passingcuriosity.com/2018/work-allocation-kafka-streams/" />
    <id>https://passingcuriosity.com/2018/work-allocation-kafka-streams/</id>
    <published>2018-11-15T00:00:00Z</published>
    <updated>2018-11-15T00:00:00Z</updated>
<summary type="html"><![CDATA[<p>This is mostly an exercise in writing things down to help me remember them.
You’re better off referring to <a href="https://docs.confluent.io/current/streams/introduction.html">Confluent’s Kafka Streams documentation</a>
or <a href="https://medium.com/@andy.bryant/kafka-streams-work-allocation-4f31c24753cc">this blog post by Andy Bryant</a>.</p>
<h2 id="kafka">Kafka</h2>
<p><a href="http://kafka.apache.org/">Apache Kafka</a> is a “distributed streaming platform”. <em>Messages</em> with keys
and values are written to <em>topics</em> (“queues”, if that helps to think about
them). Each topic is divided (when it’s created) into a number of <em>partitions</em>.
Topic partitions are the unit of work in a Kafka cluster: at any given
time, a single cluster node is responsible for processing the messages for a
given partition.</p>
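<p>The key-to-partition mapping can be sketched like this. This is a toy stand-in: Kafka’s real default partitioner hashes the serialised key with murmur2, but the principle (same key, same partition, same worker) is identical.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Partitioner {
    // Simplified stand-in for Kafka's default partitioner. Kafka itself
    // uses murmur2 over the serialised key; any deterministic hash shows
    // the same property: equal keys always land in the same partition.
    public static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        // Mask off the sign bit so the modulus is non-negative.
        return (Arrays.hashCode(bytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Every message with the same key lands in the same partition,
        // and is therefore processed by the same consumer.
        System.out.println(partitionFor("user-42", 3));
        System.out.println(partitionFor("user-42", 3)); // same value again
    }
}
```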
<p>Applications which read from a Kafka topic can also be distributed - each
partition can be consumed (“read”) by a different worker. The collection of
workers cooperating to process a topic form a <em>consumer group</em>. Kafka’s
consumer group API helps to assign the work (i.e. the partitions) to the
available workers.</p>
<h2 id="kafka-streams">Kafka Streams</h2>
<p><a href="http://kafka.apache.org/documentation/streams/">Kafka Streams</a> is a library for building streaming data processing
applications on top of Kafka. Streams applications are just normal Java
programs which can be deployed, monitored, and managed just like any other
Java program: however many instances you start, they will self-organise and
cooperate to share the available work between them. This makes scaling Streams
applications very straightforward: just start or kill some instances (assuming
there are work units that can be re-/allocated).</p>
<p>A Kafka Streams application is described by a <em>topology</em> – essentially a
directed acyclic graph with nodes representing each source, sink, and
processing step. Each topology can be split into <em>subtopologies</em> with nodes
which interact only with other nodes in the same subtopology. Because the
nodes in a subtopology only interact with each other, the subtopologies can
be executed in parallel without any coordination. The collection
of subtopologies, together with the partitions of each subtopology’s input
topics, defines the collection of <em>stream tasks</em>
that can be distributed across the workers in a Streams application.</p>
<p>The first phase in executing a topology analyses it and the Kafka cluster
and determines the units of work that must be scheduled:</p>
<ol type="1">
<li><p>Partition the topology into subtopologies.</p></li>
<li><p>For each subtopology, check that the input topics have the same key
configuration and the same number of partitions. This ensures that
corresponding records from each of the input topics will be processed
by the same stream task, allowing them to be joined, etc.</p></li>
<li><p>For each subtopology, generate one stream task to read from each set of
corresponding partitions in the input topics. If subtopology 1 reads
from topics A, B, and C and they are configured with 3 partitions then
this will result in three stream tasks “1_0”, “1_1”, and “1_2”.</p></li>
</ol>
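<p>The task-generation step above can be sketched as pure logic. The <code>taskIds</code> helper here is hypothetical, not the actual Kafka Streams internals; it just makes the co-partitioning check and the “1_0”, “1_1”, … naming concrete.</p>

```java
import java.util.*;

public class StreamTasks {
    // Given a subtopology's input topics (topic name -> partition count),
    // check the co-partitioning requirement and generate one task id
    // "<subtopology>_<partition>" per set of corresponding partitions.
    public static List<String> taskIds(int subtopologyId,
                                       Map<String, Integer> inputTopicPartitions) {
        Set<Integer> counts = new HashSet<>(inputTopicPartitions.values());
        if (counts.size() != 1) {
            // Input topics must have the same number of partitions, or
            // corresponding records could not meet in the same task.
            throw new IllegalStateException("co-partitioning violated: " + inputTopicPartitions);
        }
        int partitions = counts.iterator().next();
        List<String> tasks = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            tasks.add(subtopologyId + "_" + p);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Subtopology 1 reading topics A, B, C, each with 3 partitions.
        System.out.println(taskIds(1, Map.of("A", 3, "B", 3, "C", 3)));
    }
}
```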
<p>Note that the collection of stream tasks generated from a topology is static:
both the graph in a topology and the number of partitions in a Kafka topic
are fixed at creation. The next phase allocates the stream tasks to be executed
by application instances.</p>
<ol start="4" type="1">
<li><p>Each instance executes a number of stream threads determined by its
configuration. Each stream thread is a more or less independent worker
able to process one or more stream tasks.</p></li>
<li><p>Each stream thread will connect to the Kafka cluster using the consumer
group API. The Kafka cluster and the Streams application instances will
cooperate to allocate the available work to the available workers. From
the application’s perspective this means allocating stream tasks to stream
threads and from the Kafka cluster’s perspective this is topic partitions
to consumers (and it just happens to be the case that we’ll co-allocate
partitions of certain topics to the same workers).</p></li>
</ol>
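<p>The allocation step can be sketched as a plain round-robin assignment of tasks to threads. This is a simplification: the real Streams assignor is sticky and also considers state locality, but the outcome is the same kind of mapping.</p>

```java
import java.util.*;

public class TaskAssignment {
    // Round-robin allocation of stream tasks to stream threads: a
    // simplified picture of what the consumer group rebalance achieves.
    public static Map<String, List<String>> assign(List<String> tasks,
                                                   List<String> threads) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String t : threads) assignment.put(t, new ArrayList<>());
        for (int i = 0; i < tasks.size(); i++) {
            // Task i goes to thread i mod (number of threads).
            assignment.get(threads.get(i % threads.size())).add(tasks.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("1_0", "1_1", "1_2", "2_0", "2_1", "2_2");
        System.out.println(assign(tasks, List.of("thread-a", "thread-b")));
    }
}
```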
<p>With all this done, the Kafka Streams application is able to start processing
messages.</p>]]></summary>
</entry>
<entry>
    <title>FP-Syd, June 2013</title>
    <link href="https://passingcuriosity.com/2013/fpsyd-june-2013/" />
    <id>https://passingcuriosity.com/2013/fpsyd-june-2013/</id>
    <published>2013-06-26T00:00:00Z</published>
    <updated>2013-06-26T00:00:00Z</updated>
<summary type="html"><![CDATA[<p>FP-Syd in June 2013 had talks about implementing cellular automata in Haskell
(comparing the Repa and Accelerate array processing libraries) and about
distributed data structures and systems. I get the feeling there was something
else, but I didn’t write notes on it.</p>
<h3 id="cellular-automata">Cellular Automata</h3>
<p>Tran gave an experience report on using array computation libraries in Haskell
(Repa and Accelerate) to implement cellular automata: a “falling sand” game that
simulates gravity and “alchemical” interactions between elements.</p>
<p>The first step is simulating gravity: dealing with falling blocks and
randomising them. This uses a “block CA”, with blocks defined by grids (2x2 cells) which
alternate between time steps (red grid, then blue grid); this allows you to
implement gravity using a single rule <code>[: ] -&gt; [..]</code>.</p>
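<p>A minimal sketch of such a block-CA gravity step, assuming <code>true</code> marks a sand cell. The actual falling-turnip rules (randomisation, element interactions) are richer; this only shows the 2x2-block structure and the alternating offset.</p>

```java
public class BlockGravity {
    // A toy block-CA gravity step: the grid is cut into 2x2 blocks, and
    // within each block sand (true) falls into an empty cell (false) below.
    // Alternating the block offset (0, then 1) between time steps lets
    // sand cross block boundaries, as with the red/blue grids above.
    public static boolean[][] step(boolean[][] grid, int offset) {
        int h = grid.length, w = grid[0].length;
        boolean[][] next = new boolean[h][w];
        for (int y = 0; y < h; y++) next[y] = grid[y].clone();
        for (int by = offset; by + 1 < h; by += 2) {
            for (int bx = offset; bx + 1 < w; bx += 2) {
                for (int dx = 0; dx < 2; dx++) {
                    int x = bx + dx;
                    if (next[by][x] && !next[by + 1][x]) { // sand above empty: fall
                        next[by][x] = false;
                        next[by + 1][x] = true;
                    }
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        boolean[][] grid = { { true, false }, { false, false } };
        boolean[][] after = step(grid, 0);
        System.out.println(after[1][0]); // the grain has fallen one row
    }
}
```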
<p>Repa and Accelerate both have the concept of stencil convolutions, which can
be used to implement CA rules. A stencil has a shape (the neighbourhood) and a
fold-ish function to process each cell.</p>
<ul>
<li><p>Phrase the problem in terms of array computation.</p></li>
<li><p>Repa: slap Repa functions onto standard Haskell code.</p></li>
<li><p>Accelerate: EDSL means you can’t do lots of Haskell stuff (little things like
pattern matching).</p></li>
</ul>
<p>Repa has Gloss integration. Hmm.</p>
<p>Code is on GitHub. Called falling-turnip?</p>
<h3 id="conflict-free-replicated-data-types-consensus-protocols-and-the-cloud">Conflict-free replicated data types, consensus protocols and the cloud</h3>
<p>Andrew Frederick Cowie</p>
<p>twitter.com/afcowie</p>
<p>AfC on #haskell</p>
<p>Cloudy stuff means there’s never only one of anything these days; everything is
distributed (or will, hopefully, soon need to be).</p>
<blockquote>
<p>Streaming I/O: iteratee, conduits, io-streams, pipes all provide <em>abstractions</em>
for processing data in a <strong>single</strong> thread in a <strong>single</strong> process.</p>
</blockquote>
<p>Pipes explicitly talks about clients and servers (ends of the pipelines) but
this is all just structuring computations within a single thread. This doesn’t
really help us do anything interesting to build distributed systems.</p>
<p>AWS regions and availability zones; no SLA unless your app spans availability
zones.</p>
<p>CAP theorem: consistency, availability and resilience to network partition.</p>
<p>Two generals: attack succeeds iff both attack at the same time. No solution can
guarantee coordination in the face of unreliable messaging.</p>
<p>Jepsen blog posts about testing distributed databases.</p>
<blockquote>
<p>Computers are slow.</p>
</blockquote>
<p>The speaker mentioned “CRDTs” a few times (difficult to Google; did you mean
“credit”?); the acronym originally meant something like Convergent and Commutative
Replicated Data Types, now Conflict-free Replicated Data Types. Paper in 2007 from INRIA.
Is it possible to arrange things so that distributed states will <em>always</em>
converge? Yes: join semi-lattices.</p>
<p>Paper proposes a protocol and implementations of various abstractions on top of
it: counters, registers, sets (grow-only - G-Set, add &amp; remove - 2P-Set,
observed-remove - OR-Set), graphs.</p>
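<p>A minimal G-Set sketch makes the join-semilattice idea concrete: the state is a set, merge is set union, and since union is commutative, associative, and idempotent, replicas converge to the same value regardless of merge order.</p>

```java
import java.util.*;

public class GSet {
    // A grow-only set (G-Set): the simplest CRDT. Elements can only be
    // added; merge is set union, which forms a join semi-lattice.
    private final Set<String> elements = new HashSet<>();

    public void add(String e) { elements.add(e); }
    public boolean contains(String e) { return elements.contains(e); }

    public GSet merge(GSet other) {
        GSet merged = new GSet();
        merged.elements.addAll(this.elements);
        merged.elements.addAll(other.elements);
        return merged;
    }

    public Set<String> value() { return Collections.unmodifiableSet(elements); }

    public static void main(String[] args) {
        GSet replicaA = new GSet(); replicaA.add("x");
        GSet replicaB = new GSet(); replicaB.add("y");
        // Merging in either order yields the same converged state.
        System.out.println(replicaA.merge(replicaB).value()
                .equals(replicaB.merge(replicaA).value()));
    }
}
```

<p>Removal is what makes this hard: a plain G-Set can never forget an element, which is exactly why the 2P-Set and OR-Set variants above exist.</p>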
<p>Global invariants can’t be enforced over the whole system without
synchronisation; eventual consistency allows you to accept changes which, after
merge, break the invariant.</p>
<h4 id="consensus-algorithms">Consensus algorithms</h4>
<p>Paxos algorithm (Lamport. 1998. The Part-Time Parliament), peer-to-peer
consensus algorithm; most people don’t bother trying to implement a full
peer-to-peer consensus system.</p>
<p>ZooKeeper elects a leader.</p>
<p>Raft. There is just one leader and the point of the algorithm is to ensure that
the leader’s log is the most correct. Leaders have terms? Consensus is
something like “3 nodes agree”.</p>
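<p>The quorum arithmetic behind “3 nodes agree” is simple: with <em>n</em> nodes, a majority is floor(n/2) + 1 votes, so a 3-node cluster needs 2 votes and tolerates 1 failure. A trivial sketch:</p>

```java
public class Quorum {
    // Majority quorum: with n nodes, agreement needs floor(n/2) + 1 votes.
    public static int majority(int n) { return n / 2 + 1; }

    public static boolean hasQuorum(int votes, int n) { return votes >= majority(n); }

    public static void main(String[] args) {
        System.out.println(majority(3));      // 2
        System.out.println(hasQuorum(2, 3));  // true
    }
}
```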
<p>Ceph - large distributed file system. Lots of nodes, three of them are special
“monitors” which maintain the cluster map. It’s pretty hard to build a file
system if your objects (disk blocks) change underneath you; Ceph has focussed
on the consistency needed to build a file system.</p>
<p>Rather than maintain an index (which doesn’t scale), Ceph places blocks
according to a layout algorithm. <code>CRUSH()</code>. Paxos to elect monitors.</p>
<p>All this stuff needs very fast interconnect, i.e. a single data centre.</p>
<p>Amazon’s SQS:</p>
<ul>
<li><p>Reliable delivery of all messages.</p></li>
<li><p>Delivery is not ordered. (Not much of a “queue” is it?)</p></li>
<li><p>Delivery is guaranteed <em>at least</em> once.</p></li>
</ul>
<p>We’re now back where we started: workers that receive messages need to be able
to figure out whether the message needs processing (at least once), etc.</p>
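<p>A sketch of the idempotent-worker idea: with at-least-once delivery, the same message can arrive twice, so the worker remembers which message ids it has already handled and drops repeats. The id scheme here is hypothetical; real systems would persist the seen-set or make the processing itself idempotent.</p>

```java
import java.util.*;

public class DedupingWorker {
    // Tracks which message ids have already been processed, so that a
    // redelivered message (at-least-once semantics) has no further effect.
    private final Set<String> seen = new HashSet<>();
    private final List<String> processed = new ArrayList<>();

    // Returns true if the message was processed, false if it was a duplicate.
    public boolean handle(String messageId, String payload) {
        if (!seen.add(messageId)) return false; // already done: drop it
        processed.add(payload);                 // the "real work" stand-in
        return true;
    }

    public List<String> processed() { return processed; }

    public static void main(String[] args) {
        DedupingWorker w = new DedupingWorker();
        System.out.println(w.handle("m1", "hello")); // true
        System.out.println(w.handle("m1", "hello")); // false: redelivery
    }
}
```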
<p>Queues like this are good for signalling, but solutions like CRDTs will help
manage the state we still need to track.</p>
<h4 id="conclusion">Conclusion</h4>
<p>Idempotence is everything, see FP for salvation.</p>
<h4 id="qa">Q&amp;A</h4>
<p>Make state an abelian group or commutative monoid and this stuff comes largely for
free.</p>
<p>Cloud Haskell duplicating Erlang model, but has tight coupling in the wrong
places, needs same ABI (version) on all nodes; are we on our way back to a
single data centre?</p>
<p>NoSQL vs RDBMS. Now settling down to small transactional world and larger
non-transactional world.</p>
<p>Taking functional approaches (git’s similarity to persistent data structures,
log-journaled databases, etc.).</p>]]></summary>
</entry>

</feed>
