Let us assume for a second that the BirdWatch application needed to handle more load than a single server could handle. The current version could not just be run as multiple instances because then each one would establish a connection to Twitter, of which there is supposed to be only one per application. It would work to split the application into a TwitterClient part and a user-facing controller part, of which multiple instances could run as needed. How do we connect these separate parts of the application though?
One possible approach is using an HTTP stream between the parts as well, basically using the TwitterClient application as a hub for delivering the Twitter stream to multiple instances as needed. While this works, it is not an elegant solution: how do I handle reconnects? How do I even detect them? I’d rather not deal with this.
An Akka cluster should work. This approach seems more promising, as all the messaging elements are already there. But this does not offer the best possible flexibility, as it requires all participants to use Akka / the JVM / the same version of Scala. I’d rather not limit myself to one technology stack if not absolutely necessary.
I would prefer a solution that is completely agnostic of the technology each building block of the whole application uses. I might want to run statistics using numpy in the future or whatever. This should be possible without much glue code. HTTP is obviously technology independent but it falls short due to the reconnect issues.
Turns out there is a great solution for polyglot applications: ZeroMQ, a socket toolbox offering bindings for 30+ languages. Unlike broker-centered JMS or RabbitMQ, ZeroMQ is a messaging library, not a full messaging solution. There are no brokers; instead we get access to TCP sockets (fast) that we can use to build complex communication patterns. I cannot say it any better than this:
What ZeroMQ does is create an API that looks a lot like sockets, and feels the same, but gives you the messaging styles you actually want. By simply specifying the type of socket when you call zmq_socket you can have multicast, request/reply, and many other styles.
Please check out these articles for more in-depth information about ZeroMQ:
- ZeroMQ an introduction, by Nicholas Piël
- ZeroMQ: Modern & Fast Networking Stack, by Ilya Grigorik
- The Appeal and Controversy of ZeroMQ, by Pieter Hintjens
Let us put ZeroMQ to practical use. First thing to do is to install ZeroMQ. One thing to note is that the current Scala bindings require ZeroMQ version 2. On a Mac with homebrew installed you can do this (or refer to the ZeroMQ instructions):
For demonstration purposes I will publish and consume all messages from within the same application. I’m actually working on a more sophisticated version of the BirdWatch application that uses ZeroMQ between different applications running in separate JVMs, but more about that another time. For now I will split the TwitterClient class into separate TweetsPublisher and TweetsConsumer classes within the same application and let them communicate using ZeroMQ publish/subscribe topics. Check out this branch on GitHub.
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
This additional layer of indirection opens up a wide range of possibilities. Scaling becomes straightforward, we can attach pretty much as many of the client-facing controller applications (once split up) to the Tweet publishing application, without even having to individually configure them. Have them all point to the same publishing socket, spread the load using for example nginx and you’re done.
We can also swap individual parts of the application for better ones. I personally do not like the current approach to consuming the Twitter Streaming API as used in TweetsPublisher.scala and I would like to replace it with the Twitter Hosebird Client (hbc). If folks at Twitter have developed this for usage in their own projects, I have no doubt they can do this much better than my simple reconnect strategy possibly could. Last time I checked, hbc was not compatible with Scala version 2.10 used in Play 2.1 though, but thanks to ZeroMQ, the library can be run in its native habitat (Java application without having to worry about which version of Scala is used in some embedded library) and publish Tweets onto a ZeroMQ socket. The TweetsConsumer then would only have to point to another socket address. Anyone experienced with using ZeroMQ in a Java application interested in writing this client?