Last week, I drew a picture of how I wanted to break apart a monolithic application and instead run different parts of the application in separate processes / separate JVMs. The idea was to have a single client for the connection to the Twitter Streaming API and the persistence of the received Tweets in ElasticSearch, plus multiple machines to serve WebSocket connections to the client. For the communication between the processes, I picked Redis Pub/Sub because its model of communication appears to suit the requirements really well. As cute as the drawing may be, I prefer code (plus drawing), so I took the previous monolith apart over the weekend and put Redis in between for communication. It worked really well and here’s how.
Okay, it wasn’t a total surprise to see how well it worked. After all, I started using the Component library together with core.async a few weeks ago for exactly this reason. I wanted the freedom to only ever have to put stuff on conveyor belts, without having to think about how a thing gets where it needs to go, or even where it needs to go at all.
Redis Pub/Sub with Carmine
I chose Pub/Sub over a queue because I wanted to fan-out messages to multiple clients. Any connected processes are only supposed to be fed with data during their uptime, with no need to store anything for when they aren’t connected. For interfacing with Redis from Clojure, I then chose Peter Taoussanis’s carmine client and it turned out to be a great choice.
Let’s look at some code. First of all, I am using a component that provides a send channel and a receive channel. It can be reused on either side of the Pub/Sub connection (or for bidirectional communication, of course):
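A minimal sketch of such a channels component might look like this. The names (`Channels`, `new-channels`) are illustrative; the actual code is in the linked repository:

```clojure
(ns example.channels
  (:require [clojure.core.async :refer [chan]]
            [com.stuartsierra.component :as component]))

;; The component owns one channel per direction of communication
;; and does nothing else; consumers just take the channels they need.
(defrecord Channels []
  component/Lifecycle
  (start [component]
    (assoc component :send (chan) :receive (chan)))
  (stop [component]
    (assoc component :send nil :receive nil)))

(defn new-channels [] (Channels.))
```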
This channels component can now be wired into other components. Here’s the component on the publisher side:
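Roughly, the publisher-side component can be sketched like this, assuming illustrative names such as `Communicator` and `run-send-loop` (the send loop itself follows below):

```clojure
(ns example.send
  (:require [com.stuartsierra.component :as component]))

(declare run-send-loop) ; defined with the send loop shown next

;; Sketch: build a connection map in the shape carmine's wcar expects,
;; then start a loop that publishes everything from the send channel
;; on the "matches" topic.
(defrecord Communicator [conf channels]
  component/Lifecycle
  (start [component]
    (let [conn {:pool {} :spec {:host (:host conf) :port (:port conf)}}]
      (run-send-loop (:send channels) conn "matches")
      (assoc component :conn conn)))
  (stop [component]
    (assoc component :conn nil)))

(defn new-communicator [conf]
  (map->Communicator {:conf conf}))
```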
Here, we create a configuration map and start a send loop with this configuration for the “matches” topic. Here’s that loop:
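A sketch of that loop, using carmine’s `wcar` macro and `publish` (parameter names are illustrative):

```clojure
(ns example.send-loop
  (:require [clojure.core.async :refer [go-loop <!]]
            [taoensso.carmine :as car]))

;; Take every message off send-chan and publish it on topic,
;; using the connection configuration conn.
(defn run-send-loop
  [send-chan conn topic]
  (go-loop []
    (when-let [msg (<! send-chan)]
      (car/wcar conn (car/publish topic msg))
      (recur))))
```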
This go-loop consumes all messages that come in on the send-chan channel and publishes them on topic, using the connection configuration conn.
Here’s the other side of the communication with the component subscribing to the same topic. The channels component stays the same. The component itself looks a little different:
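The subscription itself can be sketched with carmine’s `with-new-pubsub-listener`. The handler is called with `[msg-type topic payload]` vectors, so only actual messages (as opposed to subscription confirmations) get forwarded onto the receive channel; the function name here is illustrative:

```clojure
(ns example.receive
  (:require [clojure.core.async :refer [put!]]
            [taoensso.carmine :as car]))

;; Subscribe to topic and push every incoming payload onto
;; receive-chan; returns the listener for later teardown.
(defn subscribe-topic
  [receive-chan conn topic]
  (car/with-new-pubsub-listener (:spec conn)
    {topic (fn [[msg-type _topic payload]]
             (when (= msg-type "message")
               (put! receive-chan payload)))}
    (car/subscribe topic)))
```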
Just like on the publisher side, there’s the configuration map. Next, we subscribe to a topic and hold on to the returned listener so that we can unsubscribe from the topic and close the listener later when the component shuts down.¹
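Putting it together, the whole subscriber-side component, including the teardown in `stop`, might look roughly like this. Again, this is a sketch with illustrative names; see the repository for the actual code:

```clojure
(ns example.receive-component
  (:require [clojure.core.async :refer [put!]]
            [com.stuartsierra.component :as component]
            [taoensso.carmine :as car]))

(defrecord Communicator [conf channels]
  component/Lifecycle
  (start [component]
    (let [conn {:pool {} :spec {:host (:host conf) :port (:port conf)}}
          listener (car/with-new-pubsub-listener (:spec conn)
                     {"matches" (fn [[msg-type _topic payload]]
                                  (when (= msg-type "message")
                                    (put! (:receive channels) payload)))}
                     (car/subscribe "matches"))]
      (assoc component :conn conn :listener listener)))
  (stop [component]
    (when-let [listener (:listener component)]
      ;; unsubscribe first, then close the listener's dedicated connection
      (car/with-open-listener listener
        (car/unsubscribe))
      (car/close-listener listener))
    (assoc component :conn nil :listener nil)))

(defn new-communicator [conf]
  (map->Communicator {:conf conf}))
```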
Performance of Redis
Redis does a lot with very little CPU utilization. In a non-scientific test, I fired up 50 JVMs (on four machines), each subscribing to the topic on which the TwitterClient publishes tweets with matched percolation queries. Then I changed the term tracked via the Twitter Streaming API to “love”, which reliably maxes out the permitted tweet rate. Typically, with this term I see around 60 to 70 tweets per second. With 50 connected processes, that added up to 3,000 to 3,500 tweet deliveries per second overall, yet the CPU utilization of Redis idled somewhere between 1.7% and 2.3%.
I’m glad I got around to the process separation last weekend. It was fun to do and gives me confidence to proceed with the design I have in mind. Very little had to change in order to break the application apart, thanks to Component and core.async. In one of my next articles, I will describe the Docker configuration for running a TwitterClient container, a couple of containers with the client-serving JVMs connecting over Redis, a container with Redis itself and another container with nginx for load-balancing, plus a few containers for running an ElasticSearch cluster. Subscribe to the newsletter or follow me on Twitter if you want to be informed once the next article is out. The code of the fully functioning application is on GitHub. Let me know if you run into any issues when trying it out.
Cheers and have a great weekend, Matthias
P.S. This series of articles is now continued in this book:
¹ The beauty of the component library is that during development, we can stop a component and restart it after reloading the code. This takes much less time than completely reloading the application. Watch Stuart Sierra’s talk for more information on the component library. I also created a transcript of this talk if you prefer reading.