Matthias Nehlsen

Software, Data, and Stuff

Inspect - println no more

TL;DR: I realized how much I disliked the process of logging data structures to the console and then trying to find them and repeatedly commenting out and uncommenting println statements. So I decided to fix that.

Last Friday, I was thinking about how to approach the next task in the BirdWatch application. I came up with the idea to do aggregations on data from ElasticSearch on the server side and then send the aggregate on the wire over WebSockets to a ClojureScript client. In order to build up the aggregate data structures, I would need to inspect what I was building during development. But I didn't have a good way to do that.

Sure, println statements can be useful when reasoning about parts of an application. Just leaving them in there is not feasible though, particularly when running the application under any kind of load. Finding a logged item on the console can easily be like trying to find a needle in a haystack. For example in BirdWatch, my current configuration receives around 50 tweets per second. I am also using this configuration for the development process. Do I want to log this relatively large tweet data structure every time, 50 times a second? Certainly not, as that would make it difficult to find everything else. Occasionally, though, I want to see one of those or any other kind of data structure used in the system.

So what do we usually do? I constantly found myself commenting out and uncommenting println/log statements and then restarting the application. That's not fun at all and a huge waste of time.

Here's my idea of how to do it better: what if we can leave all those log statements in there and send them to a subsystem that can easily be integrated into any existing Clojure application and takes care of the presentation? This subsystem as a default doesn't do anything with the logged data, which makes it very cheap to process even higher loads1. The subsystem then exposes a web application that allows clients to see the next n of a certain event type (or combinations thereof). Here's how that looks like, I have already implemented it:

When you connect a new client, as a default you currently get the next 10 of every message type known to the system. On the left, there is a table that shows how many items per each type are left. On the right of the table, there are buttons that toggle if the respective data type is to be shown or not. The next button will increment the number for each selected type by whatever is in the field.

You can click on the image for a live version. In this simple example, there are a couple of message types:

  • :interval-put/every-millisecond - emitted every millisecond by a go-loop
  • :interval-put/every-second - emitted every second by a go-loop
  • :interval-put/every-five-seconds - emitted every five seconds by a go-loop
  • :interval-put/every-ten-seconds - emitted every ten seconds by a go-loop
  • :interval-put/every-minute - emitted every minute by a go-loop
  • :ws/event-in - emitted when there's an incoming message on a WebSocket connection (such as a new connection)

Using the inspect library is super simple. All you need to do is add inspect to the dependencies in your project.clj:

[com.matthiasnehlsen/inspect "0.1.1"]

Then, you probably want to import it where you need it:

[com.matthiasnehlsen.inspect :as inspect :refer [inspect]]

Next, you want to start it once, from anywhere:

(inspect/start)

That's all; you can now use it, for example:

(inspect :interval-put/every-second {:msg "every second"})

There is no need to make the event types known to inspect, other than just passing namespaced keywords. Inspect will then learn the event types itself.

Here is the full example application (minus the namespace definition, optional port configuration and REPL reload; click on the link on the upper right of the code block for the entire file):

(defn interval-put-loop
  "put msg on chan every interval milliseonds"
  [interval msg-type msg]
  (go-loop [] (<! (timeout interval)) (inspect msg-type msg) (recur)))

(interval-put-loop     1 :interval-put/every-millisecond {:msg "every millisecond"})
(interval-put-loop  1000 :interval-put/every-second {:msg "every second"})
(interval-put-loop  5000 :interval-put/every-five-seconds {:msg "every five seconds"})
(interval-put-loop 10000 :interval-put/every-ten-seconds {:msg "every ten seconds"})
(interval-put-loop 60000 :interval-put/every-minute {:msg "every minute"})

(defn -main [& args]
  (pid/save "example.pid")
  (pid/delete-on-shutdown! "example.pid")
  (log/info "Application started, PID" (pid/current))
  (inspect/start))

inspect sample application

Inspect internals

The inspect library internally makes use of the component library, Sente for the WebSockets connection and Reagent for building a reactive UI on top of ReactJS. I won't go into details today, but I probably will soon.

Conclusion

Okay, this is it for today. This project is less than a week old and I am still in the middle of exploring the problem space. In particular, I want to start using it everywhere in my BirdWatch project so that I can get a better idea of how I want to interact with the UI. I think inspect, among other things, makes for a useful executable documentation of a system, where you can easily find out how the data in your system looks like. I would have loved to have that in the past when I was new to a project.

Despite this project being so young, there is no reason why you couldn't try it already. Maybe you have ideas on what you'd like to see in there. There's a good chance I can implement other good ideas over the upcoming weekend. Either discuss here in the comments or open an issue on GitHub.

Over the next weeks, I will show how to use the above inside a real application. Also, I might write an article about how inspect works internally. Of course the code is on GitHub. It's pretty short; the entire thing including the UI is only around 300 lines of Clojure and ClojureScript code.

Curious what's next? You can subscribe to the newsletter or follow me on Twitter and I will let you know when the next article is out.

Cheers and have a great weekend,
Matthias


  1. In the default case when no client wants messages, the extra burden should be minimal. There shouldn’t even be significant GC pressure as the evaluated immutable data structures exist already and are structurally shared. In the example application, I am hitting the system with over 1,000 messages per second and the extra load to match the messages with connected clients is about 3-5% (out of a total of 800% on my Retina MacBook). But this would have to be verified before you use it in production, obviously.

© 2022 Matthias Nehlsen