Matthias Nehlsen

Software, Data, and Stuff

RIP grandma; meo progress & beta version

A couple of weeks ago I introduced meo, the intelligent journal that my beloved grandma inspired around two years ago. Since that blog post, she passed away, following a stroke and subsequent coma. Those weeks have been tough, and I miss my grandma a lot. It helped me quite a bit, though, to be working on meo, something that she inspired and that will be part of her legacy. On multiple occasions recently, I might otherwise have given up working on meo, thrown in the towel, and looked for another hobby. Yes, I still like Clojure, but this code base that I created has made me feel like an idiot way too often recently.

But instead of complaining, let me just give you an update on where I made progress and what I was struggling with. While grandma was still in the hospital, I played around with the Mapbox API and a map view that allows zooming into areas with recorded photos and then seeing which photos were taken there. This is what that looks like for the photos I took at EuroClojure 2016 and their respective whereabouts:

Heatmap

That's working pretty well, but I can't include it in a free version of meo, as then I would be paying for your usage of the Mapbox API, and I will not do that. I'm not sure yet what to do with this feature. Maybe something similar can be built with OpenStreetMap or Google Maps? Or you can create your own Mapbox token. I'm open to ideas, and pull requests are welcome.

Then I took issue with calling meo an intelligent journal while realizing that it isn't particularly intelligent thus far, so it clearly needed a neural network, right? So I learned some TensorFlow and created a simple feedforward network for predicting which story an entry might belong to. I wanted to populate the story select field with the network's top ten predictions. That works well enough: the matching story is among the top ten predicted stories more than 90% of the time. That task was fun, but at the same time, it was bullshit at this point. It was exactly the kind of shiny object that an intelligent journal should protect me from pursuing, by making me stick to my plans and holding me responsible for the things I have actually committed to. I'll get back to neural networks within meo at some point, but I'm also hoping to find collaborators who are interested in machine learning inside a data-driven journal and want to help make it useful.
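The 90% figure is a top-k accuracy with k = 10: a prediction counts as a hit if the right story appears anywhere among the ten highest-scoring candidates. Here is a minimal Python sketch of that metric, assuming the network produces one score per story and stories are identified by integer indices. The function name and the toy data are illustrative, not taken from meo.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 10
) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    if len(labels) == 0:
        raise ValueError("need at least one labelled sample")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, cut off after k:
        # these would be the suggestions offered in a select field.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy data (hypothetical): three entries, three candidate stories each.
scores = [
    [0.1, 0.7, 0.2],  # true story 2 is ranked second
    [0.5, 0.3, 0.2],  # true story 0 is ranked first
    [0.2, 0.3, 0.5],  # true story 0 is ranked last
]
labels = [2, 0, 0]
print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 samples are hits
```

A top-k metric fits this use case better than plain accuracy, because the select field only needs the right story to be somewhere in the short list, not necessarily first.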

Then I had a really annoying issue: the ClojureScript client inside the Electron renderer process kept somehow disconnecting from the Clojure backend, or rather, all processing got stuck. It would only work again after completely closing the Electron application and reopening it, as a simple refresh in Electron's developer tools would not help. I think I eventually found the problem using the YourKit profiler, which showed me a deadlock related to logging.

YourKit

I am not entirely sure what happens, but I know that when I use timbre with the default configuration merged with mine, and multiple threads try to log at the same time, they seem to compete over stdout, blocking the entire application. I'm not sure whether timbre is to blame or log4j, which some other libraries also pull in, but for me, it was definitely a big WTF moment. Now everything is working again, but without logging to the terminal, which is weird. Please let me know if this sounds at all familiar to you and what you did about it. I created an issue for it.
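One common way to keep many threads from contending for stdout is to let a single thread own the stream while everyone else hands it messages through a queue. The Python sketch below illustrates that general pattern under that assumption; it is not how timbre or log4j work internally, and `QueueLogger` is a hypothetical name.

```python
import queue
import sys
import threading
from typing import Optional, TextIO


class QueueLogger:
    """Hypothetical logger: any thread may call log(), only one thread writes."""

    def __init__(self, stream: TextIO = sys.stdout) -> None:
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue()
        self._stream = stream
        # The single writer thread is the only code that touches the stream.
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def log(self, message: str) -> None:
        # Callers only enqueue; the queue is unbounded, so put() never waits for I/O.
        self._queue.put(message)

    def close(self) -> None:
        # None is a sentinel: the writer stops after draining all earlier messages.
        self._queue.put(None)
        self._writer.join()

    def _drain(self) -> None:
        while True:
            message = self._queue.get()
            if message is None:
                break
            self._stream.write(message + "\n")
        self._stream.flush()
```

Worker threads call `logger.log(...)` freely, and `close()` is called once at shutdown. Since only the writer thread ever holds the stream, logging threads cannot end up blocking each other on it.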

Then I was trying some simple refactoring and noticed that the project had become very unwieldy to work with, as persistence and retrieval code was sprinkled all over the codebase, and totally ad hoc. I had had something like GraphQL in mind for a long time, but when I looked at Clojure implementations early last year, I found Lacinia and did not understand how to use it. Earlier this month I looked at it again and decided to finally make the switch. Overall, it's really nice to have a language for describing what the returned data should look like, and then fetch exactly what you need, no more and no less. Before, deciding what to fetch when kept tripping me up: I mostly underfetched a little, and sometimes overfetched so much that it slowed down the entire application.
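To illustrate "fetch exactly what you need": a GraphQL selection set names the fields the client wants, and only those come back. Here is a toy Python sketch of that projection, assuming entries are plain nested dicts with hypothetical field names. Real GraphQL servers such as Lacinia resolve each field through resolvers rather than trimming a finished document, so this only shows the shape of the result.

```python
from typing import Any, Mapping

# A selection maps each requested field to True (a leaf) or to a nested selection.
Selection = Mapping[str, Any]


def select(data: Any, selection: Selection) -> Any:
    """Keep only the requested fields, recursing into nested maps and lists."""
    if data is None:
        return None
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        if field not in data:
            continue  # missing fields are simply omitted in this toy version
        value = data[field]
        result[field] = value if sub_selection is True else select(value, sub_selection)
    return result


# Hypothetical entry; field names are made up for illustration.
entry = {
    "timestamp": 1500000000000,
    "md": "Worked on the Lacinia migration",
    "story": {"id": 7, "name": "meo", "color": "#8f8"},
    "comments": [{"timestamp": 1, "md": "first"}, {"timestamp": 2, "md": "second"}],
}
# Roughly what a query like { entry { timestamp story { name } } } asks for.
print(select(entry, {"timestamp": True, "story": {"name": True}}))
# {'timestamp': 1500000000000, 'story': {'name': 'meo'}}
```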

So I refactored the code base to use Lacinia for all data retrieval in meo. Mutations may come later at some point, or maybe not. Using Lacinia started very smoothly as long as I was interacting with my development instance. But then with my actual dataset of 91K entries and 820K words, it started getting pretty slow, and I initially found it difficult to figure out why.

One of the unexpected things was that keys could not stay in kebab case, as GraphQL names cannot contain hyphens, so I went with snake case. So I used transform-keys from the camel-snake-kebab library, and that turned out to be a pretty dumb idea: in some cases, it took over 600ms to transform the initial data structure into the form Lacinia requires. The only alternative was migrating my entire append log (roughly 170K lines, where each line is a new version of an entry). That's what I ended up doing, since it's way better to do this once than on every request. The migration worked fine, but it's also weird not to have the same case everywhere. So if you wonder why the data uses snake case, it's because the GraphQL spec does not allow hyphens in names, and by proxy Lacinia doesn't either.
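The trade-off described above is a per-request key transformation versus a one-time rewrite of the stored data. Below is a minimal Python sketch of the one-time variant. It assumes the append log holds one JSON object per line, which is an illustration only (meo's actual log format may differ), and that keys are strings in kebab-case or camelCase. The recursive `snake_keys` plays the role of a transform-keys call; all names are hypothetical.

```python
import json
import re
from typing import Any, Iterable, Iterator


def to_snake(key: str) -> str:
    """Convert kebab-case or camelCase to snake_case, e.g. 'last-saved' -> 'last_saved'."""
    key = key.replace("-", "_")
    # Insert an underscore before an uppercase letter that follows a lowercase letter or digit.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key).lower()


def snake_keys(data: Any) -> Any:
    """Recursively rename string keys in nested dicts and lists."""
    if isinstance(data, dict):
        return {
            (to_snake(k) if isinstance(k, str) else k): snake_keys(v)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [snake_keys(item) for item in data]
    return data


def migrate_log(lines: Iterable[str]) -> Iterator[str]:
    """One-off migration: rewrite each non-empty log line with snake_case keys."""
    for line in lines:
        if line.strip():
            yield json.dumps(snake_keys(json.loads(line)))
```

Running the migration means streaming the old file through `migrate_log` into a new file once. After that, every request reads keys that are already in the right shape, so the per-request transformation cost disappears.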

Then I wanted GraphQL queries to execute in parallel and, for a surprisingly long time, did not understand how to do that, until I finally figured out a way that works. I initially thought I could call execute with a few queries in parallel, say inside a few futures, and have those run independently. Unexpectedly, though, they did not run independently but sequentially, which I found odd, because what does that thing actually synchronize on at all? I learned that I needed to implement async resolvers and assign them a thread pool. Now that this is figured out, it's running smoothly. I just feel like the data retrieval code is still way too complicated, and I'm looking for collaborators who want to help me clean it up.
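Here is the general idea behind async resolvers plus a thread pool, sketched in Python instead of Lacinia (Lacinia's own API for this is not shown here). A resolver hands its blocking work to a pool and immediately returns a future, so several independent lookups can be in flight at once instead of running one after another. `async_resolver`, `run_all`, `slow_lookup`, and the pool size are all hypothetical.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

# Shared pool for resolvers; four workers is an arbitrary choice for this sketch.
RESOLVER_POOL = ThreadPoolExecutor(max_workers=4)


def async_resolver(resolve: Callable[[], object]) -> "Future[object]":
    """Hypothetical wrapper: start the blocking work on the pool, return a Future at once."""
    return RESOLVER_POOL.submit(resolve)


def run_all(resolvers: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    # Submit everything first, then wait, so independent lookups overlap in time.
    futures = {name: async_resolver(fn) for name, fn in resolvers.items()}
    return {name: future.result() for name, future in futures.items()}


def slow_lookup(value: str, seconds: float = 0.2) -> Callable[[], str]:
    """Stand-in for an expensive query against a large entry index."""
    def resolve() -> str:
        time.sleep(seconds)
        return value
    return resolve


start = time.perf_counter()
results = run_all({name: slow_lookup(name) for name in ["query-a", "query-b", "query-c"]})
elapsed = time.perf_counter() - start
# The three 0.2 s lookups overlap, so this takes roughly 0.2 s rather than 0.6 s.
print(results, f"{elapsed:.2f}s")
```

A dedicated, bounded pool also caps how many expensive lookups run at the same time, which is one reason to assign resolvers a pool rather than spawning a thread per request.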

Oh, and then I tried reviving the packaging for Windows 10. If you suspect a rant now, you're wrong, to my own surprise. After installing Cygwin, it has been running very smoothly, and I have published dozens of versions of meo into an S3 bucket without a glitch. On Linux, however, setting up a virtual machine for publishing AppImage files was much more of a nightmare, with Electron relying on global libraries whose required versions weren't available in Ubuntu, and so on. Eventually, it all worked out, though, and here are the installers:

All of these provide auto-update functionality, which can be accessed through "Check for Updates" in the application menu. In addition, checks for a newer version run once every 24 hours.

About using meo to document the process: it's quite charming to have contemporary witness reports of all the things that were bugging me. That's because I document the entire process of whatever I am working on, including screenshots, which makes it super nice to look all this stuff up rather than having to rely on memory. And then, yeah, it keeps me accountable. The stuff that I am working on is my life, given the number of hours I spend on it, and it pollutes other areas of my life when I cannot leave the grief where it belongs. Doing a brain dump into a journal entry isn't so bad for that, and then telling yourself that you can stop worrying about it, as you can pick everything up from what you wrote next time.

But even better is having a process in place to look at the amount of frustration in your life, trying everything you can to actively change the situation from the inside, and, if that does not work, knowing when to quit. Way too many times in my life, a fucked-up situation simply became normal and turned into my reality, instead of leading to the necessary change. An intelligent journal should really help and support you in a situation like that. Meo isn't doing that yet, at least not to the extent that it could, but that is where I want to go with it.

Please try out the appropriate installer if you like, and let me know what you think. Not all of the functionality will be obvious, and I haven't gotten around to writing a manual yet. But let me know where questions arise, ideally by creating issues on GitHub. I will try to answer everything that comes up. Thanks, and until next time.

© 2022 Matthias Nehlsen