20180505

Notifications from Spark on an Apple Watch (via IFTTT)

This week I have been working a lot with a relatively large dataset on a Spark shell. It was a graph with 1 billion nodes and 2 billion edges that I wanted to analyse with GraphFrames (the successor of GraphX on Spark). This is quite large: before running the graph algorithms I did some exploratory analysis, and each step took at least 10 minutes. Checking stage/task progress bars or generated analysis plans is only interesting for the first fee... I wanted a way to get a subtle notification when the process finished. This way I could work on something else while the process is doing its thing, and I could come back for the next step as soon as the data is ready.

I have most of my notifications deactivated, though. No emails, Twitter, WhatsApp. Nothing shows on my Mac screen, very few show on my phone or watch, and only a handful are allowed to vibrate (none to make a sound). What could I do to get a notification? Even if I overrode some of my settings, I needed something that either worked cross-device or could make my watch vibrate, since it’s the only device I always have with me.

Well, IFTTT can actually do that. IFTTT is a service to plumb external services with iOS/Android devices, to build workflows. It also has a very handy webhook you can use as a trigger for workflows. And the IFTTT app can send notifications, to the phone or watch. Ticks all the boxes.

To use the webhook (a POST endpoint) from Scala I used a library I had never used before: scalaj-http. It seems very convenient for these quick-and-dirty “make a request” cases in a program that includes no other HTTP library.

When I have some action I expect to run for a while, I’ll finish the command with ; notif("Process X finished"). This way, when the command finishes, my wrist will gently buzz and I’ll know I can go back and work on it some more.

It is worth noting that this would also work for a long-running bash or sbt command (I’m looking at you, Spark test suite), for compiling boost, or for anything else that can, basically, run curl against an endpoint at the end of the process.
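For a process that is not a Spark shell, the same idea can be sketched with Python's standard library only. This is not the Scala helper from the post; it is a rough equivalent. The event name `spark_done` and the key `YOUR_IFTTT_KEY` are placeholders I made up: use the event and key from your own IFTTT Webhooks setup. The `value1` field is what an IFTTT applet can use as the notification text.

```python
import json
import urllib.request

# IFTTT Webhooks trigger URL; the event and key are filled in per call.
IFTTT_URL = "https://maker.ifttt.com/trigger/{event}/with/key/{key}"


def build_request(message, event="spark_done", key="YOUR_IFTTT_KEY"):
    """Build (but do not send) the POST request that triggers the webhook.

    `event` and `key` defaults are hypothetical placeholders.
    """
    url = IFTTT_URL.format(event=event, key=key)
    body = json.dumps({"value1": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notif(message, **kwargs):
    """Send the notification and return the HTTP status code."""
    request = build_request(message, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `notif("Process X finished")` at the end of a script plays the same role as the `; notif(...)` suffix in the Spark shell.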

By the way, to run a Spark shell with this library, use:
spark-shell --packages org.scalaj:scalaj-http_2.11:2.3.0
Remember that multiple packages are separated by commas in case you also happen to, you know, use GraphFrames.

Oh, and if you happen to want notifications after an sbt task, you can use the sbt-ifttt plugin.
Written by Ruben Berenguel

20180429

Modifying the agnoster theme for zsh

Even though I have been a long-time user of oh-my-zsh on zsh (I moved from plain bash to zsh about 10 years ago), I have been very minimal in my use of its theme capabilities. I have used the default theme forever: robbyrussell. But recently I was showing my friend @craftycoder the tweaks I have on my system (fzf, autojump, etc.) and he showed me this theme, agnoster. It had several pieces I liked:
  • Powerline-style prompt
  • Git status
  • Virtualenv detection
But I wasn’t sold on some of the default decisions, so I decided to tweak it thoroughly and remove the stuff I didn’t need. You can have a look here.

What did I want to modify?

  • Branch names that are too long. The prompt looks very nice with master but is a bit more troublesome with feature/SAS-4028/kubernetes/poc
  • I don’t care that much about the path. The current directory is enough
  • I usually care more about knowing which git project I am in

Path

I played around with several options to make the path look the way I wanted. I started with the shrink-path Zsh plugin, but I didn’t totally like how it looked. I then cobbled together a bit of awk to get 2 or 3 characters out of each piece of the path instead. It didn’t look much better and took up much more space. I ended up with just the current directory, which is actually excellent.
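To make the two options concrete, here is a small Python sketch of what each one renders. The post did this in awk inside the theme; the function names and the example path are made up for illustration.

```python
def abbreviate_path(path, keep=3):
    """Keep only the first `keep` characters of every path component."""
    return "/".join(part[:keep] for part in path.split("/"))


def current_dir(path):
    """Return only the last path component (what the final prompt shows)."""
    return path.rstrip("/").rsplit("/", 1)[-1] or "/"


example = "/home/ruben/projects/spark-notes"  # hypothetical path
print(abbreviate_path(example))  # /hom/rub/pro/spa
print(current_dir(example))      # spark-notes
```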

Branch name

Branch names at work are of the form {kind}/{ticket number}/{description}. I don’t want to see all the pieces. Kind is either feature or hotfix (very rarely), or occasionally something else. It can be shortened to just a few letters. In general, I like seeing the full ticket number (no specific reason). I don’t need the full description, so it can be shortened to 4 or 5 characters. Awk to the rescue. I love awk. With it I reduced the branch names to what I wanted. Additionally, I wrote a small checker so that Spark-style pull request branch names are shortened too. You can check the awk approach here.
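The shortening rules described above can be sketched in Python as follows. This is not the awk script from the repository: the exact cut-off lengths (3 letters for the kind, 5 for the description) are my assumptions, and the Spark-style pull request case is not covered.

```python
def shorten_branch(branch, kind_len=3, desc_len=5):
    """Shorten {kind}/{ticket number}/{description} branch names.

    The kind and description are truncated, the ticket number is kept whole.
    Branches that do not have at least three pieces are returned unchanged.
    """
    # Split into at most three pieces, so a description that itself
    # contains slashes stays together.
    parts = branch.split("/", 2)
    if len(parts) < 3:
        return branch
    kind, ticket, description = parts
    return "/".join([kind[:kind_len], ticket, description[:desc_len]])


print(shorten_branch("feature/SAS-4028/kubernetes/poc"))  # fea/SAS-4028/kuber
print(shorten_branch("master"))                           # master
```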

Git project

Knowing the project and current folder is everything I need to know. If I need to know more, I just pwd.

Error on last command, root, background processes

I don’t like my prompt to change shape. I changed the error reporting to switch the colour of the current directory to red. I also dropped the background process and root reporting, since I’m never root on my machine and never run anything in the background unless it needs to be globally.

User information

I’m usually pretty sure which user I am, so... Removed it, I prefer a shorter prompt if possible.

Some helpers

I added a couple of helpers, reusing some of the code/ideas: jump to GitHub and jump to JIRA. You can see them in the repository.

20180415

How does the 'in' keyword work in Python?

A subtitle to this post could be More yak shaving

A few days ago I played a bit with a naive implementation of Bloom filters in Python. I wanted to time them against just checking whether a field is in a set/collection. I found something slightly puzzling: it looked like in worked too fast for smaller lists. And I wondered: maybe small lists are special internally, and allow for really fast lookups? Maybe they have some internal index? This raised the question: how does in find stuff in sequences?
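One way to poke at the question from plain Python is to watch what in does with an object that has no `__contains__` method: according to the language reference, it then falls back to iterating and comparing each item. The class and function names below are made up for this sketch. The built-in list has its own `__contains__`, so this only illustrates the same linear, stop-at-first-match behaviour, not CPython's actual list code.

```python
class Recording:
    """Iterable wrapper without __contains__, so `in` falls back to iterating."""

    def __init__(self, items):
        self.items = list(items)
        self.visited = []  # every item handed out during iteration

    def __iter__(self):
        for item in self.items:
            self.visited.append(item)
            yield item


def visited_by_in(items, needle):
    """Return whether `needle` was found and which items `in` looked at."""
    wrapper = Recording(items)
    found = needle in wrapper
    return found, wrapper.visited


print(visited_by_in([10, 20, 30, 40], 20))  # (True, [10, 20])
print(visited_by_in([10, 20, 30, 40], 99))  # (False, [10, 20, 30, 40])
```

The scan stops at the first match, so a short list, or a match near the front, means very few comparisons. No internal index is needed for that to be fast.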