04 Dec 2020

tl;dr - I’m learning a lot at work and want to get better at communicating about it all. Enter weeknot.es !

  • I wrote a bunch of SQL! We have some intense JSONB in postgres, and want to analyse the bejeesus out of it. jsonb_each has been super handy for turning jsonb into rows, which makes it a bit easier to work with <3, especially in older postgreses which don’t have all of the fancy filtering syntax. For example, we can do this:

To illustrate my point, say we’ve got a table(dndparty) that maps names to dnd party details, and all the details are stored as jsonb. Ideally we’d probs decompose the jsonb as soon as you wanted to do anything remotely fancy with it, and/or we had a good sense of the data structure being stable, but let’s say we can’t for Reasons.

You might notice that the party details are a bit…specific. They’re based on the Comic Relief DnD streams, which you can watch via twitch.tv/dnd and you can give them money via comicrelief.com/dnd <3

partyname (varchar) partydetails (jsonb)
dnd2 {“groos lee”: {“hp”: 10, “charClass”: “sorcerer”}, “squirreldalf”: {“hp”: 30, “charClass”: “druid”}, “big hoof energy”: {“hp”: 25, “charClass”: “bard”}, “timothy dickspanner”: {“hp”: 25, “charClass”: “paladin”}}
dnd {“orya stork”: {“hp”: 30, “charClass”: “rogue”}, “bloodcurdle”: {“hp”: 50, “charClass”: “necromancer”}, “silky mooncup”: {“hp”: 23, “charClass”: “ranger”}, “john butthammer”: {“hp”: 5, “charClass”: “cleric”}}

We’d quite like to add up the HP of the parties, and we have that info, so we should be able to do some SUM shenanigans and add it up, right? I originally tried something like this:

1
2
3
4
5
6
7
    SELECT 
        partyname, 
        SUM((partydetails->>'hp')::integer) 
    FROM  
        dndparty 
    GROUP BY 
        partyname;

which gets you this:

partyname sum
dnd2 [EMPTY]
dnd1 [EMPTY]

The problem is that we need to sum the values of the map, and we’re only able to access the key-value pairs with the -> style syntax. We can use jsonb_each to transform the map into rows (handily cross joined with the rest of the row that they’re part of) like this:

1
2
3
4
5
SELECT  
    partyname,k,v 
FROM 
    dndparty, 
    jsonb_each(partydetails) AS t(k,v);

which gets us something like this:

partyname k v
dnd2 groos lee {“hp”: 10, “charClass”: “sorcerer”}
dnd2 squirreldalf {“hp”: 30, “charClass”: “druid”}
dnd2 big hoof energy {“hp”: 25, “charClass”: “bard”}
dnd2 timothy dickspanner {“hp”: 25, “charClass”: “paladin”}
dnd1 orya stork {“hp”: 30, “charClass”: “rogue”}
dnd1 bloodcurdle {“hp”: 50, “charClass”: “necromancer”}
dnd1 silky mooncup {“hp”: 23, “charClass”: “ranger”}
dnd1 john butthammer {“hp”: 5, “charClass”: “cleric”}

Now we can use our pre-existing SQL mad skillz from the first query to :

  • get the HP value from our new v column with v ->> 'hp' , using ->> not -> so that we get a text value back, and not just more jsonb
  • cast that value to an int with ::integer
  • use SUM to add them all up
  • grouping by partyname so re-aggregate back the rows we spread out with the jsonb_each

…giving us a working query \o/

1
2
3
4
5
6
7
8
    SELECT 
        partyname, 
        SUM((v->>'hp')::integer) 
    FROM 
        dndparty, 
        jsonb_each(partydetails) AS t(k,v) 
    GROUP BY 
        partyname;
partyname sum
dnd2 90
dnd1 108
  • I also wrote….yet more SQL for handling bulk data as part of a pipeline, involving shenanigans with CTEs, functions that needed null handling, getting them to play nicely with sequelize etc.

  • We’ve also been thinking about scale challenges; how to understand them, how to measure them

  • Lastly, I’ve been thinking about how to make sense of a massive codebase! With a large system, especially one spread across a lot of repos, it can be tough to reason about what’s going on. We have some tests, documentation and friendly dev’s brains to pick, but I’m also keen to understand what else we can do. We’re exploring centralied ADRs to provide some background to our architectural decisions (thus, err, the name :D) and I’ve also been looking at whether VSCodeTour is a useful way of recording some of the flows.

OK, that’s my first ever weeknotes! Merry weekend! <3