So I’m working on something I have some percentile values for each month over the past ten years. I need to write an endpoint so that my javascript, which will eventually be some D3 histograms, can easily grab that data.

The way my tables are set up, there aren’t any zero values. If I have zero occurences for a given month, there is simply no corresponding row. This means that for my histograms, I need to fill in the missing months. And I start thinking about how I should do that: server side or in the browser, looping over all the possible months, checking to see if we have data etc.

There isn’t anything particularly difficult about this, aside from the fact that we don’t particularly enjoy touching dates. It is just kind of tedious and possibly easy to get wrong.

And then suddenly I think: SQL!

After all, we are basically just taking a database query, munging it and sending it down the road.

Here is the query the way I wrote it the first time:

select tm.month as month, p.percentile as percentile
from tag_month_percentiles p
join tag_month tm on tm.id = p.tag_month_id
join tag t on tm.tag_id = t.id
where tm.tag_id = ?
order by tm.month asc

What I needed was an outer join to produce lines for all those “missing” months.

After fussing around a little, here is what I ended up with:

select tm.month as month, IfNull(peep.percentile, 0) as percentile
from tag_month tm
left join
     (select tm2.month, p.percentile
     from
     tag_month_percentiles p
     join tag_month tm2 on tm2.id = p.tag_month_id
     where tm2.tag_id = ?) as peep
on tm.month = peep.month
group by tm.month
order by tm.month asc

I don’t do this often enough for it to be second nature but in the end it is pretty simple. The original query becomes a subquery and we do a left join against the entire table. All the empty months, that disappeared with the original inner join, reappear as nulls. And we use IfNull (this is MySQL - not my fault) to replace the nulls with zero in the final results.

Because this is coming directly from the data source, I don’t need to figure out when my list of months should begin or end. All the months are automatically where they should be. I don’t have to think about it anymore.

There is nothing earth shattering here…​ but it does show how with about five lines of code, and the declarative nature of SQL, I avoided writing many more lines elsewhere, and avoided all the possible bugs that could have crept in.

Comments