This may be a secret, as I haven't mentioned it here before, but one of my hobbies is economics. The other one is programming, which is kind of obvious. I enjoy teaching as well, and run a Coding Dojo in my town, but that is a story for another time. These three things, programming, economics, and teaching, mixed together with an idea, can create something interesting.
A few months ago I decided to start calculating one economic indicator, the [Misery Index](https://en.wikipedia.org/wiki/Misery_index_(economics)), for Poland. It is simple, just the sum of the unemployment rate and the inflation rate, easy to understand, and the data required to calculate it is available from the Polish Bureau of Statistics (GUS).
With this idea in hand I created https://jakjestw.pl. For the first few months I updated the data by hand. New data is not released often, so I could simply check for it and apply the changes myself. That was fine at first but quickly became cumbersome. I happen to fancy Elixir, and to test it out on something real I decided to create a poor man's static site generator and data parser.
The requirements were ridiculously simple. The app had to fetch data from the GUS website, which was in JSON format, then transform it into something that could be injected into an HTML file. Since I was done with manual labour, it had to run on Gitlab pipelines.
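For a rough idea of the shape of such a generator, here is a minimal sketch of those three steps. Jason, HTTPoison, and the template path are my assumptions for illustration, not necessarily what the project really uses:

```elixir
# A minimal sketch: fetch JSON, decode it, inject it into an HTML template.
# The dependencies and paths here are assumptions, not the project's code.
defmodule Generator do
  def run(url) do
    %HTTPoison.Response{body: body} = HTTPoison.get!(url)
    stats = Jason.decode!(body)
    html = EEx.eval_file("templates/index.html.eex", assigns: [stats: stats])
    File.write!("public/index.html", html)
  end
end
```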
Easier said than done. My main language is Python, which influenced how I chose to model my data. This was my demise. I picked really poorly, following the gut instinct of a Python programmer: maps, tuples, and lists. In Python that makes sense; a dict may not be the best fit for this kind of data transformation, but it is still the go-to structure. My Elixir data looked like this (what a lovely year for Polish wallets, by the way):
```elixir
%{
  2015 => [
    {12, %{cpi: -0.5}},
    {11, %{cpi: -0.6}},
    {10, %{cpi: -0.7}},
    {9, %{cpi: -0.8}},
    {8, %{cpi: -0.6}},
    {7, %{cpi: -0.7}},
    {6, %{cpi: -0.8}},
    {5, %{cpi: -0.9}},
    {4, %{cpi: -1.1}},
    {3, %{cpi: -1.5}},
    {2, %{cpi: -1.6}},
    {1, %{cpi: -1.4}}
  ]
}
```
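Even a simple read is already awkward here. A hypothetical helper (not from the codebase) that pulls out the latest CPI has to find the newest year, then the newest month inside it, and only then reach into the inner map:

```elixir
# Hypothetical: get the most recent CPI value out of the nested map.
def latest_cpi(stats) do
  {_year, months} = Enum.max_by(stats, fn {year, _months} -> year end)
  {_month, data} = Enum.max_by(months, fn {month, _data} -> month end)
  data.cpi
end
```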
My website displays the latest number, recalculated each day. It also shows how the number is built, by giving both components of the indicator, CPI and unemployment. The last thing is a comparison of the last four years, using data from the last month of each year. Not a perfect situation, but it will do for a comparison.
Extracting this information from the data structure presented above requires a lot of effort. A lot more than I expected, and I told myself it was because I'm not fluent in Elixir. After I finished, I realised it wasn't me, it was my data structure. Which is my fault, but it's not me.
That sparked the idea of changing my data structure to something that map/reduce can handle more easily. This time, with some experience in processing data in pipelines, I decided to skip the nested structures and keep the data flat, as a list, using proper date objects:
```elixir
[
  [~D[2016-12-01], {:unemployment, 8.2}],
  [~D[2016-11-01], {:unemployment, 8.2}],
  [~D[2016-10-01], {:unemployment, 8.2}],
  [~D[2016-09-01], {:unemployment, 8.3}],
  [~D[2016-08-01], {:unemployment, 8.4}],
  [~D[2016-07-01], {:unemployment, 8.5}],
  [~D[2016-06-01], {:unemployment, 8.7}],
  [~D[2016-05-01], {:unemployment, 9.1}],
  [~D[2016-04-01], {:unemployment, 9.4}],
  [~D[2016-03-01], {:unemployment, 9.9}],
  [~D[2016-02-01], {:unemployment, 10.2}],
  [~D[2016-01-01], {:unemployment, 10.2}]
]
```
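As a taste of what this buys, here is a sketch (again, not the project's actual code) of computing the headline number, assuming a complete month carries both a `{:cpi, _}` and an `{:unemployment, _}` tuple after the date:

```elixir
# Sketch: the misery index of the newest complete month is just the
# sum of its two components.
def latest_misery_index(all_stats) do
  all_stats
  |> Enum.filter(fn entry -> length(entry) == 3 end) # date plus both components
  |> hd()                                            # the list is newest-first
  |> tl()                                            # drop the date
  |> Enum.map(fn {_name, value} -> value end)
  |> Enum.sum()
end
```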
This is perfect for map/reduce/filter operations. Saying that the code is simpler from my point of view does not make much sense, since I have spent a lot of time with it. A metric that can help here is the number of added and removed lines. In total I removed 409 lines while adding 244, that is 165 lines fewer than before. After excluding the lines that changed in tests, we get 82 removed and 67 added, which is roughly 18% less code doing the same thing. That is good news, but LOC counts alone can be misleading, since lines are not equal. So, the code before:
```elixir
def second_page(all_stats) do
  Enum.to_list(all_stats)
  # prepend the year to every {month, data} tuple -> {year, month, data}
  |> Enum.map(fn {x, data} -> for d <- data, do: Tuple.insert_at(d, 0, x) end)
  |> List.flatten()
  # newest first, by year and then month
  |> Enum.sort(fn x, y -> elem(x, 0) >= elem(y, 0) && elem(x, 1) >= elem(y, 1) end)
  # first month that has both CPI and unemployment
  |> Enum.find(fn x -> map_size(elem(x, 2)) == 2 end)
  |> elem(2)
  |> Map.to_list()
end
```
And after:
```elixir
def second_page(all_stats) when is_list(all_stats) do
  # entries with fewer than three elements are missing one of the components
  Enum.drop_while(all_stats, fn e -> length(e) < 3 end)
  |> hd
  |> tl
end
```
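A hypothetical call, with made-up values, to show what it returns:

```elixir
iex> second_page([
...>   [~D[2017-01-01], {:unemployment, 8.6}],
...>   [~D[2016-12-01], {:cpi, 0.8}, {:unemployment, 8.2}]
...> ])
[cpi: 0.8, unemployment: 8.2]
```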
This is the most striking example from the codebase, but it illustrates the kind of change this rework involved.
TIL:
My main takeaway from this experience is that mistakes made at the start of a project can lead to disastrous consequences later on. The time spent on design, including writing throwaway code during spikes, is the best investment you can make. Think before you start.
P.S.
The code is up on Gitlab; feel free to take a look and comment.