After all these interruptions about schemas and jq we now finally come to the real stuff.
But first a note: all these examples below are broken up into multiple lines, but normally its ok to just write them out all in one line.
Without further ado, let's finally dive in:
Get all active mep Twitter addresses
This is the example we already had in the first installment of this series:
1 2 3 |
|
We can see, that the "query" works on the meps_by_activity
index,
taking only the MEPs that are active
, and if the mep has a Twitter
value in their record its just being reported. Quite simple if you
know what a list comprehension is. We'll do most of our queries as
list comprehensions. The three lines above give a general idea of the
structure, in the 1st line - the output expression - we specify what should be returned from the
record. In the second line we specify where the record m
(the variable) comes
from (the input set). And in the third line we can specify conditions (the predicate) that the variable
must obey in order to be considered in the first line.
Number of Parties of currently active MEPs
The following snippet shows you the beauty of european democracy:
1 2 3 4 |
|
In this example we also see how we can select "sub-records" in the
main MEP record m
. In the third line we select all constituencies
c
of the currently selected MEP m
and in the fourth line we
discard any which are not still ongoing. This weird date in the year
9999 is something we use in Parltrack to signal time intervals that
have not ended yet.
Also notable is that we use {
to start the whole query which creates
a set instead of a list for the results. This is useful since a set
can have each value only once, while a list can have the same value
multiple times. This way we make sure that each party gets only
counted once. Why? well, there is 3 MEPs from the polish Wiosna
party in the parliament, if we would use a list this would count as 3,
in a set only as 1.
One for the civil liberties
Imagine you are an underfundend NGO and there is an important vote coming up in LIBE, how can you find out the phone numbers of all MEPs in LIBE? Easy-peasy:
1 2 3 4 |
|
In this snippet the first line contains the expected result, the full
name of the MEP and the Brussels phone number of the MEP represented
by the object m
. In the second line we select all active MEPs and
associate them with the object m
. In the third line we take all
committees of the currently selected MEP. This is important as in the
fourth line we discard all MEPs, who do not have LIBE in the committee
abbreviation and the ending date of their committee mandate is not
this weird date in the future.
MEPs changing their names
The following example shows two interesting things. First we see one of the big problems with this dataset, MEPs change their names all the time, searching for names is thus errorprone as you might miss stuff that they did under a different name. And second this example shows how to use the historical change log items:
1 2 3 4 5 6 7 8 9 |
|
This is even a bit more complex than the previous ones. Here we have
three nested list comprehensions. And it makes sense to unravel them
from the inside-out. The most important and trivial part is in line 8,
where we take all MEPs and assign them to the object m
again. Then
comes the innermost list comprehension from lines 3-6:
1 2 3 4 |
|
This filter in line 2 takes all historical changes of the currently
selected MEP m
, and in line 3 we assign each change to the object
c
. In line 4 we discard any change which does not affect the full
name of the MEP. In other words this inner loop creates a list of all
name changes for the currently selected MEP.
Zooming out to the whole snippet again, line 7 looks strange.
But if you consider the inner loop as the result
of the loop over the MEPs, then you can see how this line only adds
the currently selected MEP's UserID
to the result. This UserID
is
an important value, it's the MEP unique identifier in the EP, and
parltrack uses it everywhere.
To summarize the two inner loops, we create a list of UserID
s with
the according list of name-changes for the MEP identified by this
UserID
.
The last to unravel are the first two and the last lines. These are meant to do a bit of housekeeping. The last line really just filters out any results from the inner filters that are empty - as in the MEP never changed their name as far as Parltrack is concerned. And the first line sorts the remaining results according to the times a MEP has changed their name.
"Finally this blog post series shows some juicy examples how to dig into this bonanza." - enthuses Adam J. Schumpeter, investigative deputy miner