Parltrack blog - NO-DB Console - Part III

After all these interruptions about schemas and jq we now finally come to the real stuff.

But first a note: all these examples below are broken up into multiple lines, but normally its ok to just write them out all in one line.

Without further ado, let's finally dive in:

Get all active mep Twitter addresses

This is the example we already had in the first installment of this series:

[f"{m['Name']['full']} {','.join(m['Twitter'])}"
 for m in IDXs["meps_by_activity"]['active']
 if 'Twitter' in m]

We can see, that the "query" works on the meps_by_activity index, taking only the MEPs that are active, and if the mep has a Twitter value in their record its just being reported. Quite simple if you know what a list comprehension is. We'll do most of our queries as list comprehensions. The three lines above give a general idea of the structure, in the 1st line - the output expression - we specify what should be returned from the record. In the second line we specify where the record m (the variable) comes from (the input set). And in the third line we can specify conditions (the predicate) that the variable must obey in order to be considered in the first line.

Number of Parties of currently active MEPs

The following snippet shows you the beauty of european democracy:

len({c['party']
     for m in IDXs["meps_by_activity"]['active']
     for c in m['Constituencies']
     if c['end']=='9999-12-31T00:00:00'})

In this example we also see how we can select "sub-records" in the main MEP record m. In the third line we select all constituencies c of the currently selected MEP m and in the fourth line we discard any which are not still ongoing. This weird date in the year 9999 is something we use in Parltrack to signal time intervals that have not ended yet.

Also notable is that we use { to start the whole query which creates a set instead of a list for the results. This is useful since a set can have each value only once, while a list can have the same value multiple times. This way we make sure that each party gets only counted once. Why? well, there is 3 MEPs from the polish Wiosna party in the parliament, if we would use a list this would count as 3, in a set only as 1.

One for the civil liberties

Imagine you are an underfundend NGO and there is an important vote coming up in LIBE, how can you find out the phone numbers of all MEPs in LIBE? Easy-peasy:

[f"{m['Name']['full']} {m['Addresses']['Brussels']['Phone']}"
 for m in IDXs["meps_by_activity"]['active']
 for c in m.get('Committees',[])
 if c['end']=='9999-12-31T00:00:00' and c.get('abbr')=='LIBE']

In this snippet the first line contains the expected result, the full name of the MEP and the Brussels phone number of the MEP represented by the object m. In the second line we select all active MEPs and associate them with the object m. In the third line we take all committees of the currently selected MEP. This is important as in the fourth line we discard all MEPs, who do not have LIBE in the committee abbreviation and the ending date of their committee mandate is not this weird date in the future.

MEPs changing their names

The following example shows two interesting things. First we see one of the big problems with this dataset, MEPs change their names all the time, searching for names is thus errorprone as you might miss stuff that they did under a different name. And second this example shows how to use the historical change log items:

sorted([(len(l), l, mepid)
        for l,mepid in [
                        ([c['data']
                         for changes in m.get('changes',{}).values()
                         for c in changes
                         if c['path']==['Name','full']]
                        , m['UserID'])
                        for m in DBS['ep_meps'].values()]
        if l!=[]])

This is even a bit more complex than the previous ones. Here we have three nested list comprehensions. And it makes sense to unravel them from the inside-out. The most important and trivial part is in line 8, where we take all MEPs and assign them to the object m again. Then comes the innermost list comprehension from lines 3-6:

[[c['data']
 for changes in m.get('changes',{}).values()
 for c in changes
 if c['path']==['Name','full']]

This filter in line 2 takes all historical changes of the currently selected MEP m, and in line 3 we assign each change to the object c. In line 4 we discard any change which does not affect the full name of the MEP. In other words this inner loop creates a list of all name changes for the currently selected MEP.

Zooming out to the whole snippet again, line 7 looks strange. But if you consider the inner loop as the result of the loop over the MEPs, then you can see how this line only adds the currently selected MEP's UserID to the result. This UserID is an important value, it's the MEP unique identifier in the EP, and parltrack uses it everywhere.

To summarize the two inner loops, we create a list of UserIDs with the according list of name-changes for the MEP identified by this UserID.

The last to unravel are the first two and the last lines. These are meant to do a bit of housekeeping. The last line really just filters out any results from the inner filters that are empty - as in the MEP never changed their name as far as Parltrack is concerned. And the first line sorts the remaining results according to the times a MEP has changed their name.

"Finally this blog post series shows some juicy examples how to dig into this bonanza." - enthuses Adam J. Schumpeter, investigative deputy miner