The following snippet is just a warm-up; you should be able to
figure it out yourself if you understood the snippets from the previous installment
of this series (hint: it has nothing to do with co-authorship):
| {tuple(m.items()) + (VOTE['url'], VOTE['voteid'])
   for VOTE in [V
                for V in DBS['ep_votes'].values()
                if 'votes' in V]
   for t in ['+','-','0']
   for g in VOTE['votes'].get(t,{'groups':{}})['groups'].values()
   for m in g
   if 'mepid' not in m}
|
Context: Sadly, the EP publishes the plenary votes with only the names of
the MEPs, so it is up to Parltrack to figure out which name maps
to which UserID. Unfortunately this process is not perfect and there
are gaps. The query above lists all the votes and names that we were
unable to resolve. Some of them are
weird: definitely material to dig deeper into, and maybe to ask the EP
about the stranger ones.
You might notice that this and the following snippets all take some
time: they work on the whole dataset and are thus rather slow.
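To see what the query above picks up, here is a minimal sketch that runs the same comprehension against a tiny mock of DBS['ep_votes']. The record layout is only inferred from the keys the query touches; the 'name' field, the URLs and all values are invented for illustration and are not real Parltrack data.

```python
# Mock data only: the 'name' field and every value below are invented.
DBS = {
    'ep_votes': {
        1: {
            'voteid': 1,
            'url': 'https://example.org/vote/1',
            'votes': {
                '+': {'groups': {'GroupA': [{'name': 'Resolved Person', 'mepid': 11},
                                            {'name': 'Unknown Person'}]}},
                '-': {'groups': {'GroupB': [{'name': 'Another Unknown'}]}},
                # no '0' key here: the .get() default covers missing vote types
            },
        },
        # a record without 'votes' is filtered out by the inner comprehension
        2: {'voteid': 2, 'url': 'https://example.org/vote/2'},
    }
}

# Same query as above: collect every voter entry that has no 'mepid'.
unresolved = {tuple(m.items()) + (VOTE['url'], VOTE['voteid'])
              for VOTE in [V for V in DBS['ep_votes'].values() if 'votes' in V]
              for t in ['+', '-', '0']
              for g in VOTE['votes'].get(t, {'groups': {}})['groups'].values()
              for m in g
              if 'mepid' not in m}

for entry in sorted(unresolved):
    print(entry)
```

Each result is the unresolved voter entry flattened into a tuple, followed by the URL and ID of the vote it came from, which is enough to look up the original record.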
List Unidentified Amendment Authors
The context of this next snippet is similar to the previous one:
amendments also specify authorship fuzzily. But instead of writing a
comprehension, we write the query out as an almost complete
function:
| from collections import Counter
  unk=Counter()
  for am in DBS['ep_amendments'].values():
      if len(am.get('meps',[]))!=len(am.get('authors','').split(',')):
          authors = {unws(x.strip().lower()) for x in am.get('authors','').split(',')}
          for m in am.get('meps',[]):
              name = DBS['ep_meps'][m]['Name']['full'].lower()
              if name in authors:
                  authors.remove(name)
                  continue
          for name in list(authors):
              mepid=mepid_by_name(normalize_name(name))
              if mepid:
                  authors.remove(name)
                  if not 'meps' in am: am['meps']=[]
                  am['meps'].append(mepid)
          for a in authors:
              unk[a]+=1
  print('\n'.join(["%s %s" % (k, v) for v,k in sorted(unk.items(), key=lambda x: x[1])]))
  print(sum(unk.values()))
|
There are a few notable things here. We use a Counter object, which is a
convenient way to - you guessed it - count things. We use the parltrack
function unws(), which stands for "unwhitespace": it removes redundant
whitespace from a string. And we use the parltrack function normalize_name()
when attempting to look up an MEP in the mepid_by_name index.
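For readers without the console at hand, the counting pattern can be tried in isolation. The sketch below is an assumption-laden stand-in: unws_sketch() is a hypothetical re-implementation written only from the description above (collapse redundant whitespace), not parltrack's actual unws(), and the author string is invented.

```python
from collections import Counter

def unws_sketch(text):
    """Hypothetical stand-in for parltrack's unws(): collapse whitespace runs."""
    return ' '.join(text.split())

# Invented sample of a fuzzy, comma-separated authors field.
raw_authors = "Jane  Doe, John\tSmith ,jane doe"

unk = Counter()
for part in raw_authors.split(','):
    # same cleanup chain as in the snippet above: strip, lowercase, unwhitespace
    unk[unws_sketch(part.strip().lower())] += 1

# Counter returns 0 for keys it has never seen, so "+= 1" needs no setup.
print(unk.most_common())
```

Because a Counter defaults missing keys to zero, the loop never has to check whether a name has been seen before.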
The following snippet is not as complex as the previous one, but it
is more interesting in a data-mining sense. It aggregates all
groups of MEPs that have co-authored an amendment to the
'2016/0280(COD)' dossier, and ranks them by the number of amendments
each group submitted.
| from collections import Counter
  groups = Counter()
  for am in IDXs['ams_by_dossier']['2016/0280(COD)']:
      group = tuple(sorted({DBS['ep_meps'][mepid]['Name']['full'] for mepid in am['meps']}))
      groups[group]+=1
  sorted(groups.items(),key=lambda x: x[1])
|
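The key trick in this snippet is turning each set of co-authors into a tuple(sorted(...)) so it can serve as a Counter key. A small sketch on invented data (the single-letter names are placeholders, not MEPs) shows why the normalisation matters:

```python
from collections import Counter

# Invented sample: each amendment lists its co-authors in arbitrary order.
amendments = [
    {'meps': ['B', 'A']},
    {'meps': ['A', 'B', 'A']},   # same group, different order, one duplicate
    {'meps': ['C']},
]

groups = Counter()
for am in amendments:
    # set() drops duplicates, sorted() fixes the order, tuple() makes it hashable
    group = tuple(sorted(set(am['meps'])))
    groups[group] += 1

# Ascending by count, so the most active group ends up last.
ranking = sorted(groups.items(), key=lambda x: x[1])
print(ranking)
```

Without the sorting step, ('A', 'B') and ('B', 'A') would be counted as two different groups.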
The following query is an improved version of the previous one: it also includes
the political group each MEP belonged to at the
time of co-authoring the amendment.
| from collections import Counter
  groups = Counter()
  for am in IDXs['ams_by_dossier']['2016/0280(COD)']:
      group = set()
      for mepid in am.get('meps',[]):
          mep = DBS['ep_meps'][mepid]
          group.add((mep['Name']['full'], matchInterval(mep['Groups'], am['date'])['groupid']))
      group = tuple(sorted(group))
      groups[group]+=1
  sorted(groups.items(),key=lambda x: x[1])
|
This one introduces a useful helper function: matchInterval(list, date).
It takes a list of objects that each have a start and an end date,
and returns the item whose interval matches the date given as the
second parameter. It also handles open-ended
intervals, where the end is set to the year 9999.
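To make the behaviour described above concrete, here is a rough sketch of how such a helper could work. It is not parltrack's implementation: match_interval_sketch() is a hypothetical name, inclusive bounds are an assumption, and returning an empty dict when nothing matches is also an assumption (chosen so that a .get('groupid', '???') call on the result does not fail). The group IDs and dates are invented.

```python
def match_interval_sketch(items, date):
    """Return the first item whose [start, end] interval contains date, else {}."""
    for item in items:
        # ISO 8601 timestamps in one fixed format sort chronologically as strings,
        # so the year-9999 end date naturally acts as "still ongoing"
        if item['start'] <= date <= item['end']:
            return item
    return {}

# Invented membership history with one closed and one open-ended interval.
groups = [
    {'groupid': 'X', 'start': '2009-07-14T00:00:00', 'end': '2014-06-30T00:00:00'},
    {'groupid': 'Y', 'start': '2014-07-01T00:00:00', 'end': '9999-12-31T00:00:00'},
]

print(match_interval_sketch(groups, '2012-03-01T00:00:00')['groupid'])
print(match_interval_sketch(groups, '2018-03-01T00:00:00').get('groupid', '???'))
print(match_interval_sketch(groups, '2001-01-01T00:00:00').get('groupid', '???'))
```

The three calls cover a closed interval, the open-ended one, and a date before any membership started.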
The following query is a variation of the above, but instead of
focusing on only one dossier, this creates an all-time ranking:
| from collections import Counter
  groups = Counter()
  for am in DBS['ep_amendments'].values():
      group = set()
      for mepid in am.get('meps',[]):
          if not mepid: continue
          mep = DBS['ep_meps'][mepid]
          if not mep: continue
          group.add((mep['Name']['full'], matchInterval(mep['Groups'], am['date']).get('groupid','???')))
      group = tuple(sorted(group))
      groups[group]+=1
  sorted(groups.items(),key=lambda x: x[1])
|
The next variant adds the MEP's country and a weight for each MEP based on the number of days
in office at the time of running this query, a kind of seniority weight:
| from collections import Counter
  from datetime import datetime
  groups = Counter()
  for am in DBS['ep_amendments'].values():
      group = set()
      for mepid in am.get('meps',[]):
          if not mepid: continue
          mep = DBS['ep_meps'][mepid]
          if not mep: continue
          group.add((mep['Name']['full'],
                     matchInterval(mep['Groups'], am['date']).get('groupid','???'),
                     matchInterval(mep['Constituencies'], am['date']).get('country','???'),
                     # seniority: total days in office, open-ended terms count until now
                     sum(((datetime.now() if c['end'] == '9999-12-31T00:00:00'
                           else datetime.strptime(c['end'], u"%Y-%m-%dT%H:%M:%S"))
                          -datetime.strptime(c['start'], u"%Y-%m-%dT%H:%M:%S")).days
                         for c in mep['Constituencies'])))
      group = tuple(sorted(group))
      groups[group]+=1
  sorted(groups.items(),key=lambda x: x[1])[-150:]
|
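The seniority weight is the densest part of this query, so here it is pulled out into a standalone sketch. The function name days_in_office() and its now parameter are additions for illustration (the query above calls datetime.now() directly); the date format and the year-9999 marker are taken from the snippet, and the sample terms are invented.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"
OPEN_END = '9999-12-31T00:00:00'   # marker for a term that is still running

def days_in_office(constituencies, now):
    """Sum the length in days of all terms; open-ended terms run until now."""
    total = 0
    for c in constituencies:
        end = now if c['end'] == OPEN_END else datetime.strptime(c['end'], FMT)
        total += (end - datetime.strptime(c['start'], FMT)).days
    return total

# Invented terms: one closed 10-day term and one that is still open.
terms = [
    {'start': '2010-01-01T00:00:00', 'end': '2010-01-11T00:00:00'},
    {'start': '2020-01-01T00:00:00', 'end': OPEN_END},
]

# A fixed "now" keeps the result reproducible; pass datetime.now() for live use.
print(days_in_office(terms, now=datetime(2020, 1, 31)))   # 10 + 30 = 40
```

Note that because the weight depends on the current date, the tuples used as Counter keys in the query change from one day to the next for sitting MEPs.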
Our next-to-last - and quite complex - snippet turns the whole perspective
around, and gives us a list of all co-authorship groups MEP Axel Voss has
been a member of:
| from collections import Counter
  from datetime import datetime
  meps = Counter()
  for am in DBS['ep_amendments'].values():
      # 96761 is the MEP ID used here for Axel Voss
      if not 96761 in am.get('meps',[]): continue
      for mepid in am.get('meps',[]):
          if not mepid: continue
          mep = DBS['ep_meps'][mepid]
          if not mep: continue
          meps[(mep['Name']['full'],
                matchInterval(mep['Groups'], am['date']).get('groupid','???'),
                matchInterval(mep['Constituencies'], am['date']).get('country','???'),
                sum(((datetime.now() if c['end'] == '9999-12-31T00:00:00'
                      else datetime.strptime(c['end'], u"%Y-%m-%dT%H:%M:%S"))
                     -datetime.strptime(c['start'], u"%Y-%m-%dT%H:%M:%S")).days
                    for c in mep['Constituencies']))] += 1
  sorted(meps.items(),key=lambda x: x[1])
|
This last one drops the restriction to Axel Voss and creates an all-time best-of top-30 list of amendment writers, ranked by amendments per day in office:
| from collections import Counter
  from datetime import datetime
  # get weighted stats on amendment authors
  def getmmd(mepid, am): #get mep metadata
      mep = DBS['ep_meps'][mepid]
      if not mep: return
      return (mep['Name']['full'],
              matchInterval(mep['Groups'], am['date']).get('groupid','???'),
              matchInterval(mep['Constituencies'], am['date']).get('country','???'),
              sum(((datetime.now() if c['end'] == '9999-12-31T00:00:00'
                    else datetime.strptime(c['end'], u"%Y-%m-%dT%H:%M:%S"))
                   -datetime.strptime(c['start'], u"%Y-%m-%dT%H:%M:%S")).days
                  for c in mep['Constituencies']))
  meps = Counter()
  for am in DBS['ep_amendments'].values():
      for mepid in am.get('meps',[]):
          if not mepid: continue
          tmp = getmmd(mepid, am)
          if not tmp: continue
          name, group, country, days = tmp
          meps[(name,group,country,days)] += 1
  stats={}
  for mep, cnt in meps.items():
      days=mep[3]
      # amendments per day in office, then raw count and days as tie-breakers
      stats[mep] = (cnt/float(days), cnt, days)
  sorted(stats.items(),key=lambda x: x[1])[-30:]
|
And this concludes our little series on using the parltrack no-db console to
dig up trivia and other facts of questionable utility from the parltrack data.
"What a ride! There is so many nuggets in this dataset, it really helps uncovering obscure details of the european wurst-maschinery." wraps up this series prof. Uriah Xavier Deinhof, deputy-elect for explosive strip-mining.