I am calling an API and getting the following data structure:

{u'query': {u'pages': {u'120714': {u'ns': 0, u'pageid': 120714, u'revisions': [{u'size': 985}], u'title': u'Daniel Nannskog'}}, u'userinfo': {u'anon': u'', u'id': 0, u'name': u'2620:0:862:101:0:0:2:4'}}}

I want to get the size out of this data structure. I know how to extract data from it, but the problem is that at the time of extraction I don't know the key (120714) under pages. For example:

Let's assign this to a variable d:
>>> d
{u'query': {u'pages': {u'120714': {u'title': u'Daniel Nannskog', u'ns': 0, u'pageid': 120714, u'revisions': [{u'size': 985}]}}, u'userinfo': {u'anon': u'', u'id': 0, u'name': u'2620:0:862:101:0:0:2:4'}}}
>>> d['query']['pages']['120714']['revisions']
[{u'size': 985}]
>>> 

But how can I get to size without knowing the value of the second level key prior to extraction?

  • I don't understand the problem. Are you trying to find all d['query']['pages'][foo]['revisions']['size'] for all pages foo? Commented Mar 16, 2013 at 4:53
  • Maybe you're just missing the word "don't"? "… at the time of extraction I don't know the key (120714) after pages…"? Commented Mar 16, 2013 at 5:01
  • Also, revisions holds a list, not a single value, so… you can't get the size, because there may be multiple values. Do you want all of them? The first? The longest? The total? Commented Mar 16, 2013 at 5:04
  • Sorry, the initial question was missing the word "don't". Commented Mar 16, 2013 at 5:13

2 Answers

The question isn't very clear, but I'll try to guess at what you're trying to do, and hopefully even if I guessed wrong it will show you the answer.

You don't know what pages you have. But you know that, whatever pages you have, you want the size of them. In other words, you want to access all of the values of pages, whatever keys those values have.

That's exactly what dict.values does:

sizes = [page['revisions'][0]['size'] for page in d['query']['pages'].values()]

If you don't understand the list comprehension, let's break it down:

pages = d['query']['pages']
# {u'120714': {u'ns': 0, u'pageid': 120714, 
#              u'revisions': [{u'size': 985}], u'title': u'Daniel Nannskog'}}
every_page = pages.values()
# [{u'ns': 0, u'pageid': 120714,
#   u'revisions': [{u'size': 985}], u'title': u'Daniel Nannskog'}]
sizes = []
for page in every_page:
    # {u'ns': 0, u'pageid': 120714,
    #  u'revisions': [{u'size': 985}], u'title': u'Daniel Nannskog'}
    sizes.append(page['revisions'][0]['size'])

Notice that I'm only picking the first revision. If you want all of the revisions' sizes, or the largest, or the sum of them, or the latest, or something else, it's not too hard to modify.

The same thing applies to the pages. If you only want the first page, or the largest, or the sum of sizes across pages, or whatever, you can change things there too.
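As a rough sketch of those variations, assuming the same response shape as in the question: the variable names are only illustrative, and the second revision (size 1200) is invented here purely so the results differ from each other.

```python
# Sample shaped like the question's response. The second revision
# (size 1200) is made up for illustration and is not in the real data.
d = {'query': {'pages': {'120714': {
    'ns': 0, 'pageid': 120714, 'title': 'Daniel Nannskog',
    'revisions': [{'size': 985}, {'size': 1200}]}}}}

pages = d['query']['pages'].values()

# Every revision size of every page, flattened into one list.
all_sizes = [rev['size'] for page in pages for rev in page['revisions']]

# Largest revision size per page, keyed by page title.
largest = {page['title']: max(rev['size'] for rev in page['revisions'])
           for page in pages}

# Sum of the first-revision sizes across all pages.
total_first = sum(page['revisions'][0]['size'] for page in pages)
```

With this sample, all_sizes is [985, 1200], largest maps 'Daniel Nannskog' to 1200, and total_first is 985.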

For example, if you know there's only one page with only one revision, the whole thing reduces to:

size = d['query']['pages'].values()[0]['revisions'][0]['size']
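Note that indexing .values() directly works on Python 2, where it returns a list. On Python 3 it returns a view that cannot be indexed, so a sketch that should run on both versions takes the first value from an iterator instead:

```python
# Same shape as the data in the question, trimmed to the relevant part.
d = {'query': {'pages': {'120714': {
    'ns': 0, 'pageid': 120714, 'title': 'Daniel Nannskog',
    'revisions': [{'size': 985}]}}}}

# next(iter(...)) takes the first (here: the only) page without its key.
page = next(iter(d['query']['pages'].values()))
size = page['revisions'][0]['size']
print(size)  # 985
```

This assumes there is at least one page; on an empty dict, next() raises StopIteration.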

3 Comments

Thanks for a great explanation, yes now everything came to mind with list comprehension. Thanks again.
I wrote the API call to get just one record, so I added the following to your solution, since I know I am getting only one record: sizes = [page['revisions'][0]['size'] for page in d['query']['pages'].values()][0]
If you're guaranteed to have only one page, you don't even need the list comprehension, because it'll just be a list of one value. Just do d['query']['pages'].values()[0]['revisions'][0]['size']. (The main reason I included the ['revisions'][0]['size'] bit was as a hint to this, but obviously it wasn't a very good hint… Sorry about that.)
2

If you are saying that the key 120714 is unknown, and there is only a single key under d['query']['pages'], you can do this:

e = d['query']['pages']
key = e.keys()[0]
print e[key]['revisions']

It looks like this:

>>> d = {u'query': {u'pages': {u'120714': {u'title': u'Daniel Nannskog', u'ns': 0, u'pageid': 120714, u'revisions': [{u'size': 985}]}}, u'userinfo': {u'anon': u'', u'id': 0, u'name': u'2620:0:862:101:0:0:2:4'}}}
>>> e = d['query']['pages']
>>> key = e.keys()[0]
>>> print e[key]['revisions']
[{u'size': 985}]
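If the unknown key is worth keeping as well, one option is to iterate over items(), which yields each key together with its value. This is only a sketch against the sample data; size_by_page is a name made up for the example.

```python
# Same shape as the data in the question, trimmed to the relevant part.
d = {'query': {'pages': {'120714': {
    'ns': 0, 'pageid': 120714, 'title': 'Daniel Nannskog',
    'revisions': [{'size': 985}]}}}}

size_by_page = {}
for page_id, page in d['query']['pages'].items():
    # page_id is the previously unknown key, e.g. '120714'.
    size_by_page[page_id] = page['revisions'][0]['size']

print(size_by_page)  # {'120714': 985}
```

This also handles more than one page, because every key under pages gets its own entry.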

3 Comments

Why e[e.keys()[0]] instead of the much simpler e.values()[0]?
How do I get the size from e.values()[0]?
@Null-Hypothesis: e.values()[0]['revisions'] is the exact same thing as key = e.keys()[0] followed by e[key]['revisions'], so you get it the same way: e.values()[0]['revisions'][0]['size'].
