I want to send a REST request to the Flickr API. The response looks like this (XML):

<rsp stat="ok">
<photos page="1" pages="974001" perpage="250" total="243500161">

<photo id="123" owner="1234" secret="123" server="1" farm="4" 
title="DSC01316" ispublic="1" isfriend="0" isfamily="0" views="0" tags="" 
latitude="47.825188" longitude="11.300722" accuracy="16" context="0" 
place_id="XT" woeid="123" geo_is_family="0" geo_is_friend="0" 
geo_is_contact="0" geo_is_public="1">
<description/>
</photo>

<photo id="123" owner="123" secret="123" server="1" farm="3" 
title="DSC01351" ispublic="1" isfriend="0" isfamily="0" views="0" tags="" 
latitude="47.825263" longitude="11.300891" accuracy="16" context="0" 
place_id="XT" woeid="123" geo_is_family="0" geo_is_friend="0" 
geo_is_contact="0" geo_is_public="1">
<description/>
</photo>

and so forth...

What I want Python to do is parse the response for the attributes id, owner, title, etc., extract their values, and save them into a MySQL database (which I have already set up with phpMyAdmin).

For better understanding: I have this table, where the first row holds my column headers and the second row holds the data extracted from the first photo in the example.

Photo ID    Owner    Secret    Server    Farm    Title    ispublic    isfriend    isfamily    ....
123         1234     123       1         4       DSC01316 1           0           0      

I started with the following to extract the information, but it does not work:

import xml.etree.ElementTree as ET
import requests

url="https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description%22"
page=requests.get(url)
data = page.text
root = ET.fromstring(data)
for x in root.Element.get('photo'):
    test = x.get('Photo ID', 'Owner', 'Secret' , 'Server' , 'Farm' , 'Title' , 'ispublic' , 'isfriend' , 'isfamily')
print (test)

# Does not work. It says: AttributeError: 'Element' object has no attribute 'Element'

Any ideas? I am just looking for a hint; I want to write it myself! Note that I am relatively new to Python, so a link to a documentation site won't work for me. I have too little knowledge for that and will need a little further explanation. Thanks!

1 Answer

BeautifulSoup4 makes it easier to parse XML and HTML documents. Try the code below after installing the package with pip install beautifulsoup4.

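from bs4 import BeautifulSoup

xml = "..."  # the XML text of the response, e.g. page.text
# Name the parser explicitly; "html.parser" ships with Python
soup = BeautifulSoup(xml, "html.parser")

# Each <photo> tag exposes its XML attributes as a dict in .attrs
for photo in soup.find_all('photo'):
    print(photo.attrs['title'])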

With the XML from the question, this prints:

DSC01316
DSC01351

Check out http://www.crummy.com/software/BeautifulSoup/bs4/doc/ for more information.
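If you would rather stay with the standard library, as in the question, here is a rough sketch of the same idea with xml.etree.ElementTree. The AttributeError comes from root.Element, which does not exist: elements are found with root.iter('photo'), and get() takes one attribute name at a time, spelled exactly as in the XML (id, not 'Photo ID'). Everything below is an assumption for illustration: SAMPLE is a shortened copy of the response from the question, the table name photos and the column name photo_id are made up, and an in-memory sqlite3 database stands in for MySQL so the sketch runs without a server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Shortened copy of the response from the question (some attributes left out).
SAMPLE = """<rsp stat="ok">
<photos page="1" pages="974001" perpage="250" total="243500161">
<photo id="123" owner="1234" secret="123" server="1" farm="4"
 title="DSC01316" ispublic="1" isfriend="0" isfamily="0">
<description/>
</photo>
<photo id="123" owner="123" secret="123" server="1" farm="3"
 title="DSC01351" ispublic="1" isfriend="0" isfamily="0">
<description/>
</photo>
</photos>
</rsp>"""

# Attribute names exactly as they appear in the XML (lowercase, "id" not "Photo ID").
FIELDS = ("id", "owner", "secret", "server", "farm",
          "title", "ispublic", "isfriend", "isfamily")


def extract_rows(xml_text):
    """Return one tuple of attribute values per <photo> element, in FIELDS order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so <photo> elements nested under <photos> are found.
    # Element.get() accepts a single attribute name and returns None if it is missing.
    return [tuple(photo.get(name) for name in FIELDS)
            for photo in root.iter("photo")]


def store_rows(conn, rows):
    """Insert the rows into a hypothetical 'photos' table through the DB-API."""
    columns = ", ".join("photo_id" if name == "id" else name for name in FIELDS)
    # sqlite3 uses "?" placeholders; the common MySQL drivers use "%s" instead.
    placeholders = ", ".join("?" for _ in FIELDS)
    # Untyped columns are a SQLite convenience; a real MySQL table needs column types.
    conn.execute(f"CREATE TABLE IF NOT EXISTS photos ({columns})")
    conn.executemany(
        f"INSERT INTO photos ({columns}) VALUES ({placeholders})", rows)
    conn.commit()


rows = extract_rows(SAMPLE)
conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
store_rows(conn, rows)
for row in conn.execute("SELECT photo_id, owner, title FROM photos"):
    print(row)
```

With the real API you would pass page.text to extract_rows instead of SAMPLE. All values come back as strings, so convert them (for example with int() or float()) if your table columns are numeric. Passing the values as parameters, rather than formatting them into the SQL string, lets the driver handle quoting.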

4 Comments

I thought bs was only for HTML, not for XML? Trying your code gives me an invalid syntax error that highlights photo.attrs, specifically photo. I am pretty sure I have had this problem a couple of times already...
And bs4 describes itself as 'Beautiful Soup is a Python library for pulling data out of HTML and XML files.' :)
Right, I should have known that. Calling this URL gives me an error message; I will update my post with it.
Oh, hang on. I know why that error message is coming up. I will fix it and try again.
