Parsing Website (XML) for specific with Python and save to mysql

Question

I want to send a REST request to the Flickr API. The response looks like this (XML):

This XML file does not appear to have any style information associated with it. The 
document tree is shown below.

<rsp stat="ok">
<photos page="1" pages="974001" perpage="250" total="243500161">

<photo id="123" owner="1234" secret="123" server="1" farm="4" 
title="DSC01316" ispublic="1" isfriend="0" isfamily="0" views="0" tags="" 
latitude="47.825188" longitude="11.300722" accuracy="16" context="0" 
place_id="XT" woeid="123" geo_is_family="0" geo_is_friend="0" 
geo_is_contact="0" geo_is_public="1">
<description/>
</photo>

<photo id="123" owner="123" secret="123" server="1" farm="3" 
title="DSC01351" ispublic="1" isfriend="0" isfamily="0" views="0" tags="" 
latitude="47.825263" longitude="11.300891" accuracy="16" context="0" 
place_id="XT" woeid="123" geo_is_family="0" geo_is_friend="0" 
geo_is_contact="0" geo_is_public="1">
<description/>
</photo>

and so forth...

What I want Python to do is parse the response for the attributes id, owner, title, etc., extract their values, and save them into a MySQL database (which I have already set up with phpMyAdmin).

To illustrate: in the table below, the first row holds my column names and the second row holds the data extracted from the first photo in the example.

Photo ID    Owner    Secret    Server    Farm    Title    ispublic    isfriend    isfamily    ....
123         1234     123       1         4       DSC01316 1           0           0

I started with the following to extract the information, but it does not work:

import xml.etree.ElementTree as ET
import requests

url="https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description%22"
page=requests.get(url)
data = page.text
root = ET.fromstring(data)
for x in root.Element.get('photo'):
    test = x.get('Photo ID', 'Owner', 'Secret' , 'Server' , 'Farm' , 'Title' , 'ispublic' , 'isfriend' , 'isfamily')
print (test)

# Does not work. It says: AttributeError: 'Element' object has no attribute 'Element'

Any ideas? I am just looking for a hint, since I want to write it myself! Note that I am relatively new to Python, so a bare link to a documentation site won't help me; I don't know enough yet and will need a little further explanation. Thanks!

lqez · Accepted Answer · 2014-07-16 01:37:33Z


BeautifulSoup4 makes it easier to parse XML / HTML documents. Try the code below after installing the package via pip install beautifulsoup4.

from bs4 import BeautifulSoup

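xml = "..."  # the XML text, e.g. page.text from the question
soup = BeautifulSoup(xml, "html.parser")  # name the parser explicitly to avoid a warning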

for photo in soup.find_all('photo'):
    print(photo.attrs['title'])

Then you'll get,

DSC01316
DSC01351

Check out http://www.crummy.com/software/BeautifulSoup/bs4/doc/ for more information.
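As a side note on the original error: the ElementTree attempt in the question can also be made to work with only the standard library. An Element has no attribute called Element; the methods for searching the tree are iter(), find() and findall(). Also, get() takes one attribute name at a time (plus an optional default), and the names must match the XML exactly, so "id" rather than "Photo ID". The following is a minimal sketch under those assumptions. SAMPLE is a shortened copy of the response shown in the question, and FIELDS and extract_photos are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the response in the question
# (only some of the attributes are kept).
SAMPLE = """<rsp stat="ok">
<photos page="1" pages="974001" perpage="250" total="243500161">
<photo id="123" owner="1234" secret="123" server="1" farm="4"
 title="DSC01316" ispublic="1" isfriend="0" isfamily="0">
<description/>
</photo>
<photo id="123" owner="123" secret="123" server="1" farm="3"
 title="DSC01351" ispublic="1" isfriend="0" isfamily="0">
<description/>
</photo>
</photos>
</rsp>"""

# Attribute names exactly as they appear in the XML.
FIELDS = ["id", "owner", "secret", "server", "farm", "title",
          "ispublic", "isfriend", "isfamily"]


def extract_photos(xml_text):
    """Return one dict per <photo> element, keyed by the names in FIELDS."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree, so the nested <photo> elements are found.
    for photo in root.iter("photo"):
        # get() looks up a single attribute and returns None if it is missing.
        rows.append({name: photo.get(name) for name in FIELDS})
    return rows


rows = extract_photos(SAMPLE)
for row in rows:
    print(row["id"], row["owner"], row["title"])
```

With the real response, page.text from the question's requests.get() call would take the place of SAMPLE. All values come back as strings, so convert them if the database columns are numeric.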

edited Jul 16, 2014 at 1:37

answered Jul 16, 2014 at 1:30


4 Comments

four-eyes Over a year ago

I thought bs was just for HTML, not for XML? Trying your code gives me an invalid syntax error and highlights photo.attrs, specifically photo. I am pretty sure I have had this problem a couple of times already...

lqez Over a year ago

And bs4 describes itself, 'Beautiful Soup is a Python library for pulling data out of HTML and XML files.' :)

four-eyes Over a year ago

Right, I should have known that. Calling this URL gives me an error message; I'll update my post with it.

four-eyes Over a year ago

Oh, hang on. I know why that error message is coming up. I'll fix it and try again.
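The thread stops before the second half of the question, saving the extracted values to a database. Here is a rough sketch of that step. Note that it is not MySQL: it uses sqlite3 from the standard library as a stand-in so that it runs without a server. The table name photos, the column names and the sample rows are all made up for illustration, based on the example table in the question.

```python
import sqlite3

# Hypothetical rows shaped like the example table in the question:
# (photo_id, owner, secret, server, farm, title, ispublic, isfriend, isfamily)
rows = [
    ("123", "1234", "123", "1", "4", "DSC01316", 1, 0, 0),
    ("123", "123", "123", "1", "3", "DSC01351", 1, 0, 0),
]

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE photos (
        photo_id TEXT, owner TEXT, secret TEXT, server TEXT, farm TEXT,
        title TEXT, ispublic INTEGER, isfriend INTEGER, isfamily INTEGER
    )"""
)

# Placeholders let the driver handle quoting; avoid building the SQL
# statement with string formatting.
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
)
conn.commit()

# Read a few columns back to check what was stored.
stored = conn.execute(
    "SELECT photo_id, owner, title FROM photos ORDER BY rowid"
).fetchall()
print(stored)
conn.close()
```

The same connect / execute / commit pattern should carry over to a MySQL driver such as PyMySQL or mysql-connector-python, since they follow the same Python DB-API. The main differences to expect are the connection arguments (host, user, password, database) and the placeholder, which is %s instead of ? in those drivers.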
