
I have to retrieve data from an XML file and insert it into a database. There is no error when I run my Python file, but the data does not get into the database. I cannot find where I went wrong. Any help would be much appreciated.

Here is my Python code:

from xml.etree import ElementTree
import mysql.connector

dom = ElementTree.parse('profile.xml')

ticker = dom.findall('TICKER')
name = dom.findall('NAME')
address = dom.findall('ADDRESS')
phone = dom.findall('PHONE')
website = dom.findall('WEBSITE')
sector = dom.findall('SECTOR')
industry = dom.findall('INDUSTRY')
full_time = dom.findall('FULL_TIME')
bus_summ = dom.findall('BUS_SUMM')

ticker_list = [t.text for t in ticker]
name_list = [t.text for t in name]
add_list = [t.text for t in address]
phn_list = [t.text for t in phone]
site_list = [t.text for t in website]
sec_list = [t.text for t in sector]
ind_list = [t.text for t in industry]
emp_list = [t.text for t in full_time]
sum_list = [t.text for t in bus_summ]

db = mysql.connector.Connect(host = 'localhost', user = 'root', password ='root' , database = 'nldb_project')
cur = db.cursor()
query = "INSERT INTO profiles(`prof_ticker`,`name`,`address`,`phonenum`,`website`,`sector`,`industry`,full_time`,`bus_summ`) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"

sqltuples = [(t,n,a,p,s,sec,i,e,su) for t,n,a,p,s,sec,i,e,su in zip(ticker_list,name_list,add_list,phn_list,site_list,sec_list,ind_list,emp_list,sum_list)]
cur.executemany(query,sqltuples)

I am using Python 3.6.5.

Here is my XML file:

<?xml version="1.0"?>
<collection shelf = 'profile'>
<INFO>
    <TICKER>AAPL</TICKER>
    <NAME> Apple Inc.</NAME>
    <ADDRESS>1 Infinite Loop;Cupertino, CA 95014;United State</ADDRESS>
    <PHONE>408-996-1010</PHONE>
    <WEBSITE>http://www.apple.com</WEBSITE>
    <SECTOR>Technology</SECTOR>
    <INDUSTRY>Consumer Electronics</INDUSTRY>
    <FULL_TIME>100,000</FULL_TIME>
    <BUS_SUMM>Apple</BUS_SUMM>
    <SOURCE>https://finance.yahoo.com/quote/AAPL/profile?p=AAPL</SOURCE> 
</INFO>
<INFO>
    <TICKER>T</TICKER>
    <NAME> AT and T Inc.</NAME>
    <ADDRESS>208 South Akard Street;Dallas, TX 75202;United States</ADDRESS>
    <PHONE>210-821-4105</PHONE>
    <WEBSITE>http://www.att.com</WEBSITE>
    <SECTOR>Communication Services</SECTOR>
    <INDUSTRY> Telecom Services</INDUSTRY>
    <FULL_TIME>254,000</FULL_TIME>
    <BUS_SUMM>at and t</BUS_SUMM>
    <SOURCE>https://finance.yahoo.com/quote/T/profile?p=T</SOURCE>
</INFO>
<INFO>
    <TICKER>IBM</TICKER>
    <NAME>International Business Machines Corporation</NAME>
    <ADDRESS>1 New Orchard Road;Armonk, NY 10504;United States</ADDRESS>
    <PHONE>914-499-1900</PHONE>
    <WEBSITE>http://www.ibm.com</WEBSITE>
    <SECTOR>Technology</SECTOR>
    <INDUSTRY> Information Technology Services</INDUSTRY>
    <FULL_TIME>366,600</FULL_TIME>
    <BUS_SUMM>ibm</BUS_SUMM>
    <SOURCE>https://finance.yahoo.com/quote/IBM/profile?p=IBM</SOURCE>
</INFO>
<INFO>
    <TICKER>TWTR</TICKER>
    <NAME>Twitter,Inc.</NAME>
    <ADDRESS>1355 Market Street;Suite 900;San Francisco, CA 94103;United States</ADDRESS>
    <PHONE>415-222-9670</PHONE>
    <WEBSITE>http://www.twitter.com</WEBSITE>
    <SECTOR>Technology</SECTOR>
    <INDUSTRY>Internet Content Information</INDUSTRY>
    <FULL_TIME>3,372</FULL_TIME>
    <BUS_SUMM>twitter</BUS_SUMM>
    <SOURCE>https://finance.yahoo.com/quote/TWTR/profile?p=TWTR</SOURCE>
</INFO>
<INFO>
    <TICKER>TSLA</TICKER>
    <NAME>Tesla,Inc.</NAME>
    <ADDRESS>3500 Deer Creek Road;Palo Alto, CA 94304;United States</ADDRESS>
    <PHONE>650-681-5000</PHONE>
    <WEBSITE>http://www.tesla.com</WEBSITE>
    <SECTOR>Consumer Cyclical</SECTOR>
    <INDUSTRY>Auto Manufacturers</INDUSTRY>
    <FULL_TIME>37,543</FULL_TIME>
    <BUS_SUMM>tesla</BUS_SUMM>
    <SOURCE>https://finance.yahoo.com/quote/TSLA/profile?p=TSLA</SOURCE>
</INFO>
<INFO>
    <TICKER>PYPL</TICKER>
    <NAME>PayPal Holdings, Inc.</NAME>
    <ADDRESS>2211 North First Street;San Jose, CA 95131;United States</ADDRESS>
    <PHONE>408-967-1000</PHONE>
    <WEBSITE>http://www.paypal.com</WEBSITE>
    <SECTOR>Financial Services</SECTOR>
    <INDUSTRY>Credit Services</INDUSTRY>
    <FULL_TIME>18,700</FULL_TIME>
    <BUS_SUMM>paypal</BUS_SUMM>
    <SOURCE>https://finance.yahoo.com/quote/PYPL/profile?p=PYPL</SOURCE>
</INFO>
</collection>
  • Could you please provide your XML input? Also, I see that the ` (backtick) is missing before full_time in the query. Commented May 3, 2018 at 11:38
  • I've added my XML code @chakri Commented May 3, 2018 at 12:41

2 Answers


The reason behind your issues

You have 18 lists (9 returned by findall() and 9 built from them), all of which are empty. That is why you see no effect on the database after the insertion.

I am unable to find where I went wrong.

Your problem stems from a misunderstanding of how findall() works:

Element.findall() finds only elements with a tag which are direct children of the current element.
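A minimal sketch of the difference, using a trimmed-down inline copy of profile.xml (two INFO records only) so it runs without the file. The paths 'INFO/TICKER' and './/TICKER' are standard ElementPath syntax accepted by findall(). Parsing the real file with ElementTree.parse('profile.xml') should behave the same way, since ElementTree.findall() starts its search at the root element.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for profile.xml, trimmed to two <INFO> records.
XML = """<collection shelf="profile">
  <INFO><TICKER>AAPL</TICKER><NAME> Apple Inc.</NAME></INFO>
  <INFO><TICKER>IBM</TICKER><NAME>International Business Machines Corporation</NAME></INFO>
</collection>"""

root = ET.fromstring(XML)  # root is the <collection> element

direct = root.findall("TICKER")          # direct children of <collection> only
one_level = root.findall("INFO/TICKER")  # <TICKER> inside each <INFO>
anywhere = root.findall(".//TICKER")     # <TICKER> at any depth

print(len(direct))                  # 0
print([t.text for t in one_level])  # ['AAPL', 'IBM']
print([t.text for t in anywhere])   # ['AAPL', 'IBM']
```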

So let me take ticker as an example:

ticker = dom.findall('TICKER')

What is dom? It is an ElementTree object, and dom.findall() searches starting at the root element of your XML tree, which in your case is collection:

>>> dom.getroot()
<Element 'collection' at 0x7f5e24a42e10>

Now ask yourself: which elements are direct children of collection? There are 6 INFO children and no TICKER element at all.

>>> infos = dom.findall('INFO')
>>> len(infos)
6

So when you run ticker = dom.findall('TICKER'), you are looking for direct children of collection called TICKER, and since there are none, your list ticker is empty:

>>> ticker = dom.findall('TICKER')
>>> ticker
[]

So later in your code, when you run ticker_list = [t.text for t in ticker], you are looping over an empty list, and nothing comes out of it:

>>> ticker_list = [t.text for t in ticker]
>>> ticker_list
[]

The same reasoning applies to the other 8 findall() results and the 8 lists built from them.
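It also helps to see what zip() does with these lists in the sqltuples line. zip() stops at its shortest input, so if even one of the nine lists is empty, no tuples are produced and executemany() receives nothing to insert. A small illustration with two of the lists (the values are taken from the XML):

```python
# ticker_list has data, but name_list is empty, as findall() left it
ticker_list = ["AAPL", "T", "IBM"]
name_list = []

# zip() stops at the shortest iterable, so one empty list means zero rows
sqltuples = list(zip(ticker_list, name_list))
print(sqltuples)  # []
```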

How to fix the problems?

How do you solve the problem, then? If you understood the explanation above, you are already halfway there. Let us do it:

After loading your XML file into dom, initialize the 9 empty lists you need:

>>> ticker_list = []
>>> name_list = []
>>> add_list = []
>>> phn_list = []
>>> site_list = []
>>> sec_list = []
>>> ind_list = []
>>> emp_list = []
>>> sum_list = []

Then loop over your data, taking into account its hierarchy and how findall() works. For example, let us focus on ticker_list:

>>> dom.getroot()
<Element 'collection' at 0x7f5e24a42e10>
>>> infos = dom.findall('INFO')
>>> for info in infos:
...     tickers = info.findall('TICKER')
...     for ticker in tickers:
...         ticker_list.append(ticker.text)
... 
>>> ticker_list
['AAPL', 'T', 'IBM', 'TWTR', 'TSLA', 'PYPL']

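Now apply the same logic to the remaining 8 lists. This loop fills ticker_list again, so start from freshly emptied lists (re-run the initialization above) to avoid duplicate tickers: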

>>> infos = dom.findall('INFO')
>>> for info in infos:
...     tickers = info.findall('TICKER')
...     for ticker in tickers:
...         ticker_list.append(ticker.text)
...     names = info.findall('NAME')
...     for name in names:
...         name_list.append(name.text)
...     adds = info.findall('ADDRESS')
...     for add in adds:
...         add_list.append(add.text)
...     phns = info.findall('PHONE')
...     for phn in phns:
...         phn_list.append(phn.text)
...     sites = info.findall('WEBSITE')
...     for site in sites:
...         site_list.append(site.text)
...     secs = info.findall('SECTOR')
...     for sec in secs:
...         sec_list.append(sec.text)
...     inds = info.findall('INDUSTRY')
...     for ind in inds:
...         ind_list.append(ind.text)
...     emps = info.findall('FULL_TIME')
...     for emp in emps:
...         emp_list.append(emp.text)
...     summs = info.findall('BUS_SUMM')
...     for summ in summs:  # 'summ' avoids shadowing the built-in sum()
...         sum_list.append(summ.text)
... 

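Now your lists contain data, so executemany() actually has rows to insert. Keep in mind that the query still needs the missing backtick before full_time, and that the changes must be committed with db.commit() (see the comments and the other answer).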
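An alternative sketch, not taken from the answer above: build one tuple per INFO record with findtext(), which returns a child's text or None when the tag is absent. The FIELDS list and the rows_from_profiles() helper are hypothetical names introduced here. The sample XML is an inline stand-in: one record copied from profile.xml and a second, made-up record that has only TICKER and NAME.

```python
import xml.etree.ElementTree as ET

# Tag names in the same order as the columns of the question's INSERT query.
FIELDS = ["TICKER", "NAME", "ADDRESS", "PHONE", "WEBSITE",
          "SECTOR", "INDUSTRY", "FULL_TIME", "BUS_SUMM"]


def rows_from_profiles(root):
    """Return one tuple per <INFO> record, ready for cursor.executemany()."""
    rows = []
    for info in root.findall("INFO"):
        # findtext() returns None here when the tag is missing, so an absent
        # field becomes NULL instead of shifting values between records.
        rows.append(tuple(info.findtext(tag) for tag in FIELDS))
    return rows


# Inline sample: one complete record and one record with only TICKER and NAME.
XML = """<collection shelf="profile">
  <INFO><TICKER>AAPL</TICKER><NAME> Apple Inc.</NAME>
    <ADDRESS>1 Infinite Loop;Cupertino, CA 95014;United State</ADDRESS>
    <PHONE>408-996-1010</PHONE><WEBSITE>http://www.apple.com</WEBSITE>
    <SECTOR>Technology</SECTOR><INDUSTRY>Consumer Electronics</INDUSTRY>
    <FULL_TIME>100,000</FULL_TIME><BUS_SUMM>Apple</BUS_SUMM></INFO>
  <INFO><TICKER>IBM</TICKER><NAME>International Business Machines Corporation</NAME></INFO>
</collection>"""

sqltuples = rows_from_profiles(ET.fromstring(XML))
print(sqltuples[0][0], sqltuples[1][3])  # AAPL None
```

With the real file, something like sqltuples = rows_from_profiles(ElementTree.parse('profile.xml').getroot()) could replace the nine lists before cur.executemany(query, sqltuples). The design choice is about alignment. With nine parallel lists, a record missing one tag would shift every later value in that column onto the wrong company. Building each tuple per INFO element avoids that.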

Extra note:

Of course, iter() would simplify the code considerably compared with findall(), because it searches all descendants rather than only direct children.
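A minimal sketch of the iter() approach on an inline, two-record stand-in for profile.xml. ElementTree objects such as dom also provide iter(), so dom.iter('TICKER') should work directly on the parsed file. The lists stay parallel, though, so they rely on every INFO having every tag.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for profile.xml, trimmed to two <INFO> records.
XML = """<collection shelf="profile">
  <INFO><TICKER>AAPL</TICKER><NAME> Apple Inc.</NAME></INFO>
  <INFO><TICKER>T</TICKER><NAME> AT and T Inc.</NAME></INFO>
</collection>"""

root = ET.fromstring(XML)

# iter(tag) walks the whole subtree, so no explicit loop over <INFO> is needed.
ticker_list = [t.text for t in root.iter("TICKER")]
name_list = [n.text for n in root.iter("NAME")]

print(ticker_list)  # ['AAPL', 'T']
print(name_list)    # [' Apple Inc.', ' AT and T Inc.']
```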


2 Comments

Thanks @Begueradj for such useful information.
You are welcome. Remember: when your code does not work, post it on this website, not on the Code Review site (where the code must already work correctly and reviewers suggest how to improve it).

With auto-commit disabled, which is the default in mysql.connector, your statements run inside a transaction that is not saved until you commit it. Call commit() to make your changes permanent, or rollback() to discard them.

Simply add

db.commit()

at the end of your code to commit the changes.
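MySQL cannot run inside a self-contained snippet, so this sketch swaps in Python's built-in sqlite3 module to illustrate the same behavior. In its default mode, sqlite3 also opens a transaction before an INSERT and does not persist the rows until commit(). The two-column profiles table and the count_rows() helper are hypothetical. Note that sqlite3 uses ? placeholders where mysql.connector uses %s. With mysql.connector, the fix is still db.commit() after cur.executemany(query, sqltuples).

```python
import os
import sqlite3
import tempfile


def count_rows(path):
    """Count rows as seen by a separate connection, i.e. committed data only."""
    other = sqlite3.connect(path)
    try:
        return other.execute("SELECT COUNT(*) FROM profiles").fetchall()[0][0]
    finally:
        other.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    db = sqlite3.connect(path)
    cur = db.cursor()
    cur.execute("CREATE TABLE profiles (prof_ticker TEXT, name TEXT)")
    db.commit()

    # The INSERT implicitly opens a transaction that is not yet committed.
    cur.executemany("INSERT INTO profiles VALUES (?, ?)",
                    [("AAPL", "Apple Inc."),
                     ("IBM", "International Business Machines Corporation")])
    before = count_rows(path)  # other connections cannot see the rows yet

    db.commit()                # now the rows are saved
    after = count_rows(path)
    db.close()

print(before, after)  # 0 2
```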

There might be other errors; it would help if you provided profile.xml for testing. At the very least, the opening backtick before full_time is missing in the query.
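One way to rule out this kind of typo, offered as a sketch rather than as part of the original answer, is to generate the column list and placeholders from a single list of names. The column names below are copied from the question's INSERT statement. COLUMNS, columns_sql and placeholders are illustrative names.

```python
# Column names copied from the question's INSERT statement.
COLUMNS = ["prof_ticker", "name", "address", "phonenum", "website",
           "sector", "industry", "full_time", "bus_summ"]

# Quote every column the same way and emit one %s placeholder per column,
# so a single forgotten backtick cannot slip in.
columns_sql = ", ".join("`{}`".format(col) for col in COLUMNS)
placeholders = ", ".join(["%s"] * len(COLUMNS))
query = "INSERT INTO profiles({}) VALUES ({})".format(columns_sql, placeholders)
print(query)
```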

1 Comment

I've added it. Please check it out! @OlegRybalchenko
