
I have a number of XML files I need to open and then process to produce a large number of rows that are then inserted into several tables in a remote Postgres database.

To extract the XML data I am using xml.etree.ElementTree to parse the XML tree and extract elements as needed. While I am doing a number of things, the basic action is to take a specific element, whether string or integer, and place it in one of several dictionaries.
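
For context, the extraction step is roughly like the sketch below (the file, tag and attribute names are made up for illustration; my real files are more involved):

import xml.etree.ElementTree as ET

tree = ET.parse('document.xml')                  # placeholder file name
root = tree.getroot()
document_id = int(root.get('id'))                # invented attribute name

items = {}
for item in root.iter('item'):                   # 'item', 'index' and 'code' are invented names
    item_index = int(item.get('index'))
    item_code = int(item.findtext('code'))
    items[item_index] = item_code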

After some more processing I have a number of dictionaries that I need to insert into my database. For any single XML file I may produce up to 8,000-10,000 rows (or queries) across 3 tables.

While testing, I was outputting to SQL files and then running the queries manually. That is obviously not going to work if I have lots of XML files.

I have therefore tried using psycopg2 to automate this process. As I understand it from Stack Overflow and elsewhere, running individual execute calls is painfully slow. Based on this Stack Overflow question I have attempted to write the following code:

QueryData = ','.join(cur.mogrify('(%s,%s,%s)', row) for row in myData)
cur.execute('INSERT INTO DBTABLE' + QueryData)
conn.commit()

where myData is a list of tuples, [(a,b,c),(a,b,c),(a,b,c), ...], whose contents are a combination of data extracted with xml.etree.ElementTree and values I have calculated myself.
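
Assembling myData from those values looks roughly like this (again using the placeholder names from the sketch above):

myData = []
for item_index, item_code in items.items():
    myData.append((document_id, item_index, item_code))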

When I try to actually execute the above code, however, I get the following error:

TypeError: sequence item 0: expected str instance, bytes found

OK... if I then try to convert my data (each tuple element) with str(), however, I get:

TypeError: encoding without a string argument

Am I just going about this totally wrong? How can I do what I need? I am using Python 3.

ADDITIONAL

I was asked to show an example of the data.

Here is the simplest case: three integer values to put into a table, of the form (document_id, item_index, item_code).

A typical example would be: (937, 138, 681)

My general attempts to convert have been to try:

(str(document_id),str(item_index),str(item_code))

I have also tried going the other way:

(bytes(document_id,'utf-8'),bytes(item_index,'utf-8'),bytes(item_code,'utf-8'))

The latter also raises the error: TypeError: encoding without a string argument

  • can we see the data in question and how you're trying to convert it? Commented Sep 24, 2016 at 3:28

3 Answers


The psycopg documentation states that for cur.mogrify:

The returned string is always a bytes string.

So to use this hack, you just need to decode the result of mogrify back to a string, for example:

QueryData = ','.join(cur.mogrify('(%s,%s,%s)', row).decode('utf-8') for row in myData)
cur.execute('INSERT INTO DBTABLE VALUES ' + QueryData)

However, as mentioned in this Stack Overflow question, the most efficient way to copy large amounts of data is to use COPY. You can do so with any Python file-like object. Here's an example from the psycopg docs:

>>> from io import StringIO
>>> f = StringIO("42\tfoo\n74\tbar\n")
>>> cur.copy_from(f, 'test', columns=('num', 'data'))
>>> cur.execute("select * from test where id > 5;")
>>> cur.fetchall()
[(6, 42, 'foo'), (7, 74, 'bar')]
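
Applied to the kind of integer tuples shown in the question, a rough sketch might be (the table and column names are placeholders, and cur/conn are the existing psycopg2 cursor and connection):

from io import StringIO

# myData is the list of integer tuples, e.g. [(937, 138, 681), ...]
buf = StringIO('\n'.join('\t'.join(str(v) for v in row) for row in myData) + '\n')
cur.copy_from(buf, 'dbtable', columns=('document_id', 'item_index', 'item_code'))
conn.commit()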

You are missing VALUES after the table name; everything else seems correct:

cursorPG.execute("INSERT INTO test VALUES "+','.join(cursorPG.mogrify('(%s,%s)',x) for x in mydata))

1 Comment

OK sure, I will add that, but it does not fix the issue. The line causing the problem is before the execute: QueryData = ','.join(cur.mogrify('(%s,%s,%s)', row) for row in myData) still throws the TypeError.

OK, so I have it working... I am, however, confused as to why my solution worked. I am posting it as an answer, but if someone could explain to me what is going on, that would be great.

Basically, this:

QueryData = ','.join(cur.mogrify('(%s,%s,%s)', row) for row in myData)
cur.execute('INSERT INTO DBTABLE' + QueryData)

had to be changed to:

QueryData = b','.join(cur.mogrify(b'(%s,%s,%s)', row) for row in myData)
cur.execute(b'INSERT INTO DBTABLE' + QueryData)

Which strikes me as pretty inelegant.
