2

Hey guys, i have the following problem: 1 process executes a very large query and writes the results to a file, inbetween the process should update an status to the database.

first thaught: NO PROBLEM, pseudo code:

db = mysqldb.connect()
cursor = db.cursor()
large = cursor.execute(SELECT * FROM VERYLARGETABLE)
for result in large.fetchall():
     file.write(result)
if timetoUpdateStatus: cursor.execute(UPDATE STATUS)

problem: when getting 9 million results the "large = cursor.execute(SELECT * FROM VERYLARGETABLE)" never finishes... i figured out a border at 2 million entrys at 4 columns where the mysql server finished the query after 30 seconds but the python process keeps running for hours... that maybe a bug in the Python MySQLDB library..

SO SECOND TRY: db.query function with db.use_results() and fetch_row():

db = mysqldb.connect()
cursor = db.cursor()
db.query(SELECT * FROM VERYLARGETABLE)
large = large.use_result()
while true:
    for row in large.fetch_row(100000):
        file.write(row)
    if timetoUpdateStatus: cursor.execute(UPDATE STATUS) <-- ERROR (2014, "Commands out of sync; you can't run this command now")

so THIRD TRY was using 2 MySQL connections... which doesnt work, when i open a second connection the first one disappears....

any suggestions??

3 Answers 3

6

Try using a MySQL SSCursor. It will keep the result set in the server (MySQL data structure), rather than transfer the result set to the client (Python data structure) which is what the default cursor does. Using an SSCursor will avoid the long initial delay caused by the default cursor trying to build a Python data structure -- and allocate memory for -- the huge result set. Thus, the SSCursor should also require less memory.

import MySQLdb
import MySQLdb.cursors
import config

cons = [MySQLdb.connect(
    host=config.HOST, user=config.USER,
    passwd=config.PASS, db=config.MYDB,
    cursorclass=MySQLdb.cursors.SSCursor) for i in range(2)]
select_cur, update_cur = [con.cursor() for con in cons]
select_cur.execute(SELECT * FROM VERYLARGETABLE)
for i, row in enumerate(select_cur):
    print(row)
    if i % 100000 == 0 or timetoUpdateStatus:
        update_cur.execute(UPDATE STATUS)
Sign up to request clarification or add additional context in comments.

1 Comment

This is very nice, but you would need another connection for update queries, as use of SSCursor requires the whole resultset to be fetched before more queries can be executed.
4

Try splitting up the "select * from db" query into smaller chunks

index=0
while True:
    cursor.execute('select * from verylargetable LIMIT %s,%s', (index, index+10000))
    records = cursor.fetchall()
    if len(records)==0:
          break
    file.write(records)
    index+=10000
file.close()

Comments

2

Use the LIMIT statement in your big select:

limit = 0
step = 10000
query = "SELECT * FROM VERYLARGETABLE LIMIT %d, %d"
db = mysqldb.connect()
cursor = db.cursor()
while true:
    cursor.execute(query, (step, limit))
    for row in cursor.fetch_all():
        file.write(row)
    if timetoUpdateStatus:
        cursor.execute(update_query)
    limit += step

Code is not tested, but you should get the idea.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.