
I have tried terminating fields/lines with various long strings, but every time some entries end up corrupted.
The BLOB is a numpy array of integers saved as uint32 (numpy.array(intlist, dtype='uint32').tostring()).
On unload (numpy.fromstring(blob, dtype='uint32')), numpy reports a "string size must be a multiple of element size" error.
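For reference, the round trip itself is fine on intact data; the error appears as soon as the blob gains or loses even one byte, which is exactly what a delimiter collision during LOAD DATA does. A minimal sketch (values are illustrative):

import numpy

blob = numpy.array([1, 2, 3], dtype='uint32').tostring()  # 12 raw bytes
back = numpy.fromstring(blob, dtype='uint32')             # array([1, 2, 3], dtype=uint32)

# drop a single byte, as a mangled row would:
numpy.fromstring(blob[:-1], dtype='uint32')
# ValueError: string size must be a multiple of element size

The import command that produces the corrupted entries: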

mysql -p somedb -e "LOAD DATA LOCAL INFILE '/tmp/tmpBZRvWK' INTO TABLE hash2protids FIELDS TERMINATED BY '....|....' LINES TERMINATED BY '....|....\n'"
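One way to keep LOAD DATA while sidestepping the delimiter problem entirely would be to hex-encode the blob in the dump file, so every field consists of hex digits only and can never collide with a terminator, and let MySQL decode it server-side with UNHEX(). An untested sketch (the file name is illustrative):

import binascii
import numpy

with open('/tmp/dump.tsv', 'w') as out:
    for (hash, protids) in GENERATOR:
        pblob = numpy.array(protids, dtype='uint32').tostring()
        # hex digits only, so no field/line terminator can appear in the data
        out.write("%s\t%s\n" % (hash, binascii.hexlify(pblob)))

and then:

mysql -p somedb -e "LOAD DATA LOCAL INFILE '/tmp/dump.tsv' INTO TABLE hash2protids (hash, @hex) SET protids = UNHEX(@hex)"

Hex-encoding doubles the size of the dump file, but the default tab/newline terminators are then guaranteed safe.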

EDIT
I've tried INSERT INTO hash2protids (hash, protids) VALUES ('key', LOAD_FILE('/tmp/tmpfile')); (see "Insert file into mysql Blob"), but LOAD_FILE works only with files located on the server. In addition, I have 3M rows to insert, so it would be far too slow anyway...

EDIT2
For now I'm relying on MySQLdb's executemany, as it:

  • is much faster than inserting rows one at a time
  • can handle any datatype, including binary blobs

The code looks something like this:

import numpy
import MySQLdb

# connection parameters are placeholders
db = MySQLdb.connect(host="localhost", user="user", passwd="pass", db="somedb")
cur = db.cursor()

step = 1000
data = []
cmd = "INSERT INTO sometable (hash, protids) VALUES (%s, %s)"
for (hash, protids) in GENERATOR:
    # serialise the integer list as raw uint32 bytes for the BLOB column
    pblob = numpy.array(protids, dtype='uint32').tostring()
    data.append((hash, pblob))
    if len(data) >= step:  # flush in batches of 1000 rows
        cur.executemany(cmd, data)
        data = []
if data:  # insert the final, partial batch
    cur.executemany(cmd, data)
db.commit()

Note that I flush cur.executemany every 1000 rows, as I was running into MySQL errors caused by the 'max_allowed_packet' limit when all entries were executed at once.
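For completeness: the limit itself can be inspected with SHOW VARIABLES LIKE 'max_allowed_packet', and reading the blobs back works with the same dtype. A sketch (somehash is a placeholder):

cur.execute("SELECT protids FROM sometable WHERE hash = %s", (somehash,))
blob = cur.fetchone()[0]
protids = numpy.fromstring(blob, dtype='uint32')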
