I know this has been touched on several times, but I cannot seem to get it working. I am writing a Python program that takes an sqlite3 database dump file, analyses it, and recreates the database using a database migration tool called yoyo-migrations.

I am running into an issue with blob data in sqlite3 and how to correctly format it.

Here is a basic explanation of how my program executes:

- read in the dump file and separate it into CREATE statements, INSERT statements and other
- generate migration files for the CREATEs
- generate a migration file for each table's inserts
- run the migrations to rebuild the database (except now it is built from migrations)

Basically, I was given a database and need to get it under control using migrations. This is just the first step (getting the thing rebuilt using the migration tool).

Here is the table creation of the blob table:

CREATE TABLE blob_table(
    blockid INTEGER PRIMARY KEY,
    block blob
)

I then create the migration file:

#
# file: migrations/0001.create_table.py
# Migration to build tables (autogenerated by parse_dump.py)
#

from yoyo import step
step('CREATE TABLE blob_table( blockid INTEGER PRIMARY KEY, block blob);')

Note that I just write that to a file, and then at the end run the migrations. Next I need to write a "seed" migration that inserts the data. This is where I run into trouble!

# here is an example insert line from the dump
INSERT INTO blob_table VALUES(765,X'00063030F180800FE1C');

So the X'' stuff is the blob data, and I need to write a Python file which INSERTs this data back into the table. I have a large amount of data, so I am using the executemany syntax. Here is what the seed migration file looks like (an example):

#
# file: migrations/0011.seed_blob_table.py
# Insert seed data for blob table
#

from yoyo import step
import sqlite3

def do_step(conn):
    rows = [
        (765,sqlite3.Binary('00063030303031340494100')),
        (766,sqlite3.Binary('00063030303331341FC5150')),
        (767,sqlite3.Binary('00063030303838381FC0210'))
    ]
    cursor = conn.cursor()
    cursor.executemany('INSERT INTO blob_table VALUES (?,?)', rows)

# run the insert
step(do_step)

I have tried using sqlite3.Binary(), the Python built-in buffer(), combinations of the two, as well as int('string', base=16), hex() and many others. No matter what I do, the result does not match the database from the dump. What I mean is:

If I open up the new and old databases side by side and execute this query:

# in the new database, it comes out as a string
SELECT * FROM blob_table WHERE blockid=765;
> 765|00063030303031340494100

# in the old database, it displays nothing
SELECT * FROM blob_table WHERE blockid=765;
> 765|

# if I do this in the old one, I get the x'' from the dump
SELECT blockid, quote(block) FROM blob_table WHERE blockid=765;
765|X'00063030303031340494100'

# if I use quote() in the new database I get something different
SELECT blockid, quote(block) FROM blob_table WHERE blockid=765;
765|X'303030363330333033303330... (truncated; this is longer than the original and only has digits 0-9)

My end goal is to rebuild the database and have it be identical to the starting one (from which the dump was made). Any tips on getting the blob stuff to work are much appreciated!

1 Answer

The buffer class is capable of handling binary data. However, it takes care to preserve the data you give to it, and '00063030303031340494100' is not binary data; it is a string that contains the digits zero, zero, zero, six, etc.

To construct a string containing binary data, use decode:

import codecs
blob = buffer(codecs.decode(b'00063030303031340494100', 'hex_codec'))

2 Comments

Works perfectly. Thank you so much!
buffer has been replaced with memoryview in Python 3.
