I have a text file, containing around 662,000 lines. I want to move each line in this text file into my database using sqlite3. Each line has two components, a key and a company name. Company name goes in one column, and a key in another.

Code below:

def input_txt_to_db():
    with open(our_txt_file) as f:
        for line in f:
            # Format each line 
            curr_comp_name = str(line.rsplit(':')[0])
            curr_comp_key = str(line.rsplit(':')[-2])
            # Create object of the line
            curr_comp = Company(curr_comp_name, curr_comp_key)
            # Insert company is a self-made method, listed below
            insert_company(curr_comp)

def insert_company(comp):
    """

    :param comp: Company (object)
    :return: None
    """
    with conn:
        conn_cursor.execute("INSERT INTO companies VALUES "
                            "(:name, :key)",
                            {'name': comp.name,
                             'key': comp.key
                             })

This all works: I've checked the db and the rows it gets through are inserted properly. However, once it reaches roughly 60k lines, it crashes with something like an OS error. Note that I have more than enough space for this db.

1 Answer

Inserting one row per transaction is not the most efficient way to load this much data anyway. How about inserting it in batches with executemany?

def insert_companies(comps):
    with conn:
        conn_cursor.executemany("INSERT INTO companies VALUES (?, ?)", comps)

The main function needs a small rewrite. We can drop the Company objects, since executemany works fine with plain tuples.

def input_txt_to_db():
    with open(our_txt_file) as f:
        batch = list()
        # How many companies do we dump to db at once?
        batch_size = 2000
        for line in f:
            # Format each line
            curr_comp_name = str(line.rsplit(':')[0])  # str() is redundant: rsplit already returns strings
            curr_comp_key = str(line.rsplit(':')[-2])
            # Queue the row as a (name, key) tuple
            batch.append((curr_comp_name, curr_comp_key))
            # Flush the batch to the db once it is full
            if len(batch) == batch_size:
                insert_companies(batch)
                batch = list()
        # something may be still pending
        if batch:
            insert_companies(batch)

Try it, it should work. If you give some more information on the error that occurs, it might help as well, because now there isn't enough context to definitively answer your question.
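
For reference, here is a self-contained sketch of the same batching idea that can be run without the original file. It makes several assumptions that are not in the question: each line looks like "name:key", the table is companies(name TEXT, key TEXT), and an in-memory database stands in for the real one. The helpers parse_line, batched and load_lines are hypothetical names used only for illustration.

```python
import sqlite3
from itertools import islice


def parse_line(line):
    # Assumption: each line looks like "name:key"; split on the last colon.
    name, _, key = line.rstrip("\n").rpartition(":")
    return name, key


def batched(iterable, size):
    # Yield lists of at most `size` items until the iterable is exhausted.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_lines(conn, lines, batch_size=2000):
    # Insert the parsed rows batch by batch; returns the number of rows inserted.
    total = 0
    rows = (parse_line(line) for line in lines if line.strip())
    for chunk in batched(rows, batch_size):
        with conn:  # one transaction (and one commit) per batch
            conn.executemany("INSERT INTO companies VALUES (?, ?)", chunk)
        total += len(chunk)
    return total


# Assumed schema and sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, key TEXT)")
sample = ["Acme Corp:A1\n", "Globex:G2\n", "Initech:I3\n"]
inserted = load_lines(conn, sample, batch_size=2)
```

With a real file, the list of sample lines would be replaced by the open file object, since load_lines only needs something it can iterate over line by line.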

2 Comments

Thank you, this is something I was looking for. I'll try it tomorrow and get back to you.
This worked perfectly. Batching is exactly what I was looking for, thank you again!
