
I have a big data set in MySQL (users, companies, contacts), about 1 million records.

Now I need to import new users, companies, and contacts from an import file (CSV) with about 100,000 records. Each record in the file has the info for all three entities (user, company, contact). Moreover, on production I can't use LOAD DATA (I just don't have the rights :( ).

So there are three steps which should be applied to that data set (a rough sketch of them is shown below):

  • compare with existing DB data
  • update it (if we find something in the previous step)
  • insert new records
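For illustration only: a minimal sketch of how the compare/update/insert steps can collapse into a single statement per table, assuming each target table has a suitable unique key. The table users, its columns, $pdo and $row are assumptions here, not the real schema:

    <?php
    // Hypothetical sketch: upsert one CSV row into `users`,
    // assuming a PDO connection $pdo and a UNIQUE key on users.email.
    $stmt = $pdo->prepare(
        'INSERT INTO users (email, name)
         VALUES (:email, :name)
         ON DUPLICATE KEY UPDATE name = VALUES(name)'
    );
    $stmt->execute([
        ':email' => $row['email'],
        ':name'  => $row['name'],
    ]);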

I'm using PHP on the server for this. I can see two approaches:

  • reading ALL data from the file at once and then working with this BIG array, applying those steps to it
  • or reading the file line by line and passing each line through the steps

Which approach is more efficient in terms of CPU, memory, or time usage?

Can I use transactions, or will they slow down the whole production system?

Thanks.

2 Comments
  • I don't think you need to find the most efficient method for doing this. For 100K records, it will take at most 20 - 30 seconds and you probably won't need to insert those records again... Commented May 14, 2012 at 8:42
  • Are you kidding? I have implemented the 1st approach and it takes so much time, I'm sure you can't imagine how long it has been running ;) Commented May 14, 2012 at 10:52

3 Answers


In terms of CPU time and overall time there won't be much in it, although reading the whole file will be slightly faster. However, for such a large data set, the additional memory required to read all records into memory will vastly outstrip the time advantage - I would definitely process one line at a time.
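A minimal sketch of what processing one line at a time could look like with fgetcsv(), assuming a PDO connection $pdo and a hypothetical processRow() that applies the compare/update/insert steps (the file name and column handling are assumptions):

    <?php
    // Sketch: stream the CSV one row at a time instead of loading it all into memory.
    $handle = fopen('import.csv', 'r');
    if ($handle === false) {
        throw new RuntimeException('Cannot open import file');
    }

    $header = fgetcsv($handle);              // first line is assumed to be a header
    while (($fields = fgetcsv($handle)) !== false) {
        $row = array_combine($header, $fields);
        processRow($pdo, $row);              // compare with DB, then update or insert
    }
    fclose($handle);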


3 Comments

Agreed. And use transactions if atomicity is required.
but in this case the transaction should be started before and finished after the line is processed, right?
@user1016265 Depends on what you are doing. If certain rows refer to other rows within the same data set, you probably want to wrap all lines in a single transaction, or at least group rows that refer to each other in a single transaction (you would probably need at least a two-pass approach for this). If there are no references to the same table and no circular foreign keys, one transaction per row would probably be acceptable.
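A rough sketch of one middle ground discussed above, grouping lines into batched transactions (the batch size, $pdo and processRow() are assumptions, and this is not the only valid grouping):

    <?php
    // Sketch: commit in batches so a failure only rolls back the current batch
    // and locks stay short on the production system.
    $batchSize = 500;                        // assumption: tune for your workload
    $handle = fopen('import.csv', 'r');
    $header = fgetcsv($handle);

    $count = 0;
    $pdo->beginTransaction();
    try {
        while (($fields = fgetcsv($handle)) !== false) {
            processRow($pdo, array_combine($header, $fields));
            if (++$count % $batchSize === 0) {
                $pdo->commit();              // release locks, then start a new batch
                $pdo->beginTransaction();
            }
        }
        $pdo->commit();
    } catch (Exception $e) {
        $pdo->rollBack();                    // only the current batch is lost
        throw $e;
    }
    fclose($handle);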

Did you know that phpMyAdmin has a nifty feature of "resumable import" for big SQL files?

Just check "Allow interrupt of import" in the Partial Import section. And voila, phpMyAdmin will stop and loop until all requests are executed.

It may be more efficient to just "use the tool" rather than "reinvent the wheel".

3 Comments

How can I import something into three different tables from one single import file with the help of phpMyAdmin?
@user1016265 phpMyAdmin will try to create tables and even a database, but it cannot decide where the table 'users' ends and where the table 'companies' begins. See FAQ 3.18: "When I import a CSV file that contains multiple tables, they are lumped together into a single table."
I know that, but your solution can't work in my case. Thank you

I think the 2nd approach is more acceptable (a rough sketch follows the list):

  1. Create a change list (it would be a separate table)
  2. Make the updates line by line (and mark each line as updated using an "updflag" field, for example)
  3. Perform this process in the background using transactions.
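A minimal sketch of the change-list idea; the table name import_changes, its columns, and the updflag values are illustrations only, not a prescribed schema:

    <?php
    // Sketch: stage the CSV into a separate table, then apply and mark rows.
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS import_changes (
            id      INT AUTO_INCREMENT PRIMARY KEY,
            email   VARCHAR(255) NOT NULL,
            name    VARCHAR(255),
            company VARCHAR(255),
            contact VARCHAR(255),
            updflag TINYINT NOT NULL DEFAULT 0    -- 0 = pending, 1 = applied
        )'
    );

    // A background job can then pick up pending rows in small transactions:
    $pending = $pdo->query(
        'SELECT id, email, name FROM import_changes WHERE updflag = 0 LIMIT 100'
    );
    foreach ($pending as $change) {
        $pdo->beginTransaction();
        // ... apply the compare/update/insert steps for $change here ...
        $pdo->prepare('UPDATE import_changes SET updflag = 1 WHERE id = :id')
            ->execute([':id' => $change['id']]);
        $pdo->commit();
    }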

