0

I am trying to import huge .csv file into the database. It is Drupal database, I am importing products into the node, content_type_ and uc_products tables. But this is not so relevant as due to the performance optimization I stopped using Drupal functions node_load and/or node_save and I insert/update records directly via SQL commands.

Scenario:

  1. Read whole csv file from local server (approx. 40MB) into the variable (to prevent accessing the file per each record)
  2. Parse it to array per line
  3. Parse every line to fields

  4. Check if the record in database exists (1 SQL select, compare particular field)

  5. If exists, update it (1 SQL update)
  6. If does not exists, insert it (3 inserts, 1 update)

    6.1. If record has image code and it is different from the one in the database, download (curl) the new image file from external server and save it at the local server.

  7. 1 additional delayed insert to the log table

When I run this script, i got 502 Proxy error (Reason: Error reading from remote server) after approx. 10k imported records and the page stops loading. However, the import continues (new records are still being added into tables). This stops after 20k-26k (tried several times) records.

Two questions: What's that Proxy error and why am I getting one? How to optimize my script even more? Is there some other, usual way to import huge amount of data into the MySQL?

Thanks a lot.

2
  • possible duplicate of Import tool Excel CSV Commented Jan 11, 2012 at 14:10
  • 1
    For just the SQL stuff, I had to do something similar 2 days ago - over 600,000 records from a CSV file. What I did was write a script to convert the CSV to INSERT statements (in your case, you could use conditional inserts either using a WHERE clause or EXISTS). I ended up with a 47m SQL text file that I then used with mysql -uuser -ppass dbname <file.sql. Commented Jan 11, 2012 at 14:17

2 Answers 2

0

I think that you have max_request_body_size limit on your squid proxy(or whatever). You should contact proxy administrator to allow bigger POST request body, or divide to smaller parts of CSVs and repeat step for each one.

Or maybe you can check this post : https://serverfault.com/questions/185894/proxy-error-502-reason-error-reading-from-remote-server-with-apache-2-2-3-de

Sign up to request clarification or add additional context in comments.

Comments

0

This sounds like a proxy server in between you and the php code (very possibly the reverse proxy on the server if that is your setup) is giving up on waiting for a response from your script.

While you can probably adjust those timeouts, personally I would try decoupling very long actions like this from the http lifecycle. Not a php developer so don't know what the common pho approaches are.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.