3

I have a PHP script that reads a remote CSV file and adds products to a database based on its contents. At present there are about 2800 lines (products), but the script keeps stopping at line 1388.

The code I used is as follows:

while(($data = fgetcsv($fopen, 0, ",")) !== false):
  //stuff is done here...
endwhile;

I have set the PHP memory limit to 64M and even tried 128M. I also set max_execution_time to 60 minutes. I have also tried altering the code as follows:

while(($data = fgetcsv($fopen, 1000, ",", '\r')) !== false):
  //stuff is done here...
endwhile;

That DID result in more lines being parsed, BUT the data was then incorrect, i.e. image columns were becoming description columns etc. I assume that has to do with adding \r as my line ending. I tried \n, no luck. Lastly, I also added the auto_detect_line_endings as true in the ini.

Can anyone suggest reasons as to why my data is being cut short?

Regards, Simon

EDIT

I have noticed something interesting. I have a MySQL insert on each line that is looped over in the above code. Now, the last record in my database is the FIRST row in the CSV file, does this mean the file is being parsed from the last line up??

These seem to be the rows at or near the break:

W-3066,  I Love Love Cheap And Chic,     Moschino, 3.4 oz,EDT Spray,Women,,"Introduced by the design house of Moschino, I love love has a blend of grapefruit, orange, lemon, red currant, tea rose, cinnamon leaves, musk, cedar and tonka wood. It is recommended for daytime wear.",http://www.perfume-worldwide.com/products/Women/Final/W-3066large.jpg,0,0,0,8011003991457
W-3070,  Adidas Floral Dream,            Adidas,   1.7 oz,EDT Spray,Women,,"Introduced in 2008, the notes are bergamot, lily, rose, tonka bean and vanilla.",http://www.perfume-worldwide.com/products/Women/Final/W-3070large.jpg,0,0,0,3412244310024
W-3071,  Adidas Fruity Rhythm,           Adidas,   1.7 oz,EDT Spray,Women,,"Introduced in 2008, the notes are black currant, raspberry, cyclamen, freesia and musk.",http://www.perfume-worldwide.com/products/Women/Final/W-3071large.jpg,0,0,0,3412244510004

SOLUTION

As it turns out, it worked out a lot better for me to copy the file to my server, and work off the copy. The steps I followed are as follows:

  • I read the contents of the remote file using file_get_contents()
  • I then used iconv() function to re-encode data to UTF-8
  • I made a temp file using fopen(), fwrite() and fclose() functions, contents of the file was the encoded data above
  • I set the permissions of the file to 0750 using the chmod() function
  • I then applied the fgetcsv() function to my temp file
  • Did all that needed to be done
  • Deleted the temp file once done, using unlink() function

That did the trick. So, I suspect half the issue was actually the remote server timing out, and the other half encoding issues.
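
The steps above can be sketched roughly as follows. This is only an illustration, not the exact code used: the function name importRemoteCsv, the $handleRow callback and the ISO-8859-1 source encoding are assumptions, since the original encoding of the feed is not stated.

```php
<?php
// Sketch only: the function name, the callback and the ISO-8859-1 default are assumptions.

function importRemoteCsv(string $url, callable $handleRow, string $sourceEncoding = 'ISO-8859-1'): int
{
    // 1. Fetch the whole remote file in one request.
    $raw = file_get_contents($url);
    if ($raw === false) {
        throw new RuntimeException("Could not read {$url}");
    }

    // 2. Re-encode to UTF-8 (iconv() returns false on failure).
    $utf8 = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert from {$sourceEncoding}");
    }

    // 3. Write the converted data to a temp file and set its permissions.
    $tmpPath = tempnam(sys_get_temp_dir(), 'csv');
    $out = fopen($tmpPath, 'w');
    fwrite($out, $utf8);
    fclose($out);
    chmod($tmpPath, 0750);

    // 4. Parse the local copy row by row, handing each row to the callback.
    $count = 0;
    $in = fopen($tmpPath, 'r');
    try {
        while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
            $handleRow($row);
            $count++;
        }
    } finally {
        // 5. Always close and delete the temp file, even if a row handler throws.
        fclose($in);
        unlink($tmpPath);
    }

    return $count;
}
```

A call might look like importRemoteCsv($url, 'insertProduct'), where insertProduct is a hypothetical function wrapping the MySQL insert for one row.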

Thank you to everyone for all the nudges in the right direction

5
  • It seems that your CSV can contain raw binary image data (into the "image columns" you are talking about)... do you confirm that ? Commented Apr 7, 2011 at 10:41
  • @Frosty - No, the image column is simply a code, e.g w-12345 Commented Apr 7, 2011 at 10:51
  • Regarding your edit, we would really need to see a sample of the CSV; we can't fix something if we don't know the cause. Commented Apr 7, 2011 at 11:03
  • Source code is at pastebin.com/fLngbWYu Commented Apr 7, 2011 at 11:29
  • There is no need for the [SOLVED] Prefix, if an answer is marked as correct it is distinguished as a different colour. Commented Apr 8, 2011 at 9:44

2 Answers

2

Firstly, I have some questions for you:

  • What is on lines 1387, 1388 and 1389?
  • Are any errors being output?
  • When you reach the final line, do you get ($data[0] === null)?

The memory limit is probably not what is causing the issue: fgetcsv reads a single line per iteration, so there is only ever one line's worth of data in memory at a time.

Within your loop, if you keep placing data into an array or concatenating it together, memory use will keep growing with every row, but you would have to show more in-depth code.
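
As a minimal sketch of the point about memory, the loop can hand each row on as soon as it is read instead of collecting rows in an array. The helper name readCsvRows is hypothetical, and the handle is assumed to be an already opened file.

```php
<?php
// Sketch: stream rows one at a time instead of collecting them in an array.
// readCsvRows() is a hypothetical helper, not part of PHP.

/**
 * Yield each CSV row from an open file handle.
 * Only the current row is held in memory at any point.
 */
function readCsvRows($handle): Generator
{
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        yield $row;
    }
}
```

It could then be used as foreach (readCsvRows($fopen) as $data) { ... }, doing the database insert inside the loop body and keeping nothing from previous rows.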

A CSV file has to be well structured for fgetcsv to parse it correctly. Some rules to remember when using CSV files:

  • The first line is usually the column names
  • All other lines are the data lines:
    • Each element should be separated by a ,
    • If an element contains a comma, a double quote or a line break ('\n', '\r', '\r\n'), it should be wrapped in double quotes

An example of a valid CSV File should be like so:

id, firstname, lastname, age, profile_description
0,  Robert,    Pitt,     22,  "this string has spaces, and has a comma"

You should validate that the structure is correct. If it is not, fix it until the parser is able to read the data correctly; you can then cleanly write the data into a new CSV file, taking care of all the little incorrect structures.
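
One way to validate the structure, as a rough sketch: compare the number of fields in every record against the header row and report the records that differ. The function name findMalformedRows is hypothetical, and the check assumes the first record is a header with the expected number of columns.

```php
<?php
// Sketch: report records whose field count differs from the header row.
// findMalformedRows() is a hypothetical helper; it assumes record 1 is the header.

/**
 * Return an array of problems keyed by record number.
 * Record numbers count parsed records, not physical lines,
 * because a quoted field may span several lines.
 */
function findMalformedRows($handle): array
{
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if ($header === false) {
        return [0 => 'file is empty or unreadable'];
    }

    $expected = count($header);
    $problems = [];
    $recordNo = 1;

    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $recordNo++;

        // fgetcsv() returns a single null field for a blank line.
        if ($row === [null]) {
            $problems[$recordNo] = 'blank line';
            continue;
        }

        if (count($row) !== $expected) {
            $problems[$recordNo] = sprintf('expected %d fields, got %d', $expected, count($row));
        }
    }

    return $problems;
}
```

Running this over the downloaded file and printing the result should point at the records worth inspecting by hand.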


9 Comments

@Robert - I will take a look at that line, as well as output the data of the last line to see if it is null, should it be? The problem is I have no control over the data, the CSV file is dynamically generated every night and populated from database records. I will take a look now and revert to you
If the array is a single null element, fgetcsv read a blank line rather than a data row; a row with too few or too many elements points to a syntax issue such as the ones described above. If you can, please supply the three lines stated so we can take a look at them.
@Robert - Three lines are above, I am waiting for the php script to finish running after which I will paste output of last parsed line. Thanks for all the help
Seems that the CSV file is fine to me; it parses 2.6K lines fine, so it must be something else causing the issue. Try turning errors on: ini_set("display_errors", "On");
@robert - Interestingly enough, when I opened the CSV file in OpenOffice.org and saved it as CSV again, it parsed all the lines, but the data is still erratic. I wonder if perhaps it's an encoding issue?
2

Is the file correctly formatted? Have you tried opening it in a CSV reader in which you can specify delimiters and line endings? Judging by this:

That DID result in more lines being parsed, BUT the data was then incorrect, i.e. image columns were becoming description columns etc

I would assume that the data may be corrupted (i.e. some description contained a comma, a line ending, etc.). That happens if the data is generated dynamically and not formatted correctly.

Open it in a text editor as well (e.g. Notepad++) and see how it looks.
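
Another possible cause of the column shift quoted above: the fourth argument of fgetcsv() is the enclosure character, not the line ending. The sketch below assumes that passing '\r' there effectively replaced the double quote with a backslash as the enclosure, so commas inside quoted descriptions were treated as field separators. The sample line is made up for illustration.

```php
<?php
// Sketch: how changing the enclosure character shifts columns.
// The sample line is invented; the backslash enclosure is an assumption
// about what the '\r' argument turned into.

$line = 'W-1,Name,"notes are bergamot, lily and rose",img.jpg';

// Default enclosure ("): the quoted description stays in one field.
$ok = str_getcsv($line, ',', '"', '\\');

// Enclosure set to a backslash: the double quotes lose their meaning,
// so the comma inside the description starts a new field.
$shifted = str_getcsv($line, ',', '\\', '\\');

printf("default enclosure: %d fields\n", count($ok));
printf("backslash enclosure: %d fields\n", count($shifted));
```

With the extra field, every column after the description moves one position to the right, which would match image columns ending up where other data was expected.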
