1

how do you parse a csv file into a mysql database with php, From bits I have read from php documents I am not sure how or if its possible to add conditions to the lines that you are reading into the database so that only values that meet the condition are passed into the database.

for instance I currently have a csv file that looks like this

0001D,5879
0001E,0521
0001F,4587
0001G,2220
0001H,2482
0001I,9087
0001J,2255
0001K,2247

the codes before the comma are supposed to only be hex values, what I would like to do is exclude any line (so hex value and its code) that dose not represent a hex code (0-9)-(A-F)... any idea on how the could be accomplished?

7
  • 1
    Why don't you just use something like sed to preprocess the file and then use MySQL's built-in functionality to load the resulting CSV? Commented Jun 27, 2011 at 3:06
  • You might find some help getting started from the answers in this thread, PHP not reading the entire CSV file. Commented Jun 27, 2011 at 3:20
  • hrmm, I think its related to an experience thing.. I'm still new to this stuff so obviously have come to this conclusion purely on a lack of experience... :( do you have an example or explanation of how I could attempt to achieve my goal with your process? Commented Jun 27, 2011 at 3:22
  • @C.Johns may I ask which version of PHP you're using? The 5.2.x branch or the 5.3.x branch? Commented Jun 27, 2011 at 3:31
  • I am using MAMP 10.6 which runs PHP 5.2.13 & 5.3.2.. according to the website mamp.info/en/mamp/index.html Commented Jun 27, 2011 at 3:40

2 Answers 2

0

You're far better off doing this without PHP if possible. Use mysqlimport to import the file, and then use:

-- assuming you use "ID" as your first column like the example 
-- on the mysqlimport page
DELETE FROM <tablename> WHERE CONV(CONV(ID, 16, 10), 10, 16) != 
                              TRIM(LEADING '0' FROM ID);

So, what's going on?

  1. First you've imported everything into a table.
  2. Then, you've looked through all of the columns which are supposed to be hex. If they cannot be converted to Hex and back (something which is only true of valid Hex numbers, everything else is truncated) and match the trimmed version of themselves (leading 0's need to be accounted for), they are not valid.

If you must use PHP, I would personally simply continually use MySQL's INSERT and I'd still call the above DELETE. I'll bet money that it's faster.

Sign up to request clarification or add additional context in comments.

4 Comments

okay cool, thanks for your suggestion, this was my original train of thought also however being open to suggestion I had my mind changed but with all the hassle I am going through trying to figure this out its starting to become more apparent that the way you have described to do things feels a little more nature and to a degree logical.
Just a quick update, this did't work, it deleted alot of numbers that wernt hex but also ones that were hex... however I do think there is a way to achieve the required outcome through this convention... just need to conform more to the type of codes I have in my database...... they range from 00001-10C1R, its a massive database which makes use of the whole alphabet where I only want to include numbers that conform to hex 0-9 a-f... I will keep looking.
well it kinda works. but drops a ton of hex codes off the list.. need to review this in-depth. I think the TRIM(LEADING '0'... is doing something I need to fix. almost there however.
okay so they both do the same thing... I guess its personal preference however this is easier as it sorts everything out in the database there is no fluffing around conforming everything in php and passing it over. again thanks very much.
0

This code uses the fgetcsv function to parse out one line at a time from a csv file, putting each delimited field into an array, and returning that array. With your example csv, the array returned will only contain two fields.

Next, after checking to make sure the return value from the fgetcsv isn't false or null, the code checks to see if the first field of the array contains any non-hex characters in the range [G-Z] using preg_match. If it doesn't, then it will put the two fields of the array (representing a line from you csv file) into a new array, which you can operate on later.

<?php
$fileName = "test.csv";
$hexLines = array();

if (($file = fopen($fileName, "r")) !== false)
{
    while (($line = fgetcsv($file, 0, ",")) !== false)
    {
        if (!empty($line))
        {
            // IF first field only has hex chars
            if (preg_match('/^[0-9A-F]+$/', $line[0]) === 1) 
            {
                $hexLines[] = array($line[0], $line[1]);
            }
        }
    }
    fclose($file);
}
print_r($hexLines);

By the way, the above code will work in both PHP versions 5.2.x and 5.3.x. If you were 100% sure that you were going to be using 5.3, I would've written code using SplFileObject instead, since it's object-oriented, and maybe a little easier to use than the above functions.

6 Comments

okay cool thanks man, give me abit of time before I accept your answer as I want to run through and understand exactly whats going on. Thanks for your reply.
I take issue with this line: if (preg_match('/[G-Z]+/', $line[0]) != 1). You should be more aggressive with your checks and whitelist instead of blacklist. You're better with a test for 0-9A-F.
@C.Johns you might want to check out cwallen's answer too, he might be right.
@cwallen you have to forgive me, I'm a little slow, I don't understand right away why what you suggest is better, is that something you can explain to me?
@cwallen never mind, I can see now how what you suggest is more safe, I'll update my answer to reflect that.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.