
I'm trying my best to learn PHP and hack things out myself. But this part has me stuck.

I have two CSV files with hundreds of rows each.

CSV 1 looks like this:

name, email, interest

CSV 2 looks like this:

email only

I'm trying to write a script to compare the two files looking for duplicates. I only want to keep the duplicates. But as you can see, CSV 2 only contains an email. If an email in CSV 1 DOES NOT EXIST in CSV 2, then the row containing that email in CSV 1 should be deleted.

The end result can either overwrite CSV 1 or create a fresh new file called "final.csv"... whatever is easiest.

I would be grateful for the help.

I tried something along these lines with no luck:

egrep -v $(cat csv2.csv | tr '\n' '|' | sed 's/.$//') csv1.csv

and

grep -v -f csv22.csv csv1.csv >output-file

cheers,

marc

  • I just added a couple of examples that I tried with no luck to my original post. Commented Jan 24, 2014 at 0:22
  • well that's not php unless you're running something like exec() ... Commented Jan 24, 2014 at 0:24
  • Being newer to this, I am not aware of the PHP equivalent to grep... but maybe that's not the best method in my case. I will be running this PHP script via cron on a daily basis. Commented Jan 24, 2014 at 0:25

2 Answers


Here is a script that loops through both files and writes to a third file every row of file1 whose email address also appears in file2.

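// Open the output file first; "w" overwrites any existing file3.csv.
if (($file3 = fopen("file3.csv", "w")) !== FALSE) {
  if (($file1 = fopen("file1.csv", "r")) !== FALSE) {
    // Walk through every row of file1 (name, email, interest).
    while (($file1Row = fgetcsv($file1)) !== FALSE) {
      if (!isset($file1Row[1])) continue; // skip blank or short rows
      // Re-read file2 from the top for each row of file1.
      if (($file2 = fopen("file2.csv", "r")) !== FALSE) {
        while (($file2Row = fgetcsv($file2)) !== FALSE) {
          // Compare emails case-insensitively, ignoring surrounding whitespace.
          if ( isset($file2Row[0]) && strtolower(trim($file2Row[0])) == strtolower(trim($file1Row[1])) ) {
            fputcsv($file3, $file1Row);
            break; // write each file1 row at most once
          }
        }
        fclose($file2);
      }
    }
    fclose($file1);
  }
  fclose($file3);
}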

A couple of notes:

  • You may need to pass additional arguments to fgetcsv, depending on how your CSV is structured (e.g. delimiter, enclosure character).
  • Based on how you listed the contents of each file, this code reads the 2nd column of file1 and the 1st column of file2. If that's not how they are actually positioned, change the number in the brackets for $file1Row[1] and $file2Row[0]. Column numbering starts at 0.
  • The script is currently set to overwrite file3.csv if it exists. If you want it to append instead, change the 2nd argument of the $file3 fopen from "w" to "a".
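The notes above can be sketched in a few lines. This is only an illustration, not part of the original script: the helper name `readColumn`, the semicolon delimiter, and the sample data are all assumptions made up for the example.

```php
<?php
// Hypothetical helper: return one column of a CSV file as a list of
// trimmed, lower-cased strings. Column numbering starts at 0.
function readColumn(string $path, int $column, string $delimiter = ',', string $enclosure = '"'): array
{
    $values = [];
    if (($handle = fopen($path, 'r')) === false) {
        return $values;
    }
    // fgetcsv arguments: handle, max line length (0 = no limit),
    // delimiter, enclosure, escape character.
    while (($row = fgetcsv($handle, 0, $delimiter, $enclosure, '\\')) !== false) {
        if (isset($row[$column])) { // skips blank lines and short rows
            $values[] = strtolower(trim($row[$column]));
        }
    }
    fclose($handle);
    return $values;
}

// Demo with a throwaway semicolon-delimited file (made-up data).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "john;John@Example.com;blah\nmary;mary@example.com;something\n");

// Read the 2nd column (index 1), telling fgetcsv about the delimiter.
$emails = readColumn($path, 1, ';');
print_r($emails);

// Mode "a" appends to the existing file; "w" would have emptied it first.
$out = fopen($path, 'a');
fputcsv($out, ['bob', 'bob@example.com', 'asdfsfd'], ';', '"', '\\');
fclose($out);

$contents = file_get_contents($path);
unlink($path);
```

Passing the delimiter, enclosure, and escape character explicitly keeps the behaviour the same regardless of the defaults in the PHP version you run.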

Example:

file1.csv:

john,john@example.com,blah
mary,mary@example.com,something
jane,jane@example.com,blarg
bob,bob@example.com,asdfsfd

file2.csv:

mary@example.com
bob@example.com

file3.csv (generated):

mary,mary@example.com,something
bob,bob@example.com,asdfsfd
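With hundreds of rows the nested loop is fast enough, but the same result can be produced without reopening file2 for every row of file1. The following is a sketch of that variation, not the method used in the script above: it reads file2 once into an array keyed by email, then makes a single pass over file1. The function name `keepMatchingRows`, its parameters, and the example.com addresses are made up for illustration.

```php
<?php
// Hypothetical helper: copy to $outPath every row of $rowsPath whose email
// (column $emailColumn) also appears in column 0 of $emailsPath.
// Returns the number of rows written.
function keepMatchingRows(string $rowsPath, string $emailsPath, string $outPath, int $emailColumn = 1): int
{
    // Build the lookup once: normalized email => true.
    $wanted = [];
    if (($in = fopen($emailsPath, 'r')) === false) {
        return 0;
    }
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if (isset($row[0]) && trim($row[0]) !== '') {
            $wanted[strtolower(trim($row[0]))] = true;
        }
    }
    fclose($in);

    if (($in = fopen($rowsPath, 'r')) === false) {
        return 0;
    }
    if (($out = fopen($outPath, 'w')) === false) {
        fclose($in);
        return 0;
    }

    // Single pass over the rows; isset() on the lookup is a hash check.
    $kept = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if (isset($row[$emailColumn])
            && isset($wanted[strtolower(trim($row[$emailColumn]))])) {
            fputcsv($out, $row, ',', '"', '\\');
            $kept++;
        }
    }
    fclose($in);
    fclose($out);
    return $kept;
}

// Demo on temporary files shaped like the example above (made-up data).
$file1 = tempnam(sys_get_temp_dir(), 'f1');
$file2 = tempnam(sys_get_temp_dir(), 'f2');
$file3 = tempnam(sys_get_temp_dir(), 'f3');
file_put_contents($file1, "john,john@example.com,blah\nmary,mary@example.com,something\n"
    . "jane,jane@example.com,blarg\nbob,bob@example.com,asdfsfd\n");
file_put_contents($file2, "Mary@example.com\nbob@example.com \n");

$kept = keepMatchingRows($file1, $file2, $file3);
$result = file_get_contents($file3);
echo $result;

array_map('unlink', [$file1, $file2, $file3]);
```

Because the lookup is keyed by email, a repeated address in file2 cannot cause a row of file1 to be written more than once.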

4 Comments

Wow, this is awesome... so generous. I will spend some time today testing it out and let you know. So awesome!
So far this isn't working. I even tried using two single column CSV files just in case the commas were throwing things off. What happens is that the output file3.csv is being written to but it's not writing anything. I can tell it's being written by looking at the modified date after each run. But no data. So the script appears to run, just not grabbing the data and adding it to file3... any thoughts?
@MarcB My only suspicion is that, as I pointed out, you will probably need to supply additional arguments to fgetcsv based on how your CSV files are actually formatted.
Would you be available as a paid consultant? I can't find your contact details. Much thanks.

Solved! The problem was with Mac line breaks. Look at the code below to see the additions at the beginning and end of the code to fix that problem. Thank you Crayon Violent for all of your help!

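// Let PHP recognize Mac (CR-only) line endings when reading the files.
ini_set('auto_detect_line_endings',TRUE);
if (($file3 = fopen("output.csv", "w")) !== FALSE) {
  if (($file1 = fopen("dirty.csv", "r")) !== FALSE) {
    while (($file1Row = fgetcsv($file1)) !== FALSE) {
      if (!isset($file1Row[1])) continue; // skip blank or short rows
      if (($file2 = fopen("clean.csv", "r")) !== FALSE) {
        while (($file2Row = fgetcsv($file2)) !== FALSE) {
          // Compare emails case-insensitively, ignoring surrounding whitespace.
          if ( isset($file2Row[0]) && strtolower(trim($file2Row[0])) == strtolower(trim($file1Row[1])) ) {
            fputcsv($file3, $file1Row);
            break; // write each row at most once
          }
        }
        fclose($file2);
      }
    }
    fclose($file1);
  }
  fclose($file3);
}
// Restore the setting (optional, since ini_set() only lasts for this run).
ini_set('auto_detect_line_endings',FALSE);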
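Note that the `auto_detect_line_endings` setting is deprecated as of PHP 8.1, so on newer versions a different approach may be needed. One possible sketch, assuming the files are small enough to read into memory and that no quoted field contains an embedded line break: convert CRLF and CR line endings to LF, then parse each line with `str_getcsv`. The helper name `readCsvAnyLineEnding` and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: read a CSV file into an array of rows, whatever
// line ending it uses (LF, CRLF, or classic Mac CR).
// Assumption: no quoted field contains a line break.
function readCsvAnyLineEnding(string $path): array
{
    $text = file_get_contents($path);
    if ($text === false) {
        return [];
    }
    // Replace CRLF first, then any remaining lone CR, with LF.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    $rows = [];
    foreach (explode("\n", $text) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines, including the one after the last newline
        }
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }
    return $rows;
}

// Demo: a throwaway file with CR-only line endings (made-up data).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "john,john@example.com,blah\rmary,mary@example.com,something\r");

$rows = readCsvAnyLineEnding($path);
print_r($rows);

unlink($path);
```

The rows returned here can be compared the same way as the rows that fgetcsv returns in the script above.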

5 Comments

aha, yeah that'd do it too. glad you sorted it!
FYI, you shouldn't need that last line. Setting something with ini_set() only affects the script that calls it, for the duration of that run. In other words, it doesn't make any permanent changes to your core php.ini file, nor does it affect any other scripts that happen to be running at the same time.
Thanks again... how would I contact you for paid work? I have other things coming up soon.
LoL well.. I do have a full time job and all that; haven't really freelanced in years. I just like helping out on sites like these to pay it forward and keep my knowledge current. Having said that.. I'm not opposed to at least evaluating offers that might come my way, and possibly taking them on if they are short and sweet enough. You can drop me a line at crayonviolent at phpfreaks.com or crayonviolent at gmail.com if you feel inclined.
cheers! have a great weekend and thank you again for your kindness.
