I have a CSV file with headers in which some rows have extra fields. This happens because a text field contained a comma that was not escaped.

Is there a way to remove such a row before converting the file into an array?

Sample csv file:

CUST_NUMBER,PO_NUMBER,NAME,SERVICE,DATE,BOX_NUMBER,TRACK_NO,ORDER_NO,INV_NO,INV_AMOUNT
757626003,7383281,JACK SMITH,GND,20180306,1,1Z1370750453578430,2018168325,119348,70.70
757626003,7383282,GERALD SMITH, JR.,GND,20180306,1,1Z9R67670395033411,2018168326,119513,63.72
757626003,7383233,SCOTT R SMITH,GND,20180306,1,1Z1370750982624042,2018168329,119349,39.33

As you can see, row 3 has an extra field because GERALD SMITH, JR. contains an unescaped comma. This puts the JR. part of the name in the SERVICE column and pushes GND, along with every field after it, one column to the right, so the last value lands in a column without a heading.

I want to remove the entire row when it has more fields than there are headers.

After the row is removed, I will convert the remaining CSV into an array with something like this:

<?php
    $csv = array_map("str_getcsv", file("FILE.CSV",FILE_SKIP_EMPTY_LINES));

    $keys = array_shift($csv);

    foreach ($csv as $i => $row) {
        if(count($keys) == count($row)){
            $csv[$i] = array_combine($keys, $row);
        }
    }
?>
  • Just unset($csv[$i]); when you find a bad row. Commented Mar 7, 2018 at 7:19
  • I tried that. It does not seem to work with the rest of my code. Commented Mar 7, 2018 at 7:20
  • That makes no sense. Do you need to re-index the array? $csv = array_values($csv); Commented Mar 7, 2018 at 7:21
  • I don't think I need to re-index. Show me how you would go about un-setting, because when I tried it, it did not work. I could not figure out how to unset the bad row. Commented Mar 7, 2018 at 7:25
  • @Scuzzy you are right. Adding the else part to the if statement with an unset on the $csv[$i] worked. The rest of my code didn't work because I needed to add another if statement somewhere else to run only if $csv[$i] is not empty. Commented Mar 7, 2018 at 7:39

3 Answers

As suggested by @Scuzzy, unset the bad row in an else branch:

<?php
    $csv = array_map("str_getcsv", file("FILE.CSV",FILE_SKIP_EMPTY_LINES));

    $keys = array_shift($csv);

    foreach ($csv as $i => $row) {
        if(count($keys) == count($row)){
            $csv[$i] = array_combine($keys, $row);
        } else {
            unset($csv[$i]); // drop rows whose field count differs from the header
        }
    }
?>
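One thing to watch for after unset(): the surviving rows keep their original numeric keys, so the array has gaps, which may be why later code needed an extra check. A minimal sketch of that effect follows. It uses inline sample lines instead of FILE.CSV, and the $lines and $keysAfterUnset names are illustrative assumptions, not part of the original script:

```php
<?php
// Illustrative data only; a real script would read these lines from the file.
$lines = [
    "NUMBER,NAME,SERVICE",
    "7375536,Ron,GND",
    "7383287,Gilbert, JR.,GND",   // unescaped comma: parses to 4 fields
    "7383236,SCOTT,GND",
];

$csv  = array_map("str_getcsv", $lines);
$keys = array_shift($csv);

foreach ($csv as $i => $row) {
    if (count($keys) == count($row)) {
        $csv[$i] = array_combine($keys, $row);
    } else {
        unset($csv[$i]);          // drop the malformed row
    }
}

// The numeric keys now skip the removed row.
$keysAfterUnset = array_keys($csv);

// Re-index so later code can rely on consecutive keys starting at 0.
$csv = array_values($csv);

print_r($keysAfterUnset);
print_r($csv);
```

If the code that follows iterates with foreach, the gaps are harmless; re-indexing with array_values() only matters when something addresses rows by position.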
3 Comments

Yes that is what I did.
This removes the row. I have tested it by printing the array.
Yes it works. I just have other code below it that didn't work right without an if statement when the row is unset.
<?php

$data=<<<DATA
NUMBER,NAME,SERVICE
7375536,Ron,GND
7369530,RANDY,GND
7383287,Gilbert, JR.,GND
7383236,SCOTT,GND
DATA;

$data = array_map('str_getcsv', explode("\n", $data));
$keys = array_shift($data);
// Keep only rows with as many fields as the header (3 in this sample).
$data = array_filter($data, function($v) use ($keys) {
    return count($v) == count($keys);
});

var_export($data);

Output:

array (
0 => 
array (
    0 => '7375536',
    1 => 'Ron',
    2 => 'GND',
),
1 => 
array (
    0 => '7369530',
    1 => 'RANDY',
    2 => 'GND',
),
3 => 
array (
    0 => '7383236',
    1 => 'SCOTT',
    2 => 'GND',
),
)

To use the column headings as keys:

$data = array_map(function($v) use ($keys) {
    return array_combine($keys, $v);
}, $data);

4 Comments

How would you edit your code so that the column headings are the keys in the array?
@Mike, you can go through each value and add the headings with array_combine. Added example above.
CUST_NUMBER,PO_NUMBER,NAME,SERVICE,DATE,BOX_NUMBER,TRACK_NO,ORDER_NO,INV_NO,INV_AMOUNT
757626003,7383281,JACK SMITH,GND,20180306,1,1Z1370750453578430,2018168325,119348,70.70
757626003,7383282,GERALD SMITH, JR.,GND,20180306,1,1Z9R67670395033411,2018168326,119513,63.72
757626003,7383233,SCOTT R SMITH,GND,20180306,1,1Z1370750982624042,2018168329,119349,39.33
In the above sample CSV, how could the row actually be fixed instead of removed? Sometimes names have commas in them, e.g. GERALD SMITH, JR. Based on my comment at stackoverflow.com/questions/49146116/…, is it possible to fix it?

array_filter lets you remove unwanted items with a callback. This version tests each row against the $keys array (the same test you use), which is passed into the callback with a use clause:

$csv = array_map("str_getcsv", file("books.csv",FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);

$output = array_filter($csv, function($row) use ($keys) {
    return count($row) == count($keys);
});
$output = array_values($output);
print_r($output);

So each row that doesn't have the same number of columns as the header is removed.

The array_values() call re-indexes the array afterwards.

If the file could be generated with quotes around the text fields, this problem would not arise:

NUMBER,NAME,SERVICE
7375536,Ron,GND
7369530,RANDY,GND
7383287,"Gilbert, JR.",GND
7383236,SCOTT,GND

Surrounding every text field with an enclosure character (a double quote is the default for str_getcsv) makes sure this isn't a problem in the future.
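To illustrate the difference quoting makes, here is a small sketch that parses the same row twice with str_getcsv, once unquoted and once with the name enclosed in double quotes. The variable names are only for illustration:

```php
<?php
// The same row, first with the comma unquoted, then with the field quoted.
$unquoted = '7383287,Gilbert, JR.,GND';
$quoted   = '7383287,"Gilbert, JR.",GND';

$a = str_getcsv($unquoted);  // the name is split across two fields
$b = str_getcsv($quoted);    // the comma stays inside the name field

echo count($a), " fields without quotes\n";
echo count($b), " fields with quotes\n";
var_export($b);
```

Only the quoted version yields three fields, so it passes the count($row) == count($keys) test without any special handling.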

Alternative, which builds a new array keyed by the column headings:

$csv = array_map("str_getcsv", file("FILE.CSV",FILE_SKIP_EMPTY_LINES));

$keys = array_shift($csv);
$out = array();
foreach ($csv as $row) {
    if(count($keys) == count($row)){
        $out[] = array_combine($keys, $row);
    }
}

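Last update: I also tried the following, which attempts to fix the data rather than discard it, so you get all the rows out of the file:

// Assumes the 10-column sample header and that stray commas occur only in NAME (index 2).
// Reuses $csv and $keys from the code above.
$out = array();
foreach ($csv as $row) {
    if(count($keys) != count($row)){
        // Keep the 2 fields before NAME, re-join the split NAME pieces,
        // then keep the 7 fields that follow NAME.
        $row = array_merge(array_slice($row, 0, 2),
                [implode(",", array_slice($row, 2, count($row)-9))],
                array_slice($row, count($row)-7));
    }
    $out[] = array_combine($keys, $row);
}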
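The numbers 9 and 7 above are tied to the 10-column sample, so a file with a different header would still reach array_combine with mismatched counts. Below is a hedged sketch of a more general version. repairRow is a hypothetical helper, not part of the code above, and it assumes you know which single column can contain unescaped commas:

```php
<?php
/**
 * Hypothetical helper: fold surplus fields back into one text column.
 * Assumes $textIndex is the only column that can contain unescaped commas.
 */
function repairRow(array $row, int $expected, int $textIndex): ?array
{
    $extra = count($row) - $expected;
    if ($extra === 0) {
        return $row;   // already well-formed
    }
    if ($extra < 0) {
        return null;   // too few fields: cannot be repaired this way
    }
    // The text column spans its own field plus every surplus field.
    $text = implode(",", array_slice($row, $textIndex, $extra + 1));

    return array_merge(
        array_slice($row, 0, $textIndex),
        [$text],
        array_slice($row, $textIndex + $extra + 1)
    );
}

$keys = str_getcsv("CUST_NUMBER,PO_NUMBER,NAME,SERVICE,DATE,BOX_NUMBER,TRACK_NO,ORDER_NO,INV_NO,INV_AMOUNT");
$row  = str_getcsv("757626003,7383282,GERALD SMITH, JR.,GND,20180306,1,1Z9R67670395033411,2018168326,119513,63.72");

$fixed  = repairRow($row, count($keys), array_search("NAME", $keys));
$record = $fixed === null ? null : array_combine($keys, $fixed);

var_export($record);
```

If commas can appear in more than one column, there is no reliable way to tell which field was split, and dropping or logging the row is probably the safer choice.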

9 Comments

I don't create the CSV, so I have no control over escaping the commas in the text fields. Your solution is good, but it does not add the column headings as the keys in the array.
It's difficult when you have to process data this way. I've added an alternative which does index the elements by the keys; it just adds them to a new array rather than maintaining the old one.
Yes, thank you. Your alternative is pretty much what I was already using.
Just a last update: the last version of the code should fix the data rather than ignore it.
Good try on the last update, but this is what it output for the row that had the extra comma: 100 => false. It also gave this warning at the top: Warning: array_combine(): Both parameters should have an equal number of elements. I don't see how it is possible to fix the broken row when I will never know when there will be a broken row, or which field the extra comma or commas will be in.