csv file filtering

Question

I have a .csv file with a header row, like so:

headerA,headerB,headerC
bill,jones,p
mike,smith,f
sally,silly,p

I'd like to filter out any records with the f value in the headerC column.

Can I do that with sed or awk?

Note that CSV may contain embedded line breaks, so any pure line-based solution might do wrong things with certain inputs. Furthermore, quoted values may pose problems with plenty of naïve approaches to the problem.
– Joey, Commented Jun 22, 2011 at 12:34
@Joey, right. Usual recommendation is to use a language with a dedicated CSV library, such as Perl.
– glenn jackman, Commented Jun 22, 2011 at 12:39
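
To illustrate the concern raised in these comments, here is a small sketch (the sample rows are made up for illustration, not taken from the question). A comma inside a quoted value shifts awk's field numbering, so the record that should be dropped slips through a plain -F, filter:

```shell
# Hypothetical sample: the first field of the second record is quoted
# and contains a comma.
sample='headerA,headerB,headerC
"jones, bill",x,f
mike,smith,p'

# Naive split on every comma: for the quoted record, $3 is "x" rather
# than "f", so the record is kept even though headerC is f.
naive_output=$(printf '%s\n' "$sample" | awk -F, '$3 != "f"')
printf '%s\n' "$naive_output"
```

All three lines are printed unchanged, which is why the answers below are only safe for simple CSV without quoted commas or embedded line breaks.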

Zsolt Botykai · Accepted Answer · 2011-06-23 06:29:46Z

8

If the name of the third column in the header is not exactly f:

sed '/,f$/d' FILE

will do (it deletes every input line that ends with ,f).

If it is, I'd go with:

sed -n -e '1p;/,[^f]$/p' FILE

(It prints nothing by default (-n), but always prints the 1st line (1p), and prints any line that ends with a comma followed by a single character other than f. Note: this will not work if the 3rd column contains more than one character.)

And an awk one:

awk -F, 'NR == 1 ; NR > 1 && $3 != "f"' FILE

(This always prints the first line: NR == 1 is true there, so the default action, print $0, runs. The second condition checks that we are past the 1st line and that the 3rd field is not f; if so, the default action prints the line.)

HTH

edited Jun 23, 2011 at 6:29

answered Jun 22, 2011 at 12:30


4 Comments

glenn jackman Over a year ago

Your second sed solution will break if the 3rd column contains >1 char. Better to stick with the 1st sed or awk as it implements the requirements more precisely (delete line if "f")

Zsolt Botykai Over a year ago

According to the "specification": "I'd like to filter out any records with the f value in the headerC column." So it's correct IMO.

glenn jackman Over a year ago

If the 3rd column contains "ab", that does not match /,[^f]$/ so it will be filtered.

Zsolt Botykai Over a year ago

You were right, @glennjackman: if the 3rd column is longer than 1 char, the line will not be printed. Updating the description.

Michael Lowman · Accepted Answer · 2011-06-22 12:27:38Z

3

Well, if you know that headerC is always in the third column, the following sed command would work:

sed -r '/^[^,]*,[^,]*,f(,|$)/d' < file.csv > filefiltered.csv

And the following awk command does the same:

awk 'BEGIN {FS=","} {if($3 != "f") print}' file.csv

If you don't know headerC is always in a particular column it gets a little more tricky. Does this work?
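
For the case where the column position is not known in advance, one possible sketch is to look the column up by name on the header line. The variable names col and c and the sample data are assumptions made for this illustration, and it only handles simple CSV without quoted commas:

```shell
# Hypothetical sample matching the layout in the question.
sample='headerA,headerB,headerC
bill,jones,p
mike,smith,f
sally,silly,p'

# Header line: find the index of the field named by col, print the
# header, and skip to the next line. Other lines: print only when that
# field is not exactly "f". If the column is never found (c stays 0),
# every line is printed.
filtered=$(printf '%s\n' "$sample" | awk -F, -v col=headerC '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; print; next }
    c == 0 || $c != "f"
')
printf '%s\n' "$filtered"
```

With this sample, the header and the bill and sally records are printed, and the mike record is dropped.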

answered Jun 22, 2011 at 12:27


2 Comments

glenn jackman Over a year ago

The awk command can be simplified: awk -F, '$3 != "f"' file.csv

Michael Lowman Over a year ago

@glenn it can indeed, but I never bothered to look up whether -F was a GNU extension or not, so I just went with the safest option. I'll take that to mean it isn't :)

Neppord · Accepted Answer · 2012-12-13 15:04:46Z

2

grep works; see the example below.

grep ",.*,.*f" << EOF
headerA,headerB,headerC
bill,josef,p
mike,smith,f
sally,silly,p
EOF

outputs:

mike,smith,f
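
Note that the command above prints the f record rather than removing it. A minimal sketch of the inverse, assuming headerC is the last column and no value contains a quoted comma, uses grep's -v option to drop the matching lines instead:

```shell
# Same sample rows as in the example above.
sample='headerA,headerB,headerC
bill,josef,p
mike,smith,f
sally,silly,p'

# -v inverts the match: keep every line that does NOT end in ",f".
filtered=$(printf '%s\n' "$sample" | grep -v ',f$')
printf '%s\n' "$filtered"
```

This prints the header plus the bill and sally records. Anchoring the pattern with $ means a value that merely contains an f elsewhere in the line is not dropped.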

answered Dec 13, 2012 at 15:04


1 Comment

Taylored Web Sites Over a year ago

Nice, clean and quick (ps. don't need the final .*)

Fredrik Pihl · Accepted Answer · 2011-06-22 12:34:34Z

1

A bit unclear, is this what you are asking for?

$ awk -F, '{ if($3 == "f")print}' input
mike,smith,f

With a header and formatted using column

$ awk -F, '{ if (NR == 1)print}{if($3 == "f")print}' input | column -t -s,
headerA  headerB  headerC
mike     smith    f

edited Jun 22, 2011 at 12:34

answered Jun 22, 2011 at 12:29


bagavadhar · Accepted Answer · 2011-06-22 12:29:30Z

-2

There's no need for sed or awk; this can be done with simpler commands like cut and grep piped together, like this:

cut -d"," -f 3 FILE | grep -i f

I am assuming the delimiter is a comma and that column C is the third one. If it is not, change the values above appropriately. I have used grep with the -i option so that it ignores case. If you want to match only lowercase f or only uppercase F, remove the -i option and change the pattern accordingly.

answered Jun 22, 2011 at 12:29


1 Comment

glenn jackman Over a year ago

That will only output values from the 3rd field, not the whole line.
