5

I have a .csv file with a header row, like so:

headerA,headerB,headerC
bill,jones,p
mike,smith,f
sally,silly,p

I'd like to filter out any records with the f value in the headerC column.

Can I do that with sed or awk?

2
  • 2
    Note that CSV may contain embedded line breaks, so any pure line-based solution might do wrong things with certain inputs. Furthermore, quoted values may pose problems with plenty of naïve approaches to the problem. Commented Jun 22, 2011 at 12:34
  • 1
    @Joey, right. Usual recommendation is to use a language with a dedicated CSV library, such as Perl Commented Jun 22, 2011 at 12:39

5 Answers

8

If the header's third column name is not just f:

sed '/,f$/d' FILE

will do (deletes every line from the input if it ends with ,f).

If it is just f, I'd go with:

sed -n -e '1p;/,[^f]$/p' FILE

(Prints nothing by default (-n), but the 1st line is always printed (1p), and so are lines that end with a comma followed by a single character other than f. Note: this will not work if the 3rd column contains more than one character.)
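One way around the single-character limitation, as a minimal sketch: exempt line 1 from the deletion instead of whitelisting which lines to print. The function name filter_f_sed is hypothetical, and this assumes headerC is the last column and that no field contains quoted commas or embedded line breaks.

```shell
# Hypothetical helper: delete rows whose last field is exactly "f",
# but never touch line 1 (the header), whatever it ends with.
filter_f_sed() {
    sed '1!{/,f$/d;}' "$@"
}

# Sample data from the question, plus a row with a multi-character value.
printf '%s\n' 'headerA,headerB,headerC' 'bill,jones,p' 'mike,smith,f' 'amy,wong,ab' |
    filter_f_sed
```

Because the address 1! applies the delete to every line except the first, a third column holding ab or any other longer value is kept.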

And an awk one:

awk -F, 'NR == 1 ; NR > 1 && $3 != "f"' FILE

(This always prints the first line: NR == 1 is true there, so the default action, print $0, runs. The second condition checks that we are past the 1st line and that the 3rd field is not f, and then the default action prints the line.)

HTH


4 Comments

Your second sed solution will break if the 3rd column contains >1 char. Better to stick with the 1st sed or awk as it implements the requirements more precisely (delete line if "f")
According to the "specification": "I'd like to filter out any records with the f value in the headerC column." So it's correct IMO.
If the 3rd column contains "ab", that does not match /,[^f]$/ so it will be filtered.
You were right, @glennjackman: if the 3rd column is longer than 1 char, it will not be printed. Updating the description.
3

Well, if you know that headerC is always in the third column, the following sed command would work:

sed -r '/^[^,]*,[^,]*,f(,|$)/d' < file.csv > filefiltered.csv

And the following awk command does the same:

awk 'BEGIN {FS=","} {if($3 != "f") print}' file.csv

If you don't know that headerC is always in a particular column, it gets a little trickier. Does this work?
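For the case where the column position is not known in advance, here is a rough sketch that looks the column up by name in the header row. The function name filter_by_header and its argument order are hypothetical, and it assumes plain comma-separated fields with no quoting or embedded line breaks.

```shell
# Hypothetical helper: filter_by_header COLUMN VALUE [FILE...]
# Finds COLUMN in the header row, then drops rows whose field there equals VALUE.
filter_by_header() {
    col=$1 val=$2
    shift 2
    awk -F, -v col="$col" -v val="$val" '
        NR == 1 {
            # Locate the column index by name; keep the header in the output.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            print
            next
        }
        # If the column was not found, idx is unset and every row is kept.
        !idx || $idx != val
    ' "$@"
}

printf '%s\n' 'headerA,headerB,headerC' 'bill,jones,p' 'mike,smith,f' 'sally,silly,p' |
    filter_by_header headerC f
```

Keeping all rows when the column name is missing is a design choice made for this sketch; printing an error and exiting would be an equally reasonable alternative.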

2 Comments

The awk command can be simplified: awk -F, '$3 != "f"' file.csv
@glenn It can indeed, but I never bothered to look up whether -F was a GNU extension or not, so I just went with the safest option. I'll take that to mean it isn't :)
2

grep works too; see the example:

grep ",.*,.*f" << EOF
headerA,headerB,headerC
bill,josef,p
mike,smith,f
sally,silly,p
EOF

outputs:

mike,smith,f
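The example above selects the f rows. To drop them instead, as the question asks, grep's -v option inverts the match. A minimal sketch, assuming headerC is the last column and its header name is not just f (the function name drop_f_rows is hypothetical):

```shell
# -v prints only the lines that do NOT match;
# ',f$' matches rows whose last field is exactly f.
drop_f_rows() {
    grep -v ',f$' "$@"
}

drop_f_rows << EOF
headerA,headerB,headerC
bill,josef,p
mike,smith,f
sally,silly,p
EOF
```

Anchoring the pattern with $ also avoids matching an f that merely appears somewhere after the second comma.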

1 Comment

Nice, clean and quick (ps. don't need the final .*)
1

It's a bit unclear; is this what you are asking for?

$ awk -F, '{ if($3 == "f")print}' input
mike,smith,f

With a header and formatted using column

$ awk -F, '{ if (NR == 1)print}{if($3 == "f")print}' input | column -t -s,
headerA  headerB  headerC
mike     smith    f


-2

There's no need for sed or awk; this can be done with simpler commands like cut and grep piped together, like this:

cut -d"," -f 3| grep -i f

I am assuming the delimiter is a comma and headerC is the third column. If it is not, change the values above appropriately. I have used grep with the -i option so that it ignores case. If you want to match only lowercase f or only uppercase F, remove the -i option and change the pattern accordingly.

1 Comment

That will only output values from the 3rd field, not the whole line.
