Filtering rows based on column values of csv file

Question

I have a dataset with 1000 rows and 10 columns. Here is a sample of the dataset:

A,B,C,D,E,F,
a,b,c,d,e,f,
g,h,i,j,k,l,
m,n,o,p,q,r,
s,t,u,v,w,x,

From this dataset I want to copy the rows whose column A value is 'a' or 'm' to a new CSV file. I also want the header to be copied.

I have tried using awk. It copied all the rows but not the header.

awk '{$1~/a//m/ print}' inputfile.csv > outputfile.csv

How can I copy the header also into the new outputfile.csv?

Thanks in advance.

There are literally a plethora of questions out there that are identical; the only difference is the condition or the field separator (here, here, here, here, ...). This clearly shows that there is something wrong with this forum. Questions like this should no longer be answered in an answer, but rather in a comment. Why is there no duplicate for this?
– kvantour, Commented Sep 30, 2019 at 7:42

RavinderSingh13 · Accepted Answer · 2019-09-30 05:49:55Z

2

Assuming your header is on the 1st row, could you please try the following:

awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^a$|^m$/' Input_file > outputfile.csv

OR, as per Cyrus sir's comment, try the following:

awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^(a|m)$/' Input_file > outputfile.csv

OR, as per Ed sir's comment, try the following:

awk -F, 'NR==1 || $1~/^[am]$/' Input_file > outputfile.csv

Corrections made to the OP's attempt:

Added FS and OFS as , for all lines, since the lines are comma-delimited.
Added the FNR==1 condition, which checks for the 1st line and simply prints it, since we want the header in the output file; next then skips all further statements for that line.
Used a better regex for checking the 1st field's condition: $1 ~ /^a$|^m$/, which is anchored so that the whole field must be exactly a or m.
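Because the header test uses FNR (line number within the current file) rather than NR (line number across all input), the header is printed once per input file. A minimal sketch, assuming two hypothetical files part1.csv and part2.csv (created in a mktemp directory) that share the same header, shows the difference when both are passed to the same command:

```shell
#!/bin/sh
# Hypothetical input files, each starting with the same header row.
dir=$(mktemp -d)
printf 'A,B,C\na,1,2\ng,3,4\n' > "$dir/part1.csv"
printf 'A,B,C\nm,5,6\ns,7,8\n' > "$dir/part2.csv"

# FNR resets to 1 at the start of every file, so each file's header is printed.
fnr_out=$(awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^a$|^m$/' "$dir/part1.csv" "$dir/part2.csv")

# NR keeps counting across files, so only the very first header is printed;
# the second header fails the regex test and is dropped.
nr_out=$(awk 'BEGIN{FS=OFS=","} NR==1{print;next} $1 ~ /^a$|^m$/' "$dir/part1.csv" "$dir/part2.csv")

printf '%s\n' "$fnr_out"
echo ---
printf '%s\n' "$nr_out"
rm -r "$dir"
```

With a single input file, as in the question, the two forms behave the same.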

edited Sep 30, 2019 at 5:49

answered Sep 29, 2019 at 15:13

RavinderSingh13

135k reputation, 14 gold badges, 61 silver badges, 100 bronze badges


3 Comments

Cyrus Over a year ago

Slightly trimmed down: awk 'BEGIN{FS=","} NR==1 || $1~/^(a|m)$/' Input_file > outputfile.csv

Ed Morton Over a year ago

More concisely: awk -F, 'NR==1 || $1~/^[am]$/' Input_file > outputfile.csv

RavinderSingh13 Over a year ago

@EdMorton, thank you sir for letting me know; I've added this solution to my post now, cheers.
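The three regexes in this answer and its comments (/^a$|^m$/, /^(a|m)$/ and /^[am]$/) all anchor the pattern to the whole field. As a quick sketch on made-up column-A values (ma, am and b are hypothetical, not from the question), dropping the anchors shows why they matter: /a|m/ also matches any field that merely contains an a or an m.

```shell
#!/bin/sh
# Hypothetical column-A values: exact hits, partial hits, and a miss.
vals='a
m
ma
am
b'

# Each result is joined onto one line with trailing spaces, e.g. "a m ".
bracket=$(printf '%s\n' "$vals" | awk '$1 ~ /^[am]$/' | tr '\n' ' ')
alternation=$(printf '%s\n' "$vals" | awk '$1 ~ /^(a|m)$/' | tr '\n' ' ')
split_anchors=$(printf '%s\n' "$vals" | awk '$1 ~ /^a$|^m$/' | tr '\n' ' ')
unanchored=$(printf '%s\n' "$vals" | awk '$1 ~ /a|m/' | tr '\n' ' ')

echo "/^[am]\$/:          $bracket"
echo "/^(a|m)\$/:         $alternation"
echo "/^a\$|^m\$/:         $split_anchors"
echo "/a|m/ (no anchors): $unanchored"
```

The three anchored forms agree; the bracket expression is simply the shortest way to write a set of single characters.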

potong · Accepted Answer · 2019-09-30 08:13:33Z

2

This might work for you (GNU sed):

sed '1b;/^[am],/!d' oldFile >newFile

Always print the first line, and delete any other line that does not begin with a, or m,.

Alternative:

awk 'NR==1 || /^[am],/' oldFile >newFile
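The 1b;... form relies on GNU sed ending the branch label at the ;. Some other seds (BSD/macOS sed, for example) may read everything after b up to a newline as the label. As a sketch, assuming a POSIX sed, splitting the script into separate -e expressions ends each one with a newline, so b gets an empty label (branch to the end of the script) without depending on that GNU behaviour. The sample data is written to a mktemp file for illustration.

```shell
#!/bin/sh
# Sample input (same data as in the question).
input=$(mktemp)
printf 'A,B,C,D,E,F,\na,b,c,d,e,f,\ng,h,i,j,k,l,\nm,n,o,p,q,r,\ns,t,u,v,w,x,\n' > "$input"

# Line 1 branches past the delete and is auto-printed;
# every other line not starting with "a," or "m," is deleted.
sed_out=$(sed -e '1b' -e '/^[am],/!d' "$input")

printf '%s\n' "$sed_out"
rm -f "$input"
```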

edited Sep 30, 2019 at 8:13

answered Sep 29, 2019 at 19:16

potong

59.3k reputation, 6 gold badges, 55 silver badges, 92 bronze badges


Cyrus · Accepted Answer · 2019-09-29 16:50:43Z

1

With awk: set the field separator (FS) to , and output the current row if it is the first row or if its first column is exactly a or m.

awk 'NR==1 || $1=="a" || $1=="m"' FS=',' in.csv >out.csv

Output to out.csv:

A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,
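Here FS=',' is a command-line variable assignment, which awk applies when it reaches that operand, i.e. just before it starts reading in.csv. A minimal sketch on the question's sample data (written to a mktemp file): the second run, with the assignment placed after the file name, is included only to show that the ordering matters. The rows are then still split on whitespace, so $1 is the whole line and only the header passes.

```shell
#!/bin/sh
in_csv=$(mktemp)
printf 'A,B,C,D,E,F,\na,b,c,d,e,f,\ng,h,i,j,k,l,\nm,n,o,p,q,r,\ns,t,u,v,w,x,\n' > "$in_csv"

# FS=',' comes before the file operand, so it is in effect while the file is read.
before=$(awk 'NR==1 || $1=="a" || $1=="m"' FS=',' "$in_csv")

# Placed after the file operand, the assignment only takes effect once the file
# has been read; each row is a single whitespace-separated field.
after=$(awk 'NR==1 || $1=="a" || $1=="m"' "$in_csv" FS=',')

printf '%s\n---\n%s\n' "$before" "$after"
rm -f "$in_csv"
```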

edited Sep 29, 2019 at 16:50

answered Sep 29, 2019 at 16:40

Cyrus

90.2k reputation, 15 gold badges, 112 silver badges, 173 bronze badges


Ed Morton · Accepted Answer · 2019-09-29 23:04:18Z

1

$ awk -F, 'BEGIN{split("a,m",tmp); for (i in tmp) tgts[tmp[i]]} NR==1 || $1 in tgts' file
A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,
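This answer builds a lookup array tgts from a comma-separated list and then tests $1 in tgts, which compares whole field values as strings instead of using a regex. As a sketch, the same list can be supplied from the shell with -v rather than hard-coded; the variable name targets and the mktemp input file are hypothetical choices for illustration.

```shell
#!/bin/sh
file=$(mktemp)
printf 'A,B,C,D,E,F,\na,b,c,d,e,f,\ng,h,i,j,k,l,\nm,n,o,p,q,r,\ns,t,u,v,w,x,\n' > "$file"

# "targets" (hypothetical name) holds the wanted column-A values.
# split() with no separator argument uses FS, which -F, has set to ",";
# each value becomes an index of tgts, so "$1 in tgts" is an exact match.
ed_out=$(awk -F, -v targets='a,m' '
    BEGIN { n = split(targets, tmp); for (i = 1; i <= n; i++) tgts[tmp[i]] }
    NR==1 || $1 in tgts
' "$file")

printf '%s\n' "$ed_out"
rm -f "$file"
```

Adding another value then only means changing the -v argument, for example targets='a,m,s'.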

answered Sep 29, 2019 at 23:04

Ed Morton

209k reputation, 18 gold badges, 90 silver badges, 212 bronze badges


HelpfulHound · Accepted Answer · 2019-09-29 15:14:53Z

-1

It appears that awk's default delimiter is whitespace.

The delimiter can be changed by setting the FS variable:

awk 'BEGIN { FS = "," } ; { print $2 }'
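To illustrate the delimiter point on a row from the question: with the default field splitting, a line like a,b,c,d,e,f, contains no blanks, so $1 is the entire line; with FS set to , it is just a. The trailing comma in the sample data also produces an empty 7th field. A small sketch:

```shell
#!/bin/sh
line='a,b,c,d,e,f,'

# Default FS (a single space): fields are separated by runs of blanks/tabs,
# so this comma-only line is one field.
default_first=$(printf '%s\n' "$line" | awk '{ print $1 }')

# FS set to ",": the same line splits on commas.
comma_first=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "," } { print $1 }')
comma_nf=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "," } { print NF }')

echo "default \$1: $default_first"
echo "comma   \$1: $comma_first (NF=$comma_nf)"
```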

answered Sep 29, 2019 at 15:14

HelpfulHound

326 reputation, 1 gold badge, 2 silver badges, 9 bronze badges

4 Comments

RavinderSingh13 Over a year ago

IMHO this does not take care of the conditions mentioned by the OP, nor does it have a condition for printing the first line.

HelpfulHound Over a year ago

You're not wrong. This is just copied and pasted from the documentation shown here. I'm not familiar enough with awk to do more than check the obvious, like delimiters and table shape. Your solution checks all the boxes.

RavinderSingh13 Over a year ago

Not an issue, we are all here to learn. I just thought you might have missed it, so I mentioned it. Cheers, have a great day ahead.

shellter Over a year ago

@HelpfulHound : I'd recommend posting such information as a comment on the question rather than as an answer. Answers are expected to deal with the main issues described in the OP (you acknowledge that was not your intent). Others are likely to downvote your answer (because it's not an answer), thereby costing you valuable reputation points. A well-written comment, so please keep reading, commenting and answering as appropriate. Welcome aboard! Good luck to all.
