
I have a dataset with 1000 rows and 10 columns. Here is a sample:

A,B,C,D,E,F,
a,b,c,d,e,f,
g,h,i,j,k,l,
m,n,o,p,q,r,
s,t,u,v,w,x,

From this dataset I want to copy the rows whose value in column A is 'a' or 'm' to a new CSV file. I also want the header to be copied.

I have tried using awk. It copied all the rows but not the header.

awk '{$1~/a//m/ print}' inputfile.csv > outputfile.csv

How can I copy the header also into the new outputfile.csv?

Thanks in advance.

There is a plethora of questions out there that are identical to this one. The only difference is the condition or the field separator: here, here, here, here, ... . This clearly shows that something is wrong with this forum. Questions like this should no longer be answered in an answer, but rather in a comment. Why is there no duplicate for this? Commented Sep 30, 2019 at 7:42

5 Answers

2

Assuming your header is on the first row, try the following:

awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^a$|^m$/' Input_file > outputfile.csv

Or, following Cyrus's comment:

awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^(a|m)$/' Input_file > outputfile.csv

Or, following Ed's comment:

awk -F, 'NR==1 || $1~/^[am]$/' Input_file > outputfile.csv

Corrections to the OP's attempt:

  1. Set FS and OFS to , for all lines, since the lines are comma-delimited.
  2. Added the FNR==1 condition, which matches the first line and prints it, since we want the header in the output file. next then skips the remaining rules for that line and moves on to the next line.
  3. Used a stricter regex for the first field's condition, $1 ~ /^a$|^m$/, which matches only when the field is exactly a or m.
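
To see these corrections work together, here is a minimal sketch that recreates the sample data from the question and runs the second variant against it. The file names Input_file and outputfile.csv are the placeholders already used in this answer; the heredoc is only there to make the example self-contained.

```shell
# Recreate the sample data from the question (placeholder file name).
cat > Input_file <<'EOF'
A,B,C,D,E,F,
a,b,c,d,e,f,
g,h,i,j,k,l,
m,n,o,p,q,r,
s,t,u,v,w,x,
EOF

# FNR==1 prints the header and skips the remaining rules for that line;
# every later line is printed only if its first field is exactly a or m.
awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^(a|m)$/' Input_file > outputfile.csv

cat outputfile.csv
```

With this input, outputfile.csv should hold the header plus the two rows starting with a and m.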

3 Comments

Slightly trimmed down: awk 'BEGIN{FS=","} NR==1 || $1~/^(a|m)$/' Input_file > outputfile.csv
More concisely: awk -F, 'NR==1 || $1~/^[am]$/' Input_file > outputfile.csv
@EdMorton, Thank you sir for letting me know, added this solution in my post now, cheers.
2

This might work for you (GNU sed):

sed '1b;/^[am],/!d' oldFile >newFile

Always print the first line, and delete any other line that does not begin with a, or m,.

Alternative:

awk 'NR==1 || /^[am],/' oldFile >newFile
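
A small sketch of why the trailing comma in /^[am],/ matters. The row starting with ab is a hypothetical addition that is not in the question's data, and the sed script is written with two -e options, which should be equivalent to the one-liner above but does not rely on GNU sed accepting ; after the b command. The output file names are placeholders.

```shell
# Sample data plus one hypothetical row whose first field merely starts with "a".
cat > oldFile <<'EOF'
A,B,C,D,E,F,
a,b,c,d,e,f,
ab,c,d,e,f,g,
m,n,o,p,q,r,
s,t,u,v,w,x,
EOF

# 1b          : on line 1, branch to the end of the script so the header is printed
# /^[am],/!d  : delete every other line that does not start with "a," or "m,"
sed -e '1b' -e '/^[am],/!d' oldFile > newFile_sed

# Same filter in awk: print line 1, or any line starting with "a," or "m,".
awk 'NR==1 || /^[am],/' oldFile > newFile_awk

cat newFile_sed
```

Both commands should keep the header and the a and m rows while dropping the ab row, because the comma pins the match to the whole first field.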


1

With awk: set the field separator (FS) to , and output the current row if it is the first row or if its first column is exactly a or m.

awk 'NR==1 || $1=="a" || $1=="m"' FS=',' in.csv >out.csv

Output to out.csv:

A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,


1
$ awk -F, 'BEGIN{split("a,m",tmp); for (i in tmp) tgts[tmp[i]]} NR==1 || $1 in tgts' file
A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,
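
The one-liner above builds a lookup array from the target values and then tests the first field with the in operator, which scales better than a regex when there are many values. A sketch of the same idea with the list passed in from the shell; the variable name targets and the output file name are assumptions made for this illustration.

```shell
# Recreate the sample data from the question.
cat > file <<'EOF'
A,B,C,D,E,F,
a,b,c,d,e,f,
g,h,i,j,k,l,
m,n,o,p,q,r,
s,t,u,v,w,x,
EOF

# targets is a hypothetical variable holding the comma-separated values to keep.
awk -F, -v targets="a,m" '
BEGIN {
    n = split(targets, tmp, ",")
    # Merely referencing tgts[key] creates the key, so no value is needed.
    for (i = 1; i <= n; i++) tgts[tmp[i]]
}
# Print the header, or any row whose first field is one of the keys.
NR == 1 || ($1 in tgts)
' file > out_lookup.csv

cat out_lookup.csv
```

Changing the filter then only means changing the string given to -v, for example targets="a,m,s".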


-1

awk's default field separator is whitespace.

To change the separator, set the FS variable:

awk 'BEGIN { FS = "," } ; { print $2 }'
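
A quick way to see the difference, as a sketch: feed one comma-separated line to awk with and without FS set. The input line and the output file names are made up for this illustration.

```shell
# Default separator: the whole line "a,b,c" is a single field, so $2 is empty.
printf 'a,b,c\n' | awk '{ print $2 }' > fs_default.txt

# With FS set to a comma, $2 is the second comma-separated value.
printf 'a,b,c\n' | awk 'BEGIN { FS = "," } { print $2 }' > fs_comma.txt

cat fs_default.txt fs_comma.txt
```

The first command should print an empty line and the second should print b, which is why a comma-delimited file needs FS (or -F,) before any test on $1 behaves as intended.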

4 Comments

IMHO this does not take care of the conditions mentioned by the OP, nor does it have a condition for printing the first line.
You're not wrong. This is just copied and pasted from the documentation shown here. I'm not familiar enough with awk to do more than check the obvious, like delimiters and table shape. Your solution checks all the boxes.
Not an issue, we are all here to learn. I just thought you might have missed it, so I mentioned it. Cheers, have a great day ahead.
@HelpfulHound : I'd recommend posting such information as a comment on the question rather than as an answer. Answers are expected to deal with the main issues described in the OP. (You acknowledge that was not your intent.) Others are likely to downvote your answer (because it's not an answer), thereby costing you valuable reputation points. It would have made a well-written comment, so please keep reading, commenting and answering as appropriate. Welcome aboard! Good luck to all.
