
I am trying to split a large CSV into smaller files by date using awk. I have the essential command working, although it fails with a "too many open files" error. I've read that I should close each file, yet the command as written closes (and so truncates) the file on every row, so each output file ends up with only a single row.

awk -F' ' '{close($1".csv")}{print > ($1".csv")}' 2015full.csv 

Additionally I would like to add a header row to each split file and an index column. My data looks like:

2015full.csv

2015-12-24 18:20:57 -87.2788204 36.5984675 0
2015-12-24 18:20:42 -87.2784049 36.597298699999996 0
2015-12-24 18:20:26 -87.274402 36.5932405 0
2015-12-23 18:20:10 -87.25762519999999 36.572330400000006 0
2015-12-23 18:19:40 -87.25762519999999 36.572330400000006 0
2015-12-23 18:19:21 -87.25762519999999 36.572330400000006 0

And I'm trying to get:

2015-12-24.csv

num date time lon lat
1 2015-12-24 18:20:57 -87.2788204 36.5984675
2 2015-12-24 18:20:42 -87.2784049 36.597298699999996
3 2015-12-24 18:20:26 -87.274402 36.5932405

2015-12-23.csv

num date time lon lat
1 2015-12-23 18:20:10 -87.25762519999999 36.572330400000006
2 2015-12-23 18:19:40 -87.25762519999999 36.572330400000006
3 2015-12-23 18:19:21 -87.25762519999999 36.572330400000006

I may have the right pieces below:

awk -F' ' 'NR==1{print "num", $0; "date", $1; "time", $2; "lon", $3; "lat", $4; next}{print (NR-1), $0}{close($1".csv")}{print > ($1".csv")}' 2015full.csv

but they are not in an order that creates a working command for my purposes. Does anyone have a suggestion for me? Thanks!

  • If the file is sorted, don't close the file until you get to a new date. If it's not sorted, consider sorting it. Commented Dec 28, 2017 at 23:37
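As the comment suggests, if the file isn't already grouped by date you can sort it first and pipe it into awk; a minimal sketch of the close-on-change approach (assuming the date is in column 1):

```shell
# Sort by the first (date) field so all rows for a date are adjacent,
# then close each output file only when the date changes. This keeps
# at most one file open at a time, avoiding "too many open files".
sort -k1,1 2015full.csv |
awk 'prev != $1 { close(prev ".csv"); prev = $1 } { print > ($1 ".csv") }'
```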

3 Answers


Though you are closing the file on every line, reopening it with > truncates it, so each output file keeps only its last row; and if you never close, every distinct $1 value's .csv stays open in the background, which is what exhausts the file descriptors. Instead, close the previous date's file only when $1 changes. Try the following and let me know if it helps (this assumes your Input_file is sorted as in the shown sample; if not, sort it by the 1st column and pipe the result to the awk command).

awk -F' ' 'prev!=$1{close(prev".csv")}{print > ($1".csv");prev=$1}' 2015full.csv 

EDIT: I see you also need a header row in each $1 output file; if so, the following should help:

awk -F' ' 'prev!=$1{close(prev".csv");print "num date time lon lat" > ($1".csv")}{print > ($1".csv");prev=$1}' 2015full.csv

1 Comment

Works well, but I didn't draw enough attention in the original post to the fact that I need an index column also...
awk '
    BEGIN { hdr = "num" OFS "date" OFS "time" OFS "lon" OFS "lat" }         # build the header line once
    $1!=prev { close(out); out=$1".csv"; print hdr > out; idx=0; prev=$1 }  # new date: close the old file, start a new one with the header, reset the counter
    { print ++idx, $0 > out }                                               # prepend the running index to every row
' 2015full.csv

1 Comment

While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.

If you'd rather not use awk:

for i in $(cut -d ' ' -f1 2015full.csv | uniq); do grep -w "$i" 2015full.csv | nl -w1 -s ' ' | sed "1i num date time lon lat" > "$i.csv"; done
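Note that uniq only collapses adjacent duplicates, so the loop above also assumes the rows are grouped by date; for unsorted input, sort -u on the date column is safer. A sketch (unlike the awk answers, this keeps the trailing column, as the loop above does):

```shell
# For each distinct date (sort -u works even on ungrouped input),
# pull that date's rows, number them with nl, and prepend a header.
for i in $(cut -d ' ' -f1 2015full.csv | sort -u); do
    grep -w "$i" 2015full.csv | nl -w1 -s ' ' | sed "1i num date time lon lat" > "$i.csv"
done
```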

