
I am trying to split a large CSV into smaller files by date using awk. I have the essential command working, although it fails with a "too many open files" error. I've read that I should close each file, yet the command as written closes (and so truncates) the file on every row, so each output file ends up with only a single row.

awk -F' ' '{close($1".csv")}{print > ($1".csv")}' 2015full.csv 

Additionally I would like to add a header row to each split file and an index column. My data looks like:

2015full.csv

2015-12-24 18:20:57 -87.2788204 36.5984675 0
2015-12-24 18:20:42 -87.2784049 36.597298699999996 0
2015-12-24 18:20:26 -87.274402 36.5932405 0
2015-12-23 18:20:10 -87.25762519999999 36.572330400000006 0
2015-12-23 18:19:40 -87.25762519999999 36.572330400000006 0
2015-12-23 18:19:21 -87.25762519999999 36.572330400000006 0

And I'm trying to get:

2015-12-24.csv

num date time lon lat
1 2015-12-24 18:20:57 -87.2788204 36.5984675
2 2015-12-24 18:20:42 -87.2784049 36.597298699999996
3 2015-12-24 18:20:26 -87.274402 36.5932405

2015-12-23.csv

num date time lon lat
1 2015-12-23 18:20:10 -87.25762519999999 36.572330400000006
2 2015-12-23 18:19:40 -87.25762519999999 36.572330400000006
3 2015-12-23 18:19:21 -87.25762519999999 36.572330400000006

I may have the right pieces below:

awk -F' ' 'NR==1{print "num", $0; "date", $1; "time", $2; "lon", $3; "lat", $4; next}{print (NR-1), $0}{close($1".csv")}{print > ($1".csv")}' 2015full.csv

but they are not in an order that creates a working command for my purposes. Does anyone have a suggestion for me? Thanks!

  • If the file is sorted, don't close the file until you get to a new date. If it's not sorted, consider sorting it. Commented Dec 28, 2017 at 23:37
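As the comment suggests, if the file isn't already grouped by date you can sort it first and pipe it into awk; a minimal sketch of the close-on-change approach (assuming the date is in column 1):

```shell
# Sort by the first (date) field so all rows for a date are adjacent,
# then close each output file only when the date changes. This keeps
# at most one file open at a time, avoiding "too many open files".
sort -k1,1 2015full.csv |
awk 'prev != $1 { close(prev ".csv"); prev = $1 } { print > ($1 ".csv") }'
```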

3 Answers


Though you are closing the file on every line, reopening it with > truncates it, so each output file keeps only its last row; and if you never close, every distinct $1 value's .csv stays open in the background, which is what exhausts the file descriptors. Instead, close the previous date's file only when $1 changes. Try the following and let me know if it helps (this assumes your Input_file is sorted as in the shown sample; if not, sort it by the 1st column and pipe the result to the awk command).

awk -F' ' 'prev!=$1{close(prev".csv")}{print > ($1".csv");prev=$1}' 2015full.csv 

EDIT: I see you also need a header row in each $1 output file; if so, the following should help:

awk -F' ' 'prev!=$1{close(prev".csv");print "num date time lon lat" > ($1".csv")}{print > ($1".csv");prev=$1}' 2015full.csv

1 Comment

Works well, but I didn't draw enough attention in the original post to the fact that I need an index column also...
awk '
    BEGIN { hdr = "num" OFS "date" OFS "time" OFS "lon" OFS "lat" }         # build the header line once
    $1!=prev { close(out); out=$1".csv"; print hdr > out; idx=0; prev=$1 }  # new date: close the old file, start a new one with the header, reset the counter
    { print ++idx, $0 > out }                                               # prepend the running index to every row
' 2015full.csv

1 Comment

While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.

If you'd rather not use awk:

for i in $(cut -d ' ' -f1 2015full.csv | uniq); do grep -w "$i" 2015full.csv | nl -w1 -s ' ' | sed "1i num date time lon lat" > "$i.csv"; done
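Note that uniq only collapses adjacent duplicates, so the loop above also assumes the rows are grouped by date; for unsorted input, sort -u on the date column is safer. A sketch (unlike the awk answers, this keeps the trailing column, as the loop above does):

```shell
# For each distinct date (sort -u works even on ungrouped input),
# pull that date's rows, number them with nl, and prepend a header.
for i in $(cut -d ' ' -f1 2015full.csv | sort -u); do
    grep -w "$i" 2015full.csv | nl -w1 -s ' ' | sed "1i num date time lon lat" > "$i.csv"
done
```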

