
I've got one crappy CSV file with multiple headers inside it. It looks like this:

File1:
    #HEADER COL1 COL2
    data
    data
    data
    #HEADER COL1 COL2 COL3
    data
    data
    data
    data
    data
    #HEADER COL1 COL2 COL3 COL4
    data
    data
    ...

Since I can't load it with pandas because of the headers embedded in the file, I'm looking to split the data at each header into a separate file (or into separate DataFrames in pandas). Is there a way to do this?

This CSV is generated by sensors. If a sensor is added, the header gets a new column, and this can also happen in the headers inside the file. So simply deleting those headers is NOT a solution (see the related question "Clean wrong header inside Dataframe with Python/Pandas").

It would be really nice to do it in Python/pandas, but I would also be happy with a bash command or script solution.

Expected output:

File1:
        #HEADER COL1 COL2
        data
        data
        data
File2:
        #HEADER COL1 COL2 COL3
        data
        data
        data
        data
        data
File3:
        #HEADER COL1 COL2 COL3 COL4
        data
        data
        ...

Thank you!

  • Are you able to show your expected output as well as what you've tried? Commented Nov 13, 2019 at 15:37
  • Are those HEADER lines always the same? Can you post a realistic input? Commented Nov 13, 2019 at 15:37
  • No, the HEADER lines can change (but they all start with "#"), so if I could locate "#", I could split the file. Commented Nov 13, 2019 at 15:44

2 Answers


awk to the rescue!

$ awk '/^#HEADER/{close(FILENAME "_" c); c++} {print > (FILENAME "_" c)}' file

This splits the input file into parts named file_1, file_2, ..., where the numeric suffix is the section counter c, incremented at every header line.
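A minimal, self-contained sketch of the awk one-liner above, run on made-up sample data. It uses the variant from the comment below, /^#/, so any line starting with "#" begins a new part, and it adds a hypothetical awk variable out that prefixes every output name with a directory, which is one possible way to put the results in another folder. The directory has to exist beforehand, because awk's print > only opens the file and does not create directories.

```shell
#!/usr/bin/env bash
# Sketch of the awk split, run on a small made-up sample.
# Assumptions (not from the answer): the sample rows, the "parts" output
# directory, and the awk variable "out" that holds that directory.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Sample input shaped like the file in the question.
cat > sensors.csv <<'EOF'
#HEADER COL1 COL2
1 2
3 4
#HEADER COL1 COL2 COL3
5 6 7
#HEADER COL1 COL2 COL3 COL4
8 9 10 11
12 13 14 15
EOF

# awk will not create this directory itself.
mkdir -p parts

# On every line starting with "#": close the current part, if any (so a long
# file does not hit awk's open-file limit), then bump the section counter c.
# Every line, header included, is written to parts/sensors.csv_<c>.
awk -v out=parts '
  /^#/ { close(out "/" FILENAME "_" c); c++ }
  { print > (out "/" FILENAME "_" c) }
' sensors.csv

ls parts
```

Because FILENAME is part of the output name, running this on a path such as data/sensors.csv would write to parts/data/sensors.csv_1, so that subdirectory would have to exist as well.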


2 Comments

In my case this worked beautifully! awk '/^#/{close(FILENAME "_" c); c++} {print > (FILENAME "_" c)}' file
Last question: how can I write the resulting files to another folder?

With the convenient csplit command (which splits a file into sections determined by patterns):

csplit -b %d -f file -z input_file '/#HEADER.*/' '{*}'

Viewing results:

$ head file[0-9]
==> file0 <==
#HEADER COL1 COL2
data
data
data

==> file1 <==
#HEADER COL1 COL2 COL3
data
data
data
data
data

==> file2 <==
#HEADER COL1 COL2 COL3 COL4
data
data
...
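A sketch of the csplit command above on made-up sample data, assuming GNU coreutils csplit (-z, -b and the '{*}' repeat count are GNU features). It swaps in the pattern '/^#/' (any line starting with "#", as noted in the comments on the question) for '/#HEADER.*/', and since -f takes a prefix that may include a directory, it writes the pieces into a hypothetical parts/ directory.

```shell
#!/usr/bin/env bash
# Sketch of the csplit approach, writing the pieces into a separate directory.
# Assumes GNU csplit: -z, -b and the '{*}' repeat count are GNU features.
# The sample rows and the "parts/" directory are made up for illustration.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

cat > input_file <<'EOF'
#HEADER COL1 COL2
1 2
#HEADER COL1 COL2 COL3
3 4 5
6 7 8
#HEADER COL1 COL2 COL3 COL4
9 10 11 12
EOF

mkdir -p parts

# -s           : don't print the size of each piece
# -z           : drop the empty piece that precedes the first header
# -b %d        : plain numeric suffix, giving file0, file1, ...
# -f parts/file: the prefix may include a directory, so pieces land in parts/
# '/^#/' '{*}' : start a new piece at every line beginning with "#", repeatedly
csplit -s -z -b %d -f parts/file input_file '/^#/' '{*}'

head parts/file*
```

Without -z, csplit would also keep an empty file0 for the (empty) text before the first header, so the sections would start at file1.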
