Break Text File into Multiple Pandas Dataframes using Headers from specific lines

Question

I have a single text file that looks like:

Filename and date
Header1   Header2   Header3
data      data      data
data      data      data
data      data      data

Filename2 and date2
Header1   Header2   Header3
data      data      data
data      data      data

..and so on for many blocks of data.

I am using pd.read_csv to open the file.

How do I split this into separate dataframes? The headers for each dataframe will be identical, but the "filename and date" need to be preserved for each one. Also, there is a different number of lines of data for each file.

I used

dfs = {
    k: pd.read_csv(pd.io.common.StringIO('\n'.join(dat)), delim_whitespace=True)
    for k, *dat in map(str.splitlines, open('my.csv').read().split('\n\n'))
}

but I am getting

                  Header1 Header2 Header3
Filename and date   0    data    data    data
                    1    data    data    data
                    2    data    data    data
                   Filename2 and date2    
                   1    data    data    data
                   2    data    data    data

where "Filename2", "and" and "date2" end up as values under the "Header1", "Header2" and "Header3" columns, as if everything were part of one dataframe under a single dict key, instead of several separate dataframes in the dict.

piRSquared · Accepted Answer · 2019-07-23 19:56:38Z


dictionary comprehension

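import io
import pandas as pd

# The first line of each blank-line-separated block is the key; the rest is parsed as a table.
dfs = {
    k: pd.read_csv(io.StringIO('\n'.join(dat)), sep=r'\s+')  # sep=r'\s+' replaces the deprecated delim_whitespace=True
    for k, *dat in map(str.splitlines, open('my.csv').read().split('\n\n'))
}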

Proof that it worked

pd.concat(dfs)

                      Header1 Header2 Header3
Filename and date   0    data    data    data
                    1    data    data    data
                    2    data    data    data
Filename2 and date2 0    data    data    data
                    1    data    data    data
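
Note that pd.concat(dfs) is only used here to display everything at once; dfs itself maps each title line to its own DataFrame. A minimal sketch of looking up one block, assuming a made-up two-block sample held in a string instead of my.csv (the sample values and the names text and first are illustrative, not from the question):

```python
import io
import pandas as pd

# Made-up stand-in for the file contents; blocks are separated by one blank line.
text = """Filename and date
Header1   Header2   Header3
a1        b1        c1
a2        b2        c2
a3        b3        c3

Filename2 and date2
Header1   Header2   Header3
x1        y1        z1
x2        y2        z2
"""

dfs = {}
for block in text.strip().split("\n\n"):
    key, *rows = block.splitlines()          # title line, then header + data lines
    dfs[key] = pd.read_csv(io.StringIO("\n".join(rows)), sep=r"\s+")

# Each dict value is an independent DataFrame, addressed by its title line.
first = dfs["Filename and date"]
print(list(dfs))
print(first)
```

Each frame keeps its own row count, so blocks of different lengths are not a problem.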


7 Comments

Jonathon Over a year ago

I get ValueError: not enough values to unpack (expected at least 1, got 0) on the line for k, *dat in map(str.splitlines, open(outfile).read().split('\n\n')). Is this because some of the columns have no data where other columns do?

Jonathon Over a year ago

Re my comment above: I just needed to add another '\n' to the split.
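
One plausible reading of that ValueError, not confirmed anywhere in the thread: if the file ends with a blank line, split('\n\n') yields a trailing empty string, and ''.splitlines() is an empty list, which k, *dat cannot unpack. Widening the separator to '\n\n\n' avoids the error only because the separator no longer matches anything, so the whole file stays in one chunk. A small sketch with a made-up sample string (the sample and variable names are assumptions for illustration):

```python
# Made-up sample: two blocks separated by one blank line, plus a trailing blank line.
sample = (
    "Filename and date\n"
    "Header1 Header2 Header3\n"
    "a b c\n"
    "\n"
    "Filename2 and date2\n"
    "Header1 Header2 Header3\n"
    "d e f\n"
    "\n"
)

chunks = sample.split("\n\n")   # the last chunk is the empty string

# "".splitlines() is [], so there is nothing to bind k to.
try:
    for k, *dat in map(str.splitlines, chunks):
        pass
    error = None
except ValueError as exc:
    error = str(exc)

# Three newlines never occur in a row here, so nothing is split at all.
one_chunk = sample.split("\n\n\n")

# Stripping leading/trailing whitespace first removes the empty chunk.
clean = sample.strip().split("\n\n")

print(error)
print(len(chunks), len(one_chunk), len(clean))
```

If this is what happened, stripping the text before splitting on '\n\n' would keep the blocks separate without raising.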

Jonathon Over a year ago

The only problem I see is that it puts all the data into a single dataframe under "filename and date" instead of a dict of multiple dataframes labeled by the different filename/date. Any suggestions?

piRSquared Over a year ago

I've shown one way to split it up. You are only limited by your imagination. Think of how you'd like to have it and I can help you get it there. Usually, an OP (that's you) provides what they expect to see as the final result.

Jonathon Over a year ago

I want to see exactly how you placed it. I assumed that your result would put each data block into a dataframe which would all be placed into a dict. Using your help, the code only makes one dict with "filename and date" as the label and then puts all the data into a dataframe with "filename2 and date2" and so on placed as values under the "Header1" "Header2". I edited my question using your answer to show what the problem is vs what I am looking for.
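
For the result described here (one dataframe per block, keyed by its "filename and date" line), a more forgiving variant of the same idea could split on any run of blank lines and skip empty chunks. This is only a sketch under assumptions: load_blocks is a hypothetical helper name, the sample text is made up, and it has not been run against the asker's actual file.

```python
import io
import re
import pandas as pd

def load_blocks(text):
    """Parse blank-line-separated blocks into {title line: DataFrame}."""
    frames = {}
    # \n\s*\n matches one or more blank (or whitespace-only) lines between blocks.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        if not chunk.strip():
            continue                          # ignore empty chunks instead of failing
        title, *table_lines = chunk.splitlines()
        frames[title.strip()] = pd.read_csv(
            io.StringIO("\n".join(table_lines)), sep=r"\s+"
        )
    return frames

# Made-up sample with two blank lines between blocks and a trailing blank line.
sample = """Filename and date
Header1   Header2   Header3
a1        b1        c1
a2        b2        c2
a3        b3        c3


Filename2 and date2
Header1   Header2   Header3
x1        y1        z1
x2        y2        z2

"""

dfs = load_blocks(sample)
for title, frame in dfs.items():
    print(title, frame.shape)
```

With a real file, the text would come from something like open('my.csv').read(); checking len(dfs) against the number of blocks is a quick way to see whether the split matched.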
