I have a single text file that looks like:

Filename and date
Header1   Header2   Header3
data      data      data
data      data      data
data      data      data

Filename2 and date2
Header1   Header2   Header3
data      data      data
data      data      data

..and so on for many blocks of data.

I am using pd.read_csv to open the file.

How do I split this into separate dataframes? The headers for each dataframe will be identical, but the "filename and date" need to be preserved for each one. Also, there is a different number of lines of data for each file.

I tried:

dfs = {
    k: pd.read_csv(pd.io.common.StringIO('\n'.join(dat)), delim_whitespace=True)
    for k, *dat in map(str.splitlines, open('my.csv').read().split('\n\n'))
}

but I am getting:

                  Header1 Header2 Header3
Filename and date   0    data    data    data
                    1    data    data    data
                    2    data    data    data
                   Filename2 and date2    
                   1    data    data    data
                   2    data    data    data

Here "Filename2", "and", and "date2" end up under the "Header1", "Header2", and "Header3" columns, as if everything were one dataframe under a single dict key, instead of the dict holding a separate dataframe for each block.

1 Answer

Use a dictionary comprehension:

import io
import pandas as pd

# The first line of each block becomes the key; the rest is parsed as a table.
dfs = {
    k: pd.read_csv(io.StringIO('\n'.join(dat)), sep=r'\s+')
    for k, *dat in map(str.splitlines, open('my.csv').read().split('\n\n'))
}

Proof that it worked

pd.concat(dfs)

                      Header1 Header2 Header3
Filename and date   0    data    data    data
                    1    data    data    data
                    2    data    data    data
Filename2 and date2 0    data    data    data
                    1    data    data    data
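
Note that pd.concat is only used above to display everything at once; the dict itself holds one dataframe per block. A minimal sketch of pulling the frames back out individually, with the sample text inlined as a string in place of my.csv and made-up cell values (both are assumptions so the snippet runs on its own):

```python
import io
import pandas as pd

# Sample text standing in for the contents of my.csv (assumed layout and values).
text = (
    "Filename and date\n"
    "Header1   Header2   Header3\n"
    "a1        b1        c1\n"
    "a2        b2        c2\n"
    "a3        b3        c3\n"
    "\n"
    "Filename2 and date2\n"
    "Header1   Header2   Header3\n"
    "d1        e1        f1\n"
    "d2        e2        f2\n"
)

dfs = {
    k: pd.read_csv(io.StringIO("\n".join(dat)), sep=r"\s+")
    for k, *dat in map(str.splitlines, text.strip().split("\n\n"))
}

# Each key maps to its own DataFrame with its own number of rows.
for name, df in dfs.items():
    print(name, df.shape)

# Look up a single block by its "filename and date" line.
first = dfs["Filename and date"]
```

If the keys should be split further (for example into a filename and a date), that can be done on the key string before it goes into the dict.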

7 Comments

I get for k, *dat in map(str.splitlines, open(outfile).read().split('\n\n')) ValueError: not enough values to unpack (expected at least 1, got 0). Is this because some of the columns have no data where other columns do?
^^^ I just needed to add another '\n' to the split.
The only problem I see is that it puts all the data into a single dataframe under "filename and date" instead of a dict of multiple dataframes labeled by the different filename/date. Any suggestions?
I've shown one way to split it up. You are only limited by your imagination. Think of how you'd like to have it and I can help you get it there. Usually, an OP (that's you) provides what they expect to see as the final result.
I want to see exactly how you placed it. I assumed that your result would put each data block into a dataframe which would all be placed into a dict. Using your help, the code only makes one dict with "filename and date" as the label and then puts all the data into a dataframe with "filename2 and date2" and so on placed as values under the "Header1" "Header2". I edited my question using your answer to show what the problem is vs what I am looking for.
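
One possible cause of both symptoms in this thread (an assumption, since the actual file isn't shown) is that the blocks are not always separated by exactly one blank line. An empty chunk gives an empty list from str.splitlines, which is what raises "not enough values to unpack", and a separator that never matches leaves several blocks glued together under the first key. A sketch that splits on any run of blank or whitespace-only lines instead; split_blocks is a hypothetical helper name and the sample values are made up:

```python
import io
import re
import pandas as pd

def split_blocks(text):
    """Return {title: DataFrame} for blank-line-separated blocks (hypothetical helper)."""
    frames = {}
    # Split on one or more blank lines, including lines that contain only whitespace.
    for block in re.split(r"\n\s*\n", text.strip()):
        title, *rows = block.splitlines()
        frames[title.strip()] = pd.read_csv(io.StringIO("\n".join(rows)), sep=r"\s+")
    return frames

# Assumed sample: two blank lines after the first block, a whitespace-only
# line after the second.
sample = (
    "Filename and date\n"
    "Header1   Header2   Header3\n"
    "a1        b1        c1\n"
    "\n\n"
    "Filename2 and date2\n"
    "Header1   Header2   Header3\n"
    "d1        e1        f1\n"
    "d2        e2        f2\n"
    "   \n"
    "Filename3 and date3\n"
    "Header1   Header2   Header3\n"
    "g1        h1        i1\n"
)

frames = split_blocks(sample)
for title, df in frames.items():
    print(title, df.shape)
```

For a real file, the text would come from something like open('my.csv').read(). A block that contains only a title line and no header would still make read_csv fail, so this sketch assumes every block has at least its header row.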