I have a string:

              C1     C2                       DATE     C4     C5         C6      C7
0            0.0    W04  2021-01-08 00:00:00+00:00      E    EUE         C1     157
1            0.0    W04  2021-01-08 00:00:00+00:00      E    AEU         C1     157
2            0.0    W04  2021-01-01 00:00:00+00:00      E   SADA         H1     747
3            0.0    W04  2021-01-04 00:00:00+00:00      E   SSEA         H1     747
4            0.0    W04  2021-01-05 00:00:00+00:00      E   GPEA         H1     747

It looks like a pandas DataFrame because it was printed from one. I need to convert it back into a pandas DataFrame.

I tried the following:

pd.read_csv(StringIO(string_file),sep=r"\s+")

but it misaligns the columns and splits the DATE column into two columns.

  • Use sep=r"\s\s+" Commented Feb 2, 2021 at 18:23
  • @SayandipDutta thank you, it works on the body, but the header is still messed up. It looks right-aligned and one-space-separated. Commented Feb 2, 2021 at 18:30
  • I copied this data and tried reading it using StringIO and sep=r'\s\s+'; I can't reproduce your problem. Works fine for me. Commented Feb 2, 2021 at 18:32
  • pandas has read_fwf, where you can specify the breakpoints for columns. This looks like a solution for your use case. Commented Feb 6, 2021 at 23:37

1 Answer

First, recreate the string:

s = """
              C1     C2                       DATE     C4     C5         C6      C7
0            0.0    W04  2021-01-08 00:00:00+00:00      E    EUE         C1     157
1            0.0    W04  2021-01-08 00:00:00+00:00      E    AEU         C1     157
2            0.0    W04  2021-01-01 00:00:00+00:00      E   SADA         H1     747
3            0.0    W04  2021-01-04 00:00:00+00:00      E   SSEA         H1     747
4            0.0    W04  2021-01-05 00:00:00+00:00      E   GPEA         H1     747
"""

Now you can use pandas.read_csv to read the string through an in-memory buffer:

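from io import StringIO
import pandas as pd
# Split on runs of two or more whitespace characters; engine="python" avoids the regex-separator ParserWarning
df = pd.read_csv(StringIO(s), sep=r"\s\s+", engine="python")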

From what I can tell, this results in exactly the DataFrame that you are looking for:

Screenshot of resulting DataFrame
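
The reason this separator works can be checked without pandas. As a minimal sketch, splitting one data row from the string above with each pattern shows that \s+ also breaks on the single space inside the DATE value, while \s\s+ (two or more whitespace characters) only breaks on the wider gaps between columns. The variable names here are purely illustrative.

```python
import re

# One data row copied from the string above.
row = "0            0.0    W04  2021-01-08 00:00:00+00:00      E    EUE         C1     157"

# One or more whitespace characters: the space between date and time is a break.
loose = re.split(r"\s+", row)

# Two or more whitespace characters: date and time stay together in one field.
strict = re.split(r"\s\s+", row)

print(len(loose), loose[3:5])
print(len(strict), strict[3])
```

The second split yields one field per column plus the leading index value, which is what read_csv needs.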

You may want to convert the DATE column to datetime values as well:

# pd.to_datetime parses the "+00:00" offset and returns timezone-aware values
df['DATE'] = pd.to_datetime(df['DATE'])
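
Putting the steps together, here is a self-contained sketch that parses the string and checks the result. It uses pd.to_datetime with utc=True for the conversion, which is one possible choice given that every sample timestamp carries a +00:00 offset; the name text is just for illustration.

```python
from io import StringIO

import pandas as pd

text = """\
              C1     C2                       DATE     C4     C5         C6      C7
0            0.0    W04  2021-01-08 00:00:00+00:00      E    EUE         C1     157
1            0.0    W04  2021-01-08 00:00:00+00:00      E    AEU         C1     157
2            0.0    W04  2021-01-01 00:00:00+00:00      E   SADA         H1     747
3            0.0    W04  2021-01-04 00:00:00+00:00      E   SSEA         H1     747
4            0.0    W04  2021-01-05 00:00:00+00:00      E   GPEA         H1     747
"""

# A regex separator needs the Python parsing engine, so request it explicitly.
# The data rows have one more field than the header, so pandas treats the
# first field of each row as the index.
df = pd.read_csv(StringIO(text), sep=r"\s\s+", engine="python")

# Convert the DATE strings to timezone-aware datetimes (UTC).
df["DATE"] = pd.to_datetime(df["DATE"], utc=True)

print(df.shape)
print(df.dtypes)
```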

1 Comment

One caveat with this method: you may run into issues if a column name is longer than the data beneath it. In that case, you may have to manually add some spaces to the header row so that the columns line up with the data.
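
One way to avoid editing the header by hand is to not rely on its spacing at all. The following is a rough sketch, assuming a hypothetical variant of the input whose header is separated by single spaces and whose column names contain no spaces: the header line is split on any whitespace to get the names, and only the body is parsed with the two-or-more-whitespace separator. The sample text and the "idx" label for the unnamed index column are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical input: single-space header, body separated by two spaces.
text = """\
C1 C2 DATE C4 C5 C6 C7
0  0.0  W04  2021-01-08 00:00:00+00:00  E  EUE  C1  157
1  0.0  W04  2021-01-08 00:00:00+00:00  E  AEU  C1  157
2  0.0  W04  2021-01-01 00:00:00+00:00  E  SADA  H1  747
"""

# Separate the header line from the data rows.
header, body = text.split("\n", 1)

# "idx" is a placeholder name for the leading index field, which has no header.
names = ["idx"] + header.split()

df = pd.read_csv(
    StringIO(body),
    sep=r"\s\s+",
    engine="python",
    header=None,
    names=names,
    index_col="idx",
)
df.index.name = None  # drop the placeholder label again

print(df)
```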
