
The pandas DataFrame object has a to_string() method, which its __repr__ magic method calls. Thus when I write x = f'{df}', x will be the string representation of the dataframe df.

How can I retrieve (reconstruct) the dataframe from x alone? I would like a function get_dataframe_from_string(df: str) -> pd.DataFrame that takes the string and returns the dataframe.

The function should be generic, so it should also work with MultiIndexes.

  • why do you want to do that? Commented Dec 21, 2021 at 10:07
  • An external dependency raises a warning whose message contains the df's string representation. I'd like to catch that warning and log that df. Commented Dec 22, 2021 at 13:00

2 Answers


TL;DR

Use df.to_csv() instead of df.__str__(). The resulting CSV string can then be turned back into a dataframe with pd.read_csv.

str(df) won't work

The short answer is: you can't. At least not with pandas' builtin string representation.

The reason is that df.__repr__ has no (mathematical) inverse: for large frames it truncates the output, so information is lost:

import pandas as pd


df = pd.DataFrame.from_dict(dict(x=range(100), y=range(100)))
print(df)
#      x   y
# 0    0   0
# 1    1   1
# 2    2   2
# 3    3   3
# 4    4   4
# ..  ..  ..
# 95  95  95
# 96  96  96
# 97  97  97
# 98  98  98
# 99  99  99

There is no way to know what rows 5-94 contain.
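To make the loss concrete, here is a small check comparing repr(df) with df.to_string(). It is a sketch that assumes pandas' default display options (display.max_rows = 60). The truncated repr replaces the middle rows with ".." and appends a shape footer, while to_string() renders every row by default:

```python
import pandas as pd

df = pd.DataFrame({"x": range(100), "y": range(100)})

# repr() honours display.max_rows (60 by default): a 100-row frame is
# truncated, the middle rows become ".." and a shape footer is appended.
truncated = repr(df)

# to_string() ignores the display limits by default and renders every row.
full = df.to_string()

print(truncated)
print(len(full.splitlines()))  # header line + 100 data lines
```

So even a complete parser for the text format could not recover rows that were never printed.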

A solution: df.to_csv

One could come up with hacks to work around it, but IMO the only sensible way to do this is to use a well-known pandas round-trip method such as to_csv:

str_df = df.to_csv()
print(str_df)
# ,x,y
# 0,0,0
# 1,1,1
# 2,2,2
# 3,3,3

where str_df contains all the data (I truncated the output).

Then you can get your original dataframe back using io and read_csv:

import io

original_df = pd.read_csv(io.StringIO(str_df))
print(original_df)
#     Unnamed: 0   x   y
# 0            0   0   0
# 1            1   1   1
# 2            2   2   2
# 3            3   3   3
# 4            4   4   4
# ..         ...  ..  ..
# 95          95  95  95
# 96          96  96  96
# 97          97  97  97
# 98          98  98  98
# 99          99  99  99

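Note that the Unnamed: 0 column is present because to_csv wrote the row index as an unnamed first column. You can exclude it when writing, with df.to_csv(index=False), or restore it as the index when reading, with pd.read_csv(io.StringIO(str_df), index_col=0).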
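Since the question asks for MultiIndex support: the same to_csv/read_csv round trip should also work for MultiIndex rows and columns, provided the reader passes index_col and header lists whose lengths match the number of index and column levels. A minimal sketch follows. The level names and values are made up for illustration:

```python
import io

import pandas as pd

# Example frame with a two-level row index and two-level columns
# (level names and values are made up for illustration).
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
columns = pd.MultiIndex.from_product([["x", "y"], ["min", "max"]])
df = pd.DataFrame(
    [[row * 4 + col for col in range(4)] for row in range(4)],
    index=index,
    columns=columns,
)

str_df = df.to_csv()

# index_col / header must list as many positions as there are index / column levels.
restored = pd.read_csv(io.StringIO(str_df), index_col=[0, 1], header=[0, 1])

print(restored.equals(df))
```

The catch is that the reader has to know the number of levels in advance, because CSV does not record it. CSV also does not store dtypes, so datetime columns, for example, come back as strings unless you pass parse_dates.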


pandas basically does this in its read_clipboard function: it constructs a DataFrame from a text string, so you should be able to adopt whatever happens after this line.
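For the actual use case (parsing a repr captured from a warning message), here is a minimal sketch of the question's get_dataframe_from_string along those lines. The parameter is renamed to text here. It assumes, as read_clipboard does by default, that fields are separated by runs of whitespace (sep=r"\s+"). It only works if the text is complete (not truncated with ".."), the frame has a single-level row index, and no label or value contains whitespace. MultiIndex reprs, which leave repeated labels blank, are not handled:

```python
import io

import pandas as pd


def get_dataframe_from_string(text: str) -> pd.DataFrame:
    """Parse the plain-text rendering of a simple DataFrame.

    Assumes the text is not truncated, the row index has a single level,
    and no label or value contains whitespace.
    """
    # sep=r"\s+" splits on any run of whitespace (leading spaces are ignored).
    # The header line has one field fewer than the data lines, so read_csv
    # uses the first column as the index.
    return pd.read_csv(io.StringIO(text), sep=r"\s+")


df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})
restored = get_dataframe_from_string(df.to_string())
print(restored.equals(df))
```

Values come back with inferred dtypes, so anything whose printed form is ambiguous (strings that look like numbers, rounded floats) may not round-trip exactly.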
