25

I want to convert a Pandas DataFrame into a list of objects.

This is my class:

class Reading:

    def __init__(self):
        self.HourOfDay: int = 0
        self.Percentage: float = 0

I read up on .to_dict, so I tried

df.to_dict(into=Reading)

but it raised

TypeError: unsupported type

I don't want a list of tuples, or a list of dicts, but a list of Readings. Every question I've found so far seems to be about these two scenarios. But I want my own typed objects.

Thanks


3 Answers

26

Option 1: make Reading inherit from collections.abc.MutableMapping (the old collections.MutableMapping alias was removed in Python 3.10) and implement the abstract methods that base class requires. That seems like a lot of work.

Option 2: Call Reading() in a list comprehension:

>>> import pandas as pd
>>> 
>>> df = pd.DataFrame({
...     'HourOfDay': [5, 10],
...     'Percentage': [0.25, 0.40]
... })
>>> 
>>> class Reading(object):
...     def __init__(self, HourOfDay: int = 0, Percentage: float = 0):
...         self.HourOfDay = int(HourOfDay)
...         self.Percentage = Percentage
...     def __repr__(self):
...         return f'{self.__class__.__name__}> (hour {self.HourOfDay}, pct. {self.Percentage})'
... 
>>> 
>>> readings = [Reading(**kwargs) for kwargs in df.to_dict(orient='records')]
>>> 
>>> 
>>> readings
[Reading> (hour 5, pct. 0.25), Reading> (hour 10, pct. 0.4)]
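If writing the constructor by hand gets tedious, a dataclass could generate it from the field declarations. This is only a sketch, and it assumes the field names match the DataFrame column names exactly:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Reading:
    # Field names must match the DataFrame column names exactly.
    HourOfDay: int = 0
    Percentage: float = 0.0


df = pd.DataFrame({"HourOfDay": [5, 10], "Percentage": [0.25, 0.40]})

# Each record is a dict keyed by column name, so it unpacks into the fields.
readings = [Reading(**record) for record in df.to_dict(orient="records")]
print(readings)
```

The dataclass also supplies __repr__ and __eq__, so no hand-written __repr__ is needed. A column with no matching field would make the constructor raise a TypeError.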

From the docs, on the into parameter of to_dict:

into: The collections.abc.Mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.
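In other words, into only chooses which mapping type holds the data. Reading is not a mapping, which is why df.to_dict(into=Reading) fails with "TypeError: unsupported type". A small sketch of what into is meant for, using OrderedDict as the stand-in mapping:

```python
from collections import OrderedDict

import pandas as pd

df = pd.DataFrame({"HourOfDay": [5, 10], "Percentage": [0.25, 0.40]})

# `into` only swaps the mapping type; it does not construct arbitrary objects.
records = df.to_dict(orient="records", into=OrderedDict)
print(type(records[0]).__name__)
```

The result is still a list of mappings, so building Reading objects takes the extra step shown in Option 2.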


5 Comments

Your answer matched my needs perfectly! Thank you very much! Just to explain: I'm trying to convert some dataframes to "object"-like formats in order to prepare them to be used as "data" for an OpenOffice template, using the py3o.template library. By the way, is there a way to automate the initialization of the class "columns"?
This should be marked as the valid answer
@linSESH I no longer use Python and was a beginner when I asked this question. Given how popular this question has become, if you can explain to me why this answer is better than the accepted one, I will happily accept this one instead
@zola25 It proposes 2 solutions, and both are better than the accepted one IMO. The second one is the same but just more elegant.
@linSESH thanks for the input, on reflection I think the most recent answer is the best
21

Given a DataFrame with two columns, HourOfDay and Percentage, and a parameterized constructor on your class, you can build the list of objects like this:

class Reading:

    def __init__(self, h, p):
        self.HourOfDay = h
        self.Percentage = p

listOfReading = [Reading(row.HourOfDay, row.Percentage) for index, row in df.iterrows()]
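One thing to watch with iterrows() is that each row comes back as a Series, so a frame mixing int and float columns hands HourOfDay over as a float. A possible alternative, sketched here with the same class and made-up sample data, is itertuples():

```python
import pandas as pd


class Reading:
    def __init__(self, h, p):
        self.HourOfDay = h
        self.Percentage = p


df = pd.DataFrame({"HourOfDay": [5, 10], "Percentage": [0.25, 0.40]})

# iterrows() packs each row into a Series, so the integer column is upcast to float.
via_iterrows = [Reading(row.HourOfDay, row.Percentage) for _, row in df.iterrows()]

# itertuples() yields one namedtuple per row and leaves each column's type alone.
via_itertuples = [
    Reading(row.HourOfDay, row.Percentage) for row in df.itertuples(index=False)
]
```

itertuples() also avoids building a Series per row, which usually makes it noticeably quicker on large frames.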

1 Comment

this more generic approach worked for me stackoverflow.com/a/75420677/98232
14

It would probably be better to initialise the class with arguments, as follows:

import pandas as pd

class Reading:
    def __init__(self, h, p):
        self.HourOfDay = h
        self.Percentage = p

Then, to create a list of readings, you could use this function, which takes the DataFrame as an argument:

def reading_list(df: pd.DataFrame) -> list:
    # Each row arrives as a plain Python list: [HourOfDay, Percentage]
    return list(map(lambda x: Reading(h=x[0], p=x[1]), df.values.tolist()))

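Execution is fast, even with a large dataset, because df.values.tolist() converts the whole frame to Python lists in one call instead of building a Series per row as iterrows() does. Note that df.values upcasts mixed int and float columns to a common float dtype, so HourOfDay arrives as a float.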
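If the class takes its arguments in column order, the same idea can be written without hard-coding the positions. The helper name frame_to_objects below is hypothetical, and the sketch assumes the constructor accepts one positional argument per column:

```python
import pandas as pd


class Reading:
    def __init__(self, h, p):
        self.HourOfDay = h
        self.Percentage = p


def frame_to_objects(df: pd.DataFrame, cls: type) -> list:
    """Build one `cls` instance per row, passing column values positionally."""
    # name=None yields plain tuples, which skips namedtuple construction.
    return [cls(*row) for row in df.itertuples(index=False, name=None)]


df = pd.DataFrame({"HourOfDay": [5, 10], "Percentage": [0.25, 0.40]})
readings = frame_to_objects(df, Reading)
```

Because itertuples() reads each column separately, HourOfDay stays an integer here rather than being upcast to float.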

2 Comments

This is insanely fast! I just switched to this from reading_objects = reading_df.progress_apply(lambda row: Reading(*row.to_list()), axis=1) and got a fourfold speedup! (progress_apply is apply with a tqdm progress bar, and I am still wrapping df.values.tolist() in tqdm(), so the progress bar cannot be the reason.)
I'm amazed this isn't the accepted answer. Elegant and very fast.
