How do I convert a Python object whose fields are instances of other classes into a DataFrame? I tried the code below, but it does not work.

It does work if I take out self.address = Address() and self.agency_contact_info = ContactInfo().

class Address:
    def __init__(self):
        self.address_one = "address 1"
        self.address_two = "P.O. BOX 1"                  

class ContactInfo:
    def __init__(self):
        self.person_name = "Me"
        self.phone_number = "999-999-9999"    

class AgencyRecord:
    def __init__(self):
        self.agency_code = "00"
        self.agency_id = "000"
        self.agency_name = "Some Agency"
        self.address = Address()
        self.agency_contact_info = ContactInfo()            

def create_data():
    data = {}

    for i in range(0, 3):
        alc = AgencyRecord()                    
        data[i] = alc   

    column_list = [
        'agency_code', 'agency_id', 'agency_name', 
        'address_one', 'address_two', 'person_name', 'phone_number'
    ]

    spark.createDataFrame(
        list(data.values()),
        column_list
    ).createOrReplaceTempView("MyTempTable")

1 Answer

Quoting myself again:

I find it's useful to think of the argument to createDataFrame() as a list of iterables, where each entry in the list corresponds to a row in the DataFrame and each element of the iterable corresponds to a column.


So you need to convert each of your objects into an iterable whose elements correspond, in order, to the columns in column_list.
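
As a minimal sketch of that shape, here is what the input could look like when written out by hand. The name `rows` is made up for illustration, and the final call is left as a comment because it assumes an active SparkSession named `spark`, as in the question.

```python
# Column names taken from the question's column_list.
column_list = [
    "agency_code", "agency_id", "agency_name",
    "address_one", "address_two", "person_name", "phone_number",
]

# One tuple per DataFrame row; each element lines up with one column name.
rows = [
    ("00", "000", "Some Agency", "address 1", "P.O. BOX 1", "Me", "999-999-9999"),
    ("00", "000", "Some Agency", "address 1", "P.O. BOX 1", "Me", "999-999-9999"),
]

# Every row must supply exactly one value per column.
assert all(len(row) == len(column_list) for row in rows)

# With an active SparkSession named `spark` (an assumption here):
# df = spark.createDataFrame(rows, column_list)
```

An AgencyRecord instance is not an iterable of seven values like these, so Spark has no way to line its nested fields up with the column names.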

I wouldn't necessarily endorse it (there's almost surely a better way), but here is one hacky approach you can take to modify your code accordingly:

You can take advantage of the fact that Python objects have a __dict__ attribute that maps attribute names to values, so you can look up attributes by name. First, update your AgencyRecord class to pull in the fields from the Address and ContactInfo classes:

class AgencyRecord:
    def __init__(self):
        self.agency_code = "00"
        self.agency_id = "000"
        self.agency_name = "Some Agency"
        self.address = Address()
        self.agency_contact_info = ContactInfo()

        # makes the variables of the contained classes members of this class
        self.__dict__.update(self.address.__dict__)
        self.__dict__.update(self.agency_contact_info.__dict__)
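
To see what the update() calls do, here is a cut-down sketch with only the Address part (the variable names `record` and `stale` are illustrative). It also shows one caveat of this hack: the nested values are copied once, at construction time, so later changes to the nested object are not reflected at the top level. Likewise, if two classes shared an attribute name, the later update() would overwrite the earlier value.

```python
class Address:
    def __init__(self):
        self.address_one = "address 1"
        self.address_two = "P.O. BOX 1"


class AgencyRecord:
    def __init__(self):
        self.agency_name = "Some Agency"
        self.address = Address()
        # Copy the nested object's attributes onto this instance.
        self.__dict__.update(self.address.__dict__)


record = AgencyRecord()

# The nested field is now readable directly on the record.
top_level = record.address_one

# Rebinding the nested attribute does not touch the copied one.
record.address.address_one = "changed"
stale = record.address_one  # still the original "address 1"
```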

Now we can reference each column in column_list by name for any instance of an AgencyRecord.

Modify create_data as follows (I've also changed it to return a DataFrame rather than registering a temp view):

def create_data():
    data = {}

    for i in range(0, 3):
        alc = AgencyRecord()                    
        data[i] = alc   

    column_list = [
        'agency_code', 'agency_id', 'agency_name', 
        'address_one', 'address_two', 'person_name', 'phone_number'
    ]

    values = [
        [data[record].__dict__[c] for c in column_list]
        for record in data
    ]

    return spark.createDataFrame(values, column_list)

Now you can do:

temp_df = create_data()
temp_df.show()
#+-----------+---------+-----------+-----------+-----------+-----------+------------+
#|agency_code|agency_id|agency_name|address_one|address_two|person_name|phone_number|
#+-----------+---------+-----------+-----------+-----------+-----------+------------+
#|         00|      000|Some Agency|  address 1| P.O. BOX 1|         Me|999-999-9999|
#|         00|      000|Some Agency|  address 1| P.O. BOX 1|         Me|999-999-9999|
#|         00|      000|Some Agency|  address 1| P.O. BOX 1|         Me|999-999-9999|
#+-----------+---------+-----------+-----------+-----------+-----------+------------+
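
If you would rather not modify the class itself, one possible alternative is to flatten each object from the outside. This is only a sketch: `flatten` is a hypothetical helper written for this example, not part of Spark or the standard library, and it assumes that nested attribute names never collide (a later name would silently overwrite an earlier one).

```python
class Address:
    def __init__(self):
        self.address_one = "address 1"
        self.address_two = "P.O. BOX 1"


class ContactInfo:
    def __init__(self):
        self.person_name = "Me"
        self.phone_number = "999-999-9999"


class AgencyRecord:
    def __init__(self):
        self.agency_code = "00"
        self.agency_id = "000"
        self.agency_name = "Some Agency"
        self.address = Address()
        self.agency_contact_info = ContactInfo()


def flatten(obj):
    """Return a flat dict of obj's attributes, descending into nested objects."""
    flat = {}
    for name, value in vars(obj).items():
        if hasattr(value, "__dict__"):
            # Nested object: merge its attributes instead of keeping the object.
            flat.update(flatten(value))
        else:
            flat[name] = value
    return flat


column_list = [
    "agency_code", "agency_id", "agency_name",
    "address_one", "address_two", "person_name", "phone_number",
]

records = [AgencyRecord() for _ in range(3)]

# Build one list per row, ordered to match column_list.
rows = [[flat[c] for c in column_list] for flat in map(flatten, records)]

# With an active SparkSession named `spark` (an assumption here):
# df = spark.createDataFrame(rows, column_list)
```

This leaves AgencyRecord exactly as the question defined it and keeps the flattening logic in one place.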

3 Comments

Could simplify data = {i: AgencyRecord() for i in range(3)}
@cricket_007 sure, but I just copied and pasted that part of OP's code.
I know :) Just providing a comment
