How to define empty pandas DataFrame in function?

Question

My method

def myfunc(filename, filepath):
    result_df = pd.DataFrame()
    with open(filename, encoding='utf-8', mode='r') as i:
        data = pd.read_json(i, lines=True)
        result_df.append(data)
        table_from_pandas = pa.Table.from_pandas(result_df)
        pq.write_table(table_from_pandas,filepath)
        return result_df

PyCharm shows:

(<class 'NameError'>, NameError("name 'result_df' is not defined"), <traceback object at 0x1135a0500>)

From the Python shell, everything works fine. But I need to somehow define my df in advance in order to use my method. This is my code:

if __name__ == '__main__':
    files = os.listdir('/Users/milenko/mario/Json_gzips')
    files = [fi for fi in files if fi.endswith(".gz")]

    my_dict = {'ticr_calculated_2': 'ticr-2.parquet', 'ticr_calculated_3': 'ticr-3.parquet', \
               'ticr_calculated_4': 'ticr-4.parquet', 'tick_calculated_2': 'tick-2.parquet', \
               'tick_calculated_3': 'tick-3.parquet', 'tick_calculated_4': 'tick-4.parquet'}
basic = '/Users/milenko/mario/Json_gzips/'
json_fi = glob.glob("*.json")

for key, value in my_dict.items():
    for f in json_fi:
        if re.match(key, f):
            filepath = basic + value
            myfunc(f, filepath)

How to solve this?

It's a bit hard to understand what your issue is. Did you forget to return your result_df by any chance?
– Thomas Schillaci, Commented Jun 19, 2020 at 9:13

Gustav Rasmussen · Accepted Answer · 2020-06-19 09:24:19Z

Here is a small example of how to append data to an empty DataFrame. You need to specify column names when defining result_df:

import pandas as pd


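def myfunc():
    # Create an empty DataFrame with the column names fixed up front
    result_df = pd.DataFrame([], columns=["a", "b"])
    data = [5, 6]
    # The next free integer label equals the current row count
    df_length = len(result_df)
    # Assigning to a new label via .loc adds the row in place
    result_df.loc[df_length] = data
    return result_df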


print(myfunc())

Returning

   a  b
0  5  6
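
A note on the question's code: `result_df.append(data)` returned a new DataFrame instead of changing `result_df` in place, so the result was discarded, and `DataFrame.append` was removed in pandas 2.0. A minimal sketch of a common alternative, collecting frames in a list and calling `pd.concat` once, is shown here. The in-memory JSON-lines strings and the name `load_frames` are made up for illustration and are not from the question.

```python
import io

import pandas as pd

# Hypothetical JSON-lines sources standing in for the real files
sources = [
    io.StringIO('{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n'),
    io.StringIO('{"a": 5, "b": 6}\n'),
]


def load_frames(handles):
    # Collect one DataFrame per source instead of growing an empty one
    frames = []
    for handle in handles:
        frames.append(pd.read_json(handle, lines=True))
    # concat returns a new DataFrame, so its result must be kept
    return pd.concat(frames, ignore_index=True)


result_df = load_frames(sources)
print(result_df)
```

With this pattern no column names have to be declared in advance, since they come from the data that is read.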

edited Jun 19, 2020 at 9:24

answered Jun 19, 2020 at 9:19

Gustav Rasmussen

2 Comments

Djikii Over a year ago

My problem is a little bit different. I will have six different parquet files to write to, and the schema should be inferred.

Gustav Rasmussen Over a year ago

This minimal example can be built upon, I suppose. After declaring and assigning result_df, you can set up the context manager that reads your input file and write that data into result_df.
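
One way the two comments above might be combined is sketched below, under stated assumptions: the file is read inside a context manager, the frame returned by `pd.read_json` is used directly rather than being copied into an empty DataFrame, and the output name is looked up by filename prefix. The helper names `read_json_lines` and `output_name`, the shortened mapping, and the sample file are hypothetical. `str.startswith` is swapped in for the question's `re.match`, which behaves the same for these plain-text keys.

```python
import os
import tempfile

import pandas as pd

# Prefix-to-output mapping, shortened from the question's my_dict
my_dict = {
    "ticr_calculated_2": "ticr-2.parquet",
    "tick_calculated_2": "tick-2.parquet",
}


def read_json_lines(filename):
    # read_json already returns a DataFrame with inferred columns and dtypes
    with open(filename, encoding="utf-8", mode="r") as handle:
        return pd.read_json(handle, lines=True)


def output_name(filename, mapping):
    # Return the output file for the first matching prefix, else None
    base = os.path.basename(filename)
    for prefix, target in mapping.items():
        if base.startswith(prefix):
            return target
    return None


# Hypothetical sample file so the sketch runs on its own
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "tick_calculated_2_sample.json")
with open(sample_path, "w", encoding="utf-8") as out:
    out.write('{"price": 1.5, "qty": 2}\n{"price": 2.5, "qty": 4}\n')

df = read_json_lines(sample_path)
target = output_name(sample_path, my_dict)
print(target, df.shape)
```

The returned frame could then be written with `df.to_parquet(...)`, which derives the parquet schema from the frame's dtypes and needs pyarrow or fastparquet installed. That step is left out of the sketch so it runs with pandas alone.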
