My method

def myfunc(filename, filepath):
    result_df = pd.DataFrame()
    with open(filename, encoding='utf-8', mode='r') as i:
        data = pd.read_json(i, lines=True)
        result_df.append(data)
        table_from_pandas = pa.Table.from_pandas(result_df)
        pq.write_table(table_from_pandas, filepath)
        return result_df

PyCharm shows:

(<class 'NameError'>, NameError("name 'result_df' is not defined"), <traceback object at 0x1135a0500>)

From the Python shell, everything works fine. But I need to somehow define my DataFrame in advance in order to use my method. This is my code:

if __name__ == '__main__':
    files = os.listdir('/Users/milenko/mario/Json_gzips')
    files = [fi for fi in files if fi.endswith(".gz")]

    my_dict = {'ticr_calculated_2': 'ticr-2.parquet', 'ticr_calculated_3': 'ticr-3.parquet',
               'ticr_calculated_4': 'ticr-4.parquet', 'tick_calculated_2': 'tick-2.parquet',
               'tick_calculated_3': 'tick-3.parquet', 'tick_calculated_4': 'tick-4.parquet'}
    basic = '/Users/milenko/mario/Json_gzips/'
    json_fi = glob.glob("*.json")

    for key, value in my_dict.items():
        for f in json_fi:
            if re.match(key, f):
                filepath = basic + value
                myfunc(f, filepath)

How can I solve this?

  • How did you get this output? Commented Jun 19, 2020 at 9:09
  • Variables in PyCharm. Commented Jun 19, 2020 at 9:11
  • What line is causing that error message? Commented Jun 19, 2020 at 9:12
  • There is no error in the code. Will edit. Commented Jun 19, 2020 at 9:13
  • It's a bit hard to understand what your issue is. Did you forget to return your result_df by any chance? Commented Jun 19, 2020 at 9:13

1 Answer

Here is a small example of how to append data to an empty DataFrame. You need to specify the column names when defining result_df:

import pandas as pd


def myfunc():
    result_df = pd.DataFrame([], columns = ["a", "b"])
    data = [5, 6]
    df_length = len(result_df)
    result_df.loc[df_length] = data
    return result_df


print(myfunc())

This prints:

   a  b
0  5  6
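
If the columns should be inferred rather than declared up front, one possible alternative is to collect the frames in a list and combine them with pd.concat. This is only a sketch: combine_frames and the sample frames are hypothetical names, not part of the question's code. Note that pd.concat returns a new DataFrame and leaves its inputs untouched, so the result has to be assigned to a name.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate a list of DataFrames; columns are inferred from the inputs."""
    # pd.concat returns a new DataFrame; it does not modify its inputs.
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data standing in for frames read from files.
first = pd.DataFrame({"a": [5], "b": [6]})
second = pd.DataFrame({"a": [7], "b": [8]})
combined = combine_frames([first, second])
print(combined)
```

With a single input file there is nothing to combine, and the frame returned by the reader can be used directly.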
2 Comments

My problem is a little bit different. I will have six different parquet files to write to, and the schema should be inferred.
This minimal example can be built upon, I suppose. After declaring and assigning result_df, you can set up the context manager to read the file input as the data source for result_df.