I've run into a strange problem. The following code snippet fails when I execute the load_data.py file as a script, but when I run the same lines in an IDE console they work without problems. Hard-coding the file path works, and I have read answers that suggest appending the data path to PYTHONPATH, but I'm not sure that is a good solution because I want to deploy this in a Docker container.

Can someone help me figure out what seems to be the problem here?

import pandas as pd
from os import path

def _preprocess_data(data_path):

    try:
        ## load data ##
        data = pd.read_json(data_path)

    except ValueError:
         print("File not found. Check the path variable and filename")
         exit()

if __name__ == '__main__':
    print('Preprocessing data...')
    #### Preparation ####
    file_path = path.abspath('load_data.py')  # full path of your script
    dir_path = path.dirname(file_path)  # full path of the directory of your script
    json_file_path = path.join(dir_path, 'data/clean_data.json')  # absolute zip file path
    _preprocess_data(data_path=json_file_path)

5 Comments

  • What does 'does not run' mean? Do you get an error? What happens? Commented Jan 21, 2022 at 11:56
  • What OS are you on? Commented Jan 21, 2022 at 11:58
  • Sorry, I'm on macOS. The error is a ValueError. Commented Jan 21, 2022 at 12:02
  • Could you print each of the intermediate paths, please: print(file_path), print(dir_path), print(json_file_path)? Also, pathlib is the more modern way to handle paths. Commented Jan 21, 2022 at 12:06
  • This is the output: print(file_path) >>> /Users/YalDan/myproject/load_data.py; print(dir_path) >>> /Users/YalDan/myproject; print(json_file_path) >>> /Users/YalDan/myproject/data/clean_data.json. The full error message is ValueError: Expected object or value. I tracked it down to being the standard error message when pandas can't find the file path. Opening and parsing the JSON file in other ways yielded other errors that were also caused by not finding the file. Thanks for the pathlib suggestion, I will see if that helps. Commented Jan 21, 2022 at 12:48

1 Answer

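It looks like your code fails because you are executing it from a directory other than the one that contains load_data.py. path.abspath('load_data.py') resolves the name against the current working directory, not against the script's location, so the resulting path points to a location that doesn't exist. Try changing your code to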

import pandas as pd
from os import path

def _preprocess_data(data_path):

    try:
        ## load data ##
        data = pd.read_json(data_path)

    except ValueError:
        print("File not found. Check the path variable and filename")
        exit()

if __name__ == '__main__':
    print('Preprocessing data...')
    #### Preparation ####
    file_path = path.abspath(__file__)  # full path of your script
    dir_path = path.dirname(file_path)  # full path of the directory of your script
    print(dir_path)
    json_file_path = path.join(dir_path, 'data/clean_data.json')  # absolute JSON file path
    _preprocess_data(data_path=json_file_path)

This gets the absolute path of the file being executed, regardless of the directory you run it from. Refer to the Python documentation for more info on __file__.
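
As a commenter suggested, pathlib can express the same idea. The following is only a minimal sketch, not the asker's actual code: the helper name data_path_next_to is hypothetical, the data/clean_data.json layout is taken from the question, and a temporary directory stands in for the project folder. It also illustrates why path.abspath('load_data.py') depends on the working directory while a path derived from the script's location does not.

```python
import os
import tempfile
from pathlib import Path


def data_path_next_to(script_file, relative="data/clean_data.json"):
    """Return `relative` resolved against the directory of `script_file`.

    Hypothetical helper for illustration; in a real script you would
    pass __file__ as `script_file`.
    """
    return Path(script_file).resolve().parent / relative


# A temporary directory stands in for the project folder (assumption).
with tempfile.TemporaryDirectory() as project_dir:
    script = Path(project_dir).resolve() / "load_data.py"
    script.touch()

    start_dir = os.getcwd()

    # abspath() of a bare file name is resolved against the current
    # working directory, wherever the script actually lives.
    from_cwd = Path(os.path.abspath("load_data.py"))

    # A path built from the script's own location ignores the
    # working directory.
    from_script = data_path_next_to(script)

    os.chdir(project_dir)
    try:
        from_script_after_chdir = data_path_next_to(script)
    finally:
        os.chdir(start_dir)  # leave the directory before it is deleted
```

Inside load_data.py, the equivalent call would be json_file_path = data_path_next_to(__file__), which can be passed to pd.read_json as before.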

1 Comment

Thank you! This solved it for me in the IDE. Will read about __file__, this is very useful
