I have created a few global variables, as shown in my code below. However, when I try to use them inside the individual functions, I still get the same error.

Please find my code below:

def create_df():
  global sheet_name, sheet_df 
  for s in sheets:
     sheet_name = s
     sheet_df = pd.read_excel(xls, sheet_name=s)
     sheet_df = sheet_df[sheet_df.columns.difference(sheet_df.filter(like='Derived').columns,sort=False)]
     print("Sheet " + str(s) + " is created as a dataframe successfully")
     transform_stage_1_df()

def transform_stage_1_df():
    global sheet_df 
    sheet_df = pd.melt(sheet_df, id_vars='subject_ID', var_name='headers', value_name='dates')
    sheet_df['header_extracted'] = [x.split("Date")[0] for x in sheet_df['headers']]
    sheet_df['day'] = sheet_df['header_extracted'].str.extract('(\d+)', expand=True).astype(int)
    sheet_df = sheet_df[sheet_df.groupby(['subject_ID','header_extracted'])['dates'].transform('count').ne(0)].copy()
    sheet_df = sheet_df.sort_values(by=['subject_ID','day'])
    sheet_df.drop(['header_extracted', 'day'], axis=1, inplace=True)
    print("Stage 1 transformation is complete")


if __name__ == '__main__':
    print("Execution Started")
    print("File read successfully")
    sheets = xls.sheet_names
    sheet_name = sheet_df = Non_Cholesterol = None
    dataFramesDict = dict()
    create_df()
    add_units()
    Non_Cholesterol.to_csv('Output.csv')

Based on SO posts, I have already added the global keyword, but I still get UnboundLocalError: local variable 'sheet_df' referenced before assignment.

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-210-dc2f8412235d> in <module>
      7     sheet_df = None
      8     dataFramesDict = dict()
----> 9     create_df()
     10     Non_Cholesterol.to_csv('C:\\Users' + str('Non_cholesterol.csv'),index=None)
     11     print("Export successful")

<ipython-input-205-c93604f0da4f> in create_df()
      5         sheet_df =sheet_df[sheet_df.columns.difference(sheet_df.filter(like='Derived').columns,sort=False)]
      6         print("Sheet " + str(s) + " is created as a dataframe successfully")
----> 7         transform_stage_1_df()

<ipython-input-206-b59c70018a9b> in transform_stage_1_df()
      1 def transform_stage_1_df():
----> 2     sheet_df = pd.melt(sheet_df, id_vars='subject_ID', var_name='headers', value_name='dates')
      3     sheet_df['header_extracted'] = [x.split("Date")[0] for x in sheet_df['headers']]
      4     sheet_df['day'] = sheet_df['header_extracted'].str.extract('(\d+)', expand=True).astype(int)
      5     sheet_df = sheet_df[sheet_df.groupby(['subject_ID','header_extracted'])['dates'].transform('count').ne(0)].copy()

UnboundLocalError: local variable 'sheet_df' referenced before assignment
  • You put the global in the wrong place; you need to declare global inside the functions. Commented Aug 1, 2019 at 10:54
  • You mean wherever I use the variable, I have to use global? That is, once in each function where it is used? Commented Aug 1, 2019 at 10:55
  • Once at the top of each function, declare the variables you want to survive as global. I want to point out that this is a pretty bad programming practice; unless you have some extremely specific need, don't do this. Commented Aug 1, 2019 at 10:56
  • No, it's still the same error. Please see the updated code. Commented Aug 1, 2019 at 10:59
  • A little bit of both. Generally, try to avoid globals because they very quickly make code unmaintainable. Have your functions return whatever they changed, or pass in mutables for them to change. Commented Aug 1, 2019 at 11:12
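
The last comment's suggestion (return what a function changed instead of writing to globals) could look roughly like the sketch below. It is only an illustration under stated assumptions: plain lists of dicts stand in for the DataFrames so that it runs without pandas or an Excel file, and the names create_rows, transform_stage_1, raw_sheets and results are hypothetical, not taken from the question.

```python
def create_rows(sheet_name, raw_sheets):
    """Return the rows of one sheet with any 'Derived' columns dropped."""
    return [
        {key: value for key, value in row.items() if "Derived" not in key}
        for row in raw_sheets[sheet_name]
    ]


def transform_stage_1(rows):
    """Return a new list sorted by subject_ID; the input list is not modified."""
    return sorted(rows, key=lambda row: row["subject_ID"])


# Hypothetical stand-in for the workbook: sheet name -> list of row dicts.
raw_sheets = {
    "Sheet1": [
        {"subject_ID": 2, "Day1Date": "2019-01-02", "Derived_x": 5},
        {"subject_ID": 1, "Day1Date": "2019-01-01", "Derived_x": 7},
    ],
}

# Each step receives its input as an argument and hands back its result,
# so no function needs a global statement.
results = {}
for name in raw_sheets:
    rows = create_rows(name, raw_sheets)
    results[name] = transform_stage_1(rows)

print(results["Sheet1"])
```

With this shape, the per-sheet results end up in one dictionary keyed by sheet name, and each function can be tested on its own.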

1 Answer

You need to initialize the variables in the body of your script, at module level. Declaring global variable_name inside a function means that the name refers to the variable defined outside the function, rather than to a local variable named variable_name.

import pandas as pd

# Initialize the variables first
sheet_name = None
sheet_df = None

def create_df():
    global sheet_name, sheet_df
    for s in sheets:
        sheet_name = s
        sheet_df = pd.read_excel(xls, sheet_name=s)
        # Drop every column whose name contains 'Derived'
        sheet_df = sheet_df[sheet_df.columns.difference(sheet_df.filter(like='Derived').columns, sort=False)]
        print("Sheet " + str(s) + " is created as a dataframe successfully")
        transform_stage_1_df()

def transform_stage_1_df():
    global sheet_df
    sheet_df = pd.melt(sheet_df, id_vars='subject_ID', var_name='headers', value_name='dates')
    sheet_df['header_extracted'] = [x.split("Date")[0] for x in sheet_df['headers']]
    # Raw string so that \d is not treated as a string escape
    sheet_df['day'] = sheet_df['header_extracted'].str.extract(r'(\d+)', expand=True).astype(int)
    sheet_df = sheet_df[sheet_df.groupby(['subject_ID', 'header_extracted'])['dates'].transform('count').ne(0)].copy()
    sheet_df = sheet_df.sort_values(by=['subject_ID', 'day'])
    sheet_df.drop(['header_extracted', 'day'], axis=1, inplace=True)
    print("Stage 1 transformation is complete")


if __name__ == '__main__':
    print("Execution Started")
    xls = pd.ExcelFile('C:\\Users\\All.xlsx')
    print("File read successfully")
    sheets = xls.sheet_names
    dataFramesDict = dict()
    create_df()
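
To see the effect of the global statement in isolation, here is a minimal sketch that needs no pandas. The names counter, broken_increment and working_increment are made up for the illustration. Note also that the traceback in the question shows transform_stage_1_df with pd.melt on its second line and no global statement, so one possible explanation (an assumption, not confirmed anywhere in this thread) is that the notebook cell containing the updated function had not been re-run.

```python
counter = None  # module-level (global) name


def broken_increment():
    # No global statement: the assignment below makes `counter` local to this
    # function, so reading it on the right-hand side raises UnboundLocalError.
    counter = (counter or 0) + 1


def working_increment():
    global counter  # the assignment now rebinds the module-level name
    counter = (counter or 0) + 1


try:
    broken_increment()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

working_increment()
print(error_name, counter)
```

The first function fails in the same way as the traceback in the question, while the second one updates the module-level variable.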

3 Comments

Can you post the full error and the line that throws it? Sadly, I cannot access the image.
Updated now. Please check.
It is supposed to work if it's written like that. I thought you might be calling a function that contains the code for main. Mistake on my part.
