I have a Python script that pulls data from a third-party API. The script runs in a loop for 3 different cities and creates a DataFrame for each city. I then write each DataFrame to an Excel workbook as a separate tab. Below is the code.

    sublocation_ids = [
        {
            "id": 163,
            "name": "Atlanta, GA"
        },
        {
            "id": 140,
            "name": "Austin, TX"
        },
        {
            "id": 164,
            "name": "Baltimore, MD"
        }
    ]
    filter_text = "(headline:coronavirus OR summary:coronavirus OR headline:covid-19 OR summary:covid-19) AND categories:{}"

    writer = pd.ExcelWriter(excel_path)
    for sub in sublocation_ids:
        city_num_int = sub['id']
        city_num_str = str(city_num_int)
        city_name = sub['name']
        filter_text_new = filter_text.format(city_num_str)
        data = json.dumps({"filters": [filter_text_new], "sort_by": "created_at", "size": 2})
        r = requests.post(url=api_endpoint, data=data).json()
        articles_list = r["articles"]
        articles_list_normalized = json_normalize(articles_list)
        df = articles_list_normalized
        df['publication_timestamp'] = pd.to_datetime(df['publication_timestamp'])
        df['publication_timestamp'] = df['publication_timestamp'].apply(lambda x: x.now().strftime('%Y-%m-%d'))
        df.to_excel(writer, sheet_name=city_name)
        writer.save()

The issue I am facing is that only one tab is created in the Excel file, for the first city I pull from the API ("Atlanta, GA"). How can I create a tab for every city in sublocation_ids, or does my code have an issue?

2
  • I see two possible errors. First, where is writer initialised? Outside of the loop? Second, you're calling writer.save() on every iteration, thus overwriting the sheet each time. Call it once, after the loop has finished. Commented Apr 6, 2020 at 0:53
  • @Datanovice Check above; that is how the code is currently set. Commented Apr 6, 2020 at 0:55

2 Answers 2

2

See this bit from the df.to_excel() documentation:

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:

df2 = df1.copy()
with pd.ExcelWriter('output.xlsx') as writer:  
    df1.to_excel(writer, sheet_name='Sheet_name_1')
    df2.to_excel(writer, sheet_name='Sheet_name_2') 

So you may need to pull writer.save() outside of the loop.
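
A minimal sketch of the same idea applied to a loop over cities. The DataFrames here are made-up stand-ins for the API results, the output path is a temporary file rather than the asker's excel_path, and it assumes an Excel engine such as openpyxl is installed. The with block takes care of saving and closing once, after every sheet has been written, so there is no explicit save call to misplace:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the per-city API results in the question.
city_frames = {
    "Atlanta, GA": pd.DataFrame({"headline": ["a1", "a2"]}),
    "Austin, TX": pd.DataFrame({"headline": ["b1", "b2"]}),
    "Baltimore, MD": pd.DataFrame({"headline": ["c1", "c2"]}),
}

# Illustrative output location, not the asker's excel_path.
excel_path = os.path.join(tempfile.mkdtemp(), "cities.xlsx")

# The workbook is saved and closed exactly once, when the with block exits.
with pd.ExcelWriter(excel_path) as writer:
    for city_name, df in city_frames.items():
        df.to_excel(writer, sheet_name=city_name, index=False)

# Read the workbook back to check that there is one tab per city.
with pd.ExcelFile(excel_path) as workbook:
    sheet_names = workbook.sheet_names

print(sheet_names)
```

If the loop has to stay as it is in the question, the equivalent is a single save/close call placed after the loop body rather than inside it.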


1 Comment

Sorry, I forgot to include that I have already defined the object; check the edited code above.
1

I can't speak for your code as I can't run it; 'filter_text' seems to be something you've defined but not included.

Essentially, I can see two possible errors.

First, it's not clear where you are initialising the writer object.

Second, you're calling writer.save() on every iteration of the loop, overwriting the file each time; move the writer.save() call outside of the loop.

pd.ExcelWriter can be used as a context manager; if you don't use it that way, you need to close/save it yourself. In the pandas source at the time of writing, close is simply a synonym for save:

def close(self):
    """synonym for save, to make it more file-like"""
    return self.save() 

writer = pd.ExcelWriter('file.xlsx')

for sub in sublocation_ids:
    city_num_int = sub['id']
    city_num_str = str(city_num_int)
    city_name = sub['name']
    filter_text_new = filter_text.format(city_num_str)
    data = json.dumps({"filters": [filter_text_new], "sort_by":"created_at", "size":2})
    r = requests.post(url = api_endpoint, data = data).json()
    articles_list = r["articles"] 
    articles_list_normalized = json_normalize(articles_list)
    df = articles_list_normalized
    df['publication_timestamp'] = pd.to_datetime(df['publication_timestamp'])
    df['publication_timestamp'] = df['publication_timestamp'].apply(lambda x: x.now().strftime('%Y-%m-%d'))
    df.to_excel(writer, sheet_name = city_name)

writer.save()  # call this once, after the loop has written every sheet (pandas 2.0 removed save(); use writer.close() there)

Sheets as dictionaries

If you're curious about the innards of the class, use .__dict__ on the object to see its attributes.

writer = pd.ExcelWriter('file.xlsx')

df.to_excel(writer,sheet_name='Sheet1')
df.to_excel(writer,sheet_name='Sheet2')
print(writer.__dict__)

{'path': 'file.xlsx',
 'sheets': {'Sheet1': <xlsxwriter.worksheet.Worksheet at 0x11a05a79a88>,
  'Sheet2': <xlsxwriter.worksheet.Worksheet at 0x11a065218c8>},
 'cur_sheet': None,
 'date_format': 'YYYY-MM-DD',
 'datetime_format': 'YYYY-MM-DD HH:MM:SS',
 'mode': 'w',
 'book': <xlsxwriter.workbook.Workbook at 0x11a064ff1c8>}
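
The exact contents of __dict__ depend on the pandas version and the engine in use, so as a hedged alternative, here is a small sketch that reads the writer's sheets attribute directly. The DataFrame and the temporary file path are made up for illustration, and it assumes an Excel engine such as openpyxl or xlsxwriter is installed:

```python
import os
import tempfile

import pandas as pd

# Made-up data, only to have something to write.
df = pd.DataFrame({"value": [1, 2]})

# Illustrative temporary path.
path = os.path.join(tempfile.mkdtemp(), "file.xlsx")

with pd.ExcelWriter(path) as writer:
    df.to_excel(writer, sheet_name="Sheet1")
    df.to_excel(writer, sheet_name="Sheet2")
    # writer.sheets maps each sheet name to the engine's worksheet object.
    written_sheets = list(writer.sheets)

print(written_sheets)
```

Printing this list inside the loop is a quick way to confirm that each city's sheet has been registered on the writer before the file is saved.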

5 Comments

Sorry, I forgot to include that I have already defined the object; check the edited code above.
The solution is clear, @error2007s: just move the save outside of the loop.
Nope, still the same issue: only one tab is getting created.
Also, if this were a loop issue, the tab in the Excel file should be Baltimore, MD, right? But the current Excel file has the Atlanta, GA tab. @Datanovice
OK, yours is the correct answer: writer.save() needs to be outside the loop. I was adding it in another place. It is working perfectly now.
