
I am new to Google Cloud Storage. In my Python code I have a couple of DataFrames, and I want to store them in a GCS bucket as a single Excel file with multiple sheets. In a local directory I am able to do that using ExcelWriter. Here is the code for that:

writer = pd.ExcelWriter(filename)
dataframe1.to_excel(writer, 'sheet1', index=False)
dataframe2.to_excel(writer, 'sheet2', index=False)
writer.save()

I don't want to save a temp file in a local directory and then upload it to GCS.

  • Hi Nishant, check out a couple of these threads; they may help: stackoverflow.com/questions/36314797/… stackoverflow.com/questions/56596951/…. You may also want to look into using this library: pypi.org/project/google-cloud-storage. Commented May 8, 2020 at 14:24
  • Hi @jawsem, thanks for these links, but those solutions store a CSV file in GCS. I want to store the output as an Excel file. Commented May 9, 2020 at 6:02
  • Looking into the documentation for the Python client libraries for GCS, I don't see an upload method suitable for your scenario. Perhaps you could implement removal of the local .xlsx file once it is uploaded to GCS. Commented May 13, 2020 at 8:31
  • @DenisT. Thanks for the documentation link. Currently I am creating a local file and then uploading it, but I want to see if there's any way to save it directly in the cloud. Commented Jun 12, 2020 at 6:20

1 Answer


You can skip the use of gcsfs and use the ExcelWriter object directly with the storage client:

import io
import pandas as pd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('name_of_your_bucket')
blob = bucket.blob('path/to/excel')
with io.BytesIO() as output:
    writer = pd.ExcelWriter(output, engine='xlsxwriter')
    dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
    dataframe2.to_excel(writer, sheet_name='sheet2', index=False)
    # close() finalises the workbook in the buffer
    # (writer.save() is deprecated in newer pandas versions)
    writer.close()
    output.seek(0)
    blob.upload_from_file(
        output,
        content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
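With pandas 1.x and later the same buffer can also be built using ExcelWriter as a context manager, which closes the writer for you on exit. A minimal sketch, with placeholder frames standing in for dataframe1/dataframe2 and the upload step shown as a comment:

```python
import io

import pandas as pd

# Placeholder frames standing in for dataframe1/dataframe2 above
dataframe1 = pd.DataFrame({'a': [1, 2]})
dataframe2 = pd.DataFrame({'b': [3, 4]})

output = io.BytesIO()
# Leaving the `with` block closes the writer and finalises the workbook,
# so no explicit save()/close() call is needed.
with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
    dataframe2.to_excel(writer, sheet_name='sheet2', index=False)

output.seek(0)
# Upload step is unchanged:
# blob.upload_from_file(output, content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
```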

OLD Answer:

You can instantiate your ExcelWriter() with engine='xlsxwriter' and use fs-gcsfs to write the bytes to an Excel file in your GCS bucket.

In your case you can do the following:

import io
import pandas as pd
from fs_gcsfs import GCSFS

gcsfs = GCSFS(bucket_name='name_of_your_bucket',
              root_path='path/to/excel',
              # set a different root_path to upload files to other locations
              strict=False)
gcsfs.fix_storage()

output = io.BytesIO()
writer = pd.ExcelWriter(output, engine='xlsxwriter')

dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
dataframe2.to_excel(writer, sheet_name='sheet2', index=False)

writer.save()
xlsx_data = output.getvalue()

with gcsfs.open('./excel_file.xlsx', 'wb') as f:
    f.write(xlsx_data)

PS: I had to use strict=False as fs-gcsfs wasn't able to locate the root path (Do check the limitations section in the documentation for fs-gcsfs)

Source: https://xlsxwriter.readthedocs.io/working_with_pandas.html#saving-the-dataframe-output-to-a-string
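Either way, you can sanity-check the workbook by reading the bytes back with pd.read_excel(..., sheet_name=None), which returns a dict keyed by sheet name; for a blob already in GCS, blob.download_as_bytes() from google-cloud-storage yields the same bytes. A minimal local sketch:

```python
import io

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'b': [3, 4]})

output = io.BytesIO()
with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='sheet1', index=False)
    df2.to_excel(writer, sheet_name='sheet2', index=False)

# sheet_name=None loads every sheet into a dict keyed by sheet name
sheets = pd.read_excel(io.BytesIO(output.getvalue()), sheet_name=None)
print(sorted(sheets))  # ['sheet1', 'sheet2']
```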


1 Comment

Anybody using this going forward should note that the save() method is deprecated and should be replaced with close().
