I am using Azure Databricks and Azure Storage Explorer. I have an Excel file of under 30 MB containing multiple sheets. Every month, when I run this code, I want to replace the data in one sheet; the remaining sheets contain pivot tables, built on that data sheet, that are used for reporting. I want to overwrite the data sheet alone each month so that the other sheets refresh automatically.
I am completely new to PySpark and Azure. This seems to be possible using pandas and openpyxl, but pandas does not recognize a file path pointing to Azure Data Lake Storage. From what I have read so far, it does not seem possible to overwrite part of an existing file using pyspark.pandas.DataFrame. I believe I have two options:
- Find a way to make pandas recognize the ADLS path.
- Overwrite part of an Excel file using PySpark.
Please correct me if I am wrong. I would be grateful for any pointers.