1

Right now my final output is in excel format. I wanted to compressed my excel file using gzip. Is there a way to do it ?

import pandas as pd
import gzip
import re

def renaming_ad_unit():
    with gzip.open('weekly_direct_house.xlsx.gz') as f:
        df = pd.read_excel(f)
        result = df['Ad unit'].to_list()
        for index, a_string in enumerate(result):
            modified_string = re.sub(r"\([^()]*\)", "", a_string)
            df.at[index,'Ad unit'] = modified_string

    return df.to_excel('weekly_direct_house.xlsx',index=False)

2 Answers 2

4

Yes, this is possible.

To create a gzip file, you can open the file like this:

with gzip.open('filename.xlsx.gz', 'wb') as f:
    ...

Unfortunately, when I tried this, I found that I get the error OSError: Negative seek in write mode. This is because the Pandas excel writer moves backwards in the file when writing, and uses multiple passes to write the file. This is not allowed by the gzip module.

To fix this, I created a temporary file, and wrote the excel file there. Then, I read the file back, and write it to the compressed archive.

I wrote a short program to demonstrate this. It reads an excel file from a gzip archive, prints it out, and writes it back to another gzip file.

import pandas as pd
import gzip
import tempfile

def main():
    with gzip.open('apportionment-2020-table02.xlsx.gz') as f:
        df = pd.read_excel(f)
        print(df)

    with tempfile.TemporaryFile() as excel_f:
        df.to_excel(excel_f, index=False)
        with gzip.open('output.xlsx.gz', 'wb') as gzip_f:
            excel_f.seek(0)
            gzip_f.write(excel_f.read())

if __name__ == '__main__':
    main()

Here's the file I'm using to demonstrate this: Link

Sign up to request clarification or add additional context in comments.

4 Comments

you could also use io.BytesIO()
@furas Tried that - got a weird error from Pandas when writing the excel file.
it is strange, I tested it and it works for me (on Linux)
You guys are awesome ! Tested both and it works for me (Windows 10). Really appreciate it !
2

You could also use io.BytesIO to create file in memory and write excel in this file and next write this file as gzip on disk.

I used link to excel file from Nick ODell answer.

import pandas as pd
import gzip
import io

df = pd.read_excel('https://www2.census.gov/programs-surveys/decennial/2020/data/apportionment/apportionment-2020-table02.xlsx')

buf = io.BytesIO()

df.to_excel(buf)

buf.seek(0)  # move to the beginning of file

with gzip.open('output.xlsx.gz', 'wb') as f:
    f.write(buf.read())

Similar to Nick ODell answer.

import pandas as pd
import gzip
import io

df = pd.read_excel('https://www2.census.gov/programs-surveys/decennial/2020/data/apportionment/apportionment-2020-table02.xlsx')

with io.BytesIO() as buf:
    df.to_excel(buf)

    buf.seek(0)  # move to the beginning of file

    with gzip.open('output.xlsx.gz', 'wb') as f:
        f.write(buf.read())

Tested on Linux

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.