
I have looked at many related answers here on Stack Overflow, and this question seems most closely related: How to Transfer Pandas DataFrame to .csv on SFTP using Paramiko Library in Python?. I want to do something similar, except I want to compress the file when I send it to the SFTP location, so I end up with a .csv.gz file. The files I am working with are 15-40 MB uncompressed, but sometimes there are a lot of them, so I need to keep the footprint small.

I have been using code like this to move the DataFrame to the destination, after pulling it from another location as a CSV and doing some transformations on the data itself:

import pandas as pd  # source_sftp and dest_sftp are open pysftp Connections

fileList = source_sftp.listdir('/Inbox/')
dataList = []
for item in fileList:  # read each file on the source SFTP directly into a DataFrame
    print(item)
    if item.endswith('.csv'):
        temp = pd.read_csv(source_sftp.open('/Inbox/' + item))
    elif item.endswith('.zip'):
        temp = pd.read_csv(source_sftp.open('/Inbox/' + item), compression='zip')
    elif item.endswith('.gz'):
        temp = pd.read_csv(source_sftp.open('/Inbox/' + item), compression='gzip')
    else:
        temp = pd.read_csv(source_sftp.open('/Inbox/' + item), compression='infer')
    dataList.append(temp)  # keep each DataFrame
# ... some transformations on the data in here ...
FL = [x.replace('.csv', '') + suffix  # swap '.csv' for suffix, e.g. '.csv.gz'
      for x in fileList]
locpath = '{}/some/new/dir/'.format(dest_sftp.pwd)
for i, item in enumerate(dataList):
    with dest_sftp.open(locpath + FL[i], 'w') as f:
        item.to_csv(f, index=False, compression='gzip')

It seems like I should be able to get this to work, but I am guessing something is being skipped over when I use to_csv to convert the DataFrame back and then compress it on the SFTP file object. Should I be streaming this somehow, or is there a solution I am missing somewhere in the pysftp or pandas documentation?

If I can avoid saving the CSV file somewhere local first, I would like to, but I don't think I should have to, right? I can get a compressed file in the end if I just save it locally with temp.to_csv('/local/path/myfile.csv.gz', compression='gzip'), and after transferring this local file to the destination it is still compressed. So I don't think it has to do with the transfer, just with how pandas.DataFrame.to_csv and pysftp.Connection.open are used together.
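In case it helps, this is roughly what that working local-save fallback looks like for one DataFrame; a minimal sketch, where the temp-file path and myfile.csv.gz name are just placeholders:

import os
import tempfile

# Fallback through a local temp file: pandas compresses based on the
# extension/keyword, then pysftp uploads the already-compressed file as-is.
local_tmp = os.path.join(tempfile.gettempdir(), 'myfile.csv.gz')
temp.to_csv(local_tmp, index=False, compression='gzip')
dest_sftp.put(local_tmp, locpath + 'myfile.csv.gz')
os.remove(local_tmp)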

I should probably add that I still consider myself a newbie at much of Python, but I have been working with local-to-SFTP and SFTP-to-local transfers, and have not had to do much transferring (directly or indirectly) between two SFTP locations.


1 Answer


Make sure you have the latest version of Pandas.

Compression with a file-like object is supported only since version 0.24:
GH21227: df.to_csv ignores compression when provided with a file handle
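With a new-enough Pandas, writing straight to the SFTP handle should then work; a minimal sketch, where df stands in for one of your DataFrames (depending on the Pandas version, the handle may need to be opened in binary mode, 'wb', since gzipped output is bytes rather than text):

# Write the gzipped CSV straight to the remote handle; no local file needed.
with dest_sftp.open(locpath + 'myfile.csv.gz', 'wb') as f:
    df.to_csv(f, index=False, compression='gzip')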


1 Comment

After updating to pandas 0.24.2, it seems I get the same message that someone else posted on the GitHub entry about 20 days ago: RuntimeWarning: compression has no effect when passing file-like object as input. I am on Python 2.7 currently, so I may just need to work through finding package versions for my Python 3 environment and try something there that doesn't break something else, although that person seemed to indicate on GitHub that Python 3.7 has the same issue. I actually ran across that GitHub issue before, so I assumed it was something else that we were both doing wrong.
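For anyone who still sees that warning, one way to sidestep pandas' compression handling entirely is to gzip the CSV text yourself and write the compressed bytes to the SFTP handle; a minimal sketch, reusing item, FL, i, and locpath from the question's loop:

import gzip
import io

# Serialize the DataFrame to CSV text, gzip it in memory, then push the
# compressed bytes to the remote file (opened in binary mode).
buf = io.BytesIO()
with gzip.GzipFile(mode='wb', fileobj=buf) as gz:
    gz.write(item.to_csv(index=False).encode('utf-8'))
with dest_sftp.open(locpath + FL[i], 'wb') as f:
    f.write(buf.getvalue())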
