I have a program that uses Paramiko to get files from an SFTP server. Originally I pulled each file down with get and then processed it by opening the local copy. Now I am trying to avoid the get and just read the file as a stream. This works fine until I encounter bytes that are not valid UTF-8, such as <96> (0x96). The program raises an exception when this happens. The problem occurs on the line:

for line in remote_file:

So I am not able to get the data from the stream. I have seen mention of decoding and re-encoding but I don't see any way to be able to do this since I am not being given the data by Paramiko.

Is there a Paramiko parameter that says what to do or provides some way to just get the raw data? How do I get around this issue?

Below is the code being run. The first 3 lines establish the connection. Then I have some code (not shown) that filters the directory listing to find the files I care about. The next-to-last line opens the file on the SFTP server. The last line is where the error occurs. I have a try block around the whole block of code. When the exception is hit, the error returned is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 124: invalid start byte

ftpTransport = paramiko.Transport((FTPSERVER, FTPPORT))
ftpTransport.connect(username=FTPUSERNAME, password=FTPPASSWORD)
sftp = paramiko.SFTPClient.from_transport(ftpTransport)
remote_file = sftp.open(remoteName)
for line in remote_file:

I do not get the UTF-8 error if I do a sftp.get and then open the local file. For now I have changed my code to take that step but would prefer not copying the file locally if I don't have to.

1 Answer

Paramiko assumes that all text files are UTF-8 and uses "strict" decoding (aborting on any error).

To work around that, you can open the file in "binary" mode. Then next(), readline() and similar methods return bytes objects ("binary strings"), which you can decode using any encoding you like, or decode as UTF-8 while ignoring errors:

remote_file = sftp.open(remoteName, "rb")
for line in remote_file:
    print(line.decode("utf8", "ignore"))
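
Once the lines arrive as bytes, the choice of encoding and error handler is yours. The sketch below compares three options on a line containing byte 0x96. The helper name decode_lines is hypothetical, and io.BytesIO stands in for the remote file so the example runs without a server. The cp1252 case assumes the file was written as Windows-1252, where 0x96 is an en dash; the question does not confirm that, so treat it as a guess to verify against your data.

```python
import io


def decode_lines(binary_lines, encoding="utf-8", errors="strict"):
    """Yield str lines from any iterable of bytes lines.

    Hypothetical helper: pass it the object returned by
    sftp.open(remoteName, "rb") in place of the BytesIO used below.
    """
    for raw in binary_lines:
        yield raw.decode(encoding, errors)


# Stand-in for the remote file; the second line contains the byte 0x96.
sample = io.BytesIO(b"plain line\nrange 1\x962\n")

# Option 1: drop bytes that are not valid UTF-8.
ignored = list(decode_lines(sample, "utf-8", "ignore"))

# Option 2: substitute U+FFFD so the damage stays visible.
sample.seek(0)
replaced = list(decode_lines(sample, "utf-8", "replace"))

# Option 3: assume Windows-1252, where 0x96 maps to an en dash (U+2013).
sample.seek(0)
as_cp1252 = list(decode_lines(sample, "cp1252"))

print(ignored)
print(replaced)
print(as_cp1252)
```

"ignore" silently loses data, so "replace" or the correct source encoding is usually the safer choice when the content matters.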