0

I am trying to implement a way to send and receive files using the socket library, but when I run the code, I keep getting the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte".

Sender code:

import os, socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("localhost", 9999))

with open("image.jpg", "rb") as file:
    file_size = os.path.getsize("send_file.txt")

    client.send("image.jpg".encode())
    client.send(str(file_size).encode())

    data = file.read()
    client.sendall(data)
    client.send(b"<DATA_END>")

    client.close()

Receiver code:

import socket, tqdm

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 9999))
server.listen()

client, addr = server.accept()

file_name = client.recv(1024).decode()
print(file_name)
file_size = client.recv(1024).decode() 
print(file_size)

with open(file_name, "wb") as file:
    file_bytes = b""

    done = False

    progress = tqdm.tqdm(unit="B", unit_scale=True, unit_divisor=1000, total=int(file_size))

    while not done:
        data = client.recv(1024)
        if file_bytes[-10:] == b"<DATA_END>":
            done = True
        else:
            file_bytes += data
        progress.update(1024)

    file.write(file_bytes)

    client.close()
    server.close()
3
  • When writing network code you should always specify the encoding. When you just "image.jpg".encode(), for instance, that could be utf-8 one machine, utf-16 on another and latin1 on yet another. The FF suggests that you are reading a utf-16 BOM, but just a guess there. Put "utf-8" as the encoding for all encode and decode operations then retest. Commented Dec 6, 2022 at 3:53
  • FYI, the default for .encode() is utf-8 since Python 3.2, so not as variable as, say, open. Commented Dec 6, 2022 at 4:42
  • Another reference for sending files: stackoverflow.com/a/59692611/235698 Commented Dec 6, 2022 at 4:47

1 Answer 1

1

You assume that sockets honor send boundaries on receive. But that is not how a stream oriented protocol like TCP works. send may buffer before sending or split a send across multiple low level data link packets. recv just takes the data currently available. notably, you wouldnt' know if you ar are a word boundary for string decode. You need some other mechanism to decide when useful chunks of data have arrived.

A protocol that would work well in your case is to pass a header where each field is separated by newlines. Neither filename or size have a newline, so that's an easy demarcation. The receiver would read up to newline boundaries until it has filename and size, then could read exactly size bytes for the payload.

client.py

import os, socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("localhost", 9999))

filename = "image.jpg"

with open(filename, "rb") as file:
    file_size = os.path.getsize(filename)
    # protocol <filename>\n<size>\n<data>
    client.sendall(filename.encode("utf-8"))
    client.sendall(b"\n")
    client.sendall(str(file_size).encode("utf-8"))
    client.sendall(b"\n")
    data = file.read()
    client.sendall(data)
    client.close()

server.py

def recv_to_newline(s):
    buf = []
    while True:
        c = s.recv(1)
        if not len(c):
            # socket closed
            return None
        if c == b"\n":
            return b"".join(buf)
        buf.append(c)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 9999))
server.listen()

client, addr = server.accept()

file_name = recv_to_newline(client).decode("utf-8")
print(file_name)
file_size = int(client.recv(1024).decode())
print(file_size)

error = False
with open(file_name, "wb") as file:
    progress = tqdm.tqdm(unit="B", unit_scale=True, unit_divisor=1000, total=int(file_size))
    while file_size:
        data = client.recv(min(1024, file_size))
        if not data:
            print("Error: Truncated recieve")
            error = True
            break
        file.write(data)    
        progress.update(len(data))
        file_size -= len(data)

if error:
    os.remove(file_name)
    
client.close()
server.close()
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.