I want to upload a large ZIP file using FastAPI, and I need to validate that the uploaded file really is a ZIP file before proceeding. I handle the upload with FastAPI's UploadFile class. If the file is a ZIP file, I want to save it. If it is not, I want to return an error message saying so, without fully processing the multipart upload of the large file.

from fastapi import FastAPI, File, HTTPException, UploadFile
from fastapi.responses import HTMLResponse
import os

app = FastAPI()

UPLOAD_FOLDER = 'uploads/'
ALLOWED_EXTENSIONS = {'zip'}

# Create the upload directory if it does not exist yet.
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

def allowed_file(filename: str) -> bool:
    # Accept only names whose final extension is in ALLOWED_EXTENSIONS.
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.post("/uploadfile/")
async def upload_file(file: UploadFile = File(...)):
    if not allowed_file(file.filename):
        raise HTTPException(status_code=400, detail="Invalid file type. Only ZIP files are allowed.")
    
    file_location = os.path.join(UPLOAD_FOLDER, file.filename)
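    with open(file_location, "wb") as buffer:
        # Copy in 1 MiB chunks so a large upload is never held in memory all at once.
        while chunk := await file.read(1024 * 1024):
            buffer.write(chunk)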

    return {"info": f"File '{file.filename}' uploaded successfully"}


1 Answer

If you can load just the first four bytes, then check if they are 0x50, 0x4b, 0x03, 0x04. If they are, it is very likely a zip file. If not, it is very likely not.
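A minimal sketch of that four-byte check in Python might look like the following. The constant and function names are illustrative only and do not come from any library.

```python
# Signature that starts the first local file header of a non-empty ZIP archive.
ZIP_LOCAL_FILE_HEADER = b"PK\x03\x04"  # 0x50 0x4b 0x03 0x04


def looks_like_zip(header: bytes) -> bool:
    """Return True if the leading bytes start with the ZIP local file header."""
    return header[:4] == ZIP_LOCAL_FILE_HEADER


if __name__ == "__main__":
    print(looks_like_zip(b"PK\x03\x04\x14\x00"))  # start of a typical archive
    print(looks_like_zip(b"id,name\n1,foo\n"))    # plain CSV text
```

Inside a FastAPI handler, the bytes could come from `await file.read(4)`, followed by `await file.seek(0)` so that the later save starts from the beginning of the upload again.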

An empty zip file will start with 0x50 0x4b 0x05 0x06. The specification also provides for spanned/split archives whose pieces can start with 0x50 0x4b 0x07 0x08 or 0x50 0x4b 0x30 0x30, though I have never seen such a beast in the wild. Your unzipper probably can't process them anyway.

Checking only the first two bytes for 0x50 0x4b ("PK") would be too weak a filter.
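To pull the signatures above together, here is a hedged sketch that peeks at the start of any binary file-like object and rewinds it afterwards. The names `ZIP_SIGNATURES` and `sniff_zip` and the label strings are assumptions made for illustration. It uses only the standard library, so it can be tried without a running server.

```python
import io
import zipfile
from typing import BinaryIO, Optional

# Four-byte signatures discussed above, mapped to illustrative labels.
ZIP_SIGNATURES = {
    b"PK\x03\x04": "regular archive",
    b"PK\x05\x06": "empty archive",
    b"PK\x07\x08": "spanned/split piece",
    b"PK\x30\x30": "spanned/split piece",
}


def sniff_zip(stream: BinaryIO) -> Optional[str]:
    """Read the first four bytes, rewind the stream, and return a label or None."""
    stream.seek(0)
    header = stream.read(4)
    stream.seek(0)  # leave the stream ready to be read from the start
    return ZIP_SIGNATURES.get(header)


if __name__ == "__main__":
    # Build a small archive in memory to exercise the check.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "hello")
    print(sniff_zip(buf))

    # Starts with "PK" but is not one of the listed start signatures.
    print(sniff_zip(io.BytesIO(b"PK\x01\x02rest")))
```

In FastAPI, `UploadFile.file` is a regular file-like object, so something like `sniff_zip(file.file)` should work inside the handler. Note that by the time a handler with an `UploadFile` parameter runs, the multipart body has generally already been received and spooled, so this check avoids saving a non-ZIP file but probably does not avoid the transfer itself.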

1 Comment

Could you please give me an example?
