I have a tar archive containing several files. I need to read one specific file from it (a CSV file) using pandas. I tried the following code:

import tarfile
tar = tarfile.open('my_files.tar', 'r:gz')
f = tar.extractfile('some_files/need_to_be_read.csv')

import pandas as pd
df = pd.read_csv(f.read())

but it raises the following error:

OSError: Expected file path name or file-like object, got <class 'bytes'> type

on the last line of the code. How can I read this file?

1 Answer

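pandas.read_csv() expects a file path or a file-like object. tar.extractfile() already returns a file-like object, but calling .read() on it returns the file's contents as bytes, which read_csv() rejects with the error above. Instead of reading the file into memory yourself, pass the file object to pandas.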

So remove the .read() part:

import tarfile
tar = tarfile.open('my_files.tar', 'r:gz')
f = tar.extractfile('some_files/need_to_be_read.csv')

import pandas as pd
df = pd.read_csv(f)
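
For reuse, the same idea can be wrapped in a small helper. This is only a sketch: the function name read_csv_from_tar is made up for illustration, and it opens the archive with mode 'r:*' (automatic compression detection) rather than 'r:gz', on the assumption that you may not know whether the archive is gzipped. It also closes the archive when done and checks for the case where extractfile() returns None, which happens when the member is not a regular file.

```python
import tarfile

import pandas as pd


def read_csv_from_tar(archive_path, member_name, **read_csv_kwargs):
    """Load one CSV member of a tar archive into a DataFrame (hypothetical helper)."""
    # 'r:*' lets tarfile detect the compression (none, gzip, bz2, xz) itself.
    with tarfile.open(archive_path, "r:*") as tar:
        # extractfile() raises KeyError if the member name is not in the archive.
        f = tar.extractfile(member_name)
        if f is None:
            # None is returned for directories and other non-file members.
            raise ValueError(f"{member_name!r} is not a regular file")
        # Parse while the archive is still open, since f reads from it on demand.
        return pd.read_csv(f, **read_csv_kwargs)
```

With the names from the question, the call would be df = read_csv_from_tar('my_files.tar', 'some_files/need_to_be_read.csv'). If you already have the contents as bytes, wrapping them in io.BytesIO(...) also gives read_csv() a file-like object it accepts.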