I need to feed the output of a command that I run with Popen to pandas read_csv.

p = subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
stdout, stderr = p.communicate()
pandas.read_csv(stdout,index_col=0)

But obviously stdout is a string, so it is interpreted as a path. The API documentation says that "any object with a read() method (such as a file handle or StringIO)" can be an input to the read_csv function. How can I get such an object out of the Popen command? The final objective is to avoid writing to disk.

Also, when I write the contents of stdout to disk, I can see that each line of the CSV is wrapped in double quotes:

alvarobrandon$ head csvfile.csv
"1507109453,<,java,12447,a3e9c495869d,docker,9.0.4.130,9.0.2.131,9.0.2.131,9.0.4.130,56182,9092,9092,56182,tcp"
"1507109453,<,java,1244,a3e9c495869d,docker,9.0.4.130,9.0.2.131,9.0.2.131,9.0.4.130,56182,9092,9092,56182,tcp"
"1507109453,<,java,12447,a3e9c495869d,docker,9.0.4.130,9.0.2.131,9.0.2.131,9.0.4.130,56182,9092,9092,56182,tcp"

1 Answer

What you need is to read the command's stdout and store that data in a file-like StringIO object. Here is a minimal working example.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import io
import subprocess
import pandas

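cmd = ('cat', '/tmp/csvfile')
process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
# communicate() reads all of stdout as bytes and waits for the command to exit
stdout, _ = process.communicate()
# decode the bytes to text and wrap them in a file-like object with a read() method
csv = io.StringIO(stdout.decode())
data = pandas.read_csv(csv, index_col=0)
csv.close()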
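
As a variation on the example above, and assuming the whole output fits in memory, the bytes returned by the command can also be wrapped in io.BytesIO instead of being decoded first, since read_csv accepts binary file-like objects as well. This is only a sketch: the command below is a hypothetical stand-in (a child Python process printing a tiny CSV with made-up column names), chosen so that the snippet runs without /tmp/csvfile.

```python
import io
import subprocess
import sys

import pandas

# Stand-in command (an assumption for this sketch): a child Python process
# that prints a small, well-formed CSV to stdout.
cmd = [sys.executable, "-c", "print('id,name'); print('1,java'); print('2,docker')"]

# run() waits for the command to finish and collects its stdout as bytes.
result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)

# BytesIO provides the read() method that read_csv expects; nothing is written to disk.
data = pandas.read_csv(io.BytesIO(result.stdout), index_col=0)
print(data)
```

With check=True, a non-zero exit status raises CalledProcessError instead of silently handing an empty or partial buffer to read_csv.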

Hope this helps!

EDIT (output isn't really CSV, so we have to sanitize it a bit before parsing):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import io
import subprocess
import pandas

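cmd = ('cat', '/tmp/csvfile')
process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
csv = io.StringIO()
for line in process.stdout:
    # each line arrives as bytes wrapped in double quotes: decode it, strip the quotes and newline
    csv.write(line.decode().strip('"\n') + '\n')
process.wait()  # reap the child process once its output is exhausted
csv.seek(0)  # rewind so read_csv starts from the beginning
data = pandas.read_csv(csv, index_col=0)
csv.close()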
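
To see why the quotes matter, here is a small sketch that uses only the standard csv module (not pandas) on one of the sample lines from the question. A line wrapped in double quotes is a single quoted field by CSV rules, which is consistent with everything ending up in the index and no columns being left over. After the same strip('"\n') used above, the commas act as separators again.

```python
import csv
import io

# One sample line from the question, exactly as the command emits it.
raw = '"1507109453,<,java,12447,a3e9c495869d,docker,9.0.4.130,9.0.2.131,9.0.2.131,9.0.4.130,56182,9092,9092,56182,tcp"\n'

# As emitted: the surrounding quotes turn the whole line into one field.
quoted_fields = next(csv.reader(io.StringIO(raw)))

# After stripping the quotes and newline: fifteen comma-separated fields.
clean = raw.strip('"\n') + '\n'
clean_fields = next(csv.reader(io.StringIO(clean)))

print(len(quoted_fields), len(clean_fields))
```

This prints 1 15 for the sample line.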

5 Comments

to convert from bytes to text
It seems like it's doing something with the \n newline characters, because I get a 1847941 rows x 0 columns pandas DataFrame with all the information in the index.
That rather seems like a separator problem to me; have a look at the sep parameter for read_csv.
I checked that as well with sep=',' but no luck. One hint, I think, is that when I dump the output to a CSV file, each line appears wrapped in double quotes. I will update the question to reflect this.
Ah, I see. This output is not CSV. If you don't control the command generating this output, you have to take care of it in Python. I'll update the answer accordingly.
