In the following program, I want to pass the data returned by one function to a downstream function.

The Python code looks something like this:

def main():
    data1, data2, data3 = read_file()
    do_calc(data1, data2, data3)

def read_file():
    data1 = ""
    data2 = ""
    data3 = ""

    file1 = open('file1.txt', 'r+').read()
    for line in file1:
        # do something....
        data1 += calculated_values

    file2 = open('file2.txt', 'r+').read()
    for line in file2:
        # do something...
        data2 += calculated_values

    file1 = open('file1.txt', 'r+').read()
    for line in file1:
        # do something...
        data3 += calculated_values

    return data1, data2, data3

def do_calc(data1, data2, data3):
    d1_frame = pd.read_table(data1, sep='\t')
    d2_frame = pd.read_table(data2, sep='\t')
    d3_frame = pd.read_table(data3, sep='\t')

    all_data = [d1_frame, d2_frame, d3_frame]

main()

What is wrong with the given code? It looks like pandas isn't able to read the input properly, and instead prints the values from data1, data2 and data3 to the screen.

read_hdf seems to read the file, but not properly. Is there a way to read the data returned from a function directly into pandas, without first writing it to a file and reading it back?

Error message:

Traceback (most recent call last):
  File "calc.py", line 757, in <module>
    main()
  File "calc.py", line 137, in main
    merge_tables(pop1_freq_table, pop2_freq_table, f1_freq_table)
  File "calc.py", line 373, in merge_tables
    df1 = pd.read_table(pop1_freq_table, sep='\t')
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/everestial007/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4019)
  File "pandas/parser.pyx", line 665, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:7967)
FileNotFoundError: File b'0.667,0.333\n2\t15800126\tT\tT,A\t0.667,0.333\n2\t15800193\tC\tC,T\t0.667,0.333\n2\t15800244\tT\tT,C\......

I would appreciate any explanation.

2 Answers

read_table expects a file as input, but you are passing a string that contains the data itself rather than a string that contains the file location. That is why the FileNotFoundError shows your data where a file name should be. You could write your data to a file and then read from that file. Assuming the string is already properly formatted:

filename = 'tab_separated_file_1.dat'
with open(filename, 'w') as f:
    f.write(data1)

df1 = pd.read_table(filename, sep='\t')

3 Comments

I am able to write the data to a file and read it in the downstream function, but that is not what I want. I wanted to read the data from stdout, which turned out not to be possible for me, so I resorted to passing data from one function to another. Any other suggestions for using stdout? Thanks.
It looks like nrlakin provided a solution using StringIO.
Yes, that's what I had been looking for.

As other answers have said, read_table expects a file for input or, more accurately, a "file-like object". You can use a StringIO object to wrap the data1, data2, and data3 strings in an object that behaves like a file when passed to pandas, with a few tweaks to your code:

# Import StringIO...
try:
    from StringIO import StringIO  # python 2
except ImportError:
    from io import StringIO  # python 3

import pandas as pd

def main():
    data1, data2, data3 = read_file()
    do_calc(data1, data2, data3)

def read_file():
    # use StringIO objects instead of strings...
    data1 = StringIO()
    data2 = StringIO()
    data3 = StringIO()

    # iterate over the open file to get one line at a time
    with open('file1.txt', 'r') as file1:
        for line in file1:
            # do something.... (placeholder that produces calculated_values)
            # note that " += " became ".write()"
            data1.write(calculated_values)

    with open('file2.txt', 'r') as file2:
        for line in file2:
            # do something...
            data2.write(calculated_values)

    with open('file1.txt', 'r') as file1:
        for line in file1:
            # do something...
            data3.write(calculated_values)

    # rewind each buffer; after writing, the position is at the end,
    # so pandas would otherwise find nothing to read
    for buf in (data1, data2, data3):
        buf.seek(0)

    return data1, data2, data3

def do_calc(data1, data2, data3):
    d1_frame = pd.read_table(data1, sep='\t')
    d2_frame = pd.read_table(data2, sep='\t')
    d3_frame = pd.read_table(data3, sep='\t')

    all_data = [d1_frame, d2_frame, d3_frame]

main()
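
For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch. The helper name build_table, the column names, and the sample rows are made up for illustration and are not taken from the question's real data. It writes tab-separated rows into a StringIO buffer, rewinds the buffer with seek(0), and hands it to read_table:

```python
from io import StringIO

import pandas as pd


def build_table(rows):
    """Write tab-separated rows into an in-memory buffer (hypothetical helper)."""
    buf = StringIO()
    buf.write("contig\tpos\tref\n")  # header line; column names are assumptions
    for contig, pos, ref in rows:
        buf.write("{}\t{}\t{}\n".format(contig, pos, ref))
    # After writing, the position is at the end of the buffer.
    # Rewind so that pandas starts reading from the first character.
    buf.seek(0)
    return buf


# Sample rows invented for this sketch.
rows = [(2, 15800126, "T"), (2, 15800193, "C")]
frame = pd.read_table(build_table(rows), sep="\t")
print(frame)
```

Without the seek(0) call, the buffer would be handed over positioned at its end, and pandas would have no data to parse.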

4 Comments

You are awesome. Thanks much !
Also, you can wrap the data in StringIO when reading the table. I added StringIO at the point where the tables are read, instead of when creating the empty variables. The reason is that I also want to write data1, data2 and data3 to a file and be able to access them in another format elsewhere in the code.
@everestial007 no problem--I've had the same issue.
@everestial007 that's right--you can initialize from the file contents, or do some line-by-line processing where it says "do something..."
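
The variant described in these comments might look roughly like the sketch below: the data stays an ordinary string, and StringIO is applied only at the point where pandas reads it, so the same string can still be written to a file. The sample contents, the column names, and the output path are assumptions made for illustration.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Keep the data as a plain string (contents invented for this sketch).
data1 = "pos\tfreq\n15800126\t0.667\n15800193\t0.333\n"

# Wrap the string only at read time; a fresh StringIO starts at position 0,
# so no seek(0) is needed here.
frame = pd.read_table(StringIO(data1), sep="\t")

# The same string can still be written to disk for use elsewhere.
out_path = os.path.join(tempfile.mkdtemp(), "data1.tsv")  # hypothetical path
with open(out_path, "w") as handle:
    handle.write(data1)

print(frame)
```

The trade-off is that the string lives in memory alongside the StringIO copy, which is usually acceptable for tables of modest size.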
