
I have a file whose first 10 lines are a header block; the 11th row holds the column names. I know how to skip those lines to get a DataFrame, using:

df = pd.read_csv(inputfile, delimiter="\t", skiprows=10)

but I also want to keep those 10 header lines and write them, unchanged, at the top of the output file.

inputfile:

[Header]
APT Version     1.9.4
Processing Date 12/18/2018 11:35 AM
Content         MMMM
Num col        64
Total rows      642
Num Samples     350
Total Samples   350
File    93 of 350
[Data]
Name     Sample    col1    col2    col3        
1002         SPP           2       3   0.2573
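As a quick check of the skiprows arithmetic, here is a minimal sketch that feeds the sample above to pandas through io.StringIO instead of a real file. It assumes the fields are tab-separated, as delimiter="\t" implies; the SAMPLE string is a hypothetical stand-in for inputfile.

```python
import io

import pandas as pd

# Hypothetical stand-in for inputfile: the sample above, assumed tab-separated
SAMPLE = (
    "[Header]\n"
    "APT Version\t1.9.4\n"
    "Processing Date\t12/18/2018 11:35 AM\n"
    "Content\tMMMM\n"
    "Num col\t64\n"
    "Total rows\t642\n"
    "Num Samples\t350\n"
    "Total Samples\t350\n"
    "File\t93 of 350\n"
    "[Data]\n"
    "Name\tSample\tcol1\tcol2\tcol3\n"
    "1002\tSPP\t2\t3\t0.2573\n"
)

# skiprows=10 drops "[Header]" through "[Data]", so line 11 becomes the column names
df = pd.read_csv(io.StringIO(SAMPLE), delimiter="\t", skiprows=10)
print(list(df.columns))  # ['Name', 'Sample', 'col1', 'col2', 'col3']
```

The 10 skipped lines are exactly the block that has to reappear at the top of the output.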
  • Are you having issues with getting the data part, or are you saying you just need help with the transfer of the first 10 lines of the input file to the output file? Commented Apr 21, 2019 at 0:27
  • I just want to transfer the first 10 lines of the input file to the output file without any change. Commented Apr 21, 2019 at 0:33
  • I removed the first 10 lines and worked on the main data; now I want the output to contain the first 10 lines plus the changed main data. Commented Apr 21, 2019 at 0:35
  • Just read in the first 10 lines, save them in a variable, then spit them out when you are ready to write the output. See answer below. Commented Apr 21, 2019 at 0:48

1 Answer


Since it looks like you have the DataFrame part working, you can copy the first 10 lines from the input file to the output file by reading them in separately. Use readline() rather than read() so you don't accidentally read the entire file: each readline() call returns a single line, including its trailing newline. A list comprehension over range(10) controls how many lines you collect. The context manager (with) closes the file as soon as the block ends, so it won't get in the way when you later read the DataFrame.

with open('inputfile.tsv') as f:
    # Each readline() call returns one line with its trailing newline
    header = [f.readline() for _ in range(10)]

The comprehension is equivalent to the code below, just easier to scan, and comprehensions tend to be slightly faster than explicit loops.

# don't actually do it this way
header = []
with open('inputfile.tsv') as f:
    for i in range(10):
        header.append(f.readline())
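One behavior worth knowing: readline() returns an empty string once it reaches the end of the file, so on a file shorter than 10 lines the list is padded with "" entries. A swapped-in alternative (not what this answer uses) is itertools.islice, which yields at most 10 lines and stops early. The short_file() helper below is a hypothetical stand-in for open('inputfile.tsv').

```python
import io
from itertools import islice


def short_file():
    """Hypothetical stand-in for open('inputfile.tsv'): only 3 lines long."""
    return io.StringIO("[Header]\nAPT Version\t1.9.4\n[Data]\n")


# readline() keeps returning "" after EOF, so the list still has 10 items
with short_file() as f:
    padded = [f.readline() for _ in range(10)]

# islice stops when the stream runs out, so only real lines are kept
with short_file() as f:
    header = list(islice(f, 10))

print(len(padded), len(header))  # 10 3
```

The padding is harmless for "".join(header), since empty strings add nothing, but it matters if you index into the list or count its entries.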

When you're ready to write the output file, join the header lines together and write them before the data. If you call df.to_csv() without a path or buffer argument, it returns the CSV text as a string, so you can write it immediately after the header:

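with open('output.txt', 'w') as f:
    f.write("".join(header))
    # sep="\t" matches the tab-delimited input; index=False omits the row-index column
    f.write(df.to_csv(sep="\t", index=False))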
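By default df.to_csv() writes comma-separated values and prepends the DataFrame index as an extra first column; passing sep="\t" and index=False keeps the output in the input's tab-delimited layout. Below is a minimal end-to-end sketch of the whole round trip. The file names, the generated "meta line" header, and the added total column are hypothetical placeholders for your real file and processing.

```python
import tempfile
from pathlib import Path

import pandas as pd

N_HEADER = 10  # lines above the column-name row, as in the question

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names inside a throwaway directory
    src = Path(tmp) / "inputfile.tsv"
    dst = Path(tmp) / "output.txt"
    src.write_text(
        "".join(f"meta line {n}\n" for n in range(1, N_HEADER + 1))
        + "Name\tSample\tcol1\tcol2\tcol3\n"
        + "1002\tSPP\t2\t3\t0.2573\n"
    )

    # 1. Keep the raw header lines (each keeps its trailing newline)
    with open(src) as f:
        header = [f.readline() for _ in range(N_HEADER)]

    # 2. Parse the table below them and change it
    df = pd.read_csv(src, sep="\t", skiprows=N_HEADER)
    df["total"] = df["col1"] + df["col2"]  # placeholder for the real processing

    # 3. Header lines first, then the table in the same tab-separated layout
    with open(dst, "w") as f:
        f.write("".join(header))
        f.write(df.to_csv(sep="\t", index=False))

    result = dst.read_text()

print(result)
```

On Windows, check the line endings of the result: pandas' default line terminator follows os.linesep, which can combine with text-mode newline translation to produce doubled carriage returns.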


  • I used pandas' pd.read_csv to read the first 10 lines and save them as a DataFrame, but I couldn't join that DataFrame with the main analysed DataFrame in the output file. Is there any way to do this in pandas?
  • Do you need to do any processing on the first 10 lines? If not, I'd suggest reading them in the simplest way possible, as shown above. Pandas is a fantastic tool and I love using it and want the world to know how awesome it can be, but that doesn't mean it's the best solution for every problem.
  • Is your issue that you're using df.to_csv() and you don't know how to put the header information in the file it generates?
  • Yes. I read the header information using header = pd.read_csv(inputfile, header=None, nrows=10), but I don't know how to write both the header info and the main data to the output using df.to_csv(). Do you have any suggestions?
  • I do! I updated my answer a few minutes ago. Just don't pass a filename to df.to_csv(); it will return the output as a string, which you can pass to f.write().
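If you would rather open the input only once, another option (not from the answer above) is to read the whole text, split off the first 10 lines with str.splitlines(keepends=True), and hand the remainder to pd.read_csv through io.StringIO. A minimal sketch, where the text string is a hypothetical stand-in for open(inputfile).read():

```python
import io

import pandas as pd

# Hypothetical stand-in for open(inputfile).read(): 10 header lines, then the table
text = (
    "".join(f"meta line {n}\n" for n in range(1, 11))
    + "Name\tSample\tcol1\tcol2\tcol3\n"
    + "1002\tSPP\t2\t3\t0.2573\n"
)

# keepends=True preserves each "\n" so the header can be rejoined verbatim
lines = text.splitlines(keepends=True)
header, body = lines[:10], lines[10:]

# Parse only the table part; no skiprows needed since the header is already split off
df = pd.read_csv(io.StringIO("".join(body)), sep="\t")

# ...process df here...
output = "".join(header) + df.to_csv(sep="\t", index=False)
print(output)
```

With df left unchanged, output reproduces the input line for line, which is a handy sanity check before adding real processing.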
