How to remove header lines of a txt file, then work on dataframe, again return header lines in output file

Question

I have a file with a 10-line header; the 11th row holds the column names. I know how to skip those lines to get a dataframe, using:

df = pd.read_csv(inputfile, delimiter = "\t", skiprows=10)

but I want to bring them back and write the header to the output file.

inputfile:

[Header]
APT Version     1.9.4
Processing Date 12/18/2018 11:35 AM
Content         MMMM
Num col        64
Total rows      642
Num Samples     350
Total Samples   350
File    93 of 350
[Data]
Name     Sample    col1    col2    col3        
1002         SPP           2       3   0.2573

Are you having issues with getting the data part, or do you just need help transferring the first 10 lines of the input file to the output file?
– Cohan, Commented Apr 21, 2019 at 0:27
I just want to transfer the first 10 lines of the input file to the output file without any change.
– jamo, Commented Apr 21, 2019 at 0:33
I removed the first 10 lines and worked on the main data; now I want the output to contain the first 10 lines plus the changed main data.
– jamo, Commented Apr 21, 2019 at 0:35
Just read in the first 10 lines, save them in a variable, then write them out when you are ready to write the output. See the answer below.
– Cohan, Commented Apr 21, 2019 at 0:48

Cohan · Accepted Answer · 2019-04-21 01:18:16Z


Since the dataframe part already works, you only need to copy the first 10 lines from the input file to the output file, so read them in separately. Use readline() rather than read() so you don't pull the entire file into memory. A list comprehension controls how many lines are read: here range(10) serves as the counter, giving 10 calls to readline(). The context manager (with) closes the file as soon as the block ends, so it won't interfere when you're ready to read the dataframe.

with open('inputfile.tsv') as f:
    header = [f.readline() for i in range(10)]

The comprehension is equivalent to the loop below, but easier to scan, and comprehensions tend to run faster than explicit loops.

# don't actually do it this way
header = []
with open('inputfile.tsv') as f:
    for i in range(10):
        header.append(f.readline())
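
If the header length might differ between files, the same read can be wrapped in a small helper. This is only a sketch and not part of the original answer: the name read_header and its n_lines parameter are made up for illustration, and itertools.islice is swapped in for the readline() comprehension.

```python
from itertools import islice


def read_header(path, n_lines=10):
    """Return the first n_lines of a text file as a list of strings.

    Hypothetical helper for illustration. Each string keeps its trailing
    newline, so "".join(...) reproduces the header exactly.
    """
    with open(path) as f:
        # islice stops after n_lines, or earlier if the file is shorter,
        # so the rest of the file is never read.
        return list(islice(f, n_lines))
```

Unlike the readline() version, which appends empty strings once the file runs out, this returns only the lines that actually exist.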

When you're ready to write the output file, join the header lines back together and write them before the data. If you don't pass a path or file handle to df.to_csv(), it returns the CSV as a string, which you can write directly below the header.

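with open('output.txt', 'w') as f:
    f.write("".join(header))  # the 10 header lines, unchanged
    # no path given, so to_csv() returns a string; sep and index=False keep the tab-delimited layout
    f.write(df.to_csv(sep="\t", index=False))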
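
Putting the pieces together, the whole round trip might look like the sketch below. The helper name rewrite_keeping_header, its transform argument, and the n_header default are assumptions for illustration; sep="\t" and index=False are also assumptions, chosen so the output stays tab-delimited like the input and omits the row index.

```python
import pandas as pd


def rewrite_keeping_header(in_path, out_path, transform, n_header=10):
    """Apply transform to the table and write it back under the original header.

    Hypothetical helper; n_header=10 matches the file in the question.
    """
    # Grab the metadata lines verbatim (each keeps its trailing newline).
    with open(in_path) as f:
        header = [f.readline() for _ in range(n_header)]

    # Same call as in the question: skip the metadata, parse the table.
    df = transform(pd.read_csv(in_path, delimiter="\t", skiprows=n_header))

    with open(out_path, "w") as f:
        f.writelines(header)
        # With no path argument, to_csv() returns the table as a string.
        f.write(df.to_csv(sep="\t", index=False))
    return df
```

For example, rewrite_keeping_header("inputfile.tsv", "output.txt", lambda df: df.assign(col1=df["col1"] * 2)) would double col1 and leave the ten header lines untouched.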

edited Apr 21, 2019 at 1:18

answered Apr 21, 2019 at 0:33

Cohan


5 Comments

jamo Over a year ago

I used pandas "pd.read_csv" to read the 10 first lines and save them as a dataframe, but I couldn't join this df with the main analysed df in the output file, is there any way in pandas?

Cohan Over a year ago

Do you need to do any processing on the first 10 lines? If not, I'd suggest just reading them in the simplest way possible as shown above. Pandas is a fantastic tool and I love using it and want the world to know how awesome it can be, but that doesn't mean it's the best solution for every problem.

Cohan Over a year ago

Is your issue that you're using df.to_csv() and you don't know how to put the header information in the file it generates?

jamo Over a year ago

yes, I read the header information using " header = pd.read_csv(inputfile, header=None, nrows=10)", but I don't know how to put header info and main data to output using pd.to_csv(), do you have any suggestion?

Cohan Over a year ago

I do! I updated my answer a few minutes ago. Just don't pass in a filename to df.to_csv(). It will return the output as a string which you can pass to the f.write() function.
