I have a big file that looks like the example below:

1   10161   10166   3
1   10166   10172   2
1   10172   10182   1
1   10183   10192   1
1   10193   10199   1
1   10212   10248   1
1   10260   10296   1
1   11169   11205   1
1   11336   11372   1
2   11564   11586   2
2   11586   11587   3
2   11587   11600   4
3   11600   11622   2

I would like to add a "chr" at the beginning of each line, for example:

chr1    10161   10166   3
chr1    10166   10172   2
chr1    10172   10182   1
chr1    10183   10192   1
chr1    10193   10199   1
chr1    10212   10248   1
chr1    10260   10296   1
chr1    11169   11205   1
chr1    11336   11372   1
chr2    11564   11586   2
chr2    11586   11587   3
chr2    11587   11600   4
chr3    11600   11622   2

I tried the following code in Python:

   file = open("myfile.bg", "r")
   for line in file: 
      newline = "chr" + line
   out = open("outfile.bg", "w")
   for new in newline:
      out.write("n"+new)

but it did not produce what I wanted. Do you know how to fix the code for this purpose?

  • 1) You must concatenate the strings onto newline (e.g. with +=). 2) Please post the result, or the error if any. Commented Oct 4, 2017 at 18:07
  • Not necessary now since the question's been answered, but it's typically helpful if you could include the output you're seeing. Commented Oct 4, 2017 at 18:31

3 Answers

Totally agree with @rychaza; here's my version using your code:

file = open("myfile.bg", "r")
out = open("outfile.bg", "w")
for line in file:
    out.write("chr" + line)
out.close()
file.close()

4 Comments

You can't open the same file for input and output (at least not if it's larger than your stdio buffer size). In addition you're leaking file handles.
@thebjorn The answer isn't trying to - the input and output files are different.
Ah, sorry, my bad.
@thebjorn The output file in the question is different from the input file, so this is possible and it works.

The problem with your code is that you iterate over the input file without doing anything with the data you read:

file = open("myfile.bg", "r")
for line in file: 
    newline = "chr" + line

The last line of the loop assigns each line of myfile.bg (with 'chr' prepended) to the newline variable, so each iteration overwrites the previous result.

Then you iterate over the string in newline (which will be the last line in the input file, with 'chr' prepended):

out = open("outfile.bg", "w")
for new in newline:       # <== this iterates over a string, so `new` will be individual characters
    out.write("n"+new)    # this only writes 'n' before each character in newline
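To see both effects concretely, here is a minimal sketch that reproduces them without touching the disk. It assumes a two-line, tab-separated sample held in an `io.StringIO` object as a stand-in for myfile.bg; the sample data and the `pieces` name are only for illustration.

```python
import io

# Stand-in for myfile.bg (assumption: two short tab-separated lines)
infile = io.StringIO("1\t10161\t10166\t3\n2\t11564\t11586\t2\n")

for line in infile:
    newline = "chr" + line      # overwritten on every pass through the loop

# Only the last input line survives the loop
print(repr(newline))            # 'chr2\t11564\t11586\t2\n'

# Iterating over a string yields one character at a time
pieces = ["n" + ch for ch in newline]
print(pieces[:4])               # ['nc', 'nh', 'nr', 'n2']
```

Note also that "n" is just the letter n; a line break would be written "\n", although it is not needed here because each line read from the file already ends with one.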

If you're just doing this once, e.g. in the shell, you could use the one-liner:

open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])

A more correct version (especially in a program, where you care about open file handles) would be:

with open('myfile.bg') as infp:
    lines = infp.readlines()
with open('outfile.bg', 'w') as outfp:
    outfp.writelines(['chr' + line for line in lines])

If the file is really big (close to the size of your available memory), you'll need to process it incrementally:

with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)

(This can be slower than the first two versions, though file I/O is buffered, so in practice the difference is usually small.)

4 Comments

Only concern I see here is memory usage if the file is large.
What problem is the temporary file trying to solve? My only thought is if there is an antagonistic reader that could open it while it was being written, but that would be an issue for any file size due to buffering.
You can't open the same file for reading and writing. That matters especially here: since you're writing more data than you're reading, you'll end up reading new data instead of old data. The problem isn't likely to show up until your file is larger than your stdio buffers, though.
Ah indeed if you want to replace the same file. The question had been writing to a different file, so I wasn't sure where the temporary file was coming in.
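For the in-place case discussed in these comments, one common approach is to write to a temporary file and swap it in once it is complete. The following is only a sketch under that assumption; `prepend_in_place` is a hypothetical helper name, not something from the answers above.

```python
import os
import tempfile

def prepend_in_place(path, prefix="chr"):
    """Rewrite `path` so every line starts with `prefix` (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem.
    with open(path) as infp, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as tmp:
        for line in infp:
            tmp.write(prefix + line)
    # Swap the finished copy into place only after both files are closed.
    os.replace(tmp.name, path)
```

Calling `prepend_in_place("myfile.bg")` would then update the file without ever reading from and writing to the same open file.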

The issue is that you iterate over the input and reassign the same variable (newline) for every line. You then open a file for output and iterate over newline, which is a string, so new will be each character in that string.

I think something like this should be what you're looking for:

with open('myfile.bg', 'rb') as file:
  with open('outfile.bg', 'wb') as out:
    for line in file:
      # Binary mode yields bytes, so the prefix must be bytes too (required on Python 3)
      out.write(b'chr' + line)

When iterating a file, line should already contain the trailing newline.

The with statements will automatically clean up the file handle when the block ends.
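A quick way to check that cleanup, as a small sketch: the `closed` attribute of the file object flips to `True` as soon as the block ends. The throwaway path below is an assumption made for the demonstration.

```python
import os
import tempfile

# Throwaway file in a fresh temporary directory (assumption for the demo)
path = os.path.join(tempfile.mkdtemp(), "demo.bg")

with open(path, "w") as out:
    out.write("chr1\t10161\t10166\t3\n")
    print(out.closed)   # False: still open inside the block

print(out.closed)       # True: closed automatically when the block ended
```

The same happens if an exception is raised inside the block, which is why `with` is generally preferred over calling close() by hand.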

1 Comment

@thebjorn What doesn't work? When I tested it it appeared to work perfectly. What output are you seeing?
