
I have the following command that I want to run from a Python script:

awk '/^>/{n=split($0,a,"_")} /string/{sum+=a[n]} END{print sum}' filein.fasta

Whatever I try (os.system, popen, subprocess.call...), it makes a mess. My last attempt is:

string = "acgactactgtcagtgctgac"  # in the real script this string is provided by a loop
filein = open("filein.fasta")
with open('fileout.txt', 'a+') as outputd:
        subprocess.call(['awk', '\'/^>/{n=split($0,a,"_")}', '/' + line + '/{sum+=a[n]}', 'END{print sum}\'', filein], stdout=outputd, shell=True)    

This way I get no error at this point, but it doesn't work properly: it causes a bug later in the script. How can I pass this command from Python properly? The quote marks are a sore subject in this situation.

1 Answer


Please avoid calling awk commands from Python scripts.

I really like awk, but Python can easily do what awk does here.

awk '/^>/{n=split($0,a,"_")} /string/{sum+=a[n]} END{print sum}' filein.fasta

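does the following:

For each line that begins with ">", it splits the line on the delimiter "_" into the array a (n is the number of fields). It keeps reading, and whenever a line matches /string/, it adds the last field of the most recently split header line, a[n], to the variable sum. At the end it prints sum.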
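To make the split step concrete, here is a small sketch of what n=split($0,a,"_") and a[n] correspond to in Python, using an ID of the form ">NODE_2_length_1959_cov_102497" (the format described in the comment on this answer). The variable names are only illustrative.

```python
# Python equivalent of awk's n=split($0,a,"_") followed by a[n]
header = ">NODE_2_length_1959_cov_102497\n"
fields = header.rstrip("\n").split("_")  # like a[1..n]; awk does not keep the newline in $0
print(fields)      # ['>NODE', '2', 'length', '1959', 'cov', '102497']
print(fields[-1])  # 102497, the same value as a[n]
```

Python lists are 0-based, so awk's a[n] (the last of n fields) becomes fields[-1].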

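Using Python:

string = "acgactactgtcagtgctgac"  # the string to search for (provided by your loop)
total = 0
fields = None  # fields of the most recent header line; None until one is seen
with open("filein.fasta") as fasta:
    for line in fasta:
        if line.startswith('>'):
            fields = line.rstrip('\n').split('_')
        if string in line and fields:
            total += int(fields[-1])  # or float
print(total)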
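Since the search string is provided in a loop, it may help to wrap the same logic in a function that can be called once per string. This is a sketch under a few assumptions: the name sum_cov is hypothetical, it accepts any iterable of lines (an open file, a list, or io.StringIO), and it uses float so that coverage values with a decimal point would also work. Like the awk script, it adds the header's last field once per matching line.

```python
import io
from typing import Iterable


def sum_cov(lines: Iterable[str], pattern: str) -> float:
    """Hypothetical helper: for every line containing pattern, add the last
    '_'-separated field of the most recent '>' header line."""
    total = 0.0
    last_field = None  # last field of the most recent header, e.g. "102497"
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            last_field = line.split("_")[-1]
        # Skip matches that appear before any header has been read
        if pattern in line and last_field is not None:
            total += float(last_field)
    return total


sample = io.StringIO(
    ">NODE_1_length_10_cov_5\nACGTACGT\n"
    ">NODE_2_length_10_cov_7\nTTTTACGT\n"
    ">NODE_3_length_10_cov_100\nGGGGGGGG\n"
)
print(sum_cov(sample, "ACGT"))  # 12.0
```

For several search strings, you could read the file once (for example with fasta.readlines()) and call sum_cov on that list for each string, instead of reopening the file every time.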

Calling subprocess will make your code less portable and harder to debug or monitor.

By the way, the awk script should not add anything for lines that match before any header has been split (when n is still unset). It should be:

awk '/^>/{n=split($0,a,"_")} /string/&&n{sum+=a[n]} END{print sum}' filein.fasta
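If you still need to call awk from Python, the quoting problem goes away when each argument is its own list element and shell=True is not used: the single quotes around the awk program are only needed by the shell, so they must not be passed to awk. The sketch below assumes the search string should be passed in with awk's -v option as a variable named pat (a hypothetical name) instead of being pasted into the program text; it is matched as a regular expression, as /string/ was. The function name awk_cov_sum is also hypothetical.

```python
import os
import shutil
import subprocess
import tempfile

# The corrected awk program, with /string/ replaced by a match against the
# awk variable pat, which is set from Python with -v.
AWK_PROGRAM = '/^>/{n=split($0,a,"_")} $0 ~ pat && n {sum+=a[n]} END{print sum}'


def awk_cov_sum(pattern: str, path: str) -> str:
    # One list element per argument and no shell=True: Python hands each
    # element to awk unchanged, so no extra quoting is needed.
    result = subprocess.run(
        ["awk", "-v", "pat=" + pattern, AWK_PROGRAM, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if shutil.which("awk"):  # only run the demo when awk is installed
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(">NODE_1_length_10_cov_5\nACGTACGT\n>NODE_2_length_10_cov_7\nTTTTACGT\n")
    print(awk_cov_sum("ACGT", tmp.name))  # prints 12
    os.remove(tmp.name)
```

To append the result straight to fileout.txt as in the question, pass stdout=outputd to subprocess.run instead of capture_output=True (the two cannot be combined). Note that -v processes backslash escapes in its value, which does not matter for plain sequence strings.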

1 Comment

I have a FASTA file. The ID of each sequence contains an abundance number. I need to search the file for all occurrences of a specific string that I provide in a variable, and output the sum of the abundance values. The sequence IDs look like ">NODE_2_length_1959_cov_102497", and I need the sum of the cov values at the end of the ID lines (102497 + ... = sum). The string I provide is not on the same line as the ID but on one of the sequence lines, so every time an occurrence is found, the cov value of the ID above it has to be stored, and at the end all of them summed.
