3

I am having trouble passing a string in python (python variable) as input to a sequence alignment program (muscle) on the command line (bash). muscle can take stdin from the command line, e.g.;

~# echo -e ">1\nATTTCTCT\n>2\nATTTCTCC" | muscle

MUSCLE v3.8.31 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

- 2 seqs, max length 8, avg  length 8
00:00:00     22 MB(2%)  Iter   1  100.00%  K-mer dist pass 1
00:00:00     22 MB(2%)  Iter   1  100.00%  K-mer dist pass 2
00:00:00     23 MB(2%)  Iter   1  100.00%  Align node       
00:00:00     23 MB(2%)  Iter   1  100.00%  Root alignment
>1
ATTTCTCT
>2
ATTTCTCC

It is this fasta alignment (the last 4 lines) that I am after - you can redirect the output of muscle (echo -e ">1\nATTTCTCT\n>2\nATTTCTCC" | muscle > out.file to get just the fasta alignment, which I need for downstream processing. BUt to get there, I have to pass 'muscle' the FASTA sequence string, which I think is best done via echo as above in bash.

So, the script takes two multiFASTA files and and pairs each FASTA sequence based on a list of ids for each file - this is working (though I realize it is perhaps not the most efficient way to go about it - I am new python user). Then I need to align each set in muscle before calculating distances/difference.

Here is what I have so far :

#! /env/python
from pairwise_distances import K2Pdistance
import pairwise_distances
import subprocess
from subprocess import call
import os     

fasta1=['']
for line in open('test1.fasta'):
    if not line.startswith('>'):
         fasta1.append(line.strip())

fasta2=['']
for line in open('test2.fasta'):
    if not line.startswith('>'):
         fasta2.append(line.strip())

for l1, l2 in zip(open('test1.list'), open ('test2.list')):
 try:
   a=fasta1[int(l1)]
 except IndexError,e:  
   a="GGG"

 try:
   b=fasta2[int(l2)]
 except (IndexError):
   b="CCC"

 temp_align=str(">1"+'\n'+a+'\n'+">2"+'\n'+b) 

 first=subprocess.check_output(['echo','-e',temp_align])
 print first
 subprocess.call(['bash','muscle'], stdin=first.stdout)
 print second

 #new=K2Pdistance(outfast1,outfast2) 
 #subprocess.Popen(['bash','muscle'], stdin=subprocess.check_output(['echo','-e',temp_align], stdout=subprocess.PIPE).std.out)`

The 'temp_align' variable is what I want to pass to muscle - it is the result of combining the appropriate fasta sequence from each multiFASTA file, for each loop over the ids/list, and is formatted like a FASTA file.

This issue is that I can echo the FASTA string, but I cant seem to "pipe" that via stdin to muscle... The main error I get is: AttributeError: 'str' object has no attribute 'stdout'.

~#python Beta3.py 
>1
ATCGACTACT
>2
ATCGCGCTACT

Traceback (most recent call last):
  File "Beta3.py", line 38, in <module>
    subprocess.call(['bash','muscle'], stdin=first.stdout)
AttributeError: 'str' object has no attribute 'stdout'

Can subprocess or other commandline modules take strings as stdin? IF not, I am not sure how I can echo and then pipe into muscle... Any other ideas would be much appreciated. I tried a few options from this thread unix.stackexchange

but no luck yet.

EDIT: Here are some example files:

~# cat test1.fasta 
>1
ATCGACTACT
>2
ACTCAGTCA
>3
TTCACAGGG
~# cat test2.fasta 
>1
ATCGCGCTACT
>2
GGCGTCAGTCA
>3
TTCAACCCCAGGG
~# cat test1.list 
1
2
3
~# cat test2.list 
1
3
2

EDIT 2: That previous post is related, but my question is about linking two bash commands starting with a python variable that is a string...and then, ideally, capturing the stdout of the second command back into a python variable... I dont quite see how to translate the answers on that post into a solution for my specific question...I guess I don;t fully understand what the poster was trying to do.

5
  • first is a string, to send it to a command see: stackoverflow.com/a/165662/7311767 Commented Feb 25, 2017 at 4:50
  • Why subprocess.call(['bash','muscle'], stdin=first.stdout) and not just subprocess.call(['muscle'], stdin=first.stdout)? Commented Feb 25, 2017 at 4:52
  • It doesn't make a difference (string error still present) , but I guess 'bash' is not needed there. Commented Feb 25, 2017 at 5:08
  • The result of subprocess.check_output is a string (stdout of the exec'd subprocess). You want to either use subprocess.call or provide a string like object as the stdin parameter which should simplify things. Commented Feb 25, 2017 at 18:02
  • This question is already closed as a duplicate, so I can't provide a full answer, but BioPython has a wrapper for muscle to do multiple sequence alignment, see Chapter 6.4.2 in the tutorial. Commented Feb 25, 2017 at 20:12

1 Answer 1

3

It seems that you want to communicate with the muscle process, then you need a PIPE, use this

(out, err) = subprocess.Popen(['muscle'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate(temp_align)
print out
Sign up to request clarification or add additional context in comments.

1 Comment

ahh. I get it it now. Thanks so much for your help.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.