
When I run my script in bash, I get this error: sh: 2: Syntax error: "|" unexpected. I don't understand why. I want to use a pipeline here, and a Perl script that runs the same command works, but I need it in Python.

Example of input (text file):

Kribbella flavida
Saccharopolyspora erythraea
Nocardiopsis dassonvillei
Roseiflexus sp.

Script:

#!/usr/bin/python

import sys
import os

input_ = open(sys.argv[1],'r')
output_file = sys.argv[2]
#stopwords = open(sys.argv[3],'r')

names_board = []

for name in input_:
    names_board.append(name.rstrip())
    print(name)

for row in names_board:
    print(row)
    os.system("esearch -db pubmed -query %s | efetch -format xml | xtract -pattern PubmedArticle -element AbstractText >> %s" % (name, output_file))
  • What gets printed if you replace os.system with print? Does that look reasonable? Commented Nov 5, 2016 at 14:34
  • What operating system are you using? Have you read man esearch, man efetch, and man xtract? Commented Nov 5, 2016 at 14:34
  • This is Ubuntu, but these programs are the E-utilities from NCBI. Commented Nov 5, 2016 at 14:36
  • ok i see print... Commented Nov 5, 2016 at 14:40

2 Answers


A possibly unrelated problem is that you aren't quoting the query term and the output file name in the command, so a value containing a space (such as "Kribbella flavida") is split into separate shell words. Use

os.system('esearch -db pubmed -query "%s" | efetch -format xml | xtract -pattern PubmedArticle -element AbstractText >> "%s"' % (name, output_file))

However, even that is not foolproof for every legal value (for example, a file name that contains a double quote). I would recommend using the subprocess module instead of os.system, leaving the shell out of the process altogether:

import subprocess
import sys

# name is one line from the input file, already stripped of its newline
esearch = ["esearch", "-db", "pubmed", "-query", name]
efetch = ["efetch", "-format", "xml"]
xtract = ["xtract", "-pattern", "PubmedArticle", "-element", "AbstractText"]
with open(sys.argv[2], "a") as output_file:
    p1 = subprocess.Popen(esearch, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(efetch, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # lets esearch receive SIGPIPE if efetch exits early
    subprocess.call(xtract, stdin=p2.stdout, stdout=output_file)
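
For completeness, here is one way the subprocess approach might be wrapped into a whole script. This is only a sketch: the helper names run_pipeline and fetch_abstracts are hypothetical, and it assumes that esearch, efetch, and xtract are on the PATH and that the input file holds one query per line.

```python
import subprocess
import sys


def run_pipeline(commands, output_path):
    """Run commands as cmd1 | cmd2 | ..., appending the final output to output_path.

    Returns the exit codes of all processes, in order.
    """
    procs = []
    prev_stdout = None
    with open(output_path, "a") as out:
        for i, cmd in enumerate(commands):
            is_last = i == len(commands) - 1
            proc = subprocess.Popen(
                cmd,
                stdin=prev_stdout,
                stdout=out if is_last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # The child holds its own copy; close ours so that an
                # upstream process gets SIGPIPE if a downstream one exits.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        return [p.wait() for p in procs]


def fetch_abstracts(names_path, output_path):
    """Run the esearch | efetch | xtract pipeline once per non-empty input line."""
    with open(names_path) as names:
        for line in names:
            name = line.strip()  # removes the trailing newline
            if not name:
                continue
            run_pipeline(
                [
                    ["esearch", "-db", "pubmed", "-query", name],
                    ["efetch", "-format", "xml"],
                    ["xtract", "-pattern", "PubmedArticle", "-element", "AbstractText"],
                ],
                output_path,
            )


if __name__ == "__main__" and len(sys.argv) == 3:
    fetch_abstracts(sys.argv[1], sys.argv[2])
```

Because each command is passed as a list, a name such as "Roseiflexus sp." reaches esearch as a single argument without any quoting, and no shell ever parses it.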


The problem is that name contains the newline that terminates the line read from input. When you interpolate name into the shell command, the newline gets inserted too, and the shell then treats it as the end of the first command. However, the second line then starts with a pipe symbol, which is a syntax error: pipe symbols must come between commands on the same line.

A good hint is that sh reports the error at line 2, even though the command appears to consist of a single line. After substitution it is actually two lines, and the second one is the problem.
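
The effect can be reproduced without running anything. The sketch below first shows how the unstripped line splits the command in two, then builds a one-line command by stripping the newline. The helper name build_command and the file name abstracts.txt are hypothetical, and the use of shlex.quote is an addition here (not part of the explanation above) so that a name containing spaces stays a single shell word.

```python
import shlex

PIPELINE = (
    "esearch -db pubmed -query {query} | efetch -format xml | "
    "xtract -pattern PubmedArticle -element AbstractText >> {out}"
)


def build_command(raw_line, output_file):
    """Return a single-line shell command for one line read from the input file."""
    query = raw_line.strip()  # drop the trailing newline and stray spaces
    return PIPELINE.format(query=shlex.quote(query), out=shlex.quote(output_file))


raw = "Roseiflexus sp.\n"  # what iterating over the file yields

# Interpolating the raw line puts a newline in the middle of the command.
broken = "esearch -db pubmed -query %s | efetch -format xml" % raw
print(broken.splitlines())  # two entries; the second begins with " |"

# Stripping first keeps the whole pipeline on one line.
print(build_command(raw, "abstracts.txt"))
```

Passing the result of build_command to os.system should avoid the syntax error, although keeping the shell out of it with subprocess remains the more robust option.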
