0

I have some code like this:

import subprocess

with open("words.txt", "w") as f:
    subprocess.call(["grep", p, "/usr/share/dict/words"], stdout=f)

I want to grep the macOS dictionary for a certain pattern and write the results to words.txt. For example, to do the equivalent of grep '\<a.\>' /usr/share/dict/words, I run the above code with p = "'\<a.\>'". However, the subprocess call doesn't seem to work properly and words.txt remains empty. Any thoughts on why that is? Also, is there a way to apply a regex to /usr/share/dict/words without calling grep in a subprocess?

edit: When I run grep '\<a.\>' /usr/share/dict/words in my terminal, I get words like: aa ad ae ah ai ak al am an ar as at aw ax ay as results in the terminal (or a file if I redirect them there). This is what I expect words.txt to have after I run the subprocess call.

2
  • 1
    Please provide at least one match that you expect. So far I suppose that you can safely remove ' and \ characters from your pattern. You don't have to escape characters because you are not using shell right now. Your current call should work with additional argument shell=True Commented Nov 7, 2016 at 11:54
  • @woockashek added example matches Commented Nov 7, 2016 at 13:17

2 Answers

2

As @woockashek already commented, you are not getting any results because there are no hits on '\<a.\>' in your input file: without a shell, grep receives the single quotes as literal characters in the pattern. You are probably hoping to find hits for \<a.\>, so you need to omit the single quotes, which are what is tripping you up.

Of course, Python knows full well how to look for a regex in a file.

import re

rx = re.compile(r'\ba.\b')
# Plain 'r' mode: the old 'U' flag is deprecated in Python 3 and rejected by recent versions.
with open('/usr/share/dict/words', 'r') as reader, open('words.txt', 'w') as writer:
    for line in reader:
        if rx.search(line):
            print(line, file=writer, end='')

The single quotes here are part of Python's string syntax, just like the single quotes on the command line are part of the shell's syntax. In neither case are they part of the actual regular expression you are searching for.

The subprocess.Popen documentation vaguely alludes to the frequently overlooked fact that the shell's quoting is not necessary or useful when you don't have shell=True (which usually you should avoid anyway, for this and other reasons).
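For completeness, here is a minimal sketch of the subprocess route with the quoting fixed. The helper name grep_to_file and its parameters are made up for illustration; the only real change from the question's code is that the pattern is passed as the bare regex, with no shell quotes inside the string.

```python
import subprocess


def grep_to_file(pattern, source, dest):
    """Run grep without a shell and write its output to dest.

    Returns grep's exit status: 0 if something matched, 1 if nothing did.
    """
    with open(dest, "w") as out:
        # No shell is involved, so the pattern needs no shell quoting:
        # pass the bare regex, not a string with '...' inside it.
        return subprocess.call(["grep", pattern, source], stdout=out)
```

Calling grep_to_file(r"\<a.\>", "/usr/share/dict/words", "words.txt") should then leave the two-letter words starting with "a" in words.txt, assuming the system's grep supports \< and \>.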

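Python unfortunately doesn't support \< and \> as word boundary operators, so we have to use \b instead. \b matches at either edge of a word rather than only the start or only the end, but for this pattern and this word list the result should be the same.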
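If the one-sided behavior of \< and \> matters, \b can be narrowed with lookarounds. This is a sketch, not something the re module provides under these names: WORD_START, WORD_END and matching_words are illustrative names only.

```python
import re

# \b matches at both the start and the end of a word. Adding a lookaround
# restricts it to one side:
WORD_START = r"\b(?=\w)"   # boundary followed by a word character, like \<
WORD_END = r"\b(?<=\w)"    # boundary preceded by a word character, like \>

strict_rx = re.compile(WORD_START + r"a." + WORD_END)
loose_rx = re.compile(r"\ba.\b")


def matching_words(lines, rx=strict_rx):
    """Return the lines that contain a match, with the newline stripped."""
    return [line.rstrip("\n") for line in lines if rx.search(line)]
```

The two patterns only disagree on input such as "a-b", where plain \b accepts the boundary between "-" and "b" as the end of the match. A list of plain alphabetic words gives the same result either way.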


-1

The standard input and output channels for the process started by call() are bound to the parent’s input and output. That means the calling program cannot capture the output of the command. Use check_output() to capture the output for later processing:

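import subprocess

# p is the pattern from the question, without the shell quotes
output = subprocess.check_output(["grep", p, "/usr/share/dict/words"])
# check_output() returns bytes, so write the file in binary mode
with open("words.txt", "wb") as f:
    f.write(output)
print(output.decode())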

PS: I hope this works; I can't check the answer because I don't have macOS to try it on.

1 Comment

You can bind the standard output to an open file handle just fine; that's not the problem here.
