1

I'm trying to write a simple Python script that:

  1. takes values from stdin
  2. replaces a specific matched word
  3. passes the output, with the new value, back to stdout

So far I only have the part that reads the values from stdin and looks for the matching word; I'm a bit stuck after that.

import re
import sys

for line in sys.stdin:
    matchObj = re.search(r'<something>(.*)</something>',line)
    if matchObj:
        oldWord = matchObj.group(1)
        print oldWord

Contents of foo

<something>REPLACEME</something>
<blah>UNTOUCH</blah>

Ideally if I run this command

cat foo | ./test.py

I would get something like this

<something>NEWWORD</something>
<blah>UNTOUCH</blah>
1
  • RTFM re.sub(). Commented Oct 22, 2014 at 16:25

2 Answers

1

Are you looking for re.sub?

import re
import sys

for line in sys.stdin:
    sys.stdout.write(re.sub(r'(<something>)REPLACEME(</something>)',
                            r'\1NEWWORD\2',
                            line))

Running the above on your example data:

$ echo '<something>REPLACEME</something>\n<blah>UNTOUCH</blah>' | python2 test.py
<something>NEWWORD</something>
<blah>UNTOUCH</blah>
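
The script above only replaces the literal text REPLACEME. If the goal is to replace whatever sits inside the element, as the pattern in the question suggests, a minimal sketch could look like the following. It assumes Python 3 and one element per line at most spanning a single line; the helper name `replace_something` and the `sample` string are made up for illustration, and in a real script the loop would iterate over `sys.stdin` instead.

```python
import re

# Matches <something>...</something> within one line. The non-greedy .*?
# keeps two elements on the same line from being merged into one match.
PATTERN = re.compile(r'(<something>).*?(</something>)')


def replace_something(line, new_word):
    """Return line with the text inside each <something> element replaced."""
    # A function as the replacement means backslashes in new_word are
    # inserted literally instead of being read as group references.
    return PATTERN.sub(lambda m: m.group(1) + new_word + m.group(2), line)


# Stand-in for sys.stdin so the sketch runs on its own.
sample = '<something>REPLACEME</something>\n<blah>UNTOUCH</blah>\n'
for line in sample.splitlines(True):
    # Each line keeps its own newline, so suppress the one print would add.
    print(replace_something(line, 'NEWWORD'), end='')
```

Lines that do not match are returned unchanged, so everything else passes through to stdout untouched.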

Note that parsing XML with regular expressions is probably a bad idea. The Python standard library comes with a number of XML modules.

Here's an example:

import sys
import xml.etree.ElementTree

tree = xml.etree.ElementTree.parse(sys.stdin)
root = tree.getroot()

for node in root.iter('something'):
    if node.text == 'REPLACEME':
        node.text = 'NEWWORD'

tree.write(sys.stdout)

The above works just the same, provided the input is wrapped in a single root element:

$ echo '<root><something>REPLACEME</something>\n<blah>UNTOUCH</blah></root>' | python2 test.py
<root><something>NEWWORD</something>
<blah>UNTOUCH</blah></root>
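
The ElementTree example is run with python2. On Python 3, `sys.stdout` is a text stream, so the result has to be serialised as a string rather than bytes. The sketch below shows one way to do that, assuming Python 3 and well-formed input with a single root element; the function name `replace_text` and the `sample` string are hypothetical, and a real script would pass `sys.stdin.read()` instead.

```python
import xml.etree.ElementTree as ET


def replace_text(xml_text, tag, old, new):
    """Parse xml_text and swap old for new in every <tag> element's text."""
    root = ET.fromstring(xml_text)
    for node in root.iter(tag):
        if node.text == old:
            node.text = new  # assignment updates the tree in place
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")


# Stand-in for sys.stdin.read() so the sketch runs on its own.
sample = '<root><something>REPLACEME</something>\n<blah>UNTOUCH</blah></root>'
print(replace_text(sample, 'something', 'REPLACEME', 'NEWWORD'))
```

Unlike the regex version, this reads the whole document before writing anything, which is usually fine for small inputs.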

1

First, cat foo | ./test.py only works if test.py is executable and starts with a shebang line such as #!/usr/bin/env python. Otherwise, run it through the interpreter: cat foo | python ./test.py.

Then the output of your code will be:

REPLACEME

But for the output that you want, you need to use re.sub():

import re
import sys

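for line in sys.stdin:
    # re.sub returns the line unchanged when the pattern does not match
    newLine = re.sub(r'<something>(.*)</something>', '<something>NEWWORD</something>', line)
    # line already ends with a newline, so write it as-is instead of printing
    sys.stdout.write(newLine)

Output:

<something>NEWWORD</something>
<blah>UNTOUCH</blah>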

A more Pythonic alternative is the ElementTree XML API.
