I have a chunk of text in a file:

<tr bgcolor="#F9F9F9">
     <td align="left">8/7/2012 11:23:42 AM</td>
     <td align="left"><em>Here is the text I want to parse out</em></td>
     <td class="ra">9.00</td>
     <td class="ra">297.00</td>
     <td class="ra">0.00</td>
     <td class="ra">0.00</td>
     <td class="ra">$0.00</td>
     <td class="ra">$0.50</td>
     <td class="ra"></td>
 </tr>

Using grep, I would like to end up with this result:

Here is the text I want to parse out

This is what I have so far:

cat file.txt | grep -m 1 -oP '<em>[^</em>]*'

but that does not work. Thanks for your help!

  • cat file.txt | grep ... can be simplified to grep ... file.txt. Commented Aug 7, 2012 at 17:13
  • Do note that, while what you want to do is possible (as demonstrated in the answers below), regex is generally not the right tool for parsing XML. For a more robust solution, use a tool such as xmlstarlet or a language that gives you access to a proper XML parser. Commented Aug 7, 2012 at 17:20

1 Answer

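In your attempt, [^</em>] is a character class: it matches any single character other than <, /, e, m, or >, so it does not mean "everything up to </em>". A correct regex would be (?<=<em>).*?(?=</em>). The lookbehind and lookahead keep the tags out of the match, and the lazy .*? stops at the first </em>.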

So, try:

grep -m 1 -oP '(?<=<em>).*?(?=</em>)' file.txt
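
To see the difference between the three patterns, here is a small sketch you can paste into a shell. It assumes GNU grep built with PCRE support (needed for -P). The sample line is made up for illustration: it puts two <em> elements on a single line, which is the case where the greedy and lazy forms diverge. Note that -m 1 limits the number of matching lines, not the number of matches, so with -o on a single-line file you may still get several results; piping to head -n 1 is one way to keep only the first.

```shell
# Hypothetical one-line sample with two <em> elements (not the asker's real file).
sample='<td><em>Here is the text</em></td><td><em>second</em></td>'

# Original attempt: [^</em>] is a character class, so matching stops at the
# first '<', '/', 'e', 'm' or '>' after the opening tag.
broken=$(printf '%s\n' "$sample" | grep -oP '<em>[^</em>]*' | head -n 1)

# Greedy: .* runs on to the LAST </em> on the line.
greedy=$(printf '%s\n' "$sample" | grep -oP '(?<=<em>).*(?=</em>)')

# Lazy: .*? stops at the FIRST </em>; head -n 1 keeps only the first match.
lazy=$(printf '%s\n' "$sample" | grep -oP '(?<=<em>).*?(?=</em>)' | head -n 1)

printf 'broken: %s\n' "$broken"
printf 'greedy: %s\n' "$greedy"
printf 'lazy:   %s\n' "$lazy"
```

Only the lazy form prints just the text of the first element.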

3 Comments

That gives me this: Here is the text I want to parse out</em></td> <td class="ra">9.00</td> <td class="ra">297.00</td> <td class="ra">0.00</td> <td class="ra">0.00</td> <td class="ra">$0.00</td> <td class="ra">$0.50</td> <td class="ra"></td> </tr>
OK, so what it is doing is running to the last </em>, which is in another block of text below it. I should have mentioned that. I need the match to end at the first occurrence of </em>. Does that make sense?
@GregAlexander That could happen if the XML is all on one line, rather than nicely formatted as you show. Try adding a ? after the *, as I did in the edit.
