
How do I extract the data item from this HTML string

a <- "<div class=\"tst-10\">100%</div>"

so that the result is 100%? The main idea is to get the text between > and <.

2 Answers


I would use gsub() in this case:

gsub("(<.*>)(.*)(<.*>)", "\\2", a)
[1] "100%"

Basically, this breaks the string up into three parts, each enclosed in parentheses ( and ), which makes each part a capture group. We can then refer to these groups with backreferences. The contents matched by the first group can be referred to as \1 (in an R string, write it with a double backslash, "\\1", so that the backslash itself is escaped), those matched by the second as \2, and so on.

So, essentially, we're saying: parse this string, figure out what matches my conditions, and replace the whole match with only the second capture group.

Piece by piece:

  • <.*> says to look for a "<" followed by any number of any characters ".*" up until you get to a ">"
  • .* means to match any number of characters (up until the next condition)

Keeping this in mind, you could probably also use gsub("(.*>)(.*)(<.*)", "\\2", a) and get the same result.
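To see what each capture group holds, here is a minimal sketch that reuses `a` and the pattern from above. The variable names `pattern`, `opening`, `content`, and `closing` are only illustrative, and sub() is used instead of gsub() on the assumption that the string contains a single match.

```r
a <- "<div class=\"tst-10\">100%</div>"
pattern <- "(<.*>)(.*)(<.*>)"

# Replace the whole match with one capture group at a time.
opening <- sub(pattern, "\\1", a)  # first group: the opening tag
content <- sub(pattern, "\\2", a)  # second group: the text between the tags
closing <- sub(pattern, "\\3", a)  # third group: the closing tag

print(opening)
print(content)
print(closing)
```

Here `content` is "100%", while `opening` and `closing` hold the two tags.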


2 Comments

Thanks, works perfectly, can you explain how it does what it does?
@jrara, I've updated my answer to try to explain what it does. Honestly, though, I'm terrible at explaining regex in general.

I always use this regular expression to remove HTML tags:

gsub("<(.|\n)*?>","",a)

Gives:

[1] "100%"
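The `*?` in that pattern makes the repetition lazy, which is what stops a match at the nearest `>`. As a rough illustration (using the simpler `.` in place of `(.|\n)`, and the hypothetical names `greedy` and `lazy`), compare the two forms on the same string:

```r
a <- "<div class=\"tst-10\">100%</div>"

# Greedy: ".*" runs from the first "<" to the last ">",
# so the whole string is treated as one tag and removed.
greedy <- gsub("<.*>", "", a)

# Lazy: ".*?" stops at the nearest ">",
# so only the two tags are removed and the text survives.
lazy <- gsub("<.*?>", "", a)

print(greedy)  # ""
print(lazy)    # "100%"
```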

This differs from mrdwab's answer in that I simply remove every HTML tag, whereas his extracts the content from within the tags, which is probably more appropriate for this example. Note that the two give different results if there are more tags:

> gsub("(<.*>)(.*)(<.*>)", "\\2", paste(a,"<lalala>foo</lalala>"))
[1] "foo"

> gsub("<(.|\n)*?>","", paste(a,"<lalala>foo</lalala>"))
[1] "100% foo"

I think I found this regex here on SO once, but I'm not sure in which answer.
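If the goal is to keep the text of each tag separately rather than as one string, one possible variation is sketched below. It is not taken from either answer: the negated character classes `[^>]` and `[^<>]` and the names `b`, `stripped`, and `pieces` are assumptions made for illustration, and it only handles simple, well-formed input like the example here.

```r
a <- "<div class=\"tst-10\">100%</div>"
b <- paste(a, "<lalala>foo</lalala>")

# Strip every tag; "[^>]*" cannot run past the ">" that closes a tag,
# so no lazy quantifier is needed.
stripped <- gsub("<[^>]*>", "", b)

# Collect every run of text that sits between a ">" and the next "<".
pieces <- regmatches(b, gregexpr(">[^<>]+<", b))[[1]]
pieces <- gsub("^>|<$", "", pieces)  # drop the surrounding ">" and "<"
pieces <- trimws(pieces)             # the space between the two elements becomes ""
pieces <- pieces[nzchar(pieces)]     # discard empty pieces

print(stripped)  # "100% foo"
print(pieces)    # "100%" "foo"
```

For anything beyond short strings like this, a real HTML parser is generally a safer choice than regular expressions.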

1 Comment

Good call on possibly needing to extract all html tags and not just matching a specific pattern. +1
