1

I have a string like this:

orem ipsum dolor sit amet, consectetur adipiscing elit. Fusce rutrum, neque eu 
varius placerat, <p class="how-pkg"> leo diam viverra velit, </p> a commodo 
nibh metus nec orci. Nulla pharetra ut augue quis blandit.

I want to extract the text that is inside the <p class="how-pkg"> ... </p> element.

Is there a straightforward way to do this without splitting the string multiple times?

Expected output: leo diam viverra velit,

2
  • Do you only have one such tag in your string? Or can there be more? Commented Dec 20, 2013 at 7:20
  • htmlagilitypack.codeplex.com Commented Dec 20, 2013 at 7:22

6 Answers

4

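Use Html Agility Pack and write:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(yourText);
// SelectSingleNode returns one node (or null if nothing matches); SelectNodes returns a collection, which has no InnerText
var text = doc.DocumentNode.SelectSingleNode("//p[@class='how-pkg']").InnerText.Trim();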

4 Comments

Is doc a string in this case?
@SonerGönül No, it's an HtmlDocument from HtmlAgilityPack. I've added the creation of the object to the answer.
+1. Also, a "/p[@class='how-pkg']" selector would be more appropriate in case other tags show up.
@SergeyBerezovskiy thanks, added more specific selector
2

Using only string operations.

var searchForStart = "<p class=\"how-pkg\">";
// position of the first character after the opening tag
int startIndex = input.IndexOf(searchForStart) + searchForStart.Length;
var searchForStop = "</p>";
// first closing tag after the opening tag
int stopIndex = input.IndexOf(searchForStop, startIndex);

var output = input.Substring(startIndex, stopIndex - startIndex);
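
The snippet above assumes both tags are present; IndexOf returns -1 when one is missing, and Substring would then throw or return the wrong text. A minimal sketch of a guarded version follows. The class name TagText and the method Between are hypothetical names chosen for illustration, not part of the original answer.

```csharp
using System;

public static class TagText
{
    // Hypothetical helper: returns the trimmed text between the first openTag
    // and the next closeTag, or null when either tag is missing.
    public static string Between(string input, string openTag, string closeTag)
    {
        int open = input.IndexOf(openTag, StringComparison.Ordinal);
        if (open < 0) return null;

        int start = open + openTag.Length;
        int stop = input.IndexOf(closeTag, start, StringComparison.Ordinal);
        if (stop < 0) return null;

        return input.Substring(start, stop - start).Trim();
    }
}
```

Calling TagText.Between(input, "<p class=\"how-pkg\">", "</p>") on the string from the question should give leo diam viverra velit, with the surrounding spaces trimmed.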


1
string s = "orem ipsum dolor sit amet, consectetur adipiscing elit. Fusce rutrum, neque eu varius placerat, <p class=\"how-pkg\"> leo diam viverra velit, </p> a commodo nibh metus nec orci. Nulla pharetra ut augue quis blandit.";
const string openTag = "<p class=\"how-pkg\">";
// use the tag's length instead of a hard-coded offset
int start = s.IndexOf(openTag) + openTag.Length;
int end = s.IndexOf("</p>", start);

// Trim removes the spaces just inside the tags
string result = s.Substring(start, end - start).Trim();


1

Assuming source is your string:

var start = "<p class=\"how-pkg\">";
var p0 = source.IndexOf(start) + start.Length; // first character after the opening tag
var p1 = source.IndexOf("</p>", p0);           // first closing tag after the opening tag
var s = source.Substring(p0, p1 - p0);

Something like that.

3 Comments

It will actually include <p class=\"how-pkg\"> in the output.
This is wrong. The output is <p class="how-pkg"> leo diam viverra velit,
Now it won't work if there is </p> somewhere before <p class=\"how-pkg\"> in the input.
1

If your tag structure is always going to be the same then you can use regex to extract the value like this:

    var result = Regex.Match("<p class=\"how-pkg\">hello</p>", "(?<=<p class=\"how-pkg\">).*?(?=</p>)").Value;

If your tag structure will change then you can capture both tag and values with named groups like this:

    <(?<tag>\w+)[^>]*>(?<text>.*?)</\k<tag>>

To capture just the value hello from <one>hello</one>:

    (?<=<[^>]*>).*?(?=</\w*>)

For example:

    var result = Regex.Match("<p class=\"how-pkg\">hello</p>", @"(?<=<[^>]*>).*?(?=</\w*>)").Value;
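
If the input can contain several such paragraphs (as the comment under the question asks), Regex.Matches with a lazy quantifier can return each of them. This is only a sketch: it assumes the opening tag is written exactly as <p class="how-pkg"> and that the markup is simple enough for a regex. HowPkgRegex and ExtractAll are illustrative names, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HowPkgRegex
{
    // Lazy ".*?" stops at the first </p>; Singleline lets "." match line breaks too.
    private static readonly Regex Pattern = new Regex(
        "<p class=\"how-pkg\">(?<text>.*?)</p>",
        RegexOptions.Singleline);

    // Returns the trimmed inner text of every matching paragraph, in order.
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            results.Add(m.Groups["text"].Value.Trim());
        }
        return results;
    }
}
```

For anything beyond this fixed, flat shape (nested tags, attributes in a different order), an HTML parser is the safer choice.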


1

Simplest way:

  • Search for <p (or <p class).
  • Search for the next > after that: this is the end of the opening tag (regardless of the class specified) and the start of the content.
  • (Optional) Check whether the tag has the class you support.
  • Search for </p>: the text in between is the result, and this is also the point from which to continue searching (if necessary).
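
The steps above could be sketched roughly as follows. It assumes well-formed, non-nested <p> elements and a class attribute written with double quotes. TagScanner and ParagraphTexts are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TagScanner
{
    // Yields the trimmed text of every <p ...> element whose opening tag
    // contains class="cssClass". Assumes non-nested, well-formed <p> tags.
    public static IEnumerable<string> ParagraphTexts(string input, string cssClass)
    {
        int pos = 0;
        while (true)
        {
            // Step 1: find the next "<p".
            int open = input.IndexOf("<p", pos, StringComparison.OrdinalIgnoreCase);
            if (open < 0) yield break;

            // Step 2: find the ">" that ends the opening tag.
            int tagEnd = input.IndexOf('>', open);
            if (tagEnd < 0) yield break;

            // Skip tags that merely start with "p", such as <pre>.
            char next = input[open + 2];
            if (next != '>' && !char.IsWhiteSpace(next))
            {
                pos = open + 2;
                continue;
            }

            // Step 4: find the closing tag; the content lies in between.
            int close = input.IndexOf("</p>", tagEnd + 1, StringComparison.OrdinalIgnoreCase);
            if (close < 0) yield break;

            // Step 3 (optional): check the class on the opening tag.
            string openingTag = input.Substring(open, tagEnd - open + 1);
            if (openingTag.Contains("class=\"" + cssClass + "\""))
                yield return input.Substring(tagEnd + 1, close - tagEnd - 1).Trim();

            // Continue searching after this element.
            pos = close + "</p>".Length;
        }
    }
}
```

Because the method is an iterator, taking only the first result stops the scan early, while enumerating everything handles inputs with several matching paragraphs.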
