Say I have the following HTML string:

<head>

</head>

<body>
<img src="stickman.gif" width="24" height="39" alt="Stickman">
<a href="http://www.w3schools.com">W3Schools</a>
</body> 

I want to add a string in between the <head> tags, so the final HTML string becomes:

<head>
<base href="http://www.w3schools.com/images/">
</head>

<body>
<img src="stickman.gif" width="24" height="39" alt="Stickman">
<a href="http://www.w3schools.com">W3Schools</a>
</body> 

So I have to search for the first occurrence of the <head> string then insert <base href="http://www.w3schools.com/images/"> right after.

How do I do this in C#?

  • You can do this in more ways than one: regular expressions; splitting your text by a certain character and writing the data back out together with the line you need to add; or using an XmlReader/XmlWriter. Commented May 13, 2013 at 8:10
  • I don't mind using Regex Commented May 13, 2013 at 8:13
  • Regex is really overkill for what you want to do here. Simple .NET string manipulation is good enough and a lot less complex. Commented May 13, 2013 at 8:18
  • Don't use RegEx when manipulating HTML/XML, because HTML is not regular, and RegEx is for manipulating Regular Expressions. Commented May 13, 2013 at 8:25
  • @abelenky: Since when is RegEx for manipulating Regular Expressions? Commented May 13, 2013 at 8:39

4 Answers

So why not just do something easy like

// Strings are immutable in .NET, so assign the result of Replace back.
myHtmlString = myHtmlString.Replace("<head>", "<head><base href=\"http://www.w3schools.com/images/\">");

Not the most elegant or expandable, but satisfies the conditions of your question.

4 Comments

For some reason it does not find the `<head>` tag if there's some other string before it.
The above doesn't care if there is something before or after it. There must be something else going on if it's not working.
@PutraKg, your question doesn't have any text before the head tag. Is this question about all html? stackoverflow.com/a/1732454/659190
The question shows just one example. I am testing the answer against real websites too. Sorry for not mentioning that; I forgot to account for the doctype etc. when writing the question.

Another way of doing this:

string html = "<head></head><body><img src=\"stickman.gif\" width=\"24\" height=\"39\" alt=\"Stickman\"><a href=\"http://www.w3schools.com\">W3Schools</a></body>";
var index = html.IndexOf("<head>");

if (index >= 0)
{
     html = html.Insert(index + "<head>".Length, "<base href=\"http://www.w3schools.com/images/\">");
}
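
Regarding the comments above about real websites: on real pages the tag may be written as <HEAD> or carry attributes, in which case an exact search for "<head>" finds nothing. Below is a minimal sketch of a more tolerant variant of the IndexOf approach. The class and method names (HeadInserter, InsertAfterHeadOpen) are hypothetical and not from any answer here, and it is still plain string searching rather than real HTML parsing, so it assumes the first "<head" in the text is not inside a comment or script.

```csharp
using System;

public static class HeadInserter
{
    // Hypothetical helper: inserts `snippet` right after the first opening
    // <head ...> tag (any casing, optional attributes). Returns `html`
    // unchanged when no such tag is found.
    public static string InsertAfterHeadOpen(string html, string snippet)
    {
        const string tagStart = "<head";
        int searchFrom = 0;
        while (true)
        {
            int open = html.IndexOf(tagStart, searchFrom, StringComparison.OrdinalIgnoreCase);
            if (open < 0) return html;

            int afterName = open + tagStart.Length;
            if (afterName >= html.Length) return html;

            // Accept "<head>" or "<head ...>", but skip tags like "<header>".
            char next = html[afterName];
            if (next == '>' || char.IsWhiteSpace(next))
            {
                int close = html.IndexOf('>', afterName);
                if (close < 0) return html;
                return html.Insert(close + 1, snippet);
            }
            searchFrom = afterName;
        }
    }
}
```

Usage would look like html = HeadInserter.InsertAfterHeadOpen(html, "<base href=\"http://www.w3schools.com/images/\">"); for anything messier than this, an HTML parser is the safer choice.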

This is how it can be done with Regex, if you prefer to use it:

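// Requires: using System.Text.RegularExpressions;
public string ReplaceHead(string html)
{
    // Group 1 is the opening <head> tag, group 2 the content of the head element.
    string rx = "(<head[^>]*>)((.|\n)*?)</head>";
    Regex r = new Regex(rx);
    Match m = r.Match(html);
    if (!m.Success)
    {
        return html; // no head element found, nothing to do
    }
    string openTag = m.Groups[1].Value;
    string s1 = m.Value;
    // Put the <base> tag right after the opening <head> tag.
    string s2 = openTag + "<base href=\"http://www.w3schools.com/images/\">" + s1.Substring(openTag.Length);
    html = html.Replace(s1, s2);
    return html;
}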

4 Comments

No, it's not. The question was not "which is the easiest way to do it" but "how can it be done with regex" (though it has now been edited). Anyway, it can be useful to check what you have inside the tag before doing the replace; that's why I use a similar solution in a project of mine.
I would use this, but some people suggested that in my situation regex would probably be overkill. So I edited my question and removed the regex preference. Anyway, I appreciate your answer.
It's overkill if you are sure to have only one <head> tag, and you are sure the string you'll add is not already inside the tag. With my solution you can check whether "w3schools.com/images" etc. is already inside before doing the replace. Of course this is silly in the example, but it can be different in a real case.
As one of the answers in the thread Jodrell linked puts it: "I like to parse HTML with regular expressions. I don't attempt to parse idiot HTML that is deliberately broken."
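
The check described in these comments (look inside the head element first, and only insert when nothing equivalent is already there) could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's actual project code: the names BaseTagGuard and AddBaseIfMissing are hypothetical, and "already there" is assumed to mean "the head contains any <base ...> element".

```csharp
using System;
using System.Text.RegularExpressions;

public static class BaseTagGuard
{
    // Hypothetical helper: adds `baseTag` right after the opening <head> tag,
    // but only if the head element does not already contain a <base> element.
    public static string AddBaseIfMissing(string html, string baseTag)
    {
        // Group 1: opening tag. Group 2: head content.
        // Singleline lets '.' match newlines inside the head element.
        Match m = Regex.Match(html, @"(<head(?:\s[^>]*)?>)(.*?)</head>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!m.Success) return html; // no head element found

        // Inspect the captured content before changing anything.
        bool hasBase = Regex.IsMatch(m.Groups[2].Value, @"<base\b", RegexOptions.IgnoreCase);
        if (hasBase) return html;

        Group open = m.Groups[1];
        return html.Insert(open.Index + open.Length, baseTag);
    }
}
```

Calling it twice on the same string leaves the result of the first call unchanged, which is the kind of guard a plain Replace does not give you.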

Just replace the HEAD's tail, in HTML there should only be one:

"<head></head>".Replace( "</head>" , "<a href=\"http://www.w3fools.com\">W3Fools</a>" + "</head>" );

You can flip this around too and replace the HEAD's opening tag, to insert a tag at the beginning.

If you need anything more complex than that, you should look into using parsed HTML.

1 Comment

What do you mean by proper html? I agree with the last part.
