
I have a database containing Page objects with HTML content. Many of the rows in the database contain content like this:

    <p style="float: left; margin-right: 20px; height: 300px;">
        <img src="...">More html ...
    </p>

So I created a super simple regex replace:

    foreach (var page in db.Pages)
    {
        string pattern = @"<p style=""float: left; margin-right: 20px;"">(.*)</p>/ms";
        if (Regex.Match(page.Content, pattern).Success)
        {
            page.Content = Regex.Replace(page.Content, pattern, "<div class=\"contentimage\" >$1</div>");
        }
    }
    // db.SubmitChanges();

When I run the regex in a regex testing tool, it works, but in C# code it doesn't. Can anyone help me out, please?

If anyone knows how to do this update with a regex replace in SQL, that would be fine too.

Regex isn't my strongest point (although that's a great shame), but it is on my list of things to learn ASAP ;)

  • I hate to say it, but regex is really not the tool of choice for processing HTML... Commented Jul 26, 2010 at 20:45
  • Come now Marc, have you never read a Perl web script? Those guys make clear that regex is the tool of choice for everything! Unless you're one of those lamo Microsoft developers who think code should be readable, and regex should have a standard, non-language-specific set of instructions. Commented Jul 26, 2010 at 20:58
  • For all those who think regex should be used to process HTML, I would recommend a good read: stackoverflow.com/questions/1732348/… IMHO, every time someone tags a question with both the regex and html tags, they should be pointed to this answer. Commented Jul 26, 2010 at 21:08
  • @Darin +1 for referencing that question Commented Jul 27, 2010 at 13:12

1 Answer


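Your problem is the "/ms" at the end of the pattern. You're trying to specify a couple of regex flags the PHP/Perl way, but .NET has no pattern delimiters or trailing flags: "/ms" simply becomes part of the pattern, so the regex only matches if those three characters literally follow the </p>. Your regex tester probably tests regexes aimed at those languages; I suggest Expresso (it's free) for working with .NET regexes. Change your pattern to this: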

string pattern = @"<p style=""float: left; margin-right: 20px; height: 300px;"">(.*)</p>";

(Also note that I added the "height" declaration so that it matches the HTML in your example; was leaving it out just a typo?)

And change your Match call to pass the flags as RegexOptions:

if (Regex.Match(page.Content, pattern, RegexOptions.Multiline | RegexOptions.Singleline).Success)

And it should work.
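To make the difference concrete, here is a minimal sketch that checks a simplified version of the question's HTML against the pattern with and without the trailing "/ms". The class `FlagDemo`, its members, and the sample HTML are hypothetical stand-ins for the real data. It also shows the inline `(?s)` option, which .NET accepts at the start of a pattern as an alternative to passing `RegexOptions.Singleline`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; the HTML below is a simplified stand-in for a page's content.
public static class FlagDemo
{
    // Content spans several lines, so "." must be allowed to match "\n".
    public const string Html =
        "<p style=\"float: left; margin-right: 20px; height: 300px;\">\n" +
        "    <img src=\"a.png\">More html\n" +
        "</p>";

    // PHP/Perl-style suffix: .NET reads "/ms" as three literal characters
    // that must appear right after </p>, so this can never match Html.
    public const string PerlStyle =
        @"<p style=""float: left; margin-right: 20px; height: 300px;"">(.*)</p>/ms";

    // The same pattern without the delimiter suffix.
    public const string Plain =
        @"<p style=""float: left; margin-right: 20px; height: 300px;"">(.*)</p>";

    // .NET option 1: pass the flag as a RegexOptions argument.
    public static bool MatchWithOptions(string input) =>
        Regex.IsMatch(input, Plain, RegexOptions.Singleline);

    // .NET option 2: put the flag inline at the start of the pattern.
    public static bool MatchWithInlineOptions(string input) =>
        Regex.IsMatch(input, "(?s)" + Plain);
}
```

Note that `Regex.IsMatch(FlagDemo.Html, FlagDemo.PerlStyle, RegexOptions.Singleline)` is still false: no option can make the literal "/ms" disappear from the pattern.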

[EDIT] Oh, and fix the Replace call the same way:

page.Content = Regex.Replace(page.Content, pattern, "<div class=\"contentimage\" >$1</div>", RegexOptions.Multiline | RegexOptions.Singleline);
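Putting both fixes together, a sketch of the conversion might look like the following. `PageContentFixer` and `Convert` are hypothetical names; `db`, `Pages` and `SubmitChanges` come from the question's data context and appear only in comments. Two simplifications relative to the code above are assumptions about what this particular pattern needs: only `RegexOptions.Singleline` is passed, since `Multiline` only changes how `^` and `$` behave and the pattern uses neither, and the separate `Match` check is dropped, since `Regex.Replace` returns the input unchanged when nothing matches.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class wrapping the answer's pattern and replacement.
public static class PageContentFixer
{
    // Built once and reused for every page. Singleline lets "." match newlines,
    // which the multi-line paragraph content requires.
    private static readonly Regex FloatedParagraph = new Regex(
        @"<p style=""float: left; margin-right: 20px; height: 300px;"">(.*)</p>",
        RegexOptions.Singleline);

    // Wraps the paragraph's inner HTML ($1) in the new div.
    // Returns the content unchanged if the pattern does not match.
    public static string Convert(string content) =>
        FloatedParagraph.Replace(content, "<div class=\"contentimage\" >$1</div>");
}

// Usage inside the question's loop (db and Page are not defined here):
//
// foreach (var page in db.Pages)
// {
//     string updated = PageContentFixer.Convert(page.Content);
//     if (updated != page.Content)   // only assign when something changed
//     {
//         page.Content = updated;
//     }
// }
// db.SubmitChanges();
```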

4 Comments

And I completely agree with Marc that unless your HTML is always going to be very similar to your example, Regex is not really the way to go.
Thanks a lot, worked like a charm. And @Marc Gravell: regex was the right tool for this job. Try doing this in fewer than 10 lines with an HTML parser :D Ergo: regex 1 - htmlparser 0 ;) I wasn't a fan of regex myself, but more and more I am becoming one.
Well as long as the HTML is always going to be perfectly formed like this, it'll work OK. Any other case, though, and it'll break.
It worked great, and of course I had already figured out the replace ;) Great help, thanks again.
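The "perfectly formed" caveat in the comments is easy to hit even with ordinary content. With `RegexOptions.Singleline`, the greedy `(.*)` runs to the last `</p>` on the page, so a page with two floated paragraphs ends up inside a single `div`. The sketch below compares it with the lazy `(.*?)`, which stops at the first `</p>` after each opening tag; `GreedyVsLazy` and its members are hypothetical names, and the lazy form still only handles markup whose opening tag matches character for character.

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison of greedy and lazy capture groups.
public static class GreedyVsLazy
{
    public const string Open = @"<p style=""float: left; margin-right: 20px; height: 300px;"">";
    private const string Replacement = "<div class=\"contentimage\" >$1</div>";

    // Greedy: (.*) extends to the LAST </p> in the input.
    public static string Greedy(string html) =>
        Regex.Replace(html, Open + "(.*)</p>", Replacement, RegexOptions.Singleline);

    // Lazy: (.*?) stops at the FIRST </p> after each opening tag.
    public static string Lazy(string html) =>
        Regex.Replace(html, Open + "(.*?)</p>", Replacement, RegexOptions.Singleline);
}
```

For input `Open + "A</p><p>text</p>" + Open + "B</p>"`, `Lazy` yields two separate `div`s with the plain `<p>text</p>` left between them, while `Greedy` produces one `div` that still contains the inner `</p>`, the middle paragraph, and the second opening tag.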
