2
click <a href="javascript:validate('http://www.google.com');">here</a> to open google.com

I need to replace the above sentence with the following:

click <a href="http://www.google.com">here</a> to open google.com

Please help me with a regular expression to do this in C#.

2
  • 4
    HtmlAgilityPack: htmlagilitypack.codeplex.com Commented Sep 17, 2011 at 15:37
  • Austin should submit this as an answer, as using the DOM may be a preferred solution to Regex parsing for this use case. Commented Sep 17, 2011 at 15:52

5 Answers

1
Regex regex = new Regex("href=\".+?'(.+?)'",
    RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(text);

Then extract group 1 from each match, for example the first one:

matches[0].Groups[1].Value

This is the new value to assign to the href.


1

Here you go:

The Regex:

(?<=href\=")(javascript:validate\('(?<URL>[^"']*)'\);)

The Code:

string url = "click <a href=\"javascript:validate('http://www.google.com');\">here</a> to open google.com";
Regex regex = new Regex("(?<=href\\=\")javascript:validate\\('(?<URL>[^\"']*)'\\);");
string output = regex.Replace(url, "${URL}");

The Output:

click <a href="http://www.google.com">here</a> to open google.com
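Since the URL will usually differ from link to link, the same pattern can be wrapped in a small helper and run over input that contains several links. This is only a sketch: the `LinkRewriter`, `Unwrap`, and `Demo` names are hypothetical, and it assumes every wrapped link has exactly the form `javascript:validate('...');` inside a double-quoted `href`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LinkRewriter
{
    // The lookbehind anchors the match right after href=", and the named
    // group URL captures everything between the single quotes.
    private static readonly Regex ValidateCall = new Regex(
        @"(?<=href="")javascript:validate\('(?<URL>[^""']*)'\);",
        RegexOptions.IgnoreCase);

    // Replaces each javascript:validate('...'); wrapper with the bare URL.
    // Links that do not match the pattern are left untouched.
    public static string Unwrap(string html)
    {
        return ValidateCall.Replace(html, "${URL}");
    }
}

public static class Demo
{
    public static void Main()
    {
        string input =
            "<a href=\"javascript:validate('http://www.google.com');\">a</a> " +
            "<a href=\"javascript:validate('http://example.org/x?y=1');\">b</a>";
        Console.WriteLine(LinkRewriter.Unwrap(input));
    }
}
```

Keeping the `Regex` in a static field avoids rebuilding it on every call; whether that matters depends on how often the helper runs.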


1

No Regex needed:

var s = 
    inputString.Replace(
        "javascript:validate('http://www.google.com');",
        "http://www.google.com" );

2 Comments

I think the OP wants to use regex because the URL (google.com) will not be the same every time. I doubt he would ask how to do this with regex if he could just use Replace.
Yes, I think so too, but I could not infer that from the question, so I answered only what was asked.
0

HtmlAgilityPack: http://htmlagilitypack.codeplex.com

Using an HTML parser like this is generally preferable to regular expressions for parsing HTML.
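As a rough sketch of the DOM approach, assuming the HtmlAgilityPack NuGet package is referenced: load the fragment, walk the anchors, and rewrite any `href` that wraps a URL in `javascript:validate('...');`. The `HrefFixer` class, its `Fix` method, and the `Prefix`/`Suffix` constants are illustrative names, not part of the library.

```csharp
using System;
using HtmlAgilityPack; // third-party package, assumed to be installed

// Hypothetical helper showing one way to do the rewrite without a regex.
public static class HrefFixer
{
    private const string Prefix = "javascript:validate('";
    private const string Suffix = "');";

    public static string Fix(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return html;

        foreach (HtmlNode a in anchors)
        {
            string href = a.GetAttributeValue("href", "");
            bool wrapped =
                href.Length >= Prefix.Length + Suffix.Length &&
                href.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase) &&
                href.EndsWith(Suffix, StringComparison.Ordinal);
            if (wrapped)
            {
                // Keep only the text between the prefix and the suffix.
                string url = href.Substring(
                    Prefix.Length, href.Length - Prefix.Length - Suffix.Length);
                a.SetAttributeValue("href", url);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `HrefFixer.Fix` on the sentence from the question should leave the surrounding text alone and change only the anchor's `href`; the exact serialization of the result is up to the library.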


0

Parsing the HTML as Austin suggested is a more reliable way of doing this, but if you absolutely must use a regex, try something like this (adapted from the MSDN documentation for the System.Text.RegularExpressions namespace):

using System;
using System.Text.RegularExpressions;

class MyClass
{
    static void Main(string[] args)
    {
        // In a verbatim string, a double quote is escaped by doubling it.
        string pattern = @"<a href=""[^(]*\('([^']+)'\);"">";
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
        string sInput = "click <a href=\"javascript:validate('http://www.google.com');\">here</a> to open google.com";

        MyClass c = new MyClass();

        // Assign the replace method to the MatchEvaluator delegate.
        MatchEvaluator myEvaluator = new MatchEvaluator(c.ReplaceCC);

        // Write out the original string.
        Console.WriteLine(sInput);

        // Replace matched characters using the delegate method.
        sInput = r.Replace(sInput, myEvaluator);

        // Write out the modified string.
        Console.WriteLine(sInput);
    }

    // Rebuild the opening anchor tag with the captured URL (group 1) as its href.
    public string ReplaceCC(Match m)
    {
        return "<a href=\"" + m.Groups[1].Value + "\">";
    }
}
