2

How can I extract a valid URL from a string like this one?

h*tps://www.google.com/url?q=h*tp://www.site.net/file.doc&sa=U&ei=_YeOUc&ved=0CB&usg=AFQjCN-5OX

I want to extract this part: h*tp://www.site.net/file.doc. This is my valid URL.

6
  • 1
    Which bit do you want from the string? Commented May 12, 2013 at 8:50
  • What do you mean by "valid"? Do you want to replace the "*" with a "t", or something else? Commented May 12, 2013 at 8:50
  • 1
    Regex is the right way to go. Define the pattern that you want to extract, get the Regex's Matches and pick the one that you require. Commented May 12, 2013 at 8:50
  • Thanks for your attention, I edited the question. Commented May 12, 2013 at 8:59
  • FeliceM, I replaced the "t" with a star because of the restriction on posting more than two links. I'm new here! Thank you anyway, this information can be useful to me. Commented May 12, 2013 at 9:03

4 Answers 4

5

Add a reference to the System.Web.dll assembly and use the static methods of the HttpUtility class. Example:

using System;
using System.Web;

class MainClass
{
    public static void Main(string[] args)
    {
        Uri uri = new Uri("https://www.google.com/url?q=http://www.site.net/file.doc&sa=U&ei=_YeOUc&ved=0CB&usg=AFQjCN-5OX");
        // ParseQueryString splits and URL-decodes the query string; Get("q") returns the target URL.
        Uri doc = new Uri(HttpUtility.ParseQueryString(uri.Query).Get("q"));
        Console.WriteLine(doc); // http://www.site.net/file.doc
    }
}

5 Comments

With the protocol as h*tps, would Uri parse the string correctly?
Uri parses the string correctly.
That's incorrect. I tested it and got an "Invalid URI: The URI scheme is not valid." exception.
Which version of .NET/Mono?
5, the version is irrelevant, the class hasn't changed. h*tps is not a valid protocol so how could it parse it?
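
As the comments above note, the Uri constructor throws when the scheme is not valid (as with the obfuscated h*tps). A minimal sketch of a non-throwing variant, assuming you would rather get null than an exception; the class and method names (TargetUrlExtractor, TryExtract) are hypothetical and not part of the original answer:

```csharp
using System;
using System.Web;

static class TargetUrlExtractor
{
    // Returns the decoded "q" parameter as a Uri, or null when either the
    // outer string or the "q" value is not a valid absolute URI.
    public static Uri TryExtract(string input)
    {
        // Uri.TryCreate returns false instead of throwing on an invalid scheme.
        if (!Uri.TryCreate(input, UriKind.Absolute, out Uri outer))
            return null;

        // Get("q") returns null when the parameter is missing;
        // TryCreate then simply fails and we return null.
        string q = HttpUtility.ParseQueryString(outer.Query).Get("q");
        return Uri.TryCreate(q, UriKind.Absolute, out Uri target) ? target : null;
    }
}
```

With the unobfuscated string from the answer, TryExtract should return the same Uri as the original code; with the h*tps version it returns null rather than throwing.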
1

I don't know what your other strings can look like, but if your 'valid URL' is between the first = and the first &, you could use:

(?<==).*?(?=&)

It basically looks for the first = and lazily matches everything up to the next &.

Tested here.
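
For reference, the pattern above could be applied in C# roughly like this. This is a sketch only; the RegexExtractor class and its Extract method are hypothetical names introduced for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexExtractor
{
    // Lazily matches everything between the first '=' and the next '&'.
    // Returns null when the input contains no such span.
    public static string Extract(string input)
    {
        Match m = Regex.Match(input, "(?<==).*?(?=&)");
        return m.Success ? m.Value : null;
    }
}
```

Note that the match is returned as-is (no URL decoding is applied), and the pattern finds nothing if the wanted parameter is the last one in the query string, because the lookahead requires a following &.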

Comments

1

You can use the Split method:

    string txt = "https://www.google.com/url?q=http://www.site.net/file.doc&sa=U&ei=_YeOUc&ved=0CB&usg=AFQjCN-5OX";

    string url = txt.Split(new[] { "?q=" }, StringSplitOptions.None)[1].Split('&')[0];

3 Comments

There is still a bunch of garbage after it.
OK. You can use this instead: 'txt.split("?q=")[1].split("&")[0];'
Please edit it into your post.
0

In this particular case, with the string you posted, you can do this:

string input = "your URL";
string newString = input.Substring(29, 28); // start index and length of the embedded URL

But if the length of the initial part of the URL changes, or the length of the part you want to extract changes, this will not work.

3 Comments

This is also useful, thanks.
You can replace the numbers (29, 28) with int variables and set their values by counting characters up to a certain occurrence. My answer is very basic.
@NaourassDerouichi: Please use the Uri class and the ParseQueryString utility to process the URL.
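
The idea from the comment above, computing the positions instead of hard-coding them, might look like the following sketch. The SubstringExtractor class, the Extract method, and the "?q=" default marker are assumptions made for illustration:

```csharp
using System;

static class SubstringExtractor
{
    // Returns the text between the marker and the next '&'
    // (or the end of the string), or null if the marker is absent.
    public static string Extract(string input, string marker = "?q=")
    {
        int markerPos = input.IndexOf(marker, StringComparison.Ordinal);
        if (markerPos < 0)
            return null;

        int start = markerPos + marker.Length;
        int end = input.IndexOf('&', start);
        if (end < 0)
            end = input.Length; // the value runs to the end of the string

        return input.Substring(start, end - start);
    }
}
```

Like the Substring answer, this does no URL decoding or validation, so the Uri and HttpUtility approach remains the more robust choice.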
