I'm trying to build a filter in C# that stops my program from fetching images from "banned" websites kept in a list. I tried a bool check combined with a Regex.Replace operation, but unfortunately it didn't work out.

To elaborate, here is an example of what I want. I have a List<string> BannedSites = new List<string> { "site" };

 if (BannedSites.Contains(input))
 {
     // Don't go to that site
 }
 else
 {
     // Go to that site
 }

The problem is that "site" is in the list, but if someone enters "site " with a trailing space, execution falls through to the else branch because that exact string isn't in the list. The same happens with "site?": a question mark at the end of a URL usually makes no difference when accessing the site, so the check is bypassed again. Is it possible to block the request whenever the input contains "site" anywhere WITHIN the string? Sorry if this is simple, but I haven't been able to figure it out and Google didn't help.

Thanks in advance!

2 Answers

You can use LINQ's .Any to help with that:

if (BannedSites.Any(x => input.Contains(x))) {
    // Don't go to that site
} else {
    // Go to that site
}

Remember to call .ToUpperInvariant() on both the input and the list entries to make the check case-insensitive.
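
As a rough sketch of that advice, the check could be wrapped in a helper that normalizes both sides before comparing. The class and method names (SubstringFilter, IsBanned) are made up for illustration and are not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringFilter
{
    // Hypothetical helper: returns true when any banned entry occurs
    // anywhere inside the input, ignoring case.
    public static bool IsBanned(IEnumerable<string> bannedSites, string input)
    {
        // Normalize the input once, then normalize each list entry as it is checked.
        string normalizedInput = input.ToUpperInvariant();
        return bannedSites.Any(site => normalizedInput.Contains(site.ToUpperInvariant()));
    }
}
```

With this, "site ", "site?" and "SITE" would all be caught by a list that contains "site".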

3 Comments

beware of false positives, for example "wokersexchange" contains "sexchange"
@Theraot: Or the site with the hyphen. Pre-hyphen.
If it worked @Nom, please see the holo tickbox next to minitech's answer, tick it and it will give you 2 points. Over 10 points and you can upvote! ps mini, I voted for you in mod election as #1, good luck mate

If you make sure that the BannedSites list contains only domain names (and arguably IP addresses), then you can compare against the domain alone.

To get the domain of a Uri, do as follows:

var uri = new Uri("http://stackoverflow.com/questions/11060418/c-sharp-string-parsing-containing-in-a-list");
Console.WriteLine(uri.DnsSafeHost);

The output is:

stackoverflow.com

Now you can get it to work like this (remember to store the entries in upper case in BannedSites):

var uri = new Uri(input);
if (BannedSites.Contains(uri.DnsSafeHost.ToUpper(CultureInfo.InvariantCulture)))
{
    //Don't go to that site
}
else
{
    //Go to that site
}

This will also ensure that the domain didn't appear as a part of another string by chance, for example as part of a parameter.

Also note that this method will give you subdomains, so:

var uri = new Uri("http://msdn.microsoft.com/en-US/");
Console.WriteLine(uri.DnsSafeHost);

returns:

msdn.microsoft.com

and not only:

microsoft.com

You can also verify that the URI is well formed with uri.IsWellFormedOriginalString():

var uri = new Uri(input);
if (uri.IsWellFormedOriginalString() && BannedSites.Contains(uri.DnsSafeHost))
{
    //Don't go to that site
}
else
{
    //Go to that site
}
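
One thing to keep in mind is that new Uri(input) throws a UriFormatException when the input cannot be parsed at all. A possible way around that, sketched here with made-up names (HostFilter, IsBanned) and assuming the list stores upper-case host names, is Uri.TryCreate:

```csharp
using System;
using System.Collections.Generic;

public static class HostFilter
{
    // Hypothetical helper; assumes bannedHosts holds upper-case host names.
    public static bool IsBanned(ICollection<string> bannedHosts, string input)
    {
        // TryCreate returns false instead of throwing on malformed input.
        if (!Uri.TryCreate(input, UriKind.Absolute, out var uri))
        {
            // Assumption: unparsable input is reported as "not banned" here;
            // a stricter policy might refuse to fetch it at all.
            return false;
        }

        return bannedHosts.Contains(uri.DnsSafeHost.ToUpperInvariant());
    }
}
```

Whether unparsable input should be let through or rejected is a policy decision that the sketch leaves to the caller.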

Now, let's say that you want to cover subdomains as well. You can do this:

var uri = new Uri(input);
if (uri.IsWellFormedOriginalString() && BannedSites.Any(x => uri.DnsSafeHost.EndsWith(x)))
{
    // Don't go to that site
}
else
{
    // Go to that site
}
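
Note that a plain EndsWith also matches unrelated hosts that merely share a suffix: "notmicrosoft.com" ends with "microsoft.com". A minimal sketch that only matches on a label boundary, using hypothetical names (SubdomainFilter, IsBannedHost), might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubdomainFilter
{
    // Hypothetical helper: matches the banned domain itself or any of its
    // subdomains, but not hosts that only happen to share the same suffix.
    public static bool IsBannedHost(IEnumerable<string> bannedDomains, string host)
    {
        return bannedDomains.Any(domain =>
            host.Equals(domain, StringComparison.OrdinalIgnoreCase) ||
            host.EndsWith("." + domain, StringComparison.OrdinalIgnoreCase));
    }
}
```

It would be called with uri.DnsSafeHost as the host argument.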

Lastly, if you are banning particular pages rather than whole sites (in which case handling subdomains makes no sense), you can do as follows:

var uri = new Uri(input);
if (uri.IsWellFormedOriginalString() && BannedSites.Contains(uri.DnsSafeHost + uri.AbsolutePath))
{
    //Don't go to that site
}
else
{
    //Go to that site
}

AbsolutePath leaves out the query string (everything after "?") and the fragment (everything after "#"), which are often used to pass parameters but don't change the requested page.


You may also consider using Uri.Compare and storing a list of Uri objects instead of a list of strings.
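
A sketch of how that could look, with made-up names (UriFilter, IsBanned) and assuming that host and path are the parts that should match. Comparing the path case-insensitively is also an assumption, since some servers treat paths as case-sensitive:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UriFilter
{
    // Hypothetical helper: compares only host and path, so the scheme,
    // query string and fragment of the candidate are ignored.
    public static bool IsBanned(IEnumerable<Uri> bannedUris, Uri candidate)
    {
        return bannedUris.Any(banned =>
            Uri.Compare(banned, candidate,
                        UriComponents.Host | UriComponents.Path,
                        UriFormat.SafeUnescaped,
                        StringComparison.OrdinalIgnoreCase) == 0);
    }
}
```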


I leave you the task of making the comparisons case-insensitive, as RFC 1035 says: "For all parts of the DNS that are part of the official protocol, all comparisons between character strings (e.g., labels, domain names, etc.) are done in a case-insensitive manner."
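
One possible way to handle that, shown only as a sketch with hypothetical names (CaseInsensitiveBanList, Create, IsBanned), is to keep the hosts in a HashSet<string> built with StringComparer.OrdinalIgnoreCase, so that neither the list entries nor the input need to be upper-cased by hand:

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveBanList
{
    // Builds a set whose lookups ignore case.
    public static HashSet<string> Create(IEnumerable<string> hosts)
    {
        return new HashSet<string>(hosts, StringComparer.OrdinalIgnoreCase);
    }

    // Exact host match against the set; the set's comparer handles casing.
    public static bool IsBanned(HashSet<string> bannedHosts, Uri uri)
    {
        return bannedHosts.Contains(uri.DnsSafeHost);
    }
}
```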
