I tried doing this:

using System;
using System.Collections.Generic;
using System.Text;

namespace UrlsDetector
{
    class UrlDetector
    {
        public static string RemoveUrl(string input)
        {
            var words = input;
            while(words.Contains("https://"))
            {
                string urlToRemove = words.Substring("https://", @" ");
                words = words.Replace("https://" + urlToRemove , @"");
            }
        }
        
    }

    class Program
    {
        static void Main()
        {
            Console.WriteLine(UrlDetector.RemoveUrl(
                "I saw a cat and a horse on https://www.youtube.com/"));

        }
    }
}

but it doesn't work.

What I want to achieve is to remove the entire "https://www.youtube.com/" and display "I saw a cat and a horse on".

I also want to display a message like "the sentence you input doesn't have a URL" if the sentence doesn't contain any URL.

4 Answers

If you are looking for a non-regex way to do this, here you go. The method below assumes that a URL begins with "http://" or "https://", so it will not work with URLs that begin with something like ftp:// or file://, although the code can easily be modified to support those. It also assumes the URL continues until it reaches either the end of the string or a whitespace character (a space, tab, or newline). Again, this can easily be changed if your requirements are different.

Also, if the string contains no URL, it currently returns an empty string for both the extracted URL and the text without the URL. You can modify this easily too!

using System;

public class Program
{
    public static void Main()
    {
        string str = "I saw a cat and a horse on https://www.youtube.com/";

        UrlExtraction extraction = RemoveUrl(str);
        Console.WriteLine("Original Text: " + extraction.OriginalText);
        Console.WriteLine();
        Console.WriteLine("Url: " + extraction.ExtractedUrl);
        Console.WriteLine("Text: " + extraction.TextWithoutUrl);
    }

    private static UrlExtraction RemoveUrl(string str)
    {       
        if (String.IsNullOrWhiteSpace(str))
        {
            return new UrlExtraction("", "", "");
        }

        int startIndex = str.IndexOf("https://", 
                StringComparison.InvariantCultureIgnoreCase);

        if (startIndex == -1)
        {
            startIndex = str.IndexOf("http://", 
                StringComparison.InvariantCultureIgnoreCase);
        }

        if (startIndex == -1)
        {
            return new UrlExtraction(str, "", "");
        }

        int endIndex = startIndex;
        while (endIndex < str.Length && !IsWhiteSpace(str[endIndex])) 
        {           
            endIndex++;
        }

        return new UrlExtraction(str, str.Substring(startIndex, endIndex - startIndex), 
            str.Remove(startIndex, endIndex - startIndex));
    }

    private static bool IsWhiteSpace(char c)
    {
        return 
            c == '\n' || 
            c == '\r' || 
            c == ' ' || 
            c == '\t';
    }

    private class UrlExtraction
    {
        public string ExtractedUrl {get; set;}
        public string TextWithoutUrl {get; set;}
        public string OriginalText {get; set;}

        public UrlExtraction(string originalText, string extractedUrl, 
            string textWithoutUrl)
        {
            OriginalText = originalText;
            ExtractedUrl = extractedUrl;
            TextWithoutUrl = textWithoutUrl;
        }
    }
}
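
The modifications mentioned above (supporting schemes such as ftp:// or file://, and reacting when no URL is present) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `UrlStripper`, the method `RemoveUrls`, and the scheme list are hypothetical and not part of the code above. Unlike the code above, the sketch removes every URL it finds rather than only the first, and it reports how many were removed so the caller can decide what message to show.

```csharp
using System;
using System.Text;

public static class UrlStripper
{
    // Hypothetical scheme list (an assumption for this sketch); extend as needed.
    private static readonly string[] Schemes = { "https://", "http://", "ftp://", "file://" };

    // Copies the input while skipping every run of non-whitespace characters
    // that starts with one of the schemes. 'removed' receives the number of URLs skipped.
    public static string RemoveUrls(string input, out int removed)
    {
        removed = 0;
        if (string.IsNullOrEmpty(input)) return string.Empty;

        var sb = new StringBuilder(input.Length);
        int position = 0;
        while (position < input.Length)
        {
            int start = FindNextUrl(input, position);
            if (start == -1)
            {
                // No more URLs: copy the rest of the text and stop.
                sb.Append(input, position, input.Length - position);
                break;
            }

            // Copy the text before the URL, then skip to the end of the URL.
            sb.Append(input, position, start - position);
            int end = start;
            while (end < input.Length && !char.IsWhiteSpace(input[end])) end++;
            removed++;
            position = end;
        }
        return sb.ToString();
    }

    // Returns the index of the earliest scheme occurrence at or after 'from', or -1.
    private static int FindNextUrl(string input, int from)
    {
        int best = -1;
        foreach (string scheme in Schemes)
        {
            int index = input.IndexOf(scheme, from, StringComparison.OrdinalIgnoreCase);
            if (index != -1 && (best == -1 || index < best)) best = index;
        }
        return best;
    }
}
```

A caller could check `removed == 0` to print a message such as "the sentence you input doesn't have a URL", and otherwise print the returned text (possibly after `TrimEnd()`, since the space that preceded a trailing URL is kept).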

Here is a simplified version of what you're doing. Instead of using Substring or IndexOf, I split the input into a list of strings and remove the items that start with a URL scheme. I iterate over the list in reverse because removing an item while looping forward shifts the remaining items down, so the next item would be skipped.

    // Requires: using System.Collections.Generic; using System.Linq;
    public static string RemoveUrl(string input)
    {
        List<string> words = input.Split(" ").ToList();
        for (int i = words.Count - 1; i >= 0; i--) 
        {
            if (words[i].StartsWith("https://")) words.RemoveAt(i);
        }
        return string.Join(" ", words);
    }

This method's advantage is that it avoids Substring and Replace, each of which creates a new string every time it is called. In a loop, that extra string manipulation can put pressure on the garbage collector and bloat the managed heap. A Split and Join costs less in comparison, especially when there is a lot of data.

@Moshi is correct for large amounts of data, so here is more of a production-code example:
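
As a possible alternative to the manual reverse loop, `List<T>.RemoveAll` removes every matching item in a single pass and returns how many items it removed. A minimal sketch, assuming the same "split on spaces" approach; the names `WordFilter` and `RemoveUrlWords` are made up for this illustration:

```csharp
using System;
using System.Collections.Generic;

public static class WordFilter
{
    // Splits on spaces, drops every word that starts with "https://",
    // and reports the number of dropped words through 'removed'.
    public static string RemoveUrlWords(string input, out int removed)
    {
        var words = new List<string>(input.Split(' '));

        // RemoveAll takes a predicate and returns the count of removed items.
        removed = words.RemoveAll(
            w => w.StartsWith("https://", StringComparison.OrdinalIgnoreCase));

        return string.Join(" ", words);
    }
}
```

The returned count also covers the second part of the question: when `removed` is 0, the caller can print a "no URL" message instead of the text.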

public static class Ext
{
    public static LinkedList<T> RemoveAll<T>(this LinkedList<T> list, Predicate<T> match)
    {
        if (list == null)
        {
            throw new ArgumentNullException("list");
        }
        if (match == null)
        {
            throw new ArgumentNullException("match");
        }
        var count = 0;
        var node = list.First;
        while (node != null)
        {
            var next = node.Next;
            if (match(node.Value))
            {
                list.Remove(node);
                count++;
            }
            node = next;
        }
        return list;
    }
}

public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
        var s= "I saw a https://www.youtube.com/cat and a https://www.youtube.com/horse on https://www.youtube.com/";

        //Uncomment for second run 
        //s= @"I saw a https://www.youtube.com/cat and a https://www.youtube.com/horse on https://www.youtube.com/
        //but it doesnt work
        //what I want to achieve is remove the entire https://www.youtube.com/ and display I saw a cat and a horse on
        //I also want to display a message like the sentence you input doesn't have url if the sentence doesn't have any url.";

        Stopwatch watch = new Stopwatch();

        watch.Start();
        var resultList = RemoveUrl(s);
        watch.Stop(); Debug.WriteLine(watch.Elapsed.ToString());

        watch.Reset(); watch.Start();
        var wordsLL = new LinkedList<string>(s.Split(' '));
        var result = string.Join(' ', wordsLL.RemoveAll(x => x.StartsWith("https://")));
        watch.Stop(); Debug.WriteLine(watch.Elapsed.ToString());
    }
}

var s one line:
watch.Elapsed = {00:00:00.0116388}
watch.Elapsed = {00:00:00.0134778}

var s multilines:
watch.Elapsed = {00:00:00.0013588}
watch.Elapsed = {00:00:00.0009252}

1 Comment

It won't be a good choice because RemoveAt takes O(n) time.

Using basic string manipulation will never get you where you want to be. Regular expressions make this very easy. Search for a piece of text that looks like "http(s)?:\/\/\S*[^\s\.]":

  • http: the literal text http
  • (s)?: the optional (?) letter s
  • :\/\/: the characters ://
  • \S*: any number (*) of non-whitespace characters (\S)
  • [^\s\.]: any character that is not (^) in the list ([ ]) of whitespace characters (\s) or the dot (\.). This lets you exclude the dot at the end of a sentence from your URL.

using System;
using System.Text.RegularExpressions;

namespace UrlsDetector
{
  internal class Program
  {

    static void Main(string[] args)
    {
      Console.WriteLine(UrlDetector.RemoveUrl(
          "I saw a cat and a horse on https://www.youtube.com/ and also on http://www.example.com."));
      Console.ReadLine();
    }
  }

  class UrlDetector
  {
    public static string RemoveUrl(string input)
    {

      var regex = new Regex(@"http(s)?:\/\/\S*[^\s.]");
      return regex.Replace(input, "");
    }
  }
}

With regular expressions you can also use Regex.Match(...) to detect whether your text contains any URLs.
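
A minimal sketch of that detection step, assuming the message wording from the question and the same pattern as above (written without the optional escapes). The class name `UrlMessage` and the method `Describe` are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlMessage
{
    // Same pattern as in the answer: http or https, "://", then non-whitespace
    // characters, ending on a character that is neither whitespace nor a dot.
    private static readonly Regex UrlPattern = new Regex(@"https?://\S*[^\s.]");

    // Returns a "no URL" message when nothing matches,
    // otherwise the input with every URL removed.
    public static string Describe(string input)
    {
        if (!UrlPattern.IsMatch(input))
        {
            return "The sentence you input doesn't have a URL.";
        }

        // TrimEnd drops the space left behind when the URL was the last word.
        return UrlPattern.Replace(input, "").TrimEnd();
    }
}
```

`Regex.IsMatch` is enough here because only a yes/no answer is needed; `Regex.Match` would be the choice if the matched URL itself had to be inspected.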

3 Comments

"http:(s)?:..." the first colon is wrong. Also: what if the URL is followed by another punctuation character like !;?
Btw, sir Kevin, how exactly do I use Regex.Match() to detect the URL and display a message if the text doesn't have any URL?
I created a quick fiddle: dotnetfiddle.net/zQbQYr. I added two ways of doing it (via Match and via IsMatch). IsMatch is used when just a boolean check will do. Match() is used if you want to do more with the result. In your case, IsMatch will do.

A better way is to use Split together with a StringBuilder, which is optimized for this kind of situation. The code would look like this.

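    // Requires: using System.Text;
    public static string RemoveUrl(string input)
    {
        var sb = new StringBuilder();
        foreach (var word in input.Split(' '))
        {
            // Keep only the words that do not start with the URL scheme.
            if (!word.StartsWith("https://")) sb.Append(word).Append(' ');
        }
        // Drop the trailing space left by the last Append.
        return sb.ToString().TrimEnd();
    }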
