How do I strip non-alphanumeric characters (including spaces) from a string?

Question

How do I strip non-alphanumeric characters and stray spaces from a string in C# with Replace?

I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).

"Hello there(hello#)".Replace(regex-i-want, "");

should give

"Hellotherehello"

I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", ""); but the spaces remain.

How about first defining what exactly you mean by alpha numeric? Do you just want A-Z,a-z,0-9? Unicode has plenty more letters and numbers.
– CodesInChaos, Commented Jan 8, 2012 at 16:36
With that edit, it looks much better - taking back my minus vote.
– Anders Abel, Commented Jan 8, 2012 at 16:46
Why do you have a space in your bracket? And string.Replace doesn't take a regex in the first place.
– CodesInChaos, Commented Jan 8, 2012 at 17:04
Just to be absolutely clear: You don't want a letter like ä either?
– CodesInChaos, Commented Jan 8, 2012 at 17:12
I answered my question taking your tips into account (see below).
– James, Commented Jan 8, 2012 at 17:29

Tim Pietzcker · Accepted Answer · 2012-01-08 17:11:03Z

70

In your regex, you have excluded the spaces from being matched (and you haven't used Regex.Replace() which I had overlooked completely...):

result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");

should work. The + makes the regex a bit more efficient by matching more than one consecutive non-alphanumeric character at once instead of one by one.

If you want to keep non-ASCII letters/digits, too, use the following regex:

@"[^\p{L}\p{N}]+"

which leaves

BonjourmesélèvesGutenMorgenliebeSchüler

instead of

BonjourmeslvesGutenMorgenliebeSchler
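
A minimal sketch comparing the two patterns side by side. The sample sentence is an assumption (the answer shows only the two outputs, not the input), and the class and method names are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphanumericRegexDemo
{
    // Keeps only ASCII letters and digits (a-z, A-Z, 0-9).
    public static string KeepAsciiAlphanumerics(string input)
    {
        return Regex.Replace(input, @"[^A-Za-z0-9]+", "");
    }

    // Keeps any Unicode letter (\p{L}) or number (\p{N}).
    public static string KeepUnicodeAlphanumerics(string input)
    {
        return Regex.Replace(input, @"[^\p{L}\p{N}]+", "");
    }

    public static void Main()
    {
        // Assumed sample input; not taken from the original answer.
        string sample = "Bonjour, mes élèves! Guten Morgen, liebe Schüler.";

        Console.WriteLine(KeepUnicodeAlphanumerics(sample)); // BonjourmesélèvesGutenMorgenliebeSchüler
        Console.WriteLine(KeepAsciiAlphanumerics(sample));   // BonjourmeslvesGutenMorgenliebeSchler
    }
}
```

The accented letters here are assumed to be precomposed characters; a letter stored as a base letter plus a combining mark would lose the mark, because combining marks belong to \p{M}, not \p{L}.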

edited Jan 8, 2012 at 17:11

answered Jan 8, 2012 at 16:45

Tim Pietzcker

6 Comments

James Over a year ago

I tried this...it's very close but it seems to leave spaces in - I want them stripped too! Thanks.

Tim Pietzcker Over a year ago

No, it doesn't. Unless you have special spaces in there, like a non-breaking space (U+00A0, character code 160, which is outside ASCII), and the second version correctly removes those, too.

James Over a year ago

Hmmm I tried the following: string t = "hello there - ( efrwef )"; string a = "New: " + t.Replace(@"[^\p{L}\p{N}]+", ""); and a ends up being "hello there - ( efrwef )" - completely unchanged - I know I'm doing something wrong here.

CodesInChaos Over a year ago

string.Replace doesn't take a regex.

James Over a year ago

AHHH that would explain all. So, how could I do what is described above with regex bits and pieces in C#?
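
A short sketch of the difference discussed in these comments, using the string from James's test. string.Replace searches for its first argument as literal text, while Regex.Replace interprets it as a pattern. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceVersusRegexDemo
{
    // string.Replace looks for the literal text "[^\p{L}\p{N}]+".
    // That text does not occur in typical input, so nothing changes.
    public static string WithStringReplace(string input)
    {
        return input.Replace(@"[^\p{L}\p{N}]+", "");
    }

    // Regex.Replace treats the same text as a pattern and removes
    // every run of characters that are not letters or numbers.
    public static string WithRegexReplace(string input)
    {
        return Regex.Replace(input, @"[^\p{L}\p{N}]+", "");
    }

    public static void Main()
    {
        string t = "hello there - ( efrwef )";

        Console.WriteLine(WithStringReplace(t)); // hello there - ( efrwef )
        Console.WriteLine(WithRegexReplace(t));  // hellothereefrwef
    }
}
```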

Dmitrii Bychenko · Accepted Answer · 2016-06-21 16:03:28Z

23

You can use LINQ to keep only the required characters:

  String source = "Hello there(hello#)";

  // "Hellotherehello"
  String result = new String(source
    .Where(ch => Char.IsLetterOrDigit(ch))
    .ToArray());

Or

  String result = String.Concat(source
    .Where(ch => Char.IsLetterOrDigit(ch)));

This way you have no need for regular expressions.
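
Note that Char.IsLetterOrDigit also accepts non-ASCII letters and digits such as é or ٠. If only a-z, A-Z and 0-9 should survive, as the question asks, a possible variant is sketched below. The helper IsAsciiLetterOrDigit and the class name are hypothetical, written here for illustration.

```csharp
using System;
using System.Linq;

public static class AsciiFilterDemo
{
    // Hypothetical helper: true only for a-z, A-Z and 0-9.
    public static bool IsAsciiLetterOrDigit(char ch)
    {
        return (ch >= 'a' && ch <= 'z')
            || (ch >= 'A' && ch <= 'Z')
            || (ch >= '0' && ch <= '9');
    }

    // Same LINQ shape as above, with the stricter predicate.
    public static string KeepAsciiAlphanumerics(string source)
    {
        return new string(source.Where(IsAsciiLetterOrDigit).ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(KeepAsciiAlphanumerics("Hello there(hello#)")); // Hellotherehello
        Console.WriteLine(KeepAsciiAlphanumerics("élèves 123"));          // lves123
    }
}
```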

edited Jun 21, 2016 at 16:03

answered Nov 24, 2015 at 14:31

Dmitrii Bychenko

3 Comments

Marc L. Over a year ago

Great addition! Would be interesting to know the relative performance of this to the Regex solution. Out of the gate, it reads a lot better.

Marc L. Over a year ago

A quick test in LinqPad suggests there's negligible difference between this and even a compiled Regex solution. Readability wins for me.

Will Croxford Over a year ago

Looks really neat and readable, if performance same, I'm using it thanks. NB for new programmers like me, this means you need to add the line using System.Linq; at the top of the file for the C# compiler to recognise method Where.

Adrianne · Accepted Answer · 2012-01-08 18:27:23Z

3

Or you can do this too:

    public static string RemoveNonAlphanumeric(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                sb.Append(text[i]);
        }

        return sb.ToString();
    }

Usage:

string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ñ $ 123 ٠١٢٣٤");

//text: textLaLalol123

edited Jan 8, 2012 at 18:27

answered Jan 8, 2012 at 17:04

Adrianne

4 Comments

CodesInChaos Over a year ago

While I like the general approach, it doesn't fit the requirement of only allowing A-Z,a-z,0-9. It allows other letters and digits too.

CodesInChaos Over a year ago

There are more than 10 digits in unicode too. ٠١٢٣٤ are some examples.

CodesInChaos Over a year ago

Sorry, but it's still wrong. ToLower uses the current locale. So when you run it in Turkey, it won't allow I, but allows İ instead. en.wikipedia.org/wiki/Dotted_and_dotless_I

Adrianne Over a year ago

@CodeInChaos wow... guess my laziness took me to do that. Fixed :)

James · Accepted Answer · 2012-01-08 17:15:48Z

2

The mistake made above was using Replace incorrectly (it doesn't take regex, thanks CodeInChaos).

The following code should do what was specified:

Regex reg = new Regex(@"[^\p{L}\p{N}]+");//Thanks to Tim Pietzcker for regex
string regexed = reg.Replace("Hello there(hello#)", "");

This gives:

regexed = "Hellotherehello"

answered Jan 8, 2012 at 17:15

James

Justin Caldicott · Accepted Answer · 2014-03-29 13:18:01Z

The same idea as a replace operation, written as an extension method:

public static class StringExtensions
{
    public static string ReplaceNonAlphanumeric(this string text, char replaceChar)
    {
        StringBuilder result = new StringBuilder(text.Length);

        foreach(char c in text)
        {
            if(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                result.Append(c);
            else
                result.Append(replaceChar);
        }

        return result.ToString();
    } 
}

And test:

[TestFixture]
public sealed class StringExtensionsTests
{
    [Test]
    public void Test()
    {
        Assert.AreEqual("text_LaLa__lol________123______", "text LaLa (lol) á ñ $ 123 ٠١٢٣٤".ReplaceNonAlphanumeric('_'));
    }
}

John Conde · Accepted Answer · 2012-10-25 00:30:48Z

1

var text = "Hello there(hello#)";

var rgx = new Regex("[^a-zA-Z0-9]");

text = rgx.Replace(text, string.Empty);

edited Oct 25, 2012 at 0:30

John Conde

answered Oct 25, 2012 at 0:14

Michel

1 Comment

ForceMagic Over a year ago

Welcome to SO. A little explanation always makes your answer more valuable. On SO, people tend to like to know why, instead of just how. ;)

K D · Accepted Answer · 2014-05-14 07:05:55Z

-2

Use the following regex with Regex.Replace to strip all those characters from the string:

([^A-Za-z0-9\s])

edited May 14, 2014 at 7:05

answered Jan 8, 2012 at 18:43

K D

3 Comments

PostureOfLearning Over a year ago

'string.Replace()' does not take regex as an argument

K D Over a year ago

@PostureOfLearning Thank you for your remark, but you should look at the question. The question is not about the Replace method, it is about the regex. The usage of the method is copied from the question itself, provided with a helpful regex. Kindly take back your vote :)

PostureOfLearning Over a year ago

I understand the question and I realize that the question also has invalid code. However, I accept invalid code in a question since they are trying to learn, but I find incorrect code in an answer not acceptable. It is an answer and should work. Your answer led me in the wrong direction when looking to solve my own problem. Having said this, if you want to change it I'll be happy to take back the vote ;)

Veronica · Accepted Answer · 2012-01-08 16:45:05Z

-6

In .NET 4.0 you can use the IsNullOrWhiteSpace method of the String class to detect the so-called white space characters; note that it only tests a string and does not remove anything. Please take a look here http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx However, as @CodeInChaos pointed out, there are plenty of characters which could be considered letters and numbers. You can use a regular expression if you only want to find A-Za-z0-9.

answered Jan 8, 2012 at 16:45

Veronica

1 Comment

Marc L. Over a year ago

Do yourself and SO a favor and remove this.
