Replacing all non-alphanumeric characters with empty strings

Question

I tried using this, but it didn't work:

return value.replaceAll("/[^A-Za-z0-9 ]/", "");

Guys, you forget there are alphabets other than the Latin one.
– Mateva, Commented Oct 14, 2015 at 16:48
But if you want to validate a hostname, for instance, this would be good for excluding invalid alphabets.
– Gurnard, Commented Aug 2, 2019 at 12:08

Dave Jarvis · Accepted Answer · 2017-09-18 17:14:33Z

306

Use [^A-Za-z0-9].

Note: removed the space since that is not typically considered alphanumeric.

edited Sep 18, 2017 at 17:14

Dave Jarvis

answered Nov 26, 2009 at 20:30

Mirek Pluta

3 Comments

Andrew Duffy Over a year ago

Neither should the space at the end of the character class.

erik.aortiz Over a year ago

The regex is fine; just remove the "/" from the regex string: change value.replaceAll("/[^A-Za-z0-9 ]/", ""); to value.replaceAll("[^A-Za-z0-9 ]", "");. You don't need the "/" inside the regex. I think you've confused this with JavaScript patterns.

SüniÚr Over a year ago

Note that this only works with the Latin alphabet and doesn't work with accented characters or any "special" character set.

Andrew Duffy · Accepted Answer · 2009-11-26 20:33:36Z

150

Try

return value.replaceAll("[^A-Za-z0-9]", "");

or

return value.replaceAll("[\\W]|_", "");
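
As a quick check that the two patterns behave the same on ASCII input, here is a minimal sketch. The class name, method names, and sample string are made up for illustration. It relies on \W matching anything outside [A-Za-z0-9_] by default, which is why the underscore is handled separately.

```java
final class AsciiAlnumDemo {
    // Removes everything except ASCII letters and digits.
    static String stripWithExplicitClass(String value) {
        return value.replaceAll("[^A-Za-z0-9]", "");
    }

    // \W covers all non-word characters; the underscore counts as a word
    // character, so it is removed by the second alternative.
    static String stripWithWordClass(String value) {
        return value.replaceAll("[\\W]|_", "");
    }

    public static void main(String[] args) {
        String sample = "foo_bar: 42 / baz\\qux";
        System.out.println(stripWithExplicitClass(sample)); // foobar42bazqux
        System.out.println(stripWithWordClass(sample));     // foobar42bazqux
    }
}
```

Characters such as ':', '/' and '\' are non-word characters, so both variants drop them.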

answered Nov 26, 2009 at 20:33

Andrew Duffy

3 Comments

erickson Over a year ago

With underscores, return value.replaceAll("\\W", "");

Andrew Duffy Over a year ago

Of course. Compilers are great at spotting that sort of thing.

WW. Over a year ago

The second one doesn't answer the question. What about characters like : / \ etc?

matejs · Accepted Answer · 2019-11-15 08:14:32Z

92

You should be aware that [^a-zA-Z] matches every character outside the ranges A-Z and a-z. That means accented letters like é and ß, Cyrillic characters, and so on will be removed.

If the replacement of these characters is not wanted use pre-defined character classes instead:

 str.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");

PS: \p{Alnum} does not achieve this effect; by default it acts the same as [A-Za-z0-9].
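
To see the difference side by side, here is a small sketch comparing the three patterns on a string with non-ASCII letters. The class and method names and the sample text are made up for illustration, and it assumes a standard JVM where UNICODE_CHARACTER_CLASS is not enabled by default.

```java
final class UnicodeAlnumDemo {
    // ASCII-only: accented and non-Latin letters are removed too.
    static String stripAscii(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    // Unicode-aware: keeps any alphabetic character or digit.
    static String stripUnicode(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
    }

    // POSIX class: ASCII-only unless UNICODE_CHARACTER_CLASS is set.
    static String stripPosix(String s) {
        return s.replaceAll("[^\\p{Alnum}]", "");
    }

    public static void main(String[] args) {
        // "Straße 12, café!" written with escapes to avoid source-encoding issues
        String sample = "Stra\u00DFe 12, caf\u00E9!";
        System.out.println(stripAscii(sample));
        System.out.println(stripUnicode(sample));
        System.out.println(stripPosix(sample));
    }
}
```

On a default JVM, the first and third calls drop ß and é, while the second keeps them.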

edited Nov 15, 2019 at 8:14

matejs

answered Sep 17, 2015 at 10:25

Andre Steingress

5 Comments

Mateva Over a year ago

Thanks a lot for this post - it was very useful to me. Additionally, I believe this is the actual answer to the question. The Latin alphabet isn't the only one in the world!

Bogdan Klichuk Over a year ago

Actually, the stated regex will treat "^" as a valid character, since only the first occurrence of "^" is negating the meaning of the selection. [^\\p{IsAlphabetic}\\p{IsDigit}] works well.

Andre Steingress Over a year ago

@JakubTurcovsky docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html defines IsAlphabetic and IsDigit as binary properties. Alpha and Digit are POSIX character classes (US-ASCII only), unless the docs.oracle.com/javase/10/docs/api/java/util/regex/… flag is specified.

Jakub Turcovsky Over a year ago

@AndreSteingress Correct, the reason {IsDigit} doesn't work for me and {Digit} does is that I'm trying this on Android, and Android has UNICODE_CHARACTER_CLASS turned on by default. Thanks for the clarification.

Robert Goodrick Over a year ago

How to only allow Alpha, Digit, and Emoji?

erickson · Accepted Answer · 2009-11-26 20:31:16Z

65

return value.replaceAll("[^A-Za-z0-9 ]", "");

This will leave spaces intact. I assume that's what you want. Otherwise, remove the space from the regex.

answered Nov 26, 2009 at 20:31

erickson

nhinkle · Accepted Answer · 2014-05-20 03:14:27Z

23

You could also try this simpler regex:

 str = str.replaceAll("\\P{Alnum}", "");
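
For context, \P{...} with an uppercase P is the complement of \p{...}, so \P{Alnum} matches any character that is not an ASCII letter or digit (on a default JVM). A minimal sketch, with made-up class and method names, including a variant that keeps whitespace:

```java
final class PosixAlnumDemo {
    // \P{Alnum} matches every character that is NOT in \p{Alnum}.
    static String strip(String s) {
        return s.replaceAll("\\P{Alnum}", "");
    }

    // Negated class that additionally spares whitespace characters.
    static String stripKeepingWhitespace(String s) {
        return s.replaceAll("[^\\p{Alnum}\\s]", "");
    }

    public static void main(String[] args) {
        String sample = "Hello, World! 123";
        System.out.println(strip(sample));                  // HelloWorld123
        System.out.println(stripKeepingWhitespace(sample)); // Hello World 123
    }
}
```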

edited May 20, 2014 at 3:14

nhinkle

answered Aug 6, 2013 at 12:17

saurav

2 Comments

Jonik Over a year ago

Or, preserving whitespace: str.replaceAll("[^\\p{Alnum}\\s]", "")

membersound Over a year ago

Or \\p{Alnum}\\p{Space}.

Community · Accepted Answer · 2020-06-20 09:12:55Z

13

Solution:

value.replaceAll("[^A-Za-z0-9]", "")

Explanation:

[^abc] When a caret ^ appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.

Think of the bracket syntax as two functions:

[(Pattern)] = match(Pattern)
[^(Pattern)] = notMatch(Pattern)

Moreover regarding a pattern:

A-Z = all characters included from A to Z
a-z = all characters included from a to z
0-9 = all characters included from 0 to 9

Therefore, it will replace every character NOT included in the pattern.
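
The effect of the caret can be checked with a tiny sketch like the one below. The class name, method names, and sample strings are made up for illustration. It also shows that a caret that is not the first character inside the brackets is just a literal '^'.

```java
final class NegationDemo {
    // [abc] matches a, b or c, so those characters are removed.
    static String removeAbc(String s) {
        return s.replaceAll("[abc]", "");
    }

    // [^abc] matches everything except a, b or c, so only those remain.
    static String keepAbc(String s) {
        return s.replaceAll("[^abc]", "");
    }

    // The caret is not first here, so it is an ordinary member of the class.
    static String removeACaretC(String s) {
        return s.replaceAll("[a^c]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeAbc("abcdef"));    // def
        System.out.println(keepAbc("abcdef"));      // abc
        System.out.println(removeACaretC("a^b^c")); // b
    }
}
```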

edited Jun 20, 2020 at 9:12

CommunityBot

answered Nov 21, 2018 at 12:07

GalloCedrone

abyx · Accepted Answer · 2009-11-26 20:39:19Z

12

Java's regular expressions don't require you to put a forward-slash (/) or any other delimiter around the regex, as opposed to other languages like Perl, for example.
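
To illustrate why the original attempt appeared to do nothing, here is a small sketch with made-up class and method names. In a Java pattern string the slashes are ordinary characters, so the pattern only matches a disallowed character that sits between two literal '/' characters.

```java
final class DelimiterDemo {
    // Slashes are literal here: matches '/', one disallowed char, then '/'.
    static String withSlashes(String value) {
        return value.replaceAll("/[^A-Za-z0-9 ]/", "");
    }

    // No delimiters: matches every disallowed character.
    static String withoutSlashes(String value) {
        return value.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        System.out.println(withSlashes("Hello, World!"));    // Hello, World!
        System.out.println(withoutSlashes("Hello, World!")); // Hello World
        System.out.println(withSlashes("a/!/b"));            // ab
    }
}
```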

answered Nov 26, 2009 at 20:39

abyx

zneo · Accepted Answer · 2009-11-27 02:08:47Z

8

I made this method for creating filenames:

public static String safeChar(String input)
{
    // Characters permitted in the resulting filename
    String allowed = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_";
    StringBuilder result = new StringBuilder();
    for (char c : input.toCharArray())
    {
        // Keep the character only if it appears in the allowed set
        if (allowed.indexOf(c) >= 0) result.append(c);
    }
    return result.toString();
}
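
For comparison, a regex version of the same whitelist might look like the sketch below. The class and constant names are made up, and it assumes the same allowed set (ASCII letters, digits, '-' and '_'). The hyphen is placed last in the character class so it is read as a literal rather than a range.

```java
import java.util.regex.Pattern;

final class SafeFileName {
    // Compiled once; matches any character outside the allowed set.
    private static final Pattern DISALLOWED = Pattern.compile("[^0-9A-Za-z_-]");

    static String safeChar(String input) {
        return DISALLOWED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(safeChar("my report (v2).txt")); // myreportv2txt
        System.out.println(safeChar("a-b_c"));              // a-b_c
    }
}
```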

answered Nov 27, 2009 at 2:08

zneo

3 Comments

Michael Peterson Over a year ago

This is pretty brute-force. Regex is the way to go with the OP's situation.

zneo Over a year ago

You're right, regex is better. But at the time, regex and I didn't get along well.

Michael Peterson Over a year ago

Hah, does anyone really get along that well with regex? ;)

snap · Accepted Answer · 2018-05-24 10:18:52Z

3

If you also want to allow alphanumeric characters that don't belong to the ASCII character set, such as German umlauts, you can consider the following solution:

 String value = "your value";

 // this could be placed as a static final constant, so the compiling is only done once
 Pattern pattern = Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

 value = pattern.matcher(value).replaceAll("");

Please note that using the UNICODE_CHARACTER_CLASS flag may impose a performance penalty (see the Javadoc for this flag).
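
Here is a sketch of the "static final constant" idea mentioned in the code comment, with made-up class and field names. One thing to keep in mind: \w also matches the underscore, so underscores survive this filter.

```java
import java.util.regex.Pattern;

final class UnicodeWordFilter {
    // Compiled once and reused; \w becomes Unicode-aware with this flag.
    private static final Pattern NON_WORD =
            Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

    static String strip(String value) {
        return NON_WORD.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        // "Grüße_aus Köln!" written with escapes to avoid source-encoding issues
        System.out.println(strip("Gr\u00FC\u00DFe_aus K\u00F6ln!"));
    }
}
```

The space and the exclamation mark are removed, while the umlauts, ß and the underscore are kept.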

answered May 24, 2018 at 10:18

snap

Deb · Accepted Answer · 2018-10-04 08:40:07Z

2

Using Guava, you can easily combine different types of criteria. For your specific case, you can use:

value = CharMatcher.inRange('0', '9')
        .or(CharMatcher.inRange('a', 'z'))
        .or(CharMatcher.inRange('A', 'Z'))
        .retainFrom(value);
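
If adding Guava is not an option, the same range-based filtering can be sketched with the plain JDK. This is not Guava's implementation, just a hand-written equivalent under the same three ranges, and the class and method names are made up.

```java
final class RangeFilter {
    // True for ASCII digits and ASCII letters only.
    static boolean isAsciiAlnum(char c) {
        return (c >= '0' && c <= '9')
                || (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z');
    }

    // Copies over only the characters that fall into one of the ranges.
    static String retainAsciiAlnum(String value) {
        StringBuilder kept = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (isAsciiAlnum(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(retainAsciiAlnum("a-b_c 1.2")); // abc12
    }
}
```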

edited Oct 4, 2018 at 8:40

answered Oct 4, 2018 at 7:45

Deb

1 Comment

E-Riz Over a year ago

So much more readable than regex. People should look at the CharMatcher class when they think regex is the only solution to a problem.

user5004526 · Accepted Answer · 2016-11-01 19:36:37Z

1

Simple method:

public boolean isBlank(String value) {
    return (value == null || value.equals("") || value.equals("null") || value.trim().equals(""));
}

public String normalizeOnlyLettersNumbers(String str) {
    if (!isBlank(str)) {
        return str.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    } else {
        return "";
    }
}
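
A small sketch of what \p{L} and \p{Nd} buy you over the ASCII ranges: they cover letters and decimal digits from any script. The class name, method names, and sample string (two Cyrillic letters, two Arabic-Indic digits, a hyphen and an ASCII 7) are made up for illustration.

```java
final class LettersAndDigitsDemo {
    // Keeps letters (category L) and decimal digits (category Nd) of any script.
    static String stripUnicode(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    // ASCII-only counterpart for comparison.
    static String stripAscii(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        String sample = "\u0410\u0411 \u0663\u0664" + "-7";
        System.out.println(stripUnicode(sample));
        System.out.println(stripAscii(sample)); // 7
    }
}
```

The Unicode version removes only the space and the hyphen, while the ASCII version keeps just the trailing 7.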

answered Nov 1, 2016 at 19:36

user5004526

Jason Roman · Accepted Answer · 2017-08-23 15:46:30Z

1

public static void main(String[] args) {
    String value = " Chlamydia_spp. IgG, IgM & IgA Abs (8006) ";

    System.out.println(value.replaceAll("[^A-Za-z0-9]", ""));

}

output: ChlamydiasppIgGIgMIgAAbs8006

Github: https://github.com/AlbinViju/Learning/blob/master/StripNonAlphaNumericFromString.java

edited Aug 23, 2017 at 15:46

Jason Roman

answered Aug 23, 2017 at 15:21

Albin

Bunarro · Accepted Answer · 2019-10-28 08:30:18Z

0

Guava's CharMatcher provides a concise solution:

output = CharMatcher.javaLetterOrDigit().retainFrom(input);
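
A JDK-only alternative can be sketched with Character.isLetterOrDigit. This is a hand-written equivalent rather than Guava's code, and the class and method names are made up. It iterates over code points instead of char values, so letters outside the Basic Multilingual Plane are handled as whole characters.

```java
final class LetterOrDigitFilter {
    // Keeps every code point that Java classifies as a letter or a digit.
    static String retainLettersAndDigits(String input) {
        StringBuilder kept = new StringBuilder(input.length());
        input.codePoints()
             .filter(Character::isLetterOrDigit)
             .forEach(kept::appendCodePoint);
        return kept.toString();
    }

    public static void main(String[] args) {
        // "café #1!" written with an escape to avoid source-encoding issues
        System.out.println(retainLettersAndDigits("caf\u00E9 #1!"));
    }
}
```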

edited Oct 28, 2019 at 8:30

answered Oct 28, 2019 at 7:56

Bunarro

Patrick Junio · Accepted Answer · 2022-11-15 23:34:39Z

0

Dart

If you tried this and it didn't work:

value.replaceAll("[^A-Za-z0-9]", "");

Just use RegExp like this:

value.replaceAll(RegExp("[^A-Za-z0-9]"), "");

edited Nov 15, 2022 at 23:34

answered Nov 15, 2022 at 23:32

Patrick Junio
