238

I tried using this, but it didn't work:

return value.replaceAll("/[^A-Za-z0-9 ]/", "");
3
  • 43
    Guys, you forget there are alphabets other than the Latin one. Commented Oct 14, 2015 at 16:48
  • 3
    But if you want to validate a hostname for instance this would be good to exclude invalid alphabets. Commented Aug 2, 2019 at 12:08
  • @Mateva Good. Those invalid characters will get removed. Commented Dec 10, 2024 at 16:27

14 Answers 14

306

Use [^A-Za-z0-9].

Note: removed the space since that is not typically considered alphanumeric.


3 Comments

Neither should the space at the end of the character class.
The regex is fine; just remove the "/" characters from the pattern string, changing value.replaceAll("/[^A-Za-z0-9 ]/", ""); to value.replaceAll("[^A-Za-z0-9 ]", "");. You don't need "/" delimiters in a Java regex; I think you're confusing this with JavaScript patterns.
Note that this only works with the Latin alphabet and doesn't work with accented characters or any "special" character set.
150

Try

return value.replaceAll("[^A-Za-z0-9]", "");

or

return value.replaceAll("[\\W]|_", "");

3 Comments

To keep underscores: return value.replaceAll("\\W", "");
Of course. Compilers are great at spotting that sort of thing.
The second one doesn't answer the question. What about characters like : / \ etc?
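A minimal sketch of how the two patterns above behave on the characters mentioned in the last comment. The class and method names are hypothetical, chosen only for illustration; the patterns themselves are the ones from the answer and the first comment.

```java
final class WordClassDemo {
    // "[\\W]|_" removes every non-word character and, through the "|_" alternative, underscores too.
    static String stripIncludingUnderscore(String s) {
        return s.replaceAll("[\\W]|_", "");
    }

    // "\\W" alone leaves '_' in place, because '_' counts as a word character.
    static String stripKeepingUnderscore(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        String sample = "a_b:c/d\\e"; // contains '_', ':', '/' and a backslash
        System.out.println(stripIncludingUnderscore(sample)); // abcde
        System.out.println(stripKeepingUnderscore(sample));   // a_bcde
    }
}
```

With default flags, characters such as : / \ are not word characters, so both variants remove them; the variants differ only in how they treat the underscore.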
92

Be aware that [^a-zA-Z] replaces every character outside the ranges A-Z and a-z. That means accented and other non-ASCII letters such as é and ß, as well as Cyrillic characters, will be removed.

If the replacement of these characters is not wanted use pre-defined character classes instead:

 str.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");

PS: \p{Alnum} does not achieve this effect, it acts the same as [A-Za-z0-9].
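The difference can be checked with a short sketch. It assumes a standard JVM with default regex flags (the comments below note that Android behaves differently); the class and method names are hypothetical. Non-ASCII characters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
final class UnicodeStripDemo {
    // ASCII-only: accented and non-Latin letters are removed as well.
    static String asciiOnly(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    // POSIX class: with default flags this behaves like the ASCII-only pattern.
    static String posixAlnum(String s) {
        return s.replaceAll("\\P{Alnum}", "");
    }

    // Unicode binary properties: keeps letters and digits from any script.
    static String unicodeAware(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
    }

    public static void main(String[] args) {
        String sample = "Caf\u00e9 \u00df 42!"; // "Café ß 42!"
        System.out.println(asciiOnly(sample));    // Caf42
        System.out.println(posixAlnum(sample));   // Caf42
        System.out.println(unicodeAware(sample)); // Caféß42
    }
}
```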

5 Comments

Thanks a lot for this post - it was very useful to me. Additionally, I believe this is the actual answer to the question. The Latin alphabet isn't the only one in the world!
Actually, the stated regex will treat "^" as a valid character, since only the first occurrence of "^" is negating the meaning of the selection. [^\\p{IsAlphabetic}\\p{IsDigit}] works well.
@JakubTurcovsky docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html defines IsAlphabetic and IsDigit as binary properties. Alpha and Digit are POSIX character classes (US-ASCII only), unless the docs.oracle.com/javase/10/docs/api/java/util/regex/… flag is specified.
@AndreSteingress Correct. The reason {IsDigit} doesn't work for me and {Digit} does is that I'm trying this on Android, and Android has UNICODE_CHARACTER_CLASS turned on by default. Thanks for the clarification.
How to only allow Alpha, Digit, and Emoji?
65
return value.replaceAll("[^A-Za-z0-9 ]", "");

This will leave spaces intact. I assume that's what you want. Otherwise, remove the space from the regex.


23

You could also try this simpler regex:

 str = str.replaceAll("\\P{Alnum}", "");

2 Comments

Or, preserving whitespace: str.replaceAll("[^\\p{Alnum}\\s]", "")
Or \\p{Alnum}\\p{Space}.
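A small sketch comparing the pattern from the answer with the whitespace-preserving variant from the first comment. The class and method names are hypothetical, and default regex flags are assumed.

```java
final class KeepWhitespaceDemo {
    // Removes everything that is not an ASCII letter or digit, including whitespace.
    static String stripAll(String s) {
        return s.replaceAll("\\P{Alnum}", "");
    }

    // Same idea, but whitespace (\s) is added to the set of characters to keep.
    static String stripKeepWhitespace(String s) {
        return s.replaceAll("[^\\p{Alnum}\\s]", "");
    }

    public static void main(String[] args) {
        String sample = "Hello, World! 42";
        System.out.println(stripAll(sample));            // HelloWorld42
        System.out.println(stripKeepWhitespace(sample)); // Hello World 42
    }
}
```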
13

Solution:

value.replaceAll("[^A-Za-z0-9]", "")

Explanation:

[^abc] When a caret ^ appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.

Think of the square brackets as two functions:

  • [(Pattern)] = match(Pattern)
  • [^(Pattern)] = notMatch(Pattern)

Moreover regarding a pattern:

  • A-Z = all characters included from A to Z

  • a-z = all characters included from a to z

  • 0-9 = all characters included from 0 to 9

Therefore, it replaces every character NOT included in the pattern.
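The effect of the leading caret can be seen in a minimal sketch that uses the [abc] class from the explanation; the class and method names are hypothetical.

```java
final class NegationDemo {
    // "[abc]" matches a, b or c, so those characters are removed.
    static String removeListed(String s) {
        return s.replaceAll("[abc]", "");
    }

    // "[^abc]" matches anything except a, b or c, so only a, b and c survive.
    static String keepListed(String s) {
        return s.replaceAll("[^abc]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeListed("aXbYc")); // XY
        System.out.println(keepListed("aXbYc"));   // abc
    }
}
```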


12

Java's regular expressions don't require you to put a forward-slash (/) or any other delimiter around the regex, as opposed to other languages like Perl, for example.
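A short sketch of why the pattern in the question appears to do nothing: inside a Java string the slashes are ordinary characters, so the pattern only matches a non-alphanumeric character that sits between two literal slashes. The class and method names are hypothetical.

```java
final class DelimiterDemo {
    // The slashes are matched literally: "/", one disallowed character, "/".
    static String withSlashes(String s) {
        return s.replaceAll("/[^A-Za-z0-9 ]/", "");
    }

    // Without the slashes, every disallowed character is matched on its own.
    static String withoutSlashes(String s) {
        return s.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        System.out.println(withSlashes("a!b c?"));    // a!b c?  (nothing matched)
        System.out.println(withSlashes("x/!/y"));     // xy      (only the "/!/" sequence matched)
        System.out.println(withoutSlashes("a!b c?")); // ab c
    }
}
```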


8

I made this method for creating filenames:

public static String safeChar(String input)
{
    // Characters allowed in the resulting filename
    String allowed = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_";
    StringBuilder result = new StringBuilder();
    for (char c : input.toCharArray())
    {
        // Keep the character only if it is in the allowed set
        if (allowed.indexOf(c) >= 0) result.append(c);
    }
    return result.toString();
}

3 Comments

This is pretty brute-force. Regex is the way to go with the OP's situation.
You're right, regex is better. But at the time, regex and I didn't get along well.
Hah, does anyone really get along that well with regex? ;)
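For comparison, a regex version of the same filename filter might look like the sketch below. It is one possible rewrite, not the answer author's code; the class name SafeFileName and the constant UNSAFE are hypothetical. The allowed set mirrors the answer: digits, ASCII letters, '-' and '_'.

```java
import java.util.regex.Pattern;

final class SafeFileName {
    // Anything that is not a digit, an ASCII letter, '_' or '-' is treated as unsafe.
    // The trailing '-' inside the brackets is a literal hyphen, not a range.
    private static final Pattern UNSAFE = Pattern.compile("[^0-9A-Za-z_-]");

    static String safeChar(String input) {
        return UNSAFE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(safeChar("my file (1)-v2_final.txt")); // myfile1-v2_finaltxt
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call, which String.replaceAll would otherwise do.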
3

If you also want to allow alphanumeric characters outside the ASCII character set, such as German umlauts, consider the following solution:

 String value = "your value";

 // this could be placed as a static final constant, so the compiling is only done once
 Pattern pattern = Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

 value = pattern.matcher(value).replaceAll("");

Please note that using the UNICODE_CHARACTER_CLASS flag may impose a performance penalty (see the Javadoc for this flag).
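As the code comment in the answer suggests, the compiled pattern can live in a static final constant. A minimal sketch, with the class name UnicodeWordStripper and the constant NON_WORD as hypothetical names; note that \w also covers the underscore, so underscores are kept.

```java
import java.util.regex.Pattern;

final class UnicodeWordStripper {
    // Compiled once; with UNICODE_CHARACTER_CLASS, \w also covers non-ASCII letters and digits.
    private static final Pattern NON_WORD =
            Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

    static String strip(String value) {
        return NON_WORD.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        // "Grüße_2024!" written with Unicode escapes
        System.out.println(strip("Gr\u00fc\u00dfe_2024!")); // Grüße_2024
    }
}
```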


2

Using Guava, you can easily combine different types of criteria. For your specific solution you can use:

value = CharMatcher.inRange('0', '9')
        .or(CharMatcher.inRange('a', 'z'))
        .or(CharMatcher.inRange('A', 'Z'))
        .retainFrom(value);

1 Comment

So much more readable than regex. People should look at the CharMatcher class when they think regex is the only solution to a problem.
1

Simple method:

public boolean isBlank(String value) {
    return (value == null || value.equals("") || value.equals("null") || value.trim().equals(""));
}

public String normalizeOnlyLettersNumbers(String str) {
    if (!isBlank(str)) {
        return str.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    } else {
        return "";
    }
}
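A compact sketch of the same \p{L}\p{Nd} pattern on non-Latin input. It deliberately differs from the method above in one respect, which is an assumption of this sketch: only a real null is mapped to the empty string, so the literal text "null" is treated like any other input. The class and method names are hypothetical.

```java
final class LettersAndDigits {
    // \p{L} = any Unicode letter, \p{Nd} = any Unicode decimal digit.
    static String normalize(String str) {
        if (str == null) {
            return "";
        }
        return str.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    public static void main(String[] args) {
        // "Привет, 42!" written with Unicode escapes
        System.out.println(normalize("\u041f\u0440\u0438\u0432\u0435\u0442, 42!")); // Привет42
        System.out.println(normalize("null"));                                      // null
    }
}
```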


1
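public class StripNonAlphaNumericFromString {
    public static void main(String[] args) {
        String value = " Chlamydia_spp. IgG, IgM & IgA Abs (8006) ";

        // Remove everything that is not an ASCII letter or digit
        System.out.println(value.replaceAll("[^A-Za-z0-9]", ""));
    }
}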

output: ChlamydiasppIgGIgMIgAAbs8006

Github: https://github.com/AlbinViju/Learning/blob/master/StripNonAlphaNumericFromString.java


0

Guava's CharMatcher provides a concise solution:

output = CharMatcher.javaLetterOrDigit().retainFrom(input);
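If adding Guava is not an option, a similar filter can be sketched with the JDK alone using Character.isLetterOrDigit. This is not Guava's implementation, only an approximation of the same idea; the class and method names are hypothetical.

```java
final class LetterOrDigitFilter {
    // Keeps every code point that Character.isLetterOrDigit accepts and drops the rest.
    static String retainLettersAndDigits(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(Character::isLetterOrDigit)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample ends with "é", written as a Unicode escape
        System.out.println(retainLettersAndDigits("a-b_c 1\u00e9")); // abc1é
    }
}
```

Working on code points rather than char values means characters outside the Basic Multilingual Plane are tested as whole characters instead of as separate surrogate halves.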


0

Dart

If you tried this and it didn't work..

value.replaceAll("[^A-Za-z0-9]", "");

Just use RegExp like this:

value.replaceAll(RegExp("[^A-Za-z0-9]"), "");

