1

I want to remove all non-alphabetic characters from a String.

Input:

"-Hello, 1 world$!"

Output:

"Helloworld"

But instead I'm getting: "Hello1world"

How can I fix it?

My code:

public class LabProgram {
    public static String removeNonAlpha (String userString) {
    String[] stringArray = userString.split("\\W+");
        String result = new String();
        
        for(int i = 0; i < stringArray.length;i++){
            result = result+ stringArray[i];
        }
        
        return result;
    }
    
    public static void main(String args[]) {
        Scanner scnr = new Scanner(System.in);
        String str = scnr.nextLine();
        String result = removeNonAlpha(str);
        System.out.println(result);
    }
}
2
  • 2
    Please fix your code. Formatting matters as you want folks to be able to quickly and easily read and understand your code and question. Commented Apr 24, 2022 at 21:22
  • 1
    You're using \W to split on non-word characters, but word characters are defined as alphanumeric plus underscore, so digits are kept: docs.oracle.com/javase/tutorial/essential/regex/… Commented Apr 24, 2022 at 21:27
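To see why the digit survives, it may help to print the pieces that the split produces. The following is only a small sketch; the class and method names (SplitDemo, splitOnNonWord) are made up for illustration and are not part of the question's code.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits the input on runs of non-word characters, as the question's code does.
    static String[] splitOnNonWord(String input) {
        return input.split("\\W+");
    }

    public static void main(String[] args) {
        String[] parts = splitOnNonWord("-Hello, 1 world$!");
        // "1" is a word character, so it shows up as its own piece.
        System.out.println(Arrays.toString(parts));
        // Concatenating the pieces therefore still contains the digit.
        System.out.println(String.join("", parts));
    }
}
```

Because the digit counts as a word character, it ends up in its own array element and is glued back into the result.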

6 Answers

1

Take a look at replaceAll(), which expects a regular expression as its first argument and a replacement string as its second:

return userString.replaceAll("[^\\p{Alpha}]", "");

For more information on regular expressions, take a look at this tutorial.
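String.replaceAll compiles its regular expression on every call. If the method is called often, one option is to compile the pattern once and reuse it. This is a sketch under that assumption; the names LetterFilter and NON_LETTERS are hypothetical.

```java
import java.util.regex.Pattern;

class LetterFilter {
    // Compiled once and reused; matches one or more characters that are not ASCII letters.
    private static final Pattern NON_LETTERS = Pattern.compile("[^\\p{Alpha}]+");

    static String removeNonAlpha(String userString) {
        // Replace every run of non-letters with the empty string.
        return NON_LETTERS.matcher(userString).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeNonAlpha("-Hello, 1 world$!"));
    }
}
```

For a one-off call, the plain replaceAll one-liner is simpler and behaves the same.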

Comments

1

The issue is that your regex pattern keeps more than just letters: \W only matches characters that are not letters, digits, or the underscore, so digits and underscores survive the split. Replacing the pattern fixes the issue:

String[] stringArray = userString.split("\\P{Alpha}+");

Per the Pattern Javadocs, \W matches any non-word character, where a word character is defined by \w as [a-zA-Z_0-9]. This means that \w matches upper and lowercase ASCII letters A - Z and a - z, the digits 0 - 9, and the underscore character ("_"), and \W matches everything else.

The solution is to use a regex pattern that matches exactly the characters you want removed. Per the Pattern documentation, you could use [^a-zA-Z] or \P{Alpha} to match everything except the 26 ASCII letters in upper and lowercase. If you want to keep letters other than the 26 ASCII letters (e.g. letters in non-Latin alphabets), you could use \P{IsAlphabetic}.

\p{prop} matches if the input has the property prop, while \P{prop} matches only if the input does not have that property.
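The difference between the three patterns can be sketched on an input that contains a digit, an underscore, and two accented letters. The class name PropertyDemo, the helper strip, and the sample string are assumptions made for this illustration; the accented letters are written as Unicode escapes so the source stays ASCII.

```java
class PropertyDemo {
    // Removes every match of the given regex from the input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // "-H\u00e9llo_1 w\u00f6rld$!" contains e-acute and o-umlaut.
        String input = "-H\u00e9llo_1 w\u00f6rld$!";
        // \W removes non-word characters: digits and underscores stay.
        System.out.println(strip(input, "\\W+"));
        // \P{Alpha} removes everything except ASCII letters (default flags).
        System.out.println(strip(input, "\\P{Alpha}+"));
        // \P{IsAlphabetic} removes everything except Unicode alphabetic characters.
        System.out.println(strip(input, "\\P{IsAlphabetic}+"));
    }
}
```

With default flags, only the last pattern keeps the accented letters.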

As other answers have pointed out, there are other issues with your code that make it non-idiomatic, but those aren't affecting the correctness of your solution.

1 Comment

Thanks for pointing out the difference between \\P and \\p
1

this should work:

import java.util.Scanner;

public class LabProgram {
    public static String removeNonAlpha (String userString) {
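        // Keeps only the letters A to Z (lower and uppercase); an equivalent character class:
        //return userString.replaceAll("[^A-Za-z]+", "");
        // \p{Alpha} is ASCII-only by default; it covers all Unicode letters
        // only when the UNICODE_CHARACTER_CLASS flag is set.
        return userString.replaceAll("[^\\p{Alpha}]+", "");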
    }
    
    public static void main(String args[]) {
        Scanner scnr = new Scanner(System.in);
        String str = scnr.nextLine();
        String result = removeNonAlpha(str);
        System.out.println(result);
    }
}

3 Comments

\p{alpha} is preferable, since it gets all alphabetic characters, not just A to Z (and a to z)
@passer-by Thanks, I did not know something like this existed - changed my answer
[^\\p{Alpha}] is the same as \\P{Alpha} (uppercase P), per the javadoc, and it will include/exclude all Unicode alphabetic characters if the UNICODE_CHARACTER_CLASS option is set
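The effect of the UNICODE_CHARACTER_CLASS flag mentioned in the last comment can be sketched by compiling the same pattern twice, once with and once without the flag. The class and method names below are hypothetical, and the sample word is only an illustration.

```java
import java.util.regex.Pattern;

class UnicodeAlphaDemo {
    // Default flags: \p{Alpha} covers ASCII letters only.
    private static final Pattern NON_ALPHA_ASCII = Pattern.compile("\\P{Alpha}+");
    // With UNICODE_CHARACTER_CLASS, \p{Alpha} covers Unicode alphabetic characters.
    private static final Pattern NON_ALPHA_UNICODE =
            Pattern.compile("\\P{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);

    static String keepAsciiLetters(String s) {
        return NON_ALPHA_ASCII.matcher(s).replaceAll("");
    }

    static String keepUnicodeLetters(String s) {
        return NON_ALPHA_UNICODE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // "na\u00efve 42" contains i-diaeresis, written as an escape to keep the source ASCII.
        String input = "na\u00efve 42";
        System.out.println(keepAsciiLetters(input));
        System.out.println(keepUnicodeLetters(input));
    }
}
```

Without the flag the accented letter is stripped along with the digits; with the flag it is kept.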
0

You could use:

public static String removeNonAlpha(String userString) {
    return userString.replaceAll("[^a-zA-Z]+", "");
}

Comments

0

\W is equivalent to [^a-zA-Z_0-9], so it does not match digits or the underscore, and they remain in the result.

Just replace it with "[^a-zA-Z]+", as in the example below:

import java.util.Arrays;

class Scratch {
    public static void main(String[] args) {
        String input = "-Hello, 1    world$!";
        System.out.println("Input : " + input);
        String[] split = input.split("[^a-zA-Z]+");
        StringBuilder builder = new StringBuilder();
        Arrays.stream(split).forEach(builder::append);
        System.out.println("Output : " + builder);
    }
}

Output:

Input : -Hello, 1    world$!
Output : Helloworld

You can have a look at this article for more details about regular expressions: https://www.vogella.com/tutorials/JavaRegularExpressions/article.html#meta-characters

Comments

0

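Here is what I did. Instead of a regex, it loops over the string and keeps only the characters for which Character.isLetter returns true: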

import java.util.Scanner;

public class LabProgram {

    public static String removeNonAlpha(String userString) {
        String result = "";
        for (int i = 0; i < userString.length(); ++i) {
            if (Character.isLetter(userString.charAt(i))) {
                result = result + userString.charAt(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner scnr = new Scanner(System.in);
        String userWord = scnr.nextLine();
        System.out.println(removeNonAlpha(userWord));
    }
}
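A variant of the same non-regex idea could avoid repeated string concatenation by collecting into a StringBuilder. This is only a sketch; the class name LetterStreamDemo is made up, and note that Character.isLetter accepts Unicode letters, not just A to Z.

```java
class LetterStreamDemo {
    // Keeps only the code points that Character.isLetter accepts.
    static String removeNonAlpha(String userString) {
        return userString.codePoints()
                .filter(Character::isLetter)
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(removeNonAlpha("-Hello, 1 world$!"));
    }
}
```

Iterating over code points rather than chars also handles letters outside the Basic Multilingual Plane.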

2 Comments

Question's title: "How can I remove all Non-Alphabetic characters from a String using Regex in Java" (emphasis added)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
