I need to validate that the user types only English text, that is, Latin letters plus some punctuation symbols. For now I have written the following regex:

@NotEmpty
@Pattern(regexp = "^[ \\w \\d \\s \\. \\& \\+ \\- \\, \\! \\@ \\# \\$ \\% \\^ \\* \\( \\) \\; \\\\ \\/ \\| \\< \\> \\\" \\' \\? \\= \\: \\[ \\] ]*$")
private String str;

And it works fine.
But I am looking for a more elegant way: I want to validate that my string contains only ASCII characters. Can I do that with a dedicated annotation or parameter, or do I need to write a custom validator? (If so, could you help me with an example?)

I want something like:

static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder(); // or "ISO-8859-1" for ISO Latin 1

boolean isValid(String input) {    
    return asciiEncoder.canEncode(input);
}

2 Answers

Option 1:

Java strings are always encoded as UTF-16, in which the ASCII character set occupies the code units 0-127. Any non-ASCII character is therefore represented by code units of 128 or above, so it is enough to check that every char is below 128.

str.chars().allMatch(c -> c < 128);
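
To make the one-liner above concrete, here is a minimal sketch that wraps it in a method and tries it on a few inputs. The class name AsciiChars and the method name isAscii are made up for illustration; only String.chars() and IntStream.allMatch() come from the JDK.

```java
final class AsciiChars {

    // Returns true when every UTF-16 code unit of s is in the ASCII range 0..127.
    // Characters outside the BMP are stored as surrogate pairs (0xD800..0xDFFF),
    // so they fail the check as well.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Day"));       // Latin letters only
        System.out.println(isAscii("R\u00e9al")); // contains U+00E9 (e with acute)
        System.out.println(isAscii(""));          // empty string: allMatch is vacuously true
    }
}
```

Note that an empty string passes this check, so it still needs @NotEmpty (or an explicit length check) if empty input should be rejected.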

Option 2: Regex

public class Main {
    public static void main(String[] args) {
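        char nonAscii = 0x00FF; // U+00FF, outside the ASCII range
        String asciiText = "Day";
        String nonAsciiText = "Night " + nonAscii;
        // String.matches() already requires the whole string to match,
        // so the \A and \z anchors are optional here.
        System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));    // true
        System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z")); // false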
    }
}

Option 3: with java.nio.charset.Charset

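import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class StringUtils {

  // Note: CharsetEncoder instances are not safe for concurrent use;
  // create one per call if this is shared between threads.
  static CharsetEncoder asciiEncoder =
      StandardCharsets.US_ASCII.newEncoder();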

  public static boolean isPureAscii(String v) {
    return asciiEncoder.canEncode(v);
  }

  public static void main(String[] args) {

     String test = "Réal";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
     test = "Real";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
  }
}

Option 4: Using Guava (third-party library)

boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);

Reference:

Option 1 quotes JeremyP & Julian Lettner from https://stackoverflow.com/a/3585791/1245478

Option 2 quotes Arne from https://stackoverflow.com/a/3585284/1245478

Option 3 quotes RealHowTo from https://stackoverflow.com/a/3585247/1245478

Option 4 quotes Colin D from https://stackoverflow.com/a/3585089/1245478


2 Comments

OK, but do you know how to do this validation through Hibernate?
Try using the regex pattern from my Option 2 instead of the long, inelegant regex in your @Pattern.
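
As a rough sketch of what that suggestion could look like: the expression from Option 2 can be passed straight to @Pattern, which (as far as I know) requires the whole value to match, so no anchors are needed. To keep the example runnable without the validation library on the classpath, the annotation is shown only in a comment and the same expression is checked with java.util.regex. The names AsciiPatternSketch, ASCII_ONLY and isValid are invented for illustration.

```java
import java.util.regex.Pattern;

final class AsciiPatternSketch {

    // The same expression one would hand to the annotation, for example:
    //   @NotEmpty
    //   @Pattern(regexp = "\\p{ASCII}*")
    //   private String str;
    static final String ASCII_ONLY = "\\p{ASCII}*";

    private static final Pattern COMPILED = Pattern.compile(ASCII_ONLY);

    // Whole-string match: true only if every character is in 0x00..0x7F.
    static boolean isValid(String value) {
        return COMPILED.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Hello, world! #1")); // letters, digits, punctuation
        System.out.println(isValid("Night \u00ff"));     // contains U+00FF
    }
}
```

Keep in mind that \p{ASCII} also accepts control characters such as tabs and newlines; if only printable characters should pass, a narrower class like [\x20-\x7E] may be closer to the original intent.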

Take a look at this site:

http://www.rgagnon.com/javadetails/java-0536.html
