3

I'm getting a date from a web page (HTML): " abril   2013  Viernes 19"

I've tried all the usual regexes with no success.

Finally I looked at the string's bytes (str.getBytes()), and these are the values:

[-96, 97, 98, 114, 105, 108, -96, -96, -96, 50, 48, 49, 51, -96, -96, 86, 105, 101, 114, 110, 101, 115, -96, 49, 57]

What are these -96 values?

How can I replace one or more -96 bytes (or whatever this empty space is) with a single space?

4
  • 2
    What are you trying to do? Remove all spaces? Commented Apr 19, 2013 at 16:30
  • 1
    It's not clear what you're trying to do here. Why do you want to replace these negative bytes? Commented Apr 19, 2013 at 16:31
  • Why are you using a regex to parse a date? Have you tried SimpleDateFormat? Commented Apr 19, 2013 at 17:50
  • Yes, I need to replace all empty characters with a single space. Commented Apr 20, 2013 at 20:59

4 Answers

4

The byte -96 (A0 in hexadecimal, or 160 as an unsigned byte) is the non-breaking space in the ISO-8859-1 character encoding, which is probably the encoding that was used to convert the string to bytes.
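
To make that correspondence concrete, here is a minimal sketch (the class and method names are only illustrative) that converts the signed byte to its unsigned value and decodes it explicitly as ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;

class NbspByteDemo {
    // Interpret a signed Java byte as an unsigned value in the range 0..255.
    public static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Decode a single byte as ISO-8859-1 and return the resulting character.
    public static char decodeLatin1(byte b) {
        return new String(new byte[] { b }, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        byte b = -96;
        System.out.println(toUnsigned(b));                       // 160
        System.out.println(Integer.toHexString(toUnsigned(b)));  // a0
        System.out.println(decodeLatin1(b) == '\u00A0');         // true
    }
}
```

Note that str.getBytes() without an argument uses the platform default charset, so the exact bytes you see depend on the machine the code runs on.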


1 Comment

@rgettman I think there's a serial downvoter in this thread. A bunch of my posts were downvoted, but then the system picked it up and reverted most of them.
4

The first byte (-96) is negative because in Java bytes are signed. It corresponds to character 160 (256 - 96), which is a non-breaking space. You'll need to specify that character directly in your regular expression.

str = str.replaceAll(String.valueOf((char) -96), " ");

1 Comment

@rgettman When I first saw the answer, I thought it would work, but I tried it and it doesn't. Any suggestions?
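
One likely reason the snippet fails: casting the negative int -96 straight to char does not give 160. The narrowing conversion keeps the low 16 bits, so the result is \uFFA0 rather than \u00A0. A minimal sketch, assuming the string really contains U+00A0 (the class and method names are illustrative):

```java
class NbspCastDemo {
    // Replace every non-breaking space (U+00A0) with a regular space.
    public static String replaceNbsp(String s) {
        return s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // The direct cast keeps the low 16 bits of the int -96.
        System.out.println(Integer.toHexString((char) -96));          // ffa0
        // Masking to 8 bits first yields the intended code point 160.
        System.out.println(Integer.toHexString((char) (-96 & 0xFF))); // a0

        String str = "\u00A0abril\u00A02013";
        // Prints the text with plain spaces in place of the NBSPs.
        System.out.println(replaceNbsp(str));
    }
}
```

Writing the character as the literal '\u00A0' avoids the cast entirely.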
2

You should be able to use the Character.isSpaceChar method for this. As mentioned in an answer to a related question, you can use it in a Java regex like this:

String sampleString = "\u00A0abril\u00A0\u00A02013\u00A0Viernes\u00A019";
String result = sampleString.replaceAll("\\p{javaSpaceChar}", " ");

I think that will do exactly what you want while avoiding any need to deal with raw bytes.
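
Building on that, here is a short sketch of a fuller clean-up that collapses each run of space characters into one space and trims the ends. Character.isSpaceChar covers Unicode space separators such as U+00A0 but not tabs or newlines, so the character class below also includes \s. The class and method names are hypothetical.

```java
class SpaceNormalizer {
    // Collapse runs of ASCII whitespace and Unicode space characters
    // (including U+00A0) into a single space, then trim the ends.
    public static String normalize(String s) {
        return s.replaceAll("[\\s\\p{javaSpaceChar}]+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "\u00A0abril\u00A0\u00A0\u00A02013\u00A0\u00A0Viernes\u00A019";
        System.out.println(normalize(raw)); // abril 2013 Viernes 19
    }
}
```

The trim() call comes last on purpose: String.trim() only strips characters up to U+0020, so it would not remove a leading U+00A0 on its own.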

1 Comment

Yes, that works, thanks a lot! str.replaceAll("\\p{javaSpaceChar}+", " ").trim();
1

I fixed it this way (if anyone has a better answer, I'd appreciate it):

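import java.util.regex.Pattern;

// getBytes() uses the platform default charset, so this only works when that
// charset is single-byte (e.g. ISO-8859-1), where NBSP is the single byte -96.
byte[] b = str.getBytes();
for (int i = 0; i < b.length; i++) {
    if (b[i] == -96) {
        b[i] = (byte) ' ';  // replace the NBSP byte with an ASCII space
    }
}
String strOut = new String(b).trim();
// Collapse runs of whitespace into one space; \\s already covers \t, \n, \f and \r.
Pattern blank = Pattern.compile("\\s+");
strOut = blank.matcher(strOut).replaceAll(" ");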

Thanks, everybody, for the help!
