Java problems while working with file

Question

I'm having trouble removing the subsequence \u000 from my string.

First, I read the bytes from my file into a string with String str = new String(bytes, "UTF8");. The resulting str is \u0004Word, which means 4Word: 4 is the length of the word Word. Now I need to convert it to a plain 4Word. replaceAll("\u000", ""), replaceAll("\\\\u000", "") and similar calls don't work. How can I do that?

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

void FillingStorage() throws Exception {
    Path path = Paths.get(System.getProperty("db.file")); // that's my file
    byte[] data = Files.readAllBytes(path);
    String str = new String(data, "UTF8");
    System.out.println(str);
    // String res = str.replaceAll(...); // I don't know what to write here, nothing I've tried works
}

UPDATE: First, I fill my HashMap with Key -> Value and Key1 -> Value1. Then I write it to the file as bytes. When I convert the bytes back to a string and print it, I see Key Value Key1 Value1 instead of 3Key 5Value 4Key1 6Value1. Surprisingly, though, if I inspect the string I print, I see something like \u0003Key \u0005Value etc., so it looks like my string does contain these numbers but Java can't print them.
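One way to check that the lengths really are in the string is to make control characters visible before printing. The writer stores each length as a raw value such as 3, which decodes to the non-printable control character U+0003 rather than the digit '3'. A minimal sketch; the class and method names are hypothetical:

```java
public class ShowControlChars {

    // Hypothetical helper: replaces every ISO control character with its
    // six-character Unicode escape text so it becomes visible when printed.
    static String escapeControlChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // (char) 3 stands in for the single length byte written before "Key"
        String str = (char) 3 + "Key";
        System.out.println(str.length()); // 4: the length character is there
        System.out.println(escapeControlChars(str)); // escaped length, then Key
    }
}
```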

This is how I write my bytes to the file:

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

DataOutputStream stream = new DataOutputStream(new FileOutputStream(System.getProperty("db.file"), true));
for (Map.Entry<String, String> entry : storage.entrySet()) {
    byte[] bytesKey = entry.getKey().getBytes(StandardCharsets.UTF_8);
    stream.write(bytesKey.length); // it disappears!
    stream.write(bytesKey);
    byte[] bytesVal = entry.getValue().getBytes(StandardCharsets.UTF_8);
    stream.write(bytesVal.length); // disappears too!
    stream.write(bytesVal);
}
stream.close();
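To see what that loop actually produces, the sketch below runs the same writes against an in-memory ByteArrayOutputStream instead of the db.file path (a substitution made only so it runs standalone; the LinkedHashMap is used just to get a predictable order). Each length is stored as one raw byte such as 3 or 5, not as the character '3' or '5', which is why it never shows up as a digit. DataOutputStream.write(int) also keeps only the low-order 8 bits, so a key or value longer than 255 bytes would get a wrong length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class WriterBytes {

    // Mirrors the question's loop, but writes to memory instead of a file.
    static byte[] encode(Map<String, String> storage) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream stream = new DataOutputStream(buffer)) {
            for (Map.Entry<String, String> entry : storage.entrySet()) {
                byte[] bytesKey = entry.getKey().getBytes(StandardCharsets.UTF_8);
                stream.write(bytesKey.length); // one byte: low 8 bits of the length
                stream.write(bytesKey);
                byte[] bytesVal = entry.getValue().getBytes(StandardCharsets.UTF_8);
                stream.write(bytesVal.length); // same single-byte length for the value
                stream.write(bytesVal);
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> storage = new LinkedHashMap<>();
        storage.put("Key", "Value");
        // [3, 75, 101, 121, 5, 86, 97, 108, 117, 101]
        System.out.println(Arrays.toString(encode(storage)));
    }
}
```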

What do you see when you print str? I'm asking because I doubt there is a \u000 in it, since you say replaceAll("\\\\u000", "") doesn't work. Or maybe you forgot to store the result of replaceAll in the str reference (strings are immutable, so replaceAll does not change the original string; it creates and returns a new one).
– Pshemo, Commented Oct 9, 2014 at 17:43
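A minimal sketch of the immutability point (class and method names are made up for illustration): replace() returns a new string and leaves the original untouched unless the result is assigned back, and replaceAll() behaves the same way.

```java
public class StringsAreImmutable {

    // Calls replace() without using its result: the argument comes back unchanged.
    static String ignoreResult(String str) {
        str.replace("4", ""); // a new string is created and immediately discarded
        return str;
    }

    // Stores the result of replace() back in the variable.
    static String keepResult(String str) {
        str = str.replace("4", "");
        return str;
    }

    public static void main(String[] args) {
        System.out.println(ignoreResult("4Word")); // 4Word
        System.out.println(keepResult("4Word"));   // Word
    }
}
```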
Unrelated, but you should use new String(data, StandardCharsets.UTF_8) instead, to avoid the UnsupportedEncodingException, which can't actually happen with UTF-8.
– David Conrad, Commented Oct 9, 2014 at 17:52
Can the string be over 127 characters long? If there were an extraneous character \u0080 or greater at the beginning of the string, it would cause problems interpreting the data as UTF-8. You need to remove the length before you convert the data to a string.
– David Conrad, Commented Oct 9, 2014 at 17:55
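The problem with longer strings can be checked directly. A length byte of 0x80 (128) or more is not a valid start of a UTF-8 sequence, so decoding it together with the text replaces it with U+FFFD, the replacement character, and the length is lost. A small sketch; the helper name and the ASCII-only text are assumptions for the demo:

```java
import java.nio.charset.StandardCharsets;

public class LengthByteDecoding {

    // Builds {lengthByte, text...} and decodes it the way new String(data, UTF_8) does.
    static String decode(int lengthByte, String text) {
        byte[] textBytes = text.getBytes(StandardCharsets.US_ASCII);
        byte[] data = new byte[textBytes.length + 1];
        data[0] = (byte) lengthByte;
        System.arraycopy(textBytes, 0, data, 1, textBytes.length);
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 4 decodes to the control character U+0004, so the value survives.
        System.out.println((int) decode(4, "Word").charAt(0));   // 4
        // 128 (0x80) is malformed UTF-8 here and becomes U+FFFD.
        System.out.println((int) decode(128, "Word").charAt(0)); // 65533
    }
}
```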

Marko Topolnik · Accepted Answer · 2014-10-09 17:48:00Z


First of all, your requirement does not call for regular expressions, so you should have used replace() instead.
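The difference matters because replaceAll() compiles its first argument as a regular expression, while replace() matches it as plain text. A short illustration, using a dot, which means "any character" in a regex (class and method names are illustrative only):

```java
public class ReplaceVsReplaceAll {

    // replace(): the target is matched literally.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll(): the target is a regex, and "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b")); // a-b
        System.out.println(regex("a.b"));   // ---
    }
}
```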

Second, \uxxxx is Java's Unicode escape syntax, so it is not at all clear that your string actually contains the characters \, u, 0, 0, 0; it is much more likely that your byte array simply starts with a single byte equal to 4, which is the string length.

In that case you should simply discard the initial byte from the array when converting to String, using the constructor which accepts offset and len arguments.
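A minimal sketch of that, assuming the data holds exactly one length byte followed by the UTF-8 text, as in the \u0004Word example. The String(byte[], int, int, Charset) constructor decodes only the given range; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class SkipLengthByte {

    // Decodes everything after the first byte, which holds the word length.
    static String decodeWithoutLength(byte[] data) {
        return new String(data, 1, data.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = {4, 'W', 'o', 'r', 'd'};
        System.out.println(decodeWithoutLength(data)); // Word
        int length = data[0] & 0xFF; // the skipped byte, read as an unsigned value
        System.out.println(length);  // 4
    }
}
```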

If you happen to indeed have all those chars in the string, again simply using substring to get rid of the initial 6 characters should be all you need.
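If the string really does contain the six characters \u0004 as text (a backslash, u, and four digits), substring(6) drops them. This sketch assumes the escape is always exactly six characters at the very start; the names are illustrative:

```java
public class DropEscapeText {

    // Removes a leading six-character escape written out as plain text.
    static String dropLeadingEscape(String s) {
        return s.substring(6);
    }

    public static void main(String[] args) {
        String str = "\\u0004Word"; // backslash, 'u', four digits, then Word
        System.out.println(str.length());            // 10
        System.out.println(dropLeadingEscape(str));  // Word
    }
}
```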



3 Comments

Maxim Gotovchits Over a year ago

It's not that easy, because the bytes contain something like 1A3AAA2AA, so parsing them your way is pretty difficult.

Marko Topolnik Over a year ago

You should then present these difficulties in the question.

Maxim Gotovchits Over a year ago

Updated, I've a new problem =(
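For the multi-entry format discussed in these comments (each key and each value preceded by a one-byte length, as written by the loop in the question), one possible reader simply mirrors the writer with DataInputStream: read one unsigned byte as the length, then readFully exactly that many bytes and decode them as UTF-8. This is a sketch, not the asker's or answerer's code; the class and method names are hypothetical, and it assumes every key and value is at most 255 bytes, since the writer keeps only one byte per length.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class LengthPrefixedReader {

    // Parses key/value pairs until the input ends cleanly before a new key.
    static Map<String, String> read(InputStream source) throws IOException {
        Map<String, String> storage = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(source)) {
            int keyLength;
            while ((keyLength = in.read()) != -1) { // -1 means no more entries
                String key = readBytes(in, keyLength);
                String value = readBytes(in, in.readUnsignedByte()); // EOFException if missing
                storage.put(key, value);
            }
        }
        return storage;
    }

    // Reads exactly `length` bytes and decodes them as UTF-8.
    private static String readBytes(DataInputStream in, int length) throws IOException {
        byte[] bytes = new byte[length];
        in.readFully(bytes); // throws EOFException if the data is truncated
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // For the real file, pass Files.newInputStream(Paths.get(System.getProperty("db.file"))).
        byte[] data = {3, 'K', 'e', 'y', 5, 'V', 'a', 'l', 'u', 'e'};
        System.out.println(read(new ByteArrayInputStream(data))); // {Key=Value}
    }
}
```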
