
A very odd behavior from the Android Java regex functions:

I am trying to replace "<file_info.*>" in:

<?xml version="1.0" encoding="utf-8"?>
<file_info domain_id="ac-demo" language="en" os="androidtab" version="11" >
     <id string_name="app_name">MobilityPlus</id>
    <!-- general buttons text -->
......

Calling String.replaceAll( "<file_info.*>", "<resources>" ), I get back only the text up to and including the replacement:

<?xml version="1.0" encoding="utf-8"?>
<resources>

And the rest is cropped! Why? I need the whole string returned, with only the matched part replaced. I tried at least two online regex testers and got exactly what I wanted, but in Android/Java I don't.

Could this be a bug in Google's code?

Any recommendations on how to work around this issue would be much appreciated. Thanks!

(Note: I tried both String.replaceAll() and Pattern + Matcher, and both give the same result, with and without the MULTILINE flag, and even after removing all \t, \r, \n, etc. characters.)
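A minimal Java sketch that reproduces the symptom, assuming a hypothetical, shortened version of the XML above (the elided part is replaced by a closing </file_info>); the class and method names are made up for illustration. It suggests the cropping comes from the greedy .* rather than from Android: once . is allowed to cross line breaks, either through DOTALL or because the \n characters were stripped out first, the match runs all the way to the last > in the document.

```java
public class GreedyDemo {
    // Hypothetical sample modelled on the question's XML (elided body replaced by a closing tag).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<file_info domain_id=\"ac-demo\" language=\"en\" os=\"androidtab\" version=\"11\" >\n"
        + "    <id string_name=\"app_name\">MobilityPlus</id>\n"
        + "</file_info>\n";

    // Greedy .* with DOTALL: '.' crosses line breaks, so the match ends at the LAST '>' in the text.
    static String greedyDotAll(String s) {
        return s.replaceAll("(?s)<file_info.*>", "<resources>");
    }

    // Greedy .* without DOTALL: '.' stops at '\n', so the match ends at the last '>' on that line.
    static String greedyDefault(String s) {
        return s.replaceAll("<file_info.*>", "<resources>");
    }

    public static void main(String[] args) {
        System.out.println(greedyDotAll(XML));                    // everything after the tag is gone
        System.out.println(greedyDefault(XML));                   // rest of the document is kept
        System.out.println(greedyDefault(XML.replace("\n", ""))); // one line again, so cropped again
    }
}
```

The third call mirrors the note about removing \t, \r and \n: with the line breaks gone, even the default mode lets .* reach the final >.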

  • It's not at all clear what you're trying to do, but I'm pretty confident that a greedy regular expression isn't going to do whatever it is. You really need to be parsing the XML. Commented Nov 19, 2013 at 17:04
  • Why do the hard work (XML parsing, etc.) when you can do the job in 3 lines? That is exactly the sort of thing I was trying to accomplish. And it seems that the Android regex API is indeed non-standard, or at least different from standard Java (there's no real standard that I know of). The problem with this is that you can't use the abundance of online regex testers out there, which would otherwise fit... Commented Nov 20, 2013 at 9:48
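For comparison, the XML-parsing route from the first comment could look roughly like this. It is a sketch using only the standard JDK DOM/JAXP classes (DocumentBuilderFactory, Document.renameNode, Transformer); the class and method names are hypothetical, dropping the root's attributes is an assumption made to mirror what the regex replacement does, and whether every call behaves identically on Android has not been checked here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class DomRenameDemo {
    // Hypothetical helper: renames the root element to <resources> and returns the serialized XML.
    static String renameRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Drop the root's attributes, as the regex replacement does.
        // getAttributes() is a live map, so keep removing the first entry.
        NamedNodeMap attrs = root.getAttributes();
        while (attrs.getLength() > 0) {
            root.removeAttributeNode((Attr) attrs.item(0));
        }

        // DOM Level 3 rename: the element keeps all of its children.
        doc.renameNode(root, null, "resources");

        // Identity transform back to a string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<file_info domain_id=\"ac-demo\" language=\"en\" os=\"androidtab\" version=\"11\" >\n"
            + "    <id string_name=\"app_name\">MobilityPlus</id>\n"
            + "</file_info>\n";
        System.out.println(renameRoot(xml));
    }
}
```

Unlike the regex, this also renames the closing tag, so the output stays well-formed.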

2 Answers


Try a non-greedy (reluctant) quantifier with DOTALL:

s = s.replaceAll( "(?s)<file_info.*?>", "<resources>" );

Though I should caution you against parsing/manipulating XML using regex.
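A small sketch of the reluctant version, assuming a hypothetical one-line input (as after stripping \n) and a tag whose attributes wrap onto a second line; the class name is made up for illustration. The *? makes the match stop at the first > after <file_info instead of the last one, and (?s) lets it cross a line break inside the tag.

```java
public class ReluctantDemo {
    static String replaceOpeningTag(String xml) {
        // (?s) = DOTALL, so '.' may cross line breaks inside the tag;
        // '*?' is reluctant, so the match ends at the FIRST '>' after "<file_info".
        return xml.replaceAll("(?s)<file_info.*?>", "<resources>");
    }

    public static void main(String[] args) {
        String oneLine = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<file_info domain_id=\"ac-demo\" version=\"11\" >"
            + "<id string_name=\"app_name\">MobilityPlus</id></file_info>";
        String wrapped = "<file_info\n    domain_id=\"ac-demo\" >\n<id>x</id>";
        System.out.println(replaceOpeningTag(oneLine));
        System.out.println(replaceOpeningTag(wrapped));
    }
}
```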


3 Comments

This seems like the best answer, because of the greediness :) Still, with my limited knowledge of regex, I don't understand why Google didn't go with the Java/Perl standard... or is this a flag?
I believe Java regex will behave the same way. Another way to write the above is s.replaceAll( "(?s)<file_info[^>]*>", "<resources>" ); (the (?s) is not actually needed there, since [^>] already matches line breaks).
You can check the link and see that the simple form "<file_info.*>" works without any issue; the same goes for any other online regex tester. Is this a Java issue? I usually work with Qt/C++, which is more compatible with Perl/standard regex, so I might be wrong...

MULTILINE mode is irrelevant, but it sounds like you have DOTALL mode set. That allows the . to match line separator characters (\n, \r, etc.). You're actually replacing everything from the first occurrence of <file_info to the last occurrence of > in the document.

But you can't count on those or any other whitespace characters being present in XML; they're only there to make it easier for us wetware types to read it. If you want to replace just the one tag, you should use a negated character class, like so:

s = s.replaceAll( "<file_info[^>]*>", "<resources>" );
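A minimal sketch of the negated-class approach, assuming a hypothetical sample whose opening tag wraps onto a second line; class and method names are illustrative only. It shows that the result is the same with or without DOTALL, since that flag only affects ., which this pattern never uses. It also shows one limitation: XML permits a literal > inside an attribute value, and [^>]* stops there, which is one more reason for the XML-parsing caution.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {
    static final String TAG = "<file_info[^>]*>";

    static String replaceOpeningTag(String xml) {
        // [^>]* matches any run of characters except '>', line breaks included,
        // so the match ends at the first '>' whether or not DOTALL is set.
        return xml.replaceAll(TAG, "<resources>");
    }

    static String replaceOpeningTagDotAll(String xml) {
        // Same pattern compiled with DOTALL: no '.' in it, so nothing changes.
        return Pattern.compile(TAG, Pattern.DOTALL).matcher(xml).replaceAll("<resources>");
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<file_info domain_id=\"ac-demo\"\n           version=\"11\" >\n"
            + "<id>x</id>\n"
            + "</file_info>\n";
        System.out.println(replaceOpeningTag(xml));
        System.out.println(replaceOpeningTagDotAll(xml));
        // Caveat: a '>' inside an attribute value ends the match too early.
        System.out.println(replaceOpeningTag("<file_info note=\"a>b\">x</file_info>"));
    }
}
```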

1 Comment

Thanks, but it seems to be the "greediness" thing (I'm not a big regex pro myself...). But it's good to know how to set DOTALL directly in the regex.
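On setting DOTALL directly in the regex: a short sketch, assuming a hypothetical two-line input, comparing the Pattern.DOTALL compile flag with the embedded (?s) flag, plus Pattern.MULTILINE for contrast. The class and method names are made up. The first two behave the same; MULTILINE only changes what ^ and $ match, so . still stops at the line break, consistent with the answer above calling it irrelevant here.

```java
import java.util.regex.Pattern;

public class DotAllFlagDemo {
    static final String TEXT = "<file_info a=\"1\" >\n<id>x</id>";

    static String replace(Pattern p) {
        return p.matcher(TEXT).replaceAll("<resources>");
    }

    // DOTALL passed as a compile flag: '.' also matches '\n'.
    static String withFlag() {
        return replace(Pattern.compile("<file_info.*>", Pattern.DOTALL));
    }

    // Same mode switched on inside the pattern with the embedded flag (?s).
    static String withInlineFlag() {
        return replace(Pattern.compile("(?s)<file_info.*>"));
    }

    // MULTILINE affects only ^ and $, so '.' still stops at '\n'.
    static String withMultiline() {
        return replace(Pattern.compile("<file_info.*>", Pattern.MULTILINE));
    }

    public static void main(String[] args) {
        System.out.println(withFlag());       // <resources>
        System.out.println(withInlineFlag()); // <resources>
        System.out.println(withMultiline());  // <resources> followed by the untouched <id>x</id> line
    }
}
```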
