1

I have the following scenario.

I need to extract a few substrings from one string.

Example main string:

<title><spring:message code='cdc.header.title'/><br></span><span><p></p> <spring:message code='cdc.accessdenied.title'/></title>

So I need to extract <spring:message code='cdc.header.title'/> and <spring:message code='cdc.accessdenied.title'/>.

In other words, whatever Spring tags are present, I want to retrieve them as a List<String>.

I don't want to use an XML parser. I want to use Java's Pattern and Matcher, because my file might not be well formed.

Please help me with this. Thanks.

4 Answers

2

With this approach, it can be done in just one line of code (updated with new requirement as per comment):

List<String> springTags = Arrays.asList(str.replaceAll("(?s)^.*?(?=<spring)|(?<=/>)(?!.*<spring).*?$", "").split("(?s)(?<=/>).*?(?=<spring|$)"));

This works by first stripping off any leading and trailing xml wrapping/chars, then splitting on xml end/start of tag. It will actually extract all spring tags from any kind of input - whatever comes before or after the spring tags is thrown away.

Here's some test code:

String str = "<title><spring:message code='cdc.header.title'/> <span></span></br><spring:message code='cdc.accessdenied.title'/></title>";
List<String> springTags = Arrays.asList(str.replaceAll("^.*?(?=<spring)|(?<=/>)(?!.*<spring).*?$", "").split("(?<=/>).*?(?=<spring|$)"));
System.out.println(springTags);

Output:

[<spring:message code='cdc.header.title'/>, <spring:message code='cdc.accessdenied.title'/>]
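Since the question asks for Pattern and Matcher specifically, here is a minimal sketch of that alternative. It is a different technique from the replaceAll/split one-liner above, not a rewrite of it. The class name SpringTagExtractor and the method name extract are made up for illustration, and the pattern assumes every Spring tag is self-closing (starts with "<spring" and ends at the next "/>").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class SpringTagExtractor {

    // "<spring", then as few characters as possible up to the first "/>".
    // DOTALL lets '.' match line terminators, so a tag may span several lines.
    // Assumption: tags are self-closing, e.g. <spring:message code='x'/>.
    private static final Pattern SPRING_TAG =
            Pattern.compile("<spring.*?/>", Pattern.DOTALL);

    static List<String> extract(String input) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = SPRING_TAG.matcher(input);
        // find() moves to the next non-overlapping match on each call.
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }
}
```

Calling SpringTagExtractor.extract(str) with the test string above should return the same two tags. Everything outside the matches is simply never visited, so the surrounding markup does not need to be well formed.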

5 Comments

This seems to meet the requirement. Can you please explain what happens internally? I am so thankful for this.
If my string looks like <title><spring:message code='cdc.header.title'/> <span></span></br><spring:message code='cdc.accessdenied.title'/></title>, it gives the wrong output.
OK, I've updated the answer to cater for intervening text by capturing it as part of the delimiter. By the way, you can un-accept an answer and accept another one :)
Thanks for your answer. Your code works in many scenarios, with a few exceptions: for example, it fails if my string contains a line separator. Still, your answer is awesome.
I've added the "dot all" flag (?s) to the regexes - see if that solves your embedded newline issue. I didn't update the test code sample though, because I didn't know where to put them.
1
<tag> something</tag>

You can extract "something" using an XML parser library.

2 Comments

Can you please share a complete example? Also, if my tag is not well formed (for example, a quote is missing), will this still extract it properly?
I don't want to use an XML parser. I want to use Java's Pattern and Matcher, because my file might not be well formed.
0

Here's an example that does this in pure Java:

public static ArrayList<String> parseDocument(
        final String document,
        final String begin,
        final String end) {

    ArrayList<String> subs = new ArrayList<String>(0);

    document_parse:
        for (int i = 0, h, j, k; i < document.length(); ) {

            for (h = i, k = 0; k < begin.length(); h++, k++) {
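                // Stop only when the document ends partway through a begin delimiter.
                // Comparing h with length() - begin.length() would skip tags near the end.
                if (h >= document.length()) {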
                    break document_parse;

                } else if (document.charAt(h) != begin.charAt(k)) {
                    i++;
                    continue document_parse;
                }
            }

            end_search:
                for ( ; ; h++) {
                    if (h > document.length() - end.length()) {
                        break document_parse;
                    }

                    for (j = h, k = 0; k < end.length(); j++, k++) {
                        if (document.charAt(j) != end.charAt(k)) {
                            continue end_search;
                        }
                    }

                    if (k == end.length()) {
                        break;
                    }
                }

            h += end.length();

            subs.add(document.substring(i, h));

            i = h;
        }

    return subs;
}

This kind of thing might be faster than regex. The loops are a bit complex but I tested it and it works.
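For comparison, the same begin/end scan can be sketched with String.indexOf doing the character matching. This is an alternative formulation under the same assumptions (each substring starts at begin and runs through the next end), not the code above. The names DelimitedSubstrings and between are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class DelimitedSubstrings {

    // Returns every substring that starts with `begin` and runs through the
    // first following `end`, scanning left to right without overlaps.
    static List<String> between(String document, String begin, String end) {
        if (begin.isEmpty() || end.isEmpty()) {
            // Empty delimiters would match everywhere and never advance.
            throw new IllegalArgumentException("delimiters must be non-empty");
        }
        List<String> subs = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = document.indexOf(begin, from);
            if (start < 0) {
                break; // no further begin delimiter
            }
            int stop = document.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // begin delimiter without a closing end delimiter
            }
            stop += end.length(); // include the end delimiter itself
            subs.add(document.substring(start, stop));
            from = stop; // resume scanning after this match
        }
        return subs;
    }
}
```

For the string in the question, DelimitedSubstrings.between(str, "<spring", "/>") should return the two spring:message tags.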

3 Comments

I played around with this and it works. Called with the string in the OP and output is what is expected.
Thanks, this works great. However, the condition if (h == document.length() - end.length()) was skipping the last character, so I changed it to if (h > document.length() - end.length()). Now it works.
You're welcome. And yeah you're right. I also found another minor correction: the else if (h == document.length()) break; in the first inner loop (which is supposed to prevent index out of bounds if the doc ends with a partial tag) should be else if (h == document.length() - 1) break; or could be refactored as being a loop condition. I actually might end up using this myself too since I have a similar parsing I need to do.
0

You can use a DOM parser and parse the file as XML. I guess you also need to retrieve other nodes, attributes, and values, so a parser will really help in this case.
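A minimal sketch of what that could look like with the JDK's built-in DOM parser, assuming the input is well-formed XML (the string in the question is not, so it would be rejected). The class name DomSpringCodes and the method name codes are made up for illustration. The sketch relies on the default DocumentBuilderFactory not being namespace-aware, so the undeclared spring: prefix is treated as part of the element name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original answer.
final class DomSpringCodes {

    // Returns the `code` attribute of every <spring:message> element.
    // Assumption: `xml` is well formed; otherwise parsing fails.
    static List<String> codes(String xml) {
        try {
            // The default factory is not namespace-aware, so "spring:message"
            // is matched as a plain element name.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("spring:message");
            List<String> codes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                codes.add(((Element) nodes.item(i)).getAttribute("code"));
            }
            return codes;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input could not be parsed as XML", e);
        }
    }
}
```

This returns attribute values rather than the raw tag text. For JSP sources that may not be well formed, a text-based approach like the other answers is the safer choice.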

2 Comments

Can you please share a complete example?
I don't want to use an XML parser. I want to use Java's Pattern and Matcher, because my file might not be well formed and it is a JSP.
