
Given a wikiText string such as:

{{ValueDescription
    |key=highway
    |value=secondary
    |image=Image:Meyenburg-L134.jpg
    |description=A highway linking large towns.
    |onNode=no
    |onWay=yes
    |onArea=no
    |combination=
    * {{Tag|name}}
    * {{Tag|ref}}
    |implies=
    * {{Tag|motorcar||yes}}
    }}

I'd like to parse the ValueDescription and Tag templates in Java/Groovy. I tried the regex /\{\{\s*Tag(.+)\}\}/ and it works (it returns |name, |ref and |motorcar||yes), but /\{\{\s*ValueDescription(.+)\}\}/ doesn't work (it should return all the text above).

The expected output

Is there a way to skip nested templates in the regex?

Ideally I would rather use a simple wikiText-to-XML tool, but I couldn't find anything like that.

Thanks! Mulone

Can you please provide some sample output you expect from the above input? Commented Jun 3, 2011 at 13:37

2 Answers


Arbitrarily nested templates won't work, since nesting makes the grammar non-regular. You need something capable of handling a context-free grammar; ANTLR is a fine option.
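As a rough illustration of what "something beyond a regex" means, here is a minimal hand-written Java sketch (not ANTLR, and not a full wikiText parser) that counts {{ / }} depth to pull out each top-level template with its nested templates left intact. The class and method names are hypothetical; it assumes balanced braces and ignores wikiText features such as triple-brace parameters, <nowiki> and comments.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateScanner {

    // Returns the inner text of every top-level {{...}} template in wikiText.
    // Nested templates are kept verbatim inside their parent's text.
    // Assumes balanced braces; an unclosed template at the end is ignored.
    static List<String> topLevelTemplates(String wikiText) {
        List<String> result = new ArrayList<>();
        int depth = 0;
        int start = -1;
        int i = 0;
        while (i < wikiText.length() - 1) {
            if (wikiText.startsWith("{{", i)) {
                if (depth == 0) {
                    start = i + 2; // inner text begins after the opening {{
                }
                depth++;
                i += 2;
            } else if (wikiText.startsWith("}}", i) && depth > 0) {
                depth--;
                if (depth == 0) {
                    result.add(wikiText.substring(start, i)); // closing }} of a top-level template
                }
                i += 2;
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "{{ValueDescription\n|key=highway\n|combination=\n* {{Tag|name}}\n}}\n"
                + "{{ValueDescription\n|key=highway\n|value=primary\n}}";
        for (String t : topLevelTemplates(text)) {
            System.out.println("Matched: [" + t + "]");
        }
    }
}
```

Because the counter tracks nesting depth, an inner {{Tag|...}} no longer ends the outer match early, which is exactly what a regular expression cannot do for arbitrary nesting depth.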


Compile your regex with the Pattern.DOTALL flag so that . also matches line terminators:

Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);

Sample Code:

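import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);
Matcher m = p.matcher(str); // str holds the wikiText from the question
while (m.find()) {
    System.out.println("Matched: [" + m.group(1) + ']');
}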

OUTPUT

Matched: [
|key=highway
|value=secondary
|image=Image:Meyenburg-L134.jpg
|description=A highway linking large towns.
|onNode=no
|onWay=yes
|onArea=no
|combination=
* {{Tag|name}}
* {{Tag|ref}}
|implies=
* {{Tag|motorcar||yes}}
]

Update

Assuming the closing }} of each {{ValueDescription appears on its own line, the following pattern captures multiple ValueDescription blocks:

Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);
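A small self-contained sketch contrasting the greedy pattern from the first part of this answer with the lazy, newline-anchored one from the update, on a hypothetical input containing two ValueDescription blocks whose closing }} sits at the start of a line. The class and helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyValueDescription {

    // Greedy (.+) with DOTALL runs to the LAST "}}" in the input.
    static final Pattern GREEDY =
            Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);

    // Lazy (.+?) stops at the FIRST "\n}}", i.e. a closing }} at the start of a line.
    static final Pattern LAZY =
            Pattern.compile("\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);

    // Collects group(1) of every match of p in text.
    static List<String> captures(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "{{ValueDescription\n|value=secondary\n* {{Tag|name}}\n}}\n"
                + "{{ValueDescription\n|value=primary\n}}";
        System.out.println("greedy matches: " + captures(GREEDY, text).size()); // 1
        System.out.println("lazy matches:   " + captures(LAZY, text).size());   // 2
    }
}
```

The greedy version swallows both blocks into one match; the lazy one stops at the first line that starts with }}, so it still depends on that layout assumption.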

4 Comments

This works, but if there's another {{ValueDescription block, the match doesn't stop at the end of the first one.
@Mulone: Assuming the closing }} of each {{ValueDescription appears on its own line, the following pattern will capture multiple ValueDescription blocks: Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);
I don't think that that assumption is valid when reading wikitext. Is there a way to make it robust?
@Mulone: Regular expressions have real limitations here: you need some fixed pattern to match. The closing }} must either be on its own line or be followed by some other character that the pattern above can anchor on. To validate or match non-regular text, you will eventually need a parser library or will have to write your own parser.
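Along the lines of "write your own parser", here is a minimal Java sketch that takes a template's inner text (everything between the outer {{ and }}) and splits it into parameters at | characters that are not inside a nested {{...}}. All names are hypothetical: the "#name" key is an invented convention, keys and values are trimmed for convenience, and unnamed parameters are numbered from 1. It does not handle every wikiText feature; for instance, links such as [[a|b]] also contain | and would be split incorrectly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TemplateParams {

    // Splits s at '|' characters that are not inside a nested {{...}}.
    static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.startsWith("{{", i)) {
                depth++;
                cur.append("{{");
                i++; // consume the second '{' as well
            } else if (s.startsWith("}}", i) && depth > 0) {
                depth--;
                cur.append("}}");
                i++; // consume the second '}' as well
            } else if (s.charAt(i) == '|' && depth == 0) {
                parts.add(cur.toString()); // top-level separator: close current part
                cur.setLength(0);
            } else {
                cur.append(s.charAt(i));
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    // Parses a template's inner text into an ordered map of parameters.
    static Map<String, String> parse(String inner) {
        List<String> parts = splitTopLevel(inner);
        Map<String, String> params = new LinkedHashMap<>();
        params.put("#name", parts.get(0).trim()); // hypothetical key for the template name
        int position = 1;
        for (String part : parts.subList(1, parts.size())) {
            int eq = part.indexOf('=');
            int nested = part.indexOf("{{");
            // Only an '=' that appears before any nested template names the parameter.
            if (eq >= 0 && (nested < 0 || eq < nested)) {
                params.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            } else {
                params.put(String.valueOf(position++), part.trim());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String inner = "ValueDescription\n|key=highway\n|combination=\n* {{Tag|name}}\n"
                + "* {{Tag|ref}}\n|implies=\n* {{Tag|motorcar||yes}}\n";
        System.out.println(parse(inner).get("key"));      // highway
        System.out.println(parse("Tag|motorcar||yes"));   // {#name=Tag, 1=motorcar, 2=, 3=yes}
    }
}
```

The values of combination and implies still contain their {{Tag|...}} entries verbatim, so the same two functions can be applied again to those inner templates.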
