
I have seen a lot of similar questions that get me partway there, but I always end up having to build recursive solutions on top of them, which I'm trying to avoid.

I have an XML file I want to convert into a map.

The structure is unknown except for one thing:

<items>
<item>
{section to be converted to map}
</item>
<item>
{section to be converted to map}
</item>
</items>

All the solutions I've seen either convert the entire document into a map or parse through each attribute/value manually.

I'm wondering if there's a method to say...

Jump into items, jump into each item, convert each item to map, then I can process each map individually.

I am streaming through a potentially large file, so I don't want to hold the entire thing in memory, just one item at a time.

I have tried to use XMLEventReader to do this, but I get stuck in a recursive nightmare. The idea was for the event reader to find the individual items and then have each item processed on its own, but I can't find any documentation on how to capture what's between the tags so it can be processed as XML later.

private void parseItemList(MultipartFile file) {
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    try {
        XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(file.getInputStream());
        processItems(xmlEventReader);
    } catch (IOException | XMLStreamException e) {
        // FileNotFoundException is a subclass of IOException, so one multi-catch covers all cases
        e.printStackTrace();
    }
}

The idea would be to convert:

<items>
<item>
<id>1</id>
<value>4</value>
</item>
<item>
<id>2</id>
<attributes>
<value>5</value>
</attributes>
</item>
</items>

into

{id: 1, value: 4}
{id: 2, attributes: {value: 5}}

and so on, where I can process and push each map individually before moving on to the next. If there's a library that handles this easily, I'd love to be pointed in that direction.

2 Answers

1

You can use the Declarative Stream Mapping (DSM) stream-parsing library to convert complex XML to Java classes. It lets you call custom functions to process partial data while the XML document is being read.

First, define the mappings between the XML data and your class fields in YAML format.

Here are mapping definitions for your XML.

result:
   type: object                    # "object" keeps memory use low:
                                   # the result is a single object (Map), not a List
   path: /items/item
   function: processFunction       # call the registered 'processFunction' for every '/items/item'
   fields:
       value:           # absent in the second item, so it is skipped there
       id:
       attributes:      # absent in the first item, so it is skipped there
         type: object
         fields:
           value:

Create a process function:

FunctionExecutor processFunction = new FunctionExecutor() {
    @Override
    public void execute(Params params) {
        Node node = params.getCurrentNode();
        // deserialize to your own class:
        // YourClass item = node.toObject(YourClass.class);
        // or access the data directly:
        System.out.println(node.getData());
    }
};

Java code to parse the XML:

 DSMBuilder builder = new DSMBuilder(new File("path/to/mapping.yaml")).setType(DSMBuilder.TYPE.XML);

 // register the function
 builder.registerFunction("processFunction", processFunction);

 DSM dsm = builder.create();
 Object object = dsm.toObject(xmlContent);
 // the returned object is a Map<String, Object>

Output:

{id=1, value=4}
{id=2, attributes={value=5}}

UPDATE:

If you want the value tags in both "item" and "attributes" to end up at the same level, you can change your mapping definition as follows:

result:
   type: object                    # defined as "object" to keep memory use low:
                                   # the result is a single object (Map), not a List
   path: /items/item
   function: processFunction       # call the registered 'processFunction' for every '/items/item'
   fields:
       id:
       value:
          path: .*value   # path is a regex; it matches the value tag in both item and attributes

Output:

{id=1, value=4}
{id=2, value=5}
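
If changing the mapping is not an option, a similar flattening can also be done on the resulting map in plain Java, independent of DSM. The following is only a rough sketch: `MapFlattener` and `flatten` are hypothetical names, not part of any library, and it assumes that a duplicate leaf key may simply overwrite an earlier one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

final class MapFlattener {

    /**
     * Copies every leaf entry of a nested map to the top level.
     * Nested maps are walked with an explicit stack instead of recursion.
     * If the same key occurs more than once, the entry visited last wins.
     */
    static Map<String, Object> flatten(Map<String, Object> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        Deque<Map<String, Object>> pending = new ArrayDeque<>();
        pending.push(nested);
        while (!pending.isEmpty()) {
            for (Map.Entry<String, Object> entry : pending.pop().entrySet()) {
                if (entry.getValue() instanceof Map) {
                    // A nested map: queue it so its leaves are copied later.
                    @SuppressWarnings("unchecked")
                    Map<String, Object> child = (Map<String, Object>) entry.getValue();
                    pending.push(child);
                } else {
                    flat.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return flat;
    }
}
```

Applied to a map equivalent to `{id=2, attributes={value=5}}`, this yields `{id=2, value=5}`.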

0

Your map definition looks like JSON, so it would be easiest to just convert your XML into JSON.

4 Comments

Is that possible without loading the entire xml into memory?
I don't believe that you can. Manually doing this will result in loading the document into memory as well.
I'm using XMLEventReader right now and building a godawful recursive method to build out a map as I go through. It doesn't load everything into memory, but it's painful to build.
I would like to point out that implementing your own parser can be VERY worrisome as the XML definition allows for full URL/URI in place of simple namespaces.
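
The streaming approach discussed in the comments above does not have to be recursive. Below is a minimal sketch that uses only the JDK's built-in StAX API (`javax.xml.stream`) and an explicit stack, so only one item is held in memory at a time. The names `ItemStreamDemo`, `streamItems` and `readElement` are hypothetical. The sketch assumes that every element named `item` is a record, ignores XML attributes and namespaces, keeps leaf values as strings, and lets a repeated child name overwrite the earlier one.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ItemStreamDemo {

    /** Streams the XML and hands each <item> element to the consumer as a nested Map. */
    static void streamItems(Reader xml, Consumer<Map<String, Object>> consumer)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Hardening for untrusted uploads: no DTDs, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    consumer.accept(readElement(reader));
                }
            }
        } finally {
            reader.close();
        }
    }

    /** Reads the element the reader is positioned on, up to its matching end tag. */
    private static Map<String, Object> readElement(XMLStreamReader reader)
            throws XMLStreamException {
        Deque<Map<String, Object>> maps = new ArrayDeque<>(); // child maps of open elements
        Deque<StringBuilder> texts = new ArrayDeque<>();      // text of open elements
        maps.push(new LinkedHashMap<>());
        texts.push(new StringBuilder());
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    maps.push(new LinkedHashMap<>());
                    texts.push(new StringBuilder());
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    texts.peek().append(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    Map<String, Object> children = maps.pop();
                    String text = texts.pop().toString().trim();
                    if (maps.isEmpty()) {
                        return children; // end tag of the starting element itself
                    }
                    // Leaf elements become strings, elements with children become maps.
                    maps.peek().put(reader.getLocalName(), children.isEmpty() ? text : children);
                    break;
                default:
                    break; // comments, processing instructions, etc. are ignored
            }
        }
        throw new XMLStreamException("Document ended inside an element");
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<items><item><id>1</id><value>4</value></item>"
                + "<item><id>2</id><attributes><value>5</value></attributes></item></items>";
        // Prints {id=1, value=4} and then {id=2, attributes={value=5}}
        streamItems(new StringReader(xml), System.out::println);
    }
}
```

For an uploaded file, an `InputStreamReader` over `file.getInputStream()` could be passed in place of the `StringReader`, and the consumer could push each map downstream before the next item is read.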
