
I have a requirement where an item reader is used to convert each line of a file into a composite object. I need to take the different objects inside that composite object and write each one as a separate JSON file. That means for a single line of the CSV file there will be multiple JSON files created, and these need to be written to a MarkLogic database. I have used a multi item writer to convert a file into a single output file, but now I need to split each line into multiple documents and write them to MarkLogic. Any idea how a single line can be split into multiple files and written to a MarkLogic database?

Here is an example of the composite object created by the item reader (this is just an illustration, not the actual scenario):

    public class Person {
        private HomeAddress homeAdd;
        private OfficeAddress officeAdd;
    }

A single line of the CSV represents both the home address and the office address. As output I need two JSON files/objects (one for each type of address) written to the MarkLogic database. Thanks.
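For illustration, a writer along these lines is roughly what I am aiming for. This is a simplified sketch, not my actual code: the `write(List)` signature assumes Spring Batch 4.x, Jackson is assumed for JSON serialization, and the getters, `getId()`, URIs and connection handling are placeholders:

    import java.util.List;

    import org.springframework.batch.item.ItemWriter;

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.marklogic.client.DatabaseClient;
    import com.marklogic.client.document.JSONDocumentManager;
    import com.marklogic.client.io.Format;
    import com.marklogic.client.io.StringHandle;

    public class PersonSplittingItemWriter implements ItemWriter<Person> {

        private final DatabaseClient client;            // MarkLogic Java Client API connection
        private final ObjectMapper mapper = new ObjectMapper();

        public PersonSplittingItemWriter(DatabaseClient client) {
            this.client = client;
        }

        @Override
        public void write(List<? extends Person> items) throws Exception {
            JSONDocumentManager docMgr = client.newJSONDocumentManager();
            for (Person person : items) {
                // Serialize each part of the composite object to an in-memory JSON string
                String homeJson = mapper.writeValueAsString(person.getHomeAdd());
                String officeJson = mapper.writeValueAsString(person.getOfficeAdd());

                // Write each piece as its own document; nothing is written to disk.
                // getId() is assumed here only to build unique URIs.
                docMgr.write("/person/" + person.getId() + "/home.json",
                        new StringHandle(homeJson).withFormat(Format.JSON));
                docMgr.write("/person/" + person.getId() + "/office.json",
                        new StringHandle(officeJson).withFormat(Format.JSON));
            }
        }
    }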

1 Answer


If you were using MLCP to load the CSV as one record per line, then you could also define a transform rule on the input and hijack that process to parse and insert the additional documents.

You could also use a post-commit trigger: after the initial insert, process the documents into the required pieces. If this is high-volume, you may decide to do this via CoRB2.

You could also pre-process the CSV into multiple CSV files suitable for immediate ingestion.

Considering all of the options above, you could use the Data Movement SDK to author your solution: https://developer.marklogic.com/learn/data-movement-sdk (or even the MLCP/Hadoop-related libraries).
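As a rough sketch of the Data Movement SDK route (host, port, credentials, URIs and the inline JSON payloads below are placeholders; in your case the payloads would be whatever your processor produces in memory):

    import com.marklogic.client.DatabaseClient;
    import com.marklogic.client.DatabaseClientFactory;
    import com.marklogic.client.datamovement.DataMovementManager;
    import com.marklogic.client.datamovement.WriteBatcher;
    import com.marklogic.client.io.Format;
    import com.marklogic.client.io.StringHandle;

    public class DmsdkWriteExample {
        public static void main(String[] args) {
            // Placeholder connection details
            DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000,
                    new DatabaseClientFactory.DigestAuthContext("admin", "admin"));

            DataMovementManager dmm = client.newDataMovementManager();
            WriteBatcher batcher = dmm.newWriteBatcher()
                    .withBatchSize(100)    // documents per batch
                    .withThreadCount(4);   // parallel writer threads
            dmm.startJob(batcher);

            // One CSV line -> two JSON documents, added straight from memory
            batcher.add("/person/1/home.json",
                    new StringHandle("{\"street\":\"1 Home St\"}").withFormat(Format.JSON));
            batcher.add("/person/1/office.json",
                    new StringHandle("{\"street\":\"2 Office Rd\"}").withFormat(Format.JSON));

            batcher.flushAndWait();   // push any remaining queued documents
            dmm.stopJob(batcher);
            client.release();
        }
    }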


4 Comments

Thanks David. I am using an ItemReader with custom implementations of ItemProcessor and ItemWriter for this. I have split the object into more than one object and created the files in the writer. Now I'm looking into ways to write them back to the MarkLogic database without writing the files to disk.
If you're reading data from a file, I think David's first suggestion of using an MLCP transform to split the line into two documents is the easiest way to go. MLCP does well when your data is in a file; I normally bring in Spring Batch when data has to be retrieved from a different source.
Actually, the data has to be transformed before writing it to MarkLogic, which is why a processor is used to process the data. In the writer I am trying to avoid disk writes by populating streams instead of files as output and then inserting them into MarkLogic.
MLCP has an option to transform data on the way in. Why does it have to be transformed externally?
