
I have a CSV file with ~30 columns, and one of the columns is a JSON string. What I want to do is read the CSV and break the JSON down into rows (explode).

For example, the CSV:

"data1,date1,{"USERS-1":"ff", "name1":"Joe1", "age":"1"},1" 
"data2,date2,{"USERS-2":"ff", "name2":"Joe2", "age":"2"},2" 
"data3,date3,{"USERS-3":"ff", "name3":"Joe3", "age":"3"},3" 

Desired result after exploding:

"data1,date1,"USERS-1","ff",1"
"data1,date1,"name1","Joe1",1"
"data1,date1,"age","1",1"
"data2,date2,"USERS-2","ff",2"
"data2,date2,"name2","Joe2",2"
"data2,date2,"age","2",2"
"data3,date3,"USERS-3","ff",3"
"data3,date3,"name3","Joe3",3"
"data3,date3,"age","3",3"

I'm not writing in Scala.

The JSON is unstructured!


1 Answer


Joe! I wrote a class to show you how I would approach your problem. After the code I give some extra details so you can better understand what it does.

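import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MMM {

    public static void main(String[] args) {
        String s = "data1,date1,{\"USERS-1\":\"ff\", \"name1\":\"Joe1\", \"age\":\"1\"},1";
        processLine(s);
    }

    public static void processLine(String s) {
        // Split only at the first '{': leading columns | JSON body + trailing column
        final String[] parts = s.split("[{]", 2);
        final String dates = parts[0];    // e.g. "data1,date1,"
        final String content = parts[1];  // e.g. "\"USERS-1\":\"ff\", ... },1"
        // Split on ',' and '}', trim surrounding spaces, drop empty pieces
        final List<String> elements = Arrays.stream(content.split("[,}]"))
                .map(String::trim)
                .filter(x -> !x.isEmpty())
                .collect(Collectors.toList());
        // The last element is the column that follows the JSON
        final String last = elements.get(elements.size() - 1);
        for (int i = 0; i < elements.size() - 1; i++) {
            // Turn "key":"value" into "key","value" (first ':' only)
            String pair = elements.get(i).replaceFirst(":", ",");
            System.out.println(dates + pair + "," + last);
        }
    }
}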

Basically, the code splits a line read from the CSV into two parts at the opening brace: the leading columns (the dates) and the content after the brace. The content is split again on commas and closing braces, each piece is trimmed to remove surrounding spaces, and the empty strings are filtered out. That leaves a list of the key/value pairs, with the column that follows the JSON as its last element, and each pair is combined with the leading columns and that last column. To make the result easy to see I print each row; you can easily modify the code to return the rows in a list or whatever you prefer. Note that this plain string splitting only works for a flat JSON object with no commas or braces inside the keys or values. I hope my answer was helpful, have a nice day!
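As a sketch of the "return them in a list" variant, the hypothetical `explodeLine` helper below keeps the same splitting idea but returns the rows instead of printing them. It assumes a flat JSON object (no `,`, `{` or `}` inside keys or values, no `:` inside keys) followed by exactly one trailing column; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExplodeLine {

    // Hypothetical helper: turns one CSV line into one row per JSON key/value pair.
    // Assumes a flat JSON object and exactly one column after the closing '}'.
    public static List<String> explodeLine(String line) {
        int open = line.indexOf('{');
        String prefix = line.substring(0, open);      // leading columns, e.g. "data1,date1,"
        String content = line.substring(open + 1);    // JSON body plus trailing column
        List<String> elements = Arrays.stream(content.split("[,}]"))
                .map(String::trim)
                .filter(x -> !x.isEmpty())
                .collect(Collectors.toList());
        String last = elements.get(elements.size() - 1); // column after the JSON
        List<String> rows = new ArrayList<>();
        for (String pair : elements.subList(0, elements.size() - 1)) {
            // "key":"value" becomes "key","value" (only the first ':' is replaced)
            rows.add(prefix + pair.replaceFirst(":", ",") + "," + last);
        }
        return rows;
    }

    public static void main(String[] args) {
        String s = "data1,date1,{\"USERS-1\":\"ff\", \"name1\":\"Joe1\", \"age\":\"1\"},1";
        explodeLine(s).forEach(System.out::println);
    }
}
```

For the sample line this yields the three rows `data1,date1,"USERS-1","ff",1`, `data1,date1,"name1","Joe1",1` and `data1,date1,"age","1",1`. Returning a `List<String>` is what makes the helper usable inside a map/flatMap step later.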


2 Comments

Thanks for your response, but the question is about a Spark application rather than plain Java.
The solution is to create a class implementing Spark's Function<String, List<String>> interface and put the same logic as my method, returning the rows as a list instead of printing them, into its call() method, which you must override. Once you have your class, you can pass it to a map call on the data structure you use. After applying the function, all that remains is to explode (flatMap) the resulting lists and you will get the desired output. If you are using a JavaRDD: yourRDD.map(new YourFunction()).flatMap(x -> x.iterator()).
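The map-then-flatten pattern can be tried without a cluster using plain Java streams, which have the same shape: each line maps to a `List<String>` of rows, and `flatMap` flattens those lists into one sequence. This is a stand-in sketch, not Spark code; `explodeLine` and `explodeAll` are hypothetical names, and the same flat-JSON assumptions as above apply. Assuming Spark 2.x or later, where `JavaRDD.flatMap` takes a `FlatMapFunction` that returns an `Iterator`, the two steps could also be fused as `rdd.flatMap(line -> explodeLine(line).iterator())`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExplodeAll {

    // Hypothetical per-line helper (same splitting idea as processLine).
    // Assumes a flat JSON object and exactly one column after the closing '}'.
    static List<String> explodeLine(String line) {
        String[] parts = line.split("[{]", 2); // leading columns | JSON body + trailing column
        List<String> elements = Arrays.stream(parts[1].split("[,}]"))
                .map(String::trim)
                .filter(x -> !x.isEmpty())
                .collect(Collectors.toList());
        String last = elements.get(elements.size() - 1);
        return elements.subList(0, elements.size() - 1).stream()
                .map(pair -> parts[0] + pair.replaceFirst(":", ",") + "," + last)
                .collect(Collectors.toList());
    }

    // Stand-in for yourRDD.map(...).flatMap(...) on a local list of lines.
    static List<String> explodeAll(List<String> lines) {
        return lines.stream()
                .map(ExplodeAll::explodeLine) // String -> List<String> (the "map" step)
                .flatMap(List::stream)        // List<String> -> individual rows (the "explode")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "data1,date1,{\"USERS-1\":\"ff\", \"name1\":\"Joe1\", \"age\":\"1\"},1",
                "data2,date2,{\"USERS-2\":\"ff\", \"name2\":\"Joe2\", \"age\":\"2\"},2");
        explodeAll(lines).forEach(System.out::println);
    }
}
```

Two input lines with three pairs each produce six output rows, in input order.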
