Suppose I have a string/object containing data in pipe-separated format, as below:

***Input:***
TIMESTAMP|COUNTRYCODE|RESPONSETIME|FLAG
1544190995|US|500|Y
1723922044|GB|370|N
1711557214|US|750|Y

I want to read this string/object and keep only particular columns, selected by column name (for example, TIMESTAMP and FLAG), then return/display the output as shown below:

***Output:***
TIMESTAMP|FLAG
1544190995|Y
1723922044|N
1711557214|Y

I tried the following:

  1. First, I have the required header names stored in an array:

    headerArray[] = {TIMESTAMP, FLAG}
    
  2. By comparing headerArray[] with the first row of the input, I got the indexes of the specified column headers in the input:

    headerIndex[] = {0, 3}
    
  3. Then I tried the code below to filter and get the specified columns and values:

    return br.lines()
            .skip(1) // skip headers
            .map(s -> s.split("|"))
            .filter(a -> a[0] && a[3])
            .collect(Collectors.toList());
    

Note: I have over a million lines of pipe-separated values, and I want to return all of the filtered column values in a single object. I suppose that is not possible by returning the value as a list.

  • What's wrong with your code? Commented Dec 1, 2019 at 13:29
  • Not working as expected. It seems to return random characters, something like "T|1|1". And I'm not sure how to return it in an object, or whether it will process millions of lines. Commented Dec 1, 2019 at 13:40
  • .filter(a -> a[0] && a[3]) surely doesn’t even compile. Commented Dec 2, 2019 at 13:27

1 Answer

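You have some problems:

First, change the split pattern to \\|. split takes a regular expression, and an unescaped | is regex alternation, so the line gets split between every character. Second, instead of filter you can use map to build the new string.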

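    br.lines()
            .skip(1) // skip headers
            .map(s -> s.split("\\|")) // escaped pipe: split on the literal character
            .map(a -> String.join("|", a[0], a[3])) // keep TIMESTAMP and FLAG only
            .collect(toList()); // java.util.stream.Collectors.toList, statically imported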
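
For over a million lines, one option is to avoid collecting into a list at all and write each filtered line to a Writer as it is read. The following is only a minimal sketch of that idea, not a tuned implementation. The class name ColumnFilter and the method filterColumns are hypothetical names chosen for illustration. It assumes every data row has at least as many cells as the header, and unlike the snippet above it looks the column indexes up by name and also emits the filtered header row, as in the expected output of the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class ColumnFilter {
    // Literal pipe. An unescaped "|" is regex alternation and would split between every character.
    private static final Pattern PIPE = Pattern.compile("\\|");

    // Copies only the wanted columns (header row included) from in to out, one line at a time.
    static void filterColumns(BufferedReader in, Writer out, List<String> wanted) throws IOException {
        String headerLine = in.readLine();
        if (headerLine == null) {
            return; // empty input, nothing to write
        }
        // Resolve each wanted column name to its index in the header row.
        List<String> headers = Arrays.asList(PIPE.split(headerLine, -1));
        int[] idx = new int[wanted.size()];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = headers.indexOf(wanted.get(i));
            if (idx[i] < 0) {
                throw new IllegalArgumentException("Unknown column: " + wanted.get(i));
            }
        }
        out.write(String.join("|", wanted));
        out.write('\n');
        String line;
        while ((line = in.readLine()) != null) {
            // Limit -1 keeps trailing empty cells; assumes the row is not shorter than the header.
            String[] cells = PIPE.split(line, -1);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < idx.length; i++) {
                if (i > 0) {
                    sb.append('|');
                }
                sb.append(cells[idx[i]]);
            }
            out.write(sb.append('\n').toString());
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        String input = "TIMESTAMP|COUNTRYCODE|RESPONSETIME|FLAG\n"
                + "1544190995|US|500|Y\n"
                + "1723922044|GB|370|N\n"
                + "1711557214|US|750|Y\n";
        StringWriter out = new StringWriter();
        filterColumns(new BufferedReader(new StringReader(input)), out,
                Arrays.asList("TIMESTAMP", "FLAG"));
        System.out.print(out);
    }
}
```

Because only one line is held at a time, memory use does not grow with the number of rows. In a real run the StringWriter would be replaced by something like a BufferedWriter over a file.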

10 Comments

toList() - is this a predefined method in Java 8? And I guess this returns a list.
@ShakthiRaj Why do you really want to process millions of records in memory (within the JVM) in that case? It's not meaningful/purposeful to attempt the task in such a manner. And what is the source of your input for these millions of entries to be processed?
@HadiJ Not really confident of a solution to post, but this seems much better a task using a regex/pattern rather than split and join. It just seems redundant at the very first look.
For now I'm using BufferedReader to read the input. And yes, I agree the JVM has restrictions; I did some trial and error with different collections to optimize performance and maximize the number of lines processed. For example, using HashSet I processed 50K lines, but HashSet doesn't allow duplicate values, whereas List allows duplicate values but only processed 5K lines. I'm just wondering how to process a million records using Java; there must be some way to do that, even though it might affect the processing time.
There is no reason why a list should “only process 5k lines”. Of course, that depends on the actual implementation used, as List is only an interface.