import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class Work {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        Map<String, Integer> m1 = new HashMap<>();
        try (BufferedReader br = new BufferedReader(new FileReader("error.txt"))) {
            String line = br.readLine();
            while (line != null) {
                String[] words = line.split(" "); // **This is where I was stuck**
                for (int i = 0; i < words.length; i++) {
                    Integer count = m1.get(words[i]);
                    if (count == null) {
                        m1.put(words[i], 1);
                    } else {
                        m1.put(words[i], count + 1);
                    }
                }
                line = br.readLine();
            }
        }
        Map<String, Integer> sorted = new TreeMap<>(m1);
        for (String key : sorted.keySet()) {
            System.out.println("Error : " + key + " Repeated " + sorted.get(key) + " times.");
        }
    }

}

I have a text file as below, and I want to count the duplicate lines. I was stuck at how to split the lines and count them. Can anyone help me?

ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
2018-09-20 14:08:14.571 [main] ERROR  org.apache.flink.yarn.YarnApplicationMasterRunner  -     -Dlogback.configurationFile=file:logback.xml
2018-09-20 14:08:14.571 [main] ERROR  org.apache.flink.yarn.YarnApplicationMasterRunner  -     -Dlogback.configurationFile=file:logback.xml
ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
    2018-10-29T12:01:00Z E! Error in plugin [inputs.openldap]: LDAP Result Code 32 "No Such Object": 
    2018-10-29T12:01:00Z E! Error in plugin [inputs.openldap]: LDAP Result Code 32 "No Such Object": 
    2018-10-29T12:01:00Z E! Error in plugin [inputs.openldap]: LDAP Result Code 32 "No Such Object": 
    2018-10-29T12:01:00Z E! Error in plugin [inputs.openldap]: LDAP Result Code 32 "No Such Object": 
    2018-10-29T12:01:00Z E! Error in plugin [inputs.openldap]: LDAP Result Code 32 "No Such Object": 
    2018-10-29T12:01:00Z E! Error in plugin [inputs.openldap]: LDAP Result Code 32 "No Such Object": 
    2018-10-29T12:01:00Z E! Error in plugin [inputs.openldap]: LDAP Result Code 32 "No Such Object": 
    2018-10-29T12:01:00Z E! Error in plugin [inputs.openldap]: LDAP Result Code 32 "No Such Object": 
ERROR  [CompactionExecutor:21454] 2018-10-29 12:02:41,906 NoSpamLogger.java:91 - Maximum memory usage reached (125.000MiB), cannot allocate chunk of 1.000MiB
    2018-09-20 14:08:14.571 [main] ERROR  org.apache.flink.yarn.YarnApplicationMasterRunner  -     -Dlogback.configurationFile=file:logback.xml
    2018-09-20 14:08:14.571 [main] ERROR  org.apache.flink.yarn.YarnApplicationMasterRunner  -     -Dlogback.configurationFile=file:logback.xml
    2018-09-20 14:08:14.571 [main] ERROR  org.apache.flink.yarn.YarnApplicationMasterRunner  -     -Dlogback.configurationFile=file:logback.xml
    2018-09-20 14:08:14.571 [main] ERROR  org.apache.flink.yarn.YarnApplicationMasterRunner  -     -Dlogback.configurationFile=file:logback.xml
    2018-09-20 14:08:14.571 [main] ERROR  org.apache.flink.yarn.YarnApplicationMasterRunner  -     -Dlogback.configurationFile=file:logback.xml
  • this might be helpful - stackoverflow.com/questions/46796021/… Commented Oct 31, 2018 at 7:17
  • If you just want to count duplicate lines, then why do you need to split on words? Commented Oct 31, 2018 at 7:18
  • Then how do I count duplicate lines? Commented Oct 31, 2018 at 7:18
  • create a Map<String, Integer> Commented Oct 31, 2018 at 7:25
  • For every sentence with index i, check whether it is a duplicate of any sentence at an index in [0, i-1]. If yes, skip it; else count the duplicates by comparing it with every sentence at an index in [i+1, n-1]. Finally add up all such counts. The complexity will be O(n^2 * L), where L is the length of the longest sentence. Commented Oct 31, 2018 at 7:26
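The Map<String, Integer> suggestion from the comments can be sketched as below, a minimal example assuming the `error.txt` file name from the question. Each whole line is used as the key, so no word splitting is needed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class LineCounter {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        // one map entry per distinct line; duplicates only bump the count
        for (String line : Files.readAllLines(Paths.get("error.txt"))) {
            counts.merge(line, 1, Integer::sum);
        }
        counts.forEach((line, n) -> System.out.println(n + "x " + line));
    }
}
```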

2 Answers


Please try this if you are using Java 8 and the intent is to count duplicate lines (not words):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Work {
    public static void main(String[] args) throws IOException {
        Map<String, Long> dupes = Files.lines(Paths.get("/tmp/error.txt"))
                .collect(Collectors.groupingBy(Function.identity(), 
                     Collectors.counting()));

        // pretty print
        dupes.forEach((k, v)-> System.out.printf("(%d) times : %s ....%n", 
             v, k.substring(0,  Math.min(50, k.length()))));
    }
}

output:

(2) times : 2018-09-20 14:08:14.571 [main] ERROR  org.apache.f ....
(8) times :     2018-10-29T12:01:00Z E! Error in plugin [input ....
(9) times : ERROR  [CompactionExecutor:21454] 2018-10-29 12:02 ....
(5) times :     2018-09-20 14:08:14.571 [main] ERROR  org.apac ....
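If you want the lines printed in a stable, sorted order (as the TreeMap in the question suggests), `groupingBy` also accepts a map supplier. This is a minor variant of the stream approach above, assuming the same `/tmp/error.txt` path:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SortedWork {
    public static void main(String[] args) throws IOException {
        // TreeMap::new keeps the distinct lines in natural (sorted) key order
        Map<String, Long> dupes = Files.lines(Paths.get("/tmp/error.txt"))
                .collect(Collectors.groupingBy(Function.identity(),
                        TreeMap::new,
                        Collectors.counting()));
        dupes.forEach((k, v) -> System.out.printf("(%d) times : %s%n", v, k));
    }
}
```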



A Map<String, Integer> can be used, with each line (record) as the key and its count as the value.

    Map<String, Integer> countMap = new HashMap<>();

    try (BufferedReader br = new BufferedReader(new FileReader("D:\\error.txt"))) {

        String data;
        while ((data = br.readLine()) != null) {
            if (countMap.containsKey(data)) {
                countMap.put(data, countMap.get(data) + 1);
            } else {
                countMap.put(data, 1);
            }
        }

        countMap.forEach((k, v) -> System.out.println(k + " Occurs " + v + " times."));

    } catch (IOException e) {
        e.printStackTrace();
    }

