I'm having trouble storing values in an ArrayList. The purpose of the program is to read one text file (A), read another text file (B), and then work out how much (in percent) of B is covered by the vocabulary occurring in A. To do this, I store every word that occurs in both in neuS. And here is the problem: when I print the contents, entries seem to be stored a random number of times! For example, I get output like:

elektrotechnik und
die bedeutendste
die bedeutendste
und simulation
erleben die
eine form
eine form

So some entries (strictly speaking n-grams, because I always store two words together) appear twice in neuS, while others appear only once. I have also seen the same entry three times in the output. I want every entry to be stored only once in neuS. What am I doing wrong? The code isn't complete; I left out some parts that I assume are irrelevant to this issue.

Thanks!

BufferedReader in = new BufferedReader(new FileReader("informatik_test.txt"));
String sCurrentLine;

// 
while ((sCurrentLine = in.readLine()) != null) {
    // System.out.println(sCurrentLine);
    arr = sCurrentLine.split(" ");
    for (int i = 0; i < arr.length - 1; i = i + 2) {
        String s = (arr[i].toString() + " " + arr[i + 1].toString())
                .toLowerCase();
        if (null == (hash.get(s))) {
            hash.put(s, 1);
        } else {
            int x = hash.get(s) + 1;
            hash.put(s, x);
        }
    }
    //

    ArrayList< String> words = new ArrayList< String>();
    ArrayList< String> neuS = new ArrayList< String>();
    ArrayList< Long> neuZ = new ArrayList< Long>();

    // Read all Lines from a file
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        String h[] = line.split("   ");

        words.add(h[0].toLowerCase());

    }
    //
    for (String x : hash.keySet()) {
        summe = summe + hash.get(x);
        long neu = hash.get(x);
        for (String s : words) {

            if (x.equals(s)) {
                neuS.add(x);
                neuZ.add(neu);
                disc = disc + 1;
            }

        }
    }
    // Testing which word for output -->! THE PROBLEM!!
    for (String m : neuS) {
        System.out.println(m);
    }

}
  • What kind of Object is 'hash'? (hash.put(s, 1);) Commented Jul 13, 2015 at 21:33
  • It must be a HashMap, look at the methods being used. Commented Jul 13, 2015 at 21:38
  • Yes, it's a HashMap! Commented Jul 13, 2015 at 21:45

3 Answers

1

If you want the words in neuS to be stored only once, then neuS should be a HashSet. As it is, because both words and neuS are ArrayLists, if words contains duplicates, neuS will contain duplicates too.
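A minimal sketch of that change, assuming hash is a Map<String, Integer> as confirmed in the comments on the question. The class name UniqueMatches, the method collect, and the sample data are hypothetical and only there for illustration; the nested loop is the one from the question with neuS declared as a HashSet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UniqueMatches {
    // Collects every key of hash that also occurs in words.
    // Because neuS is a Set, a repeated match is stored only once.
    static Set<String> collect(Map<String, Integer> hash, List<String> words) {
        Set<String> neuS = new HashSet<>();
        for (String x : hash.keySet()) {
            for (String s : words) {
                if (x.equals(s)) {
                    neuS.add(x); // no effect if x is already present
                }
            }
        }
        return neuS;
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        hash.put("die bedeutendste", 2);
        hash.put("und simulation", 1);
        // Made-up sample data: "die bedeutendste" appears twice in words.
        List<String> words = Arrays.asList(
                "die bedeutendste", "die bedeutendste", "und simulation");
        System.out.println(collect(hash, words).size()); // prints 2
    }
}
```

Note that a HashSet does not guarantee any particular iteration order, so the printed order of the n-grams may differ from the order in which they were added.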

Side note: for String h[] = line.split(" "); you have 2 spaces in the split. Is that deliberate?

2 Comments

YES :) If I use it, it works. But how does HashSet work? Will the values still be added to neuZ in the same way? I need to store all of them there, because later I sum those values. So will neuS as a HashSet and neuZ as an ArrayList give the right percentage? And yes, the split matches the structure of my data ;)
All Sets in Java ensure that an object occurs only once within them. HashSet does this through hashing, which is normally more efficient than the common alternative way to implement a set: a tree. I'd recommend reading about data structures like sets and lists, as well as hashing.
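
On the neuZ question: if only the type of neuS changes and the loop from the question stays as it is, neuZ.add(neu) still runs once per duplicate match, so the later sum would count some n-grams more than once. A minimal sketch of one way to keep the two in step, relying on Set.add returning false when the element is already present. The class SyncedMatches and its methods record and sum are hypothetical names, and LinkedHashSet is swapped in for HashSet here only so that neuS keeps insertion order and lines up with neuZ.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SyncedMatches {
    // LinkedHashSet keeps insertion order, so the i-th entry of neuS
    // corresponds to the i-th entry of neuZ.
    final Set<String> neuS = new LinkedHashSet<>();
    final List<Long> neuZ = new ArrayList<>();

    // Stores the count only the first time an n-gram is seen.
    void record(String ngram, long count) {
        if (neuS.add(ngram)) { // add returns false for a duplicate
            neuZ.add(count);
        }
    }

    // Sum of the stored counts, each n-gram counted once.
    long sum() {
        long total = 0;
        for (long z : neuZ) {
            total += z;
        }
        return total;
    }

    public static void main(String[] args) {
        SyncedMatches m = new SyncedMatches();
        m.record("eine form", 3);
        m.record("eine form", 3); // duplicate match, ignored
        m.record("erleben die", 1);
        System.out.println(m.neuS.size() + " " + m.sum()); // prints 2 4
    }
}
```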
1

You could turn neuS into a HashSet. It would fix your output.

1
for (String s : words) { 
    if (x.equals(s)) {
        neuS.add(x);
        neuZ.add(neu);
        disc = disc + 1;
    }
}

You should add break; after disc = disc + 1; and you should check if x is in neuS before adding it.
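
A minimal sketch of the loop with both changes applied, assuming hash is a Map<String, Integer> and that neuS and neuZ stay ArrayLists as in the question. The class name BreakOnFirstMatch, the method collect, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BreakOnFirstMatch {
    // Fills neuS and neuZ with each matching n-gram and its count exactly once
    // and returns disc, the number of keys of hash that occur in words.
    static int collect(Map<String, Integer> hash, List<String> words,
                       List<String> neuS, List<Long> neuZ) {
        int disc = 0;
        for (Map.Entry<String, Integer> entry : hash.entrySet()) {
            String x = entry.getKey();
            long neu = entry.getValue();
            for (String s : words) {
                if (x.equals(s)) {
                    if (!neuS.contains(x)) { // guard against storing x twice
                        neuS.add(x);
                        neuZ.add(neu);
                    }
                    disc = disc + 1;
                    break; // stop at the first match for this key
                }
            }
        }
        return disc;
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        hash.put("eine form", 2);
        hash.put("erleben die", 1);
        // Made-up sample data: "eine form" appears twice in words.
        List<String> words = Arrays.asList("eine form", "eine form", "erleben die");
        List<String> neuS = new ArrayList<>();
        List<Long> neuZ = new ArrayList<>();
        int disc = collect(hash, words, neuS, neuZ);
        System.out.println(disc + " " + neuS.size()); // prints 2 2
    }
}
```

With the break in place, disc also counts each matching n-gram once instead of once per duplicate line in words.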
