So basically, for this assignment I'm working on, we have to read in a huge file of about a million lines, store the keys and values in a data structure of our choice (I'm using hash tables), offer functionality to change the values for keys, and then save the key-value pairs back into a file. I'm using the cuckoo hashing method along with a technique I found in a Harvard paper called "stashing" to accomplish this, and I'm fine with all of it. My only concern is the amount of time the program takes just to read in the data from the file.
The file is formatted so that each line has a key (integer) and a value (String) written like this:
12345 'abcdef'
23456 'bcdefg'
and so on. This is the method I came up with to read it in:
    private static void readData() throws IOException {
        try {
            BufferedReader inStream = new BufferedReader(new FileReader("input/data.db"));
            StreamTokenizer st = new StreamTokenizer(inStream);
            String line = inStream.readLine();
            do {
                String[] arr = line.split(" ");
                line = inStream.readLine();
                Long n = Long.parseLong(arr[0]);
                String s = arr[1];
                //HashNode<Long, String> node = HashNode.create(n, s);
                //table = HashTable.empty();
                //table.add(n, s);
            } while (line != null);
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
The method works fine for actually getting the data; however, when I tested it with our test file of a million lines, it took about 20 minutes to get all the way through reading it in. Surely that isn't a reasonable time for reading data from a file, and I'm positive there must be a better way of doing it.
I have tried several different methods for input: BufferedInputStream with FileInputStream, and Scanner, which didn't work because the file extension is .db. I initially didn't have the tokenizer, but added it in hopes it would help. I don't know if the computer I'm running it on makes much of a difference. I'm currently doing the run on a MacBook Air, and I'm having a mate run it on his laptop in a bit to see if that helps it along. Any input on how to speed this up, or on what I might be doing to slow things down SO much, would be sincerely and greatly appreciated.
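To narrow down where the time goes, here is a minimal sketch of how the read and parse step could be timed on its own, with no hash table involved. The names ReadTiming, Entry, parseLine and countParsedLines are hypothetical and not part of the assignment code. It assumes every non-empty line is a key, a single space, then the value, as in the sample lines above. One difference from split(" "): it cuts only at the first space, so a value containing spaces would be kept whole.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class ReadTiming {

    /** One parsed line. Hypothetical helper, not part of the assignment code. */
    static final class Entry {
        final long key;
        final String value;

        Entry(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Parses a line such as "12345 'abcdef'" by cutting at the first space.
    // The quotes stay in the value, the same as arr[1] in the original method.
    static Entry parseLine(String line) {
        int space = line.indexOf(' ');
        long key = Long.parseLong(line.substring(0, space));
        String value = line.substring(space + 1);
        return new Entry(key, value);
    }

    // Reads every line from the source, parses it, and returns the line count.
    // try-with-resources closes the reader even if parsing throws.
    static long countParsedLines(Reader source) throws IOException {
        long count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines, e.g. a trailing newline
                }
                parseLine(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the original method; override with a program argument.
        String path = args.length > 0 ? args[0] : "input/data.db";
        long start = System.nanoTime();
        long lines = countParsedLines(new FileReader(path));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Parsed " + lines + " lines in " + elapsedMs + " ms");
    }
}
```

If this alone finishes quickly on the same file, the slowdown is more likely in the table code than in the file input. If it is also slow, the reading itself is the place to look.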
P.S. please don't hate me for programming on a Mac :-)