I am given the task of writing Java code to read in lines from a database (of which there can be 100s of thousands of lines) and returning the sum of permutations of each line in the database. For example, if there are 3 lines as such:
ABCD
BACD
DCAB
Then the output should be 3 (ABCD == BACD, ABCD == DCAB, BACD == DCAB) where == means 'is a permutation'.
An obvious answer would be (n = words in dictionary, m = characters in word):
- Loop through database (
O(n))
On each line: (O(1))- Sort in order (
O(mlogm)) - Add to array (
O(1))
- Sort in order (
- Loop through array (
i = 0 : n) (O(n))- Loop through array (
j = i + 1 : n) (O(n))- If
array[i] == array[j]thencount++(O(1))
- If
- Loop through array (
- Return count (
O(1))
Total complexity:
= O(n) * (O(1) * (O(mlogm) + O(1))) + O(n) * (O(n) * O(1)) + O(1)
= O(n) * (O(mlogm)) + O(n) * (O(n)) + O(1)
= O(nmlogm) + O(n^2) + O(1)
= O(nmlogm + n^2)
= O(n^2) assuming n >> m
Of course, this is anything but efficient for a large database, so I'd like to see if there are better algorithms. I have thought of using a hash table, but not exactly sure of how to implement it. So, on each line read maybe record the number of occurrences of each character, but then I've only seen implementations of seeing if two words are permutations of each other this way, not n words - this would require more than 1 hash table I believe).
Some information: Not all words are the same length. No information on the distribution is known (it's a randomly 'sorted' database).
Can someone provide some suggestions?
No actual code please.
perm["inp"] = ["nip", "pin"].nin step 3)? What'snin step 1)? It seems you attach the same name to different things here.