1

Given some strings, ab,abc,ad, ace, ddd ... from a file, to find out whether some string S is in the strings. If I want to use a hash to check the existence of the S.

Algorithm analysis: Since each 'put' operation takes O(1) * K time, where K is the length of the string, the building up of the map takes O(n * K) time, where n is the number of string. And the look up takes constant time, so the overall time is O(n*k). Is this the right thought? While we often say hash table allows constant time look up, it typically ignores the time spent on building up the hash. When answering a question like this, should I say the time complexity is O(n*k), or just O(n)? Thanks a lot. Just started to learn algorithm analysis.

String line = null;

BufferedReader br = new BufferedReader(new File("file.txt"));
HashMap<String, Integer> strs = new HashMap<String,Integer>();
while((line=br.readLine()) != null){
   Integer count = strs.get(line);
   if( count == null){
      strs.put(line, 1);
   }
   count++;
}

// Suppose the string to be checked is "str";
if(strs.contains(str)){
   return true;
}
return false;    

1 Answer 1

2

When you say O(n), the n is proportional to the size of the input. However you are asking whether to state O(n * k) where k is also proportional to the size of the input. In this case, what you are really saying is O(n^2) - where the algorithm might take time t for an input of K, but it will take time 4t for an input of 2k. Is this what you mean to say?

If you just mean to say that the algorithm requires t time for k input, and 2t time for 2k input, then you simply say the algorithm is O(n).

Now on to the algorithm... if I am interpreting it right, the steps are building the hash while reading the file, and then the hashtable lookup. One pass over the file is O(n) concurrent with putting all the strings in the hashtable which is also O(n) (assuming the hashtable is reasonably well implemented). Looking up one entry in the hashtable is O(1).

So overall the time complexity is O(n), concurrent with O(n), plus O(1)... Usually we just describe this as O(n), or time in linear proportion to the size of the input.

Sign up to request clarification or add additional context in comments.

7 Comments

Let's ignore my code. A file contains a list of strings and check whether a string exists in the strings. With a HashMap, what's the time complexity? That's what I want to answer. Sorry for the question.
"K" is meant to be the average length of the strings.
Edited to give an algorithm-centric answer rather than a big-o syntax-centric answer :)
So reading the file line by line takes n steps, and put each string into the map takes another n step. The total is 2n, which is O(n). In both case, 'n' is the number of strings?
In big-o notation, n represents the 'size' or 'difficulty' of the input. If we assume all strings are below a length that would start to affect the resources needed to process the algorithm, then n represents the number of strings to compare against. Being O(n) means the time it takes to run the algorithm generally increases in direct proportion to the size of the input. (think linear equation in mathematics, y=someFactor * x). If you have further questions about big-O notation the SO police will probably start suggesting you ask a new question :)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.