The right way to analyze this simple algorithm

Question

Given some strings, ab,abc,ad, ace, ddd ... from a file, to find out whether some string S is in the strings. If I want to use a hash to check the existence of the S.

Algorithm analysis: Since each 'put' operation takes O(1) * K time, where K is the length of the string, the building up of the map takes O(n * K) time, where n is the number of string. And the look up takes constant time, so the overall time is O(n*k). Is this the right thought? While we often say hash table allows constant time look up, it typically ignores the time spent on building up the hash. When answering a question like this, should I say the time complexity is O(n*k), or just O(n)? Thanks a lot. Just started to learn algorithm analysis.

String line = null;

BufferedReader br = new BufferedReader(new File("file.txt"));
HashMap<String, Integer> strs = new HashMap<String,Integer>();
while((line=br.readLine()) != null){
   Integer count = strs.get(line);
   if( count == null){
      strs.put(line, 1);
   }
   count++;
}

// Suppose the string to be checked is "str";
if(strs.contains(str)){
   return true;
}
return false;

user1445967 · Accepted Answer · 2014-01-27 19:30:36Z

2

When you say O(n), the n is proportional to the size of the input. However you are asking whether to state O(n * k) where k is also proportional to the size of the input. In this case, what you are really saying is O(n^2) - where the algorithm might take time t for an input of K, but it will take time 4t for an input of 2k. Is this what you mean to say?

If you just mean to say that the algorithm requires t time for k input, and 2t time for 2k input, then you simply say the algorithm is O(n).

Now on to the algorithm... if I am interpreting it right, the steps are building the hash while reading the file, and then the hashtable lookup. One pass over the file is O(n) concurrent with putting all the strings in the hashtable which is also O(n) (assuming the hashtable is reasonably well implemented). Looking up one entry in the hashtable is O(1).

So overall the time complexity is O(n), concurrent with O(n), plus O(1)... Usually we just describe this as O(n), or time in linear proportion to the size of the input.

edited Jan 27, 2014 at 19:30

answered Jan 27, 2014 at 19:21

user1445967

1,5704 gold badges14 silver badges34 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

user697911 Over a year ago

Let's ignore my code. A file contains a list of strings and check whether a string exists in the strings. With a HashMap, what's the time complexity? That's what I want to answer. Sorry for the question.

user697911 Over a year ago

"K" is meant to be the average length of the strings.

user1445967 Over a year ago

Edited to give an algorithm-centric answer rather than a big-o syntax-centric answer :)

user697911 Over a year ago

So reading the file line by line takes n steps, and put each string into the map takes another n step. The total is 2n, which is O(n). In both case, 'n' is the number of strings?

user1445967 Over a year ago

In big-o notation, n represents the 'size' or 'difficulty' of the input. If we assume all strings are below a length that would start to affect the resources needed to process the algorithm, then n represents the number of strings to compare against. Being O(n) means the time it takes to run the algorithm generally increases in direct proportion to the size of the input. (think linear equation in mathematics, y=someFactor * x). If you have further questions about big-O notation the SO police will probably start suggesting you ask a new question :)

|

Collectives™ on Stack Overflow

The right way to analyze this simple algorithm

1 Answer 1

7 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

7 Comments

Your Answer

Sign up or log in

Post as a guest

Related