I have to find max(s.length * s.count) for any substring s of a given string t, where s.length is the length of the substring and s.count is the number of times s occurs within t. Substrings may overlap within t.
Example:
For the string aaaaaa, the substring aaa has the maximum occurrences * length. The substrings and their occurrence counts are:
a: 6
aa: 5
aaa: 4
aaaa: 3
aaaaa: 2
aaaaaa: 1
So aaa is our winner: 4 occurrences * length 3 = 12. Yes, aaaa also has a score of 12 (3 occurrences * length 4), but aaa comes first.
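To double-check the numbers in the example, here is a minimal brute-force sketch. It is only meant for tiny inputs like aaaaaa and is not a solution for the 100,000-character case. The names SubstringScores, CountOccurrences and Best are hypothetical, and breaking ties in favour of the shorter substring is my assumption about what "comes first" means.

```csharp
using System;
using System.Collections.Generic;

static class SubstringScores
{
    // Counts overlapping occurrences of every distinct substring of t.
    // Each (start, len) pair is one occurrence, so overlaps are included.
    // Brute force: only suitable for very short strings.
    public static Dictionary<string, int> CountOccurrences(string t)
    {
        var counts = new Dictionary<string, int>();
        for (int len = 1; len <= t.Length; len++)
        {
            for (int start = 0; start + len <= t.Length; start++)
            {
                string sub = t.Substring(start, len);
                counts.TryGetValue(sub, out int seen); // seen is 0 when sub is new
                counts[sub] = seen + 1;
            }
        }
        return counts;
    }

    // Returns the substring with the highest length * count.
    // Assumption: on a tie, the shorter substring wins.
    public static (string Sub, int Score) Best(string t)
    {
        string bestSub = "";
        int bestScore = 0;
        foreach (var pair in CountOccurrences(t))
        {
            int score = pair.Key.Length * pair.Value;
            bool better = score > bestScore
                || (score == bestScore && pair.Key.Length < bestSub.Length);
            if (better)
            {
                bestSub = pair.Key;
                bestScore = score;
            }
        }
        return (bestSub, bestScore);
    }
}
```

With this sketch, SubstringScores.Best("aaaaaa") gives ("aaa", 12), which matches the table above.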
I have tried the only means I know or can figure out, but I have an input string of 100,000 length, and just finding all the substrings is O(n^2), and this hangs my program:
var theSet = new HashSet<string>();
// i is the substring length; use <= so the full string is included too
for (int i = 1; i <= source.Length; i++)
{
    for (int start = 0; start <= source.Length - i; start++)
    {
        var sub = source.Substring(start, i);
        if (!theSet.Contains(sub))
        {
            theSet.Add(sub);
        }
    }
}
...
// Some not-noteworthy benchmark related code
...
int maxVal = 0;
foreach (var sub in subs)
{
    var count = 0;
    for (var i = 0; i < source.Length - sub.Length + 1; i++)
    {
        if (source.Substring(i, sub.Length).Equals(sub)) count++;
    }
    if (sub.Length * count > maxVal)
    {
        maxVal = sub.Length * count;
    }
}
I know I am looking for a relatively unknown algorithm and/or data structure here, as Google yields no results that closely match the problem. In fact, Google is where I found the costly approaches I attempted in the code above.
O(n^4), not O(n^2). There are O(n^2) subs. For each you run an O(n^2) calculation.