0

I need a way to convert a collection of strings into a unique string. That is, the resulting string must be different if any of the strings inside the collection has changed.

I'm working on a big solution, so I may not be able to adopt some of the better ideas. The unique string will be used to compare two collections, so different strings must mean different collections. I cannot compare the strings inside one by one because the order may change, and the solution is already built to return a result based on a comparison of two strings. This is an add-on: the generated string will be passed as a parameter to that comparison.

Thank you!

4
  • 4
    {"a","b","c"} != {"a", "c", "b"}, or does the order not matter? First, you have to define what a different collection is here. Commented Dec 19, 2011 at 15:49
  • Use a hash algorithm? MD5, SHA-1... Commented Dec 19, 2011 at 15:50
  • 1
    If your goal is to compare the collections, look into implementing IEquatable<T> Commented Dec 19, 2011 at 15:50
  • The grammar is making it difficult to understand the question. Is a "collection string" really a collection of strings? Commented Dec 19, 2011 at 15:51

5 Answers

1

This works by choosing ":" as the separator character and using an escape character to make it clear when a ":" inside a string is not meant as a separator. We therefore just need to escape all our strings before concatenating them with the separator in between. This gives us a unique string for every collection. If we want collections to compare as equal regardless of order, all we need to do is sort the collection before doing anything else. I should add that my sample uses LINQ and thus assumes that the collection implements IEnumerable<string> and that you have a using directive for System.Linq.

You can wrap that up in a function as follows:

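string GetUniqueString(IEnumerable<string> Collection, bool OrderMatters = true, string Escape = "/", string Separator = ":")
{
    if(Escape == Separator)
        throw new ArgumentException("Escape character should never equal separator character because it fails in the case of empty strings");
    if(!OrderMatters)
        Collection = Collection.OrderBy(v=>v, StringComparer.Ordinal);//Sorting fixes ordering issues; ordinal comparison keeps the order culture-independent.
    //string.Join returns "" for an empty collection, where Aggregate without a seed would throw.
    //Note that an empty collection and a collection holding one empty string both produce "".
    return string.Join(Separator, Collection
        .Select(v=>v.Replace(Escape, Escape + Escape).Replace(Separator,Escape + Separator)));//Escape String
}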
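
One way to convince yourself that the escaped form is unambiguous is to check that it can be decoded again. The following is only a sketch of such an inverse, assuming the default single-character escape "/" and separator ":"; the class and method names are made up for illustration and are not part of the function above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class UniqueStringDecoder
{
    // Hypothetical inverse of GetUniqueString for a single-character escape and separator.
    public static List<string> Decode(string encoded, char escape = '/', char separator = ':')
    {
        var items = new List<string>();
        var current = new StringBuilder();
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (c == escape && i + 1 < encoded.Length)
            {
                // An escape character means "take the next character literally".
                current.Append(encoded[++i]);
            }
            else if (c == separator)
            {
                // An unescaped separator ends the current item.
                items.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        // Whatever is left after the last separator is the final item.
        items.Add(current.ToString());
        return items;
    }
}
```

For example, UniqueStringDecoder.Decode("a/:b:c//") yields the two items a:b and c/. Decoding "" yields a single empty string, which is the same empty-collection ambiguity the encoder has.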


1

What about using a hash function?

4 Comments

@MoslemBenDhaou A cryptographic hash function will almost certainly return unique strings. If you find two strings that hash to the same thing, it would be big news.
"Ea" and "FB" collide, for example; it simply depends on the prime number used to hash the strings. With a 32-bit SDK it is often the prime number 31, which is exactly the difference between "a" and "B".
@MoslemBenDhaou Use a cryptographic hash.
@MoslemBenDhaou From wikipedia (en.wikipedia.org/wiki/Cryptographic_hash_function), "it is infeasible to find two different messages with the same hash".
1

Considering your constraints, use a delimited approach:

Pick a delimiter and an escape method, e.g. use ; as the delimiter, escape it within strings as \;, and also escape \ as \\.

So this list of strings...

"A;bc"
"D\ef;"

...becomes "A\;bc;D\\ef\;"

It ain't pretty, but considering that it has to be a string, the good old ways of CSV and its brethren aren't all that bad.
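
A minimal sketch of that encoding in C#, assuming the collection contains no null entries. The class and method names are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DelimitedEncoding
{
    // Hypothetical helper: escapes '\' first, then ';', and joins the results with ';'.
    public static string Encode(IEnumerable<string> items)
    {
        return string.Join(";", items.Select(s => s
            .Replace("\\", "\\\\")   // \ becomes \\
            .Replace(";", "\\;")));  // ; becomes \;
    }
}
```

Calling DelimitedEncoding.Encode(new[] { "A;bc", @"D\ef;" }) reproduces the string from the example, A\;bc;D\\ef\;. The backslash has to be escaped before the semicolon; otherwise the backslashes introduced for the semicolons would be doubled as well.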


0

By a "collection string" you mean "collection of strings"?

Here's a naive (but working) approach: sort the collection (to eliminate dependency on order), concatenate the strings, and take a hash of the result (MD5, for instance).

Trivial to implement, but not very clever performance-wise.
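
Roughly, that could look like the sketch below. Two things in it are my own assumptions rather than part of the approach as described: SHA-256 stands in for MD5, and each string is prefixed with its length so that plain concatenation cannot make {"AB", "C"} and {"A", "BC"} produce the same input to the hash. The names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class CollectionFingerprint
{
    // Hypothetical helper: order-insensitive hash of a collection of non-null strings.
    public static string Compute(IEnumerable<string> items)
    {
        // Ordinal sort removes the dependency on order and on the current culture.
        var sorted = items.OrderBy(s => s, StringComparer.Ordinal);

        // Assumption: prefix each string with its length so item boundaries stay unambiguous.
        var builder = new StringBuilder();
        foreach (var s in sorted)
            builder.Append(s.Length).Append(':').Append(s);

        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(builder.ToString()));
            // Hex string of the 32-byte digest, 64 characters long.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

Two collections with the same strings in a different order get the same token, and the token has a fixed size no matter how large the collection is. A hash cannot guarantee uniqueness in the strict sense, so if that guarantee matters more than size, the concatenated string itself can be used instead.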

3 Comments

MD5 (for example) is a 128-bit number. That's a whole damn lot of different values. Other hashes are even longer. I wouldn't take collisions too seriously.
The actual problem with this solution (as with many solutions proffered) is the corner case of comparing {"AB", "C"} with {"A","BC"}. The hashing part really is fine (but unnecessary)
You can put a separator of your choice between them. I used hashing as a means to limit the size of this "token", but if size is not a problem for you, then OK, don't do it.
0

Are you saying that you need to encode a string collection as a string? So, for example, the collection {"abc", "def"} may be encoded as "sDFSDFSDFSD" but {"a", "b"} might be encoded as "SDFeg". If so, and you don't care about unique keys, then you could use something like SHA or MD5.

2 Comments

yes this is what I'm saying but I need the strings generated from encoding the 2 collections to be always unique. That's why I can't use hash functions.
@Moslem Most hash functions can be considered to be unique unless the sample size is huge, and I mean absolutely vast, but if you don't care about the size of the result then you can just concatenate them.
