
I am building a list of unique sets of three strings from some data: whenever three strings occur together they form a set, and the list may contain each set only once.

  1. A,B,C
  2. B,C,D
  3. D,E,F and so on

I keep adding sets to the list only if they are not already in it, so if I encounter the three strings {A,B,C} together again, I won't add them a second time. I have two questions, and the answer to the second depends on the answer to the first.

  1. How should I store a set of 3 strings: a List, an array, a concatenated string, or something else? (I may also put the sets in a Dictionary to record their counts, but that's for later.)
  2. How do I compare one set of 3 strings with another, irrespective of their order, given the structure chosen? I want a proper solution rather than a naive one.

I am using C# by the way.

  • Is the order of the strings important? Maybe create a separate class with a method `bool Compare()` that does the comparison. Commented Dec 3, 2015 at 19:21
  • You don't want to store these strings by concatenating them, because "ab", "cd", "ef" concatenated together is the same as "abcd", "e", "f" concatenated, but those two are distinct sets according to your criteria. Commented Dec 3, 2015 at 19:23
  • To store sets you can use Tuple<string, string, string> Commented Dec 3, 2015 at 19:28
  • @KonstantinZadiran But how would you compare them? You would then have to compare each of the strings separately. I didn't mention it, but I have a really large number of these sets, in some cases millions! Commented Dec 3, 2015 at 19:30
  • Would {A,B,C} and {B,A,C} be considered equal? Commented Dec 3, 2015 at 19:30
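
The Tuple suggestion above can be made order-insensitive by sorting the three strings before building the tuple. This is only a minimal sketch, assuming ordinal (case-sensitive) comparison is what you want; `TripletKey` and `Normalize` are hypothetical names used for illustration, not something from the comments:

```csharp
using System;
using System.Collections.Generic;

public static class TripletKey
{
    // Returns the three strings as a tuple in ordinal sorted order,
    // so that (A,B,C) and (B,A,C) produce the same key.
    public static Tuple<string, string, string> Normalize(string a, string b, string c)
    {
        var items = new[] { a, b, c };
        Array.Sort(items, StringComparer.Ordinal);
        return Tuple.Create(items[0], items[1], items[2]);
    }
}
```

Tuple<string, string, string> compares its elements in Equals and GetHashCode, so a HashSet<Tuple<string, string, string>> filled through Normalize should reject a triplet it has already seen, whatever order the strings arrive in.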

4 Answers

3
  1. Either an array or a list is your best bet for storing the data, since, as wentimo mentioned in a comment, concatenating them means that you lose data that you may need. To steal his example, "ab", "cd", "ef" concatenated together is the same as "abcd", "e", "f" concatenated, but the two shouldn't be treated as equivalent sets.

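  2. To compare them, I would sort each list alphabetically, then compare the values position by position. Sorting takes care of the fact that the order of the values doesn't matter. A sketch might look like this:

    // Requires: using System; using System.Collections.Generic;
    static bool Compare(List<string> a, List<string> b)
    {
        if (a.Count != b.Count)
        {
            return false;
        }
        // Sort copies so the callers' lists are left untouched.
        var x = new List<string>(a);
        var y = new List<string>(b);
        x.Sort(StringComparer.Ordinal);
        y.Sort(StringComparer.Ordinal);
        for (int i = 0; i < x.Count; i++)
        {
            if (x[i] != y[i])
            {
                return false;
            }
        }
        return true;
    }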
    

Update

Now that you have stated in a comment that performance is an important consideration, since you may have millions of these sets to compare, and that a set never contains duplicate elements, here is a more optimized version of my code. Note that I no longer have to sort the two lists, which saves quite a bit of time when executing this function.

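// Requires: using System.Collections.Generic;
// Only valid when neither list contains duplicate elements.
static bool Compare(List<string> a, List<string> b)
{
    if(a.Count == b.Count)
    {
        for(int i = 0; i < a.Count; i++)
        {
            if(!b.Contains(a[i]))
            {
                return false;
            }
        }
        return true;
    }
    else
    {
        return false;
    }
}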
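
The no-duplicates assumption matters for the Contains-based version above. The sketch below restates the same logic in a hypothetical `ContainsCompare` class (the name is mine, for illustration only) so the effect can be checked in isolation:

```csharp
using System.Collections.Generic;

public static class ContainsCompare
{
    // Same logic as the Contains-based comparison: equal sizes, and
    // every element of a is found somewhere in b.
    public static bool Compare(List<string> a, List<string> b)
    {
        if (a.Count != b.Count) return false;
        foreach (string s in a)
        {
            if (!b.Contains(s)) return false;
        }
        return true;
    }
}
```

With distinct elements this treats {A,B,C} and {C,A,B} as equal, as intended. If duplicates were allowed, {A,A,B} and {A,B,B} would also be reported as equal, because each element of the first list is merely looked up in the second, not counted.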

DrewJordan's approach of using a HashSet is still probably faster than my approach, since it only has to sort each set of three once and can then check it against your existing sets much faster than my approach can.


3 Comments

Another advantage of this approach over the Tuple approach is that it is easy to expand it to handle any List size instead of just sets of three.
That's one way. I can improve it by looping over just one List and checking that the other List contains every string in the first. This does look like a possible solution, but I am looking for a more optimized one, as my dataset contains millions of these sets. I will mark this as the answer if I don't get a better solution; for now I am just upvoting it. Thanks, Kevin.
This already loops over only one of the lists; that's why there is only one for loop. I'm not sure what you mean by saying it loops over both, unless you mean that it sorts both lists, which is somewhat inefficient. Also, if performance is a large concern, you should note that in your question, since that influences the kind of response you will get. I'll see if I can come up with a more optimized version.
1

Probably the best way is to use a HashSet, if you don't need to have duplicate elements in your sets. It sounds like each set of 3 has 3 unique elements; if that is actually the case, I would combine a HashSet approach with the concatenation that you already worked out, i.e. order the elements, join them with a separator that cannot appear in the strings, and then add the concatenated key to a HashSet, which prevents duplicates from ever occurring in the first place.
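
The sort, join and add steps could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: `TripletSet`, `MakeKey` and `AddTriplet` are hypothetical names, the comparison is assumed to be ordinal, and the separator (the ASCII unit separator, U+001F) is assumed never to occur in your data.

```csharp
using System;
using System.Collections.Generic;

public static class TripletSet
{
    // Assumption: this character never occurs in the input strings.
    private const char Separator = '\u001F';

    // Builds an order-independent key: sort ordinally, then join.
    public static string MakeKey(string a, string b, string c)
    {
        var items = new[] { a, b, c };
        Array.Sort(items, StringComparer.Ordinal);
        return string.Join(Separator.ToString(), items);
    }

    // Returns true if the triplet was new, false if it was already present.
    public static bool AddTriplet(HashSet<string> seen, string a, string b, string c)
    {
        return seen.Add(MakeKey(a, b, c));
    }
}
```

Because of the separator, "ab", "cd", "ef" and "abcd", "e", "f" yield different keys, which avoids the collision mentioned in the comments on the question. If counts are needed later, the same key could index a Dictionary<string, int> instead.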

If your sets of three could have duplicate elements, then Kevin's approach is what you're going to have to do for each. You might get some better performance from using a list of HashSets for each set of three, but with only three elements the overhead of creating a hash for each element of potentially millions of sets seems like it would perform worse than just iterating over them once.
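
If you want to measure the HashSet-per-triplet idea anyway, .NET ships a comparer for it: `HashSet<T>.CreateSetComparer()` returns an equality comparer that treats two sets as equal when they hold the same elements. A minimal sketch, where `SetOfSets` is a hypothetical helper name:

```csharp
using System.Collections.Generic;

public static class SetOfSets
{
    // Creates a collection of string sets in which two inner sets count as
    // equal when they contain the same elements, whatever the insertion order.
    public static HashSet<HashSet<string>> Create()
    {
        return new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());
    }
}
```

Two caveats under this sketch: an inner HashSet collapses repeated strings, so it does not help with sets that contain duplicate elements, and an inner set should not be modified after it has been added to the outer one, since its hash code would change.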

6 Comments

Yup, that's exactly what came to my mind looking at the solution given by Kevin. Thanks, Jordan.
Actually, now I'm thinking that the best way based on your comments would be to first order and concatenate them like you mentioned (A_B_C), and then just create a HashSet of those strings. Then you never have to check yourself whether it's already there; the HashSet will only allow unique elements.
The assumption that he will never have duplicate elements is a weird assumption to make, though from Taha's comment here it seems to be correct. That is definitely something that should have been in the question in the first place since it hugely affects the way to approach this problem
@KevinWells I agree, it can make a huge difference. I think if there can be duplicate elements your solution is the way to go, but if (as it now sounds) he won't have dupes, I think this approach, or the one in my comment above, is better.
I totally agree with you, I was trying to respond to his question as it was asked, but your approach seems to be superior now that we know his actual requirements
0

Here is a simple string wrapper for you:

using System;
using System.Collections.Generic;

/// The wrapper for three strings
public class StringTriplet
{

    private List<string> Store;

    // accessors to three source strings:
    public string A { get; private set; }
    public string B { get; private set; }
    public string C { get; private set; }

    // constructor (needs to fill the internal storage)
    public StringTriplet(string a, string b, string c)
    {
        this.Store = new List<string>();
        this.Store.Add(a);
        this.Store.Add(b);
        this.Store.Add(c);
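        // sorting is required so that later we can compare position by position
        // instead of comparing every string with every other; ordinal order keeps
        // the comparison consistent with string.GetHashCode()
        this.Store.Sort(StringComparer.Ordinal);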
        this.A = a;
        this.B = b;
        this.C = c;
    }


    // additional method. you could add an IComparable<StringTriplet> declaration to the class, but it is not necessary for your task...
    public int CompareTo(StringTriplet obj)
    {
        // by convention, any instance sorts after null
        if (null == obj)
            return 1;

        int cmp;
        cmp = this.Store.Count.CompareTo(obj.Store.Count);
        if (0 != cmp)
            return cmp;

        for (int i = 0; i < this.Store.Count; i++)
        {
            // CompareOrdinal accepts null strings and agrees with string.GetHashCode()
            cmp = string.CompareOrdinal(this.Store[i], obj.Store[i]);
            if ( 0 != cmp )
                return cmp;
        }

        return 0;
    }

    // additional method. it is a good practice : override both 'Equals' and 'GetHashCode'. See below..
    override public bool Equals(object obj)
    {
        if (! (obj is StringTriplet))
            return false;
        var t = obj as StringTriplet;
        return ( 0 == this.CompareTo(t));
    }

    // necessary method . it will be implicitly used on adding values to the HashSet
    public override int GetHashCode()
    {
        int res = 0;
        for (int i = 0; i < this.Store.Count; i++)
            res = res ^ (null == this.Store[i] ? 0 : this.Store[i].GetHashCode()) ^ i;

        return res;
    }
}

Now you can just create a HashSet and add values:

var t = new HashSet<StringTriplet> ();

t.Add (new StringTriplet ("a", "b", "c"));
t.Add (new StringTriplet ("a", "b1", "c"));
t.Add (new StringTriplet ("a", "b", "c"));  // dup
t.Add (new StringTriplet ("a", "c", "b"));  // dup
t.Add (new StringTriplet ("1", "2", "3"));
t.Add (new StringTriplet ("1", "2", "4"));
t.Add (new StringTriplet ("3", "2", "1"));  // dup

foreach (var s in t) {
    Console.WriteLine (s.A + " " + s.B + " " + s.C);
}
return 0;


0

You can inherit from List<String> and override Equals() and GetHashCode() methods:

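using System;
using System.Collections.Generic;
using System.Linq;

public class StringList : List<String>
{
    public override bool Equals(object obj)
    {
        StringList other = obj as StringList;
        if (other == null) return false;
        // compare sizes too, otherwise {A} would equal {A,B} but not the reverse
        return this.Count == other.Count && this.All(x => other.Contains(x));
    }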
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 19;
            foreach (String s in this)
            {
                hash = hash + s.GetHashCode() * 31;
            }
            return hash;
        }
    }
}

Now you can use HashSet<StringList> to store only unique sets.

