LINQ to JSON group query on array

Question

I have a sample of JSON data that I am converting to a JArray with Newtonsoft.Json.

        string jsonString = @"[{'features': ['sunroof','mag wheels']},{'features': ['sunroof']},{'features': ['mag wheels']},{'features': ['sunroof','mag wheels','spoiler']},{'features': ['sunroof','spoiler']},{'features': ['sunroof','mag wheels']},{'features': ['spoiler']}]";

I am trying to retrieve the features that are most commonly requested together. Based on the above dataset, my expected output would be:

sunroof, mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof, mag wheels, spoiler, 1
sunroof, spoiler, 1
spoiler, 1

However, my LINQ is rusty, and the code I am using to query my JSON data is returning the count of the individual features, not the features selected together:

        JArray autoFeatures = JArray.Parse(jsonString);
        var features = from f in autoFeatures.Select(feat => feat["features"]).Values<string>()
                       group f by f into grp
                       orderby grp.Count() descending
                       select new { indFeature = grp.Key, count = grp.Count() };

        foreach (var feature in features)
        {
            Console.WriteLine("{0}, {1}", feature.indFeature, feature.count);
        }

Actual Output:
sunroof, 5
mag wheels, 4
spoiler, 3

I was thinking maybe my query needs a 'distinct' in it, but I'm just not sure.

var features = JsonConvert.DeserializeObject<List<Dictionary<string, string[]>>>(jsonString).SelectMany(d => d).GroupBy(k => string.Concat(k.Value.OrderBy(s => s))).Select(g => new { Feature = g.Key, Count = g.Count() }).OrderByDescending(a => a.Count); The value strings are pre-ordered internally, which generates groups that ignore the position of the strings within each array.
– Jimi, Commented Aug 30, 2019 at 21:55

DetectivePikachu · Accepted Answer · 2019-08-30 20:26:12Z

4

The problem is in the Select. Calling .Values<string>() on the whole projected sequence flattens every array, so each individual feature becomes its own item. Instead, you need to combine the values of each features array into a single string and group on that string. Here is how you do it:

var features = from f in autoFeatures.Select(feat => string.Join(",", feat["features"].Values<string>()))
               group f by f into grp
               orderby grp.Count() descending
               select new { indFeature = grp.Key, count = grp.Count() };

This produces the following output:

sunroof,mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof,mag wheels,spoiler, 1
sunroof,spoiler, 1
spoiler, 1
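Note that the joined string depends on the order of the values, so "sunroof,mag wheels" and "mag wheels,sunroof" would land in different groups. If order should not matter, one option is to sort each array before joining it. The following is a minimal sketch under that assumption, not part of the original answer; the `FeatureCombinations` class, its `Count` helper, and the shortened sample data are hypothetical names and values chosen for illustration. It also assumes no feature name contains a comma, since that would make the joined keys ambiguous.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class FeatureCombinations
{
    // Hypothetical helper: counts feature combinations, ignoring the order
    // in which the features appear inside each array.
    public static List<KeyValuePair<string, int>> Count(string json)
    {
        return JArray.Parse(json)
            // Sort each array before joining so the key is order-independent.
            .Select(feat => string.Join(",",
                feat["features"].Values<string>().OrderBy(s => s, StringComparer.Ordinal)))
            .GroupBy(key => key)
            .OrderByDescending(grp => grp.Count())
            .Select(grp => new KeyValuePair<string, int>(grp.Key, grp.Count()))
            .ToList();
    }

    public static void Main()
    {
        // Shortened sample: the second entry lists the same features in reverse order.
        string jsonString = @"[{'features': ['sunroof','mag wheels']},{'features': ['mag wheels','sunroof']},{'features': ['spoiler']}]";

        foreach (var entry in Count(jsonString))
        {
            Console.WriteLine("{0}, {1}", entry.Key, entry.Value);
        }
        // Prints:
        // mag wheels,sunroof, 2
        // spoiler, 1
    }
}
```

The trade-off is that the key shows the features in sorted order rather than the order they were entered in.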


2 Comments

Stewbob Over a year ago

That gets exactly what I need. Thanks. I didn't realize I would have to join the strings together. I figured there was a way to extract that information without doing a string manipulation.

DetectivePikachu Over a year ago

There very well may be a way to do that with LINQ, but it's far beyond my powers! Though hopefully now that I have posted an answer, the experts will come out of the woodwork to show how wrong and inefficient my version is and we can both learn something :)

steve16351 · Accepted Answer · 2019-08-30 20:46:34Z

3

You could use a HashSet to represent each distinct set of features, and group on those sets. That way, your LINQ looks almost identical to what you have now, but GroupBy needs an additional IEqualityComparer class to compare one set of features to another and check whether they're the same.

For example:

var featureSets = autoFeatures
    .Select(feature => new HashSet<string>(feature["features"].Values<string>()))
    .GroupBy(a => a, new HashSetComparer<string>())
    .Select(a => new { Set = a.Key, Count = a.Count() })
    .OrderByDescending(a => a.Count);

foreach (var result in featureSets)
{
    Console.WriteLine($"{String.Join(",", result.Set)}: {result.Count}");
}

The comparer class uses the SetEquals method of the HashSet class to check whether one set is the same as another. This also handles the strings being in a different order within the set.

public class HashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        // so if x and y both contain "sunroof" only, this is true 
        // even if x and y are a different instance
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> obj)
    {
        // Returning a constant puts every set in the same hash bucket, which
        // forces an Equals comparison every time. That is correct but slow for
        // large inputs; something smarter would hash the contents.
        return 0;
    }
}
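The "something smarter" mentioned in the comment above could look like the sketch below, which is an illustration rather than part of the original answer. `HashingSetComparer<T>` and `Demo` are hypothetical names. The idea is to XOR the hash codes of the elements: XOR is commutative, so two sets with the same elements get the same hash regardless of enumeration order, and Equals only runs for sets whose hashes collide. The last lines use `HashSet<T>.CreateSetComparer()`, a built-in .NET method that returns a set-equality comparer and may make a custom class unnecessary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of HashSetComparer<T> with a content-based hash code.
public class HashingSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> obj)
    {
        // XOR the element hashes; the result does not depend on element order.
        int hash = 0;
        foreach (T item in obj)
        {
            hash ^= item == null ? 0 : obj.Comparer.GetHashCode(item);
        }
        return hash;
    }
}

public static class Demo
{
    public static void Main()
    {
        // In-memory stand-in for the parsed JSON arrays.
        var rows = new[]
        {
            new[] { "sunroof", "mag wheels" },
            new[] { "mag wheels", "sunroof" },
            new[] { "spoiler" },
        };

        var groups = rows
            .Select(r => new HashSet<string>(r))
            .GroupBy(s => s, new HashingSetComparer<string>())
            .Select(g => new { Set = g.Key, Count = g.Count() })
            .OrderByDescending(g => g.Count);

        foreach (var g in groups)
        {
            Console.WriteLine($"{string.Join(",", g.Set)}: {g.Count}");
        }

        // Built-in alternative: the framework's own set-equality comparer.
        var builtIn = rows
            .Select(r => new HashSet<string>(r))
            .GroupBy(s => s, HashSet<string>.CreateSetComparer());
        Console.WriteLine(builtIn.Count()); // two distinct sets
    }
}
```

With either comparer, the two rows containing "sunroof" and "mag wheels" fall into one group with a count of 2, and "spoiler" forms its own group.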


4 Comments

Stewbob Over a year ago

Just tried your solution on a more complex dataset. The HashSetComparer is essential to capture and combine (group) the cases where the features are not listed in the same order. Thank you.

Brett Caswell Over a year ago

Sure, it's a good consideration, but I'm not so sure it's a good answer to this question, since ignoring order was not part of the question itself. Raising this consideration also prompts more considerations: was there any meaning in the order of the features to begin with, and is there a means of ordering these features to begin with? If you want "sunroof, mag wheels" and "mag wheels, sunroof" to be treated as different, then you wouldn't use this approach; if you can order the features already, this approach would be unnecessary.

Brett Caswell Over a year ago

I don't want to dissect this thoughtful answer too much, but in the event someone thinks this is the best approach.. it depends.

Stewbob Over a year ago

Based on my question 'as asked' it's a bit overkill. Based on my actual needs and dataset, this answer is absolutely necessary.
