I have written a function that fills a jagged array (FileCheck[][]) with the results of matching file names against multiple regular expressions.

  1. FileCheck[0] // This string[] contains all the file names.
  2. FileCheck[1] // This string[] holds "0" or "1", depending on whether a Regex match was found.
  3. FileCheck[2] // This string[] contains the index of the first Regex that matched.

        foreach (string File in InputFolder)
        {
            int j = 0;
            FileCheck[0][k] = Path.GetFileName(File);
            Console.WriteLine(FileCheck[0][k]);
            foreach (Regex Filemask in Filemasks)
            {
                if (string.IsNullOrEmpty(FileCheck[1][k]) || FileCheck[1][k] == "0")
                {
                    if (Filemask.IsMatch(FileCheck[0][k]))
                    {
                        FileCheck[1][k] = "1";
                        FileCheck[2][k] = j.ToString(); // Index of the first Regex that matched
                    }
                    else
                    {
                        FileCheck[1][k] = "0";
                    }
                    j++;
                }
                Console.WriteLine(FileCheck[1][k]);
            }
            k++;
        }
        Console.ReadLine();
    
        // I need the Index of the Regex with the most valid hits
    

I'm trying to write a query that gives me the RegexIndex string that occurs most often. This is what I tried, but it did not work: I only get the count of the most frequent string, not the string itself.

        // I need the Index of the Regex with the most valid hits
        var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
            .Where(x => FileCheck[1][x] == "1")
            .GroupBy(x => FileCheck[2][x])
            .OrderByDescending(x => x.Count())
            .First().ToList();
        Console.WriteLine(LINQ[1]);

Example Data

        string[][] FileCheck = new string[3][];
        FileCheck[0] = new string[]{ "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt"};
        FileCheck[1] = new string[]{ "0","1","1","0","1","1"};
        FileCheck[2] = new string[]{ null, "3", "3", null,"1","2"};

In this example, the LINQ query should return:

        string result = "3";

  • What does "but did not work" mean? What result do you get? An exception? A different result? Please be more specific in your question. Commented Mar 8, 2016 at 11:46
  • I get the count of the most frequent element, but not the RegexIndex string itself. I have edited my question to make it clearer. Commented Mar 8, 2016 at 12:09
  • @kami Um, are you sure? Your query above gives 2, which is the string, not the count (which is 3). Commented Mar 8, 2016 at 12:12
  • Is the multidimensional array required for your structuring? Using an array of classes would make the code more readable and queries such as this much easier to create. Commented Mar 8, 2016 at 12:22
  • If you change the Example Data to "FileCheck[2] = new string[]{ null, "3", "2", null,"1","2"};", 2 should be the result too, but I get 5 instead... So there must be something wrong. Commented Mar 8, 2016 at 12:24

4 Answers

With your current code, substituting 'ToList()' with 'Key' would do the trick.

var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
            .Where(x => FileCheck[1][x] == "1")
            .GroupBy(x => FileCheck[2][x])
            .OrderByDescending(x => x.Count())
            .First().Key;

Since the index is null for values that are not found, you could also filter out null values and skip looking at the FileCheck[1] array. For example:

var maxOccurringIndex = FileCheck[2].Where(ind => ind != null)
        .GroupBy(ind=>ind)
        .OrderByDescending(x => x.Count())
        .First().Key;
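
To try the null-filtering variant in isolation, here is a minimal sketch wrapped in a helper. The class name FileCheckDemo and the method name MostFrequentIndex are hypothetical and used only for illustration. The sketch also assumes that returning null is acceptable when no entry matched, which is why it uses FirstOrDefault() instead of First().

```csharp
using System;
using System.Linq;

public static class FileCheckDemo
{
    // Hypothetical helper: returns the most frequent non-null entry,
    // or null when the array holds no non-null entries at all.
    public static string MostFrequentIndex(string[] regexIndexes)
    {
        return regexIndexes
            .Where(index => index != null)               // skip files without a match
            .GroupBy(index => index)                     // one group per regex index
            .OrderByDescending(group => group.Count())   // largest group first
            .Select(group => group.Key)                  // keep the index string, not the group
            .FirstOrDefault();                           // null instead of an exception when empty
    }
}
```

Called with the FileCheck[2] array from the question, { null, "3", "3", null, "1", "2" }, the helper returns "3". Because OrderByDescending is a stable sort in LINQ to Objects, ties go to the index that appears first in the array.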

However, just a suggestion, you can use classes instead of a nested array, e.g.:

class FileCheckInfo
{
    public string File{get;set;}
    public bool Match => Index.HasValue;
    public int? Index{get;set;}

    public override string ToString() => $"{File} [{(Match ? Index.ToString() : "no match")}]";
}

Assuming InputFolder is an enumerable of string and Filemasks an enumerable of 'Regex', an array can be filled with:

FileCheckInfo[] FileCheck = InputFolder.Select(f=>
    new FileCheckInfo{
        File = f, 
        Index = Filemasks.Select((rx,ind) => new {ind, IsMatch = rx.IsMatch(f)}).FirstOrDefault(r=>r.IsMatch)?.ind
        }).ToArray();

Getting the most frequently occurring index works much the same way:

var maxOccurringIndex = FileCheck.Where(f=>f.Match).GroupBy(f=>f.Index).OrderByDescending(gr=>gr.Count()).First().Key;
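
The two steps of the class-based approach (find the first matching mask per file, then pick the most frequent index) can also be sketched without the intermediate class. The names FirstMatchDemo, FirstMatchIndex and MostFrequentFirstMatch below are hypothetical, and the sketch assumes the masks are already constructed Regex objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FirstMatchDemo
{
    // Index of the first mask that matches fileName, or null when none does.
    public static int? FirstMatchIndex(string fileName, IReadOnlyList<Regex> masks)
    {
        for (int i = 0; i < masks.Count; i++)
        {
            if (masks[i].IsMatch(fileName))
            {
                return i;
            }
        }
        return null;
    }

    // Most frequent first-match index over all file names,
    // or null when no file matched any mask.
    public static int? MostFrequentFirstMatch(IEnumerable<string> fileNames, IReadOnlyList<Regex> masks)
    {
        return fileNames
            .Select(name => FirstMatchIndex(name, masks))
            .Where(index => index.HasValue)              // drop files without a match
            .GroupBy(index => index)
            .OrderByDescending(group => group.Count())
            .Select(group => group.Key)
            .FirstOrDefault();                           // default(int?) is null
    }
}
```

Working with int? rather than strings keeps "no match" distinct from index 0 and avoids converting the index with ToString() and back.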

Edit: all of the above assumes you need to reuse the results. If the goal is only to find the index with the most matches, you're much better off with an approach like the one Martin suggested, for example:

var maxOccurringIndex = Filemasks.Select((rx,ind) => new {ind, Count = InputFolder.Count(f=>rx.IsMatch(f))})
        .OrderByDescending(m=>m.Count).FirstOrDefault()?.ind;

Your question and code seem very convoluted. I am guessing that you have a list of file names and another list of file masks (regular expressions), and you want to find the file mask that matches the most file names. Here is a way to do that:

var fileNames = new[] { "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt" };
var fileMasks = new[] { @"\.txt$", @"\.xml$", "valid" };
var fileMaskWithMostMatches = fileMasks
  .Select(
    fileMask => new {
      FileMask = fileMask,
      FileNamesMatched = fileNames.Count(
        fileName => Regex.Match(
            fileName,
            fileMask,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
          )
          .Success
      )
    }
  )
  .OrderByDescending(x => x.FileNamesMatched)
  .First()
  .FileMask;

With the sample data, the value of fileMaskWithMostMatches is "valid" (that mask matches four file names).

Note that the Regex class caches some regular expressions, but if you have many of them it is more efficient to create the Regex objects outside the loop implied by fileNames.Count, so that the same regular expression is not re-created again and again (creating a regular expression may take a non-trivial amount of time, depending on its complexity).
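
A minimal sketch of that idea: each Regex is constructed once per mask, before any counting happens. The class name FileMaskDemo, the method name MaskWithMostMatches and the property name Compiled are hypothetical and only serve the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileMaskDemo
{
    // Returns the mask that matches the most file names,
    // or null when no masks are supplied.
    public static string MaskWithMostMatches(string[] fileNames, string[] fileMasks)
    {
        return fileMasks
            // Construct each Regex exactly once, outside the counting loop.
            .Select(mask => new
            {
                Mask = mask,
                Compiled = new Regex(mask, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)
            })
            // Reuse the constructed Regex for every file name.
            .Select(x => new
            {
                x.Mask,
                Count = fileNames.Count(fileName => x.Compiled.IsMatch(fileName))
            })
            .OrderByDescending(x => x.Count)
            .Select(x => x.Mask)
            .FirstOrDefault();
    }
}
```

With the file names and masks from this answer, the helper returns "valid", the same result as the query above.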


As an alternative to Martin's answer, here's a simpler version of your existing LINQ query that gives the desired result:

var LINQ = FileCheck[2]
              .ToLookup(x => x)                   // Makes a lookup table
              .OrderByDescending(x => x.Count())  // Sorts by count, descending
              .Select(x => x.Key)                 // Extract the key
              .FirstOrDefault(x => x != null);    // Return the first non null key
                                                  // or null if none found.


Isn't this much easier?

string result = FileCheck[2]
    .Where(x => x != null)
    .GroupBy(x => x)
    .OrderByDescending(x => x.Count())
    .FirstOrDefault()?.Key; // null (instead of a NullReferenceException) when nothing matched
