I want to find all elements of array $a1 that are not part of either array $a2 or array $a3.

For example:

$a1 = @(1,2,3,4,5,6,7,8)
$a2 = @(1,2,3)
$a3 = @(4,5,6,7)

Expected result:

8

2 Answers

Try this:

 $a2AndA3 = $a2 + $a3
 $notInA2AndA3 = $a1 | Where-Object {!$a2AndA3.contains($_)}

As a one-liner:

$notInA2AndA3 = $a1 | Where {!($a2 + $a3).contains($_)}
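
A minimal sketch, assuming PowerShell v3 or later, that runs the two-pass variant against the question's sample data so the result can be checked. The @() wrapper is an addition that guarantees an array even when only one element survives the filter:

```powershell
# Sample data from the question.
$a1 = @(1,2,3,4,5,6,7,8)
$a2 = @(1,2,3)
$a3 = @(4,5,6,7)

# Two-pass variant: join $a2 and $a3 once, then keep only the
# $a1 elements that the joined array does not contain.
$a2AndA3 = $a2 + $a3
$notInA2AndA3 = @($a1 | Where-Object { -not $a2AndA3.Contains($_) })

$notInA2AndA3   # outputs: 8
```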

k7s5a's helpful answer is conceptually elegant and convenient, but there's a caveat:

It doesn't scale well, because a linear search of the lookup array must be performed for each $a1 element, so the total work grows with the product of the array sizes.

At least for larger arrays, PowerShell's Compare-Object cmdlet is the better choice:

If the input arrays are ALREADY SORTED:

(Compare-Object $a1 ($a2 + $a3) | Where-Object SideIndicator -eq '<=').InputObject
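
To show what the filter operates on, here is a sketch that uses the question's sample data (PowerShell v3+ is assumed for the simplified Where-Object syntax and for member enumeration via .InputObject). Compare-Object emits one object per difference, and a SideIndicator of '<=' marks values that appear only in the reference collection ($a1):

```powershell
# Sample data from the question.
$a1 = @(1,2,3,4,5,6,7,8)
$a2 = @(1,2,3)
$a3 = @(4,5,6,7)

# Each difference object carries the value (InputObject) and the side it came from.
$diff = Compare-Object $a1 ($a2 + $a3)
$diff | Format-Table InputObject, SideIndicator

# Keep only values present in $a1 but absent from $a2 + $a3.
$onlyInA1 = @(($diff | Where-Object SideIndicator -eq '<=').InputObject)
$onlyInA1   # outputs: 8
```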

Note:
* Compare-Object doesn't require sorted input, but sorted input can greatly enhance its performance; see below.
* As Esperento57 points out, (Compare-Object $a1 ($a2 + $a3)).InputObject is sufficient in the specific case at hand, but only because $a2 and $a3 happen not to contain elements that aren't also in $a1.
  Therefore, the more general solution is to use the filter Where-Object SideIndicator -eq '<=', because it limits the results to objects present only in the LHS ($a1), i.e., those missing from the RHS ($a2 + $a3), and not also vice versa.
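
The following sketch illustrates why the filter matters. It uses a hypothetical variation of the sample data in which $a3 also contains 9, a value that is not in $a1. Without the filter, 9 is reported as well, with SideIndicator '=>':

```powershell
$a1 = @(1,2,3,4,5,6,7,8)
$a2 = @(1,2,3)
$a3 = @(4,5,6,7,9)   # hypothetical: 9 is NOT in $a1

# Unfiltered: differences in both directions (8 only in $a1, 9 only in $a2 + $a3).
$unfiltered = @((Compare-Object $a1 ($a2 + $a3)).InputObject)

# Filtered: only values present in $a1 but missing from $a2 + $a3.
$filtered = @((Compare-Object $a1 ($a2 + $a3) |
  Where-Object SideIndicator -eq '<=').InputObject)

"Unfiltered: $unfiltered"   # contains both 8 and 9
"Filtered:   $filtered"     # contains only 8
```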

If the input arrays are NOT SORTED:

Explicitly sorting the input arrays before comparing them greatly enhances performance:

(Compare-Object ($a1 | Sort-Object) ($a2 + $a3 | Sort-Object) | 
   Where-Object SideIndicator -eq '<=').InputObject
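
Sorting first changes only the speed, not the result. The sketch below shuffles the question's sample data (using the same Get-Random -Count ([int]::MaxValue) idiom as the benchmark further down) and confirms that both forms return the same value:

```powershell
# Shuffled copies of the question's sample data.
$a1 = @(1,2,3,4,5,6,7,8) | Get-Random -Count ([int]::MaxValue)
$a2 = @(1,2,3) | Get-Random -Count ([int]::MaxValue)
$a3 = @(4,5,6,7) | Get-Random -Count ([int]::MaxValue)

# Sorted first: faster on large inputs.
$sortedFirst = @((Compare-Object ($a1 | Sort-Object) ($a2 + $a3 | Sort-Object) |
  Where-Object SideIndicator -eq '<=').InputObject)

# As-is: same result, but slower on large unsorted inputs.
$asIs = @((Compare-Object $a1 ($a2 + $a3) |
  Where-Object SideIndicator -eq '<=').InputObject)

"$sortedFirst / $asIs"   # outputs: 8 / 8
```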

The following example, which uses a 10,000-element array, illustrates the difference in performance:

$count = 10000                     # Adjust this number to test scaling.
$a1 = 0..$($count-1)               # With 10,000: 0..9999
$a2 = 0..$($count/2)               # With 10,000: 0..5000
$a3 = $($count/2+1)..($count-3)    # With 10,000: 5001..9997

$(foreach ($pass in 1..2) {

  if ($pass -eq 1 ) {
    $passDescr = "SORTED input"
  } else {
    $passDescr = "UNSORTED input"
    # Shuffle the arrays.
    $a1 = $a1 | Get-Random -Count ([int]::MaxValue)
    $a2 = $a2 | Get-Random -Count ([int]::MaxValue)
    $a3 = $a3 | Get-Random -Count ([int]::MaxValue)
  }

  [pscustomobject] @{
    TestCategory = $passDescr
    Test = "CompareObject, explicitly sorted first"
    Timing = (Measure-Command {
        (Compare-Object ($a1 | Sort-Object) ($a2 + $a3 | Sort-Object) | Where-Object SideIndicator -eq '<=').InputObject |
        Out-Host; '---' | Out-Host
    }).TotalSeconds
  },
  [pscustomobject] @{
    TestCategory = $passDescr
    Test = "CompareObject"
    Timing = (Measure-Command {
        (Compare-Object $a1 ($a2 + $a3) | Where-Object SideIndicator -eq '<=').InputObject |
        Out-Host; '---' | Out-Host
    }).TotalSeconds
  },
  [pscustomobject] @{
    TestCategory = $passDescr
    Test = "!.Contains(), two-pass"
    Timing = (Measure-Command {
        $a2AndA3 = $a2 + $a3
        $a1 | Where-Object { !$a2AndA3.Contains($_) } | 
        Out-Host; '---' | Out-Host
    }).TotalSeconds
  },
  [pscustomobject] @{
    TestCategory = $passDescr
    Test = "!.Contains(), two-pass, explicitly sorted first"
    Timing = (Measure-Command {
        $a2AndA3 = $a2 + $a3 | Sort-Object
        $a1 | Sort-Object | Where-Object { !$a2AndA3.Contains($_) } | 
        Out-Host; '---' | Out-Host
    }).TotalSeconds
  },
  [pscustomobject] @{
    TestCategory = $passDescr
    Test = "!.Contains(), single-pass"
    Timing = (Measure-Command {
        $a1 | Where-Object { !($a2 + $a3).Contains($_) } |
        Out-Host; '---' | Out-Host
    }).TotalSeconds
  },
  [pscustomobject] @{
    TestCategory = $passDescr
    Test = "-notcontains, two-pass"
    Timing = (Measure-Command {
        $a2AndA3 = $a2 + $a3
        $a1 | Where-Object { $a2AndA3 -notcontains $_ } |
        Out-Host; '---' | Out-Host    
    }).TotalSeconds
  },
  [pscustomobject] @{
    TestCategory = $passDescr
    Test = "-notcontains, two-pass, explicitly sorted first"
    Timing = (Measure-Command {
        $a2AndA3 = $a2 + $a3 | Sort-Object
        $a1 | Sort-Object | Where-Object { $a2AndA3 -notcontains $_ } |
        Out-Host; '---' | Out-Host    
    }).TotalSeconds
  },
  [pscustomobject] @{
    TestCategory = $passDescr
    Test = "-notcontains, single-pass"
    Timing = (Measure-Command {
        $a1 | Where-Object { ($a2 + $a3) -notcontains $_ } |
        Out-Host; '---' | Out-Host    
    }).TotalSeconds
  } 
}) |
  Group-Object TestCategory | ForEach-Object {
    "`n=========== $($_.Name)`n"
    $_.Group | Sort-Object Timing | Select-Object Test, @{ l='Timing'; e={ '{0:N3}' -f $_.Timing } }
  }

Sample output from my machine (output of missing array elements omitted):

=========== SORTED input


Test                                            Timing
----                                            ------
CompareObject                                   0.068
CompareObject, explicitly sorted first          0.187
!.Contains(), two-pass                          0.548
-notcontains, two-pass                          6.186
-notcontains, two-pass, explicitly sorted first 6.972
!.Contains(), two-pass, explicitly sorted first 12.137
!.Contains(), single-pass                       13.354
-notcontains, single-pass                       18.379

=========== UNSORTED input

CompareObject, explicitly sorted first          0.198
CompareObject                                   6.617
-notcontains, two-pass                          6.927
-notcontains, two-pass, explicitly sorted first 7.142
!.Contains(), two-pass                          12.263
!.Contains(), two-pass, explicitly sorted first 12.641
-notcontains, single-pass                       19.273
!.Contains(), single-pass                       25.174

  • Timings will vary with many factors, but the results show that Compare-Object scales much better if the input is either presorted or sorted on demand, and the performance gap widens as the element count increases.

  • When not using Compare-Object, performance can be improved somewhat, but the inability to take advantage of sorting is the fundamental limiting factor:

    • Neither -notcontains / -contains nor .Contains() can take full advantage of presorted input.

    • If the input is already sorted, using the .NET .Contains() method (from the IList interface) rather than the PowerShell -contains / -notcontains operators (which an earlier version of k7s5a's answer used) improves performance.

    • Joining arrays $a2 and $a3 once, up front, and then using the joined array in the pipeline improves performance (that way, the arrays don't have to be joined in every iteration).
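
A further variant of "join once, up front" that was not part of the benchmark above: build a .NET HashSet from $a2 + $a3, so that each lookup is a hash lookup rather than a linear scan of an array. This is a sketch under stated assumptions. It requires PowerShell v5+ for ::new(), and it assumes all elements are [int]; for other data, the type argument would need to change. No timings are claimed here, so run the benchmark above if you want to compare it:

```powershell
$a1 = @(1,2,3,4,5,6,7,8)
$a2 = @(1,2,3)
$a3 = @(4,5,6,7)

# Build the lookup set once. Assumption: all elements are [int].
# The [int[]] cast selects the HashSet(IEnumerable[int]) constructor.
$lookup = [System.Collections.Generic.HashSet[int]]::new([int[]]($a2 + $a3))

# HashSet.Contains() hashes the value instead of scanning an array.
$notInA2AndA3 = @($a1 | Where-Object { -not $lookup.Contains($_) })

$notInA2AndA3   # outputs: 8
```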
