I have tried several variations based on other Stack Overflow posts without success. Below are a sample of my input, the output I want, and the code I have cobbled together so far, in the hope of some direction from the community:

C:\Scripts\contacts.csv:

id,first_name,last_name,email
1,john,smith,[email protected]
1,jane,smith,[email protected]
2,jane,smith,[email protected]
2,john,smith,[email protected]
3,sam,jones,[email protected]
3,sandy,jones,[email protected]

I need to turn this into a file where the "email" column is unique within each "id". In other words, duplicate addresses are allowed, but only when they belong to different ids.

Desired output, C:\Scripts\contacts-trimmed.csv:

id,first_name,last_name,email
1,john,smith,[email protected]
2,john,smith,[email protected]
3,sam,jones,[email protected]
3,sandy,jones,[email protected]

I have tried this with a few different variations:

Import-Csv C:\Scripts\contacts.csv | sort first_name | Sort-Object -Property id,email -Unique | Export-Csv C:\Scripts\contacts-trim.csv -NoTypeInformation

Any help or direction would be most appreciated.

  • What are the rules for discarding duplicates? I.e., why isn't the 2nd row of the desired output 2,jane,smith,[email protected]? Commented Feb 8, 2021 at 20:35
  • The email address is the same even though the name is different. Basically, there can be multiple ids and multiple emails, but no duplicate emails within an id. So each combination of id and email must be unique. Commented Feb 8, 2021 at 21:43
  • When going through the records one by one, I understand that you keep first record and discard second, because same ID and same email. Taking 3rd record, there is a new ID, so shouldn't 3rd record be kept and 4th one discarded? Commented Feb 8, 2021 at 21:49
  • Without getting too in depth: there can only be one user per email address, and the id is a student id. Many of our parents have multiple students, and we can only have one parent per email, but in many cases both parents use the same email. We have to eliminate one or the other and can't keep both, so I sort by first name so that, when duplicates are eliminated, one parent is removed completely and the other is kept for every student they are assigned to. I hope this makes sense. Commented Feb 9, 2021 at 13:05

1 Answer

You'll want to use the Group-Object cmdlet to, well, group together records that share the same values in the columns you care about:

$records = @'
id,first_name,last_name,email
1,john,smith,[email protected]
1,jane,smith,[email protected]
2,jane,smith,[email protected]
2,john,smith,[email protected]
3,sam,jones,[email protected]
3,sandy,jones,[email protected]
'@ |ConvertFrom-Csv

# group records based on id and email column
$records |Group-Object id,email |ForEach-Object {
  # grab only the first record from each group
  $_.Group |Select-Object -First 1
} |Export-Csv .\no_duplicates.csv -NoTypeInformation
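
If it matters which record survives in each group (the question's desired output keeps john rather than jane), the groups can be sorted before the first record is taken. This is only a minimal sketch under a few assumptions: the example.com addresses are placeholders, because the real addresses are redacted in the post, and sorting by first_name in descending order is just one possible rule that happens to match the desired output.

```powershell
# Sample data. The addresses are hypothetical placeholders (assumption):
# both Smith parents share one address, the Jones parents have separate ones.
$records = @'
id,first_name,last_name,email
1,john,smith,smith@example.com
1,jane,smith,smith@example.com
2,jane,smith,smith@example.com
2,john,smith,smith@example.com
3,sam,jones,sam@example.com
3,sandy,jones,sandy@example.com
'@ | ConvertFrom-Csv

# Group on id + email, then pick the survivor of each group deterministically
$trimmed = $records | Group-Object id, email | ForEach-Object {
    # Descending sort on first_name puts john ahead of jane (assumed rule)
    $_.Group | Sort-Object first_name -Descending | Select-Object -First 1
}

# Print as CSV; pipe to Export-Csv instead to write a file
$trimmed | ConvertTo-Csv -NoTypeInformation
```

With this sample, john is kept for ids 1 and 2, and both sam and sandy are kept for id 3 because their addresses differ. Because the same sort rule is applied to every group, the same parent is kept wherever that pair shares an address.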

7 Comments

Btw, this produces the same output as $records | Sort-Object id, email -Unique. It doesn't match OP's "desired" output, though...
Unless I am missing something, the above answer seems to work. I am checking through the output now.
It produces 2,jane... as the 2nd line, while in OP's "desired" output it is 2,john.... I think your output is correct, though, and OP's "desired" output isn't (unless I'm missing something ;-)).
@Andy if you need to explicitly sort the individual groups, you can always change the inner pipeline to $_.Group |Sort-Object first_name -Descending |Select-Object -First 1, for example
@MathiasR.Jessen thank you for your help. Your code pointed me in the right direction. I'm new to PowerShell and was stumped :)