How to remove duplicate rows using a Perl script [duplicate]

Question

I want to know how to remove duplicate lines from a file. Here is my script:

use strict;
use warnings;

my $input = input.txt;
my $output = output.txt;
my %seen;

open("OP",">$output") or die;
open("IP","<$input") or die;

while(my $DATA = <IP>)
{
  $DATA =~ tr/|/-/;
  my @lines = split("-",$DATA);
  chomp @lines;
  my @contries = grep { !$seen{$_}++ } @lines;
  my $original = join("|",@contries);
  print "$original\n";
}

close("IP");
close("OP");

Input:

india|india|india|group|group|status
india|india|india|group|group|status
australia|australia|australia|group|group|status
america|america|america|group|group|status
singapore|singapore|singapore|group|group|status
india|india|india|group|group|status
america|america|america|group|group|status

Expected Output:

india|india|india|group|group|status
australia|australia|australia|group|group|status
america|america|america|group|group|status
singapore|singapore|singapore|group|group|status

When I run the above code, I get output like this:

   india|group|status
    
   australia
   america
   singapore

   status

I don't know why I'm getting an empty line in the second row.

Your script doesn't give the result you mentioned in the question.
– vkk05, Commented Jun 23, 2021 at 7:32
Edit your question to show the result you're currently getting.
– vkk05, Commented Jun 23, 2021 at 7:41
@Noor Even with your last edit, your code doesn't run, and doesn't produce the output you say it does. Please run your code before posting it. For instance, you are missing quotes around input.txt and output.txt. And even once the quotes are added, the output of the script is not what you say it is. Fix your code, run it locally to make sure it behaves as you say it does, and then edit your question accordingly.
– Dada, Commented Jun 23, 2021 at 7:46
@Noor Nope, your script is still invalid, and even when fixed, the output isn't what you say it is.
– Dada, Commented Jun 23, 2021 at 7:52

Dave Cross · Accepted Answer · 2021-06-23 07:53:25Z

Running your code gives the following output:

Bareword "input" not allowed while "strict subs" in use at dup line 4.
Bareword "txt" not allowed while "strict subs" in use at dup line 4.
Bareword "txt" not allowed while "strict subs" in use at dup line 5.
Bareword "output" not allowed while "strict subs" in use at dup line 5.
Execution of dup aborted due to compilation errors.

That's because of these two lines:

my $input = input.txt;
my $output = output.txt;

They should be:

my $input = 'input.txt';
my $output = 'output.txt';

This makes me think that this isn't the code that you're actually running. You have retyped it into the question input box here and have made mistakes when retyping the code. This makes it hard to help you as we can't be sure whether any errors we correct are down to the real problems you want to solve or just typos in your code.

Please cut and paste code into questions - so you know it is an accurate representation of what you're doing. And please test the code before posting so you know it actually behaves how you say it does.

Having corrected the missing quotation marks and re-run the program, I now get this output:

india|group|status

australia
america
singapore

And (at least when I started typing this answer) that's nothing like the output you claim you are getting.

Please be more careful when asking for help here. The people here are very happy to help you, but you need to be precise in what you are asking them. A lot of people will read questions that you post here and if your questions aren't asked accurately, you are wasting a lot of people's time.

The problem here seems to be that you want to get a list of unique lines, but you're overcomplicating matters by splitting the lines into separate fields. That's completely unnecessary.
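
To see where the blank lines come from, here is a minimal sketch. The helper name dedupe_fields and the in-memory sample lines are assumptions made for illustration, not part of your script. Because %seen is shared across every line, a field such as india or group already counts as seen on later lines, so grep returns an empty list and join produces an empty string:

```perl
use strict;
use warnings;

my %seen;    # shared across ALL lines, as in the question's script

# Hypothetical helper: mimics the split/grep/join step from the question.
sub dedupe_fields {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\|/, $line;
    # Drops any field that appeared on this line OR on any earlier line.
    my @unique = grep { !$seen{$_}++ } @fields;
    return join '|', @unique;
}

# Sample lines (assumed here; the real script reads them from input.txt).
my @sample = (
    "india|india|india|group|group|status\n",
    "india|india|india|group|group|status\n",
    "australia|australia|australia|group|group|status\n",
);

my @results;
push @results, dedupe_fields($_) for @sample;

# Brackets make the empty second result visible.
print "[$_]\n" for @results;
```

The second line contributes nothing new, so it prints as an empty pair of brackets, and the third keeps only australia. That matches the shape of the output above.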

I think that the important part of your code can be simplified to this:

while (<IP>) {
  print unless $seen{$_}++;
}

Or even this:

print grep { ! $seen{$_}++ } <IP>;

Both of those options produce this output:

india|india|india|group|group|status
australia|australia|australia|group|group|status
america|america|america|group|group|status
singapore|singapore|singapore|group|group|status
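
For completeness, here is one way the whole script might look. This is a sketch under a few assumptions: dedupe_file is a hypothetical helper name, and I have swapped in lexical filehandles and three-argument open in place of the bareword IP and OP handles. Note that the script in the question opens output.txt but prints to standard output, so nothing is ever written to the file; this version prints to the output handle.

```perl
use strict;
use warnings;

# Copy $in_path to $out_path, keeping only the first occurrence of each line.
# (dedupe_file is a hypothetical helper name, not from the question.)
sub dedupe_file {
    my ($in_path, $out_path) = @_;
    my %seen;

    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";

    while (my $line = <$in>) {
        # Whole lines are the hash keys, so no split/join is needed.
        print {$out} $line unless $seen{$line}++;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# File names taken from the question; the call is skipped if input.txt is absent.
dedupe_file('input.txt', 'output.txt') if -e 'input.txt';
```

Keeping %seen inside the sub means each call starts with a fresh hash, which avoids accidentally carrying state from one file to the next.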

vkk05 · Accepted Answer · 2021-06-23 08:11:31Z

You don't have to split and join the content in each line. Let's make it simple.

Here is the updated script.

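use strict;
use warnings;

my %seen;

# Read every line from the DATA section and strip the trailing newlines.
my @lines = <DATA>;
chomp @lines;

# Keep only the first occurrence of each line.
my @unique = grep { !$seen{$_}++ } @lines;

foreach (@unique) {
    print "$_\n";
}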

__DATA__
india|india|india|group|group|status
india|india|india|group|group|status
australia|australia|australia|group|group|status
america|america|america|group|group|status
singapore|singapore|singapore|group|group|status
india|india|india|group|group|status
america|america|america|group|group|status

Result:

india|india|india|group|group|status
australia|australia|australia|group|group|status
america|america|america|group|group|status
singapore|singapore|singapore|group|group|status

Chomping the lines (chomp @lines;) is optional if you print them without appending \n.
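
One case where chomp does change the result: if the last line of the input has no trailing newline, the unchomped string differs from an earlier, otherwise identical line, so it is not treated as a duplicate. A minimal sketch, using a hypothetical helper unique_lines and made-up sample data:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first occurrence of each string, in order.
sub unique_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# The last "line" has no trailing newline, as can happen at end of file.
my @raw = ("india|group\n", "america|group\n", "india|group");

# Without chomp, "india|group\n" and "india|group" are different hash keys.
my @without_chomp = unique_lines(@raw);

# With chomp, both collapse to the same key and the duplicate is dropped.
chomp(my @chomped = @raw);
my @with_chomp = unique_lines(@chomped);

printf "without chomp: %d, with chomp: %d\n",
    scalar @without_chomp, scalar @with_chomp;
```

If the input is guaranteed to end every line with a newline, both approaches give the same set of lines.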

edited Jun 23, 2021 at 8:11

answered Jun 23, 2021 at 7:57

4 Comments

Dave Cross Over a year ago

Why go to the effort of chomping the input, only to add the newline back when printing the results?

vkk05 Over a year ago

@DaveCross: Right. When printing, I have a habit of adding a newline. That's the reason I added chomp.

Polar Bear Over a year ago

You could use print for grep{ !$seen{$_}++ } <DATA>; instead.

Dave Cross Over a year ago

@PolarBear: Or even omit the for.
