To answer your immediate question, you're tripping over the default behavior of Perl's system operator. Usually, it's a great convenience for the shell to parse the command, but sometimes, as you've seen, having multiple levels of quoting is a pain, or even a security vulnerability.
You can bypass the shell's quoting entirely with the system LIST and exec LIST forms. In your case, change your code to
#! /usr/bin/env perl

use strict;
use warnings;

my @cmd = (
  "awk",
  "-F", "\t",
  '{ for ( i=1; i<=2; i++ ) {
       printf "%s\t", $i
     }
     printf "\n";
   }',
  "myfile", "file2",
);

system(@cmd) == 0 or warn "$0: awk exited " . ($? >> 8);
You don't have to use the temporary array, but I don't like the resulting code with a multi-line command and a check for success.
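To see what the list form buys you, and what the success check can report, here is a minimal sketch. The helper name run_list is made up for illustration, and $^X (the path of the running perl) is used only as a convenient child process. The status decoding follows the description of $? in perldoc -f system.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with no shell involved and
# describe how it ended.
sub run_list {
    my @cmd = @_;

    # The block form never falls back to the shell, even when @cmd
    # has only one element.
    my $rc = system { $cmd[0] } @cmd;

    return "failed to start: $!"            if $rc == -1;
    return "killed by signal " . ($? & 127) if $? & 127;
    return "exit " . ($? >> 8);
}

# The child receives this argument as one piece, metacharacters and
# all; it exits 0 only if it saw exactly one argument.
my $tricky = 'two  words; $HOME *';
print run_list($^X, "-e", 'exit(@ARGV == 1 ? 0 : 1)', "--", $tricky), "\n";
```

This should print "exit 0". Passed as a single string instead, the same text would be split and expanded by the shell before the child ever saw it.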
Given myfile containing
1 2 3 4
foo bar baz
oui oui monsieur
and file2 with
a b c
d e f g
(where the separators in both cases are TAB characters), the awk command prints
1 2
foo bar
oui oui
a b
d e
They're invisible, but each line of output above has a trailing TAB.
Doing the same in Perl is straightforward. For example,
sub print_first_two_columns {
  foreach my $path (@_) {
    open my $fh, "<", $path or die "$0: open $path: $!";

    while (<$fh>) {
      chomp;
      my(@cols) = (split /\t/)[0 .. 1];
      print join("\t", @cols), "\n";
    }

    close $fh;
  }
}
The part that may not be obvious is the slice of the values returned from split, but the concept is simple. A slice grabs data at multiple indices at once (0 and 1 in this case, i.e., the first and second columns). The range-operator expression 0 .. 1 evaluates to the list (0, 1). If you decide later that you want the first four columns, change it to 0 .. 3.
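As a small standalone illustration of the list slice, assuming a made-up sample line rather than anything read from your files:

```perl
use strict;
use warnings;

my $line = "1\t2\t3\t4";

# A list slice: index straight into the list that split returns.
my @first_two  = (split /\t/, $line)[0 .. 1];   # ("1", "2")
my @first_four = (split /\t/, $line)[0 .. 3];   # ("1", "2", "3", "4")

# A line with fewer columns than requested yields undef for the
# missing slots, which would trigger warnings when printed.
my @short   = (split /\t/, "lonely")[0 .. 1];   # ("lonely", undef)
my @present = grep { defined } @short;          # ("lonely")

print join("|", @first_two), "\n";   # prints 1|2
print join("|", @present), "\n";     # prints lonely
```

Whether to drop the undefined slots, as the grep does here, or to print them as empty columns depends on what you want the output to look like for short lines.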
Call print_first_two_columns as in
print_first_two_columns "myfile", "file2";
Note that the code isn't exactly equivalent: it doesn't preserve the trailing TAB characters.
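If the trailing TAB matters to you, a sketch along these lines reproduces awk's output format. The sub name awk_style_line is hypothetical, it expects a line that has already been chomped, and turning a missing column into an empty string is an assumption meant to mirror how awk prints a field that doesn't exist.

```perl
use strict;
use warnings;

# Hypothetical helper: format the first $n TAB-separated columns of
# $line the way the awk program does, with a TAB after every column.
sub awk_style_line {
    my ($line, $n) = @_;
    my @fields = split /\t/, $line;

    # A column past the end of a short line becomes an empty string.
    my @cols = map { defined $fields[$_] ? $fields[$_] : "" } 0 .. $n - 1;

    return join("", map { "$_\t" } @cols) . "\n";
}

print awk_style_line("1\t2\t3\t4", 2);   # prints 1, TAB, 2, TAB, newline
```

Inside the while loop of the sub shown earlier, you could print awk_style_line($_, 2) in place of the split-and-join lines.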
From the command line, it's even simpler:
$ perl -lane '$,="\t"; print @F[0,1]' myfile file2
1 2
foo bar
oui oui
a b
d e
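In case the switches look like magic, here is roughly what that one-liner expands to, based on the descriptions of -l, -a, and -n in perlrun. The sub name print_first_two_fields is made up for this sketch. Note that -a splits on any whitespace, which works here because no column contains a space; adding -F'\t' to the one-liner would split on TAB only.

```perl
use strict;
use warnings;

# Hypothetical sub spelling out what the -lane one-liner does.
sub print_first_two_fields {
    local @ARGV = @_;        # <> reads the files named in @ARGV
    local $\ = "\n";         # -l: print appends a newline ...
    local $, = "\t";         # the one-liner's own assignment to $,

    while (my $line = <>) {  # -n: implicit loop over the input lines
        chomp $line;         # -l: ... and each input line is chomped
        my @F = split ' ', $line;   # -a: split on whitespace into @F
        print @F[0, 1];      # the two fields, joined by $,
    }
}
```

Calling print_first_two_fields("myfile", "file2") should then print the same lines as the one-liner. Under warnings, a line with fewer than two fields produces an uninitialized-value warning here, which the one-liner, run without -w, does not.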