
Edit summary: refactoring (edited May 1, 2020 at 2:51)



I want to read file1 and, if the energy value of a row in file1 exists in file2, I should print the previous line.

awk '
  NR==FNR{ if (FNR>1) a[$3]; next } # file2: store each Energy value as a key of a (skip header)
  $3 in a { print prev }            # file1: if field 3 is a key of a, print the previous line
  { prev=$0 }                       # remember the current line
' file2 file1 > file3

Output (the 3rd and 5th lines differ from yours):

43 4.38665978376386365240533e-05 3.215e-02
48 4.77674337753321466689890e-05 1.750e-01
48 4.77674676992522297732519e-05 3.360e-01
203 2.17111189970080955017814e-04 1.685e-01
203 2.17111190317825032474949e-04 3.425e-01
245 2.54300542217183317417195e-04 0.000e+00
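A quick way to check what the first script does is to run it on a tiny pair of files. The file contents below are made up for illustration (the question's real file1/file2 are not reproduced here); only the layout, ID, value, Energy, with a header line in file2, is assumed from the script. The temporary directory is just scaffolding for the demo.

```shell
#!/bin/sh
# Hypothetical sample data (made up); assumed layout: ID value Energy.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/file2" <<'EOF'
ID Time Energy
1 0.1 5.0e-01
2 0.2 7.0e-01
EOF

cat > "$dir/file1" <<'EOF'
1 1.0e-05 1.0e-01
1 2.0e-05 5.0e-01
2 3.0e-05 2.0e-01
2 4.0e-05 3.0e-01
2 5.0e-05 7.0e-01
EOF

# The first script from the answer; only the comments differ.
awk '
  NR==FNR{ if (FNR>1) a[$3]; next }   # file2: store each Energy value as a key (skip header)
  $3 in a { print prev }              # file1: Energy also in file2 -> print the line before it
  { prev=$0 }                         # remember the current line
' "$dir/file2" "$dir/file1" > "$dir/file3"

cat "$dir/file3"
# Prints:
# 1 1.0e-05 1.0e-01
# 2 4.0e-05 3.0e-01
```

One edge case: `prev` starts out empty, so if the very first line of file1 matched, the script would print an empty line.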

Then using sort | uniq to remove the repeated lines with same id.

The same awk script piped to sort -unk1,1 to remove duplicate IDs:

awk '
  NR==FNR{ if (FNR>1) a[$3]; next }
  $3 in a { print prev }
  { prev=$0 }
' file2 file1 | sort -unk1,1 > file3 # unique numeric sort on first field

Output:

43 4.38665978376386365240533e-05 3.215e-02
48 4.77674337753321466689890e-05 1.750e-01
203 2.17111189970080955017814e-04 1.685e-01
245 2.54300542217183317417195e-04 0.000e+00
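The de-duplication step can be tried on its own by feeding it the six lines printed by the first script (copied from the output above). POSIX only says `-u` keeps one line from each set with equal keys, not which one; GNU sort documents keeping the first of each run, which matches the output shown. If keeping the first match per ID in the original order matters, the common awk idiom `!seen[$1]++` (an alternative, not what the answer uses) does that without sorting.

```shell
#!/bin/sh
# The six lines printed by the first script, copied from the output above.
matches='43 4.38665978376386365240533e-05 3.215e-02
48 4.77674337753321466689890e-05 1.750e-01
48 4.77674676992522297732519e-05 3.360e-01
203 2.17111189970080955017814e-04 1.685e-01
203 2.17111190317825032474949e-04 3.425e-01
245 2.54300542217183317417195e-04 0.000e+00'

# As in the answer: numeric sort on field 1, one line kept per ID.
by_sort=$(printf '%s\n' "$matches" | sort -unk1,1)

# Alternative idiom (not from the answer): keep the first line seen for each ID,
# in input order. seen[$1]++ evaluates to 0 (false) the first time an ID appears.
by_awk=$(printf '%s\n' "$matches" | awk '!seen[$1]++')

printf '%s\n' "$by_sort"
printf '%s\n' "$by_awk"
```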

I didn't understand your last sentence and ignored it. Let me know if that's not the expected output.

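Refactored version, which compares each line's Energy value with file2's value for the same ID instead of looking it up among all Energy values; its output also includes IDs 52 and 206:

awk '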
  NR==FNR{ if (FNR>1) a[$1]=$3; next } # file2 (header skipped): map ID -> Energy value in array `a`
  $1 in a{                            # file1: only lines whose ID is present in the array
     if (a[$1] != $3){                # Energy differs from file2's value for this ID...
       prev=$0                        # ...so remember this line
     }
     else {                           # Energy is the same...
       print (prev == "" ? $0 : prev) # ...print the saved line, or the current line if none was saved
       prev=""                        # reset the saved line
     }
  }
' file2 file1 > file3

Output:

43 4.38665978376386365240533e-05 3.215e-02
48 4.77674337753321466689890e-05 1.750e-01
52 4.99184267458611271553633e-05 3.110e-01
203 2.17111189970080955017814e-04 1.685e-01
206 2.17705422992319207490738e-04 3.197e-01
245 2.54300542217183317417195e-04 0.000e+00
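The refactored script can be traced the same way on made-up data (again hypothetical, with the same assumed layout). The sample hits its three cases: a differing Energy line followed by the matching one (the saved line is printed), a match with nothing saved (the matching line itself is printed), and an ID absent from file2 (skipped).

```shell
#!/bin/sh
# Hypothetical sample data (made up); assumed layout: ID value Energy.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/file2" <<'EOF'
ID Time Energy
1 0.1 5.0e-01
2 0.2 7.0e-01
3 0.3 9.0e-01
EOF

cat > "$dir/file1" <<'EOF'
1 1.0e-05 1.0e-01
1 2.0e-05 5.0e-01
2 3.0e-05 7.0e-01
3 4.0e-05 2.0e-01
3 5.0e-05 9.0e-01
4 6.0e-05 9.0e-01
EOF

awk '
  NR==FNR{ if (FNR>1) a[$1]=$3; next }  # file2: ID -> Energy (skip header)
  $1 in a {                             # file1: ID also present in file2
     if (a[$1] != $3) {                 # different Energy: remember this line
       prev=$0
     } else {                           # same Energy: print saved line, or this one
       print (prev == "" ? $0 : prev)   # whole conditional parenthesized for portability
       prev=""
     }
  }
' "$dir/file2" "$dir/file1" > "$dir/file3"

cat "$dir/file3"
# Prints:
# 1 1.0e-05 1.0e-01
# 2 3.0e-05 7.0e-01
# 3 4.0e-05 2.0e-01
```

Note that `prev` is not cleared when the ID changes, so a line saved for one ID can be printed at the next ID's match; whether that matters depends on how the real file1 is ordered.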


answered Apr 30, 2020 at 15:36

Freddy
