count using awk commands

Question

I have fileA.txt and a few lines of it are shown below:

AA
BB
CC
DD  
EE

I also have fileB.txt, which looks like this:

Group  col2   col3    col4
1    pp    4567    AA,BC,AB
1    qp    3428    AA
2    pp    3892    AA
3    ee    28399   AA
4    dd    3829    BB,CC
1    dd    27819   BB
5    ak    29938   CC

For every line in fileA.txt, I want to count how many times it appears in column 4 of fileB.txt, counting by the group in column 1 of fileB.txt.

Sample output should look like:

AA    3
BB    2
CC    2

AA appears 4 times, but two of those are in group "1". If a value appears more than once in the same group (column 1), it should be counted only once, so the count for AA in the output above is 3.

Can this be done with awk or some other one-liner?

"Oneliners"? Is the goal terseness? Correctness? Performance?
– Charles Duffy, Commented Mar 3, 2014 at 20:43
...the short form is that many, if not most, of the times I see folks using one-liners, they're doing so for bad reasons. Better to write code that's easy to read, understand and modify, even if it's a bit longer.
– Charles Duffy, Commented Mar 3, 2014 at 20:52

jaypal singh · Accepted Answer · 2014-03-03 21:32:11Z

1

Here is an awk one-liner that should work:

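awk '
NR==FNR { if (!seen[$4,$1]++) count[$4]++; next }
($1 in count) { print $1, count[$1] }' fileB.txt fileA.txt

Explanation:

NR==FNR is true only while the first file (fileB.txt) is being read. There, !seen[$4,$1]++ is true the first time a given (column 4, column 1) pair appears, so count[$4] is incremented once per group and repeats of the same pair are ignored. The next keeps fileB.txt lines away from the second rule.
$1 in count runs for fileA.txt and checks whether its column 1 is a key of the count array. If it is, we print it along with its count.

Output (for fileB.txt as first posted, before column 4 was modified):

$ awk 'NR==FNR{if(!seen[$4,$1]++)count[$4]++;next}($1 in count){print $1,count[$1]}' fileB.txt fileA.txt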
AA 3
BB 2
CC 1
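
The !seen[key]++ test carries most of the logic here, so a stripped-down sketch may help. The three input rows and the result variable below are made up for illustration; the only point is that a repeated (name, group) pair is counted once.

```shell
#!/bin/sh
# Hypothetical input: column 1 is the group, column 2 is the name.
# "AA" shows up twice in group 1 and once in group 2.
result=$(printf '%s\n' '1 AA' '1 AA' '2 AA' |
  awk '
    # seen[$2,$1] is 0 (false) the first time a name/group pair appears,
    # so the negation is true exactly once per pair; ++ then marks it seen.
    !seen[$2,$1]++ { count[$2]++ }
    END { print "AA", count["AA"] }
  ')
printf '%s\n' "$result"
```

This should print AA 2: the second "1 AA" row is skipped because that pair was already recorded.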

Update based on the modified question:

awk '
NR==FNR {
  n = split($4, tmp, /,/)
  for (x = 1; x <= n; x++) {
    if (!seen[$1,tmp[x]]++) {
      count[tmp[x]]++
    }
  }
  next
}
($1 in count) {
  print $1, count[$1]
}' fileB.txt fileA.txt

Outputs:

AA 3
BB 2
CC 2
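
To try the updated script end to end, here is a sketch that recreates the question's sample files in a scratch directory. Two things are additions for illustration and not part of the answer: the header row of fileB.txt is skipped explicitly (the answer simply lets it be counted under the harmless key "col4"), and OFS is set to a tab to resemble the layout of the sample output in the question. The mktemp directory and the result variable are scaffolding only.

```shell
#!/bin/sh
# Scratch directory for the sample files (scaffolding for this sketch only).
dir=$(mktemp -d)

printf '%s\n' AA BB CC DD EE > "$dir/fileA.txt"

cat > "$dir/fileB.txt" <<'EOF'
Group  col2   col3    col4
1    pp    4567    AA,BC,AB
1    qp    3428    AA
2    pp    3892    AA
3    ee    28399   AA
4    dd    3829    BB,CC
1    dd    27819   BB
5    ak    29938   CC
EOF

result=$(awk -v OFS='\t' '
NR == FNR {                       # first file: fileB.txt
  if (FNR == 1) next              # assumption: line 1 is a header row
  n = split($4, tmp, /,/)         # break column 4 into individual names
  for (x = 1; x <= n; x++)
    if (!seen[$1, tmp[x]]++)      # first time this group/name pair appears
      count[tmp[x]]++
  next
}
($1 in count) { print $1, count[$1] }   # second file: fileA.txt
' "$dir/fileB.txt" "$dir/fileA.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

With the sample data this prints AA 3, BB 2 and CC 2, one per line and tab-separated; DD and EE never occur in fileB.txt, so they are not printed.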

edited Mar 3, 2014 at 21:32

answered Mar 3, 2014 at 21:00

jaypal singh


3 Comments

chas Over a year ago

Could you look at the question again? I have modified column 4 in fileB.txt.

chas Over a year ago

Could you tell me which file each $1 refers to? I need to adapt this to the column numbers in my real files. If I'm right, the $1 inside the if refers to fileB and the $1 outside the loop refers to fileA?

jaypal singh Over a year ago

@user1779730 Yes, that's correct. In the NR==FNR section, $1 refers to fileB; in the ($1 in count) section, it refers to fileA.
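
Since each rule's $1 belongs to a different file, one way to adapt the script to other layouts is to pass the column numbers in as awk variables. This is only a sketch: the variable names grp, names and key are invented here, and the sample rows are a subset of the question's data.

```shell
#!/bin/sh
dir=$(mktemp -d)   # scratch directory, scaffolding for this sketch only

printf '%s\n' AA BB CC > "$dir/fileA.txt"

cat > "$dir/fileB.txt" <<'EOF'
1    pp    4567    AA,BC,AB
1    qp    3428    AA
2    pp    3892    AA
4    dd    3829    BB,CC
1    dd    27819   BB
EOF

# grp   = group column in fileB.txt
# names = column in fileB.txt holding the comma-separated names
# key   = column in fileA.txt holding the name to look up
result=$(awk -v grp=1 -v names=4 -v key=1 '
NR == FNR {                          # fileB.txt
  n = split($names, tmp, ",")
  for (x = 1; x <= n; x++)
    if (!seen[$grp, tmp[x]]++)       # count each name once per group
      count[tmp[x]]++
  next
}
($key in count) { print $key, count[$key] }   # fileA.txt
' "$dir/fileB.txt" "$dir/fileA.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

For this reduced input the result is AA 2, BB 2 and CC 1. Changing the -v values is then enough to point the same program at different columns.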

Charles Duffy · Accepted Answer · 2014-03-03 20:45:56Z

0

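Pure bash (4.0 or newer). This counts every row whose fourth column matches exactly; it does not collapse repeats within a group or split comma-separated values: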

#!/bin/bash

declare -A items=()

# read in the list of items to track
while read -r; do items[$REPLY]=0; done <fileA.txt

# read fourth column from fileB and increment for each match
while read -r _ _ _ item _; do
  [[ ${items[$item]} ]] || continue    # skip unrecognized values
  items[$item]=$(( items[$item] + 1 )) # otherwise, increment
done <fileB.txt

# print output
for key in "${!items[@]}"; do          # iterate over keys
  value="${items[$key]}"               # look up values
  printf '%s\t%s\n' "$key" "$value"    # print them together
done

answered Mar 3, 2014 at 20:45

Charles Duffy


Kevin · Accepted Answer · 2014-03-03 21:02:15Z

0

A simple awk one-liner.

awk 'NR>FNR{if($0 in a)print$0,a[$0];next}!a[$4,$1]++{a[$4]++}' fileB.txt fileA.txt

Note the order of the files: fileB.txt comes first so that the counts exist before fileA.txt is read.

answered Mar 3, 2014 at 21:02

Kevin


2 Comments

chas Over a year ago

If I understand correctly, $0 refers to the line in fileA.txt, and $4,$1 refer to the columns in fileB.txt. Am I right?

chas Over a year ago

Could you look at the question again? I have modified column 4 in fileB.txt.
