Using Raku (formerly known as Perl_6)
~$ raku -ne 'BEGIN my %seen;
if .chars && /\.pdf/ { $_.subst(/ <?after \.pdf> \: \d+ $ /).IO.basename andthen %seen{$_}++ };
END .say for %seen.sort: +*.key.match(/^ \d+ <?before \.pdf>/);' file
OR (more simply):
~$ raku -ne 'BEGIN my %seen;
if .chars && s/ <?after \.pdf> \: \d+ $ // { %seen{$_.IO.basename}++ };
END .say for %seen.sort: +*.key.match(/^ \d+ <?before \.pdf>/);' file
OR (even more simply):
~$ raku -ne 'BEGIN my %seen;
if .chars && s/ <?after \.pdf> \: \d+ $ // { %seen{$_.IO.basename}++ };
END .say for %seen.sort: +*.key.IO.extension: "";' file
Raku is a programming language in the Perl family. Using Raku's awk-like -ne (non-autoprinting) command-line flags, you can build a hash of key/value pairs in which each key is a PDF file name and each value is the number of times that file name was seen. Output is sorted numerically by file name (treated as a number). Using .say in the END block gives paired "key => value" output:
Sample Input:
Category1:
./Folder1/Folder2/1.pdf:18
./Folder3/2.pdf:18
./Folder5/4.pdf:10
Category2:
./Folder3/2.pdf:18
./Folder5/4.pdf:10
Category3:
./Folder1/Folder2/1.pdf:18
./Folder5/4.pdf:10
Category4:
./Folder6/7.pdf:10
./Folder5/4.pdf:10
./Folder3/2.pdf:18
Sample Output:
1.pdf => 2
2.pdf => 3
4.pdf => 4
7.pdf => 1
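For readers who find the one-liners dense, the same counting logic can be sketched as a short standalone script. This is only an illustrative expansion, not a replacement for the code above: the sub name `count-pdfs` is made up for this example, and the sort assumes (as the sample input does) that every file name is a number followed by `.pdf`.

```raku
# Hypothetical helper: count PDF base names in lines shaped like "path/N.pdf:digits".
sub count-pdfs(@lines) {
    my %seen;
    for @lines -> $line {
        # Blank lines and "CategoryN:" headers contain no ".pdf", so skip them.
        next unless $line ~~ / \.pdf /;
        # Remove the trailing ":digits" suffix, then keep only the file name.
        my $name = $line.subst(/ ':' \d+ $ /, '').IO.basename;
        %seen{$name}++;
    }
    return %seen;
}

# Read the files named on the command line (or STDIN) and print "key => value"
# pairs, sorted numerically on the part of the name before ".pdf".
.say for count-pdfs(lines).sort({ +.key.subst(/ '.pdf' $ /, '') });
```

Saved under a name of your choosing (say, `count-pdfs.raku`), it would be run as `raku count-pdfs.raku file`.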
If you need paired "key : value" output, change the END block of your code to:
END put($_.key, " : ", $_.value) for %seen.sort: +*.key.IO.extension: "";'
If you need to keep only pairs with .value > 2 (eliminating every pair whose count is 2 or less), further change the END block of your code to:
END put($_.key, " : ", $_.value) if .value > 2 for %seen.sort: +*.key.IO.extension: "";'
Finally, if you prefer code written in a "chained" method/function call style, the code below gives the same (desired) output as the filtered version above:
~$ raku -e 'my %seen = lines.grep( *.chars > 0 && / \.pdf /) \
.map( *.subst(/ <?after \.pdf> \: \d+ $ / ).IO.basename).Bag; \
for %seen.sort( +*.key.IO.extension: "") {
put $_.key ~" : "~ $_.value if .value > 2 };' file
https://raku.org
1.pdf in Folder_2 might be a different file from file 1.pdf in Folder_3. Thx.
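If same-named files in different folders do need to be counted separately, one possible adjustment is to key the hash on the whole path instead of the base name. The sketch below is only an assumption about what such a variant could look like. The sub name `count-paths` is hypothetical, and the output is sorted as plain strings because full paths are no longer numeric.

```raku
# Hypothetical variant: key on the full path so that ./Folder2/1.pdf and
# ./Folder3/1.pdf are counted as different files.
sub count-paths(@lines) {
    my %seen;
    for @lines -> $line {
        # Skip blank lines and category headers, which contain no ".pdf".
        next unless $line ~~ / \.pdf /;
        # Remove only the trailing ":digits" suffix; keep the directory part.
        %seen{ $line.subst(/ ':' \d+ $ /, '') }++;
    }
    return %seen;
}

# Print "path => count" pairs in string order of the path.
.say for count-paths(lines).sort(*.key);
```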