
I've looked everywhere and I'm out of luck.

I am trying to count the files in my current directory and all subdirectories, so that running the shell script count_files.sh produces output similar to:

2 sh
4 html
1 css
2 noexts

(EDIT: the above output should have each count and extension on its own line.)

where noexts are either files without any period in the name (ex: fileName) or files with a period but nothing after it (ex: fileName.).

This pipeline:

find * | awk -F . '{print $NF}'

gives me a comprehensive list of all the files, and I've figured out how to remove files without any period (ex: fileName ) using sed '/\//d'

MY ISSUE is that I cannot remove the files from the output of the above pipeline that are separated by a period but have NULL after the period (ex: fileName. ), as it is separated by the delimiter '.'

How can I use sed like above to remove a null character from a pipe input?

I understand this could be a quick fix, but I've been googling like a madman with no luck. Thanks in advance.

Chip

  • Quick fix! Thank you Cyrus Commented Jan 16, 2015 at 17:05

1 Answer

To filter out filenames that end with a dot, you can match on the end of the line, because each line of find's output is one complete filename. You could use

sed '/\.$/d'

Where \. matches a literal dot and $ matches the end of the line.
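As a quick check of that filter, here is a minimal sketch that feeds a few made-up file names through it. The names and the filter_trailing_dot wrapper are purely illustrative and not part of the question.

```shell
# filter_trailing_dot is a hypothetical wrapper around the sed command above.
filter_trailing_dot() {
  sed '/\.$/d'   # delete every line whose last character is a literal dot
}

# Made-up file names, one per line, standing in for find output.
printf '%s\n' 'index.html' 'fileName.' 'fileName' 'run.sh' | filter_trailing_dot
```

Only fileName. is dropped. fileName (no dot at all) passes through, so it still needs separate handling.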

However, I think I'd do the whole thing in awk. Since sorting does not appear to be necessary:

EDIT: Found a nicer way to do it with awk and find's -printf action.

find . -type f -printf '%f\n' | awk -F. '!/\./ || $NF == "" { ++count["noext"]; next } { ++count[$NF] } END { for(k in count) { print count[k] " " k } }'

Here we pass -printf '%f\n' to find to make it print only the file name without the preceding directory, which makes it much easier to work with for our purposes -- this way there's no need to worry about periods in directory names (such as /etc/somethingorother.d). With '.' as the field separator, the awk code is:

!/\./ || $NF == "" {        # if the line (the filename) does not contain
                            # a period or there's nothing after the last .
  ++count["noext"]          # increment the "noext" counter
                            # note that this will be collated with files that
                            # have ".noext" as filename extension. see below.
  next                      # go to the next line
}
{                           # in all other lines
  ++count[$NF]              # increment the counter for the file extension
}
END {                       # in the very end:
  for(k in count) {         # print the counters.
    print count[k] " " k
  }
}

Note that this way, if there is a file "foo.noext", it will be counted among the files without a filename extension. If this is a worry, use a special counter for files without an extension -- either apart from the array or with a key that cannot be a filename extension (such as one that includes a . or the empty string).
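One way to sketch the separate-counter idea is shown here, assuming bare file names arrive on standard input (for example from find's -printf '%f\n'). The count_ext helper, the noext variable, and the sample names are assumptions made for illustration. The "noexts" label is taken from the desired output in the question.

```shell
# count_ext is a hypothetical helper: it reads bare file names on stdin
# and prints "<count> <extension>" lines, in no particular order.
count_ext() {
  awk -F. '
    !/\./ || $NF == "" { ++noext; next }   # no dot, or nothing after the last dot
    { ++count[$NF] }                        # everything else: count by extension
    END {
      for (k in count) print count[k] " " k
      if (noext) print noext " noexts"      # kept outside the array on purpose
    }'
}

# Made-up names: b.noext has a real extension that happens to be "noext".
printf '%s\n' 'a.sh' 'b.noext' 'README' 'trailing.' | count_ext
```

Because the extensionless files are tallied in a plain variable rather than in the array, a file such as b.noext gets its own "noext" entry and no longer inflates the count of files without an extension.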
