How to loop through a directory recursively to delete files with certain extensions

Question

I need to loop through a directory recursively and remove all files with the extensions .pdf and .doc. I can loop through the directory recursively, but I can't work out how to filter the files by these extensions.

My code so far

#!/bin/sh

SEARCH_FOLDER="/tmp/*"

for f in $SEARCH_FOLDER
do
    if [ -d "$f" ]
    then
        for ff in "$f"/*
        do
            echo "Processing $ff"
        done
    else
        echo "Processing file $f"
    fi
done

I need help to complete the code, since I'm not getting anywhere.

I know it's bad form to execute code without understanding it, but a lot of people come to this site to learn bash scripting. I got here by googling "bash scripting files recursively", and almost ran one of these answers (just to test the recursion) without realizing it would delete files. I know rm is a part of OP's code, but it's not actually relevant to the question asked. I think it'd be safer if answers were phrased using a harmless command like echo.
– ki9, Commented Apr 5, 2016 at 3:26
Similar question here: stackoverflow.com/questions/41799938/…
– codeforester, Commented Jan 23, 2017 at 6:22
@Keith had similar experience, completely agree and changed the title
– 463035818_is_not_an_ai, Commented Jan 24, 2017 at 15:48
Warning for noobs like me, wasting hours: in most of the answers, you need to change "/tmp/" to the directory you want to operate on, for example "/home/my folder".
– Santropedro, Commented Sep 17, 2021 at 22:00

Gilles 'SO- stop being evil' · Accepted Answer · 2017-07-13 15:49:31Z

293

As a followup to mouviciel's answer, you could also do this as a for loop, instead of using xargs. I often find xargs cumbersome, especially if I need to do something more complicated in each iteration.

for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm $f; done

As a number of people have commented, this will fail if there are spaces in filenames. You can work around this by temporarily setting IFS (the internal field separator) to the newline character. This also fails if there are wildcard characters ([, ?, *) in the file names. You can work around that by temporarily disabling wildcard expansion (globbing).

IFS=$'\n'; set -f
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm "$f"; done
unset IFS; set +f

If you have newlines in your filenames, then that won't work either. You're better off with an xargs-based solution:

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -print0 | xargs -0 rm

(The escaped parentheses are required here so that -print0 applies to both -or clauses.)

GNU and *BSD find also have a -delete action, which would look like this:

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
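
If you still want a loop body (for anything more involved than rm), one common bash pattern is to read find's NUL-delimited output with read -d ''. The following is a minimal sketch rather than part of the original answer. It runs against a throwaway directory created with mktemp, and the names demo_dir and removed are made up for the illustration.

```shell
#!/usr/bin/env bash
# Throwaway sandbox so nothing real is touched (demo_dir is a made-up name).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub dir"
touch "$demo_dir/a b.pdf" "$demo_dir/sub dir/c*.doc" "$demo_dir/keep.txt"

removed=0
# -print0 ends each path with a NUL byte; read -d '' splits on NUL, so spaces,
# wildcard characters, and even newlines in names survive intact.
while IFS= read -r -d '' f; do
    echo "Removing: $f"
    rm -- "$f"
    removed=$((removed + 1))
done < <(find "$demo_dir" \( -name '*.pdf' -o -name '*.doc' \) -type f -print0)

echo "Removed $removed file(s)"
rm -rf "$demo_dir"   # clean up the sandbox
```

The process substitution (< <(...)) keeps the loop in the current shell, so the counter is still set after the loop ends. Both it and read -d are bash features, not POSIX sh.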

edited Jul 13, 2017 at 15:49

Gilles 'SO- stop being evil'

answered Mar 9, 2011 at 15:21

James Scriven


9 Comments

trev Over a year ago

This does not work as expected if there is a space in the file name (the for loop splits the results of find on whitespace).

Christian Over a year ago

How do you avoid splitting on whitespace? I'm trying a similar thing and I have a lot of directories with whitespaces that screw up this loop.

zenperttu Over a year ago

because it's a very helpful answer?

gniourf_gniourf Over a year ago

@Matthew your edit didn't fix anything at all: it actually made the command only work if there's a unique found file. At least this version works if there are no spaces, tabs, etc. in filenames. I rolled back to the old version. Nothing sensible can really fix a for f in $(find ...). Just don't use this method.

James Scriven Over a year ago

@DrewDormann my testing also shows that "$(find...)" makes things worse. I've undone your edit, along with making a long-overdue update of my own.

mouviciel · Accepted Answer · 2011-01-09 11:33:21Z

187

find is just made for that.

find /tmp -name '*.pdf' -or -name '*.doc' | xargs rm

answered Jan 9, 2011 at 11:33

mouviciel

7 Comments

Matthew Flaschen Over a year ago

Or find's -delete option.

Grumbel Over a year ago

One should always use find ... -print0 | xargs -0 ..., not raw find | xargs to avoid problems with filenames containing newlines.

Gilles 'SO- stop being evil' Over a year ago

Using xargs with no options is almost always bad advice and this is no exception. Use find … -exec instead.

Gilles 'SO- stop being evil' Over a year ago

@CarlWinbäck Because the syntax of the input to xargs is not the syntax that find (or any other common command) prints. xargs expects a particular kind of quote-delimited input.

Sinjai Over a year ago

Using -delete -print with find will cause it to delete the files and print filenames as it does so. That's the behavior I was looking for so figured I'd post.

Gilles 'SO- stop being evil' · Accepted Answer · 2017-07-13 15:51:49Z

113

Without find:

for f in /tmp/* /tmp/**/* ; do
  ...
done;

/tmp/* matches the files directly in the directory and /tmp/**/* matches the files in subfolders. You may have to enable the globstar option first (shopt -s globstar). For the question, the code should look like this:

shopt -s globstar
for f in /tmp/*.pdf /tmp/*.doc /tmp/**/*.pdf /tmp/**/*.doc ; do
  rm "$f"
done

Note that this requires bash ≥4.0 (or zsh without shopt -s globstar, or ksh with set -o globstar instead of shopt -s globstar). Furthermore, in bash <4.3, this traverses symbolic links to directories as well as directories, which is usually not desirable.
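
A guarded variant of the globstar loop might look like the sketch below. It is an illustration under assumptions, not the answer's own code: it uses a sandbox from mktemp, echoes instead of deleting, and the names demo_dir and matches are hypothetical. With globstar enabled, ** also matches zero directories, so a single pattern per extension covers the top level too (as one of the comments points out).

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob   # bash >= 4.0; nullglob drops patterns that match nothing

demo_dir=$(mktemp -d)        # throwaway sandbox; demo_dir is a made-up name
mkdir -p "$demo_dir/one/two"
touch "$demo_dir/top.pdf" "$demo_dir/one/two/deep file.doc" "$demo_dir/one/notes.txt"

matches=0
# One pattern per extension is enough: **/ matches zero or more directories.
for f in "$demo_dir"/**/*.pdf "$demo_dir"/**/*.doc; do
    [ -f "$f" ] && [ ! -L "$f" ] || continue   # regular files only, skip symlinks
    echo "Would remove: $f"                    # swap echo for rm -- once the list looks right
    matches=$((matches + 1))
done

echo "Matched $matches file(s)"
rm -rf "$demo_dir"   # clean up the sandbox
```

Without nullglob, a pattern that matches nothing is passed to the loop as a literal string, which is why the original loop can make rm complain about a file named '*.pdf'.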

edited Jul 13, 2017 at 15:51

Gilles 'SO- stop being evil'

answered Feb 26, 2013 at 11:54

Tomek

8 Comments

ideasasylum Over a year ago

This method worked for me, even with filenames containing spaces on OSX

Troy Howard Over a year ago

Worth noting that globstar is only available in Bash 4.0 or newer.. which is not the default version on many machines.

phil294 Over a year ago

I don't think you need to specify the first argument. (At least as of today,) for f in /tmp/** will be enough. It includes the files from the /tmp dir.

Ice-Blaze Over a year ago

Wouldn't it be better like this ? for f in /tmp/*.{pdf,doc} tmp/**/*.{,pdf,doc} ; do

tripleee Over a year ago

** is a nice extension but not portable to POSIX sh. (This question is tagged bash but it would be nice to point out that unlike several of the solutions here, this really is Bash-only. Or, well, it works in several other extended shells, too.)

falstro · Accepted Answer · 2015-11-13 14:59:26Z

38

If you want to do something recursively, I suggest you use recursion (yes, you can do it using stacks and so on, but hey).

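recursiverm() {
  for d in *; do
    if [ -d "$d" ]; then
      # Descend in a subshell so the parent's working directory is unchanged
      (cd -- "$d" && recursiverm)
    fi
  done
  # Delete matching files once per directory, after descending
  rm -f -- *.pdf *.doc
}

# Use && so that nothing is deleted if the cd fails
(cd /tmp && recursiverm)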

That said, find is probably a better choice as has already been suggested.

edited Nov 13, 2015 at 14:59

answered Jan 9, 2011 at 11:35

falstro

danronmoon · Accepted Answer · 2024-01-28 18:58:14Z

22

Here is an example using shell (bash):

#!/bin/bash

# loop & print a folder recursively,
print_folder_recurse() {
    for i in "$1"/*;do
        if [ -d "$i" ];then
            echo "dir: $i"
            print_folder_recurse "$i"
        elif [ -f "$i" ]; then
            echo "file: $i"
        fi
    done
}


# try get path from param
path=""
if [ -d "$1" ]; then
    path=$1;
else
    path="/tmp"
fi

echo "base path: $path"
print_folder_recurse "$path"

edited Jan 28, 2024 at 18:58

danronmoon

answered Mar 8, 2014 at 9:28

Eric

1 Comment

Mihir Over a year ago

Is it possible to achieve same without function?
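
One way to do the same without a function is to keep an explicit list of directories still to visit. The sketch below is only an illustration under assumptions: it walks a sandbox created with mktemp instead of taking a path parameter, and the names start, pending, and file_count are made up for the example.

```shell
#!/usr/bin/env bash
# Iterative traversal: a "to visit" stack replaces the recursive function call.
start=$(mktemp -d)                 # throwaway sandbox; use your own path instead
mkdir -p "$start/a/b"
touch "$start/x.pdf" "$start/a/y.doc" "$start/a/b/z z.txt"

pending=("$start")                 # stack of directories still to visit
file_count=0

while [ "${#pending[@]}" -gt 0 ]; do
    last=$(( ${#pending[@]} - 1 ))
    dir=${pending[$last]}
    unset "pending[$last]"         # pop the most recently pushed directory
    for entry in "$dir"/*; do
        if [ -d "$entry" ] && [ ! -L "$entry" ]; then
            echo "dir: $entry"
            pending+=("$entry")    # push: visit this directory later
        elif [ -f "$entry" ]; then
            echo "file: $entry"
            file_count=$((file_count + 1))
        fi
    done
done

rm -rf "$start"                    # clean up the sandbox
```

The order of visits differs from the recursive version (directories are printed when found and descended into afterwards), but every file is still reached exactly once.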

Gilles 'SO- stop being evil' · Accepted Answer · 2017-07-13 15:57:53Z

19

This doesn't answer your question directly, but you can solve your problem with a one-liner:

find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -exec rm {} +

Some versions of find (GNU, BSD) have a -delete action which you can use instead of calling rm:

find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -delete
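
Before deleting anything, it can help to run the same expression with -print as the action and check the list. The sketch below assumes a sandbox directory from mktemp (demo_dir and remaining are made-up names) and also shows why -type f is useful: a directory whose name ends in .pdf is left alone.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)               # throwaway sandbox; demo_dir is a made-up name
mkdir -p "$demo_dir/reports.pdf"    # a directory whose name ends in .pdf
touch "$demo_dir/a.pdf" "$demo_dir/b.doc" "$demo_dir/c.txt"

# Step 1: preview. -print only lists what matches; nothing is deleted.
find "$demo_dir" \( -name '*.pdf' -o -name '*.doc' \) -type f -print

# Step 2: the same expression, with the action swapped to rm.
find "$demo_dir" \( -name '*.pdf' -o -name '*.doc' \) -type f -exec rm -- {} +

remaining=$(find "$demo_dir" -type f | wc -l)   # regular files left afterwards
echo "Files remaining: $remaining"
rm -rf "$demo_dir"                  # clean up the sandbox
```

Only c.txt is left as a regular file after step 2, and the reports.pdf directory is not touched because it fails the -type f test.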

edited Jul 13, 2017 at 15:57

Gilles 'SO- stop being evil'

answered Jan 9, 2011 at 11:32

Oliver Charlesworth

Gilles 'SO- stop being evil' · Accepted Answer · 2017-07-13 15:54:54Z

12

For bash (since version 4.0):

shopt -s globstar nullglob dotglob
echo **/*".ext"

That's all.
The trailing extension ".ext" is there to select files (or directories) with that extension.

Option globstar activates ** (recursive matching).
Option nullglob removes a pattern when it matches no file or directory.
Option dotglob includes files that start with a dot (hidden files).

Beware that before bash 4.3, **/ also traverses symbolic links to directories which is not desirable.
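
To see what these three options do together, the matches can be collected in an array and counted. This is a small sketch under assumptions: the sandbox comes from mktemp, and demo_dir, found, and none are hypothetical names chosen for the example.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob dotglob

demo_dir=$(mktemp -d)     # throwaway sandbox; demo_dir is a made-up name
mkdir -p "$demo_dir/sub"
touch "$demo_dir/.b.ext" "$demo_dir/sub/a.ext" "$demo_dir/sub/.c.ext" "$demo_dir/d.txt"

found=( "$demo_dir"/**/*.ext )      # an array keeps each name intact, spaces included
none=( "$demo_dir"/**/*.missing )   # nullglob: a non-matching pattern expands to nothing

echo "found ${#found[@]} match(es), ${#none[@]} for the missing extension"
printf '%s\n' "${found[@]}"

rm -rf "$demo_dir"        # clean up the sandbox
```

The hidden files .b.ext and .c.ext are only part of the result because dotglob is set. Without nullglob, the none array would hold the literal pattern as its single element.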

edited Jul 13, 2017 at 15:54

Gilles 'SO- stop being evil'

answered Feb 19, 2016 at 9:48

user2350426

TJR · Accepted Answer · 2017-06-13 20:39:51Z

9

This method handles spaces well.

files="$(find -L "$dir" -type f)"
echo "Count: $(echo -n "$files" | wc -l)"
echo "$files" | while read file; do
  echo "$file"
done

Edit: fixes the off-by-one count

function count() {
    files="$(find -L "$1" -type f)";
    if [[ "$files" == "" ]]; then
        echo "No files";
        return 0;
    fi
    file_count=$(echo "$files" | wc -l)
    echo "Count: $file_count"
    echo "$files" | while read file; do
        echo "$file"
    done
}

edited Jun 13, 2017 at 20:39

answered Nov 9, 2012 at 4:09

TJR

2 Comments

Lopa Over a year ago

I think the "-n" flag after echo is not needed. Test it yourself: with "-n" the script gives the wrong number of files. For exactly one file in the directory it outputs "Count: 0".

Gilles 'SO- stop being evil' Over a year ago

This doesn't work with all file names: it fails with spaces at the end of the name, with file names containing newlines and with some file names containing backslashes. These defects could be fixed but the whole approach is needlessly complex so it isn't worth bothering.
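
For comparison, a version that copes with the awkward names mentioned here can read find's NUL-delimited output into an array. This is a sketch under assumptions, not the answer's method: it needs bash 4.4 or newer for mapfile -d, it uses a sandbox from mktemp, and the test file names are invented for the example.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)   # throwaway sandbox with deliberately awkward file names
touch "$dir/trailing space " "$dir/back\\slash" "$dir"/$'two\nlines'

# mapfile -d '' splits on NUL bytes (bash >= 4.4); -t drops the delimiter.
mapfile -d '' -t files < <(find -L "$dir" -type f -print0)

echo "Count: ${#files[@]}"
for file in "${files[@]}"; do
    printf '%q\n' "$file"   # %q shows trailing spaces and newlines unambiguously
done

rm -rf "$dir"      # clean up the sandbox
```

Because the count comes from the array length rather than from wc -l, an empty result gives 0 and a name containing a newline is still counted once.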

ecotechie · Accepted Answer · 2020-02-05 01:40:04Z

2

This is the simplest way I know to do this: rm **/@(*.doc|*.pdf)

** makes this work recursively

@(*.doc|*.pdf) looks for a file ending in pdf OR doc

Easy to safely test by replacing rm with ls
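
In bash, this pattern depends on two shell options: globstar for ** and extglob for @(...). The sketch below shows one way to try it safely. It assumes a sandbox from mktemp, lists matches instead of deleting them, and uses the made-up names demo_dir and matches.

```shell
#!/usr/bin/env bash
# extglob has to be enabled before the line using @(...) is parsed.
shopt -s globstar extglob nullglob

demo_dir=$(mktemp -d)    # throwaway sandbox; demo_dir is a made-up name
mkdir -p "$demo_dir/sub"
touch "$demo_dir/a.doc" "$demo_dir/sub/b.pdf" "$demo_dir/sub/c.txt"

matches=( "$demo_dir"/**/@(*.doc|*.pdf) )
printf 'match: %s\n' "${matches[@]}"

rm -rf "$demo_dir"       # clean up the sandbox
```

Once the printed list looks right, the same pattern can be handed to rm.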

answered Feb 5, 2020 at 1:40

ecotechie

K_3 · Accepted Answer · 2016-10-08 18:09:03Z

1

The following function recursively iterates through all directories under /home/ubuntu (the whole directory structure below it) and applies the necessary check in the else block.

function check {
        for file in "$1"/*
        do
        if [ -d "$file" ]
        then
                check "$file"
        else
               # Delete the file if its first four bytes are the PDF signature
               if [ "$(head -c 4 "$file")" = "%PDF" ]; then
                         rm -- "$file"
               fi
        fi
        done
}
domain=/home/ubuntu
check "$domain"

answered Oct 8, 2016 at 18:09

K_3

Zak · Accepted Answer · 2019-02-20 22:37:30Z

1

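There is no reason to pipe the output of find into another utility. find has a -delete action built into it (in GNU and BSD find). The parentheses make -delete apply to both patterns; without them it would only apply to '*.doc'.

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete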
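
Grouping matters whenever an action follows several -name tests, because the implicit "and" binds more tightly than -or. The sketch below makes the difference visible with the harmless -print action. It assumes a sandbox from mktemp, and the names demo_dir, ungrouped, and grouped are made up for the example.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)    # throwaway sandbox; demo_dir is a made-up name
touch "$demo_dir/a.pdf" "$demo_dir/b.doc"

# Without parentheses this parses as: -name '*.pdf' -o ( -name '*.doc' -a -print )
ungrouped=$(find "$demo_dir" -name '*.pdf' -o -name '*.doc' -print | wc -l)

# With parentheses the action applies to a match on either pattern.
grouped=$(find "$demo_dir" \( -name '*.pdf' -o -name '*.doc' \) -print | wc -l)

echo "ungrouped: $ungrouped, grouped: $grouped"
rm -rf "$demo_dir"       # clean up the sandbox
```

The ungrouped form lists only b.doc, so with -delete in place of -print the .pdf files would be left behind.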

answered Feb 20, 2019 at 22:37

Zak

TrevTheDev · Accepted Answer · 2020-04-13 07:20:06Z

0

The other answers provided will not include files or directories that start with a dot (.). The following worked for me:

#!/bin/sh
getAll()
{
  local fl1="$1"/*;
  local fl2="$1"/.[!.]*; 
  local fl3="$1"/..?*;
  for inpath in "$1"/* "$1"/.[!.]* "$1"/..?*; do
    if [ "$inpath" != "$fl1" -a "$inpath" != "$fl2" -a "$inpath" != "$fl3" ]; then 
      stat --printf="%F\0%n\0\n" -- "$inpath";
      if [ -d "$inpath" ]; then
        getAll "$inpath"
      #elif [ -f $inpath ]; then
      fi;
    fi;
  done;
}

edited Apr 13, 2020 at 7:20

answered Mar 25, 2019 at 7:13

TrevTheDev

P Varga · Accepted Answer · 2023-07-24 11:55:43Z

0

Lots of answers here, but I was surprised that I couldn't find this very simple one:

rm -v **/*.pdf **/*.doc

Or add the -i option and rm will prompt you for each file.

Tested in fish, although it should work with most other shells, too.

Update: Also tested in zsh 5.9.

edited Jul 24, 2023 at 11:55

answered Jul 23, 2023 at 17:44

P Varga

Mostafa Wael · Accepted Answer · 2024-11-05 14:28:27Z

0

I think the most straightforward solution is to use recursion. The following example prints the paths of all files in the directory and its subdirectories.

You can modify it according to your needs.

#!/bin/bash
printAll() {
    for i in "$1"/*; do # for every entry in the given directory
        if [ -f "$i" ]; then # if it is a regular file
            echo "$i"  # print the file's path
        elif [ -d "$i" ]; then # if it is a directory
            printAll "$i" # call printAll inside it (recursion)
        fi
    done
}
printAll "$1" # e.g.: ./printAll.sh .

OUTPUT:

> ./printAll.sh .
./demoDir/4
./demoDir/mo st/1
./demoDir/m2/1557/5
./demoDir/Me/nna/7
./TEST

It works fine with spaces as well!

Note: You can use basename "$i" to print the file name without its path.

OR: Use echo "${i##*/}", which is much faster because it does not call the external basename command.

edited Nov 5, 2024 at 14:28

answered Dec 21, 2021 at 17:32

Mostafa Wael

2 Comments

Mihir Over a year ago

The parameter substitution to print file name is incorrect. (Not sure if I can correct in the answer).

Mostafa Wael Over a year ago

Thanks for your note, I corrected it ("$i" -> ${i%/*}). ${i} represents the entire path of the current file, and %/* removes everything from the end of the string up to, but not including, the last slash (/).
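
For reference, the two expansions discussed in these comments behave as in the sketch below: % removes the shortest matching suffix and ## removes the longest matching prefix. The variable names dir_part and name_part are made up for the illustration, and the path is one of the lines from the output above.

```shell
#!/bin/sh
i="./demoDir/mo st/1"

dir_part=${i%/*}      # strip the shortest trailing match of "/*": leaves the directory
name_part=${i##*/}    # strip the longest leading match of "*/": leaves the file name

echo "path: $i"
echo "dir:  $dir_part"
echo "name: $name_part"
```

So ${i%/*} removes the last slash together with everything after it, and ${i##*/} is the expansion that yields the bare file name.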

Veger · Accepted Answer · 2013-02-20 09:45:10Z

-2

Just do

find . -name '*.pdf'|xargs rm

edited Feb 20, 2013 at 9:45

Veger

answered Jan 9, 2011 at 11:32

Navi

1 Comment

gniourf_gniourf Over a year ago

No, don't do this. This breaks if you have filenames with spaces or other funny symbols.

Amin NAIRI · Accepted Answer · 2020-01-27 13:16:37Z

-2

If you can change the shell used to run the command, you can use ZSH to do the job.

#!/usr/bin/zsh

for file in /tmp/**/*
do
    echo $file
done

This will recursively loop through all files/folders.

answered Jan 27, 2020 at 13:16

Amin NAIRI
