Revisions to Delete duplicates from another directory recursively

added 207 characters in body

edited Jan 11, 2021 at 15:43

or, as requested, on one line:

find to_keep -type f -exec sh -c 'for pathname do set -- "$@" -o -name "${pathname##*/}"; shift; done; shift; find to_purge \( "$@" \) -type f -print' sh {} +

The in-line script constructs an OR-list of -name tests for the find command that it uses on its last line. The loop constructs this list in the positional parameters from the filename component of each pathname that the outer find has passed to it.
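To make the list construction easier to follow, here is a minimal sketch that runs the same loop inside a shell function and prints the resulting positional parameters one per line. The function name build_tests and the sample pathnames are assumptions made up for illustration; they are not part of the answer.

```sh
#!/bin/sh
# Hypothetical helper: build the OR-list exactly as the in-line script does,
# then print each resulting argument on its own line.
build_tests () {
    for pathname do
        # Append "-o -name <filename>" and drop the pathname just handled.
        set -- "$@" -o -name "${pathname##*/}"
        shift
    done
    shift    # drop the leading -o
    printf '%s\n' "$@"
}

build_tests to_keep/a.txt 'to_keep/sub/b c.txt'
```

The loop iterates over the original list of pathnames while set and shift rewrite the positional parameters, so after the final shift only the -name tests joined by -o remain.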

deleted 8 characters in body

edited Jan 9, 2021 at 12:49

Kusalananda ♦

find to_keep -type f -exec sh -c '
    for pathname do
        set -- "$@" -o -name "${pathname##*/}"
        shift
    done; shift
    find to_purge \( "$@" \) -type f -print' sh {} +

#!/bin/sh

keepdir=$1
purgedir=$2

find "$keepdir" -type f -exec sh -c '
    dir=$1; shift
    for pathname do
        set -- "$@" -o -name "${pathname##*/}"
        shift
    done; shift
    find "$dir" \( "$@" \) -type f -print' sh "$purgedir" {} +

for pathname do
    # "\\\\&" reaches sed as \\&, i.e. a literal backslash followed by the match
    sane=$( printf "%s\n" "${pathname##*/}" | sed "s/[[*?]/\\\\&/g" )
    set -- "$@" -o -name "$sane"
    shift
done; shift

This modification to the loop in the in-line sh -c script escapes the [, * and ? characters (which would otherwise act as filename globbing patterns). The script would then no longer handle filenames that end in a newline correctly (due to the command substitution used), but that is arguably something one could live with.
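The newline caveat comes from the shell itself: command substitution strips all trailing newlines from the captured output. The sketch below shows the effect on a made-up filename, along with one common workaround (appending a sentinel character and removing it afterwards). The variable names and the workaround are assumptions added for illustration, not something the answer uses.

```sh
#!/bin/sh
# A hypothetical filename that ends in a newline: 6 letters plus 1 newline.
name='report
'

# Plain command substitution drops the trailing newline(s).
sub=$(printf '%s\n' "$name")

# Workaround sketch: emit a sentinel "x" last, then strip it again.
kept=$(printf '%s' "$name"; printf x)
kept=${kept%x}

printf 'original: %s characters\n' "${#name}"
printf 'substituted: %s characters\n' "${#sub}"
printf 'with sentinel: %s characters\n' "${#kept}"
```

With the plain substitution the name loses its final newline, so the resulting -name test would look for a different file than the one in the keep directory.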

added 383 characters in body

edited Jan 9, 2021 at 12:44

Kusalananda ♦

The following will find all regular files in or under ./to_keep and will call an in-line sh -c script with these in batches. For each batch of pathnames, the in-line script will call find once to find the regular files under ./to_purge that have the same names. The pathnames of these files under ./to_purge will be printed (to delete them, add -delete after -print).

find to_keep -type f -exec sh -c '
    for pathname do
        set -- "$@" -o -name "${pathname##*/}"
        shift
    done
    shift
    find to_purge \( "$@" \) -type f -print' sh {} +

The in-line script constructs an OR-list of -name tests for the find command that it uses on its last line. The loop constructs this list in the positional parameters from the filename component of each pathname that the outer find has passed to it.

This deals with all allowed filenames, including filenames containing spaces, tabs and newlines. Again, to delete files, add -delete (or -exec rm {} +) after -print in the code.

As a short script that takes the "keep directory" and "purge directory" as command line arguments:

#!/bin/sh

keepdir=$1
purgedir=$2

find "$keepdir" -type f -exec sh -c '
    dir=$1; shift
    for pathname do
        set -- "$@" -o -name "${pathname##*/}"
        shift
    done
    shift
    find "$dir" \( "$@" \) -type f -print' sh "$purgedir" {} +
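Before adding -delete, it may be worth trying the command on throwaway data. The sketch below wraps the same find logic in a function and runs it against a temporary directory tree. The function name list_dupes, the sandbox layout and the use of mktemp -d (widely available, but not specified by POSIX) are assumptions for illustration only.

```sh
#!/bin/sh
# Hypothetical wrapper around the find logic from the script above.
list_dupes () {
    find "$1" -type f -exec sh -c '
        dir=$1; shift
        for pathname do
            set -- "$@" -o -name "${pathname##*/}"
            shift
        done
        shift
        find "$dir" \( "$@" \) -type f -print' sh "$2" {} +
}

# Build a small sandbox: two names exist in both trees, one only in to_purge.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/to_keep/sub" "$sandbox/to_purge/deep/er"
touch "$sandbox/to_keep/a.txt" "$sandbox/to_keep/sub/b c.txt"
touch "$sandbox/to_purge/a.txt" "$sandbox/to_purge/deep/er/b c.txt" \
      "$sandbox/to_purge/unique.txt"

# Prints the two pathnames under to_purge whose names also occur in to_keep.
list_dupes "$sandbox/to_keep" "$sandbox/to_purge"

rm -rf "$sandbox"
```

Only the files named a.txt and "b c.txt" under to_purge should be listed; unique.txt has no namesake in to_keep and is left alone.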

The only issue with this code is that it uses the names from one directory as patterns when matching names in the other directory. This means that if a file in the first directory is called *, every file in the second directory matches and would be removed. You can fix that by protecting the filenames passed to the inner find:

for pathname do
    # "\\\\&" reaches sed as \\&, i.e. a literal backslash followed by the match
    sane=$( printf "%s\n" "${pathname##*/}" | sed "s/[[*?]/\\\\&/g" )
    set -- "$@" -o -name "$sane"
    shift
done

This modification to the loop in the in-line sh -c script escapes the [, * and ? characters (which would otherwise act as filename globbing patterns). The script would then no longer handle filenames that end in a newline correctly, but that is arguably something one could live with.
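As a standalone illustration of why the escaping matters, the sketch below escapes a name with sed and compares an unescaped and an escaped -name test on throwaway files. It is written as its own script, so the sed expression is single-quoted here rather than embedded in the sh -c string. The function name escape_glob, the sample names and the use of mktemp -d are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical helper: put a backslash in front of [, * and ? so that
# find -name treats them literally.
escape_glob () {
    printf '%s\n' "$1" | sed 's/[[*?]/\\&/g'
}

escape_glob 'notes[1]*.txt?'

# Throwaway directory with two ordinary files and one file literally named "*".
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/*"

echo 'unescaped pattern matches:'
find "$demo" -type f -name '*'

echo 'escaped pattern matches:'
find "$demo" -type f -name "$(escape_glob '*')"

rm -rf "$demo"
```

The unescaped test matches all three files, while the escaped one matches only the file whose name really is *.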

added 383 characters in body

edited Jan 9, 2021 at 12:38

Kusalananda ♦

answered Jan 9, 2021 at 12:31

Kusalananda ♦
