
I'm trying to use curl/wget to get the list of directory/file names available in a directory listing on a web server.

For example, from (randomly chosen) http://prodata.swmed.edu/download/, I'm trying to download:

bin
dev
etc
member
pub
usr
usr1
usr2

curl (curl http://prodata.swmed.edu/download/) gets me the whole HTML page, which I'd need to parse manually for all the file/directory entries.

Is there a way to download only the names of the available files/directories with curl/wget, without installing an additional parser?

2 Answers


The HTTP protocol has no feature for requesting a "list of files" from an HTTP server.

curl, wget, or a browser simply requests a URL containing an arbitrary request string, and the server sends back some arbitrary data.

However, you can extract the names with the following command:

curl --silent http://prodata.swmed.edu/download/ | grep -o 'href=".*">' | sed 's/href="//;s/\/">//'  

bin
dev
etc
member
pub
usr
usr1
usr2
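
If wget is preferred over curl, a roughly equivalent pipeline (the same approach, using wget's -q and -O - options to dump the page to stdout) would be:

wget -q -O - http://prodata.swmed.edu/download/ | grep -o 'href=".*">' | sed 's/href="//;s/\/">//'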

curl -s http://example.com/files/ | grep -o 'href=".*">' | sed -e "s/href=\"//g" | sed -e 's/">//g'

This gives me an ls-like view of the directory.
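
For convenience, this can be wrapped in a small shell function (a sketch only; the name lsurl and the example URL are arbitrary):

lsurl() { curl -s "$1" | grep -o 'href=".*">' | sed -e 's/href="//g' -e 's/">//g'; }
lsurl http://prodata.swmed.edu/download/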

  • curl -s http://10.10.1.2/WEBDATA/ | grep -o 'href=".*">' | sed -e "s/href=\"//g" | sed -e 's/">//g' is what works for me. Commented Jun 15, 2023 at 11:49
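
If GNU grep with PCRE support (-P) is available, a slightly more robust extraction that avoids the greedy .* (which can swallow several links sharing one line) would be the following; note that it prints the raw href values, trailing slashes included:

curl -s http://prodata.swmed.edu/download/ | grep -oP 'href="\K[^"]+'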
