
My question is very similar to this, but I have thousands of images on disk and I want to quickly read their width and height in pixels without loading each file into memory.

On my Linux machine, I could do something like this for each file:

path_to_file <- 'img/13600061.jpg'
system(sprintf("file %s", path_to_file), intern = TRUE)

But the output of file differs for jpg, jpeg and png files, so I would need to extract the pixel info differently depending on the file type. I was wondering if there is a general, fast solution out there already.
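
For what it's worth, one way to stay with file is to batch many paths into a single call and pull the first WxH pattern out of each output line. This is only a sketch: the exact wording of file's output varies by format and version, and the regex could false-match a filename that itself contains something like 2x4.

img_dims <- function(paths) {
  # one call to `file` for many files; output lines look like
  #   img/a.jpg: JPEG image data, ..., 1920x1080, components 3
  #   img/b.png: PNG image data, 800 x 600, 8-bit/color RGB, ...
  out <- system2("file", shQuote(paths), stdout = TRUE)
  # first WxH pattern per line; PNG puts spaces around the x, JPEG doesn't
  dims <- regmatches(out, regexpr("[0-9]+ ?x ?[0-9]+", out))
  do.call(rbind, lapply(strsplit(dims, " ?x ?"), as.integer))
}

img_dims("img/13600061.jpg")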

  • Do you know anything more about the images? Are they all JPEG or PNG or TIFF? Do you have multiple CPU cores available? What OS are you running? Do you have an NVMe SSD? Commented Nov 5, 2021 at 21:27
  • The metadata should exist and be readable in O(1) if you use the right library. The worst possible solution that will still work would be to run an ImageMagick subprocess, capture stdout and parse it, and that's not so bad (see the sketch after these comments). Commented Nov 5, 2021 at 22:59
  • Hi @MarkSetchell, thanks for the questions. I use Ubuntu 20.04.3 LTS, but I might need to run this on a Windows 10 machine as well. I do have multiple CPU cores available, but hopefully the task should not get that heavy. There are thousands of these images, stored in folders of around 2,000 images each. Their extensions can be .jpg, .JPG, .jpeg or .png (I guess .JPG is treated the same as .jpg?). I have an SSD, but a lot of these images are also stored remotely on a network drive. Commented Nov 6, 2021 at 11:32
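
As a rough illustration of that ImageMagick suggestion, the sketch below shells out to identify once for a whole batch of files; -ping asks it to read only the header instead of decoding the pixels. The img directory and column names here are placeholders, not from the original post.

# batch of files to inspect (hypothetical directory)
paths <- list.files("img", pattern = "\\.(jpe?g|png)$",
                    ignore.case = TRUE, full.names = TRUE)
# -ping reads just enough of each file to report attributes;
# the format string prints one CSV-style line per image
out <- system2("identify",
               c("-ping", "-format", shQuote("%i,%w,%h\\n"), shQuote(paths)),
               stdout = TRUE)
read.csv(text = out, header = FALSE,
         col.names = c("file", "width", "height"))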

1 Answer


I think exiftool fits the bill nicely here. It runs on all platforms, is very controllable and, crucially, it can recurse on its own, so it doesn't incur the overhead of being started once per file.

As a rough first attempt, you'd want something like this to process PNGs and JPEGs, recursing down from the current directory (.):

exiftool -csv -ImageHeight -ImageWidth -r -ext jpg -ext jpeg -ext png .

Sample Output

black.png,80,80
blue.png,80,80
c.jpg,1,1
deskew/skew40.png,800,800
deskew/gradient.png,800,800

You may want to add -q to exclude the summary if you are parsing the output.

As a rough guide, the above command runs in 9 seconds on a directory containing 10,000 images on my Mac.
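
To get that straight into a data frame, one option is to capture the CSV from stdout and parse it in R. A minimal sketch, assuming exiftool is on the PATH and using a placeholder directory:

# capture exiftool's CSV output and parse it in one go;
# "data/img-types" is a placeholder directory
dims <- read.csv(text = system2(
  "exiftool",
  c("-q", "-csv", "-ImageHeight", "-ImageWidth",
    "-r", "-ext", "jpg", "-ext", "jpeg", "-ext", "png",
    "data/img-types"),
  stdout = TRUE
))
# exiftool's CSV starts with a SourceFile column, then the requested tags
head(dims)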


3 Comments

Thanks for suggesting this tool. I would like to get the output back into R, and I am not that familiar with all the output arguments of exiftool. Something like test <- system(paste("exiftool -q -csv -ImageHeight -ImageWidth", dir_path_img), intern = TRUE) returns a character vector with the first element containing the header. I think I can work with that, but if you have a better suggestion, feel free to add it.
Glad it is looking hopeful for you. I don't know R, so we are separated by a bit of an ocean! I presume R can parse a CSV though, so maybe you could do system("exiftool -ImageHeight ... > FileThatRcanParse.csv"), which would require you to enable the shell option if there is one on system() in R.
Thanks, it seems I can do something like system2(command = "exiftool", args = c("-q", "-csv", "-ImageHeight", "-ImageWidth", dir_path_img), stdout = "test.csv") and then read that csv file directly. I put this option here as a comment in case it is useful for some other R user. dir_path_img is a character vector of length 1 with the path to the directory containing the images, something like dir_path_img <- "data/img-types".
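Spelled out, that last comment's approach, writing the CSV to disk and then reading it back, looks like this (the paths are the commenter's placeholders):

dir_path_img <- "data/img-types"   # placeholder path to the image directory
# write exiftool's CSV to a file instead of capturing stdout in memory
system2(command = "exiftool",
        args = c("-q", "-csv", "-ImageHeight", "-ImageWidth", dir_path_img),
        stdout = "test.csv")
dims <- read.csv("test.csv")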
