
I am wondering what the most efficient methods are for reading and writing image and PDF files as NumPy arrays for processing.

So far I have seen scipy.ndimage.imread and the PIL + NumPy approach, which yield the following results:

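import glob
import timeit

import numpy as np
from PIL import Image
# Note: scipy.ndimage.imread was deprecated in SciPy 1.0 and removed in 1.2,
# so this import only works with older SciPy versions.
from scipy.ndimage import imread

iters = 2  # number of passes over the directory for each reader

def scipy_fun():
    # Decode every JPEG in the current directory with SciPy (which relies on Pillow)
    for x in glob.glob("*.jpg"):
        px = imread(x)

def PIL_fun():
    # Decode every JPEG with PIL and convert it to a NumPy array
    for x in glob.glob("*.jpg"):
        with Image.open(x) as im:
            px = np.array(im)

print(timeit.Timer(scipy_fun).timeit(number=iters))
print(timeit.Timer(PIL_fun).timeit(number=iters))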

Running the script shows similar times (in seconds, SciPy first, then PIL), with SciPy marginally faster:

2.8794324089019234
3.0174482765699095

Are there any faster ways to do this?
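One thing to keep in mind when comparing these numbers: each timed run above also includes the `glob` call, and a single timing of two passes is noisy (the first pass may pay for reading files from disk while later passes hit the OS cache). Below is a sketch of a slightly fairer harness, assuming you pass in one reader callable per library. `benchmark_readers` is a hypothetical name, and reporting the minimum of several repeats follows the advice in the `timeit` documentation.

```python
import glob
import timeit


def benchmark_readers(readers, pattern="*.jpg", repeat=5):
    """Return the best-of-`repeat` time in seconds for each reader.

    `readers` maps a label to a callable that takes a file path and
    returns the decoded image (e.g. as a NumPy array).
    """
    # Glob once, outside the timed region, so only decoding is measured.
    paths = sorted(glob.glob(pattern))
    results = {}
    for name, read in readers.items():
        def run():
            # One timed pass: decode every matched file with this reader.
            for p in paths:
                read(p)

        # The minimum over several repeats is the least noisy estimate;
        # the first repeat may also warm the OS file cache for the others.
        results[name] = min(timeit.repeat(run, number=1, repeat=repeat))
    return results
```

For example, `benchmark_readers({"imageio": iio.imread, "PIL": lambda p: np.asarray(Image.open(p))})` would time both libraries on the JPEGs in the current directory.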

  • Another library you could try is imageio. Commented Nov 3, 2016 at 7:12
  • Thanks! imageio appears to be a very good library. The output was: scipy ~4 secs, imageio ~3 secs, PIL ~4 secs. Commented Nov 3, 2016 at 15:03

1 Answer


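First, install pdf2image (it also needs the Poppler command-line utilities, such as pdftoppm, installed on the system):

pip install pdf2image

Then, read the PDF into a NumPy array: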

import numpy as np
from pdf2image import convert_from_path

# convert_from_path returns one PIL image per page; np.asarray turns a page
# into an (H, W, 3) RGB array to play around with in OpenCV or PIL.
img = np.asarray(convert_from_path('path to the pdf file')[0])  # first page of the PDF
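Two details matter if efficiency is the goal. Without `first_page`/`last_page`, `convert_from_path` rasterizes every page of the document even though only index `[0]` is kept, and the pages it returns are RGB, while OpenCV functions generally assume BGR channel order. The sketch below handles both; `pdf_page_to_array` and `rgb_to_bgr` are hypothetical helper names, it assumes pdf2image and Poppler are installed, and the channel swap is plain NumPy slicing (equivalent to `cv2.cvtColor(img, cv2.COLOR_RGB2BGR)`).

```python
import numpy as np


def pdf_page_to_array(pdf_path, page=1, dpi=200):
    """Rasterize one (1-based) page of a PDF into an RGB NumPy array.

    Assumes pdf2image and the Poppler utilities are installed.
    """
    # Imported here so rgb_to_bgr below can be used without pdf2image.
    from pdf2image import convert_from_path

    # first_page/last_page restrict rasterization to the requested page only.
    pages = convert_from_path(pdf_path, dpi=dpi, first_page=page, last_page=page)
    return np.asarray(pages[0].convert("RGB"))


def rgb_to_bgr(rgb):
    """Reverse the channel axis (RGB -> BGR), the order OpenCV expects."""
    # Copy into a contiguous buffer, since some OpenCV functions reject
    # negatively strided views.
    return np.ascontiguousarray(rgb[..., ::-1])
```

For example, `rgb_to_bgr(pdf_page_to_array("document.pdf", page=3))` would give page 3 ready for OpenCV (the file name here is made up).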

1 Comment

This approach is a hell of a lot smoother and more efficient than first saving the image and then reading it with OpenCV, so well done Ali and Santhosh!
