Is there any way, in Python, to automatically detect the colors in a certain area of a PDF and either convert them to RGB or compare them to the legend and then get the color?
2 Answers
Felipe's approach didn't work for me, but I came up with this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import minecart

colors = set()
with open("file.pdf", "rb") as pdf_file:
    document = minecart.Document(pdf_file)
    page = document.get_page(0)  # pages are 0-indexed
    for shape in page.shapes:
        # Skip shapes that are only stroked and have no fill color
        if shape.fill:
            colors.add(shape.fill.color.as_rgb())

for color in colors:
    print(color)
This will print a neat list of all unique RGB values in the first page of your document (you could extend it to all pages, of course).
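Extending it to every page could look like the sketch below. It assumes minecart's `Document.iter_pages()`, the iterator its README uses for walking pages. The helper name `collect_fill_colors` is hypothetical, and it takes an already opened document so the logic can be tried without a real PDF.

```python
def collect_fill_colors(document):
    """Return the set of unique fill colors (as RGB tuples) across all pages.

    `document` is assumed to behave like a minecart.Document: it provides
    iter_pages(), each page exposes .shapes, and each shape has a .fill
    attribute that is falsy when the shape is not filled.
    """
    colors = set()
    for page in document.iter_pages():
        for shape in page.shapes:
            # Stroke-only shapes have no fill color; skip them
            if shape.fill:
                # tuple() keeps the value hashable whatever sequence type is returned
                colors.add(tuple(shape.fill.color.as_rgb()))
    return colors
```

With minecart installed, you would call it as `collect_fill_colors(minecart.Document(pdf_file))` while the file is still open, e.g. inside the same `with open(...)` block as above.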
Depending on where you want to extract the information from, you can use minecart. It has robust support for colors and allows easy conversion to RGB. You can't pass in a coordinate and get the color at that point, but if you are trying to get the color of a shape, you could do something like the following:
import minecart

with open("my-doc.pdf", "rb") as pdf_file:
    doc = minecart.Document(pdf_file)
    page = doc.get_page(0)

    # Bounding box in PDF points (72 points = 1 inch)
    BOX = (.5 * 72,  # left bounding box edge
           9 * 72,   # bottom bounding box edge
           1 * 72,   # right bounding box edge
           10 * 72)  # top bounding box edge

    for shape in page.shapes:
        # Only filled shapes have a fill color to read
        if shape.check_in_bbox(BOX) and shape.fill:
            r, g, b = shape.fill.color.as_rgb()
            print(r, g, b)  # do stuff with r, g, b
[Disclaimer: I'm the author of minecart]
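For comparing the extracted colors against a legend, one simple option is to snap each RGB value to the nearest legend entry instead of testing for exact equality, which may miss matches after color-space conversion or rounding. This is a minimal sketch, assuming the legend colors are already known and expressed on the same scale as the extracted values; `nearest_legend_label` and the `LEGEND` entries are hypothetical.

```python
import math

# Hypothetical legend: label -> (r, g, b), on the same scale as the extracted colors
LEGEND = {
    "forest": (0.0, 0.5, 0.0),
    "water": (0.0, 0.0, 1.0),
    "urban": (0.5, 0.5, 0.5),
}


def nearest_legend_label(rgb, legend):
    """Return the label whose legend color is closest to `rgb`.

    Closeness is plain Euclidean distance in RGB space (math.dist, Python 3.8+).
    """
    return min(legend, key=lambda label: math.dist(rgb, legend[label]))


print(nearest_legend_label((0.05, 0.45, 0.02), LEGEND))  # forest
```

Euclidean distance in RGB is only a rough proxy for perceived difference; if some legend colors are close together, converting to a perceptual space such as CIELAB before measuring distance may separate them better.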