
I want to extract certain types of text from images of ID cards:

[Five example photos of ID cards with varying lighting and sharpness]

As you can see, they have varying lighting and sharpness conditions. The ultimate goal is to recognize the black text. When the texts are well separated, I've managed to do this with Tesseract OCR (the language is Vietnamese, VIE, in case you'd like to try it yourself with Tesseract). However, in the examples above, the black text overlaps with the blue text, which confuses Tesseract. So my current goal is to cleanly remove the blue text while not heavily distorting the blurry black pixels, so that Tesseract still works.

What are the most robust ways to do this? (Code examples in Python would be appreciated if possible.)

  • The most robust way is to get proper images; anything else is guesswork. If you want to read text on ID cards, have them presented to your camera in a repeatable way, for example placed against a glass plate. Commented Apr 20, 2020 at 17:05

1 Answer


You can try image segmentation based on color. If a pixel's color is in the region of RGB space close to (0, 0, 0), that pixel is a likely candidate to be part of the relevant black text.
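A minimal sketch of that idea with OpenCV; the file name is a placeholder, and the upper bound of 90 per channel is an assumption you'd tune on your own data:

```python
import cv2

# Hypothetical file name; replace with your scan.
img = cv2.imread("id_card.jpg")

# A pixel is a black-text candidate when all three channels are
# close to 0. The upper bound of 90 is a guess to tune per dataset.
mask = cv2.inRange(img, (0, 0, 0), (90, 90, 90))
# mask is 255 where the pixel is near-black, 0 elsewhere.
```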

Another approach would be to check the chrominance of each pixel. The assumption is that black text has low chrominance, while the blue overprint and the colorful background do not.
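A sketch of the chrominance check, again with OpenCV. Note that in OpenCV's YCrCb representation the chroma channels are centered at 128, and both thresholds below (100 for luma, 20 for chroma distance) are assumptions to tune:

```python
import cv2
import numpy as np

img = cv2.imread("id_card.jpg")
y, cr, cb = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb))

# Distance of each pixel's chrominance from neutral gray (128, 128).
chroma = np.hypot(cr.astype(np.float32) - 128, cb.astype(np.float32) - 128)

# Black text: dark (low Y) and nearly colorless (low chroma).
# Blue text is also dark-ish but strongly colored, so it fails
# the chroma test.
mask = np.where((y < 100) & (chroma < 20), 255, 0).astype(np.uint8)
```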

The idea is to figure out parts of the image where likely candidates for relevant text are present, and then just white out whatever's not relevant.
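Putting those steps together, a sketch that whites out everything outside the mask and hands the result to Tesseract. This assumes `img` and `mask` from either snippet above, and that Tesseract's Vietnamese traineddata is installed; the kernel size is another parameter to tune:

```python
import cv2
import numpy as np
import pytesseract

# Closing fills small gaps in the mask so thin strokes stay
# connected; the 2x2 kernel size is an assumption to tune.
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((2, 2), np.uint8))

# Keep the original grayscale values on candidate pixels so the
# blurry black strokes are not distorted; white out the rest.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cleaned = np.full_like(gray, 255)
cleaned[mask > 0] = gray[mask > 0]

# "vie" requires the Vietnamese traineddata to be installed.
print(pytesseract.image_to_string(cleaned, lang="vie"))
```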

However, these are quick-and-dirty solutions, and they struggle when ID cards are photographed under different lighting conditions, when the cards are damaged, when the capture devices span a wide range of cameras, or when there are slight variations between types of ID cards. We've worked on this problem quite a lot, specifically on ID documents. Eventually, our solution was to generate a large number of synthetic images and train machine learning models to return just the relevant text from ID cards. It required a huge amount of work, but it paid off: we now have very reliable data extraction, and that includes IDs from Vietnam.

Disclaimer: I'm working at Microblink, where we develop commercial OCR products, one of them being for ID scanning.


2 Comments

Thanks, I've thought of generating the cards, adding artificial noise and various types of distortion, and then training an end-to-end deep learning model on them, but that sounds like significantly more work, which I'm trying to avoid for now. I haven't tried the YUV colorspace; would it be more robust in this case compared to HSV? I'll read more on it. Your product looks awesome! Do you have any information on sub-licensing? Would love to connect and discuss more over private messages. Thanks!
Both HSV and YUV are probably better than the RGB colorspace. I would suggest YUV for this particular case, because the text is black (Y, Cr, and Cb are all small) and the background is light and colorful (Y and Cb are large). In HSV, H and S would depend heavily on lighting conditions. Feel free to contact us here: microblink.com/contact-us.
