I’m working on a project where I have to detect objects in a PDF document. After detecting the objects, I need to read the text at this location since it will be used as the object's name.
I’ve managed to detect the objects. I used OpenCV to preprocess the image and now want to use Tesseract to read the text from it.
I’ve used a high-resolution image in order to improve Tesseract's accuracy.
I’ve tried using a whitelist, a word list and a pattern file to further improve Tesseract's accuracy. I’ve also been experimenting with different page segmentation modes, such as PSM_SINGLE_WORD and PSM_SINGLE_BLOCK.
Sometimes Tesseract reads the text correctly, e.g. the first image returns “T2,T3\n” with PSM_SINGLE_WORD (but not with PSM_SINGLE_BLOCK, which returns “12,13\n”). However, in most cases it doesn’t return the correct text.
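As a way to flag obviously wrong results after recognition, here is a minimal sketch of a format check. It assumes every name is `T` or `TAR` followed by digits, and that names are separated by commas and optional line breaks; that grammar is only inferred from the expected values listed in this question, and `looksLikeNameList` is a hypothetical helper, not part of Tesseract.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` looks like a comma-separated
// list of names such as "T2,T3,TAR3", optionally spread over several lines.
// Assumption: a name is "T" or "TAR" followed by one or more digits.
bool looksLikeNameList(const std::string& text) {
    // \s also matches '\n', so a comma followed by a line break is accepted.
    static const std::regex pattern(
        R"(\s*(T|TAR)[0-9]+(\s*,\s*(T|TAR)[0-9]+)*\s*,?\s*)");
    return std::regex_match(text, pattern);
}
```

With this check, `looksLikeNameList("T2,T3\n")` is true and `looksLikeNameList("12,13\n")` is false. It only validates the shape of the text: a result like "T15\n" still passes even when the expected name is "TAR15\n".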
Preprocessed images for reference:

1st:
Word: "T2,T3\n"
Block: "12,13\n"
Expected: “T2,T3\n”
2nd:
Word: "T2,T3,TAR3\n"
Block: "12,13, 1T AR3\n"
Expected: “T2,T3,TAR3\n”
3rd:
Word: "TA8\n"
Block: "TAR8\n"
Expected: “TAR8\n”
4th:
Word: "T2TT\n"
Block: "12,13,14,\nTAR35TAR4\n"
Expected: “T2,T3,T4,\nTAR3,TAR4”
5th:
Word: "TTT2AA,RRT333A,,\n"
Block: "12,13,\nTAR35\nTA34\n"
Expected: “T2,T3,\nTAR3,\nTAR34\n”
6th:
Word: "T15\n"
Block: "TAR15\n"
Expected: “TAR15\n”
7th:
Word: "T\n"
Block: "111\n"
Expected: “T11\n”
As you can see, sometimes PSM_SINGLE_WORD returns better results, sometimes PSM_SINGLE_BLOCK does and sometimes neither returns the correct result.
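To put a number on how far each mode is from the expected text, here is a minimal sketch that scores a result by its Levenshtein (edit) distance to the expected string. Using edit distance as the error measure is my own assumption, and `editDistance` is a hypothetical helper, not a Tesseract function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: Levenshtein distance between an OCR result and the
// expected text (number of single-character insertions, deletions and
// substitutions needed to turn one into the other).
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // "" -> b[0..j)
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // a[0..i) -> ""
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, subst});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

For the 3rd image, `editDistance("TA8\n", "TAR8\n")` is 1 for the Word result, while the Block result matches exactly (distance 0). For the 1st image it is the other way round: the Word result has distance 0 and the Block result "12,13\n" has distance 2.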
Since the images vary quite a lot and I don’t understand why some characters are recognized incorrectly (e.g. “,” as “5” in the 4th image), I’m looking for help in resolving this problem.
The relevant code snippet is the following:
// Requires <tesseract/baseapi.h>, <leptonica/allheaders.h>, <opencv2/core.hpp>,
// <QCoreApplication>, <QDir> and <QDebug>.
// Copy the preprocessed 8-bit, single-channel OpenCV image into a Leptonica Pix.
Pix* pixImage = pixCreate(eroded.cols, eroded.rows, 8);
for (int y = 0; y < eroded.rows; y++) {
    for (int x = 0; x < eroded.cols; x++) {
        pixSetPixel(pixImage, x, y, eroded.at<uchar>(y, x));
    }
}
QString dataDir = qApp->applicationDirPath() + QStringLiteral("/tessdata");
QString d = QDir::toNativeSeparators(dataDir);
tesseract::TessBaseAPI tess;
QString result;
// First pass: PSM_SINGLE_WORD
if (tess.Init(d.toLatin1().constData(), "eng", tesseract::OEM_DEFAULT) == 0) {
    tess.SetPageSegMode(tesseract::PSM_SINGLE_WORD);
    tess.SetVariable("tessedit_char_whitelist", "TAR0123456789, ");
    tess.SetVariable("user_words_file", "wordList.txt");
    tess.SetVariable("user_patterns_file", "patterns.txt");
    tess.SetVariable("load_system_dawg", "0");
    tess.SetVariable("load_freq_dawg", "0");
    tess.SetVariable("wordrec_enable_assoc", "0");
    tess.SetVariable("use_only_my_words", "1");
    tess.SetImage(pixImage);
    char* wordText = tess.GetUTF8Text();  // buffer is owned by the caller
    QString wordResult = QString::fromUtf8(wordText);
    delete[] wordText;                    // must be released with delete[]
    qDebug() << "Word: " << wordResult;
    result += wordResult;
    tess.End();
}
// Second pass: PSM_SINGLE_BLOCK
if (tess.Init(d.toLatin1().constData(), "eng", tesseract::OEM_DEFAULT) == 0) {
    tess.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
    tess.SetVariable("tessedit_char_whitelist", "TAR0123456789, ");
    tess.SetVariable("user_words_file", "wordList.txt");
    tess.SetVariable("user_patterns_file", "patterns.txt");
    tess.SetVariable("load_system_dawg", "0");
    tess.SetVariable("load_freq_dawg", "0");
    tess.SetVariable("wordrec_enable_assoc", "0");
    tess.SetVariable("use_only_my_words", "1");
    tess.SetImage(pixImage);
    char* blockText = tess.GetUTF8Text();  // buffer is owned by the caller
    QString blockResult = QString::fromUtf8(blockText);
    delete[] blockText;                    // must be released with delete[]
    qDebug() << "Block: " << blockResult;
    result += blockResult;
    tess.End();
}
pixDestroy(&pixImage);
Since this is only my second question on Stack Overflow, I might be missing some information, so please feel free to ask for anything you might require to help me.

tesseract command line application, too. Also, concerning the image above, is that what you feed to Tesseract, or do you first transform it in any additional way in the (not shown) C++ code?