tesseract.js - draw bounding box

Question

I am using Tesseract.js on a real-time video stream, and I want to draw a box around each recognized word. I found the code below, but perhaps it is outdated: I can't get the bbox or data.words properties. I only have data.text, which has no bbox.

try {
  const { data } = await Tesseract.recognize(dataUrl, "eng");

  const found: string[] = [];

  const overlayCtx = overlay.getContext("2d");
  if (!overlayCtx) return;

  // Clear the previous frame's boxes and set the drawing style once.
  overlayCtx.clearRect(0, 0, overlay.width, overlay.height);
  overlayCtx.strokeStyle = "red";
  overlayCtx.fillStyle = "red";
  overlayCtx.lineWidth = 2;
  overlayCtx.font = "16px sans-serif";

  // Each word has its recognized text plus a bbox in image pixels.
  data.words.forEach((word) => {
    found.push(word.text.toLowerCase());
    overlayCtx.strokeRect(
      word.bbox.x0,
      word.bbox.y0,
      word.bbox.x1 - word.bbox.x0,
      word.bbox.y1 - word.bbox.y0
    );
    // Label the box just above its top-left corner.
    overlayCtx.fillText(word.text, word.bbox.x0, word.bbox.y0 - 4);
  });
} catch (err) {
  console.error(err);
}

G-Force · Accepted Answer · 2025-05-12 17:56:28Z


By default, Tesseract.recognize() returns only the plain-text output. That is why you don't see the metadata containing the word locations you need for bounding boxes.

See this spec for reference:

I would choose either TSV or JSON as the output format; you can then parse it to get the x, y, width, and height of each word.
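A minimal sketch of the TSV parsing step, assuming the standard Tesseract TSV column order (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), where level 5 marks a word row. The parser skips a header row if one is present, since I'm not certain whether tesseract.js includes it. WordBox and parseTsvWords are hypothetical names.

```typescript
// One word-level box parsed from Tesseract TSV output (hypothetical type).
interface WordBox {
  text: string;
  confidence: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumed column order:
// level page_num block_num par_num line_num word_num left top width height conf text
// Levels: 1 page, 2 block, 3 paragraph, 4 line, 5 word.
const WORD_LEVEL = 5;

function parseTsvWords(tsv: string): WordBox[] {
  const words: WordBox[] = [];
  for (const row of tsv.split(/\r?\n/)) {
    const cols = row.split("\t");
    // Skip blank lines and anything that is not a full 12-column row.
    if (cols.length < 12) continue;
    // The header row gives NaN here; page/block/paragraph/line rows give 1-4.
    if (Number(cols[0]) !== WORD_LEVEL) continue;
    const text = cols.slice(11).join("\t").trim();
    if (text === "") continue;
    words.push({
      text,
      confidence: Number(cols[10]),
      x: Number(cols[6]),
      y: Number(cols[7]),
      width: Number(cols[8]),
      height: Number(cols[9]),
    });
  }
  return words;
}
```

Each returned entry maps directly onto strokeRect(x, y, width, height).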

Here is an example that shows the general flow:

import { createWorker } from 'tesseract.js';

async function recognizeText(imageURL) {
  // Since v5 the language is passed to createWorker(); loadLanguage()/initialize() are no longer needed.
  const worker = await createWorker('eng');
  try {
    // The third argument selects the output formats; tsv is not returned unless requested.
    const { data } = await worker.recognize(imageURL, {}, { tsv: true });
    return data.tsv;
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}

// Example usage:
const imageURL = 'your-image-url.png';
recognizeText(imageURL)
  .then(tsvData => {
    console.log(tsvData);
  })
  .catch(error => {
    console.error('Error:', error);
  });
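Whichever format you parse, the coordinates are in the pixel space of the image passed to recognize(). With a video stream, the overlay canvas is often displayed at a different size than the captured frame, so the boxes need scaling before drawing. A minimal sketch, assuming the overlay is stretched over the whole frame; BBox, OverlayRect and bboxToOverlayRect are hypothetical names.

```typescript
// Corner coordinates as tesseract.js reports them (source-image pixels).
interface BBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Rectangle in the form canvas strokeRect() expects (hypothetical type).
interface OverlayRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Scales a bbox from the recognized frame's pixel space to the overlay canvas.
// Assumes the overlay covers the full frame with no cropping or letterboxing.
function bboxToOverlayRect(
  bbox: BBox,
  frameWidth: number,
  frameHeight: number,
  overlayWidth: number,
  overlayHeight: number,
): OverlayRect {
  const sx = overlayWidth / frameWidth;
  const sy = overlayHeight / frameHeight;
  return {
    x: bbox.x0 * sx,
    y: bbox.y0 * sy,
    width: (bbox.x1 - bbox.x0) * sx,
    height: (bbox.y1 - bbox.y0) * sy,
  };
}
```

In the question's drawing loop you would call it with the captured frame's size (for example video.videoWidth and video.videoHeight) and overlay.width / overlay.height, then pass the resulting x, y, width and height to strokeRect().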


1 Comment

CopyPaste Oct 16 at 18:19

Thank you for the hint! However, the syntax was different for me: const { data } = await worker.recognize(imageURL, { }, {tsv: true, blocks: true, hocr:true, json:true});

chrxtina · Accepted Answer · 2025-10-04 07:28:45Z


data.tsv kept coming back null, so I switched from v6 to v4, which gives me data.words with bounding boxes.
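If you would rather stay on v6 than downgrade, the per-word boxes should still be reachable through the blocks output (requested with blocks: true in the third argument to recognize(), as in the comment above). My understanding is that v6 nests words as blocks -> paragraphs -> lines -> words instead of exposing data.words; the interfaces below are a minimal, assumed mirror of that shape rather than the library's own typings, and wordsFromBlocks is a hypothetical helper.

```typescript
// Minimal local mirror of the assumed tesseract.js v6 blocks hierarchy.
interface BBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}
interface OcrWord {
  text: string;
  confidence: number;
  bbox: BBox;
}
interface OcrLine {
  words: OcrWord[];
}
interface OcrParagraph {
  lines: OcrLine[];
}
interface OcrBlock {
  paragraphs: OcrParagraph[];
}

// Flattens blocks -> paragraphs -> lines -> words into one list, roughly
// what data.words provided in v4. Returns [] when blocks is null/undefined
// (for example, when the blocks output was not requested).
function wordsFromBlocks(blocks: OcrBlock[] | null | undefined): OcrWord[] {
  const words: OcrWord[] = [];
  if (!blocks) return words;
  for (const block of blocks) {
    for (const paragraph of block.paragraphs) {
      for (const line of paragraph.lines) {
        words.push(...line.words);
      }
    }
  }
  return words;
}
```

The flattened list can then be fed to the same forEach drawing loop shown in the question.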
