
[Bug]: render function's operationsFilter is unusable (with proposed fix) #20399

@brenhub24

Description


Attach (recommended) or Link to PDF file

https://github.com/mozilla/pdf.js/blob/master/test/pdfs/basicapi.pdf

Web browser and its version

Chrome Version 141.0.7390.123 (Official Build) (64-bit)

Operating system and its version

Linux NB5CG24223JP 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun 5 18:30:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

PDF.js version

5.4.296

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

I was unsure whether to report this as a feature request or a bug. Because this issue makes an API function effectively unusable, I decided to report it as a bug.

I noticed that in recent versions, the page proxy's render() function accepts an operationsFilter parameter; see:

    * @property {OperationsFilter} [operationsFilter] - If provided, only

I need to render PDF documents selectively: remove all text items and keep only pictures and graphs. This feature looked like exactly what I needed, so I tried to use it to skip all of the showText-family operations.

My code looked like this:

    const DENIED_OPS: Set<number> = new Set([
        pdfjs.OPS.showText, // 44
        pdfjs.OPS.showSpacedText, // 45
        pdfjs.OPS.nextLineShowText, // 46
        pdfjs.OPS.nextLineSetSpacingShowText, // 47
    ]);

    // ...

    const operatorList = await page.getOperatorList();

    // Keep every operator except the text-showing ones.
    function operationsFilter(index: number) {
        const op = operatorList.fnArray[index];
        const shouldKeep = !DENIED_OPS.has(op);
        return shouldKeep;
    }

    // ...

    await page.render({
        canvasContext: context,
        viewport: viewport,
        operationsFilter: operationsFilter
    } as RenderParameters).promise;

The result was about 90% as expected: text items disappeared and graphs remained, but some text still appeared in unexpected places, and some other parts of the page were damaged.

Digging further, I found that the operator list returned by page.getOperatorList() is not the same as the operator list that render() actually uses. The getOperatorList() and render() methods of PDFPageProxy compute their intentArgs differently, and then use those different intentArgs to obtain the operator list. The operationsFilter function receives an index, but that index refers to the operators in the list used by render(), and as far as I can tell there is no way for the user to obtain that same list.

See:

    const intentArgs = this._transport.getRenderingIntent(

and

    /* isOpList = */ true
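To illustrate why the indices go wrong, here is a self-contained sketch with made-up operator lists (no pdf.js involved). The op codes 44 to 47 are the showText-family numbers noted in the code comments above; every other op code, both arrays, and the `simulateRender` helper are hypothetical. The filter is written against one list but evaluated with indices from a second list that has one extra operator at the front, which is roughly what happens when the list used by render() differs from the one returned by getOperatorList().

```typescript
// Hypothetical op codes: 44-47 stand for the showText family (per the
// comments in the report); 1 and 2 are arbitrary non-text operators.
const DENIED_OPS: Set<number> = new Set([44, 45, 46, 47]);

// Operator list the user obtained (e.g. via getOperatorList()).
const userFnArray: number[] = [1, 44, 2, 45];

// Operator list the renderer actually iterates: same content, but with an
// extra operator at the start, so every index is shifted by one.
const renderFnArray: number[] = [2, 1, 44, 2, 45];

// Filter written as in the report: it looks up the op code in the USER's list.
// For index 4 the lookup runs past the end, yields undefined, and is kept.
function operationsFilter(index: number): boolean {
  const op = userFnArray[index];
  return !DENIED_OPS.has(op);
}

// Hypothetical stand-in for the render loop: execute op i only if filter(i).
function simulateRender(fnArray: number[], filter: (i: number) => boolean): number[] {
  const drawn: number[] = [];
  for (let i = 0; i < fnArray.length; i++) {
    if (filter(i)) {
      drawn.push(fnArray[i]);
    }
  }
  return drawn;
}

const drawn = simulateRender(renderFnArray, operationsFilter);
console.log(drawn); // [ 2, 44, 45 ]
```

Both text operators (44 and 45) survive, while two non-text operators (1 and one of the 2s) are dropped: text remains in some places and other content is damaged, which matches the symptoms described in this report.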

Eventually I worked around the issue by patching the built artifact directly:

--- node_modules/pdfjs-dist/legacy/build/pdf.mjs	2025-10-27 16:59:06.522295998 +1100
+++ node_modules/pdfjs-dist/legacy/build/pdf.mjs.patched	2025-10-27 16:58:47.430362471 +1100
@@ -16111,7 +16111,7 @@
         stepper.breakIt(i, continueCallback);
         return i;
       }
-      if (!operationsFilter || operationsFilter(i)) {
+      if (!operationsFilter || operationsFilter(i, operatorList)) {
         fnId = fnArray[i];
         fnArgs = argsArray[i] ?? null;
         if (fnId !== OPS.dependency) {

This patch passes both the index and the operator list to the filter function. With access to the list that render() actually uses, my filter worked perfectly.
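For the patched build, a filter could read the op code from the list it is handed, so the index always refers to the right operator. Below is a minimal, self-contained sketch under that assumption. `OperatorListLike` and `executeFiltered` are hypothetical names; the interface only assumes the `fnArray` and `argsArray` fields that the patched loop reads, and `executeFiltered` is a simplified model of that loop, not pdf.js code.

```typescript
// Minimal shape of the operator list passed to the filter by the patched
// render loop (hypothetical name; only fnArray/argsArray are assumed).
interface OperatorListLike {
  fnArray: number[];
  argsArray: unknown[];
}

// Hypothetical stand-ins for pdfjs.OPS.showText ... nextLineSetSpacingShowText
// (44-47 per the comments in the report).
const DENIED_OPS: Set<number> = new Set([44, 45, 46, 47]);

// Two-argument filter for the PATCHED build: it inspects the same list the
// renderer is iterating, so index and op code always line up.
function operationsFilter(index: number, operatorList: OperatorListLike): boolean {
  return !DENIED_OPS.has(operatorList.fnArray[index]);
}

// Simplified model of the patched loop: filter(i, operatorList) decides
// whether operator i is executed.
function executeFiltered(
  operatorList: OperatorListLike,
  filter: (i: number, list: OperatorListLike) => boolean,
): number[] {
  const executed: number[] = [];
  for (let i = 0; i < operatorList.fnArray.length; i++) {
    if (filter(i, operatorList)) {
      executed.push(operatorList.fnArray[i]);
    }
  }
  return executed;
}

// Made-up list: 44 and 45 are text ops, 1 and 2 are non-text ops.
const renderList: OperatorListLike = {
  fnArray: [2, 1, 44, 2, 45],
  argsArray: [null, null, ["Hello"], null, [["World"]]],
};
console.log(executeFiltered(renderList, operationsFilter)); // [ 2, 1, 2 ]
```

Only the text operators are skipped and every non-text operator is kept. Because the filter no longer closes over a separately fetched list, there is no second list that can drift out of sync with the one being rendered.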

The corresponding line in the source code is:

    if (!operationsFilter || operationsFilter(i)) {

What is the expected behavior?

Since the page proxy's render() function accepts an operationsFilter, it should let users filter operations effectively. However, pdf.js's public API currently gives users no way to obtain the operator list that render() uses internally.

Please consider applying the same patch so that operationsFilter is actually usable.

What went wrong?

pdf.js's public API does not let users obtain the operator list that render() uses internally, so the indices passed to operationsFilter cannot be reliably mapped to operators.

Link to a viewer

No response

Additional context

The patch in my report already provides a fix.
However, I'm not familiar with pdf.js's code base and build system, so I did not submit a PR.
Please consider patching the code to make the operationsFilter feature actually usable.
