Description
Attach (recommended) or Link to PDF file
https://github.com/mozilla/pdf.js/blob/master/test/pdfs/basicapi.pdf
Web browser and its version
Chrome Version 141.0.7390.123 (Official Build) (64-bit)
Operating system and its version
Linux NB5CG24223JP 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun 5 18:30:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
PDF.js version
5.4.296
Is the bug present in the latest PDF.js version?
Yes
Is a browser extension
No
Steps to reproduce the problem
I'm not sure whether this should be a feature request or a bug report, but since the issue makes an API feature effectively unusable, I decided to report it as a bug.
I noticed that in recent versions the page proxy's render function accepts an operationsFilter parameter; see
Line 1243 in f56dc86
| * @property {OperationsFilter} [operationsFilter] - If provided, only |
I need to render PDF documents selectively: remove all text items and keep only pictures and graphs. operationsFilter looked like exactly what I needed, so I tried to use it to skip all of the ...showText operations.
My code looked like this:
const DENIED_OPS: Set<number> = new Set([
  pdfjs.OPS.showText, // 44
  pdfjs.OPS.showSpacedText, // 45
  pdfjs.OPS.nextLineShowText, // 46
  pdfjs.OPS.nextLineSetSpacingShowText, // 47
]);
... ...
const operatorList = await page.getOperatorList();
function operationsFilter(index: number) {
  const op = operatorList.fnArray[index];
  const shouldKeep = !DENIED_OPS.has(op);
  return shouldKeep;
}
... ...
await page.render({
  canvasContext: context,
  viewport: viewport,
  operationsFilter: operationsFilter
} as RenderParameters).promise;
The result was about 90% as expected: text items disappeared and graphs remained, but some text still showed up in unexpected places, and some other parts of the page were damaged.
After digging further, I found that the operator list returned by page.getOperatorList() is not the same as the operator list that render() uses internally. The reason is that the getOperatorList() and render() methods of PDFPageProxy obtain their intentArgs differently and then use those different intentArgs to fetch the operator list. The operationsFilter function receives an index, but that index refers to the operator list used by render(), and the user has no way to get that same list (at least I could not find one).
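The mismatch can be illustrated without pdf.js at all. The sketch below is a simulation under assumptions: both arrays are made-up stand-ins for the two fnArray lists, the codes 44 and 45 come from the comments in the code above, and all other codes are arbitrary placeholders. It only shows what happens when indices from one list are looked up in a different list.

```typescript
// Text-drawing op codes, using the values noted in the report (44..47).
const TEXT_OPS: ReadonlySet<number> = new Set([44, 45, 46, 47]);

// Hypothetical stand-in for the list returned by page.getOperatorList().
const listFromGetOperatorList: number[] = [10, 44, 85, 45, 11];

// Hypothetical stand-in for the list render() iterates over internally.
// It has one extra op at index 1, so every later index is shifted by one.
const listUsedByRender: number[] = [10, 12, 44, 85, 45, 11];

// Same shape as the filter in the report: it receives only an index and
// looks that index up in the list obtained from getOperatorList().
function filterBuiltFromOtherList(index: number): boolean {
  const op = listFromGetOperatorList[index];
  return op === undefined || !TEXT_OPS.has(op);
}

// Simulate the render loop: an op runs only when the filter returns true.
const kept: number[] = listUsedByRender.filter((_, i) =>
  filterBuiltFromOtherList(i)
);

// Text ops 44 and 45 are still executed, while the non-text ops 12 and 85
// are dropped.
console.log(kept);
```

In this toy case the filter drops two non-text operations and lets both text operations through, which matches the symptoms described above: leftover text plus damage elsewhere.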
See
Line 1451 in 520363b
| const intentArgs = this._transport.getRenderingIntent( |
Line 1637 in 520363b
| /* isOpList = */ true |
In the end I worked around the issue by patching the built artifact directly:
--- node_modules/pdfjs-dist/legacy/build/pdf.mjs 2025-10-27 16:59:06.522295998 +1100
+++ node_modules/pdfjs-dist/legacy/build/pdf.mjs.patched 2025-10-27 16:58:47.430362471 +1100
@@ -16111,7 +16111,7 @@
stepper.breakIt(i, continueCallback);
return i;
}
- if (!operationsFilter || operationsFilter(i)) {
+ if (!operationsFilter || operationsFilter(i, operatorList)) {
fnId = fnArray[i];
fnArgs = argsArray[i] ?? null;
if (fnId !== OPS.dependency) {
This patch passes both the index and the operator list to the filter function. With the correct list, my code then worked perfectly.
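For reference, a filter written against the patched call could look like the sketch below. It assumes the patch above is applied, so the two-argument signature is not part of the released pdf.js API, and it assumes the list passed in exposes an fnArray like the one returned by getOperatorList(). OperatorListLike and runFiltered are hypothetical helpers that only simulate the patched loop so the example runs without pdf.js.

```typescript
// Hypothetical minimal shape of the operator list handed to the filter.
interface OperatorListLike {
  fnArray: ArrayLike<number>;
}

// Text-drawing op codes, using the values noted in the report (44..47).
const DENIED_OPS: ReadonlySet<number> = new Set([44, 45, 46, 47]);

// Filter for the patched signature: the op is looked up in the list that is
// actually being executed, so the index always refers to the right entry.
function operationsFilter(
  index: number,
  operatorList: OperatorListLike
): boolean {
  return !DENIED_OPS.has(operatorList.fnArray[index]);
}

// Stand-in for the patched loop: calls the filter with (i, operatorList)
// and collects the ops that would be executed.
function runFiltered(operatorList: OperatorListLike): number[] {
  const executed: number[] = [];
  for (let i = 0; i < operatorList.fnArray.length; i++) {
    if (operationsFilter(i, operatorList)) {
      executed.push(operatorList.fnArray[i]);
    }
  }
  return executed;
}

// Placeholder op codes other than 44 and 45 are arbitrary.
console.log(runFiltered({ fnArray: [10, 12, 44, 85, 45, 11] }));
```

Because the filter no longer closes over a separately fetched list, it stays correct regardless of which intent render() used to build its operator list.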
The exact location of the line in source code is here:
Line 794 in 520363b
| if (!operationsFilter || operationsFilter(i)) { |
What is the expected behavior?
Since the page proxy's render function accepts an operationsFilter, it should let the user filter operations effectively. However, the current public API of pdf.js gives the user no way to obtain the operator list that render() uses internally.
Please consider applying the same patch so that operationsFilter becomes usable in practice.
What went wrong?
The public API of pdf.js does not let the user obtain the operator list that the render function uses internally, so the index passed to operationsFilter cannot be mapped to an operation reliably.
Link to a viewer
No response
Additional context
The patch in this report already provides a fix.
However, I'm not familiar with the pdf.js code base or build system, so I did not submit a PR.
Please consider applying it to the source.