[Bug]: The wrong content stream is parsed: the old version of an object is used instead of the version from the incremental update. #20280

@klwjlalala

Description

Attach (recommended) or Link to PDF file

多分辨率的自动MRI图像配准.pdf
This PDF has an incrementally updated xref table, but the old objects were not removed, so the file contains two objects with the same object number (such as object 60). According to the PDF specification, the later definition should be used, but in practice only the first one is used.
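
The "later definition wins" rule can be sketched as follows. This is only an illustration of the rule, not the PDF.js implementation: `merge_xref_sections` is a hypothetical helper, and the object numbers and byte offsets are made up rather than taken from the attached file.

```python
from typing import Dict, List


def merge_xref_sections(sections: List[Dict[int, int]]) -> Dict[int, int]:
    """Merge xref sections given in file order (oldest first).

    Each section maps an object number to the byte offset of its
    definition. Entries from later sections override earlier ones.
    """
    merged: Dict[int, int] = {}
    for section in sections:
        merged.update(section)  # a newer entry replaces an older one
    return merged


# Hypothetical offsets: object 60 is defined in the original body
# and redefined by the incremental update appended to the file.
original_section = {59: 1200, 60: 1500}
update_section = {60: 98000, 61: 99000}

resolved = merge_xref_sections([original_section, update_section])
print(resolved[60])  # offset of the definition from the incremental update
```

A reader that instead keeps the first offset it sees while scanning the file from the start would resolve object 60 to the stale definition, which matches the behavior described in this report.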

Web browser and its version

Firefox version 142

Operating system and its version

Windows 11

PDF.js version

The latest version of the master branch

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

After loading the PDF, inspect the objects on each page and compare them with the objects in the raw PDF bytes.
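
One way to do the byte-level side of this comparison is a small script that lists every object header in the file and reports object numbers that are defined more than once. This is a rough sketch under stated assumptions: `find_object_offsets` and `duplicated_objects` are hypothetical helpers, the scan is a naive regular expression that can produce false matches inside binary stream data, and it does not see objects stored in compressed object streams. The sample bytes are synthetic, not the attached PDF.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches an indirect object header such as "60 0 obj".
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)[ \t\r\n]+(\d+)[ \t\r\n]+obj\b")


def find_object_offsets(data: bytes) -> Dict[Tuple[int, int], List[int]]:
    """Map (object number, generation) to every byte offset where it is defined."""
    offsets: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        offsets[key].append(match.start())
    return offsets


def duplicated_objects(data: bytes) -> Dict[Tuple[int, int], List[int]]:
    """Return only the objects that are defined more than once."""
    return {k: v for k, v in find_object_offsets(data).items() if len(v) > 1}


# Synthetic sample: object 60 is defined twice, as after an incremental update.
sample = (
    b"%PDF-1.4\n"
    b"60 0 obj\n<< /Length 0 >>\nendobj\n"
    b"61 0 obj\n<< >>\nendobj\n"
    b"60 0 obj\n<< /Length 5 >>\nendobj\n"
)

dups = duplicated_objects(sample)
for (num, gen), where in dups.items():
    # The last offset is the definition a conforming reader should use.
    print(num, gen, where, "latest at", where[-1])
```

To check a real file, read it with `open(path, "rb").read()` and pass the bytes to `duplicated_objects`; the last offset listed for each object can then be compared with what the viewer actually loaded.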

What is the expected behavior?

The result should be consistent with the underlying PDF: after the incremental update, the first page should be associated with five content stream objects. Instead, the old version is returned, which is associated with only one object.

What went wrong?

The fault tolerance for non-standard PDFs is insufficient. The PDFBox library for Java parses this file correctly.

Link to a viewer

No response

Additional context

No response
