If an Azure blob stream is passed to Conversion.ToImages, the conversion fails with an error. However, it succeeds if the blob is first copied to the file system and a FileStream is used instead. The problem does not occur for all PDF files, only specific ones (link below).

I tried the PDFtoImage and Syncfusion.PdfToImageConverter.Net libraries, which are both based on the PDFium engine, and both fail with the same error.

Pdf download link

Example code using PDFtoImage:

public async Task<List<SKBitmap>> TestConvertPdfToImagesAsync()
{
    var blobServiceClient = new BlobServiceClient("?");
    
    BlobClient blobClient = blobServiceClient
        .GetBlobContainerClient("test")
        .GetBlobClient("map.pdf");
    
    var bitmaps = new List<SKBitmap>();
    
    using (Stream pdfStream = await blobClient.OpenReadAsync())
    {
        foreach (SKBitmap bitmap in Conversion.ToImages(pdfStream, true, null, new RenderOptions { Dpi = 220 }))
        {
            bitmaps.Add(bitmap);
        }
    }
    
    return bitmaps;
}

This is the error I'm receiving:

System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
   at PDFtoImage.Internals.NativeMethods.Imports.FPDF_LoadPage(IntPtr document, Int32 page_index)
   at PDFtoImage.Internals.NativeMethods.FPDF_LoadPage(IntPtr document, Int32 page_index)
   at PDFtoImage.Internals.PdfFile.PageData..ctor(IntPtr document, IntPtr form, Int32 pageNumber)
   at PDFtoImage.Internals.PdfFile.RenderPDFPageToBitmap(Int32 pageNumber, IntPtr bitmapHandle, Int32 boundsOriginX, Int32 boundsOriginY, Int32 boundsWidth, Int32 boundsHeight, Int32 rotate, FPDF flags, Boolean renderFormFill)
   at PDFtoImage.Internals.PdfDocument.RenderSubset(PdfFile file, Int32 page, Single width, Single height, PdfRotation rotate, FPDF flags, Boolean renderFormFill, SKColor backgroundColor, Nullable`1 bounds, Single originalWidth, Single originalHeight, CancellationToken cancellationToken)
   at PDFtoImage.Internals.PdfDocument.Render(Int32 page, Nullable`1 requestedWidth, Nullable`1 requestedHeight, Single dpiX, Single dpiY, PdfRotation rotate, FPDF flags, Boolean renderFormFill, SKColor backgroundColor, Nullable`1 bounds, Boolean useTiling, Boolean withAspectRatio, Boolean dpiRelativeToBounds, CancellationToken cancellationToken)
   at PDFtoImage.Conversion.RenderImpl(PdfDocument pdfDocument, Int32 page, FPDF renderFlags, RenderOptions options)
   at PDFtoImage.Conversion.ToImagesImpl(PdfDocument pdfDocument, RenderOptions options, IEnumerable`1 pages)+MoveNext()
   at PDFtoImage.Conversion.ToImagesImpl(Stream pdfStream, Boolean leaveOpen, String password, RenderOptions options, IEnumerable`1 pages)+MoveNext()
   at PDFtoImage.Conversion.ToImages(Stream pdfStream, IEnumerable`1 pages, Boolean leaveOpen, String password, RenderOptions options)+MoveNext()
   at Biometric.Tests.Tests.PdfToImageCovert()

  • Copy the Azure blob stream to a MemoryStream or temporary file first before passing it to Conversion.ToImages. Commented May 20 at 17:51

1 Answer

If an Azure stream is passed to Conversion.ToImages, the conversion fails with an error in .NET.

As I understand it, passing the Azure Blob stream directly to Conversion.ToImages causes an SEHException, but only for some PDFs.

The issue is due to how the underlying PDFium engine handles streams, especially ones coming directly from Azure Blob Storage.

Instead of passing the stream from blobClient.OpenReadAsync() directly, I first copied the content into a MemoryStream and then passed that to Conversion.ToImages.

Program.cs

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using PDFtoImage;
using SkiaSharp;

class Program
{
    static async Task Main(string[] args)
    {
        // Read the connection settings from appsettings.json.
        var config = JsonSerializer.Deserialize<Dictionary<string, string>>(File.ReadAllText("appsettings.json"));
        var blobServiceClient = new BlobServiceClient(config["BlobConnectionString"]);
        var blobClient = blobServiceClient
            .GetBlobContainerClient(config["ContainerName"])
            .GetBlobClient(config["PdfFileName"]);
        var bitmaps = new List<SKBitmap>();
        using (Stream azureStream = await blobClient.OpenReadAsync())
        using (var memoryStream = new MemoryStream())
        {
            // Buffer the whole blob in memory so the converter reads from a local stream.
            await azureStream.CopyToAsync(memoryStream);
            memoryStream.Position = 0;
            foreach (SKBitmap bitmap in Conversion.ToImages(memoryStream, true, null, new RenderOptions { Dpi = 220 }))
            {
                bitmaps.Add(bitmap);
            }
        }
        Console.WriteLine($"Converted {bitmaps.Count} pages to images.");
        for (int i = 0; i < bitmaps.Count; i++)
        {
            string outputPath = $"output_page_{i + 1}.png";
            // File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes.
            using var imageStream = File.Create(outputPath);
            bitmaps[i].Encode(imageStream, SKEncodedImageFormat.Png, 100);
            Console.WriteLine($"Saved: {outputPath}");
        }
    }
}

appsettings.json

{
  "BlobConnectionString": "<Your connection string>",
  "ContainerName": "<Your Container name>",
  "PdfFileName": "<Your pdfname>"
}

Output:

Image

The PDF blob was converted to images successfully, as shown below.

Image1
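
If holding the whole PDF in a MemoryStream is a concern, the question already notes that a FileStream works too. Here is a minimal sketch of that variant. The class and method names (StreamBuffering, CopyToTempFileAsync) are hypothetical and not part of PDFtoImage or the Azure SDK. It spools the blob to a temporary file that is deleted automatically when the stream is disposed.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class StreamBuffering
{
    // Hypothetical helper: copies any readable stream into a temp file and returns
    // a seekable FileStream positioned at 0. FileOptions.DeleteOnClose removes the
    // file when the returned stream is disposed.
    public static async Task<FileStream> CopyToTempFileAsync(
        Stream source, CancellationToken cancellationToken = default)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        var file = new FileStream(
            path, FileMode.CreateNew, FileAccess.ReadWrite, FileShare.None,
            81920, FileOptions.Asynchronous | FileOptions.DeleteOnClose);
        try
        {
            await source.CopyToAsync(file, 81920, cancellationToken);
            file.Position = 0; // rewind so the consumer reads from the start
            return file;
        }
        catch
        {
            file.Dispose(); // also deletes the partially written temp file
            throw;
        }
    }
}
```

In Program.cs this would replace the MemoryStream: obtain the stream with `using FileStream pdfFile = await StreamBuffering.CopyToTempFileAsync(azureStream);` and pass `pdfFile` to Conversion.ToImages. Memory use stays flat regardless of PDF size, at the cost of one pass of disk I/O.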


2 Comments

The customer often sends large PDF files such as construction blueprints and schemas, so loading them into memory or copying them to the file system seems like a waste of resources and adds latency. I was looking for a more optimal solution. Can somebody suggest a more flexible PDF-to-image library?
@Linas For large PDFs like blueprints, consider Aspose.PDF or a hosted Ghostscript/Poppler-based microservice that handles the conversion without loading entire files into memory.
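
On avoiding the full copy: one possible explanation, which is an assumption and not confirmed anywhere in this thread, is that a network-backed stream may return fewer bytes from a single Read call than were requested, and that the converter treats such a short read as a failure. If that is the cause, a thin wrapper that keeps reading until the request is filled might be enough. The sketch below uses a hypothetical class name (FullReadStream) and assumes the blob stream is seekable.

```csharp
using System;
using System.IO;

// Hypothetical wrapper, not part of PDFtoImage or the Azure SDK. It forwards to an
// inner stream but loops in Read until the requested count is filled or the stream ends.
public sealed class FullReadStream : Stream
{
    private readonly Stream _inner;

    public FullReadStream(Stream inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;

    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = _inner.Read(buffer, offset + total, count - total);
            if (read == 0) break; // end of stream reached
            total += read;
        }
        return total;
    }

    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose(); // the wrapper owns the inner stream
        base.Dispose(disposing);
    }
}
```

It would be used as `using var pdfStream = new FullReadStream(await blobClient.OpenReadAsync());` before calling Conversion.ToImages. Every block the converter requests then becomes a read against blob storage, so this trades memory for network round trips. Whether it actually removes the SEHException would need to be tested against the failing PDF.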
