
I am trying to build a container image for a Node.js Lambda function. My base image is like this:

FROM public.ecr.aws/lambda/nodejs:20

COPY index.js ${LAMBDA_TASK_ROOT}

CMD [ "index.handler" ]

However, my Node.js function also uses the pdf2htmlEX package. One way to install it is with apt-get, but running apt-get in the Dockerfile above fails with "command not found". That is understandable, because apt-get is not available in the Node.js image from AWS.

Maybe that's not the way to do it. Ultimately, how do I get a Node.js Lambda function to execute a Linux package (pdf2htmlEX in this case)?


Update 1: After testing various options and combinations, I created an alternative base image in order to install pdf2htmlEX, Node.js and npm:

ARG FUNCTION_DIR="/function"
FROM ubuntu:18.04
ARG FUNCTION_DIR
ENV NODE_VERSION=16.13.0
RUN apt-get update
COPY ./pdf2htmlEX.deb /tmp
# Install pdf2htmlEX and node.js and related packages
RUN apt-get install -y /tmp/pdf2htmlEX.deb curl cmake autoconf libtool
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
ENV NVM_DIR=/root/.nvm
RUN . "$NVM_DIR/nvm.sh" && nvm install ${NODE_VERSION}
RUN . "$NVM_DIR/nvm.sh" && nvm use v${NODE_VERSION}
RUN . "$NVM_DIR/nvm.sh" && nvm alias default v${NODE_VERSION}
ENV PATH="/root/.nvm/versions/node/v${NODE_VERSION}/bin/:${PATH}"
RUN mkdir -p ${FUNCTION_DIR}
COPY index.js package.json ${FUNCTION_DIR}
WORKDIR ${FUNCTION_DIR}
RUN npm install aws-lambda-ric
CMD [ "index.handler" ]

The build failed when installing aws-lambda-ric, the runtime interface client required when using an alternative base image. The error log is too long to post here, and this may not be the right configuration anyway.


Update 2: Another attempt using node:20-buster:

ARG FUNCTION_DIR="/function"

FROM node:20-buster as build-image

# Include global arg in this stage of the build
ARG FUNCTION_DIR

COPY ./pdf2htmlEX.deb /tmp
# Install build dependencies
RUN apt-get update && \
apt-get install -y \
/tmp/pdf2htmlEX.deb

Got another type of error:

The following packages have unmet dependencies:
pdf2htmlex : Depends: libjpeg-turbo8 but it is not installable
Unable to correct problems, you have held broken packages.
  • Your problem is not with Docker or Lambda. It is with pdf2htmlEX. I cannot even install it on a fresh Ubuntu OS. For HTML to PDF there are a lot of well-supported tools. Commented Jun 2, 2024 at 14:09
  • @JRichardsz This tool is for converting PDF to HTML, though. It offers finer control for online publishing, which is quite different from converting HTML to PDF. It is the best open source tool for this purpose that I can find. Any other recommendations? Commented Jun 4, 2024 at 16:03

1 Answer

Summary

  • The public AWS images for Lambda with Node.js are based on Amazon Linux, an RPM-based distribution in the CentOS/Fedora family, so apt-get is not available in them:
    • FROM public.ecr.aws/lambda/nodejs:20
    • FROM amazon/aws-lambda-nodejs:18
  • pdf2htmlEX does not support CentOS or Fedora.
  • If you build your own Docker image from Ubuntu without following the AWS specifications, it will throw errors or behave unexpectedly when you deploy it to a real AWS account.
  • The specifications I discovered are: the filesystem is read-only except for /tmp, and aws-lambda-ric is required.
  • To build and test on localhost in a way that is compatible with the real AWS servers, you need aws-lambda-runtime-interface-emulator.
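
The read-only constraint can be checked from inside the function itself. The following is a minimal sketch, not part of the tested image: the helper names isWritable and writableReport are hypothetical, and it only assumes Node's built-in fs and os modules. On Lambda, os.tmpdir() is expected to resolve to /tmp.

```javascript
const fs = require('fs');
const os = require('os');

// Return true if the current process may write to dir, false otherwise
// (including when dir does not exist or sits on a read-only filesystem).
function isWritable(dir) {
    try {
        fs.accessSync(dir, fs.constants.W_OK);
        return true;
    } catch {
        return false;
    }
}

// Hypothetical diagnostic: map each candidate directory to whether it accepts writes.
function writableReport(dirs = [os.tmpdir(), process.cwd()]) {
    return Object.fromEntries(dirs.map((dir) => [dir, isWritable(dir)]));
}
```

Logging writableReport() from a handler is one way to confirm that files for pdf2htmlEX should be written under /tmp rather than next to the function code.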

Attempts

  • I tried to convert the .deb to an .rpm (using the alien tool) and then install it on the CentOS-style image. After fixing almost all the errors, one last error, related to a JPEG library, was fatal.

Steps

  • Build a base image compatible with real AWS servers and local testing
  • At the end, install the pdf2htmlEX tool with its prerequisites

Folder and files

I tried this image locally and in a real aws account.

The folder should look like this:

[screenshot: folder layout]

Dockerfile

# Get a base image
FROM public.ecr.aws/ubuntu/ubuntu:22.04

# Set some defaults
ARG LAMBDA_TASK_ROOT="/app"
ARG LAMBDA_RUNTIME_DIR="/usr/local/bin"
ARG PLATFORM="linux/amd64"

RUN groupadd --gid 1000 node; \
    useradd --uid 1000 --gid node --shell /bin/bash --create-home node

# Node.js (installed through nvm)
ENV NVM_DIR=/usr/local/nvm
ENV NODE_VERSION=v20.13.0
RUN mkdir -p /usr/local/nvm && apt-get update && apt-get install -y curl
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
RUN /bin/bash -c "source $NVM_DIR/nvm.sh && nvm install $NODE_VERSION && nvm use --delete-prefix $NODE_VERSION"
ENV NODE_PATH=$NVM_DIR/versions/node/$NODE_VERSION/bin
ENV PATH=$NODE_PATH:$PATH

WORKDIR /app

## Install the build dependencies needed by aws-lambda-ric, then the package itself
RUN apt-get update; \
    apt-get install -y \
        g++ \
        make \
        cmake \
        unzip \
        libcurl4-openssl-dev \
        autoconf \
        automake \
        build-essential \
        libtool \
        m4 \
        python3 \
        libssl-dev; \
    rm -rf /var/lib/apt/lists/*;
RUN npm install aws-lambda-ric -g

# Copy function code
COPY ./app/ ${LAMBDA_TASK_ROOT}/
RUN npm install 


# Prevent this warning:
# npm WARN logfile Error: ENOENT: no such file or directory, scandir '/home/sbx_user1051/.npm/_logs'
# https://stackoverflow.com/a/73394694/3957754
RUN mkdir -p /tmp/.npm/_logs
ENV npm_config_cache=/tmp/.npm

# (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
WORKDIR ${LAMBDA_TASK_ROOT}
ADD "https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie" "/usr/bin/aws-lambda-rie"
COPY entry.sh /
RUN chmod 755 "/usr/bin/aws-lambda-rie" "/entry.sh"

## Install pdf2htmlEX

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update; \
    apt-get -y install tzdata libjpeg-turbo8 wget gpg curl xz-utils jq libglib2.0-dev libcairo2-dev

RUN curl -s https://api.github.com/repos/pdf2htmlEX/pdf2htmlEX/releases/latest  | jq -r '.assets[] | select(.name=="pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-bionic-x86_64.deb").browser_download_url'  | wget -qi - -O /tmp/pdf2htmlEX.deb

RUN dpkg -i /tmp/pdf2htmlEX.deb
RUN pdf2htmlEX -v


ENTRYPOINT [ "/entry.sh" ]
CMD [ "app.handler" ]

entry.sh

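#!/bin/sh
# Outside Lambda, AWS_LAMBDA_RUNTIME_API is unset, so wrap the runtime
# interface client in the emulator; inside Lambda, run the client directly.
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
  exec /usr/bin/aws-lambda-rie npx aws-lambda-ric "$1"
else
  exec npx aws-lambda-ric "$1"
fi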
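
For local testing, the emulator started by entry.sh accepts invocations over HTTP on container port 8080. The sketch below is an illustration only: the function names invocationUrl and invokeLocal are hypothetical, the host port 9000 is an assumption (it depends on how the container port is published), and the global fetch requires Node.js 18 or later.

```javascript
// Build the invocation URL exposed by the Runtime Interface Emulator.
// hostPort is whichever host port was mapped to the container's port 8080.
function invocationUrl(hostPort = 9000) {
    return `http://localhost:${hostPort}/2015-03-31/functions/function/invocations`;
}

// POST a JSON event to the locally running container and return the parsed result.
async function invokeLocal(event = {}, hostPort = 9000) {
    const res = await fetch(invocationUrl(hostPort), {
        method: 'POST',
        body: JSON.stringify(event),
    });
    return res.json();
}
```

Assuming the container was started with something like docker run -p 9000:8080 followed by the image name, calling invokeLocal({}) should return whatever the handler returns.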

app.js

const exec = require('util').promisify(require('child_process').exec);

exports.handler = async (event, context) => {

    // Run pdf2htmlEX; on failure, keep the error object so it is logged instead of thrown
    let out = await exec(`pdf2htmlEX -v`).catch(e => e);

    console.log("pdf2htmlEX command", JSON.stringify(out));

    return {
        statusCode: 200,
        body: {
            code: 200,
            message: "Hello!!"
        }
    };
};

pdf2htmlEX

As proof that pdf2htmlEX was installed, I printed the output of pdf2htmlEX -v:

on localhost

[screenshot: pdf2htmlEX -v output on localhost]

inside the container

[screenshot: pdf2htmlEX -v output inside the container]

in a real AWS account

[screenshot: pdf2htmlEX -v output in a real AWS account]

dockerhub

I published the image, so it is ready to use:

https://hub.docker.com/repository/docker/jrichardsz/aws-lambda-nodejs/general


1 Comment

Thank you so much for taking the time to build the image.
