I've got a Node server set up that accepts PDF uploads and uploads them to a Hedera blockchain testnet.

app.post('/upload', upload.single('pdf'), function (req, res) {
    uploadToBlockchain(req.file)
    res.send('finished')
})

I also have a Python script that does some processing on a file, given the file's bytes:

from Transformer import Transformer as tf

with open('file_path', 'rb') as pdf_file:
    images = pdf_file.read()

a = tf.bytes_to_hash_array(images)
print(a)

Is there a way to link these two up, so that I can pipe the PDF data I receive in Node (req.file) to the Python script, have it processed, and get the results back? I've tried a few things with child_process but haven't managed to get the file open in Python.

This is what I've tried in the Node server.js:

var pythonOut;
const python = spawn('python', ['Scripts/convert_pdf.py', req.file]);
python.stdout.on('data', function (data) {
    pythonOut = data;
});
python.on('close', (code) => {
    console.log(`child process close all stdio with code ${code}`);
    console.log(pythonOut);
});
uploadToBlockchain(pythonOut);

When I read sys.argv[1] in Python, this gives me [object Object], which I'm not sure how to open as a file to pass to my Transformer object. I've also tried passing req.file.buffer as the argument, but that sends a string representation of the buffer rather than the actual bytes in it.

  • What child process command have you tried? Commented Sep 8, 2021 at 17:58
  • @Alex028502 added in main post Commented Sep 8, 2021 at 18:21

1 Answer

You received [object Object] in argv because spawn accepts only an array of strings as arguments, while you passed an object (req.file), which ended up converted to the string [object Object].

Unfortunately I can't help you with the Python code, but I can help with the Node part and explain what you have to do:

  1. In convert_pdf.py, output a base64-encoded version of your results. Make sure the script runs correctly and prints only the base64 output! (Or skip this for now and read the 3rd point.)
  2. Prepare your spawn process:

var pythonOut;
/**
 * Create a root function.
 * It's better to use an absolute path if you don't know the cwd (current working directory).
 * The arguments passed to root(x, y, z, ...n) resolve relative to __dirname (the folder containing the running node script file; I assume index.js).
 */
const root = (...input) => require('path').resolve(__dirname, ...input);

const pythonScript = root("make/sure/the/path/is/valid", "Scripts/convert_pdf.py");
// If you're using Multer with memory storage, use line 1
// If you're using Multer with disk storage (an upload path), use line 2
const uploadedFile = req.file.buffer.toString('base64'); // 1
//const uploadedFile = req.file.path; // 2 - full path of the saved file (req.file.destination is only the folder); this might be what you need, looking at your python code

console.log("pythonScript", pythonScript);
console.log("uploadedFile", uploadedFile);

// If you're running under Windows and Notepad (or your default editor) opens with the script contents, add the ".exe" extension to the command, e.g. python.exe
const command = 'python'; 

const {spawn} = require('child_process');
const python = spawn(command, [pythonScript, uploadedFile], {
  shell: true, // use the local shell
  env: {
    // Send node environment to the python process, it might need something from it, or not
    ...process.env
    // or add additional "key": "value" and send it to python env
  },
  // This is the folder from which the python command runs; by default it's process.cwd()
  // You don't need cwd now since pythonScript and uploadedFile have absolute paths
  // Set this correctly if you're using relative paths in your command arguments.
  // cwd: root("path/to/", "Scripts", "folder")
});

let storeDataChunks = [];

// Strip the newlines from each stdout chunk and store the non-empty rows in storeDataChunks
const output = (data) => {
  let str = data.toString().split(/\r?\n/).filter(i => i.trim().length);
  storeDataChunks = storeDataChunks.concat(str);
};
python.stdout.on('data', output);
python.stderr.on('data', (data) => {
  console.log('[errors from python]', data.toString());
});
python.on('close', (code) => {
  console.log(`child process close all stdio with code ${code}`);
  let pythonOut = storeDataChunks.join(''); // this will be the base64 string sent from python
  
  // Important: if you plan on sending a whole file such as "images/media with headers" from python do not use .toString() on the buffer.
  let decode = Buffer.from(pythonOut, 'base64').toString(); //decode base64
  decode = JSON.parse(decode); // You don't need this line if it's not an object
  console.log(decode);

  // Run your upload function here, once the process is done.
  uploadToBlockchain(decode);
});

  3. As a test, print the line below from convert_pdf.py. The base64 string was generated in the browser console:

browser: btoa(JSON.stringify(["hello\nworld", 2, {a: 3}, null]))

python: print("WyJoZWxsb1xud29ybGQiLDIseyJhIjozfSxudWxsXQ==")
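
For the Python side, which the answer leaves open, here is a minimal sketch of what convert_pdf.py could look like, assuming Node passes the upload as a base64 string in the first argument (line 1 in the Node code). The helper names decode_argument, process and encode_result are hypothetical, and process is only a stand-in (a SHA-256 digest) for the asker's tf.bytes_to_hash_array, which isn't available here.

```python
import base64
import hashlib
import json
import sys


def decode_argument(b64_text):
    """Turn the base64 string received in argv back into the raw file bytes."""
    return base64.b64decode(b64_text)


def process(pdf_bytes):
    """Hypothetical stand-in for tf.bytes_to_hash_array(pdf_bytes)."""
    return [hashlib.sha256(pdf_bytes).hexdigest()]


def encode_result(result):
    """Serialize the result as compact JSON, then base64-encode it for stdout."""
    payload = json.dumps(result, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(payload).decode("ascii")


def main(argv):
    pdf_bytes = decode_argument(argv[1])
    # Print only the encoded result, so Node can decode stdout directly.
    print(encode_result(process(pdf_bytes)))


# Only run when invoked as: python convert_pdf.py <base64-data>
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

On the Node side, Buffer.from(pythonOut, 'base64').toString() followed by JSON.parse should then recover the list. One caveat: operating systems limit the length of a command line, so a large PDF may not fit in a single argument; writing the bytes to the child's stdin is a common alternative in that case.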

1 Comment

Thanks a lot, just needed the b64 conversion of the file into convert_pdf.py to get it working, the code I tried before works with python just piping non b64 encoded back into node
