Upload attachment to Confluence using Nifi ExecuteScript in Python

Question

I'm trying to upload a PDF file to Confluence using Nifi's ExecuteScript processor. The upload succeeds, but when I download and open the file, it's BLANK. There must be something wrong with my conversion. Can anyone please help me check?

So this is how I do it:

1. Download the PDF file from an internal API
2. ExecuteScript Groovy - to convert the flowfile content to an attribute

import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets

flowFile = session.get()
if(!flowFile)return
def text = ''
session.read(flowFile, {inputStream ->
  text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
} as InputStreamCallback)

flowFile = session?.putAttribute(flowFile, "file_content", text)
session.transfer(flowFile, /*ExecuteScript.*/ REL_SUCCESS)

3. ExecuteScript Python - to upload PDF file to Confluence

Here's my code for step 3. I think the problem is here:

import json
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
from org.apache.nifi.processor.io import OutputStreamCallback

class OutputWrite(OutputStreamCallback):
    def __init__(self, obj):
        self.obj = obj
    def process(self, outputStream):
        # Write the JSON response from Confluence as the new FlowFile content
        outputStream.write(bytearray(json.dumps(self.obj).encode('utf-8')))

flowFile = session.get()
if (flowFile != None):
  url = 'https://myconfluence.com/rest/api/content/12345/child/attachment'
  auth = 'myauthorization'
  file_name = 'mypdf.pdf'
  file_content = flowFile.getAttribute('file_content')

  s = requests.Session()

  m = MultipartEncoder(fields={'file': (file_name, file_content, 'application/pdf')})
  headers = {"X-Atlassian-Token":"nocheck", "Authorization":auth, "Content-Type":m.content_type}

  r = s.post(url, data=m, headers=headers, verify=False)

  session.write(flowFile, OutputWrite(json.loads(r.text)))
  session.transfer(flowFile, REL_SUCCESS)
  session.commit()
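In step 2, the Groovy script decodes the FlowFile's bytes with `IOUtils.toString(inputStream, StandardCharsets.UTF_8)` before the Python script uploads the resulting attribute. A PDF usually contains byte sequences that are not valid UTF-8, and a decoder in replace mode substitutes U+FFFD for them, so the original bytes cannot be recovered from the string. A minimal Python sketch of the same effect, assuming a hypothetical sample of PDF header bytes:

```python
# Sketch: a binary PDF does not survive a round trip through a UTF-8 string.
# The sample is illustrative; %\xe2\xe3\xcf\xd3 is the "binary marker" comment
# found near the top of many PDF files.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def utf8_round_trip(data: bytes) -> bytes:
    """Decode bytes as UTF-8 (replacing invalid sequences), then re-encode."""
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8,
    # similar to how Java's InputStreamReader treats malformed input.
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")


restored = utf8_round_trip(sample)
print("original:", len(sample), "bytes; after round trip:", len(restored), "bytes")
print("identical:", restored == sample)
```

This would be consistent with an upload that has a plausible size but opens blank: the structure of the file is mostly ASCII, while the compressed page streams are the part that gets mangled.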

UPDATE 06/28/2019

I decided to follow Peter's advice and merge the two scripts into one. It's still not working. Before, the uploaded PDF was 2 MB but blank. Now its size is 0 KB. Any help would be greatly appreciated!

import json
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
from org.apache.nifi.processor.io import OutputStreamCallback
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

class PyInputStreamCallback(InputStreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

class OutputWrite(OutputStreamCallback):
    def __init__(self, obj):
        self.obj = obj
    def process(self, outputStream):
        outputStream.write(bytearray(json.dumps(self.obj).encode('utf-8')))

text = ''
flowFile = session.get()
if(flowFile != None):
    session.read(flowFile, PyInputStreamCallback())
    confluence_attachment_api = flowFile.getAttribute('confluence_attachment_api')
    confluence_authorization = flowFile.getAttribute('confluence_authorization')
    file_name = flowFile.getAttribute('file_name')

    s = requests.Session()
    m = MultipartEncoder(fields={'file': (file_name, text, 'application/pdf')})
    headers = {"X-Atlassian-Token":"nocheck", "Authorization":confluence_authorization, "Content-Type":m.content_type}
    r = s.post(confluence_attachment_api, data=m, headers=headers, verify=False)

    session.write(flowFile, OutputWrite(json.loads(r.text)))
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()
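In the merged script, `text = IOUtils.toString(...)` inside `PyInputStreamCallback.process` binds a local variable, so the module-level `text` passed to `MultipartEncoder` is still `''`, which would match the 0 KB upload. A plain-Python sketch of the problem and of one fix (keeping the raw bytes on the callback instance), assuming `io.BytesIO` as a stand-in for NiFi's `InputStream`; the class names are hypothetical:

```python
import io


class BuggyCallback(object):
    """Mirrors the merged script: the result is bound to a local name."""

    def process(self, input_stream):
        text = input_stream.read()  # local variable; discarded when process() returns


class BytesCallback(object):
    """Keeps the raw bytes on the instance so the caller can use them."""

    def __init__(self):
        self.content = None

    def process(self, input_stream):
        self.content = input_stream.read()  # bytes, no charset decoding


# Hypothetical FlowFile content; io.BytesIO stands in for NiFi's InputStream.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

text = b""
BuggyCallback().process(io.BytesIO(pdf_bytes))
print(len(text))  # 0: the module-level text was never updated

cb = BytesCallback()
cb.process(io.BytesIO(pdf_bytes))
print(cb.content == pdf_bytes)  # True
```

In the Jython script itself, the byte-oriented counterpart of `IOUtils.toString` is `IOUtils.toByteArray(inputStream)` from the same Commons IO class. Declaring `global text` inside `process` would also update the module variable, but it would still hold a UTF-8-decoded string.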

Just add some prints to the console in your code to see where you're losing the data. The only issue I can see is that you are reading the file as text with PyInputStreamCallback, but the content could be binary.
– daggett, commented Jul 9, 2019 at 9:23
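Following the point about binary content: whatever builds the request, the `file` part has to carry the FlowFile's raw bytes, not a decoded string. As a stdlib-only illustration (swapped in for `MultipartEncoder`, and written for CPython 3 rather than the Jython engine ExecuteScript uses), here is a hand-built multipart/form-data body; `build_multipart` is a hypothetical helper, and the URL and authorization value are the question's placeholders:

```python
import urllib.request
import uuid


def build_multipart(field, file_name, payload, content_type="application/pdf"):
    """Return (body, header_value) for a multipart/form-data body with one file part.

    `payload` must be bytes; it is copied into the body unchanged, so binary
    content such as a PDF is preserved byte for byte.
    """
    # Random boundary; assumed not to occur inside the payload.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{file_name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


# Hypothetical stand-in for the FlowFile's raw content.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
body, content_type = build_multipart("file", "mypdf.pdf", pdf_bytes)

# Same headers as the question; "myauthorization" is the question's placeholder.
headers = {
    "X-Atlassian-Token": "nocheck",
    "Authorization": "myauthorization",
    "Content-Type": content_type,
}
url = "https://myconfluence.com/rest/api/content/12345/child/attachment"
req = urllib.request.Request(url, data=body, headers=headers, method="POST")
# urllib.request.urlopen(req) would send it; not called in this sketch.
print(req.get_method(), len(body), "bytes; PDF bytes intact:", pdf_bytes in body)
```

The same principle applies to the `(file_name, content, 'application/pdf')` tuple passed to `MultipartEncoder` in the question: the content should be the bytes read from the FlowFile.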

Peter · Accepted Answer · 2019-06-28 18:04:15Z


It doesn't look like you are actually sending the FlowFile contents. Instead, you are sending an attribute named file_content as the file contents, which probably isn't what you intended.

You will need to call session.read to get the file stream. The code below doesn't work as is, but it shows how you can get access to the stream.

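from org.apache.nifi.processor.io import InputStreamCallback
from requests_toolbelt.multipart.encoder import MultipartEncoder

class PyInputStreamCallback(InputStreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream):
        # Sketch only: m is local and never sent; the upload itself would
        # also have to happen here, while the stream is open
        m = MultipartEncoder(fields={'file': (file_name, inputStream, 'application/pdf')})

session.read(flowFile, PyInputStreamCallback())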

Ref: https://community.hortonworks.com/articles/75545/executescript-cookbook-part-2.html
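The answer's callback builds `m` but never uses it, and the reason the work belongs inside `process` is that the stream handed to the callback should only be used while `process` runs. A minimal sketch that models this with a hypothetical `fake_session_read` (not a NiFi API) which closes the stream when the callback returns; `io.BytesIO` stands in for NiFi's `InputStream`:

```python
import io


def fake_session_read(content, callback):
    """Hypothetical stand-in for session.read(): the stream is closed as soon
    as the callback returns (modelling the read-inside-the-callback rule)."""
    stream = io.BytesIO(content)
    try:
        callback.process(stream)
    finally:
        stream.close()


class KeepStream(object):
    def process(self, input_stream):
        self.stream = input_stream  # keeps a reference but reads nothing yet


class ConsumeInside(object):
    def process(self, input_stream):
        # Do the real work here (read the bytes, build and send the upload).
        self.payload = input_stream.read()


pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"  # hypothetical FlowFile content

late = KeepStream()
fake_session_read(pdf_bytes, late)
try:
    late.stream.read()
    read_after_close = True
except ValueError:  # io.BytesIO raises ValueError once it is closed
    read_after_close = False
print("read after callback returned:", read_after_close)  # False

early = ConsumeInside()
fake_session_read(pdf_bytes, early)
print("bytes read inside callback:", len(early.payload))
```

Either read the bytes inside `process` and store them on the callback (then upload afterwards), or perform the upload inside `process` directly.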


3 Comments

Raii Over a year ago

Hello @Peter, I think I did that here: "ExecuteScript Python - to convert the flowfile content to attribute". The content is in [file_content]. I'll edit my question and include the code (not just the screenshot).

Peter Over a year ago

@Raii - Sorry, the image was a little small, so I assumed it was the same processor you're using to post the data. Anyway, I would suggest not doing it that way. Just combine them into a single script and see how that goes.

Raii Over a year ago

Hello @Peter, I did my best and merged the code. I posted it as an update above. Can you help check if you have time? Thanks!
