
I have a Spark job that writes individual JSON files to a storage account. I'm trying to use Stream Analytics (SA) to read the JSON and post an event to Event Hub. It seems like it should be super simple using the no-code editor: I just define my input (ADLS Gen2) and my output (Event Hub). SA can preview the data in the JSON files, and the test connections to both input and output succeed. However, when I start the job and create files in the folder path, SA sees them, and the input count in the metrics looks right, but I see no output events and my watermark delay just keeps climbing. I don't see any errors other than, an hour later, some sort of timeout. I'm only pushing about 12 files at a time, so I'd be hard-pressed to say volume is the issue here.

All the documentation I see online is about moving data from EH to Storage, nothing on the reverse. I'm just wondering if my JSON output is messed up somehow.

My SA query is about as simple as it can get. But maybe that's part of the problem:

SELECT * INTO eventhub FROM JsonFiles

It seems super hard to troubleshoot this thing. I can't inspect the inputs or outputs, and it doesn't seem to generate errors; it just says, hey, your watermark delay keeps going up and you have no output events. WHY don't I have output events, SA? I think the watermark delay means there are events ready to output that haven't been output yet. Help?
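For anyone hitting a similar wall, one thing worth ruling out first is the JSON framing itself: ASA's JSON input expects either line-separated objects or a single top-level array, and Spark's JSON writer emits line-separated output by default. A quick local sanity check could look like this (an illustrative sketch, not part of the asker's setup):

```python
import json

def classify_json_file(text: str) -> str:
    """Classify file contents as 'line-separated', 'array', or 'invalid'."""
    stripped = text.strip()
    if not stripped:
        return "invalid"
    # A single top-level JSON array matches ASA's "Array" serialization.
    if stripped.startswith("["):
        try:
            parsed = json.loads(stripped)
            return "array" if isinstance(parsed, list) else "invalid"
        except json.JSONDecodeError:
            return "invalid"
    # Otherwise expect one JSON object per line (Spark's default output).
    for line in stripped.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            if not isinstance(json.loads(line), dict):
                return "invalid"
        except json.JSONDecodeError:
            return "invalid"
    return "line-separated"
```

Running this over a downloaded sample file tells you whether the files match a format ASA can actually deserialize; "invalid" (e.g. comma-joined objects with no enclosing array) would explain inputs being counted but never emitted.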

  • So you want to use Azure Stream Analytics (ASA) to read JSON files from ADLS Gen2 and write them to Event Hub, especially when going in this "reverse" direction (compared to the more common EH → ADLS path). Commented May 21 at 12:22

1 Answer


So I figured this out myself. My Event Hub had a cleanup policy of "Compact" rather than "Delete". Apparently, pushing messages to an Event Hub with the "Compact" cleanup policy requires a partition key, which I was not including. The only way I found this out was the Log Analytics table named AZMSDiagnosticErrorLogs. It had a single error repeated:

compacted event hub does not allow null message key.

There were no error messages anywhere else that I could find.

So to fix it, in my Stream Analytics output settings, I set the Partition key column to one of my columns.
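The error makes sense given how log compaction works: a compacted hub retains only the latest event per key, so every event needs a key to tell compaction which earlier events it supersedes. A toy sketch of that retention model (illustrative only, not the Event Hubs implementation):

```python
from typing import Optional

class CompactedLog:
    """Toy model of a compacted log: keeps only the latest event per key."""

    def __init__(self) -> None:
        self._latest: dict[str, str] = {}

    def append(self, key: Optional[str], value: str) -> None:
        # Mirrors the AZMSDiagnosticErrorLogs error above: without a key,
        # compaction has nothing to deduplicate on, so null keys are rejected.
        if key is None:
            raise ValueError("compacted event hub does not allow null message key")
        self._latest[key] = value

    def snapshot(self) -> dict[str, str]:
        return dict(self._latest)
```

Appending with a key succeeds (and later values for the same key replace earlier ones), while appending with a null key raises, which is exactly the failure mode the diagnostics log reported.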



