I have a Spark job that outputs individual JSON files to a storage account. I'm trying to use Stream Analytics (SA) to read the JSON and post an event to Event Hub. It seems like it should be super simple with the no-code editor: I just define my input (ADLS Gen2) and my output (Event Hub). SA can preview the data in the JSON files, and the test connections to both input and output succeed. However, when I start the job and create files in the folder path, SA sees them, and the input-event count in the metrics looks about right, but I see no output events and my watermark delay just keeps climbing. I don't see any errors, other than some sort of timeout about an hour later. I'm only pushing about 12 files at a time, so I'd be hard-pressed to say volume is the issue.
All the documentation I can find online is about moving data from Event Hub to Storage, nothing about the reverse. I'm just wondering if my JSON output is messed up somehow.
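In case the file layout is the culprit: as far as I can tell, SA distinguishes between line-separated JSON (one object per line, which is what Spark's df.write.json produces) and a single JSON array per file. Here's a minimal sketch of the two layouts I could be producing (the field names and file names are made up for illustration):

```python
import json

# Hypothetical sample records; field names are made up for illustration.
records = [
    {"deviceId": "sensor-1", "temp": 21.5},
    {"deviceId": "sensor-2", "temp": 22.1},
]

# Layout 1: one JSON object per line ("Line separated" in SA terms,
# and what Spark's df.write.json emits).
with open("events_lines.json", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Layout 2: a single JSON array per file ("Array" serialization).
with open("events_array.json", "w") as f:
    json.dump(records, f)
```

If the input's event serialization setting doesn't match whichever layout the files actually use, I could imagine SA counting the files as inputs but never deserializing any events out of them.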
My SA query is about as simple as it can get, but maybe that's part of the problem:
SELECT * INTO eventhub FROM JsonFiles
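One variant I still want to try, though I'm not sure it's the fix: timestamping events explicitly. If I'm reading the docs right, blob-based inputs expose a BlobLastModifiedUtcTime property that can be used as the event time, something like:

SELECT * INTO eventhub FROM JsonFiles TIMESTAMP BY BlobLastModifiedUtcTime

My (possibly wrong) thinking is that if SA can't assign sensible event times to the file contents, the watermark would stall and nothing would ever be emitted.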
It seems super hard to troubleshoot this thing. I can't inspect the inputs or outputs, and the job doesn't seem to generate errors, just: hey, your watermark delay keeps going up and you have no output events. WHY don't I have output events, SA? I think the growing watermark delay means there are events waiting to be output, but they never actually get output. Help?
