0

I'm trying to create a Cosmos DB stored procedure to return the results of a relatively simple SQL statement. If it were purely SQL I would be fine, but since I know nothing about JavaScript I'm struggling mightily. Any help would be very much appreciated. Here is the SQL query:

SELECT distinct cx.ID, cxa.FieldValue as PartNo, cx.TransactionDate, cx.TransactionStatus
FROM c
JOIN cx in c.File.Transactions
JOIN cxa in cx.AppDetails
JOIN
(
 SELECT cx2.ID, cxa2.FieldValue as PartNo, max(cx2.TransactionDate) as TransactionDate
 FROM c
 JOIN cx2 in c.File.Transactions
 JOIN cxa2 in cx2.AppDetails
 WHERE c.File.Category= 'BatchParts' and cxa2.FieldName ='PartNo'
 GROUP BY cx2.ID,cxa2.FieldValue
) B 
WHERE c.File.Category= 'BatchParts' and cxa.FieldName ='PartNo'
  • Don't use stored procedures for queries in Cosmos DB. Stored procedures only run on the primary replica, so they can only access 1/4 of the provisioned throughput. You should only use stored procedures when you need to insert multiple items in a transaction or need bounded execution. Commented Nov 9, 2020 at 15:52
  • Ah, yes, I was beginning to think it was impossible. My requirement is to summarize transaction data from Cosmos DB and import it into Power BI. For a series of transactions, I only want the most recent transaction along with other pertinent fields such as status. I can use the MAX function on TransactionDate with GROUP BY ID and PartNo and then join that back to the transaction detail data, but the query always times out. Should I be using a UDF to optimize performance, or should I use something like Databricks and Spark SQL to transform the data and bring it into Power BI? Commented Nov 9, 2020 at 19:42
  • I don't really have enough data to tell you which way you should go. Generally speaking, Cosmos DB is not a database suited to heavy analytics. It sounds like you would benefit from using Synapse Link with SQL Serverless, which you can then connect to Power BI to visualize your data. You can learn more here: learn.microsoft.com/en-us/azure/cosmos-db/synapse-link Commented Nov 9, 2020 at 22:50
  • Added this as an answer, as I see this question somewhat frequently. Synapse Link is fairly new and is what is recommended for large, complex, analytics-type queries. Commented Nov 9, 2020 at 22:54
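
The requirement described in the comments (keep only the most recent transaction for each ID and PartNo) can be illustrated outside of SQL. The following is a minimal sketch in plain JavaScript, not code from the thread: `latestPerPart` is a hypothetical helper, the sample rows are made up, and it assumes `TransactionDate` is stored as an ISO 8601 string in a consistent format so that string comparison matches chronological order.

```javascript
// Hypothetical helper: keep only the most recent transaction per (ID, PartNo).
// `rows` is assumed to be the flattened output of the JOINs, one object per
// transaction/part pair.
function latestPerPart(rows) {
    var latest = {};
    rows.forEach(function (row) {
        // Composite grouping key, equivalent to GROUP BY ID, PartNo.
        var key = JSON.stringify([row.ID, row.PartNo]);
        var current = latest[key];
        // Assumption: ISO 8601 strings in one format compare correctly as text.
        if (!current || row.TransactionDate > current.TransactionDate) {
            latest[key] = row;
        }
    });
    return Object.keys(latest).map(function (key) { return latest[key]; });
}

// Made-up sample data for illustration only.
var sample = [
    { ID: 1, PartNo: 'A-100', TransactionDate: '2020-11-01T10:00:00Z', TransactionStatus: 'Pending' },
    { ID: 1, PartNo: 'A-100', TransactionDate: '2020-11-05T09:30:00Z', TransactionStatus: 'Complete' },
    { ID: 2, PartNo: 'B-200', TransactionDate: '2020-11-03T14:15:00Z', TransactionStatus: 'Failed' }
];
var result = latestPerPart(sample);
console.log(result);
```

Because the whole row is kept for the winning date, fields such as TransactionStatus come along without a second join back to the detail data.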

2 Answers

2

If this type of query times out in a stored procedure or through the SDK, it is probably best handled with Synapse Link. Stored procedures are poor candidates for queries because they run only on the primary replica, which is one of four replicas. Because throughput is allocated equally across all four replicas, a stored procedure gets only 1/4 of the provisioned throughput.

Synapse Link is designed for this sort of scenario, where you have large, complex, analytical queries and want to visualize your data using Power BI. To learn more about Cosmos DB and Synapse, see What is Azure Synapse Link for Azure Cosmos DB (Preview)?

1

You can try something like this:

function getItems(category,fieldName) {
    var collection = getContext().getCollection();

    var query = 'SELECT distinct cx.ID, cxa.FieldValue as PartNo, cx.TransactionDate, cx.TransactionStatus ' +
                'FROM c ' +
                'JOIN cx in c.File.Transactions ' +
                'JOIN cxa in cx.AppDetails ' +
                'JOIN ' +
                '( ' +
                    'SELECT cx2.ID, cxa2.FieldValue as PartNo, max(cx2.TransactionDate) as TransactionDate ' +
                    'FROM c ' +
                    'JOIN cx2 in c.File.Transactions ' +
                    'JOIN cxa2 in cx2.AppDetails ' +
                    'WHERE c.File.Category= @Category and cxa2.FieldName = @FieldName ' +
                    'GROUP BY cx2.ID,cxa2.FieldValue ' +
                ') B ' + 
                'WHERE c.File.Category= @Category  and cxa.FieldName = @FieldName';

    var filterQuery =
    {
        'query' : query,
        'parameters' : [{'name':'@Category', 'value':category},{'name':'@FieldName', 'value':fieldName}] 
    };

    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        filterQuery,
        function (err, feed, options) {
            if (err) throw err;

            var response = getContext().getResponse();
            if (!feed || !feed.length) {
                // No matching documents in this partition.
                response.setBody('no docs found');
            }
            else {
                // Return the matching rows as a JSON string.
                response.setBody(JSON.stringify(feed));
            }
        });

    if (!isAccepted) throw new Error('The query was not accepted by the server.');
}

By the way, when you invoke a stored procedure, you need to pass a partition key value, and the procedure can only read data from that partition. You can refer to this doc and this.
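
To make the partition key point concrete, here is a hedged sketch of calling the procedure from Node.js. It assumes the `@azure/cosmos` JavaScript SDK, where `container` is a `Container` object you have already created, and that the procedure is registered under the id `getItems`. `runGetItems` and its parameter names are hypothetical. Each call only sees the partition identified by `partitionKeyValue`.

```javascript
// Sketch: invoke the getItems stored procedure for ONE partition key value.
// Assumption: `container` is a Container from the @azure/cosmos SDK.
async function runGetItems(container, partitionKeyValue, category, fieldName) {
    const response = await container.scripts
        .storedProcedure('getItems')                        // id the procedure was registered under
        .execute(partitionKeyValue, [category, fieldName]); // partition key first, then parameters

    const body = response.resource;
    // The procedure above sets either the text 'no docs found' or a JSON string of rows.
    if (body === 'no docs found') {
        return [];
    }
    // Parse only if the body is still a string when it reaches the caller.
    return typeof body === 'string' ? JSON.parse(body) : body;
}
```

To cover every partition you would have to call this once per partition key value and merge the results yourself, which is one reason a stored procedure is an awkward fit for a reporting query.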

2 Comments

Thanks very much for the help, Steve! It works great. I am trying to call this stored procedure from within Power BI; is that possible? If not, I was thinking of converting this stored procedure into a UDF and calling it from SQL, but I kept getting errors. Is this not correct? SELECT udf.GetLatestPartBySource('BatchPart','PartNo') as Batches FROM c JOIN cx in c.File.Transactions WHERE c.File.Category= 'BatchPart'
After speaking with some colleagues, I was told it is possible to execute the stored procedure with in-line SQL and pass the result to Power BI, but I would need to hardcode the parameters. I'm not sure if that's correct, but I'll update with my findings once I can test.
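
On the UDF attempt in the comment above: one likely cause of the errors is that, as far as I know, a Cosmos DB UDF cannot run queries or reach the container the way a stored procedure does. It only computes on the values passed to it from the query. The following is a minimal sketch of what a UDF can do here, under that assumption. `latestTransaction` is a hypothetical name, and it assumes `TransactionDate` is an ISO 8601 string in a consistent format.

```javascript
// Hypothetical UDF body: return the transaction with the greatest
// TransactionDate from ONE document's Transactions array.
// A UDF only sees the arguments it is given; it does not issue queries.
function latestTransaction(transactions) {
    if (!Array.isArray(transactions) || transactions.length === 0) {
        return null;
    }
    return transactions.reduce(function (best, tx) {
        // Assumption: ISO 8601 strings in one format compare correctly as text.
        return tx.TransactionDate > best.TransactionDate ? tx : best;
    });
}
```

Once registered under that name, it could be called as SELECT udf.latestTransaction(c.File.Transactions) AS Latest FROM c WHERE c.File.Category = 'BatchParts'. Note that this picks the latest transaction per document, not per ID and PartNo across documents, so it is not a drop-in replacement for the stored procedure.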
