1

I am coding Databricks SQL with the following code:

SELECT
activityid,
concat_ws(ts_inititals, ', ') AS attendees
FROM 
    (SELECT 
     activityparty.activityid,
     systemuser.systemuserid,
     systemuser.ts_inititals 
     FROM baseorigination.activityparty
     LEFT JOIN baseorigination.systemuser
     ON activityparty.partyid = systemuser.systemuserid
     GROUP BY activityparty.activityid, systemuser.systemuserid, systemuser.ts_inititals ) attendees
GROUP BY activityid

As you can see the code uses a subquery, which is fine when executing the code on MS SQL, however, I'm getting the following error on Databricks

Error in SQL statement: AnalysisException: expression 'attendees.ts_inititals' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

Can someone let me know how to re-write the code to overcome this issue?

1 Answer 1

1

You may use ARRAY_SORT() and COLLECT_SET():

SELECT
    activityid,
    array_join(collect_set(ts_inititals), ', ') AS attendees
FROM
(
    SELECT DISTINCT
        a.activityid,
        s.systemuserid,
        s.ts_inititals 
     FROM baseorigination.activityparty a
     LEFT JOIN baseorigination.systemuser s
         ON a.partyid = s.systemuserid
) attendees
GROUP BY activityid;

Note that I have replaced the GROUP BY in the inner query with a distinct select.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.