I’m trying to run a SQL query on Azure Databricks that combines DISTINCT, ORDER BY, and a column alias:

SELECT DISTINCT album.ArtistId AS my_alias 
FROM album ORDER BY album.ArtistId

The problem is that once I add the alias, I can no longer use the unaliased name in the ORDER BY clause: ORDER BY album.ArtistId produces an error, while ORDER BY my_alias works.
If I remove DISTINCT, the original query also works.

Error in SQL statement: AnalysisException: cannot resolve '`album.ArtistId`' given input columns: [my_alias]; line 2 pos 22;
'Sort ['album.ArtistId ASC NULLS FIRST], true
+- Distinct
   +- Project [ArtistId#2506 AS my_alias#2500]
      +- SubqueryAlias spark_catalog.chinook.album
         +- Relation[AlbumId#2504,Title#2505,ArtistId#2506] parquet

It seems that the original column name is lost after the Project step. That behavior is unexpected compared to other SQL dialects, and I cannot find any documentation about it.

Is there any way to make this query run as is, without modification, perhaps by changing some execution flags?

  • You could always duplicate the column and leave the copy unaliased; it shouldn't add much to the query time. Commented Sep 7, 2021 at 8:24
  • The query is generated by SQLAlchemy, and I prefer to keep the code the same for all SQL dialects. Commented Sep 7, 2021 at 8:37
  • Also, aliases are required when a query contains multiple joins between two tables (Table1 join Table2 join Table1) and for self joins; aliases are then needed to get distinct column names. Commented Sep 8, 2021 at 9:53
  • Actually no, they are not required, only table name aliases are needed, not column names. Commented Sep 25, 2021 at 12:21

2 Answers

I got the same error when I tried this.

As a workaround:

  1. You can give the column's position (a number) in the ORDER BY clause.

    select distinct teacher_id, name, age, id as student_id 
    from student 
    order by 4
    

  2. You can use the column alias in the ORDER BY clause.

    select distinct teacher_id, name, age, id as student_id 
    from student 
    order by student_id
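
Applied to the query from the question, the two workarounds above might look like the sketch below. It assumes the same album table and my_alias alias used in the question; the positional form relies on the aliased column being the first item in the select list.

```sql
-- Workaround 1: order by the position of the column in the select list.
-- ArtistId AS my_alias is the first (and only) selected column, hence 1.
SELECT DISTINCT album.ArtistId AS my_alias
FROM album
ORDER BY 1;

-- Workaround 2: order by the alias, which is the only name still visible
-- to the sort after the DISTINCT step (see "given input columns: [my_alias]").
SELECT DISTINCT album.ArtistId AS my_alias
FROM album
ORDER BY my_alias;
```

Both forms change the ORDER BY text, so neither keeps the generated query identical across dialects.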
    

Please feel free to post your question in Microsoft Q&A forum where the product team will monitor them closely.

I ended up disabling column aliases but keeping table name aliases, so queries with a self join look like this:

select c2.id from Customer as c1 join Customer as c2 on c1.id = c2.id

No aliases and no problems with self joins.
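
For the query from the question, the same approach would look roughly like the sketch below. It assumes the same album and Customer tables; the expectation that the qualified names resolve rests on the selected column keeping its original name when no column alias is applied, so it is worth verifying on your own runtime.

```sql
-- No column alias: the sort refers to the same name the select list exposes.
SELECT DISTINCT album.ArtistId
FROM album
ORDER BY album.ArtistId;

-- Self join: table aliases c1 and c2 disambiguate the two copies of Customer,
-- so no column alias is needed to tell the id columns apart.
SELECT DISTINCT c2.id
FROM Customer AS c1
JOIN Customer AS c2 ON c1.id = c2.id
ORDER BY c2.id;
```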
