5

I'm confused. How could you explain this diffenece in variable concatenation with ORDER BY?

declare @tbl table (id int);
insert into @tbl values (1), (2), (3);

declare @msg1 varchar(100) = '', @msg2 varchar(100) = '',
    @msg3 varchar(100) = '',    @msg4 varchar(100) = '';

select @msg1 = @msg1 + cast(id as varchar) from @tbl
order by id;

select @msg2 = @msg2 + cast(id as varchar) from @tbl
order by id+id;

select @msg3 = @msg3 + cast(id as varchar) from @tbl
order by id+id desc;

select TOP(100) @msg4 = @msg4 + cast(id as varchar) from @tbl
order by id+id;

select
    @msg1 as msg1,
    @msg2 as msg2,
    @msg3 as msg3,
    @msg4 as msg4;

Results

msg1  msg2  msg3  msg4
----  ----  ----  ----
123   3     1     123  
6
  • Please see "Should questions include “tags” in their titles?", where the consensus is "no, they should not"! Commented Mar 18, 2015 at 13:20
  • Hmm, looks like a bug Commented Mar 18, 2015 at 13:25
  • 4
    It is bizarre behavior as a result of doing something inappropriate. Variables should not be assigned based on rows that return multiple values. Personally, I think this should give an error, but there may be a good reason for it not doing so. Commented Mar 18, 2015 at 13:27
  • 5
    This has long been documented not to work: "Do not use a variable in a SELECT statement to concatenate values (that is, to compute aggregate values). Unexpected query results may occur. This is because all expressions in the SELECT list (including assignments) are not guaranteed to be executed exactly once for each output row" Commented Mar 18, 2015 at 13:30
  • 1
    Going from Damien's link, the KB article linked there applies directly to this question: support.microsoft.com/en-us/kb/287515. It is noted there that using any operators or expressions in the ORDER BY clause will cause this unexpected behavior. Commented Mar 18, 2015 at 13:39

4 Answers 4

6

As many have confirmed, this is not the right way to concatenate all the rows in a column into a variable - even though in some cases it does "work". If you want to see some alternatives, please check out this blog.

According to MSDN (applies to SQL Server 2008 through 2014 and Azure SQL Database) , the SELECT should not be used to assign local variables. In the remarks, it describes how, when you do use the SELECT, it attempts to behave. The interesting points to note:

  • While typically it should only be used to return a single value to a variable, when the expression is the name of the column, it can return multiple values.
  • When the expression does return multiple values, the variable is assigned the last value that is returned.
  • If no value is returned, the variable retains its original value (not directly relevant here, but worth noting).

The first two points here are key - concatenation happens to work because SELECT @msg1 = @msg1 + cast(id as varchar) is essentially SELECT @msg1 += cast(id as varchar), and as the syntax notes, += is an accepted compound assignment operator on this expression. Please note here that it should not be expected this operation to continue to be supported on VARCHAR and to do string concatenation - just because it happens to work in some situations doesn't mean it is ok for production code.

The bottom line as to the underlying reason is whether the Compute Scalar that runs on the select expression uses the original id column or an expression of the id column. You probably can't find any docs on why the optimizer might choose the specific plans for each query, but each example highlights different use cases that allow the msg value to be evaluated from the column (and therefore multiple rows being returned and concatenated) or expression (and therefore only the last column).

  1. @msg1 is '123' because the Compute Scalar (the row-by-row evaluation of the variable assignment) occurs after the Sort. This allows the scalar computation to return multiple values on the id column concatenating them through the += compound operator. I doubt there is specific documentation why, but it appears the optimizer chose to do the sort before the scalar computation because the order by was a column and not an expression.

  2. @msg2 is '3' because the Compute Scalar is done before the sort, which leaves the @msg2 in each row just being the ('' + id) - so never concatenated, just the value of the id. Again, probably not any documentation why the optimizer chose this, but it appears that since the order by was an expression, perhaps it needed to do the (id+id) in the order by as part of the scalar computation before it could sort. At this point, your original column is no longer referencing the source column, but it has been replaced by an expression. Therefore, as MSDN stated, your first column points to an expression, not a column, so the behavior assigns the last value of the result set to the variable in the SELECT. Since you sorted ASC, you get '3' here.

  3. @msg3 is '1' for the same reason as example 2, except you ordered DESC. Again, this becomes an expression in the evaluation - not the original column, so therefore the assignment gets the last value of the DESC order, so you get '1'.

  4. @msg4 is '123' again because the TOP operation forces an initial scalar evaluation of the ORDER BY so that it can determine your top 100 records. This is different than examples 2 and 3 in which the scalar computation contained both the order by and select computations which caused each example to be an expression and not refer back to the original column. Example 4 has the TOP separating the ORDER BY and SELECT computations, so after the SORT (TOP N SORT) is applied, it then does the scalar computation for the SELECT columns in which at this point you are still referencing the original column (not an expression of the column), and therefore it returns multiple rows allowing the concatenation to occur.

Sources:

Sign up to request clarification or add additional context in comments.

1 Comment

Very detail ansver. Thank you
2

SQL Server will calculate the results, then sort them, then return them. In the case of assigning a variable, only the first result will be used to populate your variable. You are receiving the first value from the sorted result sets, which can move around the order SQL Server will scan the records as well as the position within the results.

TOP will always produce special query plans as it immediately forces SQL Server to stick to the natural ordering of the results instead of producing query plans that would statistically reduce the number of records it must read.

To explain the differences, you'll have to refer to how SQL Server decided to implicitly sort the values to optimize the query.

Query 1

Insert -> Table Insert -> Constant Scan

Query 2

SELECT -> Compute Scalar -> Sort -> Table Scan

Query 3, and 4

SELECT -> Sort -> Compute Scalar -> Table Scan

Query 5 and 6 (using TOP)

SELECT -> Compute Scalar -> Sort (Top N) -> Compute Scalar -> Table Scan

I added Query 6:

select top (100)
    @msg5 = @msg5 + cast(id as varchar)
from @tbl
order by id+id desc

4 Comments

Final results shouldn't be different with different execution plans.
@GiorgiNakeuri Normally you'd be correct, however this isn't your normal scenario. Since only the first result is used, how SQL Server decided to form the results will affect which result is used.
@GiorgiNakeuri If you're working within the defined operations, yes. This is not the case here - don't expect stuff not to break when you're using dirty hacks :)
Sorry, I had combined two query plans which shouldn't have been, my answer should be clearer now.
1

All I can see is there is a difference in the execution plans. They all start with SELECT and end with Table Scan. The difference is in between, the Compute Scalar and the Sort.

@Msg1 has Compute Scalar then Sort. Results: 123
@Msg2 has Sort then Compute Scalar. Results: 3
@Msg3 has Sort then Compute Scalar. Results: 1

The fourth one is different because of the top. It still starts with select and ends with table scan, but it's different in the middle. It uses a different sort.

@Msg4 has Compute Scalar then Sort(Top N Sort) then Compute Scalar

Comments

0

You're not supposed to set variables in a select that returns more than a single row. Consider this code:

select top 1 @msg1 = @msg1 + cast(id as varchar) from @tbl
order by id;

select top 1 @msg2 = @msg2 + cast(id as varchar) from @tbl
order by id+id;

select top 1 @msg3 = @msg3 + cast(id as varchar) from @tbl
order by id+id desc;

select top 1 @msg4 = @msg4 + cast(id as varchar) from @tbl
order by id+id;

Producing 1, 1, 3 and 1, respectively.

I'm pretty surprised it doesn't cause an exception, I was quite sure it used to forbid this outright.

The underlying point is still the same: the SQL engine isn't just executing some commands procedurally, one by one, as you might expect. It will build an execution plan that is tailored to be as efficient as possible (given many constraints).

On the other hand, assigning a variable is inherently procedural, and requires an explicit execution / evaluation order to work correctly.

You're combining the two approches - select id from @tbl order by id is a non-procedural query, but select @id = id from @tbl order by id is a mix of the procedural @id = id, and the very much non-procedural select.

9 Comments

This is a technique for concatenating multiple rows of the same column into a single variable and is valid. The OP is asking why the behavior of this technique differs based on the order by.
@JasonW I'd like to see the reference for that. Do you really expect this to behave in a reasonable manner, even when the SQL server decides to parallelize the query? Or when it's distributed over a server cluster, or partitioned tables and indices?
@JasonW - "valid" in terms of "will not generate errors", yes. "valid" in terms of "does what the OP expected", not so much. See reference in my comment to the question.
I'm not arguing that is the best way or even a good way - just that it is something that works that many people use.
@JasonW Something that sometimes works, when the stars are right. It's not a "technique", it's a dirty hack.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.