
I have a table as a source in Azure Data Factory with 336 columns and just 1 row, like this:

1       2       3       4       5       6       7       8       9
value1  value2  value3  value4  value5  value6  value7  value8  value9

And I want to combine every 3 columns into the first 3:

1       2       3
value1  value2  value3
value4  value5  value6
value7  value8  value9

What is the alternative to using a Select on every 3 columns and then a Join, as that is a long process with this many columns?

  • Is order important for you? Do you have any primary key? Commented May 11, 2021 at 10:52
  • The values are generic double numbers... no primary key, I just want to combine every 3 columns into the first 3. Commented May 12, 2021 at 13:01
  • I think you have a good answer :) Commented May 12, 2021 at 13:02

1 Answer


If your data source is Azure SQL DB, you could use conventional SQL to transform the row with a combination of UNPIVOT, PIVOT and some of the ranking functions to help group the data. A simple example:

DROP TABLE IF EXISTS #tmp;

-- Sample single-row, 9-column table standing in for the wide source
CREATE TABLE #tmp (
    col1    VARCHAR(10),
    col2    VARCHAR(10),
    col3    VARCHAR(10),
    col4    VARCHAR(10),
    col5    VARCHAR(10),
    col6    VARCHAR(10),
    col7    VARCHAR(10),
    col8    VARCHAR(10),
    col9    VARCHAR(10)
);

INSERT INTO #tmp
VALUES ( 'value1', 'value2', 'value3', 'value4', 'value5', 'value6', 'value7', 'value8', 'value9' );

-- Unpivot the single row into 9 rows, assign each value a target row (nt)
-- and a target column (groupNumber), then pivot back out to 3 columns
SELECT [1], [2], [0] AS [3]
FROM
    (
    SELECT
        NTILE(3) OVER ( ORDER BY ( SELECT NULL ) ) AS nt,                  -- target row: 1,1,1,2,2,2,3,3,3
        ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) % 3 AS groupNumber, -- target column: 1,2,0 repeating
        newCol
    FROM #tmp
    UNPIVOT ( newCol FOR sourceCol IN ( col1, col2, col3, col4, col5, col6, col7, col8, col9 ) ) uvpt
    ) x
PIVOT ( MAX(newCol) FOR groupNumber IN ( [1], [2], [0] ) ) pvt;

Tweak the NTILE value depending on the number of columns you have: it should be the total number of columns divided by 3. For example, if you have 300 columns the NTILE value should be 100; if you have 336 columns it should be 112. A bigger example with 336 columns is available here.
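If you would rather not hard-code that value, a small sketch like this could derive it from the system catalog. This assumes the wide data lives in a permanent table; dbo.pivotWorking (the table name used in the notebook example below) is illustrative, so adjust it to your own table:

-- A minimal sketch: derive the NTILE value from the live column count
-- instead of hard-coding it (dbo.pivotWorking is an assumed table name)
DECLARE @ntile INT;

SELECT @ntile = COUNT(*) / 3
FROM sys.columns
WHERE [object_id] = OBJECT_ID(N'dbo.pivotWorking');

SELECT @ntile AS ntileValue;    -- 112 for a 336-column table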

Present the data to Azure Data Factory (ADF) either as a view, or use the Query option in the Copy activity, for example.
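For the view route, a minimal sketch might look like the following, assuming the sample data lives in a permanent table (dbo.wideSource and dbo.vCombined are illustrative names; a view cannot reference the #tmp temp table above):

-- A sketch of wrapping the transform in a view for ADF to read
CREATE VIEW dbo.vCombined
AS
SELECT [1], [2], [0] AS [3]
FROM
    (
    SELECT
        NTILE(3) OVER ( ORDER BY ( SELECT NULL ) ) AS nt,
        ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) % 3 AS groupNumber,
        newCol
    FROM dbo.wideSource
    UNPIVOT ( newCol FOR sourceCol IN ( col1, col2, col3, col4, col5, col6, col7, col8, col9 ) ) uvpt
    ) x
PIVOT ( MAX(newCol) FOR groupNumber IN ( [1], [2], [0] ) ) pvt;

ADF can then point the Copy activity's source at dbo.vCombined like any other table.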

My results: [screenshot of the query output]

If you are using Azure Synapse Analytics, another fun way to approach this would be Synapse Notebooks. With just three lines of code, you can read the table from the dedicated SQL pool, unpivot all 336 columns using the stack function, and write the result back to the database. This simple example is in Scala:

// expr comes from Spark's standard functions; the other two imports are
// for the Synapse dedicated SQL pool connector, per the connector docs
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.SqlAnalyticsConnector._
import com.microsoft.spark.sqlanalytics.utils.Constants

// Read the wide table from the dedicated SQL pool
val df  = spark.read.synapsesql("someDb.dbo.pivotWorking")

// stack(112, *) folds the 336 columns into 112 rows of 3 columns
val df2 = df.select( expr("stack(112, *)") )

// Write it back
df2.write.synapsesql("someDb.dbo.pivotWorking_after", Constants.INTERNAL)

I have to admire the simplicity of it.
