5

Here's my situation

I have Azure Table as a source, and my target is Azure SQL database. Source table looks something like this:

id file_name metadata
1 file_1.txt {"Company": { "id": 555, "Name": "Company A" }, "quality": [{"quality": 3, "file_name": "file_1.txt"}, {"quality": 4, "file_name": "unkown"}]}
2 file_2.txt {"Company": { "id": 231, "Name": "Company B" }, "quality": [{"quality": 4, "file_name": "file_2.txt"}, {"quality": 3, "file_name": "unkown"}]}
3 file_3.txt {"Company": { "id": 111, "Name": "Company C" }, "quality": [{"quality": 5, "file_name": "unknown"}, {"quality": 4, "file_name": "file_3.txt"}]}

The target table is supposed to look like this:

id file_name company quality
1 file_1.txt Company A 3
2 file_2.txt Company B 4
3 file_3.txt Company C 4

That means that I need to parse the data from this string to get the new column values, as well as use quality value depending on the file_name column from the source.

The first thing I've done is created a Copy pipeline to transfer the data 1 to 1 from Azure Tables to parquet file on Azure Data Lake Store so I can use it as a source in Data Flow. Next, the idea was to use derived column and use some expression to get the data but as far as I can see, there's no expression that treats this string as a JSON object.

So, the next idea was to maybe add a step before this process where I would extract the contents of metadata column to a separate file on ADLS and use that file as a source or lookup and define it as a JSON file to begin with. This would imply that I need to add id value to the JSON file so I'm able to tie the data back to the record.

The first solution looked more promising as the idea, but if that's not an option, I'll look into other possible solutions.

Thanks

6
  • Thanks @qucikshare,I will check if for you, Commented Mar 3, 2021 at 0:33
  • Hi @qucikshare, it's very hard to achieve that in Data Factory. Some suggestions are that you build a stored procedure in Azure SQL database to deal with the source data. The logic may be very complex. Or with function or code level to do that. Commented Mar 3, 2021 at 6:21
  • I've managed to parse the JSON string using parse component in Data Flow, I found a good video on YT explaining how that works. I'll post an answer when I'm done so it's here for reference. Commented Mar 3, 2021 at 15:00
  • I tried in Data Flow and can't build the expression. Hope you can do that and share it to us. Thank you. Commented Mar 4, 2021 at 1:27
  • 1
    I didn't really understand how the parse activity works. This video youtube.com/watch?v=r7O7AJcuqoY helped me to better understand. @JosephXu, your answer is thorough and it seems to do what I wanted, so I'll accept it. But in my case, I used parse and flatten activities to normalize the data, sinked that in a couple of tables and then used SQL joins to get what I needed. Commented Mar 15, 2021 at 15:07

2 Answers 2

6

My debug result is as follows: enter image description here

  1. I think you can use OPENJASON to parse the JSON String.
    select t.id, A.Company,A.quality,A.file_name,A.quality_s,A.file_name_s
        from dbo.test t
        CROSS APPLY OPENJSON(t.metadata) 
        WITH (
            company varchar(255) '$.Company.Name',
            quality varchar(255) '$.quality[0].quality',
            file_name varchar(255) '$.quality[0].file_name',
            quality_s varchar(255) '$.quality[1].quality',
            file_name_s varchar(255) '$.quality[1].file_name'
        ) A

The query result is as follows:
enter image description here
Then use data flow then do further processing. I will show u details when I back to my PC.

  1. Use Copy activity in ADF, copy the query result into a csv. enter image description here

  2. Use data flow to process this csv file. Set the Copy activity generated csv file as the source, data preview is as follows: enter image description here

  3. Use DerivedColumn1 to generate new columns,
    FileName : case(equalsIgnoreCase(file_name,'unknown'),file_name_s,file_name),
    QualityS: case(equalsIgnoreCase(file_name,'unknown'),quality_s,quality) enter image description here Data preview is as follows: enter image description here

  4. Use Select1 activity to filter columns which we want。 enter image description here Data preview is as follows: enter image description here

  5. Then we can sink the result to a SQL table.

Sign up to request clarification or add additional context in comments.

Comments

2

You should use a Parse transformation. It is meant for parsing JSON from a column of data.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.