0

I have a set of excel files inside ADLS. The format looks similar to the one below:

enter image description here

The first 4 rows would always be the document header information and the last 3 will be 2 empty rows and the end of the document indicator. The number of rows for the employee information is indefinite. I would like to delete the first 4 rows and the last 3 rows using ADF.

Can any help me with what should be expressions in the Derived column / Select?

2 Answers 2

3

My Excel file:

ExcelFile

Source Data set settings (give A5 in range and select first row as header): SourceDataSetProperties

Make sure to refresh schema in the source data set. Schema

After schema refresh, if you preview the source data, you will be seeing all rows from row number 5. This will include footer too which we can filter in data flow.

Preview Next, add a filter transformation with below expression

  !startsWith(sno,'dummy') && sno!=''

this will filter out the rows starting with dummy, in your case, end of document. Also we are ignoring the empty rows by checking sno!=''

Final Preview after filter:

Preview After Filtering footer

Sign up to request clarification or add additional context in comments.

Comments

1

How about this? Under the 'Source' tab, choose the number of lines you want to skip.

enter image description here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.