1

I am unable to find a way to create an external table in Azure SQL Data Warehouse (Synapse SQL Pool) with Polybase where some fields contain embedded commas.

For a csv file with 4 columns as below:

myresourcename, 
myresourcelocation, 
"""resourceVersion"": ""windows"",""deployedBy"": ""john"",""project_name"": ""test_project""",
"{  ""ResourceType"": ""Network"",  ""programName"": ""v1""}" 

Tried with the following Create External Table statements.

CREATE EXTERNAL FILE FORMAT my_format
WITH (  
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS(
        FIELD_TERMINATOR=',',
        STRING_DELIMITER='"',
        First_Row = 2
    )    
); 
CREATE EXTERNAL TABLE my_external_table
(
resourceName VARCHAR,
resourceLocation    VARCHAR,
resourceTags    VARCHAR,
resourceDetails VARCHAR
)  
WITH (
    LOCATION = 'my/location/',
    DATA_SOURCE = my_source,  
    FILE_FORMAT = my_format
)

But querying this table gives the following error:

Failed to execute query. Error: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Too many columns in the line.

Any help will be appreciated.

1
  • Do you have to DELIMITEDTEXT format? I prefer to use parquet.snappy as the compression is better and data quality issues is almost non existent. Otherwise I consider that data as garbage and set my reject threshold up from 0 to 50 or something inline with that Commented Nov 2, 2020 at 10:24

1 Answer 1

1

Currently this is not supported in polybase, need to modify the input data accordingly to get it working.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.