
Recently I ran into a problem when uploading an Excel file to Snowflake using SAS.

My Excel file has 2 columns and 1,400 rows. In the first column only 11 rows have 2 decimal places; in the second column most of the values have 2 decimal places.

When I upload the file to Snowflake, the 11 values with 2 decimal places in the first column are rounded to the nearest whole number, but all the decimal values in the second column are kept.

I heard that Power Query checks the first 200 rows, and if all of them are whole numbers, it rounds any later decimals to whole numbers. What about SAS? Does SAS also check a certain number of rows to decide the data type for the rest of the file, and if so, how many rows does it check?

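The code I used to upload the Excel file:

```sas
/* Point a fileref at the workbook */
FILENAME REFFILE '/sample.xlsx';

/* Read the workbook with the XLSX engine and write the result
   directly to a table in the WS_CRA library */
PROC IMPORT DATAFILE=REFFILE
    DBMS=XLSX
    OUT=WS_CRA.output;
    GETNAMES=YES;   /* take column names from the first row */
RUN;
```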
  • How are you loading this data? If there's type auto-detection, you'll probably want to use explicit column typing instead. Commented Aug 15, 2022 at 21:45
  • I am using auto-detection. The challenge is that in some cases I have more than 100 columns, and I am trying to avoid entering the data format for every column. Commented Aug 15, 2022 at 21:55
  • I'm still not sure how you are uploading these Excel files to Snowflake. Can you share more details? Commented Aug 15, 2022 at 21:57
  • Just added the code I use. Many thanks! Commented Aug 15, 2022 at 22:02
  • I suspect you have two separate problems: converting the XLSX file into a SAS dataset, and then loading that SAS dataset into Snowflake. If the XLSX engine is creating the right TYPE of variable (numeric versus character), then just changing the format attached to the variable to match what Snowflake expects should prevent the upload to Snowflake from converting the numbers into integers. Commented Aug 16, 2022 at 0:20
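A minimal sketch of that two-step approach might look like the following. It assumes `WS_CRA` is the SAS/ACCESS libref for Snowflake used in the question, introduces a hypothetical staging dataset `work.staging`, and assumes that attaching a two-decimal format such as `12.2` to the numeric variables is enough for the Snowflake engine to create a non-integer column. The exact format-to-type mapping depends on your SAS/ACCESS version, so check the resulting table definition in Snowflake after loading.

```sas
/* Step 1: import into a WORK dataset first (work.staging is a
   hypothetical name) so the result can be inspected before loading */
filename reffile '/sample.xlsx';

proc import datafile=reffile
    dbms=xlsx
    out=work.staging
    replace;
    getnames=yes;
run;

/* Step 2: list the TYPE (Num/Char) and FORMAT the XLSX engine
   assigned to each column, in column order */
proc contents data=work.staging varnum;
run;

/* Step 3 (assumption): attach a two-decimal format to every numeric
   variable so the Snowflake engine should not create an integer
   column. The FORMAT statement comes after SET so _numeric_ covers
   all variables read from work.staging. Date/time columns are also
   numeric and would need their own formats listed after this one. */
data ws_cra.output;
    set work.staging;
    format _numeric_ 12.2;
run;
```

If `WS_CRA.output` already exists in Snowflake, it may need to be dropped before the DATA step can recreate it. When only a few columns need a specific Snowflake type, the SAS/ACCESS `DBTYPE=` data set option is a more explicit alternative to relying on formats.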

1 Answer


The data is being loaded with SAS.

Their documentation says:

For some input data sources, such as a Microsoft Excel workbook, the first eight rows of data are scanned. The most prevalent data type (numeric or character) is used for a column. This is the default. If most of the data in the first eight rows is missing, SAS defaults to the CHAR data type and any subsequent numeric data for that column is set to missing. (You can change the default from 8 to 0 in the Windows registry; 0 causes all the rows in the column to be scanned to determine the type.)

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acpcref/p0jf3o1i67m044n1j0kz51ifhpvs.htm

So if you are able to change the default from 8 to 0 in the Windows registry, all rows will be scanned; otherwise only the first 8 rows are.


2 Comments

I don't think the XLSX engine limits the number of values it checks. It certainly does not require any options specified in the Windows registry since it does not use any Microsoft code and runs the same on Unix as it does on Windows.
Hi Tom, thank you for your update. Could you explain why those 12 rows with decimals in the first column were rounded to whole numbers?
