How many rows will SAS check to determine if the input data contains decimals?

Question

Recently I found a problem when uploading an Excel file to Snowflake using SAS.

My Excel file has 2 columns and 1,400 rows. In the first column, only 11 rows have 2 decimal places; in the second column, most of the values have 2 decimal places.

When I upload the file to Snowflake, all 11 values with 2 decimal places in the first column are rounded to the nearest whole number, but in the second column all decimal values remain.

I heard that Power Query checks the first 200 rows, and if all of them are whole numbers it rounds any later decimals to whole numbers. What about SAS? Does SAS also check a certain number of rows to decide the data format for the rest of the file, and if so, how many rows does it check?

The code I used to upload the Excel file:

FILENAME REFFILE '/sample.xlsx';

PROC IMPORT DATAFILE=REFFILE
    DBMS=XLSX
    OUT=WS_CRA.output;
    GETNAMES=YES;
RUN;

How are you loading this data? If there's type auto-detection, you'll probably want to do explicit column typing instead.
– Felipe Hoffa, Commented Aug 15, 2022 at 21:45
I am using auto-detection. The challenge is that in some cases I have more than 100 columns, and I am trying to avoid entering the data format for every column.
– Yumeng Xu, Commented Aug 15, 2022 at 21:55
I'm still not sure how you are uploading these Excel files to Snowflake. Can you share more details?
– Felipe Hoffa, Commented Aug 15, 2022 at 21:57
I suspect you have two separate problems: converting the XLSX into a SAS dataset, and then loading the SAS dataset into Snowflake. If the XLSX engine is creating the right TYPE of variable (numeric versus character), then just changing the format attached to the variable to match what Snowflake expects should prevent the upload to Snowflake from converting the numbers into integers.
– Tom, Commented Aug 16, 2022 at 0:20
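
The two-step approach Tom describes could be sketched roughly as follows. This is only an illustration under several assumptions: that WS_CRA is a libref pointing at Snowflake, that the imported values are numeric in SAS and only their attached format lacks decimals, and that a `12.2` format is an acceptable width. The names `work.staged` and `col1` are hypothetical and not from the original post.

```sas
FILENAME REFFILE '/sample.xlsx';

/* Step 1: import into WORK first, rather than writing straight to the target library */
PROC IMPORT DATAFILE=REFFILE
    DBMS=XLSX
    OUT=work.staged REPLACE;
    GETNAMES=YES;
RUN;

/* Step 2: inspect the type and format assigned to each variable */
PROC CONTENTS DATA=work.staged;
RUN;

/* Step 3: attach a two-decimal format, then load the table.
   col1 is a hypothetical variable name; 12.2 is an illustrative width. */
DATA WS_CRA.output;
    SET work.staged;
    FORMAT col1 12.2;
RUN;
```

If the PROC CONTENTS output shows the first column as numeric with a format that has no decimal places, that would be consistent with the rounding happening at the upload step rather than during the Excel import.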

Felipe Hoffa · Accepted Answer · 2022-08-15 22:51:25Z

The data is being loaded with SAS.

Their documentation says:

For some input data sources, such as a Microsoft Excel workbook, the first eight rows of data are scanned. The most prevalent data type (numeric or character) is used for a column. This is the default. If most of the data in the first eight rows is missing, SAS defaults to the CHAR data type and any subsequent numeric data for that column is set to missing. (You can change the default from 8 to 0 in the Windows registry; 0 causes all the rows in the column to be scanned to determine the type.)

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acpcref/p0jf3o1i67m044n1j0kz51ifhpvs.htm

So if you are able to change the default from 8 to 0 in the Windows registry, all rows will be scanned. Otherwise, only the first 8 rows are scanned.

2 Comments

Tom Over a year ago

I don't think the XLSX engine limits the number of values it checks. It certainly does not require any options specified in the Windows registry since it does not use any Microsoft code and runs the same on Unix as it does on Windows.

Yumeng Xu Over a year ago

Hi Tom, thank you for your update. Could you explain why those 12 rows with decimals in the first column were rounded to whole numbers?
