4

I have a pandas dataframe with about 56 columns and 120,000 rows.

I would like to validate only some of the columns, not all of them.

I followed the article at https://tmiguelt.github.io/PandasSchema/

When I tried something like the function below, validation reported this error:

"Invalid number of columns. The schema specifies 2, but the data frame has 56"

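import numpy as np
import pandas_schema
from pandas_schema import Column
from pandas_schema.validation import CustomElementValidation

def DoValidation(self, df):
    # Element-wise check: fails for cells holding np.nan
    null_validation = [CustomElementValidation(lambda d: d is not np.nan, 'this field cannot be null')]

    # Both columns go in a single list; a second positional argument
    # would be taken as Schema's `ordered` flag instead
    schema = pandas_schema.Schema([Column('ItemId', null_validation),
                                   Column('ItemName', null_validation)])
    errors = schema.validate(df)
    if len(errors) > 0:
        for error in errors:
            print(error)
        return False
    return True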

Am I doing something wrong?

What is the correct way to validate specific columns in a dataframe?

Note: I have to implement different types of validation (decimal, length, null checks, etc.) on different columns, not just the null check shown in the function above.

1
  • Because the schema only has two columns in its list. As with PySpark, you need to define all 56 columns in the schema before passing the dataframe to the function. Commented Jan 20, 2020 at 16:52

2 Answers

4

As Yuki Ho mentioned in his answer, by default the schema has to specify as many columns as the dataframe contains.

But you can also use the columns parameter in schema.validate() to specify which columns to check. Combining that with schema.get_column_names() you can do the following to easily avoid your issue.

schema.validate(df, columns=schema.get_column_names())

0

The error "Invalid number of columns. The schema specifies 2, but the data frame has 56" appears because the dataframe has 56 columns while the schema defines only 2. You either have to define all 56 columns in the schema, or create a new dataframe containing only the columns you want to validate.
