I want to check whether the values in a column start with a specific pattern: one letter, a dash, then a year and a letter, for example "A-2012A". After that, the rest of the value can be anything. I only want to validate this first part and get back a true/false value. Is that possible?

Pattern: letter-yearletter

String validation on one column with regular expression.

example_column_1

DNA \ Assay
A-2000X-27
A-2000X-32
A-2000X-45
A-2000X-48
A-2000X-80

truth_value = df['DNA \\ Assay'].str.match(r'').astype(bool)

This is my attempt so far, with the regular expression r'' still left empty.

My expected output with example_column_1 would be True

example_column_2

DNA \ Assay
Embryo FTA-Code-ID-2
Embryo FTA-Code-ID-3
Embryo FTA-Code-ID-4
Embryo FTA-Code-ID-5
Embryo FTA-Code-ID-6

My expected output with example_column_2 would be False

1 Answer

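Use str.match with a regex. It only anchors the pattern at the start of each value, so anything can follow the matched prefix. Note that case=False also accepts lowercase letters; drop it if only uppercase letters should pass: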

df['valid'] = df['DNA \\ Assay'].str.match(r'[A-Z]-\d{4}[A-Z]', case=False)

output:

  DNA \ Assay  valid
0  A-2000X-27   True
1  A-2000X-32   True
2  A-2000X-45   True
3  A-2000X-48   True
4  A-2000X-80   True
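
The reason the trailing "-27" is accepted is that the match is anchored only at the start of the string, the same way re.match behaves. A minimal sketch with the standard re module, using sample values in the style of the question; has_prefix is a hypothetical helper name, and the last two inputs are made-up test strings, not data from the question:

```python
import re

# Same pattern as in the answer; IGNORECASE mirrors case=False.
PREFIX = re.compile(r'[A-Z]-\d{4}[A-Z]', re.IGNORECASE)

def has_prefix(value):
    """Return True if value starts with letter, dash, four digits, letter."""
    # re.match only anchors at the beginning, so any suffix is allowed.
    return PREFIX.match(value) is not None

samples = ['A-2000X-27', 'Embryo FTA-Code-ID-2', 'a-2012a', 'XA-2000X-27']
checks = {value: has_prefix(value) for value in samples}
print(checks)
```

The last sample fails because the pattern must begin at the very first character; a pattern that appears later in the string does not count.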

If you want to validate all values:

df['DNA \\ Assay'].str.match(r'[A-Z]-\d{4}[A-Z]', case=False).all()

output: True
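
To get the single True/False per column that the question asks for, the check can be wrapped in a small helper and run against both example columns. This is a sketch, assuming pandas is installed; column_is_valid is a hypothetical name, and na=False is an added choice (not part of the answer above) that treats missing values as non-matching:

```python
import pandas as pd

PATTERN = r'[A-Z]-\d{4}[A-Z]'
COLUMN = 'DNA \\ Assay'  # escaped backslash, i.e. the text "DNA \ Assay"

def column_is_valid(series, pattern=PATTERN):
    """Return True only if every value starts with the pattern."""
    # na=False (assumption): count missing values as failures.
    return bool(series.str.match(pattern, case=False, na=False).all())

df1 = pd.DataFrame({COLUMN: ['A-2000X-27', 'A-2000X-32', 'A-2000X-45',
                             'A-2000X-48', 'A-2000X-80']})
df2 = pd.DataFrame({COLUMN: ['Embryo FTA-Code-ID-2', 'Embryo FTA-Code-ID-3',
                             'Embryo FTA-Code-ID-4', 'Embryo FTA-Code-ID-5',
                             'Embryo FTA-Code-ID-6']})

result_1 = column_is_valid(df1[COLUMN])
result_2 = column_is_valid(df2[COLUMN])
print(result_1, result_2)
```

With the two example columns from the question, the first call should give True and the second False.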
