I have made a dataframe using a function I created:
data = generate_xml()
I then make a subset of that dataframe based on two of its columns, which in this case are called WalmartIDS and ASINS. Below is the code along with an example of what the resulting dataframe looks like:
walmartIDS = data.loc[:,['WalmartIDS','ASINS']]
>>
WalmartIDS ASINS
602 20511489 B077BS6737
603 10311487 B077BMHVG7
604 10311302 B077BRTYCS
605 152381151 B077YW9PTQ
606 The-Holiday-Aisle-Projection-Kaleidoscope-Spider-Airblown-Inflatable-Halloween-Decoration-THDA5581.html B076CNN6K5
607 51409868 B0756DMVSC
608 51410962 B0756FKLCV
609 51411020 B0756F3F6J
610 51411529 B0756FDM74
611 915505165 B076W25SDZ
612 400796633 B076VM75ZF
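(Since generate_xml() is my own function and not reproducible here, the following is a minimal stand-in whose values are simply copied from the sample output above, in case it helps to run the example.)

import pandas as pd

# Stand-in for generate_xml(): a small frame mirroring the sample above,
# including the bad row at index 606.
data = pd.DataFrame(
    {
        "WalmartIDS": [
            "20511489",
            "10311487",
            "The-Holiday-Aisle-Projection-Kaleidoscope-Spider-Airblown-Inflatable-Halloween-Decoration-THDA5581.html",
            "51409868",
        ],
        "ASINS": ["B077BS6737", "B077BMHVG7", "B076CNN6K5", "B0756DMVSC"],
    },
    index=[602, 603, 606, 607],
)
walmartIDS = data.loc[:, ["WalmartIDS", "ASINS"]]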
As you can see, bad data sometimes gets into the WalmartIDS column. I want to filter it out by deleting every row of the newly created walmartIDS dataframe whose WalmartIDS value contains anything other than digits. I don't want to alter the data dataframe itself, because it holds the raw data. Here is what I tried:
walmartIDS[walmartIDS.WalmartIDS != '^[-+]?[0-9]+$']
However, the above doesn't seem to do anything: the bad data (row 606 in the example) is still there, where it should have been deleted.
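For reference, this is roughly the kind of filter I think I'm after, as a sketch only; I'm assuming pandas' .str.match() is the right tool for applying the regex as a boolean mask, but I haven't verified that this is the idiomatic approach:

# Sketch: keep only rows whose WalmartIDS consists entirely of digits
# (optionally signed), using a boolean mask rather than comparing the
# column against the regex string itself.
mask = walmartIDS['WalmartIDS'].astype(str).str.match(r'^[-+]?[0-9]+$')
walmartIDS_clean = walmartIDS[mask]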
What is the proper way to do this?