Here are the first 20 lines of my text file. The full file has about 50K lines like these.

prov_type|prov_type_desc
0|FAMILY PRACTICE/CLINIC
1|FAMILY PRACTICE
2|ALLERGIST
3|DERMATOLOGIST
4|INTERNIST
5|NEUROLOGIST
6|NEUROSURGEON
7|OB/GYN
8|OPTHAMOLOGIST
9|ORTHOPEDIST
10|OTOLARYNGOLOGIST
11|PATHOLOGIST
12|PEDIATRICIAN
13|PLASTIC SURGEON
14|COLON AND RECTAL SURGERY
15|PSYCHIATRIST
16|RADIOLOGIST
17|SURGEON
18|THORACIC SURGEON
19|UROLOGIST
20|ANESTHESIOLOGIST

I'm reading it like this:

ovations = pd.read_csv("Ovations.txt",sep='|',dtype=object)
ovations.rename(columns={'prov_type_desc':'specialty'},inplace=True)

I wrote a dictionary to map each specialty to a standard name. Here is the dict:

options = {'FAMILYPRACTICESELF-REFFERAL' : 'FAMILY PRACTICE',
'FAMILYPRACTICESPECIALIST' : 'FAMILY PRACTICE',
'FAMILYPRACTICE/CLINIC' : 'FAMILY PRACTICE',
'GENERALPRACTICE' : 'FAMILY PRACTICE',
'ALLERGY' : 'ALLERGIST',
'ALLERGYANDIMMUNOLOGY' : 'ALLERGIST',
'ALLERGY&IMMUNOLOGY' : 'ALLERGIST',
'ALLERGY/IMMUNOLOGY' : 'ALLERGIST',
'CARDIOLOGY' : 'CARDIOLOGIST',
'CARDIOLOGYGROUP' : 'CARDIOLOGIST',
'CARDIOVASCULARDISEASE' : 'CARDIOLOGIST',
'COLON&RECTALSURGERY' : 'COLON AND RECTAL SURGERY',
'COLON/RECTALSURGERY' : 'COLON AND RECTAL SURGERY',
'COLORECTALSURGERY' : 'COLON AND RECTAL SURGERY',
'DERMATOLOGYGROUP' : 'DERMATOLOGIST',
'DERMATOLOGY' : 'DERMATOLOGIST',
'ENDOCRINOLOGY,DIABETES,ANDMETABOLISM' : 'ENDOCRINOLOGIST',
'ENDOCRINOLOGY' : 'ENDOCRINOLOGIST',
'ENDODONDIST' : 'ENDODONTICS',
'GASTROENTEROLOGY' : 'GASTROENTEROLOGIST',
'GASTROENTEROLOGYGROUP' : 'GASTROENTEROLOGIST',
'GENETICCOUNSELOR' : 'GENETIC TESTING/COUNSELING CENTER',
'GENETICS,CLINICAL(MD)' : 'GENETIC TESTING/COUNSELING CENTER',
'GENETICS,CLINICALMOLECULAR' : 'GENETIC TESTING/COUNSELING CENTER',
'HEMATOLOGYONCOLOGY' : 'HEMATOLOGY/ONCOLOGY',
'HEMATOLOGIST' : 'HEMATOLOGY/ONCOLOGY',
'HEMATOLOGY' : 'HEMATOLOGY/ONCOLOGY',
'HEMATOLOGYGROUP' : 'HEMATOLOGY/ONCOLOGY',
'HEMATOLOGY-ONCOLOGY' : 'HEMATOLOGY/ONCOLOGY',
'HEMATOLOGY-ONCOLOGYGROUP' : 'HEMATOLOGY/ONCOLOGY',
'HOSPICE&PALLATIVEMED' : 'HOSPICE',
'HOSPITALOP/LAB/XRAY' : 'HOSPITAL',
'HOSPITALIST' : 'HOSPITAL',
'INFECTIOUSDISEASEMEDICINE' : 'INFECTIOUS DISEASE',
'INTERNALMED' : 'INTERNAL MEDICINE',
'INTERNALMEDICINESPECIALIST' : 'INTERNAL MEDICINE',
'INTERNIST' : 'INTERNAL MEDICINE',
'INFECTIOUSDISEASESEPCIALIST' : 'INFECTIOUS DISEASE',
'NEPHROLOGY' : 'NEPHROLOGIST',
'NEUROLOGY' : 'NEUROLOGIST',
'OBSTETRICS' : 'OBSTETRICS AND GYNECOLOGY',
'OBSTETRICS&GYNECOLOGY' : 'OBSTETRICS AND GYNECOLOGY',
'OBSTETRICS/GYNECOLOGY' : 'OBSTETRICS AND GYNECOLOGY',
'OB/GYNGROUP' : 'OBSTETRICS AND GYNECOLOGY',
'OBSTETRICSGYNECOLOGY' : 'OBSTETRICS AND GYNECOLOGY',
'OBGYNECOLOGISTSPECIALTY' : 'OBSTETRICS AND GYNECOLOGY',
'OB/GYN' : 'OBSTETRICS AND GYNECOLOGY',
'OB/GYNSELFREFCAP' : 'OBSTETRICS AND GYNECOLOGY',
'GYNECOLOGY' : 'OBSTETRICS AND GYNECOLOGY',
'ONCOLOGY' : 'ONCOLOGIST',
'GYNECOLOGICONCOLOGY' : 'ONCOLOGIST',
'GYNECOLOGICALONCOLOGY' : 'ONCOLOGIST',
'GYNECOLOGICAL/ONCOLOGY' : 'ONCOLOGIST',
'OPHTHALMOLOGY' : 'OPTHAMOLOGIST',
'OTOLARYNGOLOGY' : 'OTOLARYNGOLOGIST',
'OTOLARYNGOLOGY(ENT)' : 'OTOLARYNGOLOGIST',
'PATHOLOGY' : 'PATHOLOGIST',
'PATHOLOGYSERVICES' : 'PATHOLOGIST',
'PATHOLOGY,ANATOMIC' : 'PATHOLOGIST',
'CYTOPATHOLOGY' : 'PATHOLOGIST',
'PATHOLOGY,ANATOMICAL&CLINICAL' : 'PATHOLOGIST',
'PATHOLOGY,BLOOD BANKING/TRANSFUSIONMED' : 'PATHOLOGIST',
'PATHOLOGY,CLINICAL' : 'PATHOLOGIST',
'PATHOLOGY,CYTOPATHOLOGY' : 'PATHOLOGIST',
'PATHOLOGY,DERMATOPATHOLOGY' : 'PATHOLOGIST',
'PATHOLOGY,HEMATOLOGY' : 'PATHOLOGIST',
'PATHOLOGY,IMMUNOPATHOLOGY' : 'PATHOLOGIST',
'PATHOLOGY,NEUROPATHOLOGY' : 'PATHOLOGIST',
'DERMATOLOGY-DERMATOPATHOLOGY' : 'PATHOLOGIST',
'DERMATOPATHOLOGY' : 'PATHOLOGIST',
'PEDIATRICMEDICINE' : 'PEDIATRICIAN',
'PEDIATRSELFREFCAP' : 'PEDIATRICIAN',
'PEDIATRICSPECIALTYIALIST' : 'PEDIATRICIAN',
'PEDIATRICS' : 'PEDIATRICIAN',
'PEDIATRICSSPECIALTYIALIST' : 'PEDIATRICIAN',
'PLASTICANDRECONSTRUCTIVESURGERY' : 'PLASTIC SURGEON',
'PLASTICSURGERY' : 'PLASTIC SURGEON',
'PLASTICSURGERYWITHINTHEHEAD&NECK' : 'PLASTIC SURGEON',
'PSYCHIATRY' : 'PSYCHIATRIST'}

I wrote a function like this to get the value for each key:

def key_in_dic(p):
    return next((options[x] for x in p if x in options), 'Other')
ovations['specialty_adj'] = key_in_dic(list(ovations['specialty']))

It is not working as expected. What could be the problem?

Here is what I am getting. It should return Other for non-matching keys such as ALLERGIST, but it does not.

[screenshot of the output]

Thank you.

  • Maybe add how it is working now and highlight how that differs from how it should? Commented Mar 25, 2017 at 11:48
  • Updated, please check. Commented Mar 25, 2017 at 11:51
  • Why don't you use options.get(x, 'Other') to specify a default value for nonexistent specialties? Commented Mar 25, 2017 at 11:58

2 Answers

As Barmar already stated, you can use the get method of dictionaries. I think the following should give you what you want:

ovations["specialty_adj"] = ovations["specialty"].apply(lambda x: options.get(x, "Other"))
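
To see why the original function does not behave per row, here is a minimal sketch. The three-row data frame and the two-entry options dict are made-up stand-ins, not the real data. next() stops at the first matching key and returns a single string, so pandas broadcasts that one value to every row. Looking each value up separately, with apply or (as an alternative not mentioned in the question) Series.map followed by fillna, gives one result per row.

```python
import pandas as pd

# Illustrative stand-ins for the real mapping and file contents.
options = {"ALLERGY": "ALLERGIST", "DERMATOLOGY": "DERMATOLOGIST"}
ovations = pd.DataFrame({"specialty": ["SURGEON", "ALLERGY", "DERMATOLOGY"]})

def key_in_dic(p):
    # Original approach: next() returns only the FIRST match as one string.
    return next((options[x] for x in p if x in options), "Other")

# A single scalar; assigning it to a column would repeat it on every row.
first_match = key_in_dic(list(ovations["specialty"]))

# Per-row lookup with a default for keys that are not in the dict.
ovations["specialty_adj"] = ovations["specialty"].apply(
    lambda x: options.get(x, "Other")
)

# Alternative: map() yields NaN for missing keys, fillna() supplies the default.
mapped = ovations["specialty"].map(options).fillna("Other")
```

Both per-row variants should produce the same column; which one to use is mostly a matter of taste.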

4 Comments

One more question: if there is no match, I want to return the specialty itself. Can you suggest how to do this?
Happy to help. Note that the spellings must match exactly for the string comparison here: options contains FAMILYPRACTICE/CLINIC, but in the data frame it is written FAMILY PRACTICE/CLINIC, with a space. Depending on what you want to achieve, you could try options.get(x.replace(" ",""), "Other") in the lambda expression. You should accept this answer as the correct one, though.
Yes, thank you, I will for sure. For a non-matching record it should return the value of specialty. How do I do that?
You should check the documentation for dicts. get returns the value for the searched key, or a default value which you can specify. So options.get(x, x) should work for you.
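
Putting the two suggestions from these comments together, a rough sketch could look like the following. The helper name adjust and the sample rows are hypothetical. Spaces are stripped only to build the lookup key, and the untouched original value is used as the fallback.

```python
import pandas as pd

# Illustrative subset of the mapping; keys are written without spaces.
options = {"FAMILYPRACTICE/CLINIC": "FAMILY PRACTICE", "ALLERGY": "ALLERGIST"}
ovations = pd.DataFrame(
    {"specialty": ["FAMILY PRACTICE/CLINIC", "ALLERGY", "UROLOGIST"]}
)

def adjust(specialty):
    # Hypothetical helper: strip spaces so the value matches the dict keys.
    key = specialty.replace(" ", "")
    # Fall back to the original, unmodified specialty when nothing matches.
    return options.get(key, specialty)

ovations["specialty_adj"] = ovations["specialty"].apply(adjust)
```

One caveat with this approach: any key in options that itself contains a space can never match after the spaces are stripped from the data, so such keys would need the same normalization.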

Use the dict.get() method to specify a default when the key isn't found.

def key_in_dict(p):
    # dict.get takes the default positionally; it does not accept a default= keyword
    return (options.get(x, 'Other') for x in p)

1 Comment

It is giving me <generator object key_in_dic.<locals>.<genexpr... when I assign the result to a new column in the data frame. Do I need to change how I call the function?
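
The parentheses in the return statement build a generator expression, which is lazy and is what shows up in that message. One possible fix, sketched here with made-up sample data, is to return a list comprehension instead, so the function hands back one already computed value per row. Wrapping the call in list(...) should have the same effect.

```python
import pandas as pd

# Illustrative stand-ins for the real mapping and data.
options = {"ALLERGY": "ALLERGIST"}
ovations = pd.DataFrame({"specialty": ["ALLERGY", "UROLOGIST"]})

def key_in_dict(p):
    # Square brackets: the lookups run immediately and a list is returned.
    return [options.get(x, "Other") for x in p]

# Iterating over a Series yields its values, so it can be passed directly.
ovations["specialty_adj"] = key_in_dict(ovations["specialty"])
```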
