I have a dataframe df1:
Questions                            Purpose
what is scientific name of <input>   scientific name
what is english name of <input>      english name
And I have 2 lists as below:
name1 = ['salt','water','sugar']
name2 = ['sodium chloride','dihydrogen monoxide','sucrose']
I want to create a new dataframe by replacing <input> with the values from one of the lists, depending on the purpose:
if the purpose is english name, replace <input> with the values in name2;
otherwise, replace <input> with the values in name1.
Expected Output DataFrame:
Questions                                        Purpose
what is scientific name of salt                  scientific name
what is scientific name of water                 scientific name
what is scientific name of sugar                 scientific name
what is english name of sodium chloride          english name
what is english name of dihydrogen monoxide      english name
what is english name of sucrose                  english name
My Efforts
import pandas as pd

questions = []
purposes = []
for i, row in df1.iterrows():
    # Pick the replacement list based on the row's purpose.
    if row['Purpose'] == 'scientific name':
        for name in name1:
            ques = row['Questions'].replace('<input>', name)
            questions.append(ques)
            purposes.append(row['Purpose'])
    else:
        for name in name2:
            ques = row['Questions'].replace('<input>', name)
            questions.append(ques)
            purposes.append(row['Purpose'])
# Build the expanded dataframe from the collected rows.
df = pd.DataFrame({'Questions': questions, 'Purpose': purposes})
The above code produces the expected output, but it is too slow because I have many questions in the original dataframe. (I also have more than two purposes, but for now I'm sticking with only 2.)
I am looking for a more efficient solution, ideally one that gets rid of the for loops.
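One direction that might avoid iterrows, as a rough sketch that has not been benchmarked: map each purpose to its list, explode the lists into rows, and substitute <input> afterwards. The names_by_purpose dictionary and the temporary name column are made up for this illustration, and the sketch assumes every purpose in df1 has an entry in that dictionary (unlike the else branch above, an unknown purpose would not fall back to name2).

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Questions": [
        "what is scientific name of <input>",
        "what is english name of <input>",
    ],
    "Purpose": ["scientific name", "english name"],
})
name1 = ["salt", "water", "sugar"]
name2 = ["sodium chloride", "dihydrogen monoxide", "sucrose"]

# Hypothetical lookup table: purpose -> list of replacement values.
# Assumption: every purpose that occurs in df1 appears as a key here.
names_by_purpose = {"scientific name": name1, "english name": name2}

# Attach the matching list to each row, then expand each list
# element into its own row.
out = (
    df1.assign(name=df1["Purpose"].map(names_by_purpose))
       .explode("name")
       .reset_index(drop=True)
)

# Substitute the placeholder row by row. This is still a Python-level
# loop, but it avoids the per-row overhead of iterrows.
out["Questions"] = [
    q.replace("<input>", n) for q, n in zip(out["Questions"], out["name"])
]
out = out.drop(columns="name")
```

Adding another purpose would then only mean adding another key to names_by_purpose, rather than another branch in the loop.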