Creating data frame conditionally based on 3 data frames

Question

I have the following 3 data frames:

import pandas as pd

dfSpa = pd.read_csv(
    "sentences and translations/SpanishSentences.csv", sep=',')
print(dfSpa.head())

dfEng = pd.read_csv(
    'sentences and translations/EngTranslations.csv', sep=',')
print(dfEng.head())

dfIndex = pd.read_csv(
    'sentences and translations/SpaSentencesThatHaveEngTranslations.csv', sep=',')
print(dfIndex.head())

That outputs the following:

      0    1                               2
0  2482  spa        Tengo que irme a dormir.
1  2487  spa   Ahora, Muiriel tiene 20 años.
2  2493  spa  Simplemente no sé qué decir...
3  2495  spa      Yo estaba en las montañas.
4  2497  spa          No sé si tengo tiempo.
      0    1                               2
0  1277  eng          I have to go to sleep.
1  1282  eng              Muiriel is 20 now.
2  1287  eng     This is never going to end.
3  1288  eng  I just don't know what to say.
4  1290  eng         I was in the mountains.
      0       1
0  2482    1277
1  2487    1282
2  2493    1288
3  2493  693485
4  2495    1290

Column 0 in dfIndex is the ID of a Spanish sentence in dfSpa, and column 1 is the ID of the matching English translation in dfEng. dfSpa has more rows than the other two data frames, so some sentences do not have English translations. Also, dfIndex is longer than dfEng because some Spanish sentences map to several translations with different IDs, such as 2493 in dfIndex.head() above.

I am trying to create another data frame that simply has the Spanish sentence in one column and the corresponding English translation in the other column. How could I get this done?

mujjiga · Accepted Answer · 2020-07-22 17:02:48Z

1

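# Attach the Spanish text to each (Spanish ID, English ID) pair
spa = dfIndex.merge(dfSpa[[0, 2]], on=0)[[1, 2]].rename(columns={2: "Spa"})
# Look up the English text by English ID, then keep only the two text columns
result = spa.merge(dfEng, left_on=1, right_on=0).rename(columns={2: "Eng"})[["Spa", "Eng"]]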
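
To see what the chained merge does, here is a minimal sketch run on the head rows shown in the question. The frames are built inline instead of being read from the CSV files, and the integer column labels 0, 1, 2 are an assumption about how the asker's files were loaded. The English text for ID 693485 is not shown in the question, so that pair drops out of the (default) inner join.

```python
import pandas as pd

# Sample rows copied from the question's printed output
dfSpa = pd.DataFrame({
    0: [2482, 2487, 2493, 2495, 2497],
    1: ["spa"] * 5,
    2: ["Tengo que irme a dormir.", "Ahora, Muiriel tiene 20 años.",
        "Simplemente no sé qué decir...", "Yo estaba en las montañas.",
        "No sé si tengo tiempo."],
})
dfEng = pd.DataFrame({
    0: [1277, 1282, 1287, 1288, 1290],
    1: ["eng"] * 5,
    2: ["I have to go to sleep.", "Muiriel is 20 now.",
        "This is never going to end.", "I just don't know what to say.",
        "I was in the mountains."],
})
dfIndex = pd.DataFrame({0: [2482, 2487, 2493, 2493, 2495],
                        1: [1277, 1282, 1288, 693485, 1290]})

# Step 1: join the ID pairs to the Spanish text on the Spanish ID (column 0),
# keeping the English ID (column 1) and the Spanish text (column 2).
step1 = dfIndex.merge(dfSpa[[0, 2]], on=0)[[1, 2]].rename(columns={2: "Spa"})

# Step 2: join on the English ID; the English text is column 2 of dfEng.
pairs = (step1.merge(dfEng, left_on=1, right_on=0)
              .rename(columns={2: "Eng"})[["Spa", "Eng"]])
print(pairs)
```

Passing how="left" to both merges would instead keep pairs whose text is missing from dfSpa or dfEng, with NaN in the corresponding column.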


ChairNTable · Answer · 2020-07-22 16:48:31Z

1

You could try:

df_n=pd.DataFrame()
df_n['A'] = [df.iloc[x].values for x in dfSpa.loc[:,0]]
df_n['B'] = [df.iloc[x].values for x in dfEng.loc[:,0]]

and then remove duplicated rows using:

df_n = df_n.drop_duplicates(subset = ['A'])

It would be easier to check if you had sample dfs.
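
One thing to watch with the snippet above: .iloc selects rows by position, while the values in column 0 are sentence IDs, so an ID larger than the frame's length raises an IndexError. A label-based lookup is one alternative. The following is only a sketch on two-row frames taken from the question's output, assuming integer column labels and unique IDs in dfSpa and dfEng; it swaps in Series.map, which is not what either answer uses.

```python
import pandas as pd

dfSpa = pd.DataFrame({0: [2482, 2487],
                      2: ["Tengo que irme a dormir.",
                          "Ahora, Muiriel tiene 20 años."]})
dfEng = pd.DataFrame({0: [1277, 1282],
                      2: ["I have to go to sleep.", "Muiriel is 20 now."]})
dfIndex = pd.DataFrame({0: [2482, 2487], 1: [1277, 1282]})

# Positional lookup fails: a 2-row frame has no row at position 2482.
try:
    dfIndex.iloc[2482]
    positional_ok = True
except IndexError:
    positional_ok = False

# Label-based lookup: index each text column by sentence ID, then map the
# IDs in dfIndex to text. IDs with no match become NaN.
spa_text = dfSpa.set_index(0)[2]
eng_text = dfEng.set_index(0)[2]
df_n = pd.DataFrame({
    "A": dfIndex[0].map(spa_text),
    "B": dfIndex[1].map(eng_text),
})
print(df_n)
```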

3 Comments

themrdan Over a year ago

df_n['A'] = [dfIndex.iloc[x].values for x in dfSpa.loc[:, '0']]
df_n['B'] = [dfIndex.iloc[x].values for x in dfEng.loc[:, '0']]

gives the error "single positional indexer is out-of-bounds". How can I provide a sample df? Thanks

ChairNTable Over a year ago

If @mujjiga's comment above doesn't work, could you upload short CSV files?

themrdan Over a year ago

His comment worked, but thanks for the help anyway.
