I have been trying this for a while and I am stuck. Here is the problem:
I am working with some metadata about texts that I have in CSV files. It looks like this:
The real table is longer and more complex, but it follows the same logic: every row is a text and every column is a different aspect of the text. Some of the columns have too much variation, and I want to remodel them into a simpler scheme. For example, in the narrative-perspective column I want to change the values homodiegetic and autodiegetic to non-heterodiegetic. I define this new model in another CSV file called keywords that looks like this:
As you can see, every column of the metadata becomes a row in the keywords model, with the old value in the term_value column and the new value in the new_model column.
So I need to map or replace these values using Pandas. This is what I have got so far:
import re
import pandas as pd
df_metadata = pd.read_csv("/metadata.csv", encoding="utf-8", sep=",")
df_keywords = pd.read_csv("/keywords.csv", encoding="utf-8", sep="\t")
for column_metadata,value_metadata in df_metadata.iteritems():
    if str(column_metadata) in list(df_keywords.loc[:,"term_type"]):
        df_metadata.loc[df_metadata[column_metadata] == value_metadata, column_metadata] = df_keywords.loc[df_keywords["term_value"] == value_metadata, ["new_model"]]
And Python always gives this error back:
"ValueError: Series lengths must match to compare"
I think the problem is in the value_metadata in the second part of the replacement with loc, that is, here:
df_keywords.loc[df_keywords["term_value"] == value_metadata, ["new_model"]]
The thing I don't understand is why value_metadata works in the first part of this command but not in the second one...
Please, I would appreciate any help. Maybe there is a simpler way than iterating through the dataframes... I am very open to any suggestions. Best regards, José
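One possible direction, as a minimal sketch rather than a definitive fix: iterating over a DataFrame yields each column name together with the whole column as a Series, not a single cell value. That would explain why the comparison against df_metadata succeeds (same length) while the comparison against df_keywords does not. The sketch below avoids the loop by building a nested dictionary from the keywords table and passing it to replace. The two small frames are hypothetical stand-ins for the CSV files, and the idno column and the name df_remodelled are made up for illustration. Only term_type, term_value, new_model and the narrative-perspective values come from the description above. It uses items() because iteritems() was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for metadata.csv and keywords.csv. Only the
# keywords column names and the narrative-perspective values are taken
# from the question; "idno" and the row contents are invented.
df_metadata = pd.DataFrame({
    "idno": ["t001", "t002", "t003"],
    "narrative-perspective": ["homodiegetic", "autodiegetic", "heterodiegetic"],
})
df_keywords = pd.DataFrame({
    "term_type": ["narrative-perspective", "narrative-perspective"],
    "term_value": ["homodiegetic", "autodiegetic"],
    "new_model": ["non-heterodiegetic", "non-heterodiegetic"],
})

# Iterating a DataFrame yields (column name, whole column) pairs, so the
# second item is a Series with one entry per row, not a single cell value.
for column_name, column_values in df_metadata.items():
    assert isinstance(column_values, pd.Series)

# Build {column name: {old value: new value}} from the keywords table.
mapping = {
    term_type: dict(zip(group["term_value"], group["new_model"]))
    for term_type, group in df_keywords.groupby("term_type")
}

# replace() with a nested dict only touches the listed columns and values;
# everything else is left as it was.
df_remodelled = df_metadata.replace(mapping)
print(df_remodelled)
```

Values that have no entry in the keywords table (heterodiegetic in this sketch) are left untouched, so the keywords file only needs to list the values that should change.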

