I am new to coding and I'm having an issue merging CSV files. I have searched similar questions but haven't found a fix. Some relevant details: each CSV file holds one cancer type (lung, colorectal, stomach, liver and breast cancer) for different countries over the period 1950-2017. Below is an example of the layout of the lung cancer file.
dlung.describe(include='all')
dlung
Year Cancer Country Gender ASR SE
0 1950 Lung Australia Male 13.89 0.56
1 1951 Lung Australia Male 14.84 0.57
2 1952 Lung Australia Male 17.19 0.61
3 1953 Lung Australia Male 18.21 0.62
4 1954 Lung Australia Male 19.05 0.63
5 1955 Lung Australia Male 20.65 0.65
6 1956 Lung Australia Male 22.05 0.67
7 1957 Lung Australia Male 23.93 0.69
8 1958 Lung Australia Male 23.77 0.68
9 1959 Lung Australia Male 26.12 0.71
10 1960 Lung Australia Male 27.08 0.72
I want to join all cancer types into one DataFrame based on the shared columns (Year, Country). I have tried different methods, but they all duplicate the Year and Country columns, as below.
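Joining on shared columns is what `pd.merge` is designed for: the key columns passed to `on` appear only once in the result, and any other overlapping columns get suffixes. Below is a minimal sketch, assuming both frames have the layout shown above. The colorectal values are made up for illustration, and the suffixes `_lung`/`_colorectal` are an arbitrary choice.

```python
import pandas as pd

# Small stand-ins for dlung and dcolorectal (colorectal values are made up).
dlung = pd.DataFrame({
    "Year": [1950, 1951],
    "Cancer": ["Lung", "Lung"],
    "Country": ["Australia", "Australia"],
    "Gender": ["Male", "Male"],
    "ASR": [13.89, 14.84],
    "SE": [0.56, 0.57],
})
dcolorectal = pd.DataFrame({
    "Year": [1950, 1951],
    "Cancer": ["Colorectal", "Colorectal"],
    "Country": ["Australia", "Australia"],
    "Gender": ["Male", "Male"],
    "ASR": [10.0, 11.0],
    "SE": [0.50, 0.51],
})

keys = ["Year", "Country", "Gender"]
# Drop the Cancer column (the suffix now carries that information) and merge
# on the shared keys, so Year, Country and Gender each appear only once.
merged = pd.merge(
    dlung.drop(columns="Cancer"),
    dcolorectal.drop(columns="Cancer"),
    on=keys,
    how="outer",  # keep key combinations that exist in only one file
    suffixes=("_lung", "_colorectal"),
)
print(merged)
```

With five files, the same call can be repeated (for example via `functools.reduce`), but each step needs its own suffixes, which is one reason the stacked approach can be simpler.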
This one came closest, but it gives me two Year columns and two Country columns:
df_lung_colorectal = pd.concat([dlung, dcolorectal], axis=1)
df_lung_colorectal
Year Cancer Country Gender ASR SE Year Cancer Country Gender ASR SE
If I continue like this with all five files, I will end up with five identical Year columns and five identical Country columns.
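The duplication comes from `axis=1`: `pd.concat` then aligns the frames on their row index and places all of their columns side by side, so every column name is repeated. Because each file already carries a `Cancer` column identifying it, stacking rows with `axis=0` instead keeps a single set of Year, Country and Gender columns. A minimal sketch, using a hypothetical helper `make_frame` and made-up values in place of the real files:

```python
import pandas as pd

def make_frame(cancer, asr, se):
    """Hypothetical helper: a tiny stand-in with the same layout as the real files."""
    return pd.DataFrame({
        "Year": [1950, 1951],
        "Cancer": [cancer, cancer],
        "Country": ["Australia", "Australia"],
        "Gender": ["Male", "Male"],
        "ASR": asr,
        "SE": se,
    })

dlung = make_frame("Lung", [13.89, 14.84], [0.56, 0.57])
dcolorectal = make_frame("Colorectal", [10.0, 11.0], [0.50, 0.51])

# axis=1 glues the frames side by side, so every column name is repeated.
wide = pd.concat([dlung, dcolorectal], axis=1)
print(list(wide.columns).count("Year"))  # 2

# axis=0 stacks rows instead; the Cancer column says which file each row
# came from, and ignore_index=True renumbers the rows 0..n-1.
long = pd.concat([dlung, dcolorectal], axis=0, ignore_index=True)
print(long.shape)  # (4, 6)
```

The same `pd.concat([...], axis=0, ignore_index=True)` call accepts all five frames at once.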
Any ideas on how to merge the values that differ between files (the cancer type and its associated ASR (age-standardized rate) and SE values) while keeping only one column each for Year, Country (and Gender, if possible)?
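If one row per Year/Country/Gender is wanted, with separate ASR and SE columns for each cancer type, a stacked (long) table can be reshaped with `set_index` plus `unstack`. This is a sketch under stated assumptions: `to_wide` is a hypothetical helper, the input is assumed to be the row-stacked table with the layout shown above, and the values are made up. `unstack` raises a `ValueError` if a key combination appears twice for the same cancer type, which is a useful check for duplicated rows.

```python
import pandas as pd

def to_wide(long_df, keys=("Year", "Country", "Gender")):
    """Hypothetical helper: reshape a stacked (long) table into one row per
    key combination, with separate ASR/SE columns for each cancer type."""
    wide = (
        long_df.set_index(list(keys) + ["Cancer"])[["ASR", "SE"]]
        .unstack("Cancer")  # move cancer types from rows into columns
    )
    # Flatten the (measure, cancer) column MultiIndex into names like "ASR_Lung".
    wide.columns = [f"{measure}_{cancer}" for measure, cancer in wide.columns]
    # Turn Year, Country and Gender back into ordinary (single) columns.
    return wide.reset_index()

long_df = pd.DataFrame({
    "Year": [1950, 1951, 1950, 1951],
    "Cancer": ["Lung", "Lung", "Colorectal", "Colorectal"],
    "Country": ["Australia"] * 4,
    "Gender": ["Male"] * 4,
    "ASR": [13.89, 14.84, 10.0, 11.0],
    "SE": [0.56, 0.57, 0.50, 0.51],
})

result = to_wide(long_df)
print(result)
```

Keeping the long table is often more convenient for grouping and plotting; the wide version matches the "one column for Year, Country and Gender" layout asked about here.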