I would like to concatenate two CSV files. Each CSV file has the following structure:
File 1
id,name,category-id,lat,lng
4c29e1c197,Area51,4bf58dd8d,45.44826958,9.144208431
4ede330477,Punto Snai,4bf58dd8d,45.44833354,9.144086353
51efd91d49,Gelateria Cecilia,4bf58dd8d,45.44848931,9.144008735
File 2
id,name,category-id,lat,lng
4c29e1c197,Area51,4bf58dd8d,45.44826958,9.144208432
4ede330477,Punto Snai,4bf58dd8d,45.44833354,9.144086353
51efd91d49,Gelateria Cecilia,4bf58dd8d,45.44848931,9.144008735
5748729449,Duomo Di Milano,52e81612bc,45.463898,9.192034
I got a final CSV that looks like this:
Final file
id,name,category-id,lat,lng
4c29e1c197,Area51,4bf58dd8d,45.44826958,9.144208431
4c29e1c197,Area51,4bf58dd8d,45.44826958,9.144208432
4ede330477,Punto Snai,4bf58dd8d,45.44833354,9.144086353
51efd91d49,Gelateria Cecilia,4bf58dd8d,45.44848931,9.144008735
5748729449,Duomo Di Milano,52e81612bc,45.463898,9.192034
So I have done this:
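import pandas as pd
# The CSV header uses "category-id" (hyphen), not "category_id".
df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")
# pd.concat takes a list of DataFrames, not separate positional arguments.
full_df = pd.concat([df1, df2])
# count() replaces the remaining column (name) with row counts;
# as_index=False keeps the grouping keys as ordinary columns.
full_df = full_df.groupby(['id', 'category-id', 'lat', 'lng'], as_index=False).count()
# The second grouping again yields only a count per id.
full_df2 = full_df[['id', 'category-id']].groupby('id').agg('count')
full_df2.to_csv("final.csv", index=False)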
I tried to group by id, category-id, lat and lng, since the name could change. After the first groupby I want to group again, this time by id and category-id only, because, as shown in my example, the first row changed in lng, but that is probably because file 2 is an update of file 1.
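A minimal sketch of the two-step de-duplication described above, using `drop_duplicates` in place of `groupby`. This is a swapped-in technique, not the approach in the code above. The inline CSV text copies the sample rows from this question so the block is self-contained; the names `FILE1`, `FILE2`, `step1` and `step2` are illustrative, and `keep="last"` rests on the assumption that file 2 is the newer file and should win.

```python
import io
import pandas as pd

# Sample rows copied from the question, inlined so the sketch runs on its own.
FILE1 = """id,name,category-id,lat,lng
4c29e1c197,Area51,4bf58dd8d,45.44826958,9.144208431
4ede330477,Punto Snai,4bf58dd8d,45.44833354,9.144086353
51efd91d49,Gelateria Cecilia,4bf58dd8d,45.44848931,9.144008735
"""
FILE2 = """id,name,category-id,lat,lng
4c29e1c197,Area51,4bf58dd8d,45.44826958,9.144208432
4ede330477,Punto Snai,4bf58dd8d,45.44833354,9.144086353
51efd91d49,Gelateria Cecilia,4bf58dd8d,45.44848931,9.144008735
5748729449,Duomo Di Milano,52e81612bc,45.463898,9.192034
"""

# Read the id columns as strings so an all-digit id is never parsed as a number.
str_cols = {"id": str, "category-id": str}
df1 = pd.read_csv(io.StringIO(FILE1), dtype=str_cols)
df2 = pd.read_csv(io.StringIO(FILE2), dtype=str_cols)

# pd.concat expects a list of DataFrames; file 2 rows come after file 1 rows.
full_df = pd.concat([df1, df2], ignore_index=True)

# Step 1: drop rows that repeat the same id, category-id, lat and lng
# (name is ignored because it may change).
step1 = full_df.drop_duplicates(subset=["id", "category-id", "lat", "lng"])

# Step 2: keep one row per (id, category-id). keep="last" keeps the row from
# file 2, assuming file 2 is an update of file 1.
step2 = step1.drop_duplicates(subset=["id", "category-id"], keep="last")

print(step1.sort_values("id", kind="stable").to_csv(index=False))
print(step2.sort_values("id", kind="stable").to_csv(index=False))
```

Unlike `groupby(...).count()`, `drop_duplicates` keeps the original column values, so the result can be written straight to `final.csv` with `to_csv(index=False)`.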
I don't understand groupby, because when I tried to print the result I got just the count values.
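A small sketch of why only counts appear, using made-up data (the frame `df` and its values are hypothetical, not taken from the files above). `groupby(...).count()` moves the grouping keys into the index and replaces every remaining column with the number of non-null rows in each group, so the original values are gone. An aggregation such as `first()` keeps a value from each group instead.

```python
import pandas as pd

# Hypothetical data: two rows share the same (id, lat) key.
df = pd.DataFrame({
    "id": ["a", "a", "b"],
    "name": ["x", "y", "z"],
    "lat": [1.0, 1.0, 2.0],
})

# count(): the keys become the index, and "name" now holds group sizes.
counts = df.groupby(["id", "lat"]).count()
print(counts)

# first(): keeps the first "name" of each group; as_index=False leaves
# the keys as ordinary columns.
firsts = df.groupby(["id", "lat"], as_index=False).first()
print(firsts)
```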