Use pandas to combine 2 CSV files

Question

I have 2 CSV files containing bathymetry data: singlebeam and multibeam.

CSV 1, which is the multibeam data, looks like:

X	Y	Z
626066.4	234058.2	6.69
626066.4	234059.2	6.89
626066.4	234060.2	7.06

And CSV 2, which contains the singlebeam data, looks like:

X	Y	A	B	C
627839	232463.4	14.22	14.46	14.71
627839	232463.1	14.22	14.46	14.71

I would like to use pandas to merge the 2 CSVs as follows:

The X and Y coordinates of CSV 2 are appended to the existing X, Y coordinates of CSV 1
The specified Z value of CSV 2 (A, B or C) is appended to the Z column of CSV 1

This way I will have a combined XYZ dataset of a survey area, which will evidently be more accurate because it combines multibeam and singlebeam data.

For clarification, the result I want to end up with is:

X	Y	Z
626066.4	234058.2	6.69
626066.4	234059.2	6.89
626066.4	234060.2	7.06
627839	232463.4	14.22
627839	232463.1	14.22

I have tried the below code snippet, but need a way of combining the X's, Y's and Z's based on specified columns of CSV 2.

import pandas as pd

# Read the files into two dataframes.
df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

# Merge the two dataframes, using the X column as key
df3 = pd.merge(df1, df2, on = 'X')
df3.set_index('X', inplace = True)

# Write it to a new CSV file
df3.to_csv('CSV3.csv')

Can you include sample data instead of screenshots?

– Rafael Barros, Commented Nov 15, 2021 at 20:15

What do you mean by combined? Added together? Averaged?

– DNy, Commented Nov 15, 2021 at 20:17

@DNy All records of CSV 2 added beneath CSV 1

– User_289, Commented Nov 15, 2021 at 20:23

Tranbi · Accepted Answer · 2021-11-15 21:22:23Z

1

IIUC you want to be able to tell your function which column of df2 is to be handled as Z, and then concatenate both dataframes:

def concat_df_on_z(df1, df2, z_col):
    df2 = df2[['X', 'Y', z_col]].rename(columns={z_col: 'Z'})
    return pd.concat([df1, df2])

df3 = concat_df_on_z(df1, df2, 'B')
print(df3)

Output:

          X         Y      Z
0  626066.4  234058.2   6.69
1  626066.4  234059.2   6.89
2  626066.4  234060.2   7.06
0  627839.0  232463.4  14.46
1  627839.0  232463.1  14.46
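
If the repeated index labels (0, 1, 2, 0, 1) in the output above are unwanted, a variant of the same idea can renumber the rows. What follows is a minimal self-contained sketch rather than the answer's original code: the inline sample data stands in for CSV1.csv and CSV2.csv (assumed to be comma-separated), and the function name `concat_xyz` is hypothetical.

```python
import io
import pandas as pd

# Inline sample data standing in for the two CSV files (assumed comma-separated).
csv1 = io.StringIO(
    "X,Y,Z\n"
    "626066.4,234058.2,6.69\n"
    "626066.4,234059.2,6.89\n"
    "626066.4,234060.2,7.06\n"
)
csv2 = io.StringIO(
    "X,Y,A,B,C\n"
    "627839,232463.4,14.22,14.46,14.71\n"
    "627839,232463.1,14.22,14.46,14.71\n"
)

df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2)


def concat_xyz(multibeam, singlebeam, z_col):
    """Stack singlebeam rows under multibeam rows, using z_col as Z."""
    picked = singlebeam[["X", "Y", z_col]].rename(columns={z_col: "Z"})
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating labels.
    return pd.concat([multibeam, picked], ignore_index=True)


df3 = concat_xyz(df1, df2, "A")
print(df3)

# index=False leaves the row labels out of the CSV text.
csv_text = df3.to_csv(index=False)
```

Passing `'A'` here reproduces the Z values shown in the question's desired result; `'B'` or `'C'` would select the other depth columns.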


3 Comments

User_289 Over a year ago

If I use the script

import pandas as pd

# Read the files into two dataframes.
df1 = pd.read_csv('C:/Users/Public/multibeam.csv')
df2 = pd.read_csv('C:/Users/Public/singlebeam.csv')

def concat_df_on_z(df1, df2, z_col):
    df2 = df2[['X', 'Y', z_col]].rename(columns={z_col: 'Z'})
    return pd.concat([df1, df2])

df3 = concat_df_on_z(df1, df2, 'B')
df3.to_csv(r'C:\Users\Public\CSV3.csv')

I get the below results

User_289 Over a year ago

            X[m]      Y[m]  Z[m]          X          Y      Z
0      626066.43  234058.2  6.69        NaN        NaN    NaN
1      626066.43  234059.2  6.89        NaN        NaN    NaN
2      626066.43  234060.2  7.06        NaN        NaN    NaN
3      626067.43  234057.2  6.69        NaN        NaN    NaN
4      626067.43  234058.2  6.89        NaN        NaN    NaN
...          ...       ...   ...        ...        ...    ...
34158        NaN       NaN   NaN  627148.40  233739.17  12.94
34159        NaN       NaN   NaN  627148.27  233739.35  12.92

Tranbi Over a year ago

It's very hard to read as a comment. What is the problem? You can get rid of the index column when writing to CSV: df3.to_csv('CSV3.csv', index=False). What are X[m], Y[m] and Z[m]?
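
The NaN blocks in the pasted result are what pd.concat produces when the column names differ: it aligns on column labels, so X[m] and X end up as separate columns. Assuming the multibeam file's header really is X[m], Y[m], Z[m] (as the pasted output suggests), one possible fix is to rename those columns before concatenating. This is only a sketch: the small frames below reuse a few values from the pasted output, and the rename mapping is illustrative.

```python
import pandas as pd

# Multibeam frame with unit-suffixed headers, as the pasted output suggests (assumed).
multibeam = pd.DataFrame({
    "X[m]": [626066.43, 626066.43],
    "Y[m]": [234058.2, 234059.2],
    "Z[m]": [6.69, 6.89],
})

# Singlebeam frame after one depth column has been selected and renamed to Z.
singlebeam = pd.DataFrame({
    "X": [627148.40, 627148.27],
    "Y": [233739.17, 233739.35],
    "Z": [12.94, 12.92],
})

# Strip the unit suffix so both frames share the labels X, Y, Z.
multibeam = multibeam.rename(columns={"X[m]": "X", "Y[m]": "Y", "Z[m]": "Z"})

# With matching labels, the rows stack into three columns without NaN padding.
combined = pd.concat([multibeam, singlebeam], ignore_index=True)
print(combined)
```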
