Pandas Merge - How to avoid duplicating columns
You can work out the columns that are only in one DataFrame and use this to select a subset of columns in the merge.
cols_to_use = df2.columns.difference(df.columns)
Then perform the merge (note that cols_to_use is an Index object, but it has a handy tolist() method if you need a plain list).
dfNew = pd.merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')
This will avoid any columns clashing in the merge.
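As a minimal sketch of the approach above, the following uses two small made-up frames (the column names a, b, c and the index labels are assumptions for illustration, not from the original question). Column b exists in both frames, so only c is taken from df2:

```python
import pandas as pd

# Hypothetical example data: "b" appears in both frames.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
df2 = pd.DataFrame({"b": [30, 40], "c": [5, 6]}, index=["y", "z"])

# Columns that exist only in df2 (here just "c").
cols_to_use = df2.columns.difference(df.columns)

# Merge on the index, taking only the non-overlapping columns from df2.
df_new = pd.merge(df, df2[cols_to_use], left_index=True, right_index=True, how="outer")

print(df_new.columns.tolist())  # no b_x / b_y suffixed columns
```

Note that the overlapping column b keeps the values from df only; the values of b in df2 are dropped by the column selection.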
Pandas merge without duplicating columns
Use DataFrame.combine_first with postcode set as the index in both DataFrames, then, if necessary, add DataFrame.reindex to restore the same column order as the original df1:
print (df1)
postcode lat lon plus 32 more columns
0 M20 2.3 0.2 NaN NaN NaN NaN
1 LS1 NaN NaN NaN NaN NaN NaN
2 LS1 NaN NaN NaN NaN NaN NaN
3 LS2 NaN NaN NaN NaN NaN NaN
4 M21 2.4 0.3 NaN NaN NaN NaN
df1 = df1.set_index('postcode')
df2 = df2.set_index('postcode')
df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print (df3)
lat lon plus 32 more columns
postcode
LS1 1.4 0.1 NaN NaN NaN NaN
LS1 1.4 0.1 NaN NaN NaN NaN
LS2 1.5 0.2 NaN NaN NaN NaN
M20 2.3 0.2 NaN NaN NaN NaN
M21 2.4 0.3 NaN NaN NaN NaN
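The original df2 is not shown, so here is a small self-contained sketch with invented data (the values and the extra column are assumptions). It illustrates that combine_first fills the NaN cells of df1 from df2, and that reindex drops any columns that exist only in df2:

```python
import numpy as np
import pandas as pd

# Hypothetical data: df1 has gaps, df2 can fill them and has one extra column.
df1 = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS2"],
    "lat": [2.3, np.nan, np.nan],
    "lon": [0.2, np.nan, np.nan],
}).set_index("postcode")

df2 = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
    "extra": [9, 9],
}).set_index("postcode")

# Values from df1 win; NaN cells are filled from df2.
# reindex keeps only df1's columns, in df1's order.
df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print(df3)
```

Because combine_first aligns on the union of both indexes, the row order of the result may differ from df1, as it does in the output above.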
How to merge Pandas dataframes without duplicating columns
Your problem is that you don't really want to just merge everything. You need to concat your first set of frames, then merge.
import pandas as pd
import numpy as np
base_frame.merge(pd.concat([frame1, frame2]), how='left')
# id supplier1_match0
#0 1 x
#1 2 2x
#2 3 NaN
Alternatively, you could define base_frame so that it has all of the relevant columns of the other frames, set id as the index, and use .update. This ensures base_frame remains the same size, whereas the merge above does not. Note, though, that data will be overwritten if there are multiple non-null values for a given cell.
base_frame = pd.DataFrame({'id':[1,2,3]}).assign(supplier1_match0 = np.nan).set_index('id')
for df in [frame1, frame2]:
    base_frame.update(df.set_index('id'))
print(base_frame)
supplier1_match0
id
1 x
2 2x
3 NaN
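To make the size difference between the two approaches concrete, here is a sketch with invented numeric data (the score column and its values are assumptions, not from the original question). The id 2 appears in both frames, so the merge produces an extra row while update keeps three rows and overwrites the earlier value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: id 2 has a value in both frames.
base_frame = pd.DataFrame({"id": [1, 2, 3]})
frame1 = pd.DataFrame({"id": [1, 2], "score": [10.0, 20.0]})
frame2 = pd.DataFrame({"id": [2], "score": [99.0]})

# Approach 1: concat, then left merge. Duplicate ids add rows.
merged = base_frame.merge(pd.concat([frame1, frame2]), how="left")
print(len(merged))  # 4 rows: id 2 matches twice

# Approach 2: pre-create the column, then update in place.
updated = base_frame.assign(score=np.nan).set_index("id")
for frame in (frame1, frame2):
    updated.update(frame.set_index("id"))
print(len(updated))  # 3 rows: the later frame overwrote id 2
```

Which behaviour is preferable depends on whether duplicate matches should be kept as separate rows or collapsed to the last value seen.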
Avoid duplicate columns while merging with pandas
I would pd.concat the similarly structured dataframes, then merge the others like this:
df.merge(pd.concat([df1, df3]), on='date_time', how='left')\
.merge(df2, on='date_time', how='left')
Output:
date_time potato carrot
0 2018-06-01 00:00:00 NaN NaN
1 2018-06-01 00:30:00 13.0 NaN
2 2018-06-01 01:00:00 21.0 NaN
3 2018-06-01 01:30:00 27.0 14.0
Per comments below:
df = pd.DataFrame({'date_time':['2018-06-01 00:00:00','2018-06-01 00:30:00','2018-06-01 01:00:00','2018-06-01 01:30:00']})
# Dataframes to merge to reference dataframe
df1 = pd.DataFrame({'date_time':['2018-06-01 00:30:00','2018-06-01 01:00:00'],
'potato':[13,21]})
df2 = pd.DataFrame({'date_time':['2018-06-01 01:30:00','2018-06-01 02:00:00','2018-06-01 02:30:00'],
'carrot':[14,8,32]})
df3 = pd.DataFrame({'date_time':['2018-06-01 01:30:00', '2018-06-01 02:00:00'],'potato':[27,31], 'zucchini':[11,1]})
df.merge(pd.concat([df1, df3]), on='date_time', how='left').merge(df2, on='date_time', how='left')
Output:
date_time potato zucchini carrot
0 2018-06-01 00:00:00 NaN NaN NaN
1 2018-06-01 00:30:00 13.0 NaN NaN
2 2018-06-01 01:00:00 21.0 NaN NaN
3 2018-06-01 01:30:00 27.0 11.0 14.0
Python Pandas merge dataframes without duplicating columns
FYI, you do not need the reduce function, you can simply use:
df_all = df1.merge(df2)
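The columns are duplicated because you are merging only on 'Name', so every other column the two frames share gets a suffixed copy from each side. If the shared columns are the same in both frames, you can drop the on='Name' argument and the merge will use all common columns as keys instead of duplicating them.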
Alternatively, you can merge only the non-duplicate columns from df2:
df_all = df1.merge(df2[['Name','Age']])
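The difference between these calls can be seen in a short sketch. The frames below are made up for illustration (the City column and all values are assumptions); only Name and Age come from the answer above:

```python
import pandas as pd

# Hypothetical data: "Name" and "City" exist in both frames.
df1 = pd.DataFrame({"Name": ["Ann", "Bob"], "City": ["Oslo", "Rome"]})
df2 = pd.DataFrame({"Name": ["Ann", "Bob"], "City": ["Oslo", "Rome"], "Age": [30, 41]})

# Merging on "Name" alone duplicates the other shared column.
dup = df1.merge(df2, on="Name")
print(dup.columns.tolist())         # ['Name', 'City_x', 'City_y', 'Age']

# With no "on" argument, all common columns are used as keys.
all_common = df1.merge(df2)
print(all_common.columns.tolist())  # ['Name', 'City', 'Age']

# Or bring over only the key plus the new column from df2.
subset = df1.merge(df2[["Name", "Age"]])
print(subset.columns.tolist())      # ['Name', 'City', 'Age']
```

Keep in mind that merging on all common columns only matches rows where every shared column agrees, so the two options are equivalent only when the shared values are identical.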
Merging multiple data frames causing duplicate column names
You can do
s = pd.concat([x.set_index('key') for x in df_list],axis = 1,keys=range(len(df_list)))
s.columns = s.columns.map('{0[1]}_{0[0]}'.format)
s = s.reset_index()
s
Out[236]:
key value_0 value_1 value_2 value_3
0 A -1.957968 NaN -0.852135 -0.976960
1 B 1.545932 -0.276838 NaN 0.197615
2 C -2.149727 NaN -0.364382 0.349993
3 D 0.524990 -0.476655 NaN NaN
4 E NaN -2.135870 0.798782 NaN
5 F NaN 1.456544 -0.255705 0.447279
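Since df_list is not shown in the answer, here is a small self-contained version with two invented frames (the data is an assumption). It follows the same steps: concat along the columns with integer keys, then flatten the resulting two-level column index into names like value_0:

```python
import pandas as pd

# Hypothetical input: every frame has the same "key" and "value" columns.
df_list = [
    pd.DataFrame({"key": ["A", "B"], "value": [1.0, 2.0]}),
    pd.DataFrame({"key": ["B", "C"], "value": [3.0, 4.0]}),
]

# keys=... adds an outer column level: (0, 'value'), (1, 'value'), ...
s = pd.concat([x.set_index("key") for x in df_list], axis=1, keys=range(len(df_list)))

# Each column label is a tuple (frame_number, column_name);
# the format string turns it into "column_name_frame_number".
s.columns = s.columns.map("{0[1]}_{0[0]}".format)
s = s.reset_index()
print(s)
```

This relies on key being unique within each frame; with repeated keys, aligning on the index in concat would not behave like a merge.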