How to Merge and Join Pandas Dataframes

Last updated on May 19th, 2025 at 06:06 am

Pandas dataframe makes it easy to store and analyze data as tables using rows and columns. They support tons of functions and operations simplifying data analysis and reporting. Sometimes, Python developers need to merge and join Pandas dataframes. In this article, we will learn the different ways to do this.

How to Merge and Join Pandas Dataframes

Here are the different ways to merge and join pandas dataframes, using concat(), merge( ) and join() functions. Let us say you have the following two dataframes to work with.

import pandas as pd

d1 = {'Name': ['John','Jim','Tim'],
'Age': [27, 24, 22],
'City': ['NYC', 'San Francisco','London'],
}

d2 = {'Name': ['Jane','Jessica','Tess'],
'Age': [25, 34, 32],
'City': ['Chicago', 'Los Angeles','Philadelphia'],
}

df1 = pd.DataFrame(d1, index=[0, 1, 2])

df2 = pd.DataFrame(d2, index=[2, 3, 4])

print(df1)
print(df2)

## output

Name Age City
0 John 27 NYC
1 Jim 24 San Francisco
2 Tim 22 London

Name Age City
2 Jane 25 Chicago
3 Jessica 34 Los Angeles
4 Tess 32 Philadelphia

1. Concatenate Dataframe

Using concat

Pandas library supports concat() function that allows you to concatenate 2 or more dataframes by passing a list of dataframes as argument. Here is its syntax.

pd.concat([df1, df2, ...])

It returns a new concatenated dataframe that can be stored as a separate dataframe or be used to overwrite one of the original dataframes.

res = pd.concat([df1,df2])
print(res)

## output

Name Age City
0 John 27 NYC
1 Jim 24 San Francisco
2 Tim 22 London
2 Jane 25 Chicago
3 Jessica 34 Los Angeles
4 Tess 32 Philadelphia

In this approach, if you end up with duplicate rows or column values, then you can easily find duplicates using duplicated() function.

By specifying type of concatenation

Concat() function allows you to perform different types of joins such as inner joins and outer joins. You can do this by using axis and join arguments. This is similar to SQL joins where the columns of two tables are combined side by side, using common column values, to create a larger table. In this case, the dataframe rows are not added one below the other, but one beside the other.

res = pd.concat([df1,df2], axis=1, join='inner')
print(res)

## output

Name Age City Name Age City
2 Tim 22 London Jane 25 Chicago

By ignoring indexes

As seen above, by default, the result dataframe’s index values are borrowed from the original dataframes. But if you want the result dataframe to have a new set of indexes, you can use ignore_index option.

res = pd.concat([df1,df2], ignore_index=True)
print(res)

## output

Name Age City
0 John 27 NYC
1 Jim 24 San Francisco
2 Tim 22 London
3 Jane 25 Chicago
4 Jessica 34 Los Angeles
5 Tess 32 Philadelphia

Using group of keys

While concatenating dataframe, you can also assign group keys to groups of rows belonging to each dataframe. You can do this using keys argument, as shown.

res = pd.concat([df1,df2], keys = ['x','y'])
print(res)

## output

Name Age City
x 0 John 27 NYC
1 Jim 24 San Francisco
2 Tim 22 London
y 2 Jane 25 Chicago
3 Jessica 34 Los Angeles
4 Tess 32 Philadelphia

2. Merge Dataframe

Merging dataframe is similar to SQL joins where you combine rows from multiple tables using common column values. You can do this using merge() function available in pandas library.

Based on one or more keys

Here is the command to merge two dataframes using common column names, or keys. You need to mention the dataframes to be merged, one after the other. Then you need to mention the list of keys to be used for merging, for ‘on’ argument.

pd.merge(df1, df2, on=[key1, key2,...]

Let us say you have the following dataframes, with the ID columns having common values

import pandas as pd

d1 = {'Name': ['John','Jim','Tim'],
'Age': [27, 24, 22],
'City': ['NYC', 'San Francisco','London'],
'ID':[1, 2, 3]
}

d2 = {'Name': ['Jane','Jessica','Tess'],
'Age': [25, 34, 32],
'City': ['Chicago', 'Los Angeles','Philadelphia'],
'ID':[2,3,4]
}

df1 = pd.DataFrame(d1, index=[0, 1, 2])
df2 = pd.DataFrame(d2, index=[2, 3, 4])

Here is the code to merge the two dataframes, using ID column.

res = pd.merge(df1,df2, on=['ID'])
print(res)

## output

Name_x Age_x City_x ID Name_y Age_y City_y
0 Jim 24 San Francisco 2 Jane 25 Chicago
1 Tim 22 London 3 Jessica 34 Los Angeles

By default, pandas will try to do an INNER JOIN where it will pick rows with common values for ID column. If there are no common values in the key column, then the result will be an empty dataframe.

By specifying type of merge

You can also specify the type of join, using ‘how’ argument. As you can see in above dataframes, the name column does not have any common values. In such cases, you can do a LEFT or RIGHT join, preserving the column values of one of the two tables. Here is an example of LEFT JOIN. The values from left dataframe are preserved whereas the corresponding values for RIGHT table are set as NaN.

res = pd.merge(df1,df2, how='left',on=['Name'])
print(res)

## output

Name Age_x City_x ID_x Age_y City_y ID_y
0 John 27 NYC 1 NaN NaN NaN
1 Jim 24 San Francisco 2 NaN NaN NaN
2 Tim 22 London 3 NaN NaN NaN

If you want to do an outer join, then specify how=’outer’.

res = pd.merge(df1,df2, how='outer',on=['Name'])
print(res)

## output

Name Age_x City_x ID_x Age_y City_y ID_y
0 Jane NaN NaN NaN 25.0 Chicago 2.0
1 Jessica NaN NaN NaN 34.0 Los Angeles 3.0
2 Jim 24.0 San Francisco 2.0 NaN NaN NaN
3 John 27.0 NYC 1.0 NaN NaN NaN
4 Tess NaN NaN NaN 32.0 Philadelphia 4.0
5 Tim 22.0 London 3.0 NaN NaN NaN

In all the examples above, you can see that since both tables have same column names, merge() function will automatically assign suffixes x, y, … to overlapping column names.

3. Join Dataframe

Using join()

Like merge, you can also use join() function to combine two dataframes. It is available in all dataframes. Here is its syntax.

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)

Here is an example to illustrate it.

import pandas as pd

d1 = {'Name': ['John','Jim','Tim'],
'Age': [27, 24, 22],
'ID':[1, 2, 3]
}

d2 = {
'City': ['NYC', 'San Francisco','London'],
}

df1 = pd.DataFrame(d1, index=[0, 1, 2])
df2 = pd.DataFrame(d2, index=[2, 3, 4])

res = df1.join(df2, on='ID')
print(res)

## output

Name Age ID City
0 John 27 1 NaN
1 Jim 24 2 NYC
2 Tim 22 3 San Francisco

In the above example, the dataframe df2 is joined to df1 using index values. It works well if your dataframes do not have overlapping columns.

By default, the join() function does a left join where it retains all rows and columns of left dataframe, and populates the other dataframe’s values for matching indexes.

If your dataframes have overlapping columns, then you must mention the suffix to differentiate them, otherwise you will get an error.

Here is an example to join the above two dataframes using ID column as merge key.

import pandas as pd

d1 = {'Name': ['John','Jim','Tim'],
'Age': [27, 24, 22],
'City': ['NYC', 'San Francisco','London'],
'ID':[1, 2, 3]
}

d2 = {'Name': ['Jane','Jessica','Tess'],
'Age': [25, 34, 32],
'City': ['Chicago', 'Los Angeles','Philadelphia'],
'ID':[2,3,4]
}

df1 = pd.DataFrame(d1, index=[0, 1, 2])
df2 = pd.DataFrame(d2, index=[2, 3, 4])

res = df1.join(df2, on='ID')
print(res)

## output
ValueError: columns overlap but no suffix specified: Index(['Name', 'Age', 'City', 'ID'], dtype='object')

Since both dataframes have overlapping columns, we get an error since we did not specify a suffix to differentiate common columns.

To fix this problem, we use lsuffix argument and specify the suffix for overlapping column names.

res = df1.join(df2, on='ID',lsuffix='_a')
print(res)

## output

Name_a Age_a City_a ID_a Name Age City ID
0 John 27 NYC 1 NaN NaN NaN NaN
1 Jim 24 San Francisco 2 Jane 25.0 Chicago 2.0
2 Tim 22 London 3 Jessica 34.0 Los Angeles 3.0

Using how argument

Like merge function, you can also specify the type of join using ‘how’ argument. Here is an example to do inner join.

res = df1.join(df2, on='ID',lsuffix='_a', how='inner')
print(res)

## output

Name_a Age_a City_a ID_a Name Age City ID
1 Jim 24 San Francisco 2 Jane 25 Chicago 2
2 Tim 22 London 3 Jessica 34 Los Angeles 3

Here is an example to do left join. In this case, the values of left dataframe are retained, along with matching values from right dataframe.

res = df1.join(df2, on='ID',lsuffix='_a', how='left')
print(res)

## output

Name_a Age_a City_a ID_a Name Age City ID
0 John 27 NYC 1 NaN NaN NaN NaN
1 Jim 24 San Francisco 2 Jane 25.0 Chicago 2.0
2 Tim 22 London 3 Jessica 34.0 Los Angeles 3.0

Conclusion

In this article, we have learnt several simple ways to combine data from two dataframes. Depending on your requirement, you can choose to merge, join or concatenate dataframes. concat() function allows you to combine dataframes vertically one below the other, or horizontally one beside the other. On the other hand, join() and merge() functions combine dataframes horizontally only.

Also read:
How to Create Pivot Tables in Python
How to Connect Pandas to Database
Top Machine Learning Libraries in Python

Leave a Reply

Your email address will not be published. Required fields are marked *