Pandas dataframes are very useful for storing, manipulating and analyzing data in a tabular manner, as rows and columns. They allow you to perform numerous data operations to easily retrieve the information you need. Python developers often need to add one or more new columns to an existing dataframe. There are several ways to do it. In this article, we will learn the different ways to insert a new column into a pandas dataframe.
How to Insert New Column to Pandas DataFrame
Here are some of the most common ways to add or insert a new column into a Pandas dataframe. Let us say you have the following dataframe of 4 students with Name and ID values.
import pandas as pd
data = {'Name': ['John', 'Jane', 'Jim', 'Joe'],
'ID': [1, 2, 3, 4]}
df = pd.DataFrame(data)
print(df)
## output
Name ID
0 John 1
1 Jane 2
2 Jim 3
3 Joe 4
1. Using List As Column
Let us say you want to add a new column to the above dataframe, with age values for the students. You can do this by first storing the column values as a list. Then we assign this list to the new column in the dataframe, using the syntax dataframe[new_column_name] = column_values_list. Here is the code to add an age column to our dataframe. The age values of the 4 students are stored in a list and assigned to the column.
age = [16,15,17,14]
df['age'] = age
print(df)
## output
Name ID age
0 John 1 16
1 Jane 2 15
2 Jim 3 17
3 Joe 4 14
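The column values can be of different data types and the assignment will still work. Here is an example where some values are numbers while others are text. The printed output looks the same as above, but pandas stores such a mixed column with the generic object dtype instead of an integer dtype.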
age = [16,'15',17,'14']
df['age'] = age
print(df)
Please note, the number of column values must match the number of rows (the length of the dataframe index), otherwise you will get an error. Here is an example where we supply 3 column values whereas our dataframe has 4 rows.
age = [16,15,17]
df['age'] = age
print(df)
## output
ValueError: Length of values (3) does not match length of index (4)
Also, when we add a new column, it will modify the existing dataframe instead of creating its copy.
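As a rough illustration of the points above, the following sketch rebuilds the sample dataframe, assigns a mixed-type list and checks the resulting dtype. It also catches the length-mismatch error instead of letting it stop the script. The column name height and the helper variables are only assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Mixed numbers and strings are stored with the generic "object" dtype
df['age'] = [16, '15', 17, '14']
age_dtype = str(df['age'].dtype)
print(age_dtype)

# A list that is too short raises ValueError and no column is added
try:
    df['height'] = [150, 160, 170]  # hypothetical column, only 3 values
    length_error = None
except ValueError as exc:
    length_error = str(exc)
print(length_error)
print(list(df.columns))
```

Catching the error this way is optional. It is mainly useful when the column values come from an external source whose length you cannot guarantee.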
2. Using Assign() function
You can also use the assign() method, available for every dataframe, to add a new column. Here is its syntax.
dataframe.assign(column_name=list_of_column_values)
You can call this method directly on a dataframe variable. In it, we specify the column name as a keyword argument, along with a list of column values.
Here is an example to add age column to our dataframe.
age_data = [16,15,17,14]
df=df.assign(age=age_data)
print(df)
## output
Name ID age
0 John 1 16
1 Jane 2 15
2 Jim 3 17
3 Joe 4 14
Please note, using assign() method will return a new dataframe, without changing the original dataframe. If you want to modify the existing dataframe, then you need to re-assign the output of assign() method back to the original dataframe variable, as shown above.
In this case also, if the number of items provided as column values is more or less than the length of the dataframe index, you will get an error. It has to be exactly equal to the length of the dataframe index.
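To see that assign() leaves the original dataframe untouched, here is a minimal sketch that stores the result in a separate variable. The name df_with_age is chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# assign() returns a new dataframe; df itself is not modified
df_with_age = df.assign(age=[16, 15, 17, 14])

print(list(df.columns))           # original columns only
print(list(df_with_age.columns))  # original columns plus 'age'
```

Keeping the result in a separate variable can be handy when you still need the unmodified dataframe later in your script.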
Please note, in the assign() call, the column name is written as a keyword argument without any quotes. If you wrap the column name in single or double quotes, you will get the following error.
df=df.assign('age'=age_data)
OR
df=df.assign("age"=age_data)
## error output
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
Here is an example where our column ‘age bar’ has a space in it and we mention it without quotes.
df=df.assign( age bar=age_data)
## output
SyntaxError: invalid syntax. Perhaps you forgot a comma?
So if your new column name has a space in it, you can neither wrap it in quotes nor write it as it is. Instead, you need to provide the column name and values as a dictionary, prefixed with the unpacking (**) operator.
age_data = [16,15,17,14]
df=df.assign(**{'age bar':age_data})
print(df)
## output
Name ID age bar
0 John 1 16
1 Jane 2 15
2 Jim 3 17
3 Joe 4 14
You can also use this unpacking operator to add multiple columns to your dataframe in one call. Here is an example to add two columns, age and city, to our dataframe.
data={"age": [16,15,17,14],
"city":['NYC','Mumbai','Tokyo','SFO']}
df=df.assign(**data)
print(df)
## output
Name ID age city
0 John 1 16 NYC
1 Jane 2 15 Mumbai
2 Jim 3 17 Tokyo
3 Joe 4 14 SFO
In the above code, first we create a dictionary containing the key-value pairs of both columns. Here each key is the column name and value is the list of column values for that column. Then we pass it to assign function prefixing it with unpacking operator.
So this is a useful method to add one or more columns using a dictionary. Data is often received as JSON or a dictionary, and this method makes it easy to insert such data into a dataframe using assign().
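As a sketch of the JSON case mentioned above, the following code parses a JSON string with Python's json module and passes the resulting dictionary to assign(). The payload string is a made-up example, and it assumes each key holds exactly one value per dataframe row.

```python
import json
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Hypothetical JSON payload: one list of values per new column
payload = '{"age": [16, 15, 17, 14], "city": ["NYC", "Mumbai", "Tokyo", "SFO"]}'
new_columns = json.loads(payload)

# Each key becomes a column name, each list becomes that column's values
df = df.assign(**new_columns)
print(df)
```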
Another easy way to add multiple columns using assign function is to mention each column_name=column_value_list in a comma-separated manner.
df=df.assign(age=[16,15,17,14],city=['NYC','Mumbai','Paris','Tokyo'])
print(df)
## output
Name ID age city
0 John 1 16 NYC
1 Jane 2 15 Mumbai
2 Jim 3 17 Paris
3 Joe 4 14 Tokyo
3. Using Insert() Method
Both the above solutions add the new column or columns at the end of the dataframe. What if you want to insert a column at a specific position in your dataframe? In this case, you need to use the insert() method. It modifies the original dataframe in place instead of returning a copy, and it adds one column per call. Here is its syntax.
dataframe.insert(position, column_name, column_values, allow_duplicates = False)
You can call insert() on any dataframe. The first argument is the zero-based position where you want to insert the new column (0, 1, 2, …). The next argument is the column name, followed by the column values. Lastly, allow_duplicates specifies whether a column with an already existing name may be inserted. By default, inserting a duplicate name raises an error.
Here is an example to insert the age column after the Name column and before the ID column.
df.insert(1, "age", [16,15,17,14], True)
print(df)
## output
Name age ID
0 John 16 1
1 Jane 15 2
2 Jim 17 3
3 Joe 14 4
When you insert a new column at a given position, it will shift the existing column at that position to its right.
Now, if you want to call insert function again, then you need to use the new position numbers. Here is an example to insert city column before ID column. Here we have specified location=2 whereas in the above code, it was location=1.
df.insert(2, "city", ['NYC','Mumbai','Tokyo','Paris'], True)
print(df)
## output
Name age city ID
0 John 16 NYC 1
1 Jane 15 Mumbai 2
2 Jim 17 Tokyo 3
3 Joe 14 Paris 4
If you want to prepend a new column at the beginning of the dataframe, use position=0.
df.insert(0, "age", [16,15,17,14], True)
print(df)
## output
age Name ID
0 16 John 1
1 15 Jane 2
2 17 Jim 3
3 14 Joe 4
If you want to add a new column at the end of dataframe, use position as len(df.columns). This will calculate the number of columns and use this value as position.
df.insert(len(df.columns), "age", [16,15,17,14], True)
print(df)
## output
Name ID age
0 John 1 16
1 Jane 2 15
2 Jim 3 17
3 Joe 4 14
As you can see, insert() makes it easy to add new column at different positions of dataframe.
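If you do not want to count positions by hand, one possible approach is to look up the position of an existing column with df.columns.get_loc() and insert next to it. The sketch below assumes the column names are unique, and it also shows the error raised when a column name is reused without allowing duplicates.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Look up the position of 'Name' and insert right after it
position = df.columns.get_loc('Name') + 1
df.insert(position, 'age', [16, 15, 17, 14])
print(list(df.columns))

# With allow_duplicates left at its default, reusing a name raises ValueError
try:
    df.insert(0, 'age', [1, 2, 3, 4])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```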
4. Using loc[] Indexer
You can also use the loc indexer, available for every dataframe, to add a new column. Note that loc is used with square brackets rather than called like a function. It is label-based, so a new column is appended at the end of the dataframe, and it also allows you to create new columns whose values are derived from other columns. Here is the syntax to add a new column using loc.
dataframe.loc[:,column_name] = list_of_column_values
Here is an example to add a new column using loc.
df.loc[:,'age'] =[16,'15',17,'14']
print(df)
## output
Name ID age
0 John 1 16
1 Jane 2 15
2 Jim 3 17
3 Joe 4 14
You can also use it to create a new column based on values derived from other column in dataframe. Here is an example to create a new column based on ID column value.
df.loc[df['ID'] >= 3, 'Category'] = 'Tall'
df.loc[df['ID'] < 3, 'Category'] = 'Short'
print(df)
## output
Name ID Category
0 John 1 Short
1 Jane 2 Short
2 Jim 3 Tall
3 Joe 4 Tall
In the above code, we have passed 2 values inside the square brackets of loc. The first is the condition that selects the rows and the second is the column name. The matching rows are assigned the value Tall or Short depending on the value of the ID column.
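The two conditional assignments can also be written in a single step. The sketch below uses numpy.where, which is a different technique from loc and is shown only as an alternative, assuming NumPy is available (it is installed together with pandas).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Pick 'Tall' where the condition holds and 'Short' everywhere else
df['Category'] = np.where(df['ID'] >= 3, 'Tall', 'Short')
print(df)
```

With only two outcomes this is compact. With loc, a row that matches none of the conditions would be left as NaN, whereas here the second value acts as the fallback.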
5. Using Dictionary
In this solution, we use a dict, along with the map function, to add a new column to a dataframe. First we define a dict whose keys are values of an existing column and whose values are the corresponding values for the new column. In this example, each key is a value from the Name column.
address = {'John': 'NewYork', 'Joe': 'Chicago',
'Jane': 'Boston', 'Jim': 'Miami'}
Next, we call the map function on this column, and assign its result to a new column Address.
df['Address'] = df['Name'].map(address)
print(df)
## output
Name ID Address
0 John 1 NewYork
1 Jane 2 Boston
2 Jim 3 Miami
3 Joe 4 Chicago
In the above dataframe, you will see that although our dict lists the key-value pairs in a different order from the dataframe rows, each value has been mapped to its corresponding Name column value. For example, if Name=Jane, then Address=Boston, and so on. This is a great way to add a column from another data source where the data may be present as JSON or dict, and the order of key-value pairs may not be the same as the order of column values in your dataframe.
Please note, if there is no key-value pair matching the column value of your dataframe, then the new column’s value will be NaN. In the following example, the dict has only 3 key-value pairs but the dataframe has 4 rows. So one value of Address column is NaN.
address = {'John': 'NewYork', 'Joe': 'Chicago',
'Jane': 'Boston'}
df['Address'] = df['Name'].map(address)
print(df)
## output
Name ID Address
0 John 1 NewYork
1 Jane 2 Boston
2 Jim 3 NaN
3 Joe 4 Chicago
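If you would rather not keep NaN for the unmatched rows, one option is to chain fillna() after map(). This is only a sketch, and the placeholder text Unknown is an assumption made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

address = {'John': 'NewYork', 'Joe': 'Chicago', 'Jane': 'Boston'}

# Names missing from the dict become NaN; fillna() swaps in a placeholder
df['Address'] = df['Name'].map(address).fillna('Unknown')
print(df)
```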
6. Add Derived Columns
Sometimes you may need to create a new column based on the values of another column. There are several ways to do this. One of the simplest is to use mathematical operators. Here is an example to create a new column new_ID using the values of column ID.
df['new_ID'] = df['ID'] + 100
print(df)
## output
Name ID new_ID
0 John 1 101
1 Jane 2 102
2 Jim 3 103
3 Joe 4 104
As we have seen earlier, you can also use loc to create a new column based on a condition applied to the values of an existing column.
df.loc[df['ID'] >= 3, 'Type'] = 'Old'
df.loc[df['ID'] < 3, 'Type'] = 'Young'
print(df)
## output
Name ID Type
0 John 1 Young
1 Jane 2 Young
2 Jim 3 Old
3 Joe 4 Old
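A derived column can also combine more than one existing column. As a small sketch, the code below joins the Name and ID columns into a text label. The column name label and the dash separator are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Combine two existing columns; ID is converted to text before joining
df['label'] = df['Name'] + '-' + df['ID'].astype(str)
print(df)
```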
7. Add Column from Another Dataframe
Another common requirement is to be able to insert a new column in a dataframe from another dataframe. You can easily do this using the assignment operator ‘=’.
First, we define the second dataframe.
df2 = pd.DataFrame({'Marks': [70, 80, 90, 60]})
Next, we use assignment operator to assign the values of Marks column in df2 dataframe to the new Marks column in df dataframe.
df['Marks'] = df2['Marks']
print(df)
## output
Name ID Marks
0 John 1 70
1 Jane 2 80
2 Jim 3 90
3 Joe 4 60
Please note, the dataframe df did not have a Marks column earlier; it was created during the assignment of values from df2. Pandas matches the values by index label, so each row of df receives the value that has the same index in df2. If the other dataframe has fewer rows than our original dataframe, the rows without a matching index are populated with NaN, and the column is converted to floating point numbers so that it can hold NaN. Here is an example where our second dataframe has only 3 values whereas our original dataframe has 4 rows.
df2 = pd.DataFrame({'Marks': [70, 80, 90]})
df['Marks'] = df2['Marks']
print(df)
## output
Name ID Marks
0 John 1 70.0
1 Jane 2 80.0
2 Jim 3 90.0
3 Joe 4 NaN
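Because the values are matched by index label and not by position, a second dataframe with a different index can give surprising results. The following sketch contrasts the two behaviours. The index values 10 to 13 and the column names Marks_aligned and Marks_by_position are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# df2 uses index labels 10..13, so none of them match df's index 0..3
df2 = pd.DataFrame({'Marks': [70, 80, 90, 60]}, index=[10, 11, 12, 13])

# Assigning the Series aligns on index labels: every row becomes NaN
df['Marks_aligned'] = df2['Marks']

# Converting to a NumPy array drops the index, so values are copied by position
df['Marks_by_position'] = df2['Marks'].to_numpy()
print(df)
```

Copying by position only makes sense when both dataframes have the same number of rows in the same order. Otherwise, aligning on a shared index is usually the safer choice.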
8. Adding Multiple Columns
If you want to add multiple columns, then the easiest way is to create a dictionary with column names and values. In this dictionary, each key is the column name and its value is a list of column values.
data={"age": [16,15,17,14],
"city":['NYC','Mumbai','Tokyo','SFO']}
Once you have defined this dictionary, then you can call assign() function on dataframe, and pass this dictionary variable prefixed with unpacking ‘**’ operator, as shown.
df=df.assign(**data)
print(df)
## output
Name ID age city
0 John 1 16 NYC
1 Jane 2 15 Mumbai
2 Jim 3 17 Tokyo
3 Joe 4 14 SFO
Alternatively, you can call assign function for your dataframe and pass the column_name-column_value pairs in a comma-separated manner.
df=df.assign(age=[16,15,17,14],city=['NYC','Mumbai','Paris','Tokyo'])
print(df)
This code gives the same columns as above, but with the city values Paris and Tokyo for the last two rows.
Conclusion
In this article, we have learnt several different ways to insert new column in Pandas dataframe. We learnt how to add column using various methods. We also learnt how to add multiple columns. We have also seen how to derive column from another dataframe column, as well as create new column using column from another dataframe. You can use any of these methods as per your requirement.
FAQs
1. How to Add Single column to dataframe?
You can add new column by directly assigning its values as a list to the dataframe. You can also use insert, loc or assign for this purpose.
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John','Jane','Jim']})
# Adding a new column 'Marks'
df['Marks'] = [70, 80, 90]
2. How to Add Multiple columns to dataframe
You can add multiple columns to a dataframe using the assign() method. You can provide the column name-value pairs in a comma-separated manner, as shown. Since assign() returns a new dataframe, assign its result back to the variable.
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John','Jane','Jim']})
# Adding new columns Marks and Age
df = df.assign(Marks=[70, 80, 90],Age=[15,14,16])

Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.