How to Insert New Column to Pandas DataFrame

Pandas dataframes are very useful for storing, manipulating and analyzing data in tabular form, as rows and columns. They allow you to perform numerous data operations to easily retrieve the information you need. Python developers often need to add one or more new columns to an existing dataframe, and there are several ways to do it. In this article, we will learn the different ways to insert a new column into a pandas dataframe.

How to Insert New Column to Pandas DataFrame

Here are some of the most common ways to add or insert a new column to a Pandas dataframe. Let us say you have the following dataframe of 4 students with Name and ID values.

import pandas as pd

data = {'Name': ['John', 'Jane', 'Jim', 'Joe'],
        'ID': [1, 2, 3, 4]}

df = pd.DataFrame(data)

print(df)

## output

   Name  ID
0  John   1
1  Jane   2
2   Jim   3
3   Joe   4

1. Using List As Column

Let us say you want to add a new column to the above dataframe, with age values for the students. You can do this by first storing the column values in a list and then assigning this list to the new column, using the syntax dataframe[new_column_name] = column_values_list. Here is the code to add an age column to our dataframe. The age values of the 4 students are stored in a list and assigned to the column.

age = [16,15,17,14]
df['age'] = age
print(df)

## output

   Name  ID  age
0  John   1   16
1  Jane   2   15
2   Jim   3   17
3   Joe   4   14

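The column values can be of different data types and the assignment will still work. Here is an example where some values are numbers and some are text. The printed output looks the same as above, but pandas stores such a mixed column with the generic object dtype rather than as integers, so numeric operations on it may not behave as expected.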

age = [16,'15',17,'14']
df['age'] = age
print(df)
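The mixed-type case can be checked directly. The following is a minimal sketch, assuming the same sample data as above; it inspects the dtype of the new column and then converts it with pd.to_numeric so that the values are stored as numbers.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Mixed numbers and strings: pandas stores the column with the object dtype
df['age'] = [16, '15', 17, '14']
mixed_dtype = df['age'].dtype
print(mixed_dtype)

# Convert the column to a numeric dtype so that arithmetic works on it
df['age'] = pd.to_numeric(df['age'])
print(df['age'].dtype)
print(df['age'].sum())
```

If some strings might not be valid numbers, pd.to_numeric also accepts errors='coerce', which turns them into NaN instead of raising an error.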

Please note, the number of column values must be the same as the number of rows in the dataframe (the length of its index), otherwise you will get an error. Here is an example where we supply 3 column values whereas our dataframe has 4 rows.

age = [16,15,17]
df['age'] = age
print(df)

## output

ValueError: Length of values (3) does not match length of index (4)

Also, when we add a new column this way, it modifies the existing dataframe in place instead of creating a copy.
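If the original dataframe needs to stay unchanged, one option is to work on an explicit copy. This is a small sketch under that assumption; the name df_with_age is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Work on an explicit copy so the original dataframe stays untouched
df_with_age = df.copy()
df_with_age['age'] = [16, 15, 17, 14]

print(list(df.columns))           # columns of the original
print(list(df_with_age.columns))  # columns of the copy
```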

2. Using Assign() function

You can also use assign() method available for every dataframe to add a new column. Here is its syntax.

dataframe.assign(column_name=list_of_column_values)

You call this method directly on the dataframe. Each keyword argument is a column name, and its value is the list of column values.

Here is an example to add age column to our dataframe.

age_data = [16,15,17,14]
df=df.assign(age=age_data)
print(df)

## output

   Name  ID  age
0  John   1   16
1  Jane   2   15
2   Jim   3   17
3   Joe   4   14

Please note, using assign() method will return a new dataframe, without changing the original dataframe. If you want to modify the existing dataframe, then you need to re-assign the output of assign() method back to the original dataframe variable, as shown above.

In this case also, if the number of items provided as column values is more or less than the length of the dataframe index, you will get an error. It has to be exactly equal to the length of the dataframe index.
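Both points about assign() can be verified with a short sketch, assuming the same sample data: the original dataframe is left unchanged, and a list of the wrong length is rejected. The variable names df_new and length_error are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# assign() returns a new dataframe; df itself is not changed
df_new = df.assign(age=[16, 15, 17, 14])
print('age' in df.columns)
print('age' in df_new.columns)

# A list with 3 values does not fit a dataframe with 4 rows
try:
    df.assign(age=[16, 15, 17])
    length_error = None
except ValueError as exc:
    length_error = str(exc)
print(length_error)
```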

Please note, in the above code, we have mentioned column name without any quotes. If you include single or double quotes in your column name, you will get the following error.

df=df.assign('age'=age_data)
OR
df=df.assign("age"=age_data)

## error output

SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

Here is an example where our column ‘age bar’ has a space in it and we mention it without quotes.

df=df.assign( age bar=age_data)

## output
SyntaxError: invalid syntax. Perhaps you forgot a comma?

So if your new column name contains a space, you can neither wrap it in quotes nor write it as it is. Instead, you need to provide the column name and values as a dictionary, prefixed with the unpacking operator (**).

age_data = [16,15,17,14]
df=df.assign(**{'age bar':age_data})
print(df)

## output

   Name  ID  age bar
0  John   1       16
1  Jane   2       15
2   Jim   3       17
3   Joe   4       14

You can also use this unpacking operator to add multiple columns to your dataframe at once. Here is an example to add two columns, age and city, to our dataframe.

data={"age": [16,15,17,14],
      "city":['NYC','Mumbai','Tokyo','SFO']}
df=df.assign(**data)
print(df)

## output

   Name  ID  age    city
0  John   1   16     NYC
1  Jane   2   15  Mumbai
2   Jim   3   17   Tokyo
3   Joe   4   14     SFO

In the above code, first we create a dictionary containing the key-value pairs of both columns. Here each key is the column name and value is the list of column values for that column. Then we pass it to assign function prefixing it with unpacking operator.

So this is a useful way to add one or more columns using a dictionary. Data is often received as JSON or as a dictionary, and this approach lets you insert it into a dataframe easily with the assign method.
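As a rough illustration of the JSON case, the sketch below parses a JSON string with the standard json module and passes the resulting dictionary to assign(). The payload string is a made-up example, not data from any real source.

```python
import json

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Hypothetical JSON payload: one list of values per new column
payload = '{"age": [16, 15, 17, 14], "city": ["NYC", "Mumbai", "Tokyo", "SFO"]}'
new_columns = json.loads(payload)

# Each key becomes a column name, each list becomes the column values
df = df.assign(**new_columns)
print(df)
```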

Another easy way to add multiple columns using assign function is to mention each column_name=column_value_list in a comma-separated manner.

df=df.assign(age=[16,15,17,14],city=['NYC','Mumbai','Paris','Tokyo'])
print(df)

## output
   Name  ID  age    city
0  John   1   16     NYC
1  Jane   2   15  Mumbai
2   Jim   3   17   Paris
3   Joe   4   14   Tokyo

3. Using Insert() Method

Both the above solutions add the new column or columns at the end of the dataframe. What if you want to insert a column at a specific position in your dataframe? In this case, you need to use the insert() method. It modifies the original dataframe without creating a copy, and it lets you place the new column at any position. Here is its syntax.

dataframe.insert(position, column_name, column_values, allow_duplicates = False)

You can call the insert() method on any dataframe. The first argument is the zero-based position where you want to insert the new column (0, 1, 2, and so on). The next argument is the column name, followed by the column values. Lastly, allow_duplicates specifies whether a column name that already exists may be inserted again. It is False by default.

Here is an example to insert the age column between the Name and ID columns.

df.insert(1, "age", [16,15,17,14], True)
print(df)

## output

   Name  age  ID
0  John   16   1
1  Jane   15   2
2   Jim   17   3
3   Joe   14   4

When you insert a new column at a given position, it will shift the existing column at that position to its right.

Now, if you want to call insert() again, you need to use the new position numbers. Here is an example to insert a city column before the ID column. Here we have specified position 2, whereas in the above code it was position 1.

df.insert(2, "city", ['NYC','Mumbai','Tokyo','Paris'], True)
print(df)

## output

   Name  age    city  ID
0  John   16     NYC   1
1  Jane   15  Mumbai   2
2   Jim   17   Tokyo   3
3   Joe   14   Paris   4
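Instead of tracking position numbers by hand, the position of an existing column can be looked up by name. The sketch below assumes the original two-column dataframe and uses df.columns.get_loc() to find where ID currently sits before each insert.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Look up the current position of 'ID' instead of hard-coding it
id_position = df.columns.get_loc('ID')
df.insert(id_position, 'age', [16, 15, 17, 14])

# 'ID' has shifted one place to the right, so look it up again
id_position = df.columns.get_loc('ID')
df.insert(id_position, 'city', ['NYC', 'Mumbai', 'Tokyo', 'Paris'])

print(list(df.columns))
```

This only works as written when the column name is unique, because get_loc() returns a single integer only in that case.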

If you want to prepend a new column at the beginning of the dataframe, use position 0. The following example starts again from the original dataframe with only the Name and ID columns.

df.insert(0, "age", [16,15,17,14], True)
print(df)

## output

   age  Name  ID
0   16  John   1
1   15  Jane   2
2   17   Jim   3
3   14   Joe   4

If you want to add a new column at the end of the dataframe, use len(df.columns) as the position. This counts the existing columns and uses that number as the position. This example also starts from the original two-column dataframe.

df.insert(len(df.columns), "age", [16,15,17,14], True)
print(df)

## output
   Name  ID  age
0  John   1   16
1  Jane   2   15
2   Jim   3   17
3   Joe   4   14

As you can see, insert() makes it easy to add new column at different positions of dataframe.
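The effect of the allow_duplicates argument can be seen in a short sketch, assuming the original two-column dataframe. The replacement values 10 to 40 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# With the default allow_duplicates=False, reusing an existing name fails
try:
    df.insert(1, 'ID', [10, 20, 30, 40])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
print(duplicate_error)

# Passing allow_duplicates=True permits a second column called 'ID'
df.insert(1, 'ID', [10, 20, 30, 40], allow_duplicates=True)
print(list(df.columns))
```

Duplicate column names make later selections such as df['ID'] return more than one column, so this option is usually best left at its default.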

4. Using loc Indexer

You can also use loc, which is available on every dataframe, to insert a new column. Note that loc is an indexer used with square brackets, not a function called with parentheses. It lets you set values for a selection of rows and columns, and it can also create new columns whose values are derived from other columns. A column created this way is added at the end of the dataframe. Here is the syntax to add a new column using loc.

dataframe.loc[:,column_name] = list_of_column_values

Here is an example to add a new column using loc.

df.loc[:,'age'] =[16,'15',17,'14']
print(df)

## output
   Name  ID age
0  John   1  16
1  Jane   2  15
2   Jim   3  17
3   Joe   4  14

You can also use it to create a new column based on values derived from another column in the dataframe. Here is an example to create a new column based on the value of the ID column.

df.loc[df['ID'] >= 3, 'Category'] = 'Tall'
df.loc[df['ID'] < 3, 'Category'] = 'Short'
print(df)

## output

   Name  ID Category
0  John   1    Short
1  Jane   2    Short
2   Jim   3     Tall
3   Joe   4     Tall

In the above code, we have provided 2 items inside the square brackets of loc. The first is the condition that selects the rows and the second is the column name. We have assigned the values Tall or Short depending on the value of the ID column.
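It may help to see what happens when only one of the two conditions is applied. In this minimal sketch, assuming the same sample data, the rows that do not match the condition are left as NaN in the new column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# Only rows with ID >= 3 receive a value; the column is created on the fly
df.loc[df['ID'] >= 3, 'Category'] = 'Tall'

# Rows that did not match the condition are left as NaN
missing_rows = df['Category'].isna().sum()
print(df)
print(missing_rows)
```

This is why the example above uses a second assignment with the opposite condition: together the two conditions cover every row.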

5. Using Dictionary

In this solution, we use a dict, along with the map function, to add a new column to a dataframe. First we define a dict in which each key is a value from an existing column, and each value is the corresponding value for the new column. In this example, each key is a value from the Name column.

address = {'John': 'NewYork', 'Joe': 'Chicago', 
            'Jane': 'Boston', 'Jim': 'Miami'}

Next, we call the map function on the Name column, and assign its result to a new column Address.

df['Address'] = df['Name'].map(address)
print(df)

## output

   Name  ID  Address
0  John   1  NewYork
1  Jane   2   Boston
2   Jim   3    Miami
3   Joe   4  Chicago

In the above dataframe, you will see that although our original dict contained the key-value pairs in a different order from the rows, each value has been mapped to its corresponding Name column value. For example, if Name=Jane, then Address=Boston, and so on. This is a great way to add a column from another data source where the data may be present as JSON or a dict, and the order of key-value pairs may not be the same as the order of rows in your dataframe.

Please note, if there is no key-value pair matching the column value of your dataframe, then the new column’s value will be NaN. In the following example, the dict has only 3 key-value pairs but the dataframe has 4 rows. So one value of Address column is NaN.

address = {'John': 'NewYork', 'Joe': 'Chicago', 
            'Jane': 'Boston'}


df['Address'] = df['Name'].map(address)
print(df)

## output

   Name  ID  Address
0  John   1  NewYork
1  Jane   2   Boston
2   Jim   3      NaN
3   Joe   4  Chicago
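If NaN is not wanted for unmatched rows, one possible approach is to chain fillna() after map(). The sketch below assumes the same data as above; the placeholder text 'Unknown' is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})
address = {'John': 'NewYork', 'Joe': 'Chicago', 'Jane': 'Boston'}

# 'Jim' has no entry in the dict, so map() yields NaN, which fillna() replaces
df['Address'] = df['Name'].map(address).fillna('Unknown')
print(df)
```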

6. Add Derived Columns

Sometimes you may need to create a new column based on the values of another column. There are several ways to do this. One of the simplest is to use mathematical operators. Here is an example to create a new column new_ID using the values of the ID column.

df['new_ID'] = df['ID'] + 100
print(df)

## output

   Name  ID  new_ID
0  John   1     101
1  Jane   2     102
2   Jim   3     103
3   Joe   4     104

As we have seen earlier, you can also use loc to create a new column based on a condition applied to the values of another existing column.

df.loc[df['ID'] >= 3, 'Type'] = 'Old'
df.loc[df['ID'] < 3, 'Type'] = 'Young'
print(df)

## output

   Name  ID   Type
0  John   1  Young
1  Jane   2  Young
2   Jim   3    Old
3   Joe   4    Old
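An alternative that is not covered in this article is NumPy's where() function, which fills both outcomes of a condition in a single statement. The sketch below assumes NumPy is installed (it is a dependency of pandas) and reuses the same condition as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# np.where(condition, value_if_true, value_if_false) works element by element
df['Type'] = np.where(df['ID'] >= 3, 'Old', 'Young')
print(df)
```

Because every row gets one of the two values, no row is left as NaN, unlike a single loc assignment.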

7. Add Column from Another Dataframe

Another common requirement is to insert a new column in a dataframe from another dataframe. You can easily do this using the assignment operator ‘=’.

First, we define the second dataframe.

df2 = pd.DataFrame({'Marks': [70, 80, 90, 60]})

Next, we use assignment operator to assign the values of Marks column in df2 dataframe to the new Marks column in df dataframe.

df['Marks'] = df2['Marks']
print(df)

## output
   Name  ID  Marks
0  John   1     70
1  Jane   2     80
2   Jim   3     90
3   Joe   4     60

Please note, the dataframe df did not have a Marks column earlier; it was created during the assignment of values from df2. The values are matched by index label rather than by position, so each row of df receives the value stored under the same index label in df2. If the other dataframe has fewer rows than our original dataframe, the rows without a matching label are populated with NaN, and the column is stored as float because NaN is a floating-point value. Here is an example where our second dataframe has only 3 values whereas our original dataframe has 4 rows.

df2 = pd.DataFrame({'Marks': [70, 80, 90]})
df['Marks'] = df2['Marks']
print(df)

## output

   Name  ID  Marks
0  John   1   70.0
1  Jane   2   80.0
2   Jim   3   90.0
3   Joe   4    NaN
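The matching by index label becomes visible when the second dataframe uses different index labels. The following sketch is an illustration with a made-up index on df2; it compares label-based assignment with positional assignment through to_numpy(). The column names Marks_aligned and Marks_positional are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})

# df2 has the same number of rows but its index labels are reversed
df2 = pd.DataFrame({'Marks': [70, 80, 90, 60]}, index=[3, 2, 1, 0])

# Label-based: row 0 of df receives the value stored under label 0 in df2
df['Marks_aligned'] = df2['Marks']

# Positional: to_numpy() drops the index, so values are copied in order
df['Marks_positional'] = df2['Marks'].to_numpy()
print(df)
```

Positional assignment requires the number of values to equal the number of rows, exactly as with a plain list.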

8. Adding Multiple Columns

If you want to add multiple columns, then the easiest way is to create a dictionary with column names and values. In this dictionary, each key is a column name and its value is a list of column values.

data={"age": [16,15,17,14],
      "city":['NYC','Mumbai','Tokyo','SFO']}

Once you have defined this dictionary, then you can call assign() function on dataframe, and pass this dictionary variable prefixed with unpacking ‘**’ operator, as shown.

df=df.assign(**data)
print(df)

## output

   Name  ID  age    city
0  John   1   16     NYC
1  Jane   2   15  Mumbai
2   Jim   3   17   Tokyo
3   Joe   4   14     SFO

Alternatively, you can call assign function for your dataframe and pass the column_name-column_value pairs in a comma-separated manner.

df=df.assign(age=[16,15,17,14],city=['NYC','Mumbai','Paris','Tokyo'])
print(df)

This code produces the same columns as above, with the city values given here (Paris and Tokyo in the last two rows).
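When the new columns already exist as a separate dataframe, another option not covered above is pd.concat() with axis=1, which places the two dataframes side by side. This is a minimal sketch assuming both dataframes share the same default index; the name extra is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'ID': [1, 2, 3, 4]})
extra = pd.DataFrame({'age': [16, 15, 17, 14],
                      'city': ['NYC', 'Mumbai', 'Tokyo', 'SFO']})

# axis=1 joins the dataframes column-wise, matching rows by index label
df = pd.concat([df, extra], axis=1)
print(df)
```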

Conclusion

In this article, we have learnt several different ways to insert new column in Pandas dataframe. We learnt how to add column using various methods. We also learnt how to add multiple columns. We have also seen how to derive column from another dataframe column, as well as create new column using column from another dataframe. You can use any of these methods as per your requirement.

FAQs

1. How to Add Single column to dataframe?

You can add new column by directly assigning its values as a list to the dataframe. You can also use insert, loc or assign for this purpose.

import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John','Jane','Jim']})
# Adding a new column 'Marks'
df['Marks'] = [70, 80, 90]

2. How to Add Multiple columns to dataframe

You can add multiple columns to a dataframe using the assign() method. Provide the column name-value pairs in a comma-separated manner, as shown, and assign the result back to the dataframe variable, because assign() returns a new dataframe.

import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John','Jane','Jim']})
# Adding new columns Marks and Age
df = df.assign(Marks=[70, 80, 90], Age=[15, 14, 16])


Sreeram Sreenivasan

Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.
