How to Convert Pandas Dataframe to Dictionary

Python Pandas is a powerful library that allows you to store and manipulate data as rows and columns, using dataframes. It also provides many useful functions to convert different data types into dataframes, and to export dataframes into other formats. But sometimes, you may need to convert a dataframe to a dictionary. In this article, we will learn how to convert a Pandas dataframe to a dictionary.

Why Convert Dataframe to Dictionary

There are several reasons why you may need to convert dataframe to dictionary:

Dictionaries can be easily converted into JSON and vice versa. So if you need JSON output, converting the dataframe to a dictionary is often the first step.
Dictionaries allow you to easily access data using keys. This is especially useful if you have nested data.
Sometimes external systems and processes may require you to send data as a dictionary. In such cases, it is convenient to convert the dataframe to a dictionary.
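As a rough illustration of the JSON use case, here is a minimal sketch. The two-row dataframe is made up for this example, and it uses the records option, which is explained later in this article.

```python
import json
import pandas as pd

# Small sample dataframe, made up for this illustration
df = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [28, 24]})

# 'records' gives a list of plain dictionaries, one per row
rows = df.to_dict('records')

# Serialize the list to a JSON string, then parse it back
payload = json.dumps(rows)
restored = json.loads(payload)

print(payload)
print(restored == rows)
```

Pandas also has its own to_json function, so going through a dictionary is mainly useful when you need to inspect or modify the data before serializing it.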

How to Convert Pandas Dataframe to Dictionary

Let us look at the different ways to convert Pandas dataframe to dictionary.

About to_dict

We will mainly use the to_dict function to convert a dataframe into a dictionary. It can be called directly on any dataframe object. It has an orient parameter that determines the format of the result: depending on its value, you get a dictionary whose values are dictionaries, lists or series, or a list of dictionaries. The main values are covered below one by one.

Let us say you have the following dataframe.

import pandas as pd

data = {'Name': ['John', 'Jane', 'Jim', 'Joe'],
        'Age': [28, 24, 25, 22],
        'City': ['New York', 'Paris', 'Berlin', 'London']}

df = pd.DataFrame(data)
print(df)

## output 

   Name  Age      City
0  John   28  New York
1  Jane   24     Paris
2   Jim   25    Berlin
3   Joe   22    London

1. Using dict option

The default value of the orient parameter is dict. It converts the dataframe into a nested dictionary, with column names as outer keys and index values as inner keys. If you do not specify any orient value, to_dict uses this option.

Here is its syntax.

dataframe.to_dict('dict')
OR
dataframe.to_dict()

Here is an example to demonstrate its use.

dt=df.to_dict()
print(dt)

## output

{'Name': {0: 'John', 1: 'Jane', 2: 'Jim', 3: 'Joe'}, 'Age': {0: 28, 1: 24, 2: 25, 3: 22}, 'City': {0: 'New York', 1: 'Paris', 2: 'Berlin', 3: 'London'}}

The above result is a nested dict. The outermost keys are the column names. The keys of inner dict are the indexes and the final values of these keys are the dataframe column values.
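To make the nesting concrete, here is a short sketch that reads individual values from the result. It rebuilds the same sample dataframe so that it can run on its own.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'Age': [28, 24, 25, 22],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})
dt = df.to_dict()

# Outer key is the column name, inner key is the index value
first_name = dt['Name'][0]
last_city = dt['City'][3]
print(first_name, last_city)

# One column's values, keyed by index
ages = dt['Age']
print(ages)
```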

2. Using list option

When you use this option, the column names become the keys and each column's values become a list. The index values are not included in the result.

dt=df.to_dict('list')
print(dt)

## output 

{'Name': ['John', 'Jane', 'Jim', 'Joe'], 'Age': [28, 24, 25, 22], 'City': ['New York', 'Paris', 'Berlin', 'London']}
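The list form has the same shape as the dictionary we passed to pd.DataFrame at the start, so it can be turned back into a dataframe. The following is a small sketch of that round trip, assuming the dataframe uses the default index (a custom index would not survive, since the list option drops it).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [28, 24]})

# Convert to a dictionary of lists
dt = df.to_dict('list')

# Build a new dataframe from that dictionary
rebuilt = pd.DataFrame(dt)

print(dt)
print(rebuilt.equals(df))
```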

3. Using series option

When you use this option, column names are keys and each value is a Pandas Series that holds the column's values along with the index.

dt=df.to_dict('series')
print(dt)

## output

{'Name': 0    John
1    Jane
2     Jim
3     Joe
Name: Name, dtype: object, 'Age': 0    28
1    24
2    25
3    22
Name: Age, dtype: int64, 'City': 0    New York
1       Paris
2      Berlin
3      London
Name: City, dtype: object}
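Since each value is still a Series, you can keep using Series methods on it. Here is a minimal sketch using the same sample data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'Age': [28, 24, 25, 22]})
dt = df.to_dict('series')

# Each value is a pandas Series, so Series methods are available
age_series = dt['Age']
average_age = age_series.mean()

print(type(age_series).__name__)
print(average_age)
```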

4. Using split option

In this case, the result dict will contain 3 keys – index, columns and data. The first key’s value is a list of all indexes. The second key’s value is a list of all column names. The third key’s value is a nested list where each inner list is a list of row values.

dt=df.to_dict('split')
print(dt)

## output

{'index': [0, 1, 2, 3], 'columns': ['Name', 'Age', 'City'], 'data': [['John', 28, 'New York'], ['Jane', 24, 'Paris'], ['Jim', 25, 'Berlin'], ['Joe', 22, 'London']]}
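Because the split form keeps the index, column names and row values separately, it can be passed back to pd.DataFrame to rebuild the dataframe. The sketch below uses a made-up dataframe with custom index labels ('a' and 'b') to show that they are preserved.

```python
import pandas as pd

# Sample dataframe with custom index labels, made up for this illustration
df = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [28, 24]},
                  index=['a', 'b'])

dt = df.to_dict('split')

# The three keys match the data, index and columns arguments of pd.DataFrame
rebuilt = pd.DataFrame(data=dt['data'], index=dt['index'], columns=dt['columns'])

print(rebuilt)
```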

5. Using records option

When you use this option, to_dict will return a list of dictionaries. In each dictionary, the keys are column names and the values are column values.

dt=df.to_dict('records')
print(dt)

## output

[{'Name': 'John', 'Age': 28, 'City': 'New York'}, {'Name': 'Jane', 'Age': 24, 'City': 'Paris'}, {'Name': 'Jim', 'Age': 25, 'City': 'Berlin'}, {'Name': 'Joe', 'Age': 22, 'City': 'London'}]

6. Using index option

With this option, you get a nested dictionary, where outer keys are indexes. The inner dictionary keys are column names and final values are the column values.

dt=df.to_dict('index')
print(dt)

## output

{0: {'Name': 'John', 'Age': 28, 'City': 'New York'}, 1: {'Name': 'Jane', 'Age': 24, 'City': 'Paris'}, 2: {'Name': 'Jim', 'Age': 25, 'City': 'Berlin'}, 3: {'Name': 'Joe', 'Age': 22, 'City': 'London'}}
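The index option becomes more useful when the index holds meaningful labels. As a sketch, assuming the names in our sample data are unique, you can make the Name column the index first and then look up rows by name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joe'],
                   'Age': [28, 24, 25, 22],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Use the Name column as the index, then convert with the index option
lookup = df.set_index('Name').to_dict('index')

# Outer key is the name, inner keys are the remaining column names
jane = lookup['Jane']
print(jane)
print(lookup['Joe']['City'])
```

Note that this option requires unique index values. If the index contains duplicates, to_dict raises a ValueError.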

Advantages of converting dataframe to dict

Here are the main benefits of transforming dataframe into dict.

A dictionary is a simpler data structure than a dataframe. It lets you store and access tabular data using key-value pairs, and looking up a single value by key is fast. This can be useful if your program reads individual values frequently. A dictionary is also a natural fit for nested data.
Since you can access data directly by key, you can modify individual values with plain Python, without using dataframe indexers such as loc or iloc.
Certain programs and processes may require input data as a dictionary. In such cases, you need to convert the dataframe into a dict.
A dictionary carries less metadata than a dataframe, since it has no index object or column data types. However, this does not always make it smaller: numeric dataframe columns are stored in compact arrays, while a dictionary stores every value as a separate Python object. If memory usage matters, measure both before converting.

Disadvantages of converting dataframe to dict

On the other hand, there are certain disadvantages to converting dataframe to dict.

When you convert a dataframe to a dict, you lose the dataframe index and the operations built on it, such as label-based slicing, alignment and joins. If your program relies heavily on these operations, you will have to reimplement them yourself. Plain key lookups, however, remain fast in a dictionary.
Dataframes store a data type for each column and provide column-wise operations such as filtering and aggregation. A dictionary keeps the column names only as keys, and it neither records data types nor offers these operations.
While converting a dataframe into a dict, in most cases, column names are converted into keys. With the records and index options, every row repeats the same column names, so the dictionary may be bulkier than its dataframe.

Possible Errors in converting dataframe to dict

We have to be careful about the following things while converting dataframe to dict:

The to_dict function only accepts specific values for the orient argument. If you pass an invalid value, it raises a ValueError.
If a dataframe column contains mixed data types, to_dict does not raise an error, but the resulting dictionary will hold values of different types under the same key. This can cause problems later, for example when you validate or serialize the data. So before you call to_dict, it is a good idea to ensure that the columns contain consistent data types.
If the dataframe contains NaN or missing values, to_dict does not raise an error either, but the NaN values are carried over into the dictionary. This can give unexpected results later, for example when you convert the dictionary to JSON, which has no NaN value. So it is good to look for missing or NaN values and replace them, either before or after the conversion.
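The sketch below illustrates two of these points. The orient value 'xyz' is a deliberately invalid value chosen for this example, and the dataframe with a missing age is made up. Replacing NaN with None after the conversion is only one possible approach; you could also call fillna on the dataframe beforehand.

```python
import json
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane'], 'Age': [28, None]})

# 1. An invalid orient value raises a ValueError
try:
    df.to_dict('xyz')
    invalid_orient_raised = False
except ValueError as err:
    invalid_orient_raised = True
    print('Error:', err)

# 2. Missing values are carried over into the dictionary as NaN
records = df.to_dict('records')

# Replace NaN with None so that the JSON output contains null
clean = [{key: (None if pd.isna(value) else value) for key, value in row.items()}
         for row in records]

print(json.dumps(clean))
```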

Conclusion

In this article, we have learnt several simple ways to easily convert Pandas dataframe into dictionary. Mainly, we need to use to_dict() function with appropriate orient parameter value. Depending on your requirement, you can use any of these options. It is important to use the right orient parameter value depending on how you want to organize dataframe data into dictionary.

FAQs

1. How to Convert Dataframe into Dictionary where each row’s values are stored as a separate dictionary?

You can use to_dict() function with orient parameter value set to ‘records’. In this case, you will get a list of dictionaries, where each dictionary contains a separate dataframe row’s values.

2. How to exclude index values from dataframe in dictionary?

For this purpose, you can use the to_dict function with the orient parameter set to list or records.

3. How to include index values from dataframe in dictionary?

There are several options to include index values from the dataframe. The default dict option for the orient parameter returns a nested dictionary where the outer keys are column names and the inner keys are index values.

If you use the index option, you also get a nested dict, but in this case the outer keys are index values and the inner keys are column names.

If you use the series option, the keys are column names and each value is a Series, which holds the index along with the column values.

The split option also includes the index, as a separate list under the index key.

Sreeram Sreenivasan

Sreeram Sreenivasan is the Founder of Ubiq. He has helped many Fortune 500 companies in the areas of BI & software development.
