Apify is all about making the web more programmable. Our SDK for Python is a great toolkit to help simplify the process of making scrapers to collect web data. This tutorial aims to give you a solid understanding of Python dictionaries for storing data.
What is a Python dictionary?
Python has a variety of built-in data structures that can store different types of data. One such data structure is the Python dictionary, which can store data in the form of key:value pairs and allows for quick access to values associated with keys. You can think of it like a regular dictionary, where words are keys and their definitions are values.
Other languages call a similar structure a hash table, hash map, or associative array, because keys are located by their hash. Python dictionaries are dynamic and mutable, which means that their data can be changed.
What are dictionaries used for in Python?
Python dictionaries are used to store data in key-value pairs, where each key is unique within a dictionary, while values may repeat. The values of a dictionary can be of any type, but the keys must be hashable, which in practice means immutable built-in types such as strings, numbers, or tuples of hashable items. This is because the Python dictionary is implemented internally as a hash table, which uses a key's hash to locate its value. If a key could change after insertion, its hash would change too, and the dictionary could no longer find the entry.
Python dictionaries vs. lists
Python dictionaries are optimized for fast lookups, making them more efficient than lists for this purpose.
In Python, the average time complexity of a dictionary key lookup is O(1) because dictionaries are implemented as hash tables, and keys are hashable. On the other hand, searching a list for a value is O(n), because Python may have to compare every element; only access by index is O(1).
When to use Python dictionaries
Using a Python dictionary makes the most sense under the following conditions:
If you want to store data and objects using names rather than just index numbers or positions. Use a list if you want to store elements so that you can retrieve them by their index number.
If you need to look up data and objects by names quickly. Dictionaries are optimized for constant-time lookups.
If you need to store data efficiently and flexibly. Each key appears only once, so assigning to an existing key replaces its value instead of creating a duplicate entry; values, however, may repeat freely.
How to use dictionaries in Python
A dictionary is a group of key-value pairs. Using a dictionary in Python means working with the key-value pairs and performing various operations, such as retrieving data, deleting data, and inserting data.
What is a key-value pair?
A key-value pair is a combination of two elements: a key and a value. The key must be hashable, which for built-in types means immutable: numbers, strings, and tuples whose items are themselves hashable. Values in key-value pairs can be any type of data, including numbers, lists, strings, tuples, and even dictionaries.
The values can repeat, but the keys must remain unique. You cannot assign multiple values to the same key. However, you can assign a list of values as a single value.
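A minimal sketch of both rules, using an illustrative company dictionary: assigning to an existing key replaces its value, and a list lets one key hold several values.

```python
company = {"name": "apify", "year": 2015}

# Assigning to an existing key replaces the old value; no second "year" entry appears
company["year"] = 2016
print(company)  # {'name': 'apify', 'year': 2016}

# To keep several values under one key, store them in a list
company["products"] = ["crawlee", "actors", "proxy"]
print(company["products"])  # ['crawlee', 'actors', 'proxy']
```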
How to create a Python dictionary
To create a dictionary in Python, you can use curly braces {} to enclose a sequence of items separated by commas. Each item consists of a key and a value. There are two primary methods for defining a dictionary: using a literal (curly braces) or built-in function (dict()).
First, let’s create an empty dictionary and then fill it with some items.
test_dict = {}
Let's fill the dictionary with some data. Suppose we’ve got integer keys and string values.
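One way this could look, assuming made-up integer keys and string values (the specific entries are placeholders, not taken from the original example):

```python
test_dict = {}

# Add items one at a time with square-bracket assignment
test_dict[1] = "apify"
test_dict[2] = "crawlee"
print(test_dict)  # {1: 'apify', 2: 'crawlee'}

# Or write the same items directly in a literal
same_dict = {1: "apify", 2: "crawlee"}
print(test_dict == same_dict)  # True
```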
Alternatively, you can create a dictionary by explicitly calling the Python dict() constructor. dict() is most useful for building a dictionary from other data, such as keyword arguments, an iterable of key-value pairs, or another mapping.
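As a rough illustration, here are three common ways to call dict(). The variable names are hypothetical; each call produces the same dictionary.

```python
# From keyword arguments (the keys become strings)
from_kwargs = dict(name="apify", year=2015)

# From an iterable of (key, value) pairs
from_pairs = dict([("name", "apify"), ("year", 2015)])

# From two parallel sequences, paired up with zip()
from_zip = dict(zip(["name", "year"], ["apify", 2015]))

print(from_kwargs)  # {'name': 'apify', 'year': 2015}
print(from_kwargs == from_pairs == from_zip)  # True
```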
In Python dictionaries, keys should be hashable (they should be immutable). Thus, mutable data types like lists aren't allowed. Let's try to hash() different data types and see what happens:
# Hashing of an integer
print(hash(1))
# Hashing of a float
print(hash(1.2))
# Hashing of a string
print(hash("apify"))
# Hashing of a tuple
print(hash((1, 2)))
# Hashing of a list
# Lists are not hashable, so this will raise a TypeError
print(f"Hash of list [1, 2, 3]: {hash([1, 2, 3])}")
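Here’s the code result: the first four calls print integers (the string hash differs between interpreter runs because Python randomizes string hashing by default), and the last call raises TypeError: unhashable type: 'list'.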
Integers, floats, strings, and tuples are hashable data types, whereas lists are unhashable data types.
Accessing values using keys
To access the value from a Python dictionary using a key, you can use the square bracket notation or the get() method.
To access the value using square bracket notation, place the key inside the square brackets.
test_dict[key]
In the following code, we access the values using these keys: "name" and "products".
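A sketch of that lookup, assuming the same company dictionary that the later examples use:

```python
company = {
    "name": "apify",
    "year": 2015,
    "solution": "web scraping",
    "products": ["crawlee", "actors", "proxy"],
    "active": True,
}

print(company["name"])      # apify
print(company["products"])  # ['crawlee', 'actors', 'proxy']
```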
Now, what if you're searching for the key and the key does not exist in the dictionary? If you use the square bracket notation, the program will throw a KeyError. However, if you use the get() method, you’ll get None by default, and you can also set the default return value by passing the default values as the second argument. The following examples return "Key not found!" if the key doesn't exist.
company = {
"name": "apify",
"year": 2015,
"solution": "web scraping",
"products": ["crawlee", "actors", "proxy"],
"active": True,
}
print(company.get("founder")) # None
print(company.get("founder", "Key not found!")) # Key not found!
How to modify Python dictionaries
Python dictionaries are mutable, which means you can modify the dictionary items. You can add, update, or remove key-value pairs. Here are some basic operations to modify dictionaries.
Adding and updating key-value pairs
There are various ways to add new elements to a dictionary. A common way is to add a new key and assign a value to it using the = operator.
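For instance, a minimal sketch with an illustrative company dictionary:

```python
company = {"name": "apify"}

# Assigning to a key that doesn't exist yet adds a new pair
company["year"] = 2015
print(company)  # {'name': 'apify', 'year': 2015}

# Assigning to an existing key updates its value
company["name"] = "Apify"
print(company)  # {'name': 'Apify', 'year': 2015}
```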
You can add multiple key-value pairs to an existing dictionary with the update() method. If a key is already present in the dictionary, its value is overwritten with the new one. This is a common way to add several key-value pairs at once.
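A short sketch of update() with illustrative data, where one key already exists and the others are new:

```python
company = {"name": "apify", "year": 2014}

# "year" already exists, so it is overwritten; the other keys are added
company.update({"year": 2015, "solution": "web scraping", "active": True})
print(company)
# {'name': 'apify', 'year': 2015, 'solution': 'web scraping', 'active': True}
```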
One last method involves using the merge (|) and update (|=) operators, which were introduced in Python 3.9. The merge (|) operator creates a new dictionary with the keys and values from both of the given dictionaries. You can then assign this newly created dictionary to a new variable.
The update (|=) operator adds the key-value pairs of the second dictionary to the first dictionary in place.
company = {"name": "apify"}
new_dict = {"year": 2015, "solution": "web scraping"}
# Merge Operator
result = company | new_dict
print(result)
# Update Operator
company |= new_dict
print(company)
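Here's the code result in the Python (3.9+) interpreter; both print calls show the same dictionary:
{'name': 'apify', 'year': 2015, 'solution': 'web scraping'}
{'name': 'apify', 'year': 2015, 'solution': 'web scraping'}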
Removing key-value pairs
The removal of key-value pairs can be done in several ways, which we’ll discuss one by one.
The del keyword can be used to delete key-value pairs from the dictionary. Just pass the key of the key-value pair that you want to delete.
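A minimal sketch, assuming a small company dictionary with a "year" entry:

```python
company = {"name": "apify", "year": 2015, "solution": "web scraping"}

# Remove the pair whose key is "year"
del company["year"]
print(company)  # {'name': 'apify', 'solution': 'web scraping'}
```

Note that del raises a KeyError if the key is not in the dictionary.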
The "year" entry has been deleted from the Python dictionary.
Another way is to use the pop() method to delete the key-value pair. The main difference between pop() and del is that pop() returns the value of the removed key, whereas del returns nothing.
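A brief sketch of pop() on illustrative data, showing the returned value:

```python
company = {"name": "apify", "year": 2015, "solution": "web scraping"}

# pop() removes the pair and hands back its value
removed = company.pop("year")
print(removed)  # 2015
print(company)  # {'name': 'apify', 'solution': 'web scraping'}
```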
If you want to delete the entire dictionary, it would be difficult to use the above methods on every single key to delete all the items. Instead, you can use the del keyword to delete the entire dictionary.
company = {
"name": "apify",
"year": 2015,
"solution": "web scraping",
"active": True,
}
del company
print(company)
But if you try to access the dictionary, you’ll encounter a NameError because the dictionary no longer exists.
A more appropriate method to delete all elements from a dictionary without deleting the dictionary itself is to use the clear() method.
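A small sketch of clear(); the alias variable is a hypothetical second name, added only to show that the object itself survives:

```python
company = {"name": "apify", "year": 2015}
alias = company  # a second name for the same dictionary object

company.clear()
print(company)  # {}
print(alias)    # {} (same object, so it is emptied too)
```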
After clear(), printing the dictionary gives {}. del removes the name entirely, so the variable can no longer be used, whereas clear() only empties the dictionary and leaves the object in place.
Built-in Python dictionary methods
Python provides a set of built-in methods to make common dictionary operations like adding, deleting, and updating easier. These methods improve code performance and consistency. Some common methods include get(), keys(), values(), items(), pop(), and update().
get()
The get() method returns the value for a key if it exists in the dictionary or returns None if the key does not exist. You can also set a default value, which will be returned when the key does not exist in the dictionary.
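A sketch of the four cases, assuming the company dictionary from earlier; the KeyError is caught here so that the script runs to the end:

```python
company = {"name": "apify", "year": 2015, "solution": "web scraping"}

print(company.get("name"))                       # apify
print(company.get("founder"))                    # None
print(company.get("founder", "Key not found!"))  # Key not found!

# Square brackets raise KeyError for a missing key
try:
    company["founder"]
except KeyError as error:
    print("KeyError:", error)  # KeyError: 'founder'
```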
In the above output, when you try to access the key name, the value apify is returned. When you try to access the key founder, which does not exist in the dictionary, the value None is returned. When you access the key founder again, the default value that you set to return if the key is not found is returned. Finally, when you try to access the key without using the .get() method, the program throws an error. This is why the .get() method is useful.
keys()
This method returns the keys of a dictionary as a view object (dict_keys) rather than a list. The view is iterable and reflects later changes to the dictionary; wrap it in list() if you need an actual list.
items()
This method returns a view of the dictionary's key-value pairs as tuples, where the first item in each tuple is the key, and the second item is the value. The returned object is iterable, so this method is primarily used when you want to iterate through the dictionary.
company = {
"name": "apify",
"year": 2015,
"solution": "web scraping",
"products": ["crawlee", "actors", "proxy"],
}
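for k, v in company.items():
    print(k, v)
# or
for x in company:
    print(x, company[x])
Here’s the code output (both loops print the same lines):
name apify
year 2015
solution web scraping
products ['crawlee', 'actors', 'proxy']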
Note that when you change the value in the dictionary, the items object will also be updated to reflect the changes.
pop()
The pop() method removes a key from a dictionary if it is present and returns its associated value.
The pop() method raises a KeyError if the key is not found.
But if you don't want this error to be raised, you can set a default value.
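The three behaviours can be sketched as follows, with illustrative data and the error caught so the script keeps running:

```python
company = {"name": "apify", "year": 2015}

# Existing key: the pair is removed and its value returned
print(company.pop("year"))  # 2015

# Missing key without a default: KeyError
try:
    company.pop("founder")
except KeyError:
    print("founder is not a key")

# Missing key with a default: the default is returned instead
print(company.pop("founder", "Key not found!"))  # Key not found!
print(company)  # {'name': 'apify'}
```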
update()
The update() method is useful whenever you want to merge a dictionary with another dictionary or with an iterable of key-value pairs. Let's consider that the dictionary company will be updated using the entries from the dictionary new_dict. For each key in new_dict:
If the key is not already present in the company, the key-value pair from new_dict will be added to the company.
If the key is already present in the company, the corresponding value in the company for that key is updated with the value from new_dict.
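A minimal sketch of these two rules, assuming new_dict holds one new key (active) and one existing key (solution):

```python
company = {"name": "apify", "year": 2015, "solution": "web scraping"}
new_dict = {"active": True, "solution": "web scraping and automation"}

company.update(new_dict)
print(company)
# {'name': 'apify', 'year': 2015, 'solution': 'web scraping and automation', 'active': True}
```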
As shown above, the new key active is added to the company dictionary, and the solution key is updated.
Now, new_dict can also be specified as a list of tuples.
company = {"name": "apify", "year": 2015, "solution": "web scraping"}
new_dict = [("active", True), ("solution", "web scraping and automation")]
company.update(new_dict)
print(company)
Here’s the code result:
{'name': 'apify', 'year': 2015, 'solution': 'web scraping and automation', 'active': True}
Values to be merged can also be specified as keyword arguments, as long as the keys are valid Python identifiers.
company = {"name": "apify", "year": 2015, "solution": "web scraping"}
company.update(active=True, solution="web scraping and automation")
print(company)
Here’s the code output:
{'name': 'apify', 'year': 2015, 'solution': 'web scraping and automation', 'active': True}
fromkeys() method
The fromkeys() method is used to create a dictionary from a given sequence of keys (which can be a string, tuple, list, etc.) and values. The value parameter is optional, and if a value is not provided, None is assigned to the keys. The fromkeys() method returns a new dictionary with the specified sequence of keys and values.
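A short sketch with an illustrative list of keys, first without a value and then with one:

```python
keys = ["name", "year", "solution"]

# No value given: every key maps to None
without_value = dict.fromkeys(keys)
print(without_value)
# {'name': None, 'year': None, 'solution': None}

# One value given: every key maps to that same value
with_value = dict.fromkeys(keys, "unknown")
print(with_value)
# {'name': 'unknown', 'year': 'unknown', 'solution': 'unknown'}
```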
Now, let's take a look at one more feature provided by fromkeys(). This time, use a mutable value.
keys = {"name", "year", "solution"}
value = ["apify"]
new = dict.fromkeys(keys, value)
print(new)
value.append(2015)
print(new)
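Here’s the code output: the first print shows every key mapped to ['apify'], and the second shows every key mapped to ['apify', 2015]. Because keys is a set, the order of the keys may differ between runs.
In the above result, after we append to the list, every key shows the updated value. This is because fromkeys() does not copy the value: all keys reference the same list object.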
copy()
This method returns a shallow copy of the existing dictionary. Adding, removing, or reassigning keys in the copy won't affect the original, but nested mutable values (such as lists) are shared between the two.
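A sketch of copy(), assuming a company dictionary and a copy named new_copy; the replacement name and the products list are placeholders:

```python
company = {"name": "apify", "products": ["crawlee"]}
new_copy = company.copy()

# Reassigning a key in the copy leaves the original untouched
new_copy["name"] = "apify-copy"
print(company["name"])   # apify
print(new_copy["name"])  # apify-copy

# The copy is shallow: a nested list is still shared by both dictionaries
new_copy["products"].append("actors")
print(company["products"])  # ['crawlee', 'actors']
```

If you need fully independent nested values, copy.deepcopy() from the standard library is the usual alternative.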
In the code, we are trying to modify the name of the new_copy dictionary, but it does not affect the original dictionary (company).
Iterating over Python dictionaries
The dictionary allows access to the data in O(1) time. So, it is important to understand how to iterate over it so that you can access the data you’ve stored in the dictionary. Three commonly used methods for iterating over dictionaries are keys(), values(), and items().
Using loop with keys()
This method allows you to iterate through all the initialized keys. It returns a view object, which is basically a view of some data. You can iterate over this returned object without any problems. However, if you want to store the list of keys, you need to materialize it.
company = {"name": "apify", "year": 2015, "solution": "web scraping"}
all_keys = company.keys()
print(all_keys) # dict_keys(['name', 'year', 'solution'])
# Materializing the keys into a list of keys
keys_list = list(company.keys())
print(keys_list) # ['name', 'year', 'solution']
The first result is a view object, and the second result is a list of keys because you materialize the view object into a list.
Now, let's iterate over the dictionary, demonstrating two simple ways.
company = {"name": "apify", "year": 2015, "solution": "web scraping"}
for key in company.keys():
    print(key, company[key])
for key in company:
    print(key, company[key])
Here’s the code output (each loop prints the same three lines):
name apify
year 2015
solution web scraping
Note: When you loop over a dictionary with for ... in, Python calls the dictionary's __iter__() method, which returns an iterator over its keys. That is why the second loop visits the keys without calling keys() explicitly.
Using loop with values()
Like the keys() method, the values() method also returns a view object that allows you to iterate over the values. Unlike the keys() method, this method provides only values. So, if you are not concerned about the keys, use this method.
company = {"name": "apify", "year": 2015, "solution": "web scraping"}
for val in company.values():
    print(val)
# Output:
# apify
# 2015
# web scraping
Using loop with items()
Similar to the previous method, the items() method also returns a view object, but here you can iterate through the (key, value) pairs instead of iterating through either keys or values.
company = {"name": "apify", "year": 2015, "solution": "web scraping"}
for item in company.items():
    print(item)
Here’s the code output:
('name', 'apify')
('year', 2015)
('solution', 'web scraping')
The above code returns the key-value pairs. Now, to immediately assign both keys and values simultaneously, you can use tuple unpacking and extract them using variables.
company = {"name": "apify", "year": 2015, "solution": "web scraping"}
for key, value in company.items():
    print(key, value)
Here’s the code result:
name apify
year 2015
solution web scraping
Python dictionary comprehensions
First, let's remind ourselves what comprehension is in Python. It simply means applying a specific kind of operation on each element of an iterable (tuple, list, or dictionary). Dictionary comprehension is similar to list comprehension, but it creates a dictionary instead of a list.
It is a useful feature that allows us to create dictionaries on a single line concisely and efficiently. The common uses of dictionary comprehension include constructing dictionaries, transforming existing dictionaries, and filtering dictionary content.
{key: value for (key, value) in iterable}
Suppose you’ve got a dictionary that contains the names and money of each person. Suppose you want to transform the dictionary's values, i.e., you want to increment each amount of money by 10 dollars.
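A minimal sketch of that transformation, assuming the same person data that the filtering examples use:

```python
person = {"satyam": 22, "john": 18, "elon": 19}

# Keep every name and add 10 to each amount
new_person = {name: money + 10 for name, money in person.items()}
print(new_person)  # {'satyam': 32, 'john': 28, 'elon': 29}
```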
Suppose you want to filter all the person names whose money is less than 20 dollars.
person = {"satyam": 22, "john": 18, "elon": 19}
new_person = {name: money for name, money in person.items() if money < 20}
print(new_person) # {'john': 18, 'elon': 19}
Let's use if and else inside a dictionary comprehension to transform values conditionally. Suppose you want to create a new dictionary where every amount of money less than 20 gets a 20% increase. The result is rounded to two decimal places, because floating-point multiplication can leave tiny rounding errors in the raw product.
person = {"satyam": 22, "john": 18, "elon": 19}
new_person = {
    name: round(money * 1.2, 2) if money < 20 else money
    for name, money in person.items()
}
print(new_person)  # {'satyam': 22, 'john': 21.6, 'elon': 22.8}
You can perform a lot of operations in one line using dictionary comprehension.
Handling missing keys with get() and setdefault()
Sometimes, users don't know if a key exists in a dictionary, and they try to access it. This can cause an error. There are a few ways to handle missing keys. One way is to use the get() and setdefault() methods to set a default value for the key. This way, you don't have to handle the error yourself.
The get() method returns the associated value if the key is found in the dictionary. Otherwise, it returns the default value (which is None by default), instead of raising an error. You can also specify a custom default value as the second argument.
company = {"name": "Apify", "year": 2015, "solution": "web scraping"}
# Accessing a key using the get() method
print(company.get("founder"))
# Accessing a key using the get() method with a default value
print(company.get("founder", "Sorry, key not found!"))
# Accessing a key without using the get() method
print(company["founder"])
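Here’s the code result: the first call prints None, the second prints Sorry, key not found!, and the last line raises KeyError: 'founder'.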
Another method is setdefault(). If the key is present in the dictionary, it returns the value of the key. Otherwise, this method inserts a new key with the default value passed as an argument in the method as its value.
# Create a dictionary to store person's information
person = {"name": "Satyam", "age": 21}
# If the 'age' key is present, retrieve its value; otherwise, set a default value
age = person.setdefault("age", "Key not found!")
# If the 'city' key is not present, set it to the default value "Key not found!"
city = person.setdefault("city", "Key not found!")
print("Age:", age)
print("City:", city)
# The dictionary 'person' will be updated as the new key 'city' is added
print("Updated dictionary:", person)
Here’s the code output:
Age: 21
City: Key not found!
Updated dictionary: {'name': 'Satyam', 'age': 21, 'city': 'Key not found!'}
In the above code, the age key is already in the dictionary, so setdefault() ignores the default and returns the existing value, 21. The city key is not in the dictionary, so setdefault() inserts it with the default value, "Key not found!", and returns that value. From then on, accessing the city key returns this stored value.
The main difference between get() and setdefault() is that get() never modifies the dictionary, while setdefault() inserts the key with the default value if it does not exist.
Merging dictionaries
When two dictionaries are merged, the right-hand dictionary takes precedence (A ← B). When there is a common key in both dictionaries, the value from the second dictionary overwrites the value in the first dictionary. For example, if key 1 exists in both dictionary A and dictionary B, the merged result holds the value of key 1 from dictionary B.
Let’s understand some ways to merge Python dictionaries.
From Python version 3.9 onward, we can use the merge (|) and update (|=) operators. The merge (|) operator creates a new dictionary with the keys and values from both of the given dictionaries. You can then assign this newly created dictionary to a new variable. The update (|=) operator adds the key-value pairs of the second dictionary to the first dictionary in place.
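To make the precedence rule concrete, here is a sketch with two hypothetical dictionaries, a and b, that share key 1. The last line uses dictionary unpacking, an alternative that also works before Python 3.9; the | operator itself needs Python 3.9 or later.

```python
a = {1: "from A", 2: "only in A"}
b = {1: "from B", 3: "only in B"}

# The right-hand operand wins for the shared key 1
merged_ab = a | b
print(merged_ab)  # {1: 'from B', 2: 'only in A', 3: 'only in B'}

# Swapping the operands swaps the winner
merged_ba = b | a
print(merged_ba)  # {1: 'from A', 3: 'only in B', 2: 'only in A'}

# Unpacking gives the same result as a | b on older Python versions
unpacked = {**a, **b}
print(unpacked)   # {1: 'from B', 2: 'only in A', 3: 'only in B'}
```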
A nested dictionary is a collection of dictionaries within a single dictionary. In short, a nested dictionary is a dictionary inside another dictionary. Below, we have created a nested dictionary for two companies:
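A sketch of such a structure; the outer keys and the second company's details are made-up placeholders:

```python
companies = {
    "company_1": {"name": "apify", "year": 2015},
    "company_2": {"name": "example corp", "year": 2020},  # hypothetical entry
}

# Chain square brackets to reach into an inner dictionary
print(companies["company_1"]["name"])  # apify
print(companies["company_2"]["year"])  # 2020
```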
You can add or update nested dictionary items. If the key is not present in the dictionary, the key will be added. If the key is already present in the dictionary, its value is replaced by the new one.
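Both cases can be sketched like this, again with placeholder data:

```python
companies = {"company_1": {"name": "apify", "year": 2015}}

# New key in the inner dictionary: it is added
companies["company_1"]["solution"] = "web scraping"

# Existing key in the inner dictionary: its value is replaced
companies["company_1"]["name"] = "Apify"

# New key in the outer dictionary: a whole inner dictionary is added
companies["company_2"] = {"name": "example corp", "year": 2020}  # hypothetical entry

print(companies["company_1"])
# {'name': 'Apify', 'year': 2015, 'solution': 'web scraping'}
print(len(companies))  # 2
```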
The Python dictionary is a very useful data structure and is used in many real-world scenarios because it is easy to use, flexible, and efficient.
Benefits
Ease of use: The Python dictionary is a powerful data structure that makes it easy to store and retrieve data by key. With very little code, you can modify dictionaries, add and delete objects, and more.
Flexible: Python dictionaries are flexible and can store many different types of data, such as numbers, strings, and lists. This makes it easy to access and manipulate the data.
Efficiency: Python dictionaries use hash tables to store data, which allows them to find the value associated with a key quickly. This speed comes at the cost of some extra memory compared with a list, a trade-off that usually pays off for applications that need to access and manipulate large amounts of data by key. Dictionary methods like get() and setdefault() let you efficiently look up data in a dictionary. Other methods like pop(), update(), and clear() are efficient for manipulating data.
Example
Let's take a look at the real-life application where Python dictionaries are used a lot. This example shows how to store student data using a Python dictionary efficiently. You can access the student data without getting an error if the student information is not found. You can also list all the student information and remove specific student details.
class StudentDatabase:
    def __init__(self):
        # Maps each student ID to a dictionary holding that student's details
        self.students = {}

    def add_student(self, student_id, name, grade):
        self.students[student_id] = {"name": name, "grade": grade}

    def get_student_info(self, student_id):
        # get() returns None instead of raising KeyError for an unknown ID
        return self.students.get(student_id)

    def list_all_students(self):
        for student_id, student_info in self.students.items():
            print(
                f"Student ID: {student_id}, Name: {student_info['name']}, Grade: {student_info['grade']}"
            )

    def remove_student(self, student_id):
        if student_id in self.students:
            del self.students[student_id]
            print("Student removed from the database.")
        else:
            print("Student not found in the database")


def main():
    student_manager = StudentDatabase()
    while True:
        print("\nStudent Management Menu:")
        print("1. Add Student")
        print("2. Retrieve Student Information")
        print("3. List All Students")
        print("4. Remove Student")
        print("5. Exit")
        choice = input("Enter your choice: ")
        if choice == "1":
            student_id = input("Enter student ID: ")
            name = input("Enter student name: ")
            grade = input("Enter student grade: ")
            student_manager.add_student(student_id, name, grade)
            print("Student added to the database.")
        elif choice == "2":
            student_id = input("Enter student ID to retrieve: ")
            student = student_manager.get_student_info(student_id)
            if student:
                print(
                    f"Student ID: {student_id}, Name: {student['name']}, Grade: {student['grade']}"
                )
            else:
                print("Student not found in the database.")
        elif choice == "3":
            student_manager.list_all_students()
        elif choice == "4":
            student_id = input("Enter student ID to remove: ")
            student_manager.remove_student(student_id)
        elif choice == "5":
            break
        else:
            print("Invalid choice. Please try again.")
    print("Goodbye!")


if __name__ == "__main__":
    main()
Similarly, Python dictionaries can be used in other real-life applications, such as:
Creating a contact book to store all of your contacts.
Building a user authentication service to store all of your users' data.
Best practices for Python dictionaries
Following best practices can help you get the most out of dictionaries and avoid common pitfalls. The best practices ensure that your code will remain maintainable, reliable, and perform at its best.
Ensuring unique keys
Keys in a Python dictionary must be unique. If you try to insert a duplicate key, the new value will overwrite the existing one. The most common and straightforward method to ensure unique keys is to check whether the key exists in the dictionary before inserting a new key-value pair. You can use the in operator for this.
In the following code, we check if the key exists. If it does not exist, we simply insert it. Otherwise, we handle it accordingly, such as ignoring the new value, replacing the existing value with the new value, or raising an exception.
company = {"name": "Apify", "founded": 2015}
key = "solution"
# Check if the 'key' exists in the 'company' dictionary
if key not in company:
    # If it doesn't exist, add the 'key' with the value 'web scraping'
    company[key] = "web scraping"
else:
    # Handle the duplicate key here: ignore it, overwrite it, or raise an exception
    print(f"Key '{key}' already exists with value {company[key]!r}")
Mutable vs. immutable key types
Dictionary keys must be of an immutable type, such as integers, floats, strings, tuples, or Booleans. Lists and dictionaries cannot be dictionary keys because they are mutable. However, values can be of any type and used multiple times.
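A small sketch of the difference, using made-up coordinate data: a tuple works as a key, while a list with the same contents is rejected.

```python
# A tuple of hashable items works as a key
locations = {(50.08, 14.43): "Prague"}
print(locations[(50.08, 14.43)])  # Prague

# A list is mutable and unhashable, so using it as a key fails
try:
    bad_key = [50.08, 14.43]
    locations[bad_key] = "Prague"
except TypeError as error:
    print("TypeError:", error)  # the message describes the list as unhashable
```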
Using a list as a dictionary key raises a TypeError that describes the list as unhashable. What does unhashable mean?
It simply means that Python cannot compute a hash for the key, because lists do not support hashing. So, the values of a dictionary can be of any type, but the keys must be of a hashable, immutable data type.
Performance considerations
Python dictionaries are an excellent choice for fast lookups, insertions, and deletions. However, some factors may affect their performance.
Use dictionary comprehension when creating dictionaries. Dictionary comprehension is more concise and readable than a loop, and it is also more efficient in some cases, especially for large datasets.
Flat dictionaries are faster to access than nested dictionaries, because every level of nesting adds another lookup. If you only need to iterate over records rather than look them up by key, consider a different data structure, such as a list of dictionaries or a tuple of dictionaries.
When initializing a new dictionary, using {} is more efficient than calling dict(). With {}, there are no function call overheads. Calling dict() requires Python to look up and execute a function, which incurs a slight performance penalty.
Comparing dictionaries with other Python data structures
Python dictionaries have their own unique characteristics and use cases when compared to other data structures like lists, sets, and tuples.
Differences and similarities with lists, sets, and tuples
Lists are ordered collections of elements that can be modified after creation. Items can be accessed by index, and duplicates are allowed.
Sets are unordered collections of unique, hashable elements. Sets can be modified after creation (frozenset is the immutable variant). Sets are commonly used for operations such as finding the union, intersection, or difference between sets.
Tuples are immutable ordered collections of elements that cannot be modified after creation. Tuples are often used when you want to ensure that a collection of items remains constant and in a specific order.
Dictionaries are collections of key-value pairs that preserve insertion order (guaranteed since Python 3.7). The keys must be unique and hashable, and the values can be any type of data. Dictionaries can be modified after creation.
When to use dictionaries over other data structures
Python dictionaries provide a fast and efficient way to retrieve data by its key in constant time on average, O(1). Dictionaries do not allow duplicate keys, so you can be sure that each key in a dictionary is unique. This makes them an excellent choice when you hold large amounts of data in memory and need to look records up by a unique identifier.
When you need to access a specific element in a data structure, dictionaries are much faster than other data structures, such as lists. This is because the runtime for dictionary lookup operations is constant.
Here is a simple comparison of the runtimes for list and dictionary lookup operations:
List:
import time
# Create a list containing numbers from 0 to 9,999,999
number_list = [number for number in range(10**7)]
# Measure the time taken to check if '5' is in the list
start_time = time.time()
if 5 in number_list:
    print("5 is in the list")
list_runtime = time.time() - start_time
print(f"\nList runtime (checking '5' in the list): {list_runtime} seconds.")
# Measure the time taken to find the number '9,000,000' in the list
start_time = time.time()
for number in number_list:
    if number == 9000000:
        break
list_runtime = time.time() - start_time
print(f"\nList runtime (finding '9,000,000' in the list): {list_runtime} seconds.")
Dictionary:
import time
# Create a dictionary with keys and values
dictionary_data = {i: i * 2 for i in range(10**7)}
# Measure the time taken to check if '5' is a key in the dictionary
start_time = time.time()
if 5 in dictionary_data:
    print("Key '5' is in the dictionary.")
dict_runtime = time.time() - start_time
print(
    f"\nDictionary runtime (checking for key '5'): {dict_runtime:.6f} seconds."
)
# Measure the time taken to retrieve the value for the key '9,000,000' from the dictionary
start_time = time.time()
value = dictionary_data.get(9000000)
dict_runtime = time.time() - start_time
print(
    f"\nDictionary runtime (retrieving the value for key '9,000,000'): {dict_runtime:.6f} seconds."
)
Have you noticed that dictionaries perform much better than lists when the item sits far from the start? Finding 5 is quick in both cases, but locating 9,000,000 takes the dictionary almost no time, while the list has to scan millions of elements first. The exact timings depend on your machine.
Wrapping up
You've learned in detail how to create, modify, and delete Python dictionaries, as well as some of the most commonly used dictionary methods and advanced Python dictionary operations. Start using these dictionary methods whenever you see a possibility. Build amazing projects using dictionaries and their methods, keeping best practices in mind.
I am a freelance technical writer based in India. I write quality user guides and blog posts for software development startups. I have worked with more than 10 startups across the globe.