12 Lists, Dictionaries and Comprehensions
In previous chapters, we covered the fundamentals of defining and processing basic Python structures. We began with simple data types such as integers, floats, and strings, understanding how to define, manipulate, and use them in various contexts.
Next, we delved into the world of data analysis by learning how to define and manipulate pandas DataFrames. We explored their powerful capabilities for handling and analyzing tabular data, making it possible to perform complex data manipulations with ease.
In this chapter, we will shift our focus to another dimension of Python programming. We will explore lists and other advanced data structures such as dictionaries. These structures are fundamental for efficient data processing, providing versatile ways to store and manipulate data. Additionally, we will learn how to use comprehensions to simplify our code when working with these data structures. This includes list comprehensions and dictionary comprehensions, showcasing how these concise and readable constructs can make our code more compact and clearer.
By the end of this chapter, you’ll have a robust understanding of these data structures and how to harness the power of comprehensions to manipulate them effectively.
12.1 Lists
Lists are one of the most commonly used and versatile data structures in Python. Lists are ordered, mutable collections of elements. They can store elements of various data types, including integers, floats, strings, and even other lists.
Key Characteristics of Python lists:
- Ordered: Lists maintain the order of elements as they are inserted. This means that the order in which elements appear is preserved.
- Mutable: Lists can be changed (mutated) after they are created. You can add, remove, or change elements.
- Heterogeneous: Lists can hold elements of different data types. For example, a single list can contain integers, strings, and even other lists.
- Dynamic: Python lists can grow and shrink as needed, meaning they do not have a fixed size.
To create a list, you simply need to give it a name and assign it a collection of elements, enclosed in square brackets and separated by commas, using the equals sign (=).
EvenNumbers = [2, 4, 6, 8, 10]
CustomerNames = ['John', 'Gerard', 'Lola', 'Elena', 'Xie']
FloatNumbers = [3.14, 5.6, 7.899, 12.01]
MixedList = ['David', 3.4, 39, 'Regents Park', 'London']
List Built-In Methods
Lists in Python come with a variety of built-in methods:
List Method | Description |
---|---|
list.append() | Adds an element to the end of the list. |
list.extend() | Adds all elements of an iterable to the end of the list. |
list.insert() | Inserts an element at a specified position. |
list.remove() | Removes the first occurrence of an element. |
list.pop() | Removes and returns the element at a specified index (the last element if no index is given). |
list.clear() | Removes all elements from the list. |
list.count() | Returns the number of occurrences of an element. |
list.sort() | Sorts the list in place, in ascending order by default. |
list.reverse() | Reverses the order of the elements in place. |
my_list = [3, 1, 4, 1, 5, 9, 5, 5]
# Sorting the list
my_list.sort()
print(f'The ordered list is {my_list}')
# Reversing the list
my_list.reverse()
print(f'The reversed list is {my_list}')
# Counting occurrences of 1
print(f'I have found {my_list.count(1)} ones in the list')
# Counting occurrences of 5
print(f'I have found {my_list.count(5)} fives in the list')
The ordered list is [1, 1, 3, 4, 5, 5, 5, 9]
The reversed list is [9, 5, 5, 5, 4, 3, 1, 1]
I have found 2 ones in the list
I have found 3 fives in the list
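The example above exercises sort(), reverse() and count(). The following sketch, using an illustrative fruits list, walks through the remaining methods in the table; the comments show the state of the list after each call.

```python
fruits = ['apple', 'banana']

fruits.append('cherry')          # ['apple', 'banana', 'cherry']
fruits.extend(['kiwi', 'plum'])  # ['apple', 'banana', 'cherry', 'kiwi', 'plum']
fruits.insert(1, 'mango')        # ['apple', 'mango', 'banana', 'cherry', 'kiwi', 'plum']
fruits.remove('banana')          # ['apple', 'mango', 'cherry', 'kiwi', 'plum']

last = fruits.pop()              # no index: removes and returns 'plum'
first = fruits.pop(0)            # removes and returns 'apple'
print(fruits)                    # ['mango', 'cherry', 'kiwi']

fruits.clear()
print(fruits)                    # []

# append() adds its argument as a single element, extend() adds each element
nested = [1, 2]
nested.append([3, 4])
print(nested)                    # [1, 2, [3, 4]]
flat = [1, 2]
flat.extend([3, 4])
print(flat)                      # [1, 2, 3, 4]
```

Note that methods such as sort() and reverse() modify the list in place and return None, so writing my_list = my_list.sort() would replace the list with None.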
List Operations
In addition to the methods above, lists can be manipulated using common Python operators.
You can access elements using their index (starting from 0) or using negative indices, which count backwards from the end of the list (the last element has index -1).
numbers = [1, 2, 3, 4, 5]
print(f'The first element in the list is: {numbers[0]}')
print(f'The third element in the list is: {numbers[2]}')
print(f'The second element, starting from the end, in the list is: {numbers[-2]}')
print(f'The first element, starting from the end, in the list is: {numbers[-1]}')
The first element in the list is: 1
The third element in the list is: 3
The second element, starting from the end, in the list is: 4
The first element, starting from the end, in the list is: 5
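Valid positive indices run from 0 to len(numbers) - 1. A minimal sketch of what happens at and beyond that boundary:

```python
numbers = [1, 2, 3, 4, 5]

# The last valid positive index is len(numbers) - 1, the same element as numbers[-1]
print(numbers[len(numbers) - 1])   # 5

# An index outside the list raises an IndexError instead of returning a value
try:
    numbers[5]
except IndexError as error:
    message = str(error)
    print('IndexError:', message)  # IndexError: list index out of range
```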
You can modify elements by assigning a new value to a specific index.
numbers[0] = 10
print(numbers)  # Output: [10, 2, 3, 4, 5]
[10, 2, 3, 4, 5]
You can combine (concatenate) two or more lists with the + operator, which returns a new list.
list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined_list = list1 + list2
print(combined_list)  # Output: [1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
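The + operator leaves both original lists untouched. For comparison, a short sketch of the in-place alternative, the extend() method from the table above:

```python
list1 = [1, 2, 3]
list2 = [4, 5, 6]

# + builds a brand new list; list1 and list2 are left unchanged
combined_list = list1 + list2
print(list1)           # [1, 2, 3]

# extend() modifies list1 in place and returns None
list1.extend(list2)
print(list1)           # [1, 2, 3, 4, 5, 6]
print(list2)           # [4, 5, 6]
```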
Indeed, you can iterate over a list using for loops.
my_list = [3, 1, 4, 5, 9]
for item in my_list:
    internalVariable = item*item
    print(f'the square of {item} is {internalVariable}')
the square of 3 is 9
the square of 1 is 1
the square of 4 is 16
the square of 5 is 25
the square of 9 is 81
The combination of Python operators and list methods gives you endless possibilities to manipulate lists. Send to your favorite code assistant the following prompt:
“Develop code to process a list and compute the natural logarithm of each element”
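One possible answer to the prompt above is sketched below, using the standard math module. It assumes that non-positive numbers should be marked with None, since the natural logarithm is undefined for them and math.log() raises a ValueError; a code assistant may well propose a different way of handling them.

```python
import math

my_list = [3, 1, 4, -2, 9]

logs = []
for item in my_list:
    if item > 0:
        logs.append(math.log(item))   # natural logarithm (base e)
    else:
        logs.append(None)             # assumption: mark values with no logarithm as None
print(logs)
```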
12.2 List Comprehensions
List comprehensions provide a concise way to create and process lists. They are equivalent to for loops that build a new list, yet they provide a more compact and often more readable alternative to traditional for loops. A list comprehension consists of three elements, an expression, an item, and a list, arranged as [expression for item in list]; refer to the following figure.
The list is the collection of elements we want to process, the item is a variable representing each element in the loop, and the expression is the operation we want to apply to item.
The following example processes a list of numbers, one element at a time, stores each element in the variable item and multiplies that item by itself.
my_list = [3, 1, 4, 1, 5, 9, 5, 5]
[item*item for item in my_list]
[9, 1, 16, 1, 25, 81, 25, 25]
The following example processes a list of strings, one element at a time, stores each element in the variable item and applies the str.upper() method to that item.
another_list = ['hi', 'there', 'it', 'is', 'me', 'david', 'lopez']
[item.upper() for item in another_list]
['HI', 'THERE', 'IT', 'IS', 'ME', 'DAVID', 'LOPEZ']
Bear in mind that list comprehensions return a new list containing the result of applying the expression to each element; the original list is left unchanged.
The previous two examples could have been implemented using regular for loops (here each result is printed rather than collected into a new list):
for item in my_list:
    internalVariable = item*item
    print(internalVariable)
9
1
16
1
25
81
25
25
for item in another_list:
    print(item.upper())
HI
THERE
IT
IS
ME
DAVID
LOPEZ
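The loops above print each result. To reproduce what a list comprehension actually returns, the loop has to collect the results into a new list with append(); the sketch below checks that both versions agree.

```python
my_list = [3, 1, 4, 1, 5, 9, 5, 5]

# Loop version: start from an empty list and append each result
squares_loop = []
for item in my_list:
    squares_loop.append(item * item)

# Comprehension version: the same operation in a single expression
squares_comprehension = [item * item for item in my_list]

print(squares_loop == squares_comprehension)  # True
print(my_list)  # the original list is unchanged: [3, 1, 4, 1, 5, 9, 5, 5]
```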
12.3 Conditional List Comprehensions
List comprehensions can be extended with a condition. This is useful, for instance, to skip errors or inconsistencies in the data being processed. The format of a conditional list comprehension is quite similar to its regular counterpart, with a condition added at the end, [expression for item in list if condition]; refer to the following figure.
The following example processes a list of numbers, one element at a time, stores each element in the variable item and applies the math.log() function to each item if item is positive.
import math
my_list = [3, 1, 4, -10, 900, 5, -5, -30, -40, -12, -345]
[math.log(item) for item in my_list if item > 0]
[1.0986122886681098,
0.0,
1.3862943611198906,
6.802394763324311,
1.6094379124341003]
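The following example keeps only the fruits whose name starts with 'b' and converts them to uppercase.
my_fruits = ['orange', 'banana', 'strawberry', 'plum', 'blackberry', 'blueberry']
[item.upper() for item in my_fruits if item.startswith('b')]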
['BANANA', 'BLACKBERRY', 'BLUEBERRY']
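Bear in mind that in conditional list comprehensions, elements that do not “pass” the condition are discarded. This is quite useful to avoid errors and inconsistencies while processing large datasets; for instance, math.log() raises a ValueError for zero or negative numbers, so filtering them out prevents the whole comprehension from failing.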
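The conditional comprehension above is equivalent to a for loop with an if statement inside it. The sketch below shows that loop, and what happens when the condition is left out.

```python
import math

my_list = [3, 1, 4, -10, 900, 5, -5, -30, -40, -12, -345]

# Loop equivalent of [math.log(item) for item in my_list if item > 0]
logs = []
for item in my_list:
    if item > 0:                  # items failing the condition are skipped
        logs.append(math.log(item))
print(len(logs))                  # 5

# Without the condition, the first negative number stops the whole comprehension
try:
    [math.log(item) for item in my_list]
except ValueError as error:
    print('ValueError:', error)
```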
12.4 Dictionaries
Dictionaries are one of Python’s best features; they are the building blocks of many efficient and elegant algorithms.
A dictionary contains a collection of indices, which are called keys, and a collection of values. Each key is associated with a single value. The association of a key and a value is called a key-value pair or sometimes an item.
Key Characteristics of Python Dictionaries:
- Ordered: In Python 3.7 and later, dictionaries maintain the insertion order. This means that items are stored in the order in which they were added.
- Mutable: Dictionaries are mutable, meaning you can change them after they are created. You can add, update, or remove key-value pairs.
- Heterogeneous: The values in a dictionary can be of any data type such as strings, numbers, lists, tuples, other dictionaries, etc. Keys, however, must be hashable (immutable) objects such as strings or numbers.
- Unique keys: Each key in a dictionary is unique. If you try to create a dictionary with duplicate keys, the last value for that key will be retained.
- Dynamic: Dictionaries in Python can grow and shrink as needed. You can add or remove items, and the dictionary will automatically adjust its size to accommodate the changes.
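A small sketch of these characteristics in action; the prices and record dictionaries are illustrative:

```python
# Unique keys: the duplicate 'apple' keeps its first position but takes the last value
prices = {"apple": 1.0, "banana": 0.5, "apple": 1.2}
print(prices)            # {'apple': 1.2, 'banana': 0.5}

# Ordered (Python 3.7+): a new key is placed after the existing ones
prices["cherry"] = 3.0
print(list(prices))      # ['apple', 'banana', 'cherry']

# Dynamic: pairs can be removed as well as added
del prices["banana"]
print(len(prices))       # 2

# Heterogeneous: values of different types, including a nested dictionary
record = {"name": "Alice", "scores": [88, 92], "address": {"city": "London"}}
print(record["address"]["city"])   # London

# Keys must be hashable: a list cannot be used as a key
try:
    {["a", "b"]: 1}
except TypeError as error:
    print('TypeError:', error)
```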
To create a dictionary, you specify each item as a key and a value separated by a colon, and enclose all the items in curly braces, for instance:
my_dict = {
    "apple": "a red fruit",
    "banana": "a yellow fruit",
    "cucumber": "a green vegetable"
}
my_dict
{'apple': 'a red fruit',
'banana': 'a yellow fruit',
'cucumber': 'a green vegetable'}
Dictionaries are often created from existing objects such as lists:
keys = ["apple", "banana", "cherry"]
values = ["red", "yellow", "dark red"]
my_dict = dict(zip(keys, values))
print(my_dict)
{'apple': 'red', 'banana': 'yellow', 'cherry': 'dark red'}
The built-in zip() function is quite useful in data-related contexts. Send to your favorite code assistant the following prompt:
“Explain the zip() method in Python”
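For reference, a minimal sketch of what zip() does: it pairs elements that share the same position and stops as soon as the shortest input runs out.

```python
keys = ["apple", "banana", "cherry"]
values = ["red", "yellow", "dark red"]

# zip() yields tuples that pair elements sharing the same position
pairs = list(zip(keys, values))
print(pairs)   # [('apple', 'red'), ('banana', 'yellow'), ('cherry', 'dark red')]

# With inputs of different lengths, zip() stops at the shortest one
short = list(zip(keys, ["red", "yellow"]))
print(short)   # [('apple', 'red'), ('banana', 'yellow')]

# dict() turns the (key, value) tuples into a dictionary
print(dict(pairs))   # {'apple': 'red', 'banana': 'yellow', 'cherry': 'dark red'}
```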
Dictionary Built-In Methods
Dictionaries in Python come with a variety of built-in methods:
Dictionary Method | Description |
---|---|
dict.get(key, [default]) | Retrieves the value for a given key. Returns default (None if omitted) if the key is not found. |
dict.keys() | Returns a view object displaying a list of all the keys in the dictionary. |
dict.values() | Returns a view object displaying a list of all the values in the dictionary. |
dict.items() | Returns a view object displaying a list of the dictionary’s key-value tuple pairs. |
dict.update([other]) | Updates the dictionary with elements from another dictionary object or from an iterable of key-value pairs. |
dict.pop(key, [default]) | Removes the specified key and returns the corresponding value. Returns default if the key is not found (raises KeyError if no default is given). |
dict.popitem() | Removes and returns a (key, value) pair from the dictionary in LIFO order in Python 3.7+. |
dict.clear() | Removes all items from the dictionary. |
dict.copy() | Returns a shallow copy of the dictionary. |
my_dict = {"apple": "red", "banana": "yellow"}
print(my_dict.get("apple"))
print(my_dict.get("banana"))
print(my_dict.get("kiwi"))
red
yellow
None
print(my_dict.keys())
dict_keys(['apple', 'banana'])
print(my_dict.values())
dict_values(['red', 'yellow'])
print(my_dict.items())
dict_items([('apple', 'red'), ('banana', 'yellow')])
new_entries = {"orange": "orange", "grape": "purple"}
my_dict.update(new_entries)
print(my_dict)
{'apple': 'red', 'banana': 'yellow', 'orange': 'orange', 'grape': 'purple'}
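The remaining methods in the table can be sketched in the same way. To avoid overwriting my_dict, the example uses a separate, illustrative fruit_colors dictionary:

```python
fruit_colors = {"apple": "red", "banana": "yellow", "orange": "orange", "grape": "purple"}

# get() with a default value instead of None for missing keys
print(fruit_colors.get("kiwi", "unknown"))   # unknown

# pop() removes a key and returns its value; a default avoids a KeyError
color = fruit_colors.pop("banana")           # 'yellow'
missing = fruit_colors.pop("kiwi", None)     # None, no error raised

# popitem() removes the most recently inserted pair (LIFO in Python 3.7+)
last_pair = fruit_colors.popitem()           # ('grape', 'purple')

# copy() returns a shallow copy, so clearing the original leaves the copy intact
backup = fruit_colors.copy()
fruit_colors.clear()
print(fruit_colors, backup)                  # {} {'apple': 'red', 'orange': 'orange'}
```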
Dictionary Operations
Dictionaries can also be manipulated using Python native operators.
In addition to the dict.get() method, a common way to access elements in a dictionary is to use the keys directly inside square brackets.
my_dict = {"name": "Alice", "age": 25, "city": "New York"}
print(my_dict)
# Accessing elements
print(my_dict["name"])
print(my_dict["age"])
{'name': 'Alice', 'age': 25, 'city': 'New York'}
Alice
25
In addition to the dict.update() method, a common way to update elements in a dictionary is by direct assignment.
my_dict = {"name": "Alice", "age": 25, "city": "New York"}
# Update an existing element
my_dict["age"] = 26
# Update another existing element ('city' is already a key)
my_dict["city"] = "Beijing"
print(my_dict)
{'name': 'Alice', 'age': 26, 'city': 'Beijing'}
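Direct assignment also adds new elements: assigning to a key that does not exist yet creates it. When reading, however, square brackets are stricter than get(). A short sketch on an illustrative person dictionary, so that my_dict stays as above:

```python
person = {"name": "Alice", "age": 26, "city": "Beijing"}

# Assigning to a key that does not exist yet adds a new key-value pair
person["country"] = "China"
print(person)   # {'name': 'Alice', 'age': 26, 'city': 'Beijing', 'country': 'China'}

# Reading a missing key with square brackets raises a KeyError...
try:
    person["email"]
except KeyError as error:
    print('Missing key:', error)   # Missing key: 'email'

# ...while get() returns None, and the in operator checks for a key first
print(person.get("email"))         # None
print("email" in person)           # False
```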
Indeed, you can iterate over dictionaries using for loops. You can iterate over the keys, over the values, or over both:
for item in my_dict.keys():
    print('I got the following key:', item)
I got the following key: name
I got the following key: age
I got the following key: city
for item in my_dict.values():
    print('I got the following value:', item)
I got the following value: Alice
I got the following value: 26
I got the following value: Beijing
for item in my_dict.items():
    print('I got the following key-value pair:', item)
I got the following key-value pair: ('name', 'Alice')
I got the following key-value pair: ('age', 26)
I got the following key-value pair: ('city', 'Beijing')
for item in my_dict.items():
    print('I got the following key:', item[0])
    print('I got the following value:', item[1])
I got the following key: name
I got the following value: Alice
I got the following key: age
I got the following value: 26
I got the following key: city
I got the following value: Beijing
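Because each element returned by items() is a (key, value) tuple, it can be unpacked directly in the for statement, which is usually more readable than indexing with item[0] and item[1]. This sketch prints the same lines as the previous loop:

```python
my_dict = {"name": "Alice", "age": 26, "city": "Beijing"}

# Unpack each (key, value) tuple into two variables
for key, value in my_dict.items():
    print('I got the following key:', key)
    print('I got the following value:', value)
```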
12.5 Dictionary Comprehensions
Dictionary comprehensions provide a concise way to create and process dictionaries. A dictionary comprehension also consists of three elements, a key-value expression, an item, and a collection, arranged as {key_expression: value_expression for key, value in dictionary.items()}; refer to the following figure.
dictionary.items() is the collection of elements we want to process, and the item, usually unpacked into a key and a value, is a variable representing each element in the loop. key_expression and value_expression are the operations we want to apply to each key and value, respectively.
The following example processes a dictionary, extracting name and age and subsequently applying the upper() method to name.
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30}
{name.upper(): age for name, age in employees.items()}
{'ALICE': 28, 'BOB': 34, 'CHARLIE': 25, 'DIANA': 30}
The following example processes a dictionary, extracting name and age and subsequently applying the upper() method to name and the math.log() function to age.
import math
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30}
{name.upper(): math.log(age) for name, age in employees.items()}
{'ALICE': 3.332204510175204,
'BOB': 3.5263605246161616,
'CHARLIE': 3.2188758248682006,
'DIANA': 3.4011973816621555}
Dictionary comprehensions are often used to create a new dictionary out of several lists. For instance, the following comprehension iterates over each (name, age) pair produced by the zip() function and constructs a new key-value pair in the dictionary employees.
names = ["Alice", "Bob", "Charlie", "Diana"]
ages = [28, 34, 25, 30]
employees = {name: age for name, age in zip(names, ages)}
print(employees)
{'Alice': 28, 'Bob': 34, 'Charlie': 25, 'Diana': 30}
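When no transformation is needed, dict(zip(...)) produces the same dictionary; the comprehension becomes worthwhile once keys or values have to be transformed on the way. A sketch, where lowercase names and age + 1 are purely illustrative transformations:

```python
names = ["Alice", "Bob", "Charlie", "Diana"]
ages = [28, 34, 25, 30]

# Without any transformation, both approaches build the same dictionary
same = dict(zip(names, ages)) == {name: age for name, age in zip(names, ages)}
print(same)        # True

# The comprehension can transform keys and values while building the dictionary
next_year = {name.lower(): age + 1 for name, age in zip(names, ages)}
print(next_year)   # {'alice': 29, 'bob': 35, 'charlie': 26, 'diana': 31}
```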
12.6 Conditional Dictionary Comprehensions
Dictionary comprehensions can also be extended with a condition. The format of a conditional dictionary comprehension is quite similar to its regular counterpart, with an if condition added at the end; refer to the following figure.
The following example processes a dictionary, extracting name and age and subsequently applying the upper() method to name. This time, however, only the elements with an age larger than 30 are processed.
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30, "Bryan": 55}
above_30 = {name.upper(): age for name, age in employees.items() if age > 30}
above_30
{'BOB': 34, 'BRYAN': 55}
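Conditions can also be applied to the keys. The following example keeps only the employees whose name starts with 'A'.
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30, "Bryan": 55}
a_names = {name.upper(): age for name, age in employees.items() if name.startswith('A')}
a_names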
{'ALICE': 28}
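For comparison, the first conditional comprehension of this section written as a regular for loop, plus a sketch of combining conditions on keys and values with and:

```python
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30, "Bryan": 55}

# Loop equivalent of {name.upper(): age for name, age in employees.items() if age > 30}
above_30_loop = {}
for name, age in employees.items():
    if age > 30:
        above_30_loop[name.upper()] = age
print(above_30_loop)   # {'BOB': 34, 'BRYAN': 55}

# Conditions on keys and values can be combined with and / or
b_over_40 = {name: age for name, age in employees.items() if name.startswith('B') and age > 40}
print(b_over_40)       # {'Bryan': 55}
```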
12.7 Comprehensions in Practice
12.7.1 Example (Fixing Non-Valid Data I)
As a data scientist or AI engineer, you will spend a significant amount of your valuable time making sure that datasets are free of errors and inconsistencies (e.g., negative customer purchases).
Say we have the following measurements:
measurements = [23, 34, 22, 33, -2, -4, 10, 20, 21, 36]
measurements
[23, 34, 22, 33, -2, -4, 10, 20, 21, 36]
The following code simply removes negative values from the list.
measurements_positive = [item for item in measurements if item >= 0]
measurements_positive
[23, 34, 22, 33, 10, 20, 21, 36]
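Sometimes we want to explicitly flag erroneous data for subsequent investigation. The following code illustrates how to do it by checking for non-valid data in a list of measurements. We flag negative measurements as ‘non-valid’. Note that here the condition is written as a conditional expression (if ... else ...) before the for clause: every element is kept and only its value changes, whereas a trailing if filters elements out.
measurements_fixed = ['non-valid' if item < 0 else item for item in measurements]
measurements_fixed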
[23, 34, 22, 33, 'non-valid', 'non-valid', 10, 20, 21, 36]
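The flagging comprehension keeps every element, so its loop equivalent appends a value on every iteration. Counting the flags afterwards gives a quick data-quality summary; a sketch reusing the count() method introduced earlier:

```python
measurements = [23, 34, 22, 33, -2, -4, 10, 20, 21, 36]

# Loop equivalent of ['non-valid' if item < 0 else item for item in measurements]
measurements_fixed = []
for item in measurements:
    measurements_fixed.append('non-valid' if item < 0 else item)

# Every element is kept, so the length does not change
print(len(measurements_fixed) == len(measurements))   # True

# Counting the flags gives a quick data-quality summary
print(measurements_fixed.count('non-valid'))          # 2
```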
Comprehensions can be intimidating at the beginning. The good news is that you can use GenAI-based services to explain what the code is trying to accomplish.
For the previous example, try the following prompt:
“Explain the following code:” followed by the code
Something similar could have been a
12.7.2 Example (Fixing Non-Valid Data II)
The previous example provided a simplified scenario in which observations are just a list. In real life, observations are often stored as dictionaries (or JSON objects), for instance:
sensorData = [
    {"timestamp": "01-01-2024", "temperature": 34.5},
    {"timestamp": "04-01-2024", "temperature": 54.3},
    {"timestamp": "05-01-2024", "temperature": -34.8},
    {"timestamp": "06-01-2024", "temperature": 24.6},
    {"timestamp": "05-01-2024", "temperature": -104.8},
]
sensorData
[{'timestamp': '01-01-2024', 'temperature': 34.5},
{'timestamp': '04-01-2024', 'temperature': 54.3},
{'timestamp': '05-01-2024', 'temperature': -34.8},
{'timestamp': '06-01-2024', 'temperature': 24.6},
{'timestamp': '05-01-2024', 'temperature': -104.8}]
The following code removes negative observations from the dataset.
[item for item in sensorData if item['temperature'] >= 0]
[{'timestamp': '01-01-2024', 'temperature': 34.5},
{'timestamp': '04-01-2024', 'temperature': 54.3},
{'timestamp': '06-01-2024', 'temperature': 24.6}]
We might prefer to have non-valid data flagged rather than removed. In this case, negative temperatures are replaced with None:
new_sensorData = [
    {"timestamp": item["timestamp"],
     "temperature": None if item["temperature"] < 0 else item["temperature"]}
    for item in sensorData
]
new_sensorData
[{'timestamp': '01-01-2024', 'temperature': 34.5},
{'timestamp': '04-01-2024', 'temperature': 54.3},
{'timestamp': '05-01-2024', 'temperature': None},
{'timestamp': '06-01-2024', 'temperature': 24.6},
{'timestamp': '05-01-2024', 'temperature': None}]
Pandas-based approach
You might have noticed that things can get quite messy when trying to process real data using comprehensions. For illustration purposes, let’s develop pandas-based code to fix the previous dataset sensorData.
import pandas as pd
sensorDataFrame = pd.DataFrame(sensorData)
sensorDataFrame
| | timestamp | temperature |
---|---|---|
0 | 01-01-2024 | 34.5 |
1 | 04-01-2024 | 54.3 |
2 | 05-01-2024 | -34.8 |
3 | 06-01-2024 | 24.6 |
4 | 05-01-2024 | -104.8 |
sensorDataFrame[sensorDataFrame['temperature'] >= 0]
| | timestamp | temperature |
---|---|---|
0 | 01-01-2024 | 34.5 |
1 | 04-01-2024 | 54.3 |
3 | 06-01-2024 | 24.6 |
sensorDataFrame['temperature'].apply(lambda x: 'non-valid' if x < 0 else x)
0 34.5
1 54.3
2 non-valid
3 24.6
4 non-valid
Name: temperature, dtype: object
sensorDataFrame['temperaturefixed'] = sensorDataFrame['temperature'].apply(lambda x: 'non-valid' if x < 0 else x)
sensorDataFrame
| | timestamp | temperature | temperaturefixed |
---|---|---|---|
0 | 01-01-2024 | 34.5 | 34.5 |
1 | 04-01-2024 | 54.3 | 54.3 |
2 | 05-01-2024 | -34.8 | non-valid |
3 | 06-01-2024 | 24.6 | 24.6 |
4 | 05-01-2024 | -104.8 | non-valid |
As a matter of personal preference, I tend to rely on Pandas for data transformation operations rather than on native Python (e.g., comprehensions). In my experience, Pandas methods complemented with lambda functions cover the vast majority of situations.
12.7.3 Example (Comprehensions over lists of dictionaries)
It is possible to perform fairly complex data transformations by combining comprehensions over lists and dictionaries.
The following code processes a list of students, computing the average of each student’s grades.
students = [
    {"name": "Alice",
     "age": 20,
     "grades": [88, 92, 75, 89]},
    {"name": "Bob",
     "age": 21,
     "grades": [79, 85, 90, 91]},
    {"name": "Charlie",
     "age": 22,
     "grades": [95, 100, 92, 93]},
    {"name": "David",
     "age": 23,
     "grades": [84, 87, 88, 90]}
]
students
[{'name': 'Alice', 'age': 20, 'grades': [88, 92, 75, 89]},
{'name': 'Bob', 'age': 21, 'grades': [79, 85, 90, 91]},
{'name': 'Charlie', 'age': 22, 'grades': [95, 100, 92, 93]},
{'name': 'David', 'age': 23, 'grades': [84, 87, 88, 90]}]
from statistics import mean
# Process the list of students and compute the mean of grades
processed_students = [
    {"name": student["name"],
     "age": student["age"],
     "average_grade": mean(student["grades"])}
    for student in students
]
processed_students
[{'name': 'Alice', 'age': 20, 'average_grade': 86},
{'name': 'Bob', 'age': 21, 'average_grade': 86.25},
{'name': 'Charlie', 'age': 22, 'average_grade': 95},
{'name': 'David', 'age': 23, 'average_grade': 87.25}]
Comprehensions can be intimidating at the beginning. The good news is that you can use GenAI-based services to explain what the code is trying to accomplish.
For the previous example, try the following prompt:
“Explain the following code:” followed by the code
Pandas-based approach
The previous use case could have been implemented using Pandas capabilities.
import pandas as pd
studentsDataFrame = pd.DataFrame(students)
studentsDataFrame
| | name | age | grades |
---|---|---|---|
0 | Alice | 20 | [88, 92, 75, 89] |
1 | Bob | 21 | [79, 85, 90, 91] |
2 | Charlie | 22 | [95, 100, 92, 93] |
3 | David | 23 | [84, 87, 88, 90] |
import numpy as np
meanOfGrades = studentsDataFrame['grades'].apply(lambda x: np.mean(x))
studentsDataFrame['gradesmean'] = meanOfGrades
studentsDataFrame
| | name | age | grades | gradesmean |
---|---|---|---|---|
0 | Alice | 20 | [88, 92, 75, 89] | 86.00 |
1 | Bob | 21 | [79, 85, 90, 91] | 86.25 |
2 | Charlie | 22 | [95, 100, 92, 93] | 95.00 |
3 | David | 23 | [84, 87, 88, 90] | 87.25 |
12.8 Conclusion
This chapter builds on foundational Python programming concepts, introducing advanced data structures such as lists and dictionaries and their capabilities in data processing. It explains the key characteristics of lists, including their ordered, mutable, and heterogeneous nature, and provides an overview of list operations and built-in methods like append(), sort(), and pop(). Examples demonstrate accessing, modifying, and combining lists, as well as iterating through them using loops. A key highlight is the introduction of list comprehensions, a concise way to create and process lists that makes code more compact and readable. The chapter explores both standard and conditional list comprehensions, illustrating how they can handle large datasets or complex operations, such as filtering out invalid data or applying transformations to specific elements.
The chapter also introduces dictionaries, emphasizing their use as key-value pair data structures ideal for efficient data manipulation. It showcases built-in methods like get(), update(), and items() for accessing and modifying dictionary content. Dictionary comprehensions are presented as a powerful way to construct dictionaries from lists or other dictionaries, with examples highlighting conditional and multi-step transformations. The chapter concludes by contrasting native Python operations with Pandas-based approaches, emphasizing the practicality of using Pandas for complex data transformations. By blending comprehensions with Pandas capabilities, readers can handle real-world datasets effectively, transitioning seamlessly between basic Python and data analysis workflows.
12.9 Further Readings
For those interested in additional examples and references on comprehensions, feel free to check the following:
Tuples Tutorial
Dictionaries Tutorial
Advanced material: Tuples, List, Dictionaries