12  Lists, Dictionaries and Comprehensions

In previous chapters, we covered the fundamentals of defining and processing basic Python structures. We began with simple data types such as integers, floats, and strings, understanding how to define, manipulate, and use them in various contexts.

Next, we delved into the world of data analysis by learning how to define and manipulate pandas DataFrames. We explored their powerful capabilities for handling and analyzing tabular data, making it possible to perform complex data manipulations with ease.

In this chapter, we will shift our focus to another dimension of Python programming. We will explore lists and other advanced data structures such as dictionaries. These structures are fundamental for efficient data processing, providing versatile ways to store and manipulate data. Additionally, we will learn how to use comprehensions to simplify and enhance our code when working with these data structures. This will include list comprehensions and dictionary comprehensions, showcasing how these concise and readable constructs can improve the efficiency and clarity of our code.

By the end of this chapter, you’ll have a robust understanding of these data structures and how to harness the power of comprehensions to manipulate them effectively.

12.1 Lists

Lists are one of the most commonly used and versatile data structures in Python. Lists are ordered, mutable collections of elements. They can store elements of various data types, including integers, floats, strings, and even other lists.

Key Characteristics of Python lists:

  • Ordered: Lists maintain the order of elements as they are inserted. This means that the order in which elements appear is preserved.
  • Mutable: Lists can be changed (mutated) after they are created. You can add, remove, or change elements.
  • Heterogeneous: Lists can hold elements of different data types. For example, a single list can contain integers, strings, and even other lists.
  • Dynamic: Python lists can grow and shrink as needed, meaning they do not have a fixed size.

To create a list, give it a name and use the equals sign (=) to assign it a collection of elements, enclosed in square brackets and separated by commas.

EvenNumbers = [2, 4, 6, 8, 10]
CustomerNames = ['John', 'Gerard', 'Lola', 'Elena', 'Xie']
FloatNumbers = [3.14, 5.6, 7.899, 12.01]
MixedList = ['David', 3.4, 39, 'Regents Park', 'London']
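The characteristics listed above can be checked directly in a few lines. The following sketch (the variable name mixed is illustrative only) shows that order matters when comparing lists, that one list can hold mixed types including a nested list, and that the same list can grow after it is created.

```python
# Ordered: the same elements in a different order give a different list
print([1, 2, 3] == [3, 2, 1])   # False

# Heterogeneous: strings, numbers and even another list in one list
mixed = ['David', 3.4, 39, ['Regents Park', 'London']]
print(len(mixed))               # 4 (the inner list counts as one element)

# Mutable and dynamic: the same list object changes size after creation
mixed.append('UK')
print(len(mixed))               # 5
```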

List Built-In Methods

Lists in Python come with a variety of built-in methods:

List Method              Description
list.append(x)           Adds an element x to the end of the list.
list.extend(iterable)    Adds all elements of an iterable to the end of the list.
list.insert(i, x)        Inserts an element x at position i.
list.remove(x)           Removes the first occurrence of x (raises ValueError if x is not present).
list.pop([i])            Removes and returns the element at index i (the last element if i is omitted).
list.clear()             Removes all elements from the list.
list.count(x)            Returns the number of occurrences of x.
list.sort()              Sorts the list in place, in ascending order by default.
list.reverse()           Reverses the order of the elements in place.

my_list = [3, 1, 4, 1, 5, 9, 5, 5]

# Sorting the list in place (ascending order)
my_list.sort()
print(f'The ordered list is {my_list}')

# Reversing the list in place
my_list.reverse()
print(f'The reversed list is {my_list}')

# Counting occurrences of 1
print(f'I have found {my_list.count(1)} ones in the list')

# Counting occurrences of 5
print(f'I have found {my_list.count(5)} fives in the list')
The ordered list is [1, 1, 3, 4, 5, 5, 5, 9]
The reversed list is [9, 5, 5, 5, 4, 3, 1, 1]
I have found 2 ones in the list
I have found 3 fives in the list
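The example above covers sort(), reverse() and count(). The remaining methods from the table can be exercised the same way. The following sketch uses an illustrative list called fruits, and the comments show the list contents after each call.

```python
fruits = ['apple', 'banana']

fruits.append('cherry')           # ['apple', 'banana', 'cherry']
fruits.extend(['kiwi', 'plum'])   # ['apple', 'banana', 'cherry', 'kiwi', 'plum']
fruits.insert(1, 'mango')         # ['apple', 'mango', 'banana', 'cherry', 'kiwi', 'plum']
fruits.remove('banana')           # ['apple', 'mango', 'cherry', 'kiwi', 'plum']

last = fruits.pop()               # removes and returns the last element, 'plum'
second = fruits.pop(1)            # removes and returns the element at index 1, 'mango'
print(fruits, last, second)       # ['apple', 'cherry', 'kiwi'] plum mango

fruits.clear()
print(fruits)                     # []
```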

List Operations

In addition to the methods above, lists can be manipulated using common Python operators.

You can access elements by their index, which starts at 0 for the first element, or by negative indices, which count from the end of the list starting at -1 for the last element.

numbers = [1, 2, 3, 4, 5]

print(f'The first element in the list is: {numbers[0]}')  
print(f'The third element in the list is: {numbers[2]}')  
print(f'The second element, starting from the end, in the list is: {numbers[-2]}') 
print(f'The first element, starting from the end, in the list is: {numbers[-1]}') 
The first element in the list is: 1
The third element in the list is: 3
The second element, starting from the end, in the list is: 4
The first element, starting from the end, in the list is: 5
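One way to think about negative indices is that numbers[-k] is shorthand for numbers[len(numbers) - k]. The following sketch checks this and shows what happens when an index falls outside the valid range.

```python
numbers = [1, 2, 3, 4, 5]

# A negative index -k refers to the same element as len(numbers) - k
print(numbers[-2] == numbers[len(numbers) - 2])   # True

# Valid indices run from -len(numbers) to len(numbers) - 1; anything else fails
try:
    numbers[5]
except IndexError as error:
    print('IndexError:', error)                    # IndexError: list index out of range
```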

You can modify elements by assigning a new value to a specific index.

numbers[0] = 10
print(numbers)  # Output: [10, 2, 3, 4, 5]
[10, 2, 3, 4, 5]

You can combine (concatenate) two or more lists using the + operator.

list1 = [1, 2, 3]
list2 = [4, 5, 6]

combined_list = list1 + list2
print(combined_list)  # Output: [1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
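Note that + builds a new list and leaves list1 and list2 untouched, whereas the extend() method from the table above modifies a list in place. A minimal comparison:

```python
list1 = [1, 2, 3]
list2 = [4, 5, 6]

combined_list = list1 + list2   # builds a new list
print(list1)                    # [1, 2, 3] (unchanged)

list1.extend(list2)             # modifies list1 in place
print(list1)                    # [1, 2, 3, 4, 5, 6]
```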

Indeed, you can iterate over a list using a for loop.

my_list = [3, 1, 4, 5, 9]
for item in my_list:
    square = item * item
    print(f'The square of {item} is {square}')
The square of 3 is 9
The square of 1 is 1
The square of 4 is 16
The square of 5 is 25
The square of 9 is 81
Using Generative AI for coding purposes

The combination of Python operators and list methods gives you many possibilities to manipulate lists. Send the following prompt to your favorite code assistant:

“Develop code to process a list and compute the natural logarithm of each element”

12.2 List Comprehensions

List comprehensions provide a concise way to create and process lists. They are equivalent to for loops that build a new list, yet they are a more compact and often more readable alternative. A list comprehension consists of three elements: an expression, an item, and a list (or any other iterable); refer to the following figure.

The list is the collection of elements we want to process, the item is a variable representing each element in the loop, and the expression is the operation we want to apply to the item.

Figure (List Comprehensions): [expression for item in list]

The following example processes a list of numbers one element at a time, stores each element in the variable item, and multiplies the item by itself (that is, it squares it).

my_list = [3, 1, 4, 1, 5, 9, 5, 5]
[item*item for item in my_list]
[9, 1, 16, 1, 25, 81, 25, 25]

The following example processes a list of strings one element at a time, stores each element in the variable item, and applies the str.upper() method to it.

another_list = ['hi', 'there', 'it', 'is', 'me', 'david','lopez']
[item.upper() for item in another_list]
['HI', 'THERE', 'IT', 'IS', 'ME', 'DAVID', 'LOPEZ']
Output of list comprehensions

Bear in mind that a list comprehension returns a new list containing the results of applying the expression to each element of the original list; the original list itself is not modified.
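A quick way to confirm this is to store the result in a new variable (shouted is an illustrative name) and print both lists after the comprehension runs:

```python
another_list = ['hi', 'there', 'it', 'is', 'me', 'david', 'lopez']

# The comprehension returns a brand-new list; assign it to keep the result
shouted = [item.upper() for item in another_list]

print(shouted[-1])        # LOPEZ
print(another_list[-1])   # lopez (the original list is unchanged)
```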

The previous two examples could also have been implemented using regular for loops (here each result is printed rather than collected into a new list):

for item in my_list:
    square = item * item
    print(square)
9
1
16
1
25
81
25
25
for item in another_list:
    print(item.upper())
HI
THERE
IT
IS
ME
DAVID
LOPEZ
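The loops above print each result. To obtain the same new list that the comprehension returns, a loop has to start from an empty list and append each result explicitly. The following sketch (variable names are illustrative) checks that both approaches agree:

```python
my_list = [3, 1, 4, 1, 5, 9, 5, 5]

# Loop version: start from an empty list and append each result
squares_loop = []
for item in my_list:
    squares_loop.append(item * item)

# Comprehension version: the same expression, item and list in one line
squares_comp = [item * item for item in my_list]

print(squares_loop)                   # [9, 1, 16, 1, 25, 81, 25, 25]
print(squares_loop == squares_comp)   # True
```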

12.3 Conditional List Comprehensions

List comprehensions can be extended with a condition. This is useful, for instance, to filter out errors or inconsistencies in the data being processed. The format of a conditional list comprehension is similar to its regular counterpart; only a condition (an if clause) is added at the end. Refer to the following figure.

Figure (Conditional List Comprehensions): [expression for item in list if condition]

The following example processes a list of numbers one element at a time, stores each element in the variable item, and applies math.log() (the natural logarithm) to the item only if it is positive.

import math
my_list = [3, 1, 4, -10, 900, 5, -5,-30,-40,-12,-345]
[math.log(item) for item in my_list if item>0]
[1.0986122886681098,
 0.0,
 1.3862943611198906,
 6.802394763324311,
 1.6094379124341003]
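
The next example keeps only the fruits whose names start with 'b' and converts them to uppercase:

my_fruits = ['orange', 'banana', 'strawberry', 'plum', 'blackberry', 'blueberry']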
[item.upper() for item in my_fruits if item.startswith('b')]
['BANANA', 'BLACKBERRY', 'BLUEBERRY']
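Output of conditional list comprehensions

Bear in mind that in a conditional list comprehension, elements that do not “pass” the condition are discarded, so the new list can be shorter than the original one. This is useful for avoiding errors and inconsistencies (such as taking the logarithm of a negative number) while processing large datasets.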
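To see why the condition matters, the following sketch runs the logarithm comprehension with and without the if clause on a shortened version of the earlier list. math.log() raises a ValueError for non-positive inputs, so without the filter no list is produced at all.

```python
import math

my_list = [3, 1, 4, -10, 900, 5]

# Without a condition, math.log(-10) raises ValueError and no list is produced
try:
    logs = [math.log(item) for item in my_list]
except ValueError as error:
    print('ValueError:', error)

# With the condition, non-positive values are skipped and the rest are processed
logs = [math.log(item) for item in my_list if item > 0]
print(len(my_list), len(logs))   # 6 5
```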

12.4 Dictionaries

Dictionaries are one of Python’s best features; they are the building blocks of many efficient and elegant algorithms.

A dictionary contains a collection of indices, which are called keys, and a collection of values. Each key is associated with a single value. The association of a key and a value is called a key-value pair or sometimes an item.

Key Characteristics of Python Dictionaries:

  • Ordered: in Python 3.7 and later, dictionaries maintain insertion order. This means that items are stored in the order in which they were added (in earlier versions, this order was not guaranteed).
  • Mutable: Dictionaries are mutable, meaning you can change them after they are created. You can add, update, or remove key-value pairs.
  • Heterogeneous: The values in a dictionary can be of any data type, such as strings, numbers, lists, tuples, other dictionaries, etc. Keys, in contrast, must be hashable, which in practice means immutable types such as strings, numbers, or tuples of immutable values.
  • Unique keys: Each key in a dictionary is unique. If you try to create a dictionary with duplicate keys, the last value for that key will be retained.
  • Dynamic: Dictionaries in Python can grow and shrink as needed. You can add or remove items, and the dictionary will automatically adjust its size to accommodate the changes.
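The following sketch, using illustrative prices and record dictionaries, checks several of these characteristics: duplicate keys, insertion order, mixed value types, and the requirement that keys be hashable.

```python
# Unique keys: when a key is repeated, the last value is retained
prices = {"apple": 1.0, "banana": 0.5, "apple": 1.2}
print(prices)            # {'apple': 1.2, 'banana': 0.5}

# Insertion order (Python 3.7+) and dynamic size: new keys go at the end
prices["cherry"] = 3.0
print(list(prices))      # ['apple', 'banana', 'cherry']

# Heterogeneous values; a tuple is hashable, so it can serve as a key
record = {"name": "Lola", "scores": [7, 9], ("x", "y"): (1, 2)}

# A list is mutable (unhashable) and therefore cannot be used as a key
try:
    {["x", "y"]: 1}
except TypeError as error:
    print('TypeError:', error)
```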

To create a dictionary, write each item as a key and a value separated by a colon, and enclose all items in curly braces, for instance:

my_dict = {
    "apple": "a red fruit",
    "banana": "a yellow fruit",
    "cucumber": "a green vegetable"
}
my_dict
{'apple': 'a red fruit',
 'banana': 'a yellow fruit',
 'cucumber': 'a green vegetable'}

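Dictionaries are often created from existing objects such as lists. The following example pairs two lists with zip() and converts the pairs into a dictionary with dict():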

keys = ["apple", "banana", "cherry"]
values = ["red", "yellow", "dark red"]
my_dict = dict(zip(keys, values))
print(my_dict)  
{'apple': 'red', 'banana': 'yellow', 'cherry': 'dark red'}
Using Generative AI for coding purposes

The zip() function is quite useful in data-related contexts. Send the following prompt to your favorite code assistant:

“Explain the zip() function in Python”
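In the dict(zip(keys, values)) example above, zip() pairs the two lists element by element and dict() turns each pair into a key-value item. The following sketch makes the intermediate pairs visible and shows one behavior worth knowing: zip() stops at the shortest input.

```python
keys = ["apple", "banana", "cherry"]
values = ["red", "yellow", "dark red"]

# zip() pairs elements by position and yields tuples
pairs = list(zip(keys, values))
print(pairs)
# [('apple', 'red'), ('banana', 'yellow'), ('cherry', 'dark red')]

# zip() stops at the shortest input, so 'cherry' is silently dropped here
short = dict(zip(keys, ["red", "yellow"]))
print(short)   # {'apple': 'red', 'banana': 'yellow'}
```

If silently dropping unmatched elements is not acceptable, Python 3.10 and later accept zip(keys, values, strict=True), which raises a ValueError when the inputs have different lengths.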

Dictionary Built-In Methods

Dictionaries in Python come with a variety of built-in methods:

Dictionary Method          Description
dict.get(key, [default])   Returns the value for key, or default (None if omitted) if key is not found.
dict.keys()                Returns a view object of all the keys in the dictionary.
dict.values()              Returns a view object of all the values in the dictionary.
dict.items()               Returns a view object of the dictionary’s (key, value) tuple pairs.
dict.update([other])       Updates the dictionary with the pairs from another dictionary or from an iterable of key-value pairs.
dict.pop(key, [default])   Removes key and returns its value. Returns default if key is not found (raises KeyError if no default is given).
dict.popitem()             Removes and returns the last inserted (key, value) pair (LIFO order in Python 3.7+).
dict.clear()               Removes all items from the dictionary.
dict.copy()                Returns a shallow copy of the dictionary.

my_dict = {"apple": "red", "banana": "yellow"}
print(my_dict.get("apple")) 
print(my_dict.get("banana")) 
print(my_dict.get("kiwi"))  
red
yellow
None
print(my_dict.keys())  
dict_keys(['apple', 'banana'])
print(my_dict.values())  
dict_values(['red', 'yellow'])
print(my_dict.items())  
dict_items([('apple', 'red'), ('banana', 'yellow')])
new_entries = {"orange": "orange", "grape": "purple"}
my_dict.update(new_entries)
print(my_dict)  
{'apple': 'red', 'banana': 'yellow', 'orange': 'orange', 'grape': 'purple'}
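The remaining methods from the table (pop(), popitem(), copy() and clear()), together with the default argument of get(), can be tried on a small illustrative dictionary:

```python
my_dict = {"apple": "red", "banana": "yellow", "orange": "orange"}

# get() with a default returns the default instead of None for missing keys
print(my_dict.get("kiwi", "unknown"))     # unknown

# pop() removes a key and returns its value; a default avoids a KeyError
print(my_dict.pop("banana"))              # yellow
print(my_dict.pop("kiwi", "not found"))   # not found

# popitem() removes and returns the most recently inserted pair (Python 3.7+)
print(my_dict.popitem())                  # ('orange', 'orange')

# copy() returns a separate dictionary; clear() empties the original
backup = my_dict.copy()
my_dict.clear()
print(my_dict, backup)                    # {} {'apple': 'red'}
```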

Dictionary Operations

Dictionaries can also be manipulated using native Python operators.

In addition to the dict.get() method, a common way to access elements in a dictionary is to use the keys directly, inside square brackets.

my_dict = {"name": "Alice", "age": 25, "city": "New York"}

print(my_dict)
# Accessing elements
print(my_dict["name"])  
print(my_dict["age"])   
{'name': 'Alice', 'age': 25, 'city': 'New York'}
Alice
25
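The two access styles differ when a key is missing: get() returns None (or a default you supply), while square brackets raise a KeyError. A short sketch with an illustrative missing key, email:

```python
my_dict = {"name": "Alice", "age": 25, "city": "New York"}

# get() returns None (or a supplied default) for a missing key
print(my_dict.get("email"))               # None
print(my_dict.get("email", "no email"))   # no email

# Square brackets raise a KeyError for a missing key
try:
    my_dict["email"]
except KeyError as error:
    print('KeyError:', error)             # KeyError: 'email'
```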

In addition to the dict.update() method, a common way to update elements in a dictionary is by direct assignment.

my_dict = {"name": "Alice", "age": 25, "city": "New York"}

# Update an existing element
my_dict["age"] = 26

# Update another existing element (the key 'city' already exists)
my_dict["city"] = "Beijing"

print(my_dict)  
{'name': 'Alice', 'age': 26, 'city': 'Beijing'}
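Assignment adds a new key-value pair only when the key is not already in the dictionary. The following sketch uses a separate, illustrative dictionary called profile, so that the my_dict used in the next examples is unaffected:

```python
profile = {"name": "Alice", "age": 26, "city": "Beijing"}

# "country" is not yet a key, so this assignment adds a new pair at the end
profile["country"] = "China"
print(profile)
# {'name': 'Alice', 'age': 26, 'city': 'Beijing', 'country': 'China'}
```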

Indeed, you can iterate over a dictionary using a for loop. You can iterate over its keys, its values, or both:

for item in my_dict.keys():
    print('I got the following key:', item)
I got the following key: name
I got the following key: age
I got the following key: city
for item in my_dict.values():
    print('I got the following value:', item)
I got the following value: Alice
I got the following value: 26
I got the following value: Beijing
for item in my_dict.items():
    print('I got the following key-value pair:', item)
I got the following key-value pair: ('name', 'Alice')
I got the following key-value pair: ('age', 26)
I got the following key-value pair: ('city', 'Beijing')
for item in my_dict.items():
    print('I got the following key:', item[0])
    print('I got the following value:', item[1])
I got the following key: name
I got the following value: Alice
I got the following key: age
I got the following value: 26
I got the following key: city
I got the following value: Beijing
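Because dict.items() yields (key, value) tuples, the last loop can be written more idiomatically by unpacking each tuple directly in the for statement, which avoids the item[0] and item[1] indexing. The printed output is the same as above.

```python
my_dict = {"name": "Alice", "age": 26, "city": "Beijing"}

# Each item is a (key, value) tuple, unpacked into two loop variables
for key, value in my_dict.items():
    print('I got the following key:', key)
    print('I got the following value:', value)
```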

12.5 Dictionary Comprehensions

Dictionary comprehensions provide a concise way to create and process dictionaries. Like list comprehensions, they are built from an expression, an item, and an iterable; the difference is that the expression is a key_expression: value_expression pair enclosed in curly braces. Refer to the following figure.

Here, dictionary.items() provides the collection of (key, value) pairs we want to process, and the item is a variable representing each pair in the loop (in practice it is usually unpacked into two variables, such as name and age). key_expression and value_expression are the operations we want to apply to each key and value, respectively.

Figure (Dictionary Comprehensions): {key_expression: value_expression for item in dictionary.items()}

The following example processes a dictionary, extracting name and age and subsequently applying the upper() method to name.

employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30}
{name.upper(): age for name,age in employees.items()}
{'ALICE': 28, 'BOB': 34, 'CHARLIE': 25, 'DIANA': 30}
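As with lists, a dictionary comprehension is equivalent to a loop that starts from an empty dictionary and assigns one key-value pair per iteration. The following sketch (upper_loop and upper_comp are illustrative names) checks that both versions of the previous example agree:

```python
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30}

# Loop version: start from an empty dict and assign one pair per iteration
upper_loop = {}
for name, age in employees.items():
    upper_loop[name.upper()] = age

# Comprehension version of the same transformation
upper_comp = {name.upper(): age for name, age in employees.items()}

print(upper_loop == upper_comp)   # True
```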

The following example processes a dictionary, extracting name and age and subsequently applying the upper() method to name and the math.log() function to age.

import math
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30}
{name.upper(): math.log(age) for name,age in employees.items()}
{'ALICE': 3.332204510175204,
 'BOB': 3.5263605246161616,
 'CHARLIE': 3.2188758248682006,
 'DIANA': 3.4011973816621555}

Dictionary comprehensions are often used to create a new dictionary out of several lists. For instance, the following comprehension iterates over each (name, age) pair produced by the zip() function and constructs a new key-value pair in the dictionary employees.

names = ["Alice", "Bob", "Charlie", "Diana"]
ages = [28, 34, 25, 30]

employees = {name: age for name, age in zip(names, ages)}

print(employees)
{'Alice': 28, 'Bob': 34, 'Charlie': 25, 'Diana': 30}
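Because the key is computed by key_expression, a dictionary comprehension can also reorganize an existing dictionary, for example by swapping keys and values. Since keys are unique, any two items that produce the same key collapse into one, with the last value kept. A small sketch with illustrative data:

```python
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30}

# Swap roles: ages become keys and names become values
by_age = {age: name for name, age in employees.items()}
print(by_age)        # {28: 'Alice', 34: 'Bob', 25: 'Charlie', 30: 'Diana'}

# If the key expression yields the same key twice, the last value wins
by_initial = {name[0]: age for name, age in {"Bob": 34, "Bryan": 55}.items()}
print(by_initial)    # {'B': 55}
```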

12.6 Conditional Dictionary Comprehensions

Dictionary comprehensions can also be expanded with conditional operations. The format of a conditional dictionary comprehension is quite similar to its regular counterpart, only a condition is added, refer to the following figure.

Figure (Conditional Dictionary Comprehensions): {key_expression: value_expression for item in dictionary.items() if condition}

The following example processes a dictionary, extracting name and age and subsequently applying the upper() method to name. This time, however, only those elements having age larger than 30 are processed.

employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30, "Bryan":55}

above_30 = {name.upper(): age for name, age in employees.items() if age > 30}
above_30
{'BOB': 34, 'BRYAN': 55}
The next example keeps only the employees whose names start with 'A':

employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30, "Bryan": 55}

a_names = {name.upper(): age for name, age in employees.items() if name.startswith('A')}
a_names
{'ALICE': 28}
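A condition can also combine tests on both the key and the value using and/or. The following sketch (b_above_40 is an illustrative name) keeps only employees whose names start with 'B' and who are older than 40:

```python
employees = {"Alice": 28, "Bob": 34, "Charlie": 25, "Diana": 30, "Bryan": 55}

# Both tests must hold: the name starts with 'B' AND the age is above 40
b_above_40 = {name.upper(): age for name, age in employees.items()
              if name.startswith('B') and age > 40}
print(b_above_40)   # {'BRYAN': 55}
```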

12.7 Comprehensions in Practice

12.7.1 Example (Fixing Non-Valid Data I)

As a data scientist or AI engineer, you will spend a significant amount of your valuable time making sure that datasets are free of errors and inconsistencies (e.g., negative customer purchases).

Say we have the following measurements:

measurements=[23,34,22,33,-2,-4,10,20,21,36]
measurements
[23, 34, 22, 33, -2, -4, 10, 20, 21, 36]

The following code simply removes negative values from the list.

measurements_positive=[item for item in measurements if item>=0]
measurements_positive
[23, 34, 22, 33, 10, 20, 21, 36]

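Sometimes we want to explicitly flag erroneous data for subsequent investigation. The following code illustrates how to do it by checking for non-valid data in a list of measurements: negative measurements are flagged as ‘non-valid’. Note that the if ... else now belongs to the expression and is placed before for, so every element is kept and only its value changes.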

measurements_fixed = ['non-valid' if item < 0 else item for item in measurements]
measurements_fixed
[23, 34, 22, 33, 'non-valid', 'non-valid', 10, 20, 21, 36]
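The two previous comprehensions use if in different positions, and the difference shows up in the length of the result. A short sketch (kept and flagged are illustrative names):

```python
measurements = [23, 34, 22, 33, -2, -4, 10, 20, 21, 36]

# "if" after "for" filters: negative items are dropped
kept = [item for item in measurements if item >= 0]

# "if ... else" before "for" transforms: every item is kept, some are replaced
flagged = ['non-valid' if item < 0 else item for item in measurements]

print(len(measurements), len(kept), len(flagged))   # 10 8 10
```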
Using Generative AI for coding purposes

Comprehensions can be intimidating at the beginning. The good news is that you can use GenAI-based services to explain what the code is trying to accomplish.

For the previous example, try the following prompt:

“Explain the following code:” followed by the code


12.7.2 Example (Fixing Non-Valid Data II)

The previous example provided a simplified scenario in which observations are just a list of numbers. In real life, observations are often stored as dictionaries (or JSON objects), for instance:

sensorData = [
    {"timestamp": "01-01-2024", "temperature": 34.5},
    {"timestamp": "04-01-2024", "temperature": 54.3},
    {"timestamp": "05-01-2024", "temperature": -34.8},
    {"timestamp": "06-01-2024", "temperature": 24.6},
    {"timestamp": "05-01-2024", "temperature": -104.8},
]
sensorData
[{'timestamp': '01-01-2024', 'temperature': 34.5},
 {'timestamp': '04-01-2024', 'temperature': 54.3},
 {'timestamp': '05-01-2024', 'temperature': -34.8},
 {'timestamp': '06-01-2024', 'temperature': 24.6},
 {'timestamp': '05-01-2024', 'temperature': -104.8}]

The following code removes observations with negative temperatures from the dataset:

[item for item in sensorData if item['temperature']>=0]
[{'timestamp': '01-01-2024', 'temperature': 34.5},
 {'timestamp': '04-01-2024', 'temperature': 54.3},
 {'timestamp': '06-01-2024', 'temperature': 24.6}]

We might prefer to flag non-valid data rather than remove it. In this case, negative temperatures are replaced with None:

new_sensorData = [
    {
        "timestamp": item["timestamp"],
        "temperature": None if item["temperature"] < 0 else item["temperature"]
    }
    for item in sensorData
]
new_sensorData
[{'timestamp': '01-01-2024', 'temperature': 34.5},
 {'timestamp': '04-01-2024', 'temperature': 54.3},
 {'timestamp': '05-01-2024', 'temperature': None},
 {'timestamp': '06-01-2024', 'temperature': 24.6},
 {'timestamp': '05-01-2024', 'temperature': None}]
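When each dictionary has many keys, listing every key by hand as above becomes tedious. One alternative, shown here as a sketch on a shortened copy of the data, is dictionary unpacking ({**item, ...}), which copies all existing key-value pairs and then overrides only the temperature key:

```python
sensorData = [
    {"timestamp": "01-01-2024", "temperature": 34.5},
    {"timestamp": "05-01-2024", "temperature": -34.8},
]

# {**item, ...} copies every key of the original dictionary, then the
# "temperature" entry that follows overrides the copied value
new_sensorData = [
    {**item, "temperature": None if item["temperature"] < 0 else item["temperature"]}
    for item in sensorData
]
print(new_sensorData)
# [{'timestamp': '01-01-2024', 'temperature': 34.5}, {'timestamp': '05-01-2024', 'temperature': None}]
```

Each dictionary in new_sensorData is a new object, so the original sensorData is left unchanged.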

Pandas-based approach

You might have noticed that things can get quite messy when trying to process real data using comprehensions. For illustration purposes, let’s develop pandas-based code to fix the previous dataset, sensorData.

import pandas as pd

sensorDataFrame=pd.DataFrame(sensorData)
sensorDataFrame
    timestamp  temperature
0  01-01-2024         34.5
1  04-01-2024         54.3
2  05-01-2024        -34.8
3  06-01-2024         24.6
4  05-01-2024       -104.8
sensorDataFrame[sensorDataFrame['temperature'] >= 0]
    timestamp  temperature
0  01-01-2024         34.5
1  04-01-2024         54.3
3  06-01-2024         24.6
sensorDataFrame['temperature'].apply(lambda x: 'non-valid' if x < 0 else x)
0         34.5
1         54.3
2    non-valid
3         24.6
4    non-valid
Name: temperature, dtype: object
sensorDataFrame['temperaturefixed']=sensorDataFrame['temperature'].apply(lambda x: 'non-valid' if x < 0 else x)
sensorDataFrame
    timestamp  temperature  temperaturefixed
0  01-01-2024         34.5              34.5
1  04-01-2024         54.3              54.3
2  05-01-2024        -34.8         non-valid
3  06-01-2024         24.6              24.6
4  05-01-2024       -104.8         non-valid
Dictionaries versus Pandas Operations

As a matter of personal preference, I tend to rely on pandas for data transformation operations rather than on native Python (e.g., comprehensions). In my experience, pandas methods complemented with lambda functions cover the vast majority of situations.

12.7.3 Example (Comprehensions over lists of dictionaries)

It is possible to perform fairly complex data transformations by applying comprehensions to both lists and dictionaries.

The following code processes a list of students computing the average of each student’s grades.

students = [
    {
        "name": "Alice",
        "age": 20,
        "grades": [88, 92, 75, 89]
    },
    {
        "name": "Bob",
        "age": 21,
        "grades": [79, 85, 90, 91]
    },
    {
        "name": "Charlie",
        "age": 22,
        "grades": [95, 100, 92, 93]
    },
    {
        "name": "David",
        "age": 23,
        "grades": [84, 87, 88, 90]
    }
]

students
[{'name': 'Alice', 'age': 20, 'grades': [88, 92, 75, 89]},
 {'name': 'Bob', 'age': 21, 'grades': [79, 85, 90, 91]},
 {'name': 'Charlie', 'age': 22, 'grades': [95, 100, 92, 93]},
 {'name': 'David', 'age': 23, 'grades': [84, 87, 88, 90]}]
from statistics import mean

# Process the list of students and compute the mean of grades
processed_students = [
    {
        "name": student["name"],
        "age": student["age"],
        "average_grade": mean(student["grades"])
    } for student in students
]

processed_students
[{'name': 'Alice', 'age': 20, 'average_grade': 86},
 {'name': 'Bob', 'age': 21, 'average_grade': 86.25},
 {'name': 'Charlie', 'age': 22, 'average_grade': 95},
 {'name': 'David', 'age': 23, 'average_grade': 87.25}]
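The same list of dictionaries can also feed a dictionary comprehension with a condition. The following sketch (top_students and the threshold of 87 are illustrative choices, and the age field is omitted for brevity) maps each student's name to their average grade, keeping only averages of at least 87:

```python
from statistics import mean

students = [
    {"name": "Alice", "grades": [88, 92, 75, 89]},
    {"name": "Bob", "grades": [79, 85, 90, 91]},
    {"name": "Charlie", "grades": [95, 100, 92, 93]},
    {"name": "David", "grades": [84, 87, 88, 90]},
]

# Dictionary comprehension over a list of dictionaries:
# name -> average grade, keeping only averages of at least 87
top_students = {s["name"]: mean(s["grades"])
                for s in students if mean(s["grades"]) >= 87}
print(top_students)   # {'Charlie': 95, 'David': 87.25}
```

For readability, the sketch calls mean() twice per student; with large datasets you might compute each average once, for example by filtering the processed_students list built above.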
Using Generative AI for coding purposes

Comprehensions can be intimidating at the beginning. The good news is that you can use GenAI-based services to explain what the code is trying to accomplish.

For the previous example, try the following prompt:

“Explain the following code:” followed by the code

Pandas-based approach

The previous use case could have been implemented using Pandas capabilities.

import pandas as pd

studentsDataFrame=pd.DataFrame(students)
studentsDataFrame
      name  age             grades
0    Alice   20   [88, 92, 75, 89]
1      Bob   21   [79, 85, 90, 91]
2  Charlie   22  [95, 100, 92, 93]
3    David   23   [84, 87, 88, 90]
import numpy as np
meanOfGrades=studentsDataFrame['grades'].apply(lambda x:np.mean(x))
studentsDataFrame['gradesmean']=meanOfGrades
studentsDataFrame
      name  age             grades  gradesmean
0    Alice   20   [88, 92, 75, 89]       86.00
1      Bob   21   [79, 85, 90, 91]       86.25
2  Charlie   22  [95, 100, 92, 93]       95.00
3    David   23   [84, 87, 88, 90]       87.25

12.8 Conclusion

This chapter builds on foundational Python programming concepts, introducing advanced data structures such as lists and dictionaries, and their capabilities in data processing. It explains the key characteristics of lists, including their ordered, mutable, and heterogeneous nature, and provides a comprehensive overview of list operations and built-in methods like append(), sort(), and pop(). Examples demonstrate accessing, modifying, and combining lists, as well as iterating through them using loops. A key highlight is the introduction of list comprehensions, a concise way to create and process lists, making code more readable and efficient. The chapter explores both standard and conditional list comprehensions, illustrating how they can handle large datasets or complex operations, such as filtering out invalid data or applying transformations to specific elements.

The chapter also introduces dictionaries, emphasizing their use as key-value pair data structures ideal for efficient data manipulation. It showcases built-in methods like get(), update(), and items() for accessing and modifying dictionary content. Dictionary comprehensions are presented as a powerful way to construct dictionaries from lists or other dictionaries, with examples highlighting conditional and multi-step transformations. The chapter concludes by contrasting native Python operations with Pandas-based approaches, emphasizing the practicality and efficiency of using Pandas for complex data transformations. By blending comprehensions with Pandas capabilities, readers can handle real-world datasets effectively, transitioning seamlessly between basic Python and data analysis workflows.

12.9 Further Readings

For those interested in additional examples and references on comprehensions, feel free to check the following: