9 Introduction to Functions
Functions are a central concept in programming: they allow you to reuse your own code, or other people's, and to split a program into modules that are easier to debug and maintain.
Thus far you have been using functions all the time: to load datasets, to filter data, to print information or simply to perform arithmetic operations. As a matter of fact, libraries such as Pandas, NumPy or Seaborn are largely collections of functions and methods (functions attached to objects) defined by the community for you to benefit from.
From a practical standpoint, functions (and methods, for that matter) are blocks of code that take some input (e.g. an external dataset in CSV format) and produce some output (e.g. a new Pandas DataFrame); refer to the following figure.
In this chapter you will learn how to extend existing functionality with your own functions. This will allow you to implement new features and customize existing code to tackle complex requirements. You will practice defining your own regular functions in Python and applying them in data science or artificial intelligence contexts. You will also learn how to write lambda functions and apply them to process Pandas DataFrames.
Happy coding!
9.1 Regular Functions
Formally speaking, a function definition in Python specifies: (1) the name of the function, (2) the arguments it takes, (3) the sequence of statements that run when the function is called and (4) the value (or object) returned to the main program; refer to the following figure.
Example
The following example defines a basic function, called print_lyrics(), that simply prints a set of sentences.
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")
Calling the function print_lyrics() prints:
print_lyrics()
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
Proper code indentation is essential in Python. Note how the statements that belong to a function are indented under the def line: the indentation is what tells Python where the function body starts and ends.
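As a minimal sketch of the four elements of a function definition and of the role of indentation, consider the function below. The name greet() and its text are hypothetical, chosen only for illustration.

```python
# (1) name: greet, (2) argument: name
def greet(name):
    # (3) indented statements: the body that runs on each call
    greeting = "Hello, " + name
    # (4) value returned to the main program
    return greeting

# Not indented: this line is outside the function body
message = greet("Ada")
print(message)
```

Removing the indentation of the two body lines would make Python reject the definition, since a def line must be followed by an indented block.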
Example
The following code defines a function called print_name(), this time taking one argument, NameArgument, and printing some text.
def print_name(NameArgument):
    print('Hello')
    print('Your name is: '+NameArgument)
Calling the function print_name() with the argument ‘David’ prints:
print_name('David')
Hello
Your name is: David
Example
For illustration purposes let’s assume we need to define a function named fahr_to_celsius() that: (1) takes an argument temp, (2) performs a mathematical operation, (3) stores the result in a new variable internalVariable and (4) returns the result to the main program:
def fahr_to_celsius(temp):
    internalVariable=(temp - 32) * (5/9)
    return(internalVariable)
When invoked, this function converts temperature values from Fahrenheit to Celsius:
fahr_to_celsius(120)
48.88888888888889
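Once defined, a function can be reused as often as needed. The sketch below repeats the definition so that it runs on its own and applies it to several Fahrenheit readings; the list readings_f and the rounding to one decimal are assumptions made for illustration.

```python
def fahr_to_celsius(temp):
    # Same formula as in the example, written in one line
    return (temp - 32) * (5 / 9)

# Hypothetical readings in degrees Fahrenheit
readings_f = [32, 120, 212]

# Reuse the function for every reading, rounding for display
readings_c = [round(fahr_to_celsius(t), 1) for t in readings_f]
print(readings_c)
```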
Example
Functions can be defined to take as many arguments as needed. The following code defines a function that takes two arguments and performs a product operation.
def multiplication(argument1,argument2):
    internalVariable=argument1*argument2
    return(internalVariable)
multiplication(10,20)
200
Functions can also take objects (e.g. lists, DataFrames) as arguments. The following code defines a multiplication() function that takes a list of three numbers.
def multiplication(values):
    # The parameter is named values rather than list to avoid shadowing Python's built-in list type
    internalVariable=values[0]*values[1]*values[2]
    return(internalVariable)
numbers=[2,3,4]
multiplication(numbers)
24
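The list version of multiplication() reads exactly three elements. A possible generalization, sketched here under the hypothetical name multiply_all(), loops over the list so that it works for any length.

```python
def multiply_all(values):
    # Start from 1, the neutral element of multiplication
    result = 1
    for value in values:
        result = result * value
    return result

print(multiply_all([2, 3, 4]))
print(multiply_all([2, 3, 4, 5]))
```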
9.2 Lambda Functions
A lambda function behaves like a regular function, except that it has no name and its body is a single expression, so it is usually written in one line of code. Lambda functions might look intimidating at first sight, yet they are very handy once you get to know them well. In my experience they are extremely useful for implementing complex processing of Pandas DataFrames, such as conditional extraction, text manipulation and many others.
Lambda functions are typically used in places where you need a small function for a short period of time. Common scenarios include:
- Function Arguments: When a function requires another function as an argument.
- Inline Operations: When performing simple operations inline without having to define a full function.
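Both scenarios can be sketched with Python's built-in sorted() and map(). The employees list is made up for illustration.

```python
# Hypothetical (name, age) pairs
employees = [("Bob", 30), ("Charlie", 35), ("Alice", 25)]

# Function argument: sorted() accepts a function through its key parameter
by_age = sorted(employees, key=lambda person: person[1])
print(by_age)

# Inline operation: double every number without defining a named function
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))
print(doubled)
```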
The following code defines a lambda function that takes an argument and multiplies it by two. The lambda itself is anonymous; assigning it to the variable myFirstLambdaFunction lets us call it by that name.
myFirstLambdaFunction=lambda x: 2*x
myFirstLambdaFunction(3)
6
The following figure outlines the building blocks of a lambda function. Like any other Python function, it takes some arguments, applies an expression and produces a result.
Example
The following example defines a lambda function that takes an argument and returns it in uppercase format.
myStringProcessor=lambda x:x.upper()
myStringProcessor('Hi there, how are things on your side ?')
'HI THERE, HOW ARE THINGS ON YOUR SIDE ?'
Example
The following code defines a lambda function that takes an argument and returns its natural logarithm.
import numpy as np
myLog=lambda x:np.log(x)
myLog(600)
6.396929655216146
Example
The following code defines a lambda function that takes two arguments and returns the first argument raised to the power of the second.
myPower=lambda x,y:x**y
myPower(2,3)
8
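Any of these lambda functions can also be written as a regular function. As a sketch, the hypothetical my_power_regular() below is equivalent to myPower: the lambda form is shorter, while the def form leaves room for comments and several statements.

```python
# Lambda form, as in the example
myPower = lambda x, y: x ** y

# Equivalent regular function (hypothetical name)
def my_power_regular(x, y):
    # Same expression as the lambda, returned explicitly
    return x ** y

print(myPower(2, 3), my_power_regular(2, 3))
```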
9.3 Applying regular functions to Pandas DataFrames
Sometimes the methods provided by Pandas are not enough to perform complex operations or transformations on a DataFrame. The good news is that you can always define your own functions and apply them with the apply() method: DataFrame.apply() for a whole DataFrame, or Series.apply() for a single column.
Example
The following example illustrates how to compute the natural logarithm of a DataFrame column. To do so we define a regular function called my_natural_log() and apply it to the column Salary of the DataFrame df.
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 120000],
    'Bonus' : [15000, 10000, 80000]
}
df = pd.DataFrame(data)
df
  | Name | Age | Salary | Bonus |
---|---|---|---|---|
0 | Alice | 25 | 70000 | 15000 |
1 | Bob | 30 | 80000 | 10000 |
2 | Charlie | 35 | 120000 | 80000 |
def my_natural_log(number):
    internalvariable=np.log(number)
    return(internalvariable)
df['Salary'].apply(my_natural_log)
0 11.156251
1 11.289782
2 11.695247
Name: Salary, dtype: float64
Indeed, we could have applied the np.log() function directly to the column ‘Salary’ of the DataFrame df:
df['Salary'].apply(np.log)
0 11.156251
1 11.289782
2 11.695247
Name: Salary, dtype: float64
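Because np.log() is a vectorized NumPy function, it can also be called on the whole column without apply(). The sketch below rebuilds a reduced copy of the sample data so that it runs on its own and checks that both routes give the same values.

```python
import numpy as np
import pandas as pd

# Reduced copy of the sample data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Salary': [70000, 80000, 120000]})

# Element-wise application through apply()
with_apply = df['Salary'].apply(np.log)

# Same computation, passing the whole column to NumPy at once
vectorized = np.log(df['Salary'])

print(vectorized)
print(np.allclose(with_apply, vectorized))
```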
Example
Building on the previous example, we can define our own function and apply it to process two columns at once:
def totalsalaryfunction(row):
    internalVariable=row['Salary']+row['Bonus']
    return(internalVariable)
which, applied row by row (axis=1) to the DataFrame df, gives:
df[['Salary','Bonus']].apply(totalsalaryfunction,axis=1)
0 85000
1 90000
2 200000
dtype: int64
Try to avoid using ad-hoc functions to process Pandas DataFrames when an existing Pandas method does the job. Built-in methods are optimized to perform well on large datasets, whereas apply() typically calls your function once per row or element.
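For instance, the total salary computed with totalsalaryfunction() can be obtained with plain column arithmetic, which Pandas evaluates on whole columns at once. A minimal sketch on a reduced copy of the sample data:

```python
import pandas as pd

# Reduced copy of the sample data
df = pd.DataFrame({'Salary': [70000, 80000, 120000],
                   'Bonus': [15000, 10000, 80000]})

# Row-by-row version with an ad-hoc function
total_apply = df.apply(lambda row: row['Salary'] + row['Bonus'], axis=1)

# Built-in alternative: add the two columns directly
total_builtin = df['Salary'] + df['Bonus']

print(total_builtin.tolist())
```

Both versions return the same values; the second is shorter and avoids calling a Python function for every row.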
Example
We use the following function to build a sentence with the name and age of each person in the DataFrame df:
def nameagefunction(row):
    internalname=row['Name']
    internalage=row['Age']
    internalVariable=f" Employee {internalname} is {internalage} years old"
    return(internalVariable)
df[['Name','Age']].apply(nameagefunction,axis=1)
0 Employee Alice is 25 years old
1 Employee Bob is 30 years old
2 Employee Charlie is 35 years old
dtype: object
Try asking Gemini the following prompt: “create a function to take Name and Age from the DataFrame df and have them printed”.
9.4 Applying lambda functions to Pandas DataFrames
Combining the method DataFrame.apply() with lambda functions is, in my experience, extremely helpful for performing complex operations in Pandas.
The application of lambda functions to a Pandas DataFrame can be intimidating at first. The following example will, hopefully, make things easier to follow:
Example (row-by-row processing)
Given the DataFrame df:
df.head()
  | Name | Age | Salary | Bonus |
---|---|---|---|---|
0 | Alice | 25 | 70000 | 15000 |
1 | Bob | 30 | 80000 | 10000 |
2 | Charlie | 35 | 120000 | 80000 |
We apply a lambda function to process the DataFrame df on a row-by-row basis (we specify axis=1). In this case the lambda simply returns each row unchanged.
df.apply(lambda row:row, axis=1)
  | Name | Age | Salary | Bonus |
---|---|---|---|---|
0 | Alice | 25 | 70000 | 15000 |
1 | Bob | 30 | 80000 | 10000 |
2 | Charlie | 35 | 120000 | 80000 |
The following code processes the DataFrame df, reading it on a row-by-row basis. This time, however, we extract only the first element of each row.
df.apply(lambda row:row[0],axis=1)
0 Alice
1 Bob
2 Charlie
dtype: object
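A word of caution: in recent Pandas 2.x releases, using an integer such as row[0] to pick an element by position from a row whose labels are text triggers a FutureWarning, which is hidden here by warnings.filterwarnings("ignore"). The sketch below shows two alternatives that avoid it, explicit positional access with .iloc and access by column name, on a reduced copy of the sample data.

```python
import pandas as pd

# Reduced copy of the sample data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Explicit positional access: first element of each row
first_by_position = df.apply(lambda row: row.iloc[0], axis=1)

# Access by column label: same result, independent of column order
first_by_label = df.apply(lambda row: row['Name'], axis=1)

print(first_by_position.tolist())
```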
The following code processes the DataFrame df, reading it on a row-by-row basis. This time, however, we extract the second and third elements of each row.
df.apply(lambda row:[row[1],row[2]],axis=1)
0 [25, 70000]
1 [30, 80000]
2 [35, 120000]
dtype: object
Example (row-by-row processing)
The following code processes the DataFrame df, reading it on a row-by-row basis. This time we apply the method upper() to the first element of each row.
df.apply(lambda row:row[0].upper(),axis=1)
0 ALICE
1 BOB
2 CHARLIE
dtype: object
Example (row-by-row processing)
The following code processes the DataFrame df, reading it on a row-by-row basis. This time we apply the function np.log() to the third element of each row (remember that Python counts positions from 0 to n-1, so the third element is row[2]).
df.apply(lambda row:np.log(row[2]),axis=1)
0 11.156251
1 11.289782
2 11.695247
dtype: float64
Example (row-by-row processing)
The following code combines the name and age of each row into a sentence and stores it in a new column named ‘Name_Age’.
# Combine name and age into a new column
df['Name_Age'] = df.apply(lambda row: row['Name'] +' is '+ str(row['Age'])+ ' years old', axis=1)
df
  | Name | Age | Salary | Bonus | Name_Age |
---|---|---|---|---|---|
0 | Alice | 25 | 70000 | 15000 | Alice is 25 years old |
1 | Bob | 30 | 80000 | 10000 | Bob is 30 years old |
2 | Charlie | 35 | 120000 | 80000 | Charlie is 35 years old |
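The same pattern supports conditional logic, one of the uses of lambda functions mentioned in Section 9.2. In the sketch below the 75000 threshold, the labels and the column name 'Salary_Band' are arbitrary choices made for illustration.

```python
import pandas as pd

# Reduced copy of the sample data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Salary': [70000, 80000, 120000]})

# Label each row depending on a condition evaluated on its Salary
df['Salary_Band'] = df.apply(
    lambda row: 'High' if row['Salary'] > 75000 else 'Standard', axis=1)

print(df)
```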
Example (row-by-row processing)
Indeed, we can perform complex manipulations on Pandas DataFrames by combining several regular functions with lambda ones.
The following code:
- Extracts, on a row-by-row basis, the ‘Name’ and the ‘Salary’.
- Applies upper() and np.log() to ‘Name’ and ‘Salary’ respectively.
- Combines the results of the previous operations into a list.
- Saves the list in a new column named ‘Name_Salary’.
df['Name_Salary'] = df.apply(lambda row: [row['Name'].upper(),np.log(row['Salary'])], axis=1)
df
  | Name | Age | Salary | Bonus | Name_Age | Name_Salary |
---|---|---|---|---|---|---|
0 | Alice | 25 | 70000 | 15000 | Alice is 25 years old | [ALICE, 11.156250521031495] |
1 | Bob | 30 | 80000 | 10000 | Bob is 30 years old | [BOB, 11.289781913656018] |
2 | Charlie | 35 | 120000 | 80000 | Charlie is 35 years old | [CHARLIE, 11.695247021764184] |
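Storing a list in each cell can make later filtering or arithmetic awkward. An alternative, sketched here with the hypothetical column names 'Name_Upper' and 'Salary_Log', keeps each result in its own column and relies on the Pandas string accessor .str.upper() and on np.log().

```python
import numpy as np
import pandas as pd

# Reduced copy of the sample data
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Salary': [70000, 80000, 120000]})

# One column per result instead of a list per cell
df['Name_Upper'] = df['Name'].str.upper()
df['Salary_Log'] = np.log(df['Salary'])

print(df)
```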
Example (column-by-column processing)
Sometimes you might need to apply a lambda function to a Pandas DataFrame on a column-by-column basis.
Given the same DataFrame df, rebuilt from the original data:
df = pd.DataFrame(data)
df.head()
  | Name | Age | Salary | Bonus |
---|---|---|---|---|
0 | Alice | 25 | 70000 | 15000 |
1 | Bob | 30 | 80000 | 10000 |
2 | Charlie | 35 | 120000 | 80000 |
The following code processes the DataFrame df, reading it on a column-by-column basis. This time, however, we extract only the first element of each column. The lambda argument is named col because, with axis=0, it receives a column rather than a row.
df.apply(lambda col:col[0],axis=0)
Name Alice
Age 25
Salary 70000
Bonus 15000
dtype: object
The following code processes the DataFrame df, reading it on a column-by-column basis. This time, however, we extract the first and second elements of each column.
df.apply(lambda col:[col[0],col[1]],axis=0)
  | Name | Age | Salary | Bonus |
---|---|---|---|---|
0 | Alice | 25 | 70000 | 15000 |
1 | Bob | 30 | 80000 | 10000 |
The most frequent use case is applying a lambda function to a Pandas DataFrame on a row-by-row basis, for which we specify axis=1. If we need to process the DataFrame on a column-by-column basis, we specify axis=0 (the default).
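The difference between the two settings is easiest to see with an aggregation. In this sketch, restricted to the numeric columns of the sample data, axis=0 passes each column to the lambda and axis=1 passes each row.

```python
import pandas as pd

# Numeric columns of the sample data
df = pd.DataFrame({'Salary': [70000, 80000, 120000],
                   'Bonus': [15000, 10000, 80000]})

# axis=0: the lambda receives one column at a time, one result per column
per_column = df.apply(lambda col: col.max(), axis=0)

# axis=1: the lambda receives one row at a time, one result per row
per_row = df.apply(lambda row: row.max(), axis=1)

print(per_column)
print(per_row)
```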
Example (Combining several functions)
Indeed we can perform complex manipulations on Pandas DataFrames by combining several regular functions with lambda ones.
# Combine name and age into a new column
df['Name_Age'] = df.apply(lambda row: row['Name'] +' is '+ str(row['Age'])+ ' years old', axis=1)
df
  | Name | Age | Salary | Bonus | Name_Age |
---|---|---|---|---|---|
0 | Alice | 25 | 70000 | 15000 | Alice is 25 years old |
1 | Bob | 30 | 80000 | 10000 | Bob is 30 years old |
2 | Charlie | 35 | 120000 | 80000 | Charlie is 35 years old |
9.5 Conclusions
This chapter emphasizes the importance of functions in programming, highlighting their role in code reuse, modularization, and simplification of complex tasks. Functions are fundamental in Python, allowing you to package code into reusable blocks that take inputs (arguments), perform specific operations, and return outputs. The chapter demonstrates how functions can handle various data types, including numbers, lists, and even Pandas DataFrames. It introduces the basic structure of regular functions, covering function definition, arguments, return statements, and the importance of proper indentation in Python.
The chapter also introduces lambda functions, which are anonymous, single-line functions ideal for short, simple operations. Lambda functions are particularly powerful when combined with Pandas for data manipulation tasks, such as applying transformations to DataFrame columns or rows. Examples include using lambda functions for arithmetic operations, text formatting, and applying mathematical functions like logarithms. Additionally, the chapter explores how to apply both regular and lambda functions to Pandas DataFrames using the DataFrame.apply() method, explaining how to process data row-by-row or column-by-column.
In the next chapter you will learn how to combine several DataFrames based on common characteristics (e.g. customer ID).
9.6 Further Reading
For those of you in need of additional, more advanced, topics please refer to the following references:
Functions in Python: Official Python Documentation
Functions in Python: Python for Data Analysis
Python Functions Tutorial: Real Python
Pandas apply: Official reference
Lambda functions: How to use them