9 Introduction to Functions

Functions are a core concept in software programming. They allow you to reuse your own code, or code written by others, and to modularize your programs, which makes them easier to debug and maintain.

Thus far you have been using functions all the time: to load datasets, to filter data, to print information or just to perform arithmetic operations. As a matter of fact, libraries such as Pandas, NumPy or Seaborn are essentially collections of functions (and methods, which are functions attached to objects) defined by the community for you to benefit from.

From a practical standpoint, functions (and methods, for that matter) are blocks of code that take some input (e.g. an external dataset in CSV format) and produce some output (e.g. a new Pandas DataFrame). Refer to the following figure.

In this chapter you will learn how to extend existing functionality with your own functions. This will allow you to implement new features and customize existing code to tackle complex requirements. You will practice defining your own regular functions in Python and have them applied in data science or artificial intelligence contexts. You will also learn how to develop lambda functions and have them applied to process Pandas DataFrames.

Happy coding!

9.1 Regular Functions

Formally speaking, a function definition in Python specifies: (1) the name of the function, (2) the arguments it takes, (3) the sequence of statements that run when the function is called and (4) the value (or object) returned to the main program. Refer to the following figure.
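As a minimal sketch of these four parts, the function below is annotated with the same numbering. The name rectangle_area and its arguments are hypothetical, chosen only for illustration.

```python
# (1) name: rectangle_area   (2) arguments: width, height
def rectangle_area(width, height):
    # (3) statements that run each time the function is called
    area = width * height
    # (4) value returned to the main program
    return area

# Calling the function runs its body and hands back the result
result = rectangle_area(3, 4)
print(result)  # 12
```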

Example

The following example defines a basic function, called print_lyrics(), that simply prints two sentences.

def print_lyrics():
  print("I'm a lumberjack, and I'm okay.")
  print("I sleep all night and I work all day.")

Calling the function print_lyrics() prints:

print_lyrics()

I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.

Tip. Code Indentation

Proper code indentation is very important in Python: the indented lines that follow a def statement form the body of the function. Please note how the code is indented inside the definition of every function in this chapter.
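The short sketch below, with a hypothetical function greet(), shows how indentation decides which lines belong to a function. It uses four spaces per level, the convention recommended by the PEP 8 style guide; any consistent indentation works.

```python
def greet():
    # Indented lines belong to the function body
    message = "Hello from inside the function"
    return message

# This line is not indented, so it is outside the function
# and runs as part of the main program
outside = greet()
print(outside)
```

Removing the indentation from the two lines of the body would raise an IndentationError, because Python would find a def statement with no body.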

Example

The following code defines a function called print_name(). This time the function takes one argument, NameArgument, and prints some text.

def print_name(NameArgument):
  print('Hello')
  print('Your name is: '+NameArgument)

The execution of the function print_name with the argument ‘David’ returns:

print_name('David')

Hello
Your name is: David

Example

For illustration purposes let’s assume we need to define a function named fahr_to_celsius() that: (1) takes an argument temp, (2) performs a mathematical operation, (3) stores the result in a new variable internalVariable and (4) returns the result to the main program:

def fahr_to_celsius(temp):
    internalVariable=(temp - 32) * (5/9)
    return(internalVariable)

When invoked, this function converts temperature values from Fahrenheit to Celsius:

fahr_to_celsius(120)

48.88888888888889

Example

Functions can be defined to take as many arguments as needed. The following code defines a function that takes two arguments and performs a product operation.

def multiplication(argument1,argument2):
   internalVariable=argument1*argument2
   return(internalVariable)

multiplication(10,20)

Functions can take objects (e.g. lists, DataFrames) as arguments. The following code defines a multiplication() function that takes a list of numbers.

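def multiplication(numbers):
   # The parameter is called numbers rather than list to avoid hiding Python's built-in list type
   internalVariable=numbers[0]*numbers[1]*numbers[2]
   return(internalVariable)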

numbers=[2,3,4]
multiplication(numbers)
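The function above only works for lists with at least three elements and ignores any element after the third. As a sketch of one possible generalization, the hypothetical function multiply_all() below loops over the list and therefore accepts any length.

```python
import math

def multiply_all(numbers):
    # Start from 1, the neutral element for multiplication
    result = 1
    for value in numbers:
        result = result * value
    return result

print(multiply_all([2, 3, 4]))     # 24
print(multiply_all([2, 3, 4, 5]))  # 120

# The standard library offers the same operation (Python 3.8 or later)
print(math.prod([2, 3, 4]))        # 24
```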

9.2 Lambda Functions

A lambda function behaves like a regular function except that it is defined without a name and its body is limited to a single expression. Lambda functions might look intimidating at first sight, yet they are very useful once you get to know them. In my experience they are particularly handy for complex processing of Pandas DataFrames, such as conditional extraction and text manipulation.

Lambda functions are typically used in places where you need a small function for a short period of time. Common scenarios include:

Function Arguments: When a function requires another function as an argument.
Inline Operations: When performing simple operations inline without having to define a full function.
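The two scenarios above can be sketched with Python's built-in sorted() and map() functions. The employees list is made-up sample data used only for illustration.

```python
# Function argument: sorted() accepts a function through its key parameter
employees = [("Alice", 25), ("Charlie", 35), ("Bob", 30)]
by_age = sorted(employees, key=lambda person: person[1])
print(by_age)   # [('Alice', 25), ('Bob', 30), ('Charlie', 35)]

# Inline operation: map() applies a small function to every element
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))
print(doubled)  # [2, 4, 6]
```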

The following code defines a lambda function that takes an argument and multiplies it by two. The lambda itself is anonymous; we assign it to the variable myFirstLambdaFunction so that we can call it later.

myFirstLambdaFunction=lambda x: 2*x
myFirstLambdaFunction(3)

The following figure outlines the building elements of a lambda function. Like any other Python function, it takes some arguments, applies an expression and produces a result.
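To make the correspondence concrete, the sketch below places a lambda function next to an equivalent regular function. The names double_lambda and double_regular are hypothetical.

```python
# Lambda form: arguments before the colon, a single expression after it
double_lambda = lambda x: 2 * x

# Equivalent regular function: the expression becomes the return value
def double_regular(x):
    return 2 * x

print(double_lambda(3))   # 6
print(double_regular(3))  # 6
```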

Example

The following example defines a lambda function that takes an argument and returns it in uppercase format.

myStringProcessor=lambda x:x.upper()
myStringProcessor('Hi there, how are things on your side ?')

'HI THERE, HOW ARE THINGS ON YOUR SIDE ?'

Example

The following code defines a lambda function that takes an argument and returns its natural logarithm.

import numpy as np
myLog=lambda x:np.log(x)
myLog(600)

6.396929655216146

Example

The following code defines a lambda function that takes two arguments and returns the first argument raised to the power of the second.

myPower=lambda x,y:x**y
myPower(2,3)

9.3 Applying regular functions to Pandas DataFrames

Sometimes the methods provided by Pandas are not enough to perform complex operations or transformations on a DataFrame. The good news is that you can always define your own functions and apply them to a Pandas DataFrame using the method DataFrame.apply() (or Series.apply() for a single column).

Example

The following example illustrates how to compute the natural logarithm of a DataFrame column. To do so we define a regular function called my_natural_log() and apply it to the column Salary of the DataFrame df.

import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")


# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 120000],
    'Bonus' : [15000, 10000, 80000]

}

df = pd.DataFrame(data)
df

	Name	Age	Salary	Bonus
0	Alice	25	70000	15000
1	Bob	30	80000	10000
2	Charlie	35	120000	80000

def my_natural_log(number):
    internalvariable=np.log(number)
    return(internalvariable)



df['Salary'].apply(my_natural_log)

0    11.156251
1    11.289782
2    11.695247
Name: Salary, dtype: float64

Indeed, we could have applied the NumPy function np.log() directly to the column ‘Salary’ of the DataFrame df:

df['Salary'].apply(np.log)

0    11.156251
1    11.289782
2    11.695247
Name: Salary, dtype: float64

Example

Building on the previous example, we can define our own function and apply it to process two columns at once:

def totalsalaryfunction(row):
   internalVariable=row['Salary']+row['Bonus']
   return(internalVariable)

which applied to the DataFrame df gives:

df[['Salary','Bonus']].apply(totalsalaryfunction,axis=1)

0     85000
1     90000
2    200000
dtype: int64

Applying functions to Pandas DataFrames

Try to avoid using ad-hoc functions to process Pandas DataFrames when a built-in alternative exists. The methods provided by the Pandas library operate on whole columns at once and are optimized to perform well on large datasets.
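As a sketch of this advice, the code below recomputes the two earlier results (total salary and natural logarithm) with built-in column operations and checks that they match the apply() versions. It rebuilds the chapter's sample DataFrame so that it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 120000],
    'Bonus': [15000, 10000, 80000],
})

# Column arithmetic works on whole columns at once, no apply() needed
total_vectorized = df['Salary'] + df['Bonus']

# Same computation with a custom function applied row by row
total_applied = df[['Salary', 'Bonus']].apply(
    lambda row: row['Salary'] + row['Bonus'], axis=1
)
print(total_vectorized.tolist())  # [85000, 90000, 200000]
print(total_applied.tolist())     # [85000, 90000, 200000]

# NumPy functions also accept a whole column directly
log_vectorized = np.log(df['Salary'])
log_applied = df['Salary'].apply(np.log)
print(np.allclose(log_vectorized, log_applied))
```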

Example

We use the following function to build a sentence with the name and age of each person in the DataFrame df:

def nameagefunction(row):
   internalname=row['Name']
   internalage=row['Age']
   internalVariable=f" Employee {internalname} is {internalage} years old"
   return(internalVariable)

df[['Name','Age']].apply(nameagefunction,axis=1)

0       Employee Alice is 25 years old
1         Employee Bob is 30 years old
2     Employee Charlie is 35 years old
dtype: object

Using Generative AI for coding purposes

Try to ask Gemini the following prompt: “create a function to take Name and Age from the DataFrame df and have them printed”

9.4 Applying lambda functions to Pandas DataFrames

The method DataFrame.apply() combined with lambda functions is, in my experience, extremely helpful for performing complex operations in Pandas.

The application of lambda functions to a Pandas DataFrame can be intimidating at first. The following example will, hopefully, make things easier to follow:

Example (row-by-row processing)

Given the dataframe df:

df.head()

	Name	Age	Salary	Bonus
0	Alice	25	70000	15000
1	Bob	30	80000	10000
2	Charlie	35	120000	80000

We apply a lambda function to process the DataFrame df on a row-by-row basis (we specify axis=1). In this case the function simply returns each row unchanged.

df.apply(lambda row:row, axis=1)

	Name	Age	Salary	Bonus
0	Alice	25	70000	15000
1	Bob	30	80000	10000
2	Charlie	35	120000	80000

The following code processes the DataFrame df reading it on a row by row basis. This time, however, we extract only the first element of each row.

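# .iloc selects by position; a bare row[0] on a label-indexed row is deprecated in recent Pandas versions
df.apply(lambda row:row.iloc[0],axis=1)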

0      Alice
1        Bob
2    Charlie
dtype: object

The following code processes the DataFrame df reading it on a row-by-row basis. This time, however, we extract the second and third elements of each row.

# .iloc selects by position (positions 1 and 2 are the second and third elements)
df.apply(lambda row:[row.iloc[1],row.iloc[2]],axis=1)

0     [25, 70000]
1     [30, 80000]
2    [35, 120000]
dtype: object

Example (row-by-row processing)

The following code processes the DataFrame df reading it on a row by row basis. This time we apply the function upper() to the first element of each row.

# .iloc[0] is the first element of the row, selected by position
df.apply(lambda row:row.iloc[0].upper(),axis=1)

0      ALICE
1        BOB
2    CHARLIE
dtype: object

Example (row-by-row processing)

The following code processes the DataFrame df reading it on a row-by-row basis. This time we apply the function np.log() to the third element of each row (remember that Python counts from 0 to n-1).

# .iloc[2] is the third element of the row, selected by position
df.apply(lambda row:np.log(row.iloc[2]),axis=1)

0    11.156251
1    11.289782
2    11.695247
dtype: float64
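The row-by-row examples above select elements by their position in the row. The sketch below compares positional access with access by column label, which usually reads more clearly and keeps working if the column order changes. It rebuilds the sample DataFrame so that it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 120000],
    'Bonus': [15000, 10000, 80000],
})

# Positional access: the third element of each row (position 2)
by_position = df.apply(lambda row: np.log(row.iloc[2]), axis=1)

# Label access: the same value, selected by its column name
by_label = df.apply(lambda row: np.log(row['Salary']), axis=1)

print(np.allclose(by_position, by_label))
```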

Example (row-by-row processing)

# Combine name and age into a new column
df['Name_Age'] = df.apply(lambda row: row['Name'] +' is '+ str(row['Age'])+ ' years old', axis=1)
df

	Name	Age	Salary	Bonus	Name_Age
0	Alice	25	70000	15000	Alice is 25 years old
1	Bob	30	80000	10000	Bob is 30 years old
2	Charlie	35	120000	80000	Charlie is 35 years old

Example (row-by-row processing)

Indeed we can perform complex manipulations on Pandas DataFrames by combining several regular functions with lambda ones.

The following code:

Extracts, on a row-by-row basis, the ‘Name’ and the ‘Salary’.
Applies the methods upper() and np.log() to ‘Name’ and ‘Salary’ respectively.
Combines the results of the previous operations into a list.
The list is saved in a new column named ‘Name_Salary’.

df['Name_Salary'] = df.apply(lambda row: [row['Name'].upper(),np.log(row['Salary'])], axis=1)
df

	Name	Age	Salary	Bonus	Name_Age	Name_Salary
0	Alice	25	70000	15000	Alice is 25 years old	[ALICE, 11.156250521031495]
1	Bob	30	80000	10000	Bob is 30 years old	[BOB, 11.289781913656018]
2	Charlie	35	120000	80000	Charlie is 35 years old	[CHARLIE, 11.695247021764184]

Example (column-by-column processing)

Sometimes you might need to apply a lambda function to a Pandas DataFrame on a column by column basis.

Given the same DataFrame df:

df = pd.DataFrame(data)
df

df.head()

	Name	Age	Salary	Bonus
0	Alice	25	70000	15000
1	Bob	30	80000	10000
2	Charlie	35	120000	80000

The following code processes the DataFrame df reading it on a column by column basis. This time, however, we extract only the first element of each column.

# With axis=0 the lambda receives one column at a time, so the argument is named col
df.apply(lambda col:col[0],axis=0)

Name      Alice
Age          25
Salary    70000
Bonus     15000
dtype: object

The following code processes the DataFrame df reading it on a column-by-column basis. This time, however, we extract the first and second elements of each column.

# With axis=0 the lambda receives one column at a time, so the argument is named col
df.apply(lambda col:[col[0],col[1]],axis=0)

	Name	Age	Salary	Bonus
0	Alice	25	70000	15000
1	Bob	30	80000	10000

axis=0 vs. axis=1

The most frequent use case is applying a lambda function to a Pandas DataFrame on a row-by-row basis, for which we specify axis=1. If we need to process the DataFrame on a column-by-column basis we specify axis=0, which is also the default value.
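The difference between the two settings can be sketched with the numeric columns of the sample data: with axis=1 the function produces one result per row, with axis=0 one result per column.

```python
import pandas as pd

df = pd.DataFrame({
    'Salary': [70000, 80000, 120000],
    'Bonus': [15000, 10000, 80000],
})

# axis=1: the function receives one row at a time (one result per row)
per_row = df.apply(lambda row: row['Salary'] + row['Bonus'], axis=1)
print(per_row.tolist())      # [85000, 90000, 200000]

# axis=0: the function receives one column at a time (one result per column)
per_column = df.apply(lambda col: col.max(), axis=0)
print(per_column.to_dict())  # {'Salary': 120000, 'Bonus': 80000}
```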

Example (Combining several functions)

Indeed we can perform complex manipulations on Pandas DataFrames by combining several regular functions with lambda ones.

# Combine name and age into a new column
df['Name_Age'] = df.apply(lambda row: row['Name'] +' is '+ str(row['Age'])+ ' years old', axis=1)
df

	Name	Age	Salary	Bonus	Name_Age
0	Alice	25	70000	15000	Alice is 25 years old
1	Bob	30	80000	10000	Bob is 30 years old
2	Charlie	35	120000	80000	Charlie is 35 years old

9.5 Code Interpretation Challenge

Below you will find some examples of Python code. Try to work out what each one is meant to accomplish before checking the explanation that follows it:

Example: Squaring numbers

# Function definition
def square_number(num):
    
    return num ** 2

# Applying the function
result = square_number(5)

# Display the result
print("The square of 5 is:", result)

Explanation:

The function square_number() takes an input num and returns its square using the ** operator. The function is then called with the argument 5, and the result is printed.

The square of 5 is: 25

Example: Powering numbers

# Function definition
def power_function(base,exponent):
    
    return base ** exponent

# Applying the function
result = power_function(5,3)

# Display the result
print("The result is:", result)

Explanation:

The function power_function() takes two inputs base and exponent and returns the base raised to the power of the exponent using the ** operator. The function is then called with the arguments 5 and 3, and the result is printed.

The result is: 125

Example: Processing Strings

# Function definition
def concatenate_strings(str1, str2):
   
    return str1 + " " + str2

# Applying the function
full_name = concatenate_strings("John", "Doe")

# Display the result
print("Full name:", full_name)

Explanation:

The function concatenate_strings() takes two string arguments, concatenates them with a space in between, and returns the result. It is applied to concatenate the strings “John” and “Doe”.

Full name: John Doe

Example: Powering numbers using lambda functions

# Lambda function to raise a number to a given power
power_function = lambda base, exponent: base ** exponent

# Applying the lambda function
result = power_function(5, 3)

# Display the result
print("The result is:", result)

Explanation:

The lambda base, exponent: base ** exponent creates an anonymous function that takes base and exponent as inputs and returns base raised to the power of exponent. The function is called with values 5 and 3, and the result is printed.

The result is: 125

Example: Processing Strings using lambda functions

concatenate_strings = lambda str1, str2: str1 + " " + str2

# Applying the lambda function
full_name = concatenate_strings("John", "Doe")

# Display the result
print(full_name)

Explanation:

The lambda function lambda str1, str2: str1 + " " + str2 takes two string inputs and concatenates them with a space in between. The function is called with “John” and “Doe”, and the result is printed.

John Doe

9.6 Functions in Practice

The following Jupyter Notebook provides examples of data processing using functions in a healthcare context. Please execute the code, making sure that you understand how it works and the logic behind the analysis.

9.7 Conclusions

This chapter emphasizes the importance of functions in programming, highlighting their role in code reuse, modularization, and simplification of complex tasks. Functions are fundamental in Python, allowing you to package code into reusable blocks that take inputs (arguments), perform specific operations, and return outputs. The chapter demonstrates how functions can handle various data types, including numbers, lists, and even Pandas DataFrames. It introduces the basic structure of regular functions, covering function definition, arguments, return statements, and the importance of proper indentation in Python.

The chapter also introduces lambda functions, which are anonymous, single-line functions ideal for short, simple operations. Lambda functions are particularly powerful when combined with Pandas for data manipulation tasks, such as applying transformations to DataFrame columns or rows. Examples include using lambda functions for arithmetic operations, text formatting, and applying mathematical functions like logarithms. Additionally, the chapter explores how to apply both regular and lambda functions to Pandas DataFrames using the DataFrame.apply() method, explaining how to process data row-by-row or column-by-column.

In the next chapter you will learn how to combine several DataFrames based on common characteristics (e.g. customer ID).

9.8 Further Reading

For additional, more advanced topics, please refer to the following references:

Functions in Python: Official Python Documentation
Functions in Python: Python for Data Analysis
Python Functions Tutorial: Real Python
Pandas apply: Official reference
Lambda functions: How to use them