9  Introduction to Functions

Functions are a central concept in software programming. They allow you to reuse your own code, or code written by others, and to modularize your programs, which makes them easier to debug and maintain.

Thus far you have been using functions all the time: to load datasets, to filter data, to print information or just to perform arithmetic operations. As a matter of fact, libraries such as Pandas, NumPy or Seaborn are largely collections of functions and methods defined by the community for you to benefit from.

From a practical standpoint, functions (and methods, for that matter) are blocks of code that take some input (e.g. an external dataset in CSV format) and produce some output (e.g. a new Pandas DataFrame), as the following figure illustrates.

Functions as processing entities

In this chapter you will learn how to extend existing functionality with your own functions. This will allow you to implement new features and customize existing code to tackle complex requirements. You will practice defining your own regular functions in Python and have them applied in data science or artificial intelligence contexts. You will also learn how to develop lambda functions and have them applied to process Pandas DataFrames.

Happy coding!

9.1 Regular Functions

Formally speaking, a function definition in Python specifies: (1) the name of the function, (2) the arguments it takes, (3) the sequence of statements that run when the function is called and (4) the value (or object) returned to the main program. The following figure illustrates these elements.

Regular function definition
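
The four elements listed above can be pointed out in a few lines of code. The following is a minimal sketch; the function rectangle_area() and its arguments are hypothetical and only serve as an illustration.

```python
# Hypothetical example: rectangle_area() is not part of the chapter's code.
def rectangle_area(width, height):   # (1) name and (2) arguments
    area = width * height            # (3) statements that run on each call
    return area                      # (4) value returned to the caller

result = rectangle_area(3, 4)
print(result)  # 12
```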

Example

The following example defines a basic function called print_lyrics() that simply prints two sentences.

def print_lyrics():
  print("I'm a lumberjack, and I'm okay.")
  print("I sleep all night and I work all day.")

Calling print_lyrics() prints the two sentences (the function itself does not return a value):

print_lyrics()
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.

Tip. Code Indentation

Proper code indentation is very important in Python: the indented lines below a def statement form the body of the function. Please note how the code is indented inside every function definition in this chapter.
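
To make the tip concrete, here is a small sketch. The function greet() and the string bad_source are hypothetical names chosen for this illustration; the second half deliberately compiles a badly indented definition to show the error Python reports.

```python
def greet():
    print("inside the function")    # indented: belongs to greet()

print("outside the function")       # not indented: runs once, right away
greet()

# A function body that is not indented is rejected by Python.
bad_source = 'def greet():\nprint("hi")\n'
error_name = None
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    error_name = type(err).__name__

print(error_name)  # IndentationError
```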

Example

The following code defines a function called print_name(), this time taking one argument, NameArgument, and printing some text.

def print_name(NameArgument):
  print('Hello')
  print('Your name is: '+NameArgument)

Calling the function print_name() with the argument ‘David’ prints:

print_name('David')
Hello
Your name is: David

Example

For illustration purposes, let’s assume we need to define a function named fahr_to_celsius() that: (1) takes an argument temp, (2) performs a mathematical operation, (3) stores the result in a new variable internalVariable and (4) returns the result to the main program:

def fahr_to_celsius(temp):
    internalVariable=(temp - 32) * (5/9)
    return(internalVariable)

When invoked, this function converts a temperature value from Fahrenheit to Celsius:

fahr_to_celsius(120)
48.88888888888889
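
A quick way to gain confidence in a conversion formula is to check it against values you already know. The sketch below repeats the function so that it runs on its own and tests the freezing and boiling points of water; the variable names freezing and boiling are introduced here only for illustration.

```python
import math

def fahr_to_celsius(temp):
    return (temp - 32) * (5 / 9)

# Water freezes at 32 °F (0 °C) and boils at 212 °F (100 °C).
freezing = fahr_to_celsius(32)
boiling = fahr_to_celsius(212)

# Floating-point arithmetic can leave tiny rounding errors,
# so we compare with math.isclose() instead of ==.
print(math.isclose(freezing, 0.0, abs_tol=1e-9))  # True
print(math.isclose(boiling, 100.0))               # True
```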

Example

Functions can be defined to take as many arguments as needed. The following code defines a function that takes two arguments and performs a product operation.

def multiplication(argument1,argument2):
   internalVariable=argument1*argument2
   return(internalVariable)
multiplication(10,20)
200

Functions can take objects (e.g. lists, DataFrames) as arguments. The following code defines a multiplication() function that takes a list of numbers.

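# The parameter is named 'values' rather than 'list' to avoid shadowing Python's built-in list type
def multiplication(values):
   internalVariable=values[0]*values[1]*values[2]
   return(internalVariable)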
numbers=[2,3,4]
multiplication(numbers)
24
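
The function above only works for lists with at least three elements and ignores any element after the third. As a possible generalization, the sketch below uses math.prod() from the standard library (available since Python 3.8); the name multiply_all() is hypothetical.

```python
import math

def multiply_all(values):
    """Return the product of every number in values (hypothetical helper)."""
    return math.prod(values)

print(multiply_all([2, 3, 4]))     # 24
print(multiply_all([2, 3, 4, 5]))  # 120
```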

9.2 Lambda Functions

A lambda function behaves like a regular function, except that it is anonymous (it is defined without a name) and its body is limited to a single expression, so it typically fits on one line of code. Lambda functions might look intimidating at first sight, yet they are extremely useful once you get to know them well. In my experience, lambda functions are especially handy for implementing complex processing of Pandas DataFrames, such as conditional extraction, text manipulation and many others.

Lambda functions are typically used in places where you need a small function for a short period of time. Common scenarios include:

  • Function Arguments: When a function requires another function as an argument.
  • Inline Operations: When performing simple operations inline without having to define a full function.
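
The two scenarios above can be sketched with plain Python. The data and variable names below (employees, oldest_first, doubled) are made up for the illustration.

```python
employees = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]

# Function argument: sorted() accepts a function through its key parameter.
oldest_first = sorted(employees, key=lambda person: person[1], reverse=True)
print(oldest_first[0])  # ('Charlie', 35)

# Inline operation: double every number without defining a named function.
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))
print(doubled)  # [2, 4, 6]
```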

The following code defines a lambda function that takes an argument and multiplies it by two. The lambda itself has no name; here it is assigned to the variable myFirstLambdaFunction so that it can be called later:

myFirstLambdaFunction=lambda x: 2*x
myFirstLambdaFunction(3)
6

The following figure outlines the building blocks of a lambda function. Like any other Python function, it takes some arguments, evaluates an expression and returns the result.

Lambda Functions
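
To see how the pieces of a lambda map onto a regular function, the sketch below writes the same operation both ways. The names double_lambda and double_regular are hypothetical.

```python
# Lambda form: arguments before the colon, expression after it.
double_lambda = lambda x: 2 * x

# Equivalent regular function.
def double_regular(x):
    return 2 * x

print(double_lambda(3) == double_regular(3))  # True

# A lambda is anonymous even when stored in a variable.
print(double_lambda.__name__)   # <lambda>
print(double_regular.__name__)  # double_regular
```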

Example

The following example defines a lambda function that takes an argument and returns it in uppercase format.

myStringProcessor=lambda x:x.upper()
myStringProcessor('Hi there, how are things on your side ?')
'HI THERE, HOW ARE THINGS ON YOUR SIDE ?'

Example

The following example defines a lambda function that takes an argument and returns its natural logarithm.

import numpy as np
myLog=lambda x:np.log(x)
myLog(600)
6.396929655216146

Example

The following example defines a lambda function that takes two arguments and returns the first argument raised to the power of the second.

myPower=lambda x,y:x**y
myPower(2,3)
8

9.3 Applying regular functions to Pandas DataFrames

Sometimes the methods provided by Pandas are not enough to perform complex operations or transformations on a DataFrame. The good news is that you can always define your own functions and apply them to a single column with Series.apply() or to a whole DataFrame with DataFrame.apply().

Example

The following example illustrates how to compute the natural logarithm of a DataFrame column. To do so, we define a regular function called my_natural_log() and apply it to the column Salary of the DataFrame df.

import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")


# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 120000],
    'Bonus' : [15000, 10000, 80000]

}

df = pd.DataFrame(data)
df
      Name  Age  Salary  Bonus
0    Alice   25   70000  15000
1      Bob   30   80000  10000
2  Charlie   35  120000  80000

def my_natural_log(number):
    internalvariable=np.log(number)
    return(internalvariable)



df['Salary'].apply(my_natural_log)
0    11.156251
1    11.289782
2    11.695247
Name: Salary, dtype: float64

Indeed, we could have passed the np.log() function straight to apply() on the ‘Salary’ column of the DataFrame df:

df['Salary'].apply(np.log)
0    11.156251
1    11.289782
2    11.695247
Name: Salary, dtype: float64
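
NumPy functions such as np.log() also accept a whole Pandas column at once, so apply() is not strictly needed here. The sketch below compares both routes on a small, hypothetical Series named salaries holding the same values as the Salary column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['Salary']
salaries = pd.Series([70000, 80000, 120000], name="Salary")

via_apply = salaries.apply(np.log)   # the function is passed to apply()
vectorized = np.log(salaries)        # the function receives the whole column

print(np.allclose(via_apply, vectorized))  # True
```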

Example

Building on the previous example, we can define our own function to process two columns at once:

def totalsalaryfunction(row):
   internalVariable=row['Salary']+row['Bonus']
   return(internalVariable)

which applied to the DataFrame df gives:

df[['Salary','Bonus']].apply(totalsalaryfunction,axis=1)
0     85000
1     90000
2    200000
dtype: int64
Applying functions to Pandas DataFrames

Try to avoid ad-hoc functions when an equivalent Pandas method exists. Built-in methods are vectorized and optimized for large datasets, whereas apply() calls your Python function once per row or element.
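
As an illustration of this advice, the total salary computed earlier with a custom function can also be obtained with plain column arithmetic. This is a minimal sketch on a hypothetical DataFrame df_demo that mirrors the Salary and Bonus columns of df.

```python
import pandas as pd

# Hypothetical copy of the two numeric columns used in the chapter
df_demo = pd.DataFrame({
    "Salary": [70000, 80000, 120000],
    "Bonus": [15000, 10000, 80000],
})

# Row-by-row version: the function is called once per row.
total_apply = df_demo.apply(lambda row: row["Salary"] + row["Bonus"], axis=1)

# Vectorized version: Pandas adds the two columns in a single operation.
total_vectorized = df_demo["Salary"] + df_demo["Bonus"]

print(total_vectorized.tolist())                          # [85000, 90000, 200000]
print(total_apply.tolist() == total_vectorized.tolist())  # True
```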

Example

We use the following function to build a sentence with the name and age of each person in the DataFrame df:

def nameagefunction(row):
   internalname=row['Name']
   internalage=row['Age']
   internalVariable=f" Employee {internalname} is {internalage} years old"
   return(internalVariable)
df[['Name','Age']].apply(nameagefunction,axis=1)
0       Employee Alice is 25 years old
1         Employee Bob is 30 years old
2     Employee Charlie is 35 years old
dtype: object
Using Generative AI for coding purposes

Try giving Gemini the following prompt: “create a function to take Name and Age from the DataFrame df and have them printed”

9.4 Applying lambda functions to Pandas DataFrames

Combining the method DataFrame.apply() with lambda functions is, in my experience, extremely helpful for performing complex operations in Pandas.

The application of lambda functions to a Pandas DataFrame can be intimidating at first. The following example will, hopefully, make things easier to follow:

Example (row-by-row processing)

Given the dataframe df:

df.head()
      Name  Age  Salary  Bonus
0    Alice   25   70000  15000
1      Bob   30   80000  10000
2  Charlie   35  120000  80000

We apply a lambda function to process the DataFrame df on a row-by-row basis (we specify axis=1). In this case the lambda simply returns each row unchanged, so the result is identical to df.

df.apply(lambda row:row, axis=1)
      Name  Age  Salary  Bonus
0    Alice   25   70000  15000
1      Bob   30   80000  10000
2  Charlie   35  120000  80000

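The following code processes the DataFrame df on a row-by-row basis. This time, however, we extract only the first element of each row. Each row is a Series indexed by column name, so we use .iloc for positional access (recent Pandas versions warn when row[0] is used for this purpose).

df.apply(lambda row:row.iloc[0],axis=1)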
0      Alice
1        Bob
2    Charlie
dtype: object

The following code processes the DataFrame df on a row-by-row basis. This time, however, we extract the second and third elements of each row.

df.apply(lambda row:[row.iloc[1],row.iloc[2]],axis=1)
0     [25, 70000]
1     [30, 80000]
2    [35, 120000]
dtype: object

Example (row-by-row processing)

The following code processes the DataFrame df on a row-by-row basis. This time we apply the string method upper() to the first element of each row.

df.apply(lambda row:row.iloc[0].upper(),axis=1)
0      ALICE
1        BOB
2    CHARLIE
dtype: object

Example (row-by-row processing)

The following code processes the DataFrame df on a row-by-row basis. This time we apply the function np.log() to the third element of each row (remember that Python counts from 0 to n-1, so the third element is at position 2).

df.apply(lambda row:np.log(row.iloc[2]),axis=1)
0    11.156251
1    11.289782
2    11.695247
dtype: float64
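
Inside a row-by-row lambda, each row is a Pandas Series whose index holds the column names, so an element can be reached either by position or by column label. The sketch below contrasts both on a hypothetical DataFrame df_demo; label access is usually more robust because it keeps working if the columns are reordered.

```python
import pandas as pd

# Hypothetical DataFrame mirroring part of df
df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Salary": [70000, 80000],
})

# Positional access: third element of each row (position 2)
by_position = df_demo.apply(lambda row: row.iloc[2], axis=1)

# Label access: the same element, addressed by column name
by_label = df_demo.apply(lambda row: row["Salary"], axis=1)

print(by_position.tolist())  # [70000, 80000]
print(by_label.tolist())     # [70000, 80000]
```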

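Example (row-by-row processing)

The following code builds a sentence from the ‘Name’ and ‘Age’ of each row and stores it in a new column named ‘Name_Age’.

# Combine name and age into a new column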
df['Name_Age'] = df.apply(lambda row: row['Name'] +' is '+ str(row['Age'])+ ' years old', axis=1)
df
      Name  Age  Salary  Bonus                 Name_Age
0    Alice   25   70000  15000    Alice is 25 years old
1      Bob   30   80000  10000      Bob is 30 years old
2  Charlie   35  120000  80000  Charlie is 35 years old

Example (row-by-row processing)

Indeed we can perform complex manipulations on Pandas DataFrames by combining several regular functions with lambda ones.

The following code:

  1. Extracts, on a row-by-row basis, the ‘Name’ and the ‘Salary’.
  2. Applies the methods upper() and np.log() to ‘Name’ and ‘Salary’ respectively.
  3. Combines the results of the previous operations into a list.
  4. Saves the list in a new column named ‘Name_Salary’.

df['Name_Salary'] = df.apply(lambda row: [row['Name'].upper(),np.log(row['Salary'])], axis=1)
df
      Name  Age  Salary  Bonus                 Name_Age                    Name_Salary
0    Alice   25   70000  15000    Alice is 25 years old    [ALICE, 11.156250521031495]
1      Bob   30   80000  10000      Bob is 30 years old      [BOB, 11.289781913656018]
2  Charlie   35  120000  80000  Charlie is 35 years old  [CHARLIE, 11.695247021764184]
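
Storing a list inside a single column can make later analysis awkward. One possible alternative, sketched below, is to have the lambda return a pd.Series so that each result lands in its own column. The DataFrame df_demo and the column names Name_Upper and Log_Salary are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame mirroring the Name and Salary columns of df
df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [70000, 80000, 120000],
})

# Returning a Series from a row-wise lambda yields one column per Series entry.
expanded = df_demo.apply(
    lambda row: pd.Series({
        "Name_Upper": row["Name"].upper(),
        "Log_Salary": np.log(row["Salary"]),
    }),
    axis=1,
)

print(expanded.columns.tolist())  # ['Name_Upper', 'Log_Salary']
```

The new columns can then be attached to the original data, for instance with pd.concat([df_demo, expanded], axis=1).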

Example (column-by-column processing)

Sometimes you might need to apply a lambda function to a Pandas DataFrame on a column by column basis.

We first rebuild df from the original data, which discards the columns added in the previous examples:

df = pd.DataFrame(data)
df.head()
      Name  Age  Salary  Bonus
0    Alice   25   70000  15000
1      Bob   30   80000  10000
2  Charlie   35  120000  80000

The following code processes the DataFrame df on a column-by-column basis (axis=0), so the lambda receives one column at a time. This time, however, we extract only the first element of each column.

df.apply(lambda col:col.iloc[0],axis=0)
Name      Alice
Age          25
Salary    70000
Bonus     15000
dtype: object

The following code processes the DataFrame df on a column-by-column basis. This time, however, we extract the first and second elements of each column.

df.apply(lambda col:[col.iloc[0],col.iloc[1]],axis=0)
    Name  Age  Salary  Bonus
0  Alice   25   70000  15000
1    Bob   30   80000  10000

axis=0 vs. axis=1

The most frequent use case is applying a lambda function to a Pandas DataFrame on a row-by-row basis, for which we specify axis=1. To process the DataFrame on a column-by-column basis we specify axis=0, which is also the default of DataFrame.apply().
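
The difference between the two settings is easiest to see with an aggregation. The following sketch sums a hypothetical numeric DataFrame named numbers in both directions: with axis=0 the lambda receives each column, with axis=1 it receives each row.

```python
import pandas as pd

# Hypothetical numeric DataFrame mirroring Salary and Bonus
numbers = pd.DataFrame({
    "Salary": [70000, 80000, 120000],
    "Bonus": [15000, 10000, 80000],
})

# axis=0: one result per column
per_column = numbers.apply(lambda col: col.sum(), axis=0)
print(per_column.index.tolist())  # ['Salary', 'Bonus']
print(per_column.tolist())        # [270000, 105000]

# axis=1: one result per row
per_row = numbers.apply(lambda row: row.sum(), axis=1)
print(per_row.tolist())           # [85000, 90000, 200000]
```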

9.5 Conclusions

This chapter emphasizes the importance of functions in programming, highlighting their role in code reuse, modularization, and simplification of complex tasks. Functions are fundamental in Python, allowing you to package code into reusable blocks that take inputs (arguments), perform specific operations, and return outputs. The chapter demonstrates how functions can handle various data types, including numbers, lists, and even Pandas DataFrames. It introduces the basic structure of regular functions, covering function definition, arguments, return statements, and the importance of proper indentation in Python.

The chapter also introduces lambda functions, which are anonymous, single-line functions ideal for short, simple operations. Lambda functions are particularly powerful when combined with Pandas for data manipulation tasks, such as applying transformations to DataFrame columns or rows. Examples include using lambda functions for arithmetic operations, text formatting, and applying mathematical functions like logarithms. Additionally, the chapter explores how to apply both regular and lambda functions to Pandas DataFrames using the DataFrame.apply() method, explaining how to process data row-by-row or column-by-column.

In the next chapter you will learn how to combine several DataFrames based on common characteristics (e.g. customer ID).

9.6 Further Reading

For those of you in need of additional, more advanced, topics please refer to the following references: