Python for Data Science

A Crash Course



Introduction to Python Programming



Khalil El Mahrsi
2023
Creative Commons License

Outline

Basics (variables, objects, built-in types, ...)
Control Flow (if, while, for statements, ...)
Functions (writing reusable code)
Errors and Exceptions (handling when things go wrong)
Data Structures (lists, tuples, dictionaries, ...)

Basics

What is Python?

  • High-level, interpreted, general-purpose programming language
  • Named after the Monty Python's Flying Circus TV series, not the snake species
  • One specification, multiple implementations
    • CPython: reference implementation, written in C, offers the highest compatibility with packages and extensions
    • PyPy, Jython (Java implementation), ...
  • Currently the most popular programming language for mainstream data science (but not as rich as R for niche needs)

Environment Setup Recommendations

Executing Python Code

Two ways to run Python code
  • By using the Python interpreter in interactive mode
  • By executing Python scripts (source files)

Interacting With the Python Interpreter

  • The Python interpreter can be launched in interactive mode by typing the python command in the Unix shell (terminal)
  • The interpreter prompts for the next instruction with the primary prompt (>>>)
  • Instructions are read from the standard input (e.g., keyboard)
  • The results are directed to the standard output (e.g., screen)
  • Use quit() to exit the interpreter
% python
Python 3.8.5 (default, Aug  5 2020, 03:39:04) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> welcome_message = "Hello world!"
>>> print(welcome_message)
Hello world!
>>> 1 + 2
3
>>> quit() # exit() or [ctrl + d] ([ctrl + z] for Windows) also works
%

Using Python Source Files

  • The Python interpreter can also be invoked while passing the name of a Python script (.py source file) as an argument
  • The statements in the file are read and executed by the interpreter

Content of hello_world.py file


# -*- coding: utf-8 -*-
welcome_message = "Hello world!"
print(welcome_message)
1 + 2

Execution

% python hello_world.py
Hello world!
%

Statements and Expressions

  • Instructions that can be executed by the Python interpreter are called statements
    • A simple statement is contained within a single logical line
  • Expressions are combinations of values, variables, operators, and function calls that the interpreter can evaluate
    • The evaluation of an expression produces a value
    • The expression's content is evaluated in a specific order defined by the operator precedence

>>> welcome_message = "Hello world!" # this is an assignment statement
>>> print(welcome_message) # another statement
Hello world!
>>> 1 + 2 # expression statement with one expression combining the + operator with the values 1 and 2
3

Objects

  • In Python, everything is an object
  • Objects have types that define possible values and operations
    • "Hello world!" is an object of type str (string)
    • 1 and 2 are objects of type int (integer)
    • Even the function print is an object of type builtin_function_or_method (more on functions later)
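
As a quick sketch, the built-in type() function can be used to check the type of the objects mentioned above (the variable names below are illustrative only):

```python
# type() returns the type of any object
string_type = type("Hello world!")
int_type = type(1)
print_type = type(print)

print(string_type)  # <class 'str'>
print(int_type)     # <class 'int'>
print(print_type)   # <class 'builtin_function_or_method'>
```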

Variables

  • Variables are names used to reference objects
  • The (re)binding between names and objects is done through assignment statements
    • The expression on the right is evaluated and the name on the left is bound to the resulting object

Syntax


        variable_name = expression
        

Examples


>>> welcome_message = "Hello world!"
>>> print(welcome_message)
Hello world!
>>> a, b = 1, 2 # multiple assignments can be done in the same statement
>>> print(a)
1
>>> print(b)
2
>>> a += b # augmented assignment statement, equivalent to a = a + b
>>> print(a)
3

Naming Rules

  • Names given to variables (and functions) must adhere to three rules
    1. The name can only contain letters (A-Z, a-z), digits (0-9), and underscores ( _ )
    2. The first character must be a letter or an underscore (it cannot start with a digit)
    3. The name must not be one of the keywords reserved by Python
  • Examples of valid names: a, B, foo, bar_2_3, _name, snake_case_name, mixedCaseName, CamelCaseName, UPPERCASENAME, ...
  • Examples of invalid names: 2items, return, name with spaces, ab>cd@3, X Æ A-12 (sorry, Elon Musk 😢), ...
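
The rules above can be checked programmatically. The sketch below combines str.isidentifier() with keyword.iskeyword() from the standard library; the helper is_valid_name is a hypothetical name introduced for illustration. Note that str.isidentifier() also accepts non-ASCII letters, which Python 3 allows in names.

```python
import keyword

def is_valid_name(name):
    """Return True if name follows the identifier syntax and is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["foo", "bar_2_3", "_name", "2items", "return", "name with spaces"]
for name in candidates:
    print(name, "->", is_valid_name(name))

# foo -> True
# bar_2_3 -> True
# _name -> True
# 2items -> False
# return -> False
# name with spaces -> False
```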

Comments

  • A comment starts with the hash character (#) and ends at the end of a physical line
  • Comments are for humans; they are ignored by the interpreter
  • Obvious or redundant comments must be avoided
  • Comments should explain why something is done in a specific way
  • The how (the code) should be self-explanatory (if that is not the case, it can very likely be simplified or restructured!)

Numbers

  • Python mainly provides three built-in numeric types
    • Integers (int) : 1, 3, -5, ...
    • Floating point numbers (float): 1., 2.45, 3.5e-6, ...
    • Complex numbers (complex): 1 + 2j, 4.5j, ...
  • Numbers are created by numeric literals or built-in functions and operators
    • Unadorned numeric literal → integer
    • Numeric literal containing a decimal point (.) or exponent sign (e) → floating point number
    • Numeric literal with appended j (or J) → complex number

Operations on Numbers

  • Numeric types support a wide variety of operations
    • Arithmetic (+, -, *, /, %, ...)
    • Comparisons (==, !=, <, >, <=, >=)
    • Type conversions
    • Type-specific methods
  • Execution of operations in the same statement follows a strict order (operator precedence)
  • Mixed binary arithmetic operations are fully supported: when the types of the operands differ, the “narrower” operand is widened

>>> 1 + 2 * 3

7

>>> (1 + 2) * 3 # () change operations' order

9

>>> 3 ** 2 # power, equivalent to pow(3, 2)

9

>>> 7.5 / 2. # quotient

3.75

>>> 7.5 // 2. # floored quotient

3.0

>>> 7.5 % 2. # remainder

1.5

>>> 2.5 * 1e2

250.0

>>> 2.5 == 3.75

False

>>> (2.5 <= 3.75) and (3.75 < 5)

True

>>> 2.5 <= 3.75 < 5 # this is also permitted

True

>>> float(3) # cast (convert) int to float

3.0

>>> 2 + 4.7 # 2 is converted to float before sum

6.7

>>> 1 + 3j + 2 + 0.4J # both j and J accepted

(3+3.4j)

>>> 1 + 4 j # beware of space in imaginary part

  File "<stdin>", line 1
    1 + 4 j
          ^
SyntaxError: invalid syntax

Booleans (Truth Values)

  • Two boolean values, the constants False and True, are used by Python to represent truth values
  • Any Python object can be tested for truth value or used in boolean operations (cf. next slide)
  • Objects that are considered false
    • The False and None constants
    • Zero in any numeric type: 0, 0.0, 0j, ...
    • Empty sequences and collections: "", {}, (), [], ... (more on these later)
  • Booleans behave like integers (0 and 1) in numeric contexts (e.g., with arithmetic operations)
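
A minimal sketch of these rules, using the built-in bool() function to test a few objects (the lists and variable names are illustrative only):

```python
# Objects from the "considered false" list above
falsy_objects = [False, None, 0, 0.0, 0j, "", {}, (), []]
# A few objects that are considered true (non-zero numbers, non-empty containers)
truthy_objects = [True, 1, -4.75, "0", [0], {"a": 1}]

all_falsy = all(not bool(obj) for obj in falsy_objects)
all_truthy = all(bool(obj) for obj in truthy_objects)
total = True + True + False  # booleans behave like 1 and 0 in arithmetic

print(all_falsy)   # True
print(all_truthy)  # True
print(total)       # 2
```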

Truth Value Testing

Comparisons

Operation    Meaning
<            Strictly less than
<=           Less than or equal to
>            Strictly greater than
>=           Greater than or equal to
==           Equal to
!=           Not equal to
is           Object identity
is not       Negated object identity
in*          Member of
not in*      Not member of

* Supported by iterable types (more details later)

  • <, <=, >, and >= are only defined where it makes sense
  • Objects of different types (except numeric types) never compare equal

Boolean Operations

Operation    Result
x or y       if x is false, then y, else x
x and y      if x is false, then x, else y
not x        if x is false, then True, else False
  • and and or are short-circuit operators: the second argument is not always evaluated
  • and and or return one of their operands, not boolean results
  • Other operations and built-in functions that have boolean results return 0 or False for false and 1 or True for true
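
Short-circuiting can be made visible by recording which operands actually get evaluated. This is only a sketch: the operand helper and the evaluated list are hypothetical names introduced for the illustration.

```python
evaluated = []  # records which operands were actually evaluated

def operand(label, value):
    """Record that the operand was evaluated, then return its value."""
    evaluated.append(label)
    return value

# 7.5 is true, so "or" returns it without evaluating the right operand
or_result = operand("or-left", 7.5) or operand("or-right", True)
# 0 is false, so "and" returns it without evaluating the right operand
and_result = operand("and-left", 0) and operand("and-right", "never reached")

print(or_result)   # 7.5
print(and_result)  # 0
print(evaluated)   # ['or-left', 'and-left']
```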

Truth Value Testing Examples


>>> (1 + 3) == 4

True

>>> 4 == 4.0 # different numeric types, same value

True

>>> 4 == "4" # "4" is non-numeric (string)

False

>>> 3 * False + 5. * True + True

6.0

>>> (3 < 5) and (6 < 2) # both operands evaluated

False

>>> (3 < 5) or (6 < 2) # only (3 < 5) evaluated

True

>>> (3 > 5) and (2 < 6) # only (3 > 5) evaluated

False

>>> bool(-4.75) # cast to boolean

True

>>> 7.5 or True # and and or return operands (!!!)

7.5

>>> True or 7.5 # and and or return operands (!!!)

True

>>> 1 < (1 + 4j) # < does not make sense

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'complex'

>>> # don't worry if you don't fully understand the next
>>> # examples
>>> animals = ["cat", "dog", "chicken"] # a list of animals
>>> cat in animals

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'cat' is not defined

>>> "cat" in animals # check membership

True

>>> ("dog" in animals) and ("elephant" not in animals)

True

>>> domestic_animals = ["cat", "dog", "chicken"]
>>> domestic_animals == animals # equality test

True

>>> domestic_animals is animals # identity test

False
  

Strings

  • Strings are sequences of characters that represent textual data
  • Handled with str objects
  • String literals can be written in three ways
    • Single quotes: 'This is a string'
    • Double quotes: "This is another string"
    • Triple quotes (can span multiple lines): """Yet another string""" or '''Same but with single quotes'''
  • Special characters, e.g., quotes (') in single quote strings, can be escaped using backslash (\)
    • Useful escape sequences: \' (single quote), \" (double quote), \n (line feed), \t (horizontal tab), \v (vertical tab), ...
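
The different literal forms and a few escape sequences can be compared in a short sketch (the example strings are made up for illustration):

```python
single = 'John\'s book'       # escaped quote inside a single-quoted string
double = "John's book"        # no escaping needed with double quotes
multi_line = """First line
Second line"""                # triple quotes can span several lines
columns = "name\tage"         # \t inserts a horizontal tab

print(single == double)         # True
print(multi_line.splitlines())  # ['First line', 'Second line']
print(columns.split("\t"))      # ['name', 'age']
print(len("\n"))                # 1 (an escape sequence is a single character)
```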

String Operations and Methods

  • Strings can be concatenated with the + operator (both operands must be strings; other types must be converted first, e.g., with str())
  • Strings support indexing and slicing string[start:end:step]
    • Positions always start at 0
    • start is included, end is excluded
    • start omitted → 0
    • end omitted → last_index + 1
    • step omitted → 1
    • A negative index counts from the end of the string (-1 is the last character); a negative step traverses the string in reverse order
  • Strings implement many useful methods for text manipulation
    • lower(), upper(), ... → change letters casing
    • lstrip(), rstrip(), strip(), ... → remove leading and/or trailing white space
    • islower(), isalpha(), isdecimal(), ... → verify properties

>>> "Hello" + "world" + "!" # string concatenation

'Helloworld!'

>>> "John's age is " + 30 # try to concatenate with int

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str

>>> "John's age is " + str(30)

"John's age is 30"

>>> 3 * "Hello! "

'Hello! Hello! Hello! '

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> alphabet[0]

'a'

>>> alphabet[-1] # negative index: counts from the end of the string

'z'

>>> alphabet[:4] # slicing (same as alphabet[0:4])

'abcd'

>>> alphabet[4:7] # 5th to 7th character

'efg'

>>> alphabet[20:] # slice from 21st character to end of string

'uvwxyz'

>>> alphabet[::2] # one every two characters

'acegikmoqsuwy'

>>> alphabet[-1:-6:-1] # last five characters in reverse order

'zyxwv'

>>> book = "The Hound of the Baskervilles"
>>> book.upper()

'THE HOUND OF THE BASKERVILLES'

>>> book.split() # split into a list of words

['The', 'Hound', 'of', 'the', 'Baskervilles']

>>> "      Too much trailing whitespace        ".strip()

'Too much trailing whitespace'

>>> "Hi!".isalpha()

False

>>> "1234".isnumeric()

True

String Formatting

  • Strings can be constructed from other values and variables by using the format() method
    • The format string contains replacement fields, specified using curly braces {}
    • Replacement fields
      • Are substituted by the provided values
      • Can contain names (e.g., {name}, {age}, ...) of keyword arguments or numeric indexes (e.g., {0}, {1}, ...) of positional arguments
      • Can contain format specifications (e.g., :%, :.2f, ...) that customize their presentation

Syntax


format_string.format(value_1, value_2, ...)
        

String Formatting Examples


>>> author = "Arthur Conan Doyle"
>>> book = "The Hound of the Baskervilles"
>>> birthplace = "Edinburgh"
>>> print("{} was originally published in {}.".format(book, 1902))

The Hound of the Baskervilles was originally published in 1902.

>>> print("{1} is the author of {0}.".format(book, author)) # using indexes of positional arguments

Arthur Conan Doyle is the author of The Hound of the Baskervilles.

>>> print("{name} was born in {birthplace}.".format(name=author, birthplace=birthplace)) # using names of keyword arguments

Arthur Conan Doyle was born in Edinburgh.

>>> print("You scored {:.2%} on the exam!".format(0.95)) # format as a percentage with 2 digits after the decimal point

You scored 95.00% on the exam!

>>> print("The world's population in {year} was {population:,.0f}.".format(year=2018, population=7.59e9))

The world's population in 2018 was 7,590,000,000.

Mutable vs. Immutable Objects

  • An immutable object can't be modified after it is created
    • Operations on immutable objects yield new objects
  • int, float, bool, and str objects are all immutable
  • Mutable objects can be modified after they are created
    • Operations on mutable objects can either modify the object in place or yield new objects (so read the docs)
  • Mutables you will use the most (detailed in Data Structures section)
    • Lists (list): ordered sequence of objects
      • e.g., ["cat", "dog", "chicken"]
    • Sets (set): unordered collections of unique objects
      • e.g., {"cat", "dog", "chicken"}
    • Dictionaries (dict): key → value mappings
      • e.g., {"name" : "Cloud", "age": 21, "job": "mercenary"}
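
To illustrate the difference between in-place modification and yielding a new object, the sketch below compares the built-in sorted() function with the list.sort() method (the variable names are illustrative only):

```python
animals = ["dog", "cat", "chicken"]
identity_before = id(animals)

sorted_animals = sorted(animals)  # yields a new list, animals is left untouched
print(animals)                    # ['dog', 'cat', 'chicken']
print(sorted_animals)             # ['cat', 'chicken', 'dog']

result = animals.sort()           # sorts the list in place and returns None
print(result)                     # None
print(animals)                    # ['cat', 'chicken', 'dog']
print(id(animals) == identity_before)  # True (still the same object)
```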

Mutable vs. Immutable Objects


>>> text = "Meat"
>>> id(text)  # check object's identity

140385175741424

>>> text[2] = "e"  # trying to modify the string will raise an exception (result in an error)
  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

>>> text = text + " is delicious!"
>>> print(text)

Meat is delicious!

>>> id(text) # text is now bound to a new object

140385175731376

>>> animals = ["cat", "dog", "chicken"]
>>> id(animals)
140385175649664

>>> animals[2] = "duck" # try to replace the third member of the list
>>> animals
['cat', 'dog', 'duck']

>>> id(animals) # re-check the object's identity (it remains the same)
140385175649664

Control Flow

Control Flow

  • Python executes statements in a sequential (i.e., top-down) order
  • Control flow statements make it possible to alter this flow
    • Execute one or more statements only if a specific condition is met (if statement)
    • Repeatedly execute one or more statements...
      • ... as long as a specific condition is met (while statement)
      • ... for each member of a sequence of objects (for statement)

if Statements

The statement is executed as follows

  • The truth value of the expression in the if clause's header is evaluated
  • If (and only if) it evaluates to true, then the block of indented statements (called suite) is executed
  • The rest of the program is executed

Syntax (simple form)


if expression:
    statement_1
    statement_2
    ...
    statement_n
rest_of_program
      

Flow diagram (simple form)

if statement flow diagram.

if Statements

The statement is executed as follows

  • If the expression in the if clause's header is true
    • That clause's suite is executed
    • The rest of the program is executed
  • If not, the same principle is applied in order to the following elif (optional) clauses
  • If all expressions are false
    • The suite of the else (optional) clause is executed
    • The rest of the program is executed

Syntax (general form)


if expression_1:
    statement_1
    ...
elif expression_2:
    ...
...
elif expression_n:
    ...
else:
    ...
rest_of_program
      

if Statements

Flow diagram (general form)

if statement flow diagram.


if Statements Example

Code


number = int(input("Please enter an integer: ")) # ask user to provide input
if (number % 2):
    print("{} is an odd number".format(number))
elif not ((number % 3) or (number % 4)):
    print("{} is even and can be divided by both 3 and 4".format(number))
elif number > 10:
    print("{} is even and greater than 10".format(number))
print("Finished") # rest of the program

Outputs of different executions


Please enter an integer: 1
1 is an odd number
Finished

Please enter an integer: 8

Finished

Please enter an integer: 12
12 is even and can be divided by both 3 and 4
Finished

Please enter an integer: 16
16 is even and greater than 10
Finished

while Statements

The statement is executed as follows

  • The expression in the while clause's header is evaluated
  • If the expression is true
    • The suite is executed
    • The while statement loops on itself (i.e., goes back to the expression evaluation step)
  • If the expression is false, the rest of the program is executed

Syntax


while expression:
    statement_1
    statement_2
    ...
    statement_n
rest_of_program
      

Flow diagram

while statement flow diagram.

while Statements Examples


>>> i = 0
>>> while (i < 5):
...     print(i)
...     i += 1 # if i never changes, the loop will print 0 indefinitely
... 

0
1
2
3
4

>>> i, j = 0, 1
>>> while (i < 10):
...     j += 1
...     print("i = {}. j = {}.".format(i, j))
...     if (j == 4): # if j reaches 4, break out of the loop
...         break
... 

i = 0. j = 2.
i = 0. j = 3.
i = 0. j = 4.

for Statements

The for statement is executed as follows

  • The expression in the clause's header is evaluated to produce an iterable
  • For each object in the iterable
    • element is bound to that object
    • The suite is executed
  • Once the end of the iterable is reached, the rest of the program is executed

Syntax


for element in expression:
    statement_1
    statement_2
    ...
    statement_n
rest_of_program
      

Flow diagram

for statement flow diagram.

Iterables

  • An iterable is an object capable of returning its members one at a time
    • Iterables contain a collection of objects (members)...
    • ... that can be iterated over (traversed) one by one
  • The iterables you will interact with most are sequences and collections
    • Sequences (ordered collections): strings (str), lists* (list), tuples* (tuple), ranges (range), ...
    • Unordered collections: sets* (set), frozen sets* (frozenset), dictionaries* (dict), ...

* Will be discussed in detail in the Data Structures section of the course
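
What "returning its members one at a time" means can be sketched with the built-in iter() and next() functions, which is roughly what a for statement does behind the scenes. The default value passed to the last next() call is an illustrative choice.

```python
animals = ["cat", "dog"]

iterator = iter(animals)   # ask the iterable for an iterator
first = next(iterator)     # request the next member
second = next(iterator)
# The iterator is now exhausted; the second argument is returned as a default
third = next(iterator, "no more members")

print(first)   # cat
print(second)  # dog
print(third)   # no more members
```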

for Statements Examples


>>> for char in "Hello": # you can iterate over strings (sequences of characters)
...     print(char)
... 

H
e
l
l
o

>>> for i in range(10): # range objects are immutable sequences of integers (0–9 here)
...     print(i)
...     if i == 5:
...         break # you can also break out of for loops
... 

0
1
2
3
4
5

>>> animals = ["cat", "dog", "chicken"] # a list of animals
>>> for animal in animals:
...     print(animal)
... 

cat
dog
chicken

>>> for i, animal in enumerate(animals): # use the enumerate() function if you need the index while looping
...     print("animals[{}] contains {}.".format(i, animal))
... 

animals[0] contains cat.
animals[1] contains dog.
animals[2] contains chicken.

Unidiomatic Control Flow


>>> animals = ["cat", "dog", "chicken"] # a list of animals
>>> # this is bad
>>> i = 0
>>> while (i < len(animals)): # len() returns the length (number of elements) of a sequence or a collection
...     print(animals[i])
...     i += 1
... 

cat
dog
chicken

>>> for i in range(len(animals)): # this is equally bad
...     print(animals[i])
... 

cat
dog
chicken

>>> for animal in animals: # the most idiomatic (pythonic) way
...     print(animal)
... 

cat
dog
chicken

break and continue Statements

  • Loop iterations can be skipped using break or continue statements
    • Can only occur in for and while loops
    • break → terminates the nearest enclosing loop
    • continue → continues with the next cycle of the nearest enclosing loop (terminates the current cycle only)

>>> for i in range(1, 10):
...     if not (i % 3):
...         continue # only iterations involving multiples of 3 are skipped
...     print(i)
...      

1
2
4
5
7
8

>>> for i in range(1, 10):
...     if not (i % 3):
...         break # as soon as the 1st multiple of 3 is encountered, all remaining iterations are skipped
...     print(i)
... 

1
2

Functions

Functions and Methods

  • Functions are reusable blocks of statements that can be executed (called) any number of times
  • Functions are one of the fundamental and most important building blocks in programming
  • Methods are functions associated with objects of a specific type
  • Benefits
  • Examples of functions used so far: print(), input(), len(), enumerate(), ...
  • Examples of methods used so far: format(), upper(), isnumeric(), isalpha() (all defined for str objects), ...

Defining Functions

Syntax


      def function_name(param_1, param_2, ...):
          statement_1
          statement_2
          ...
      
  • Functions are defined through def statements
  • Function definition statements are compound statements that contain
    • The function's signature
      • The function's name
      • A pair of parentheses (), potentially enclosing a sequence of comma-separated parameters (inputs used by the function)
    • The function's body: an indented block of statements, executed when the function is called
  • Executing the statement binds the function name in the current local namespace to a function object (the executable code)

Calling Functions

Syntax


      function_name(arg_1, arg_2, ...)
      
  • Functions are executed through function calls
  • Function calls are executed as follows
    • The arguments are evaluated
    • A new namespace is created (more on those in a bit)
    • The formal parameters (in the function definition) are bound to the corresponding arguments in the new namespace
    • The function's body is executed
    • The namespace is discarded
    • The function call is replaced by a value (cf. next slide)

>>> def say_hi(): # function definition with no parameters
...     print("Hi!")
... 
>>> say_hi()  

Hi!

>>> def add(a, b): # function definition with two parameters a and b
...     print(a + b)
... 
>>> add(5, 10) # function call: a is bound to 5, b to 10, and the body is executed

15
  

return Statements

Syntax (inside function definition only)


      return expression_list
      

Example


>>> def add(a, b):
...     return a + b
... 
>>> a = add(7, 10) # a is not the parameter
>>> b = add(20, 5) # same for b
>>> c = add(a, b)
>>> print(c)

42
  • Functions can return values that can be used later on
  • This is done by using return statements
  • When a return statement is encountered
    • The function's execution is terminated
    • The function call evaluates to the value of the expression list
  • If the function does not contain a return statement, then its value is None
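
The last point is a common source of confusion: printing a value is not the same as returning it. A minimal sketch (the function names print_sum and return_sum are hypothetical):

```python
def print_sum(a, b):
    print(a + b)  # displays the sum but does not return it

def return_sum(a, b):
    return a + b  # hands the sum back to the caller

printed = print_sum(1, 2)    # prints 3
returned = return_sum(1, 2)  # prints nothing

print(printed)   # None
print(returned)  # 3
```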

Namespaces

  • A namespace is a collection of mappings between names and the objects they are bound to
  • Multiple namespaces can co-exist at the same time
  • Namespaces are isolated: the same name can exist in different namespaces with different bindings
  • Python has three namespace “types”
    • Built-in namespace
      • Created when the Python interpreter starts up
      • Contains all built-in names and exceptions
    • Global namespace
      • Created when the execution of a program starts
      • Lasts until the interpreter quits
    • Local (function) namespace
      • Created when a function is called
      • Deleted when the function finishes executing

Scopes

  • A scope is the part of the code where a namespace is directly accessible
  • To resolve a name, Python looks for it in the different namespaces in the following order
    1. Local namespace (if the name reference is in a function)
    2. Enclosing namespaces (from inner-most to outer-most)
    3. Global namespace
    4. Built-in namespace
  • global and nonlocal statements change how a name is resolved
    • global → direct reference to global namespace
    • nonlocal → direct reference to nearest enclosing namespace

Scopes and Namespaces

Code


l = [1, 2, 3]

def outer_func():
    l = [4, 5, 6]

    def inner_func_a():
        l = [7, 8, 9]
        print("Inside inner_func_a:", l)

    def inner_func_b():
        global l # will be looked up in top-most scope -> global
        print("Inside inner_func_b:", l)

    def inner_func_c():
        nonlocal l  # looked up in nearest enclosing scope -> outer_func()
        print("Inside inner_func_c:", l)

    def inner_func_d():
        nonlocal l
        l = [10, 11, 12] # modify binding
        print("Inside inner_func_d:", l)

    def inner_func_e():
        nonlocal l
        l.append(42) # modify list
        print("Inside inner_func_e:", l)

    print("Before inner_func_* calls:", l)
    inner_func_a()
    print("After inner_func_a call:", l)
    inner_func_b()
    print("After inner_func_b call:", l)
    inner_func_c()
    print("After inner_func_c call:", l)
    inner_func_d()
    print("After inner_func_d call:", l)
    inner_func_e()
    print("After inner_func_e call:", l)

print("Before outer_func call:", l)
outer_func()
print("After outer_func call:", l)

Output


Before outer_func call: [1, 2, 3]

Before inner_func_* calls: [4, 5, 6]

Inside inner_func_a: [7, 8, 9]

After inner_func_a call: [4, 5, 6]

Inside inner_func_b: [1, 2, 3]

After inner_func_b call: [4, 5, 6]

Inside inner_func_c: [4, 5, 6]

After inner_func_c call: [4, 5, 6]

Inside inner_func_d: [10, 11, 12]

After inner_func_d call: [10, 11, 12]

Inside inner_func_e: [10, 11, 12, 42]

After inner_func_e call: [10, 11, 12, 42]

After outer_func call: [1, 2, 3]

Passing Arguments

  • Arguments can be passed to function calls in two ways
    • As positional arguments: parameters are bound to arguments in order of appearance (from left to right)
      function_name(arg_1, arg_2, ...)
    • As keyword arguments: parameters are bound to the arguments they are associated with in the call (order ignored)
      function_name(param_1=arg_1, param_2=arg_2, ...)
  • The number of arguments in the call must be the same as the number of parameters in the function's definition, unless the latter contains default parameter values (presented next)

Passing Arguments


>>> def add(a, b):
...     return a + b
... 
>>> add(1, 2) # 1 and 2 are positional args, params are bound in order: a -> 1, b -> 2

3

>>> add(b=3, a=4) # 3 and 4 are keyword args, bindings are based on name associations, not order: a -> 4, b -> 3

7

>>> add(4, b=10) # positional and keyword args can be mixed: a -> 4 (based on order), b -> 10 (based on name)

14

>>> add(a=1, 3) # positional args must appear first (once a kwarg appears, all those that follow must be kwargs)

  File "<stdin>", line 1
SyntaxError: positional argument follows keyword argument

>>> add(*[3, 5]) # (advanced) lists can be "unpacked" into positional args (equivalent to add(3, 5))

8

>>> add(**{ "b": 7, "a": 4}) # (advanced) dicts can be unpacked into kwargs (equivalent to add(a=4, b=7))

11

Passing Arguments

  • Modifications to a mutable argument inside the function will be visible to the caller
  • Rebindings inside the function will not change those of the caller

Passing Arguments


>>> def add_animal(animal, animal_list):
...     animal_list.append(animal) # add new animal to list
...     return animal_list
... 
>>> def reassign_animals(animal_list):
...     animal_list = ["dove", "sparrow", "eagle"] # rebind the name to a new list
...     return animal_list
... 
>>> animals = ["cat", "dog", "chicken"]
>>> 
>>> reassigned_animals = reassign_animals(animals)
>>> 
>>> print("animals:", animals) # animals is still bound to the same list

animals: ['cat', 'dog', 'chicken']

>>> print("reassigned_animals:", reassigned_animals)
reassigned_animals: ['dove', 'sparrow', 'eagle']

>>> new_animals = add_animal("elephant", animals)
>>> 
>>> print("animals:", animals) # animals did get altered
animals: ['cat', 'dog', 'chicken', 'elephant']

>>> print("new_animals:", new_animals)
new_animals: ['cat', 'dog', 'chicken', 'elephant']

Default Parameter Values

  • Function definitions can assign default values to parameters
      def function_name(param_1, ..., param_i=expression_i, ..., param_n=expression_n):
          function_body
  • If a parameter has a default value, the corresponding argument can be omitted in function calls
    • The default value is used in this case

Default Parameter Values


>>> def add(a=1, b): # parameters without default values must precede those that have defaults
...     return a + b
... 

  File "<stdin>", line 1
SyntaxError: non-default argument follows default argument

>>> def add(a, b=1): # b has a default value of 1
...     return a + b
... 
>>> add(5) # equivalent to add(5, 1)

6

Default Parameter Values Pitfall

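  • Default values are evaluated only once, when the function is defined, and are not re-evaluated each time the function is called
  • Mutable defaults (e.g., lists) that are modified during one call stay modified for subsequent calls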

The following code...


def append_to(element, to=[]):
    to.append(element)
    return to

my_list = append_to(12)
print(my_list)

my_other_list = append_to(42)
print(my_other_list)

... outputs


[12]
[12, 42]
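
A common way to avoid this pitfall is to use None as the default and create the list inside the function. The sketch below rewrites the append_to example along those lines; it is one possible fix, not the only one.

```python
def append_to(element, to=None):
    if to is None:
        to = []  # a fresh list is created on every call that omits "to"
    to.append(element)
    return to

my_list = append_to(12)
my_other_list = append_to(42)

print(my_list)        # [12]
print(my_other_list)  # [42]
```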

Recursion

  • A recursive function is a function that contains calls to itself (within its body)
  • A recursive function has
    • One or more base cases → the result is produced directly
    • One or more recursive cases → the function calls itself (recurs)

Example*


>>> def factorial(number):
...     if number > 0: # recursive case
...         return number * factorial(number - 1)
...     else: # trivial case (number == 0)
...         return 1
... 
>>> factorial(0) # solved directly

1

>>> factorial(3) # factorial(3) -> calls factorial(2) -> calls factorial(1) -> calls factorial(0)

6

* Something is fishy in this example. Can you point it out?

Function Annotations and Type Hinting

  • Function definitions can include annotations that indicate parameter and return type hints (Python 3.5+)
  • The Python runtime does not enforce function and variable type annotations
  • But they can be used by third-party tools (IDEs, linters, type checkers, ...)

Syntax


      def function_name(param_1: type_1, param_2: type_2, ...) -> return_type:
          function_body
      

Example


>>> def add(a: int, b: int = 1) -> int:
...     return a + b
... 
>>> add(1, 2)

3

>>> add("Hello", "world") # this does not generate an error even though strings are used!

'Helloworld'
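
Annotations are simply stored on the function object, which is why they have no effect at runtime. As a sketch, they can be inspected through the function's __annotations__ attribute:

```python
def add(a: int, b: int = 1) -> int:
    return a + b

annotations = add.__annotations__  # a dict mapping names to the annotated types
print(annotations)
# {'a': <class 'int'>, 'b': <class 'int'>, 'return': <class 'int'>}

concatenated = add("Hello", "world")  # not rejected: hints are not enforced
print(concatenated)  # Helloworld
```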

Documenting Functions (Docstrings)

Syntax


      def function_name(...):
          """Function documentation (docstring)"""
          rest_of_function_body
      
  • The very first statement of a function's body can be a string literal that documents the function (docstring)
    • Can be accessed using the help() function
    • Can be used by tools such as Sphinx to generate documentation

>>> def add(a: int, b: int = 1) -> int:
...     """Add two integers.
... 
...     Args:
...         a (int): The first integer.
...         b (int, optional): The second integer. Defaults to 1.
... 
...     Returns:
...         int: The sum of the two integers.
...     """
...     return a + b
... 
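
Besides help(), the docstring is also available as the function's __doc__ attribute. A minimal sketch, using a shortened version of the add function:

```python
def add(a, b=1):
    """Add two integers.

    Returns:
        int: The sum of the two integers.
    """
    return a + b

summary = add.__doc__.splitlines()[0]  # first line of the docstring
print(summary)  # Add two integers.
```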

Errors and Exceptions

Errors and Exceptions

  • Syntax errors occur when statements or expressions are not syntactically correct and can't be understood by the interpreter (e.g., a : is missing, a statement is wrongly indented, ...)
  • Exceptions are errors that occur during code execution (e.g., attempt to divide by zero)

>>> for i in [1 , 2, 3] print(i)

  File "<stdin>", line 1
    for i in [1 , 2, 3] print(i)
                            ^
SyntaxError: invalid syntax

>>> x = 3 / 0

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

Raising Exceptions

  • raise statements can be used to raise exceptions so that they can be handled by the surrounding code (cf. next slide)
    • Python provides built-in exception types for handling common errors (arithmetic errors, type errors, ...)
    • Unhandled exceptions immediately terminate the program

>>> def add(a: int, b: int = 1) -> int:
...     """Add two integers.
... 
...     Args:
...         a (int): The first integer.
...         b (int, optional): The second integer. Defaults to 1.
... 
...     Returns:
...         int: The sum of the two integers.
...     """
...     if not (isinstance(a, int) and isinstance(b, int)):
...         raise TypeError("Arguments must be of type int") # stops the function's execution
...     return a + b # only executed if no exceptions are raised
... 

Handling Exceptions

Syntax


try:
    try_body
except (ErrorType1, ErrorType2, ...):
    except_body
      
  • Exceptions can be handled using try (compound) statements
  • The statement is executed as follows
    • The try clause is executed
    • If no exception occurs, the except clause is skipped
    • If an exception occurs
      • If its type matches one of those after the except keyword
        • The except clause is executed
        • Execution continues after the try statement
      • Otherwise, it's an unhandled exception → the execution stops

>>> try:
...     add("toto", 1)
... except TypeError:
...     print("Oops! Something went wrong!!!")

Oops! Something went wrong!!!
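
The steps listed above can be sketched with a handler that matches several exception types at once. The safe_divide function is a hypothetical helper written for this illustration, and the "as error" binding (which gives access to the exception object) is standard Python that the slides do not cover.

```python
def safe_divide(a, b):
    """Return a / b, or None if the division cannot be performed."""
    try:
        return a / b
    except (ZeroDivisionError, TypeError) as error:
        # Reached only if the try clause raised one of the listed types.
        print("Could not divide:", type(error).__name__)
        return None


print(safe_divide(6, 3))    # 2.0 (no exception: the except clause is skipped)
print(safe_divide(1, 0))    # None (ZeroDivisionError was handled)
print(safe_divide("6", 3))  # None (TypeError was handled)
```

An exception of any other type would not match the except clause and would propagate to the caller.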

Data Structures

Data Structures

  • Data structures are collections of related data (objects)
  • Python offers 4 built-in data structures
    • Lists (list): mutable sequences of objects
    • Tuples (tuple): immutable sequences of objects
    • Sets (set): collections of unique objects
    • Dictionaries (dict): mappings between key-value pairs

Lists

  • Lists (built-in type list) are sequences of objects
  • Properties
    • Ordered
    • Iterable
    • Mutable
    • Can contain objects of different types
    • Subscriptable: an item of the list can be selected (l[i])
    • Support slicing, i.e., selection of a range of objects (l[start:end:step])

>>> animals = ["cat", "dog", "chicken"]
>>> mixed_list = [1, "pen", True, [1., 2.5, 3+4j]]
>>> print(animals)

['cat', 'dog', 'chicken']

>>> print(mixed_list)
[1, 'pen', True, [1.0, 2.5, (3+4j)]]

Constructing Lists

  • Lists can be constructed in several ways
    • A pair of square brackets denotes an empty list: []
    • Using square brackets with comma-separated values inside: [item_1, item_2, ...]
    • Using list comprehensions (presented later)
    • Using the type constructor: list() (for an empty list) or list(iterable)
    • Using list concatenation: list_1 + list_2
    • Using list replication: n * l (concatenates n copies of the list l)

Constructing Lists


>>> empty_list = []
>>> print(empty_list)
      
[]

>>> print(len(empty_list)) # print the length (number of members) of the list

0

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> alphabet_list = list(alphabet) # using the type constructor list() applied to an iterable
>>> print(alphabet_list)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

>>> wild_animals = ["lion", "giraffe", "elephant"]
>>> domestic_animals = ["cat", "dog", "chicken"]
>>> animals = wild_animals + domestic_animals # list concatenation
>>> print(animals)
['lion', 'giraffe', 'elephant', 'cat', 'dog', 'chicken']

>>> 3 * [1, 2, 3] # concatenate the list with itself three times

[1, 2, 3, 1, 2, 3, 1, 2, 3]
  

Accessing List Content

  • Lists can be iterated over using for loops
  • Lists can be indexed: l[i]
    • Positions start at 0
    • Negative indexes indicate reverse order (i.e., going back from the end of the list)
  • Lists support slicing: l[start:end:step]
    • start is inclusive
    • end is exclusive
    • Elements of the slice can be omitted
      • Defaults: start = 0; end = len(l); step = 1
    • Slicing produces a new list (based on the original list)

Accessing List Content


>>> animals = ["lion", "giraffe", "elephant", "cat", "dog", "chicken"]
>>> print(animals[0]) # remember: positions start at 0
    
lion

>>> print(animals[-3]) # third member from the end of the list

cat

>>> print(animals[:3]) # slice of the three first members of the list

['lion', 'giraffe', 'elephant']

>>> domestic_animals = animals[3:] # create a new list based on members from 4th pos. -> end of list
>>> print(domestic_animals)

['cat', 'dog', 'chicken']

>>> print(animals[1::2]) # select every other animal, starting from the 2nd item

['giraffe', 'cat', 'chicken']

>>> domestic_animals[1] = "sheep" # you can use indexes to modify individual list members
>>> print(domestic_animals)
['cat', 'sheep', 'chicken']

>>> print(animals) # the original list from which domestic_animals was created was not modified
['lion', 'giraffe', 'elephant', 'cat', 'dog', 'chicken']

>>> animals[:4:2] = ["tiger", "zebra"] # slicing can also be used to modify multiple items simultaneously
>>> print(animals)

  ['tiger', 'giraffe', 'zebra', 'cat', 'dog', 'chicken']
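
A short complementary sketch of the slicing rules, using an illustrative list of letters: omitted slice elements fall back to their defaults, negative values count from the end, and a slice is always a new list.

```python
letters = ["a", "b", "c", "d", "e", "f"]

# Omitting every element is the same as spelling out the defaults.
full_copy = letters[:]
print(full_copy == letters[0:len(letters):1])  # True

last_two = letters[-2:]          # negative start: count from the end
print(last_two)                  # ['e', 'f']

# With a negative step the list is walked backwards
# (the omitted start and end then default to the end and the beginning).
reversed_copy = letters[::-1]
print(reversed_copy)             # ['f', 'e', 'd', 'c', 'b', 'a']

# Modifying the copy leaves the original list untouched.
full_copy[0] = "z"
print(letters[0], full_copy[0])  # a z
```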
  

Common List Methods and Operations*

  • len(l): length (number of members) of the list l
  • x in l: check if object x is a member of the list l
    • Evaluates to True if a member of l equals x, False otherwise
    • Negation: x not in l (check that x is not a member of l)
  • l.append(x): insert object x at the end of the list l
  • l.insert(i, x): insert object x at position i of the list l
  • l.count(x): number of occurrences of object x in the list l
  • l.remove(x): remove the first occurrence of x in the list l
  • l.pop(i): pop (remove) the member at position i and return it
  • l.clear(): empty the list l
  • l.sort(): sort the list l (in-place)
  • l.reverse(): invert the list l (in-place)
  • max(l), min(l): largest / smallest item in the list

* The supported sequence operations and their priorities are described in the official Python documentation.

Common List Methods and Operations


>>> numbers = [2, 4, 27, 1, -5, 11, 3]
>>> len(numbers)
  
7

>>> 25 not in numbers # membership test

True

>>> 3 in numbers

True

>>> numbers.append(27) # append 27 at the end of the list
>>> print(numbers)

[2, 4, 27, 1, -5, 11, 3, 27]

>>> numbers.insert(3, 42)
>>> print(numbers)

[2, 4, 27, 42, 1, -5, 11, 3, 27]

>>> numbers.count(27)
2

>>> numbers.remove(27) # remove the first occurrence of 27 from the list
>>> print(numbers)
[2, 4, 42, 1, -5, 11, 3, 27]

>>> numbers.remove(33) # trying to remove an item that doesn't exist raises an exception

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list

>>> numbers.sort(reverse=True) # sort the list in reverse order (the list is directly modified)
>>> print(numbers)

[42, 27, 11, 4, 3, 2, 1, -5]

>>> n = numbers.pop(3) # pop the 4th item from the list and bind the name n to it
>>> print(numbers)

[42, 27, 11, 3, 2, 1, -5]

>>> print(n)

4
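
One practical consequence of in-place methods, sketched below: l.sort() and l.reverse() modify the list and return None, so their result should not be assigned. The built-in sorted() function, which is not covered in the slides, is shown as one alternative that returns a new list.

```python
numbers = [3, 1, 2]

result = numbers.sort()     # sorts the list in place...
print(result)               # None (...and returns nothing useful)
print(numbers)              # [1, 2, 3]

original = [3, 1, 2]
ordered = sorted(original)  # builds a new sorted list instead
print(original)             # [3, 1, 2]
print(ordered)              # [1, 2, 3]
```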

List Comprehensions

  • List comprehensions provide a concise syntax for creating lists based on other sequences or iterables
    • For each object in the iterable
      • element is bound to the object
      • expression is evaluated
    • The result is a list of all the values of the expression

Syntax


      [expression for element in iterable]
      

Example


>>> numbers_squared = [n**2 for n in range(1, 11)]
>>> print(numbers_squared)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

>>> # equivalent to
>>> numbers_squared = []
>>> for n in range(1, 11):
...     numbers_squared.append(n**2)
... 
>>> print(numbers_squared)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Conditional List Comprehensions

  • List comprehensions can include if clauses to filter expressions based on some condition

Syntax


      [expression for element in iterable if condition]
      

Example


>>> odd_numbers = [n for n in range(10) if (n % 2)]
>>> print(odd_numbers)

[1, 3, 5, 7, 9]

>>> # equivalent to
>>> odd_numbers = []
>>> for n in range(10):
...     if (n % 2):
...         odd_numbers.append(n)
... 
>>> print(odd_numbers)

[1, 3, 5, 7, 9]

Nested List Comprehensions

  • List comprehensions can also involve multiple for clauses (the leftmost for is the outermost loop)

Syntax


      [expression for x in iterable_1 for y in iterable_2]
      

Example


>>> numbers = [2 * x + y for x in range(3) for y in range(0, 50, 10)]
>>> print(numbers)

[0, 10, 20, 30, 40, 2, 12, 22, 32, 42, 4, 14, 24, 34, 44]

>>> # equivalent to
>>> numbers = []
>>> for x in range(3):
...     for y in range(0, 50, 10):
...         numbers.append(2 * x + y)
... 
>>> print(numbers)

[0, 10, 20, 30, 40, 2, 12, 22, 32, 42, 4, 14, 24, 34, 44]
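
As a further sketch of nested comprehensions, the example below flattens a list of lists and then combines two for clauses with an if clause. The matrix and pairs names are illustrative only.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# The first for clause is the outer loop, the second one the inner loop.
flat = [value for row in matrix for value in row]
print(flat)   # [1, 2, 3, 4, 5, 6]

# An if clause placed after the for clauses filters the combinations.
pairs = [(x, y) for x in range(3) for y in range(3) if x < y]
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
```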

Tuples

  • Tuples (built-in type tuple) are sequences used typically to store collections of heterogeneous data
  • Properties
    • Ordered
    • Iterable
    • Immutable (unlike lists)
    • Can contain objects of different types
    • Subscriptable
    • Support slicing

>>> animals = ("cat", "dog", "chicken")
>>> mixed_tuple = (1, "pen", True, [1., 2.5, 3+4j])
>>> print(animals)

('cat', 'dog', 'chicken')

>>> print(mixed_tuple)
(1, 'pen', True, [1.0, 2.5, (3+4j)])

Constructing Tuples

  • Tuples can be constructed in several ways
    • A pair of empty parentheses denotes an empty tuple: ()
    • Using a trailing comma for a singleton tuple: (a,) or a,
    • Separating items with commas: (a, b, c) or a, b, c
    • Using the type constructor: tuple() (for an empty tuple) or tuple(iterable)
    • Using tuple concatenation: tuple_1 + tuple_2
    • Using tuple replication: n * t

Constructing Tuples


>>> t = () # an empty tuple, equivalent to using the type constructor tuple()
>>> print(t)

()

>>> t = 1, # singleton tuple
>>> type(t)

<class 'tuple'>

>>> print(t)

(1,)

>>> t = (1, 2, 3) # same as t = 1, 2, 3
>>> print(t)
(1, 2, 3)

>>> t = (1, 2, 3) + (4, 5, 6) # tuple concatenation
>>> print(t)

(1, 2, 3, 4, 5, 6)


>>> t = 3 * (1, 2, 3) # tuple replication (concatenate tuple with itself multiple times)
>>> print(t)

(1, 2, 3, 1, 2, 3, 1, 2, 3)

>>> t = tuple(3*[[]]) # a tuple from a list of "three" empty lists
>>> print(t)

([], [], [])

>>> t[0].append("cat") # a little brain teaser: what is happening here???
>>> print(t)

(['cat'], ['cat'], ['cat'])
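
One way to read the brain teaser above (a sketch, not the only possible explanation): replication copies references, not objects, so the three slots of the tuple refer to one and the same list. Building the inner lists with a comprehension is one possible way to get independent lists.

```python
shared = tuple(3 * [[]])
# All three positions are bound to the very same list object.
print(shared[0] is shared[1] is shared[2])  # True

shared[0].append("cat")
print(shared)       # (['cat'], ['cat'], ['cat'])

# Here [] is evaluated once per iteration, producing three distinct lists.
independent = tuple([[] for _ in range(3)])
independent[0].append("cat")
print(independent)  # (['cat'], [], [])
```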

Immutability, Revisited...


>>> t = ("John", 23, ["Pasta", "Pizza", "Tiramisu"]) # name, age, and list of favorite dishes
>>> t[0] = "Jane" # The name (immutable str) can't be changed
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

>>> t[2] = ["Fondue", "Raclette", "Tartiflette"] # the binding to the list can't be changed...

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

>>> t[2][0] = "Raclette" # ... but the list itself (mutable) can!
>>> t[2].append("Fondue")
>>> print(t)
('John', 23, ['Raclette', 'Pizza', 'Tiramisu', 'Fondue'])

Tuple Operations

  • len(t): length (number of items) of the tuple t
  • x in t: True if a member of t equals x, False otherwise
  • x not in t: False if a member of t equals x, True otherwise
  • t.count(x): number of occurrences of x in the tuple t
  • t.index(x): index of the first occurrence of x in the tuple t
  • max(t): largest item in the tuple t
  • min(t): smallest item in the tuple t

Tuple Operations


>>> numbers = (1, 5, 2, 7, 8, 11, -3, 7, 42)
>>> print(len(numbers))

9

>>> print(numbers[3::2]) # tuples are subscriptable and support slicing

(7, 11, 7)

>>> (5 in numbers) and (12 not in numbers) # membership tests

True

>>> numbers.count(7)

2

>>> numbers.index(7) # index of first occurrence of 7 in the tuple

3

>>> numbers.index(7, 5) # index of first occurrence of 7 starting from index 5

7

>>> print("Smallest number: {}. Largest number: {}.".format(min(numbers), max(numbers)))

Smallest number: -3. Largest number: 42.

>>> t = (i for i in range(10) if (i % 2)) # There are no tuple comprehensions!!!
>>> type(t)

<class 'generator'>
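
If a tuple is what is needed, a common approach is to pass the generator expression to the tuple() constructor, as in the minimal sketch below. It also shows that a generator can be consumed only once (generators themselves are outside the scope of these slides).

```python
# Passing the generator expression to tuple() builds an actual tuple.
odd_numbers = tuple(i for i in range(10) if i % 2)
print(odd_numbers)         # (1, 3, 5, 7, 9)

# Unlike a tuple, a generator is exhausted after one pass.
gen = (i for i in range(3))
first_pass = list(gen)
second_pass = list(gen)
print(first_pass)          # [0, 1, 2]
print(second_pass)         # []
```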

Enumerating Iterables

  • The enumerate() function builds a new iterable from another iterable (passed as an argument)
    • Each object of the new iterable is a 2-tuple containing
      • The index in the original iterable...
      • ... and the corresponding object
  • Used when the index is needed while iterating over an iterable
    • Usually, the 2-tuple is unpacked
    • Better than unidiomatic alternatives that rely on additional state variables (e.g., a manually incremented counter)

>>> animals = ["cat", "dog", "chicken"]
>>> for t in enumerate(animals):
...     print(t)
... 

(0, 'cat')
(1, 'dog')
(2, 'chicken')

>>> for i, animal in enumerate(animals):
...     print("The animal at position {} is {}".format(i, animal))
... 
The animal at position 0 is cat
The animal at position 1 is dog
The animal at position 2 is chicken
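
For comparison, the sketch below places the manual-counter version next to enumerate(), and also shows the optional start argument of enumerate(), which is not mentioned in the slides.

```python
animals = ["cat", "dog", "chicken"]

# Unidiomatic: the index is maintained by hand in an extra variable.
i = 0
manual = []
for animal in animals:
    manual.append((i, animal))
    i += 1

# Idiomatic: enumerate() yields exactly the same (index, object) pairs.
print(manual == list(enumerate(animals)))  # True

# start changes the value of the first index (useful for 1-based ranks).
ranked = list(enumerate(animals, start=1))
print(ranked)  # [(1, 'cat'), (2, 'dog'), (3, 'chicken')]
```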

Sets

  • Sets (built-in type set) are unordered collections of unique hashable objects
  • Sets are very helpful when mathematical set operations (intersection, union, ...) are involved
  • Properties
    • Unordered
    • Iterable
    • Mutable (for immutable sets, use the frozenset built-in type)
    • Can contain objects of different types
    • Can only contain objects that are hashable
    • Not subscriptable
    • Do not support slicing

Constructing Sets

  • Sets can be constructed in different ways
    • Using curly brackets with comma-separated values inside: {item_1, item_2, ...}
    • Using the type constructor: set() (for an empty set) or set(iterable)
    • Using set comprehensions

Constructing Sets


>>> type({}) # this is an empty dict (presented next), not an empty set (use set() instead)

<class 'dict'>

>>> s = {"Hello", 1, 3.14, 5+4j, (1, 2)} # type mixing is allowed...
>>> print(s)

{'Hello', 1, (1, 2), 3.14, (5+4j)}

>>> s = { "Hello", 1, 3.14, 5+4j, (1, 2), [1, 4, 5] } # ... as long as you don't use unhashables

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

>>> numbers = {1, 5, 2, 7, -2, 7, 4, 42, -8, 36, 42} # sets can't contain duplicates
>>> print(numbers)

{1, 2, 4, 5, 36, 7, 42, -8, -2}

>>> s = {1, 2, 3} + {4, 5, 6} # sets do not support concatenations (or replication)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'set' and 'set'

>>> s = set(range(10)) # set() can be used to construct sets from iterables
>>> print(s)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
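
Two typical uses of these properties, as a hedged sketch with illustrative data: removing duplicates from a list, and using frozenset (immutable, hence hashable) where a set has to be stored inside another set.

```python
visits = ["cat", "dog", "cat", "chicken", "dog"]

# Converting to a set drops the duplicates (the original order is not kept).
unique_visits = set(visits)
print(len(visits), len(unique_visits))  # 5 3

# A plain set is unhashable, but a frozenset can be a member of a set.
groups = {frozenset({"cat", "dog"}), frozenset({"chicken"})}
print(frozenset({"dog", "cat"}) in groups)  # True
```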

Set Comprehensions

  • Set comprehensions can be used to construct sets efficiently
    • Similar syntax to list comprehensions (using {} instead of [])
    • Nesting and conditional comprehensions are supported
    • Additional constraint: expression must evaluate to a hashable

Syntax


      {expression for element in iterable}
      

Examples


>>> numbers_squared = {n**2 for n in range(1, 11)}
>>> print(numbers_squared)

{64, 1, 4, 36, 100, 9, 16, 49, 81, 25}

>>> s = { (i, j) for i in range(4) for j in range(4) if i < j}
>>> print(s)
{(0, 1), (1, 2), (1, 3), (2, 3), (0, 3), (0, 2)}

Set Operations

  • x in s: True if x is an element of s
  • x not in s: True if x is not an element of s
  • s1 == s2: True if s1 and s2 contain the same elements
  • s1.isdisjoint(s2): True if s1 and s2 have no elements in common
  • s1.issubset(s2), s1 <= s2: tests if s1 is a subset of s2 (use s1.issuperset(s2) or s1 >= s2 for the other way around)
  • s1 < s2: tests if s1 is a proper subset of s2 (use s1 > s2 for the other way around)
  • s1.intersection(s2), s1 & s2: returns a new set containing the elements common to both s1 and s2
  • s1.union(s2), s1 | s2: returns a new set containing all the elements of s1 and s2
  • s1.difference(s2), s1 - s2: returns a new set containing the elements that are in s1 but not in s2
  • s1.symmetric_difference(s2), s1 ^ s2: returns a new set containing the elements that are in either s1 or s2 but not both

Set Operations


>>> s1 = set(range(0, 30, 3))
>>> print(s1)

{0, 3, 6, 9, 12, 15, 18, 21, 24, 27}

>>> s2 = set(range(0, 30, 4))
>>> print(s2)

{0, 4, 8, 12, 16, 20, 24, 28}

>>> (9 in s1) and (12 in s2) # membership testing

True

>>> s1 & s2 # equivalent to s1.intersection(s2)

{0, 24, 12}

>>> s1 | s2 # equivalent to s1.union(s2)

{0, 3, 4, 6, 8, 9, 12, 15, 16, 18, 20, 21, 24, 27, 28}

>>> s1 - s2 # equivalent to s1.difference(s2)

{3, 6, 9, 15, 18, 21, 27}

>>> s2 - s1

{4, 8, 16, 20, 28}

>>> s1 ^ s2 # equivalent to s1.symmetric_difference(s2)

{3, 4, 6, 8, 9, 15, 16, 18, 20, 21, 27, 28}

>>> s1 <= s2 # equivalent to s1.issubset(s2)

False
  
>>> {3, 6, 9} <= s1

True

>>> s1 < s1 # s1 is not a proper subset of itself (since s1 == s1)

False
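
The operators and the named methods are not fully interchangeable. As the sketch below illustrates, the methods accept any iterable as an argument, whereas the operators require both operands to be sets.

```python
s = {1, 2, 3}

# The method versions accept any iterable (list, range, ...).
union_result = s.union([3, 4])
common = s.intersection(range(2, 10))
print(union_result == {1, 2, 3, 4})  # True
print(common == {2, 3})              # True

# The operator versions only work between sets.
try:
    s | [3, 4]
    operator_failed = False
except TypeError:
    operator_failed = True
print(operator_failed)               # True
```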

Modifying Sets

  • Since sets are not subscriptable, elements cannot be modified using index-based assignments (i.e., s[i] = expression)
  • However, since sets are mutable, they provide many element-oriented and set-oriented modification methods
    • s.add(x): adds x to the set s (x must be hashable)
    • s.remove(x): removes x from s (raises a KeyError exception if x is not in s)
    • s.discard(x): removes x from s (does not raise an exception if x is not in s)
    • s.clear(): removes all elements from the set s
    • s.pop(): removes an arbitrary element from the set s and returns it
    • s1 &= s2, s1 |= s2, s1 -= s2, s1 ^= s2: update s1 in place with the result of the corresponding set operation (intersection, union, difference, and symmetric difference, respectively) applied to s1 and s2

Modifying Sets


>>> s = {5, 2, 42, -7, 3, -16}
>>> s.add(9)
>>> print(s)

{2, 3, 5, 9, 42, -16, -7}

>>> s.remove(3)
>>> print(s)

{2, 5, 9, 42, -16, -7}

>>> s.remove(101) # attempting to remove an element that doesn't exist raises an error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 101

>>> n = s.pop()
>>> print(s, ",", n)

{5, 9, 42, -16, -7} , 2

>>> s -= {5, 4, -7}
>>> print(s)

{9, 42, -16}

>>> s |= {3, 12} | {45, 9} ^ {3, 9}
>>> print(s)

{3, 9, 42, 12, 45, -16}

>>> s.clear()
>>> print(s)

set()
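
The difference between discard() and remove(), and the in-place nature of the augmented assignments, can be sketched as follows (the alias name is only there to show that the same object is updated).

```python
s = {1, 2, 3}

s.discard(2)    # 2 is present: it is removed
s.discard(99)   # 99 is absent: nothing happens, no exception
print(s == {1, 3})  # True

try:
    s.remove(99)  # remove() is strict about missing elements
except KeyError as error:
    print("KeyError:", error)  # KeyError: 99

# Augmented assignments update the existing set object in place.
alias = s
s |= {4, 5}
print(alias == {1, 3, 4, 5})  # True
```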

Dictionaries

  • Dictionaries (built-in type dict) are objects that map hashable values (keys) to arbitrary objects (values)
    • Can be seen as (key, value) pairs
    • Keys are unique (they form a set)
    • Values are accessed through keys instead of integer indices

>>> person = {"name": "John", "age": 24, "profession": "Data Scientist"}
>>> print(person)

{'name': 'John', 'age': 24, 'profession': 'Data Scientist'}

>>> print(person["name"])

John

Dictionaries

  • Dictionaries (built-in type dict) are objects that map hashable values (keys) to arbitrary objects (values)
  • Properties
    • Iterable
    • Mutable
    • Preserve insertion order (Python 3.7+)
    • Can contain keys of different types (must be hashable)
    • Can contain values of different types
    • Subscriptable (indexed by keys)
    • Do not support slicing

Constructing Dictionaries

  • Dictionaries can be created in a multitude of ways
    • A pair of empty curly brackets denotes an empty dictionary: {}
    • Using curly brackets with comma-separated key:value pairs inside: {key_1: value_1, key_2: value_2, ...}
    • Using dictionary comprehensions
    • Using the type constructor dict() (for an empty dictionary), dict(**kwargs), dict(mapping, **kwargs), or dict(iterable, **kwargs)

Constructing Dictionaries


>>> d = {} # an empty dict, equivalent to d = dict()
>>> print(d, ", ", len(d))

{} ,  0

>>> d = { 1: "Hello", (3, 5): 3+4j, "list": [1, 5]} # type mixing is accepted (for keys and values)...
>>> print(d)

{1: 'Hello', (3, 5): (3+4j), 'list': [1, 5]}

>>> d = { 1: "Hello", [3, 4, 2] : (1, 2)} # ... as long as you use hashable keys

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

>>> # The following are all equivalent constructions
>>> d1 = {"name": "John", "age": 24, "profession": "Data Scientist"}
>>> d2 = dict(name="John", age=24, profession="Data Scientist")
>>> d3 = dict([("name", "John"), ("age", 24), ("profession", "Data Scientist")])
>>> d4 = dict(zip(["name", "age", "profession"], ["John", 24, "Data Scientist"]))
>>> d1 == d2 == d3 == d4

True

>>> print(d1)

{'name': 'John', 'age': 24, 'profession': 'Data Scientist'}

Dictionary Comprehensions

  • Comprehensions can be used to construct dictionaries
    • Similar to list and set comprehensions (uses {} like the latter)
    • Nesting and conditional comprehensions are supported
    • Constraint: key_expression must produce a hashable

Syntax


      {key_expression: value_expression for element in iterable}
      

Examples


>>> print({i: i**2 for i in range(11)})

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}

>>> print({i: animal for i, animal in enumerate(["cat", "dog", "chicken"])})

{0: 'cat', 1: 'dog', 2: 'chicken'}

>>> print({ animal: len(animal) for animal in ["cat", "dog", "chicken"] if len(animal) < 5 })

{'cat': 3, 'dog': 3}

Indexing Dictionaries

  • Dictionaries are subscriptable
    • Indexing by key: d[key] (not by numeric index)
      • If key exists in the dictionary → returns the associated value
      • Else → raises a KeyError exception
  • d.get(key, default_value) can also be used to access the dictionary's values
    • If key exists in the dictionary → returns the associated value
    • Else
      • default_value is returned if it is specified
      • None is returned if default_value is omitted
  • Dictionaries are iterable
    • By default, iteration over keys
    • Possible to iterate over values and key-value pairs through views (presented in a bit)

Indexing Dictionaries


>>> d = {"a": 1, "b": 2, "c": 3}
>>> print(d["a"]) # indexing is done by keys

1

>>> print(d["e"]) # using a key that doesn't exist raises an exception

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'e'

>>> print(d.get("e")) # using a nonexistent key with get() doesn't raise an exception

None

>>> print(d.get("e", 0)) # the default value can be customized

0

>>> for i in d: # iterating directly on the dict operates on its keys
...     print(i)
... 

a
b
c
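
A typical use of d.get() with a default value is counting occurrences without first testing whether the key exists. The word list below is illustrative.

```python
words = ["cat", "dog", "cat", "chicken", "cat"]

counts = {}
for word in words:
    # get() returns 0 the first time a word is seen, so no KeyError is raised.
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'cat': 3, 'dog': 1, 'chicken': 1}
```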

Dictionary View Objects

  • Dictionaries provide views on their entries through three methods
    • d.keys() → Collection of keys in the dictionary d
    • d.values() → Collection of values in the dictionary d
    • d.items() → Collection of (key, value) pairs in the dictionary d
  • Dictionary views
    • Are iterable
    • Support membership tests (in and not in operators)
    • Are dynamic: changes to the dictionary are reflected in all its views

Dictionary View Objects


>>> d = {"a": 3, "b": 7, "c": 42, "d": 13}
>>> for key in d.keys():
...     print(key)
... 
  
a
b
c
d
  
>>> for value in d.values():
...     print(value)
... 
  
3
7
42
13
  
>>> for key, value in d.items():
...     print(key, " -> ", value)
... 
  
a  ->  3
b  ->  7
c  ->  42
d  ->  13
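
The dynamic behavior of views can be illustrated with a short sketch: the views below are created once, and later changes to the dictionary are visible through them.

```python
d = {"a": 3, "b": 7}
keys = d.keys()
items = d.items()

d["c"] = 42                 # inserted after the views were created
print(list(keys))           # ['a', 'b', 'c']
print(("c", 42) in items)   # True (views support membership tests)
print(7 in d.values())      # True

del d["a"]
print(list(keys))           # ['b', 'c']
```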
  

Operating on Dictionaries

  • d[key] = value: inserts the (key, value) pair in d (updates d[key] if key already exists)
  • del d[key]: removes key from d (raises a KeyError exception if key is not in d)
  • len(d): length (number of items) of the dictionary d
  • x in d: True if a key of d equals x, False otherwise
  • x not in d: False if a key of d equals x, True otherwise
  • d.pop(key, default): if key is in d, removes it and returns the associated value; otherwise returns default (raises a KeyError exception if default is omitted)
  • d.popitem(): removes and returns a (key, value) pair from d in LIFO (last in, first out) order
  • d.clear(): removes all items from d
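
The operations above can be sketched on a small dictionary (the keys and values are illustrative):

```python
d = {"a": 1, "b": 2}

d["c"] = 3        # insert a new (key, value) pair
d["a"] = 10       # update the value of an existing key
del d["b"]        # remove a key
print(d)          # {'a': 10, 'c': 3}

# Membership tests look at the keys, not at the values.
print("a" in d, 10 in d)      # True False

missing = d.pop("z", None)    # key absent: the default is returned
last_item = d.popitem()       # the most recently inserted pair
print(missing, last_item)     # None ('c', 3)

d.clear()
print(len(d))                 # 0
```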

Argument Unpacking in Function Calls

  • The content of lists, tuples, and dictionaries can be “unpacked” and used as arguments in function calls
    • Lists and tuples can be unpacked as positional
      arguments: func(*args)
    • Dictionaries can be unpacked as keyword
      arguments: func(**kwargs)

>>> def add(a, b):
...     return a + b
... 
>>> l = [1, 2]
>>> d = {"a": 3, "b": 4}

>>> add(*l) # equivalent to add(l[0], l[1])

3

>>> add(**d) # equivalent to add(a=d["a"], b=d["b"])

7
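
A slightly larger sketch of argument unpacking, assuming a hypothetical describe function: a dictionary supplies keyword arguments, a tuple supplies the positional arguments of the built-in range(), and both forms can be combined in a single call.

```python
def describe(name, age, profession):
    return "{} ({}) works as a {}".format(name, age, profession)


person = {"name": "John", "age": 24, "profession": "Data Scientist"}
print(describe(**person))      # John (24) works as a Data Scientist

bounds = (0, 10, 2)
evens = list(range(*bounds))   # same as range(0, 10, 2)
print(evens)                   # [0, 2, 4, 6, 8]

# Positional and keyword unpacking can be mixed in the same call.
mixed = describe(*["Jane", 30], **{"profession": "Engineer"})
print(mixed)                   # Jane (30) works as a Engineer
```

The dictionary keys must match the parameter names of the called function; otherwise the call raises a TypeError.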
  

Where to Go from Here

  • You are not Python experts yet! Your journey is just beginning...
  • Many topics remain to be explored
    • Packaging Python code
    • Context managers
    • Generators
    • Decorators
    • Object oriented programming
    • Data model
  • Interesting readings
  • If books are not your thing, check Corey Schafer's YouTube channel
  • Equip yourselves with the right tools
    • A full-fledged IDE (e.g., Visual Studio Code or PyCharm)
    • Code linters and formatters
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0)