Python for Data Science: a Crash Course
Prerequisites
Most of the course is self-contained, but you are expected to be familiar with
mathematical tools associated with an economics curriculum (linear algebra,
calculus, probability, and statistics) at an undergraduate level. The course
does not assume any prior knowledge of programming in general or of Python in
particular. However, familiarity with another programming language can be
useful in understanding the discussed concepts and topics.
Contents
The course is organised as follows.
The first part introduces the fundamentals of Python programming. It covers
topics such as working with basic built-in types (numbers, strings, booleans,
...), control flow statements, writing reusable code (functions), handling
errors and exceptions that can occur during the execution of Python code, and
advanced data structures (lists, sets, dictionaries, ...).
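As a rough illustration of the topics listed above, the following minimal
sketch combines a function, a list, a dictionary, a control flow construct,
and exception handling. The names (mean, prices, labels) and the values are
hypothetical and only serve the example.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


# Built-in types and data structures (illustrative values)
prices = [10.0, 12.5, 11.0]            # list of floats
labels = {"low": 10.0, "high": 12.5}   # dictionary mapping strings to floats

# Control flow: keep only the prices above a threshold
above_threshold = [p for p in prices if p > 10.5]

# Error handling: an empty list raises ValueError, which is caught here
try:
    average = mean([])
except ValueError as error:
    average = None
    message = str(error)
```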
The second part focuses on using NumPy, a scientific computing package that
provides a wide assortment of useful and highly optimized routines for working with
multi-dimensional arrays (matrices, tensors, ...), linear algebra, statistics
and random simulation, and much more.
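To give a flavour of these routines, here is a short sketch that touches on
arrays, linear algebra, statistics, and random simulation. The matrix, the
vector, and the seed are arbitrary choices made for the example.

```python
import numpy as np

# A 2-D array (matrix) and a 1-D array (vector)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Linear algebra: solve the system A @ x = b
x = np.linalg.solve(A, b)

# Statistics: column-wise means of A
column_means = A.mean(axis=0)

# Random simulation: 1,000 draws from a standard normal distribution,
# using a seeded generator so that the draws are reproducible
rng = np.random.default_rng(seed=0)
draws = rng.standard_normal(1000)
```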
The third part of the course is dedicated to pandas, a fundamental Python
package when it comes to data science and data analysis. pandas provides
functionalities for efficient manipulation of data frames, i.e., tabular data
(stored in csv files, Excel sheets, ...). With the help of pandas, you can
easily conduct tasks such as data cleaning (filling missing data, replacing
outliers, ...), reshaping, and merging.
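The tasks mentioned above could look as follows on a toy data set. This is
only a sketch: the data frames, the column names, and the outlier threshold
of 500 are hypothetical, and in practice the data would typically be loaded
from a file, for example with pd.read_csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value and one extreme value
sales = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "revenue": [100.0, np.nan, 120.0, 10_000.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "region": ["North", "North", "South", "South"],
})

# Filling missing data: replace NaN with the median revenue
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].median())

# Replacing outliers: cap values at an arbitrarily chosen upper bound
sales["revenue"] = sales["revenue"].clip(upper=500.0)

# Merging: attach the region of each store
merged = sales.merge(regions, on="store", how="left")

# Aggregating: total revenue per region
revenue_by_region = merged.groupby("region")["revenue"].sum()
```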
The last part of the course is a quick introduction to data visualization
functionalities in Python using the Matplotlib and seaborn packages. Data
visualization is a very powerful tool for making sense of large volumes of
data, identifying patterns, and extracting useful insights that can help
understand and solve real-world business cases.
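As a small taste of what this looks like with Matplotlib, the sketch below
draws a histogram of simulated data. The data, the number of bins, and the
labels are assumptions made for the example; the non-interactive "Agg"
backend is selected only so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Simulated data (illustrative only)
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=50.0, scale=10.0, size=500)

# Histogram with labelled axes and a title
fig, ax = plt.subplots(figsize=(6, 4))
counts, edges, _ = ax.hist(data, bins=20)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of a simulated variable")

# fig.savefig("histogram.png")  # uncomment to write the figure to disk
```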
You will be evaluated based on a team project (conducted in pairs) in which
you will apply the knowledge and skills you acquired during the course. The
project takes the form of an exploratory data analysis in which you will work
on a tabular data set in order to extract valuable insights that can help
solve a business problem.
The expected deliverables of the project are:
- A 5–10 page report;
- The source code (Jupyter notebooks or Python scripts) of your work, either
  in a GitHub repository or as a zip file.
You are expected to present your main findings during a 10-minute
presentation, which will be followed by approximately 5 minutes of
questions.