Python for Data Science: a Crash Course

Prerequisites

Most of the course is self-contained, but you are expected to be familiar with the mathematical tools of an undergraduate economics curriculum (linear algebra, calculus, probability, and statistics). The course does not assume any prior knowledge of programming in general or of Python in particular. However, familiarity with another programming language can help you understand the concepts and topics discussed.

The course is organised as follows.

Introduction to Python Programming

This first part introduces the fundamentals of Python programming. It covers topics such as working with basic built-in types (numbers, strings, booleans, ...), control flow statements, writing reusable code (functions), handling the errors and exceptions that can occur during the execution of Python code, and advanced data structures (lists, sets, dictionaries, ...).
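As a small taste of what this part covers, the sketch below combines a function, a `for` loop, exception handling with `try`/`except`, and three built-in data structures (a list, a set, and a dictionary). The function name `parse_prices` and the sample values are hypothetical and chosen only for illustration; they are not part of the course material.

```python
def parse_prices(raw_values):
    """Convert strings to floats, collecting the entries that cannot be parsed."""
    prices = []       # list: ordered, may contain duplicates
    rejected = set()  # set: keeps each invalid entry only once
    for value in raw_values:
        try:
            prices.append(float(value))
        except ValueError:
            # float() raises ValueError for strings such as "n/a"
            rejected.add(value)
    return prices, rejected


# Hypothetical input: a mix of valid numbers and missing-value markers
raw = ["12.5", "7", "n/a", "3.25", "n/a"]
prices, rejected = parse_prices(raw)

# dict: maps summary names to their values
summary = {
    "count": len(prices),
    "total": sum(prices),
    "rejected": sorted(rejected),
}
print(summary)  # {'count': 3, 'total': 22.75, 'rejected': ['n/a']}
```

Catching only `ValueError` (rather than every exception) is a deliberate choice: unexpected errors, such as passing a non-iterable, still surface instead of being silently ignored.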

Scientific Computing With NumPy

This part focuses on NumPy, a scientific computing package that provides a wide assortment of useful, highly optimized routines for working with multi-dimensional arrays (matrices, tensors, ...), linear algebra, statistics, random simulation, and much more.
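The following minimal sketch touches three of the areas listed above: arrays, linear algebra, and random simulation with vectorized statistics. The matrix, vector, seed, and sample size are arbitrary values assumed for illustration.

```python
import numpy as np

# A 2x2 matrix and a vector stored as NumPy arrays
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Linear algebra: solve the system A @ x = b
x = np.linalg.solve(A, b)

# Random simulation: a seeded generator makes the draws reproducible
rng = np.random.default_rng(seed=42)
draws = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Vectorized statistics: no explicit Python loop over the 10,000 draws
print(x)                          # approximately [0.8 1.4]
print(draws.mean(), draws.std())  # close to 0 and 1, respectively
```

Using `np.linalg.solve` rather than computing the inverse of `A` and multiplying is the usual recommendation, as it is both faster and numerically more stable.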

Processing Tabular Data With pandas

The third part of the course is dedicated to pandas, a fundamental Python package for data science and data analysis. pandas provides functionality for efficiently manipulating data frames, i.e., tabular data (stored in CSV files, Excel sheets, ...). With pandas, you can easily carry out tasks such as data cleaning (filling in missing data, replacing outliers, ...), reshaping, and merging.
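The three tasks mentioned above (cleaning, merging, and reshaping) can be sketched on a tiny data frame. The `sales` and `stores` tables and their columns are hypothetical, and filling a missing value with the store's mean is only one of many possible imputation strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing revenue value
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [100.0, np.nan, 80.0, 95.0],
})

# Hypothetical store metadata to merge in
stores = pd.DataFrame({
    "store": ["A", "B"],
    "city": ["Paris", "Lyon"],
})

# Data cleaning: fill missing revenue with the mean revenue of the same store
# (groupby mean ignores NaN, so store A's mean is computed from 100.0 only)
sales["revenue"] = sales["revenue"].fillna(
    sales.groupby("store")["revenue"].transform("mean")
)

# Merging: attach each store's city (a left join on the "store" column)
merged = sales.merge(stores, on="store", how="left")

# Reshaping: one row per store, one column per month
wide = merged.pivot(index="store", columns="month", values="revenue")
print(wide)
```

`transform("mean")` returns a series aligned with the original rows, which is why it can be passed directly to `fillna`.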

Visualizing Data With Matplotlib and seaborn

The last part of the course is a quick introduction to data visualization in Python using the Matplotlib and seaborn packages. Data visualization is a powerful tool for making sense of large volumes of data, identifying patterns, and extracting useful insights that can help you understand and solve real-world business cases.
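Two plots that typically appear early in an exploratory analysis are a scatter plot (to spot relationships between variables) and a histogram (to inspect a distribution). The sketch below draws both with Matplotlib on simulated data; the noisy linear relationship between `x` and `y` is an assumption made purely for illustration, and the non-interactive `"Agg"` backend is selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Simulated data: a hypothetical noisy linear relationship
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(0, 2.0, size=200)

# One figure with two side-by-side panels
fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: relationship between the two variables
ax_scatter.scatter(x, y, s=10, alpha=0.6)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Scatter plot of y against x")

# Right panel: distribution of y
ax_hist.hist(y, bins=30)
ax_hist.set_xlabel("y")
ax_hist.set_ylabel("count")
ax_hist.set_title("Histogram of y")

fig.tight_layout()
```

In a script, `fig.savefig("figure.png")` writes the figure to disk; in a Jupyter notebook, the figure is displayed inline. seaborn builds on Matplotlib and offers higher-level functions such as `seaborn.scatterplot` and `seaborn.histplot` for the same kinds of plots.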

Evaluation

You will be evaluated based on a team project (conducted in pairs) in which you will apply the knowledge and skills you acquired during the course. The project takes the form of an exploratory data analysis in which you will work on a tabular data set in order to extract valuable insights that can help solve a business problem.

The expected deliverables of the project are:

A 5–10 page report;
The source code (Jupyter notebooks or Python scripts) of your work, either in a GitHub repository or as a zip file.

You are expected to present your main findings in a 10-minute presentation, which will be followed by approximately 5 minutes of questions.