Theodoros Evgeniou, Professor of Decision Sciences and Technology Management

Anton Ovchinnikov, Visiting Professor of Operations

Course Description

The abundance of data is revolutionizing many industries and creating new, data-intensive business models. To take advantage of this trend, today’s MBAs need to be more comfortable with “data science” – an emerging discipline that combines data analytics and business. The goal of this course is to build your capability in data science so that you can add value through the intelligent management and use of data in your organizations.

The course will combine three key elements: analytics techniques, business applications, and basic coding/programming in R, one of the leading open-source tools for analyzing data, which you will be able to use in your jobs. The emphasis will not be on technicalities or theory, but on applications to business cases in finance, marketing, and operations, among other disciplines.

A prerequisite for the course is the material covered in the INSEAD core course Uncertainty, Data & Judgment (UDJ); this course is a follow-up to UDJ. No prior coding experience is required: for most classes you will receive starter code. By running and modifying it, you will learn analytics techniques and coding principles, and you will also be able to reuse it in your jobs. For this reason, much of the course will take the form of a “hands-on” workshop; you will be expected to bring your laptop to class (with all the necessary software tools installed) and to participate actively in the learning process.

What you will take away from this course:

  • Understand key principles and processes for analyzing data and managing analytics projects
  • Learn to better identify new business opportunities for data analytics, and the specific strategies for extracting business value from data
  • Learn several advanced analytics/machine learning techniques: generalized linear models (logistic regression), classification and regression trees (CART), random forests, methods for segmentation and clustering, and neural networks
  • Get an introductory exposure to coding (in R) on which you will be able to build in your jobs
  • Get an introductory understanding of data science, “data scientists”, and how to work with them

The course is built around specific business cases that we will solve in a step-by-step approach, while getting introduced to the topics above.

This course is not designed to turn participants into “data scientists” or even “experts in analytics”. The goal is to familiarize participants with what is available and possible in analytics. It is meant to be a starting point.

Course Tools

The course uses the R programming language, and the main tool is RStudio. All participants are required to install RStudio before the first session.

Since coding is likely new to many of you, if you experience any issues using the class tools, please post them as “Issues” on the course website’s GitHub page, after first checking any related past issues there. While answers will be provided by the course TAs, participation points will be awarded to participants who respond to peers’ issues. What is GitHub? It is a widely used online platform for coding collaboration, and as the course progresses we hope you will find it useful to register there and use it to share your work.

A key lesson of the course is that an important success factor for data analytics projects is to have a good balance between creative customization and codified, reproducible and reusable end-to-end analytics processes (“solutions”). We will develop an example of a codified end-to-end process (“solution”) during the course. More examples of reproducible and reusable analytics solutions can also be found in the Microsoft Azure Machine Learning Studio platform. This space is growing and changing fast, with various cloud-based platforms being developed such as Google Cloud Machine Learning Engine and Amazon’s Artificial Intelligence on AWS, among others.

Cases and Final Project

Three cases will be assigned, to be completed in groups. Each case will be based on a real business application (relating to finance, marketing, and operations, among others). The cases will focus on implementing analytics techniques or models taught in class and on deriving relevant managerial insights, in both cases through a step-by-step solution approach.

A central part of the course is the group final project. For the final project, every group is required to develop a data analytics solution to a business problem, and share the relevant data. The project should include three parts: a clear process for solving the business problem, with steps codified using R code and an interactive toolkit; an application of the process to a specific dataset; and a specification that allows others to use the process, e.g. with different data. The professors will meet with all groups to discuss final project ideas and progress; the TA will be available to help with specific implementation issues.

Tutorials

There will be a teaching assistant for the course, who will run five tutorial sessions throughout the course to help you get comfortable with R, understand and use the R implementations of the machine learning techniques presented in class, and implement the final projects.

Books

There is no required textbook, but these books are recommended as optional background reading:

Data Science for Business: Fundamental Principles of Data Mining and Data-Analytic Thinking (DSB) by F. Provost and T. Fawcett (2013)

An open-source (free) online textbook covering much of the material in the early part of the course is Forecasting: Principles and Practice (FPP). It also uses R for its examples. Browse through the book and don’t hesitate to use it as an extended help file.

Course Outline

“AO” refers to Prof. Anton Ovchinnikov, “TE” refers to Prof. Theos Evgeniou

SESSIONS 1-2 (AO)

Data analytics process; from Excel to R

Tutorial to follow: Getting comfortable with R

SESSIONS 3-4 (AO)

Time Series Models

SESSIONS 5-6 (AO)

Introduction to classification; Logistic regression and machine learning models for classification

Due: Time series assignment case

Tutorial to follow: Midterm R help

SESSIONS 7-8 (TE)

Advanced classification methods; Dimensionality reduction/Principal component analysis

Due: Classification assignment case

SESSIONS 9-10 (TE)

Segmentation and clustering

Tutorial to follow: Q&A on R implementation of three main modules

SESSIONS 11-12 (TE)

Catch-up and wrap-up; Guest speaker

Due: Clustering assignment case

Tutorials to follow: Hands-on help on projects (2 tutorials)

SESSIONS 13-14 (AO+TE)

Project presentations

Due: Final projects