Data Science – Advanced

Course Title Data Science – Advanced
Course number 900-082-EQ
Platform R
Duration 21 hours
Prerequisites Intermediate Data Science course
Target Audience Data Analysts; Computer Analysts; Professionals dealing with small or large amounts of data needing to apply Machine Learning Methods.
Dates March 29, April 4-9-11-16-17-23, 2018
Instructor Diego Perea – Ph.D.
Room BH-210
Schedule Thursday, March 29, then Monday & Wednesday, April 4-23, 6:30 p.m. – 9:30 p.m.; no class on April 2
Gouvernement du Québec fee $42.00
General public fee $338.03

Recommended textbook: An Introduction to Statistical Learning with Applications in R by G. James, D. Witten, R. Tibshirani and T. Hastie.

NB: Certificate provided for all participants who have completed 80% of course hours

Course Description
Please note that this is a non-credit course.
This course covers advanced methods in Machine Learning. The focus is on understanding the methods and applying them to practical data sets. By the end of the course, participants will have a large set of methods at their disposal.
The course methodology is based on lectures led by the instructor, who will present the concepts using examples. Each lecture is followed by a lab using real data, where the participants will complete specific tasks in R designed to reinforce the concepts introduced in the lecture.


Topics Covered in this Course
  1. Introduction and lab setup
  2. Linear model selection and moving beyond linearity methods
  3. Tree-based methods
  4. Support Vector Machines
  5. Unsupervised classification methods
  6. Big data machine learning systems: Spark ML


Weekly Topics
Please note that the instructor reserves the right to modify this schedule.
Week 1 Topics 1 and 2
Week 2 Topics 2 and 3
Week 3 Topics 4 and 5
Week 4 Topic 6



For the course, we will mainly use R, which is widely used for statistical learning and provides functions for most of the methods covered. Other software will also be discussed to give participants a broader view of statistical learning.



In the labs, participants will apply the prediction and classification methods seen in class to practical datasets. We will use datasets from the textbook and from public sources such as: