Data Science – Level 3



Course Title Data Science – Level 3
Course number 900-082-EQ-01
Platform Python, Knime
Duration 21 hours
Prerequisites Data Science – Level 2 course
Target Audience Data analysts; computer analysts; professionals who work with small or large amounts of data and need to apply machine learning methods.
Schedule Monday & Wednesday, 6:30 p.m. to 9:30 p.m.
Dates March 2, 4, 9, 11, 16, 18
Instructor Diego Perea, Ph.D.
Room Brittain – BH-309
Gouvernement du Québec fee $42.00
General public fee $344.79

Recommended textbook: “An Introduction to Statistical Learning: with Applications in R” by G. James, D. Witten, T. Hastie and R. Tibshirani.

NB: A certificate is provided to all participants who have completed at least 80% of the course hours.

Course Description
Please note that this is a non-credit course.
This course covers advanced machine learning methods for data science. By the end of the course, participants will have at their disposal a broad set of methods for regression and for supervised and unsupervised classification problems.

The course is based on instructor-led lectures that present the concepts through examples. Each lecture is followed by a lab using real data, in which participants complete specific tasks in Knime and Python designed to reinforce the concepts introduced in the lecture.

During the course, students will complete a small prediction project using data of their choice, applying the methods learned in class. Examples of previous students' projects are listed below.


Topics Covered in this Course
  1. Regression Review and advanced variable selection methods: Ridge, Lasso.
  2. Supervised and unsupervised classification methods
  3. Forecasting
  4. Big data and commercial machine learning systems
  5. Project presentation


Weekly Topics
Please note that the instructor reserves the right to modify this schedule.
Week 1 Topic 1
Week 2 Topic 2
Week 3 Topic 5
Week 4 Topic 4



For the course, we will mainly use Knime and Python, which are industry-standard tools for statistical learning and provide functions for most of the methods covered. Other software will also be discussed to give participants a holistic view of statistical learning.


In the labs, participants will apply the prediction and classification methods seen in class to practical datasets. We will use datasets from the textbook and from public sources such as: