Data Science – Level 2



Course Title Data Science – 2
Course number 900-081-EQ-02
Platform Knime and Python
Duration 24 hours
Prerequisites Basic understanding of probability and algebra. Basic data analysis and processing in Excel, Tableau or other database systems. Basic linear regression: completion of the Data Science Level 1 course.
Target Audience Data Analysts; Computer Analysts; Individuals in any role dealing with small or large amounts of data needing to model the data to obtain predictions.
Dates January 20, 22, 27, 29; February 3, 5, 10, 12
Instructor Diego Perea  – Ph.D.
Room BH-309
Schedule Monday & Wednesday – 6:30 p.m. – 9:30 p.m.
Gouvernement du Québec fee $48.00
General public fee $400.39

NB: A certificate is provided to all participants who have completed 80% of the course hours.

Recommended textbook
[1] An Introduction to Statistical Learning with Applications in R by G. James, D. Witten, T. Hastie and R. Tibshirani.
Course Description
Please note that this is a non-credit course.
The descriptive analytic methods covered in previous big data courses form the basis for the predictive analytic methods at the heart of this course. Participants will learn the standard statistical methods currently used in industry to perform predictive analytics. These include linear and non-linear regression and several classification methods, such as logistic regression, K-Nearest Neighbors (KNN), decision trees and Support Vector Machines (SVM). Participants will learn how to examine the available data and choose the best predictive method to apply. The key components of the course are understanding these methods, the methodology used to evaluate them, and the criteria for choosing the best one.

The course methodology is based on lectures led by the instructor, who will present the concepts using examples. Each lecture is followed by a lab using real data, where the participants will complete specific tasks in Knime and Python designed to reinforce the concepts introduced in the lecture.

Students will also formulate a small data prediction project, which they will complete in the Data Science 3 course.


Topics Covered in this Course
  1. Introduction to Statistical Learning for Data Science
  2. Basic statistical analysis in Knime and Python: Histograms, box and scatter plots
  3. Linear regression for continuous variables
  4. Regression beyond linearity
  5. Maximum likelihood and the logistic regression classifier; K-Nearest Neighbors (KNN) classifier
  6. Decision trees for regression and classification
  7. Support Vector Machine classifier and practical applications of classification and regression


Weekly Topics
Please note that the instructor reserves the right to modify this schedule.
Week 1 Topics 1 and 2
Introduction, course description and Knime overview.
Basic statistical analysis: Histograms, box and scatter plots.
Week 2 Topics 3 and 4
Regression for continuous variables: linear regression and regression beyond linearity
Week 3 Topics 5 and 6
Supervised classification
Week 4 Topic 7
Support Vector Machines and concluding remarks



For the course, we will mainly use Python, which is widely used in industry for statistical learning and provides functions for most of the methods covered. We will also use Knime, another statistical learning tool, which has a friendly graphical interface.


In the labs, participants will apply the prediction and classification methods seen in class to practical datasets.
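To give a flavor of this kind of task, here is a minimal sketch of a K-Nearest Neighbors classifier evaluated on held-out points, written in plain Python. It is not taken from the course materials: the function names `knn_predict` and `accuracy` are hypothetical, the data points are synthetic and made up for illustration, and an actual lab would more likely rely on a library or on Knime nodes.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, point, k=3):
    """Predict the label of `point` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], point))
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of held-out points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Synthetic toy data (assumption, for illustration only): two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(1.1, 0.9), (5.1, 5.0)]
test_y = ["a", "b"]

print(accuracy(train_X, train_y, test_X, test_y, k=3))
```

Measuring accuracy on points that were not used for training is one simple way to compare methods or settings (for example, different values of `k`) before choosing one.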