|Course Title||Data Science – 2|
|Platform||Knime and Python|
|Prerequisites||Basic understanding of probability and algebra. Basic data analysis and processing in Excel, Tableau or other database systems. Basic linear regression – completion of the Data Science Level 1 course.|
|Target Audience||Data Analysts; Computer Analysts; Individuals in any role dealing with small or large amounts of data needing to model the data to obtain predictions.|
|Dates||January 20, 22, 27, 29; February 3, 5, 10, 12|
|Instructor||Diego Perea – Ph.D.|
|Schedule||Monday & Wednesday – 6:30 p.m. – 9:30 p.m.|
|Gouvernement du Québec fee||$48.00|
|General public fee||$400.39|
NB: A certificate is provided to all participants who have completed 80% of course hours.
|An Introduction to Statistical Learning with Applications in R by G. James, D. Witten, R. Tibshirani and T. Hastie.|
|Please note that this is a non-credit course.|
|The descriptive analytics methods covered in previous big data courses form the basis for the predictive analytics methods at the heart of this course. Participants will learn the standard statistical methods currently used in industry for predictive analytics, including linear and non-linear regression and several classification methods such as logistic regression, k-nearest neighbours (KNN), decision trees and support vector machines (SVM). Participants will learn how to examine the available data and choose the most suitable predictive method. Key components of the course are understanding these methods, the methodology for evaluating them and the criteria for choosing among them.
The course methodology is based on lectures led by the instructor, who will present the concepts using examples. Each lecture is followed by a lab using real data, where the participants will complete specific tasks in Knime and Python designed to reinforce the concepts introduced in the lecture.
Students will also formulate a small data prediction project, which they will complete in the Data Science 3 course.
|Topics Covered in this Course|
Please note that the instructor reserves the right to modify this schedule.
|Week 1||Topics 1 and 2
Introduction, course description and Knime overview.
Basic statistical analysis: Histograms, box and scatter plots.
|Week 2||Topics 3 and 4
Continuous variables regression beyond linearity
|Week 3||Topics 5 and 6
|Week 4||Topic 7
Support Vector Machines and concluding remarks
SOFTWARE TO BE USED
For the course, we will mainly use Python, which is an industry standard for statistical learning and provides functions for most of the methods. We will also use Knime, another tool for statistical learning, which has a friendly graphical interface.
LABS and DATASETS
In the labs, the participant will apply the prediction and classification methods seen in class using practical datasets.
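As a rough idea of what such a lab task could look like, the sketch below implements a k-nearest neighbours (KNN) classifier in plain Python and measures its accuracy on a held-out test set for several values of k. It is not taken from the course material: the function names `knn_predict` and `accuracy` are hypothetical, and the tiny dataset is invented purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Invented toy data (assumption, for illustration only): two features, two classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_y = ["low", "low", "low", "high", "high", "high"]
test_X = [(1.2, 1.5), (6.8, 6.5), (2.2, 1.8), (5.5, 6.2)]
test_y = ["low", "high", "low", "high"]

# Compare several values of k on the held-out test set.
for k in (1, 3, 5):
    print(f"k={k}: test accuracy = {accuracy(train_X, train_y, test_X, test_y, k):.2f}")
```

Evaluating on data that was not used for fitting is what makes the comparison between values of k meaningful; the same pattern applies when comparing different classification methods. In practice a library such as scikit-learn would normally replace the hand-written functions.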