|Course Title||BIG DATA – Reporting and Prediction|
|Platform||Tableau Public, Knime & Google Cloud Platform|
|Prerequisites||A basic understanding of databases is preferred; otherwise, experience with Excel spreadsheets and an understanding of computer software systems. Basic understanding of data mining.|
|Target Audience||Data analysts; computer analysts; individuals in any role who work with small or large amounts of data and need to analyze it and produce insightful, actionable dashboards.|
|Dates||February 1, 8, 15, 22, 29; March 7, 14. Last class on March 14, 9 a.m.-12 p.m.|
|Instructor||Diego Perea – Ph.D.|
|Schedule||Saturday 9 a.m. – 4:30 p.m. (30-minute lunch)|
|Gouvernement du Québec fee (taxes incl.)||$90.00|
|General public fee (taxes incl.)||$750.73|
Recommended textbook: Tableau and Google Cloud Platform online documentation
NB: This is a non-credit course. A certificate is provided to all participants who have completed 80% of the course hours.
|This course provides an introduction to data mining and big data. It gives participants the concepts and software skills needed to research, load, process and analyze data to obtain actionable insights. It focuses on developing the software skills in Tableau, Knime and Google BigQuery required to process data and prepare presentations and dashboards that highlight the data’s added value and support the decisions and actions that follow.
The course methodology is based on instructor-led lectures that present the concepts using industry examples. Each lecture is followed by a lab where participants complete specific tasks designed to reinforce the concepts introduced in the lecture.
In addition, participants are expected to complete a small assignment during the course and present it to the class. This will give them the confidence to apply the data mining skills learned in the course and to present the data insights in a clear, concise and engaging way.
|Topics Covered in this Course|
Please note that the instructor reserves the right to modify this schedule.
|Week 1||Introduction, lab preparation and Topic 1|
|Week 2||Topics 1 and 2
Data processing stages in data mining
Hardware and software systems for data mining
|Week 3||Topic 3
Extracting, Transforming and Loading data
|Week 4||Topics 4 and 5
Principles of data analysis
Effective reports and dashboards displaying the data insights
|Week 6||Topics 6 and 7
Big Data distributed processing systems
Connecting analytics to Big Data systems
|Week 7||Topic 8
Forecasting and prediction. Introduction to Machine Learning.
|Week 8||Project presentations and concluding remarks.|
SOFTWARE TO BE USED
For the course, we will primarily use Tableau Public to load, process and analyze data, and to produce reports and dashboards. For the large data portion of the course, we will use the Google BigQuery platform. Knime will be used for prediction and basic machine learning techniques. Other complementary software includes MS Excel and MS Access. Data analytics software tools besides Tableau and Knime will be discussed in the course to give participants a holistic view of data mining.
LABS and DATASETS
In the labs, participants will practice the skills needed for the different stages of the data mining process: ETL (Extraction, Transformation and Loading), analysis and reporting. Students are encouraged to bring their own datasets. However, for the lectures and labs we will use the following datasets:
- Uber trip data: Trip information including Uber service type, source, destination, distance, duration and paid fare.
- Online store purchasing behavior data: Characterization of online purchasing behavior.
- Mobile video trending data: Characterization and trending analysis of video consumption from mobile devices.
- Google Cloud NOAA data: Worldwide meteorological information including temperature, wind and rain for more than 60 years.
- Google Cloud Shakespeare data: Word counts for all of Shakespeare's works.
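To give a feel for the ETL, analysis and reporting stages practiced in the labs, here is a minimal Python sketch. Python is not one of the course tools; it is used here only for illustration. The sample rows, column names and function names are all hypothetical and are not taken from the Uber trip dataset used in class.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows: the column names and values are invented for
# illustration and do not come from the course datasets.
RAW_CSV = """service_type,distance_km,duration_min,fare
UberX,5.0,12,10.00
UberX,10.0,25,18.00
UberXL,8.0,20,24.00
UberXL,,15,12.00
"""


def extract(text):
    """Extract: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and convert text fields to numbers."""
    clean = []
    for row in rows:
        if not all(row.values()):
            continue  # skip records with a missing field
        clean.append((row["service_type"], float(row["distance_km"]),
                      int(row["duration_min"]), float(row["fare"])))
    return clean


def load(rows):
    """Load: insert the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (service_type TEXT, distance_km REAL, "
                 "duration_min INTEGER, fare REAL)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def fare_per_km_by_service(conn):
    """Analysis: total fare divided by total distance per service type."""
    query = ("SELECT service_type, SUM(fare) / SUM(distance_km) "
             "FROM trips GROUP BY service_type ORDER BY service_type")
    return {name: round(value, 2) for name, value in conn.execute(query)}


if __name__ == "__main__":
    connection = load(transform(extract(RAW_CSV)))
    print(fare_per_km_by_service(connection))
```

In this sketch the last sample row has no distance, so the transform step drops it before loading. The summary returned by the final query is the kind of figure that would feed a report or dashboard.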
Students work on a small project during the course to get comfortable using the techniques learned in class. Find below a link to some of the student projects.
In addition, in the middle of the course we host a friendly competition in the style of a hackathon where students are given a dataset and a problem to solve. They work for half a day and present a dashboard showing how the problem was solved and the main data insights. Please find below links to the project winners in the spring of 2019.
First place: Analysis of Bixi Ridership data in Montreal
Second place: Dashboard for Real Estate sales in King County https://public.tableau.com/shared/WHDM2W78D