Reporting and Prediction of Big Data

Course title: BIG DATA – Reporting and Prediction
Course number: 900-059-EQ-02
Platform: Tableau Public, Knime and Google Cloud Platform
Duration: 45 hours
Prerequisites: A basic understanding of databases is preferred; otherwise, experience processing data in Excel spreadsheets and an understanding of computer software systems. A basic understanding of data mining.
Target audience: Data analysts, computer analysts, and anyone who works with small or large amounts of data and needs to analyze it and produce insightful, actionable dashboards.
Dates: February 1, 8, 15, 22, 29; March 7, 14. The last class, on March 14, runs from 9 a.m. to 12 p.m.
Instructor: Diego Perea, Ph.D.
Room: BH-214
Schedule: Saturdays, 9 a.m. to 4:30 p.m. (30-minute lunch)
Gouvernement du Québec fee (taxes incl.): $90.00
General public fee (taxes incl.): $750.73

Recommended textbook: Tableau and Google Cloud Platform online documentation

NB: This is a non-credit course. A certificate is provided to all participants who complete at least 80% of the course hours.

Course Description
This course provides an introduction to data mining and big data. It gives participants the concepts and software skills needed to research, load, process and analyze data to obtain actionable insights. It focuses on building skills in Tableau, Knime and Google BigQuery to process data and prepare presentations and dashboards that highlight the data's added value and support subsequent decisions and actions.

The course is built around instructor-led lectures that present the concepts through industry examples. Each lecture is followed by a lab in which participants complete specific tasks designed to reinforce the concepts introduced in the lecture.

In addition, participants are expected to complete a small assignment during the course and present it to the class. This will give them the confidence to apply the data mining skills learned in the course and to present the data insights in a clear, concise and engaging way.

Topics Covered in this Course
  1. Data processing stages in data mining
  2. Hardware and software systems for data mining
  3. Extracting, transforming and loading data
  4. Principles of data analysis
  5. Effective reports and dashboards displaying the data insights
  6. Big Data distributed processing systems: Hadoop, Spark and Google BigQuery
  7. Connecting analytics to Big Data distributed processing systems
  8. Forecasting and prediction. Introduction to machine learning.

Weekly Topics
Please note that the instructor reserves the right to modify this schedule.
Week 1: Introduction, lab preparation and Topic 1
Week 2: Topics 1 and 2
Data processing stages in data mining
Hardware and software systems for data mining
Week 3: Topic 3
Extracting, transforming and loading data
Week 4: Topics 4 and 5
Principles of data analysis
Effective reports and dashboards displaying the data insights
Week 5: Topic 5
Mid-course review
Hackathon
Week 6: Topics 6 and 7
Big Data distributed processing systems
Connecting analytics to Big Data systems
Week 7: Topic 8
Forecasting and prediction. Introduction to machine learning.
Week 8: Project presentations and concluding remarks

SOFTWARE TO BE USED

In this course, we will primarily use Tableau Public to load, process and analyze data and to produce reports and dashboards. For the large-data portion of the course, we will use the Google BigQuery platform. Knime will be used for prediction and basic machine learning techniques. Complementary software includes MS Excel and MS Access. Other data analytics tools will also be discussed to give participants a holistic view of data mining.

LABS and DATASETS

In the labs, participants will practice the skills needed for the different stages of the data mining process, namely ETL (extraction, transformation and loading), analysis and reporting. Students are encouraged to bring their own datasets; however, the lectures and labs will use the following datasets.

  1. Uber trip data: trip information including Uber service type, source, destination, distance, duration and fare paid. Example:

https://public.tableau.com/views/Lab4-DatacharacterizationA-categoricalfields/Dashboard2

  2. Online store purchasing behavior data: characterization of online purchasing behavior. Example:

https://public.tableau.com/views/Lab3-Gizmoon-linestore/Dashboard2

  3. Mobile video trending data: characterization and trend analysis of video consumption on mobile devices. Example:

https://public.tableau.com/shared/XX6T2DZZ2

  4. Google Cloud NOAA data: worldwide meteorological information, including temperature, wind and rain, covering more than 60 years. Example:

https://public.tableau.com/shared/D3MPD5GYH

  5. Google Cloud Shakespeare data: word counts for all of Shakespeare's works. Example:

https://public.tableau.com/views/Lab12A-Shakespearedataset/Story1

Students work on a small project during the course to become comfortable with the techniques learned in class. The link below showcases some of the student projects.

https://public.tableau.com/views/StudentAssignmentsFall2017/StudentAssignmentsStory

In addition, in the middle of the course we host a friendly hackathon-style competition in which students are given a dataset and a problem to solve. They work for half a day and then present a dashboard showing how they solved the problem and the main data insights. Links to the winning projects from spring 2019 are below.

First place: Analysis of Bixi Ridership data in Montreal

https://public.tableau.com/profile/sfmdorval#!/vizhome/Bixiproject-2/BIXIrides

Second place: Dashboard for real estate sales in King County

https://public.tableau.com/shared/WHDM2W78D