Processing of Big Data with SPARK

Course Title: Processing of Big Data with SPARK
Course Number: 900-091-EQ
Platform: Linux
Duration: 24 hours
Gouvernement du Québec fee (taxes incl.): $48
General Public fee (taxes incl.): $394.04
Schedule: Saturday, 9 a.m. – 3:30 p.m. (30-minute lunch)
Dates: January 5, 19, 26; February 2, 2019 (no class on January 12)
Prerequisites: Management and Processing of Big Data, Level 1 and Level 2; basic SQL; good understanding of at least one of Python, Java, or Scala; knowledge of Linux shell commands; Hive and HDFS
Target Audience: Data engineer/architect; big data developer; enterprise application developer
Instructor: Shyam Kantesaryia
Location: Brittain Hall – BH-210

NB: This is a non-credit course. A certificate is provided to all participants who complete 80% of the course hours.

Recommended Textbook
We will not follow any specific textbook.

Course Description:

Spark is an open-source distributed processing engine built around speed, ease of use, and analytics. Its in-memory data processing technique reduces processing time drastically compared to a typical MapReduce program. It has well-integrated APIs for batch, SQL, streaming, machine learning, and graph processing in popular programming languages including Java, Scala, Python, and R. These advantages made it a top-level Apache project in 2014.

This course will help you understand and practice the various Spark transformation and action APIs used to build data processing pipelines. You will also become familiar with the data abstraction APIs, including RDD, DataFrame, and Dataset.
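
To make the transformation/action distinction concrete, here is a toy model in plain Python. It is only a sketch and does not use Spark: the class `MiniRDD` and the helper `square` are hypothetical names invented for illustration, and everything runs in a single process rather than on a cluster. The method names mirror calls that exist on Spark's RDD API (`parallelize`, `map`, `filter`, `collect`, `count`), and the sketch assumes the usual Spark behaviour that transformations are lazy while actions trigger execution.

```python
from typing import Callable, Iterable, Iterator, List


class MiniRDD:
    """Toy stand-in for a Spark RDD (hypothetical; not part of any Spark API)."""

    def __init__(self, source: Callable[[], Iterator]):
        # Store a recipe for producing the data, not the data itself.
        self._source = source

    @classmethod
    def parallelize(cls, items: Iterable) -> "MiniRDD":
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new MiniRDD and do no work yet.
    def map(self, fn: Callable) -> "MiniRDD":
        return MiniRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the whole chain and return a plain Python value.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls = []  # records every element that square() actually processes


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = MiniRDD.parallelize(range(1, 6))
pipeline = numbers.map(square).filter(lambda x: x % 2 == 1)

work_before_action = len(calls)  # nothing has been computed so far
result = pipeline.collect()      # the action runs map and filter together
print(work_before_action, result)
```

In this sketch, building `pipeline` does no work; `square` is only called once `collect()` runs. Calling a second action such as `count()` re-runs the chain from the source, which is one reason real Spark jobs sometimes cache intermediate results.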

Topics Covered in this Course:

  • Data processing for machine learning algorithms
  • Explore the GraphX library for graph processing
  • DataFrames and Spark SQL
  • Deploy Spark applications with various configuration parameters
  • Interoperability of Spark with various other Big Data processing tools
  • Restartability in Spark jobs
  • Various file formats in Spark
  • Data ingestion pipeline with Spark Streaming
  • Spark job optimization techniques

Weekly Topics

Please note that the instructor reserves the right to modify this schedule.

Week 1
  • DataFrames and Spark SQL
  • Various file formats in Spark
  • Interoperability of Spark with various other Big Data processing tools
Week 2
  • Data processing for machine learning algorithms
  • Restartability in Spark jobs
Week 3
  • Deploy Spark applications with various configuration parameters
  • Spark job optimization techniques
Week 4
  • Data ingestion pipeline with Spark Streaming
  • Explore the GraphX library for graph processing