Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala.


Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, including graph processing, machine learning, stream processing, and SQL. Because it keeps working data in memory, it is fast for iterative and interactive workloads. It is also easy to use and offers a rich set of data transformations.


What You Will Learn

  • Extend the tools available for processing and storage
  • Examine clustering and classification using MLlib
  • Discover Spark stream processing via Flume and HDFS
  • Create a schema in Spark SQL, and learn how a Spark schema can be populated with data
  • Study Spark-based graph processing using GraphX
  • Combine Spark with H2O and deep learning, and learn why it is useful
  • Evaluate how graph storage works with Apache Spark, Titan, HBase and Cassandra
  • Use Apache Spark in the cloud with Databricks and AWS



A basic understanding of functional programming and object-oriented programming will help. Knowledge of Scala is a plus but is not mandatory.


Course Contents


Spark intro – Programming model

Components of Spark

Downloading and setup

Core Spark – Driver program & SparkContext, worker nodes, executors, tasks

Spark standalone application



RDD intro

Creating RDDs

RDD operations

Transformations and functions



Pair RDD

Key-value pairs

Transformations using pair RDDs


Data loading & SQL

Loading and saving your data






Broadcast variables

Numeric RDD operations

Spark runtime architecture

Deploying applications

Packaging code with dependencies


Cluster managers

Streaming API

Spark Streaming



Output operations

Input sources

Streaming UI

Apache Spark Training in Chennai is primarily hands-on and available as classroom, online, or corporate training.

Call – +91 9789968765 / +91 99627 74619 / +91 9176HADOOP / 044 – 42645495


Updated on 2016-03-05T12:57:40+00:00, by admin.