Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala.

 

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, including graph processing, machine learning, stream processing, and SQL. Because it can keep working data in memory across a cluster, it is fast for iterative and interactive workloads, and it is easy to use and offers a rich set of data transformations.

 

What You Will Learn

  • Extend the tools available for processing and storage
  • Examine clustering and classification using MLlib
  • Discover Spark stream processing via Flume and HDFS
  • Create a schema in Spark SQL, and learn how a Spark schema can be populated with data
  • Study Spark-based graph processing using Spark GraphX
  • Combine Spark with H2O and deep learning, and learn why it is useful
  • Evaluate how graph storage works with Apache Spark, Titan, HBase and Cassandra
  • Use Apache Spark in the cloud with Databricks and AWS


Prerequisites

A basic understanding of functional programming and object-oriented programming will help. Knowledge of Scala is a plus, but it is not mandatory.

 

Course Contents

Introduction

Spark intro – Programming model

Components of Spark

Downloading and setup

Core Spark – driver program and SparkContext, worker nodes, executors, tasks

Spark standalone application

 

RDD

RDD intro

Creating RDDs

RDD operations

Transformations and functions

Caching

 

Pair RDD

Key-value pairs

Transformations using pair RDDs

 

Data loading & SQL

Loading and saving your data

Spark SQL

Aggregations

 

Runtime

Accumulators

Broadcast variables

Numeric RDD operations

Spark runtime architecture

Deploying applications

Packaging code with dependencies

Scheduling

Cluster managers

Streaming API

Spark streaming

Architecture

Transformations

Output operations

Input sources

Streaming UI

Apache Spark Training in Chennai is primarily hands-on and available as

Classroom / Online / Corporate Training

http://cloud-computing-training.in/contact

Call – +91 9789968765 / +91 99627 74619 / +91 9176HADOOP / 044 – 42645495

Apache Spark Training in Chennai

Updated on 2016-03-05T12:57:40+00:00, by admin.