Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala.
Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, including graph processing, machine learning, stream processing, and SQL. By caching datasets in memory across a cluster, it processes data far faster than disk-based systems, is easy to use, and offers a rich set of data transformations.
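To give a flavour of the programming model, here is a minimal word-count sketch in Scala. It assumes a local Spark installation; the application name and input strings are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch: count words with RDD transformations, run locally.
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val lines  = sc.parallelize(Seq("spark is fast", "spark is easy"))
    val counts = lines.flatMap(_.split(" "))   // transformation: split lines into words
                      .map(word => (word, 1))  // transformation: pair each word with a count of 1
                      .reduceByKey(_ + _)      // transformation: sum the counts per word

    counts.collect().foreach(println)          // action: pull the result back to the driver
    sc.stop()
  }
}
```

Note that the transformations are lazy: nothing runs on the cluster until the `collect()` action is called.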
What You Will Learn
- Extend the tools available for processing and storage
- Examine clustering and classification using MLlib
- Discover Spark stream processing via Flume and HDFS
- Create a schema in Spark SQL, and learn how a Spark schema can be populated with data
- Study Spark based graph processing using Spark GraphX
- Combine Spark with H2O and deep learning, and learn why it is useful
- Evaluate how graph storage works with Apache Spark, Titan, HBase, and Cassandra
- Use Apache Spark in the cloud with Databricks and AWS
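As an example of the Spark SQL topic above (creating a schema and populating it with data), the following sketch defines a schema explicitly and queries it. It uses the Spark 1.x `SQLContext` API contemporary with this outline; the table name and columns are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SQLContext}

// Sketch: build a DataFrame from an explicit schema and query it with SQL.
object SchemaExample {
  def main(args: Array[String]): Unit = {
    val sc         = new SparkContext(new SparkConf().setAppName("SchemaExample").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Define the schema explicitly, field by field.
    val schema = StructType(Seq(
      StructField("name", StringType,  nullable = false),
      StructField("age",  IntegerType, nullable = true)
    ))

    // Populate the schema with rows of data.
    val rows = sc.parallelize(Seq(Row("Alice", 34), Row("Bob", 28)))
    val df   = sqlContext.createDataFrame(rows, schema)

    df.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 30").show()
    sc.stop()
  }
}
```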
A basic understanding of functional programming and object-oriented programming will help. Knowledge of Scala is definitely a plus, but is not mandatory.
Spark intro – Programming model
Components of Spark
Downloading and setup
Core Spark – driver program & SparkContext, worker nodes, executors, tasks
Spark standalone application
Transformations and functions
Transformations using pair RDDs
Data loading & SQL
Loading and saving your data
Numeric RDD operations
Spark runtime architecture
Packaging code with dependencies
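The pair RDD and numeric RDD topics in the outline above can be sketched together as follows. This assumes a local Spark installation; the key names and amounts are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: pair RDD transformations plus built-in numeric RDD statistics.
object PairAndNumericOps {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("PairAndNumericOps").setMaster("local[*]"))

    // Pair RDD transformation: total the amounts per key.
    val sales  = sc.parallelize(Seq(("north", 10.0), ("south", 20.0), ("north", 30.0)))
    val totals = sales.reduceByKey(_ + _)   // yields ("north", 40.0) and ("south", 20.0)

    // Numeric RDD operations: descriptive statistics in a single pass.
    val stats = sales.values.stats()        // count, mean, stdev, min, max
    println(s"mean=${stats.mean}, max=${stats.max}")

    totals.collect().foreach(println)
    sc.stop()
  }
}
```

Because `stats()` computes all five summary values in one pass over the data, it is cheaper than calling `mean()`, `max()`, and so on separately.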
Apache Spark training in Chennai is primarily hands-on and is available as
classroom, online, or corporate training.
Call – +91 9789968765 / +91 99627 74619 / +91 9176HADOOP / 044 – 42645495
Updated on 2016-03-05T12:57:40+00:00.