Apache Spark
The success of many organizations depends on their ability to derive business insights from massive amounts of raw data coming from various sources. Apache Spark offers many engineering improvements over the traditional MapReduce programming model as implemented in Hadoop. In particular, it provides multi-pass in-memory processing of data, which boosts the overall performance of your ETL jobs and machine-learning algorithms.
This Spark training course covers theoretical and technical aspects of Spark programming. The course teaches developers Spark fundamentals, APIs, common programming idioms and more. Hands-on labs supplement the course, helping attendees reinforce their theoretical knowledge of the material and quickly get up to speed on using Spark for data exploration.
AUDIENCE
PREREQUISITES
Students should have experience programming in a modern structured language. Experience with Python and SQL is recommended.
COURSE OUTLINE
- Introduction to Functional Programming
- Introduction to Apache Spark
- Hadoop Distributed File System Overview
- The Spark Shell
- Spark Resilient Distributed Datasets (RDDs)
- Shared Variables in Spark
- Parallel Data Processing with Spark
- Introduction to Spark SQL
- Graph Processing with GraphX
- Machine Learning Algorithms
- The Spark Machine Learning Library
- Spark Streaming
Is there a discount available for current students?
UMBC students and alumni, as well as students who have previously taken a public training course with UMBC Training Centers, are eligible for a 10% discount, capped at $250. Please provide a copy of your UMBC student ID, an unofficial transcript, or the name of the UMBC Training Centers course you have completed. Asynchronous courses are excluded from this offer.
What is the cancellation and refund policy?
Students will receive a refund of paid registration fees only if UMBC Training Centers receives a notice of cancellation at least 10 business days prior to the class start date (for classes) or the exam date (for exams).