Hortonworks HDP Analyst: Data Science

Overview

This course teaches the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Outcomes

Upon completion of this course, students will be able to:
  • Recognize use cases for data science on Hadoop
  • Describe the Hadoop and YARN architecture
  • Describe the differences between supervised and unsupervised learning
  • Use Mahout to run a machine learning algorithm on Hadoop
  • Describe the data science life cycle
  • Use Pig to transform and prepare data on Hadoop
  • Write a Python script
  • Describe options for running Python code on a Hadoop cluster
  • Write a Pig User-Defined Function in Python
  • Use Pig streaming on Hadoop with a Python script
  • Use machine learning algorithms
  • Describe use cases for Natural Language Processing (NLP)
  • Use the Natural Language Toolkit (NLTK)
  • Describe the components of a Spark application
  • Write a Spark application in Python
  • Run machine learning algorithms using Spark MLlib
  • Take data science into production

Audience

This class is for architects, software developers, analysts and data scientists who need to apply data science and machine learning on Hadoop.

Prerequisites

Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles.

HD-ILT

HD-ILT™ (High Definition Instructor Led Training) is a patented, state-of-the-art video conferencing and remote lab training modality that allows students to study from Reston, VA; Columbia, MD; or any of our other 30+ locations across North America, or remotely from a home or office. Students in the HD-ILT lab receive live instruction in a virtual environment, including the same hands-on labs used in classroom-based courses. For more information on our HD-ILT learning environment, click here.

System Requirements

For students choosing the live online class format, a complete list of system requirements can be found here.

Duration

3 Days


Group Training Available

UMBC Training Centers can deliver any of our courses in a group training environment at our facilities or yours. Group training can be an effective and economical method to quickly assure competency and consistency of knowledge and skills within an organization or department.