BIGDATA - HADOOP

Hadoop Training – Course Content

Overview:
Apache Hadoop is open source data management software that helps organizations analyze huge volumes of structured and unstructured data, and it is a very hot topic across the tech industry. Through technical sessions and hands-on labs, you can quickly learn to take advantage of the MapReduce framework.
Training Objectives of Hadoop:
The Hadoop course covers the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. The course also examines related technologies such as Hive, Pig, and Apache Accumulo.
Target Students / Prerequisites:
Students should have an IT background and be familiar with concepts in Java and Linux.
Introduction: The Motivation for Hadoop
  • Problems with traditional large-scale systems
  • Requirements for a new approach
Hadoop Basic Concepts
  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands-On Exercise
  • How MapReduce Works
  • Hands-On Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components
Writing a MapReduce Program
  • Examining a Sample MapReduce Program, with Several Examples
  • Basic API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop’s Streaming API
Delving Deeper Into The Hadoop API
  • More About ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data With Combiners
  • The configure and close methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Hands-On Exercise
  • Directly Accessing HDFS
  • Using the Distributed Cache
  • Hands-On Exercise
Performing Several Hadoop Jobs
  • The configure and close Methods
  • Sequence Files
  • Record Reader
  • Record Writer
  • Role of Reporter
  • Output Collector
  • Processing video files and audio files
  • Processing image files
  • Processing XML files
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache
Common MapReduce Algorithms
  • Sorting and Searching
  • Indexing
  • Classification/Machine Learning
  • Term Frequency – Inverse Document Frequency
  • Word Co-Occurrence
  • Hands-On Exercise: Creating an Inverted Index
  • Identity Mapper
  • Identity Reducer
  • Exploring well known problems using MapReduce applications
Using HBase
  • What is HBase?
  • HBase API
  • Managing large data sets with HBase
  • Using HBase in Hadoop applications
  • Hands-On Exercise
Using Hive and Pig
  • Hive Basics
  • Pig Basics
  • Hands-On Exercise
Practical Development Tips and Techniques
  • Debugging MapReduce Code
  • Using LocalJobRunner Mode for Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
  • Hands-On Exercise
Debugging MapReduce Programs
  • Testing with MRUnit
  • Logging
  • Classification/Machine Learning
Advanced MapReduce Programming
  • A Recap of the MapReduce Flow
  • The Secondary Sort
  • Customized InputFormats and OutputFormats
  • Pipelining Jobs With Oozie
  • Map-Side Joins
  • Reduce-Side Joins
Joining Data Sets in MapReduce
  • Map-Side Joins
  • The Secondary Sort
  • Reduce-Side Joins
Monitoring and Debugging on a Production Cluster
  • Counters
  • Skipping Bad Records
  • Rerunning failed tasks with IsolationRunner
Tuning for Performance in MapReduce
  • Reducing network traffic with combiners
  • Partitioners
  • Reducing the amount of input data
  • Using Compression
  • Reusing the JVM
  • Running with speculative execution
  • Refactoring code and rewriting algorithms
  • Parameters affecting performance
  • Other Performance Aspects