Modular Programme – Data Analytics
Programme Summary
This modular course is intended for candidates who want to learn how to store, manage, process, and analyse massive amounts of unstructured data for competitive advantage; how to select and implement the appropriate Big Data stores; and how to apply sophisticated analytic techniques and tools to process and analyse Big Data. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data.
The course provides an overview of how to plan and implement a Big Data solution, and of the various technologies that comprise Big Data. Numerous examples and exercises involving Big Data systems are provided throughout the course. The programming examples are in Java, but the primary focus is on best practices that can be applied to any supported programming language.
Those completing this course are encouraged to take an accreditation test on Hadoop and Big Data, at a time of their convenience, to obtain a professional certification.
Topics Covered
This modular course covers the following topics:
- Introduction to Big Data
- Storing Big Data
- Processing Big Data
- Tools and Techniques to Analyse Big Data
- Developing and Implementing a Big Data Strategy
Duration
Full Time: 5 days
Course Objectives
At the end of the course, candidates will be able to:
- Give an overview of Big Data
- Explain what Hadoop is
- Describe the basic Hadoop architecture
- Describe the features of the Hadoop Distributed File System (HDFS)
- Explain the full MapReduce flow
- Write a MapReduce job in Java (a minimal sketch follows this list)
- Submit a MapReduce job to a Hadoop cluster
- Use the basic and advanced MapReduce APIs
- Optimize MapReduce jobs
- Apply common MapReduce algorithms
- Use the different Hadoop ecosystem tools
- Manage data using Sqoop and Flume
- Do basic programming with HBase (also sketched below)
- Write a basic Pig Latin script
- Use the different Hive commands
- Explain the importance of HCatalog
- Write Oozie workflows
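To make the MapReduce objectives concrete, below is a minimal word-count job written against the org.apache.hadoop.mapreduce API. It is an illustrative sketch rather than course material; the class names (WordCount, TokenizerMapper, IntSumReducer) and the use of the reducer as a combiner are our own choices.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in its input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts gathered for each word during the shuffle
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local aggregation cuts shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```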
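Similarly, for the HBase objective, the sketch below writes and reads back a single cell using the HBase 1.x client API; the table name users, column family info, row key, and value are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseBasics {
  public static void main(String[] args) throws Exception {
    // Reads cluster settings from hbase-site.xml on the classpath
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("users"))) {
      // Write one cell: row "row1", column family "info", qualifier "name"
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
      table.put(put);

      // Read the same cell back
      Result result = table.get(new Get(Bytes.toBytes("row1")));
      byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
      System.out.println(Bytes.toString(value)); // prints "Alice"
    }
  }
}
```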
Candidates will work through the following lab exercises using the Hadoop Data Platform (illustrative sketches of several of these exercises follow the list):
- Manipulating data using the HDFS command-line interface (CLI)
- Writing and running a MapReduce job in Java
- Using Eclipse to speed up MapReduce development in Java
- Testing MapReduce jobs using MRUnit and the LocalJobRunner
- Practising common MapReduce algorithms
- Optimizing MapReduce jobs
- Importing data from MySQL into HDFS using Sqoop
- Splitting and joining datasets using Pig
- Manipulating data using Hive
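As a taste of the HDFS CLI, job-submission, and Sqoop exercises, the shell sketch below stages a local file in HDFS, runs the word-count jar from the earlier sketch, and imports a MySQL table. Every path, hostname, database name, and credential here is a hypothetical placeholder.

```bash
# Stage a local file in HDFS and inspect it (all paths are hypothetical)
hdfs dfs -mkdir -p /user/student/input
hdfs dfs -put access.log /user/student/input/
hdfs dfs -ls /user/student/input

# Submit the WordCount job from the earlier sketch, packaged as a jar
hadoop jar wordcount.jar WordCount /user/student/input /user/student/output
hdfs dfs -cat /user/student/output/part-r-00000

# Import a MySQL table into HDFS with Sqoop (connection details are placeholders)
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username student \
  --password-file /user/student/.dbpass \
  --table orders \
  --target-dir /user/student/orders \
  --num-mappers 4
```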
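For the MRUnit exercise, a unit test for the TokenizerMapper from the earlier sketch might look like the following; it assumes the MRUnit library (org.apache.hadoop.mrunit) and JUnit 4 are on the classpath.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TokenizerMapperTest {
  private MapDriver<Object, Text, Text, IntWritable> mapDriver;

  @Before
  public void setUp() {
    // Drives the mapper in isolation, without a cluster
    mapDriver = MapDriver.newMapDriver(new WordCount.TokenizerMapper());
  }

  @Test
  public void emitsOneCountPerToken() throws Exception {
    // Expected outputs are listed in the order the mapper emits them
    mapDriver
        .withInput(new LongWritable(0), new Text("big data big"))
        .withOutput(new Text("big"), new IntWritable(1))
        .withOutput(new Text("data"), new IntWritable(1))
        .withOutput(new Text("big"), new IntWritable(1))
        .runTest();
  }
}
```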
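The Pig split-and-join exercise could resemble the Pig Latin sketch below; the file names, schemas, and the 'SG' country code are invented for illustration.

```
-- Load two hypothetical comma-separated datasets
users  = LOAD 'users.csv'  USING PigStorage(',') AS (id:int, name:chararray, country:chararray);
orders = LOAD 'orders.csv' USING PigStorage(',') AS (order_id:int, user_id:int, amount:double);

-- SPLIT routes each user record to exactly one relation
SPLIT users INTO sg IF country == 'SG', others OTHERWISE;

-- JOIN the Singapore users against their orders
sg_orders = JOIN sg BY id, orders BY user_id;
STORE sg_orders INTO 'sg_orders_out' USING PigStorage(',');
```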
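And for the Hive exercise, the HiveQL sketch below defines an external table over files already in HDFS and runs a typical aggregation; the schema and location are again placeholders.

```sql
-- External table over delimited files already sitting in HDFS
CREATE EXTERNAL TABLE orders (
  order_id INT,
  user_id  INT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/student/orders';

-- Typical manipulation: top ten customers by total order value
SELECT user_id, SUM(amount) AS total_spent
FROM orders
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10;
```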
Module Outline
Core Modules:
| Module | Description |
| --- | --- |
| Computing Environment | The current mix of computing resources and demands that motivates the use of a technology like Apache Hadoop |
| Hadoop Distributed File System | How files are stored and managed in HDFS, and the infrastructure that supports HDFS |
| MapReduce | The phases of execution and the framework for running a MapReduce job; the expected properties of job runs based on the number of mappers, the number of reducers, and the distribution of data |
| Hadoop API | The Java classes that make up the API for developers who wish to write Apache Hadoop MapReduce jobs |
| Hadoop Platform | The basic purpose, design, and operation of the tools that augment the Apache Hadoop core to form a comprehensive platform, including Hadoop Streaming, fuse-dfs, Apache Hive, Apache Pig, Apache Flume, Apache Sqoop, Apache HBase, Apache Oozie, and HUE |
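Since the platform module touches Apache Oozie and the objectives include writing Oozie workflows, a minimal workflow definition is sketched below. The workflow name, action name, and input/output directories are hypothetical, and the ${jobTracker}/${nameNode} values follow Oozie's standard parameterization via a job.properties file.

```xml
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- Job properties; the directories are hypothetical placeholders -->
        <property>
          <name>mapreduce.input.fileinputformat.inputdir</name>
          <value>/user/student/input</value>
        </property>
        <property>
          <name>mapreduce.output.fileoutputformat.outputdir</name>
          <value>/user/student/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>WordCount failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```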
Delivery Format
This modular course is delivered as follows:
- Classroom learning: 60% of the duration
- Hands-on tutorials: the remaining 40% of the duration
Applicable to Singaporeans/Singapore PRs ONLY
- Candidates who enrol under the Workforce Development Authority (WDA) funding programme with Lithan Academy must complete the assessments at the end of the module.
- Candidates who enrol under the WDA funding programme with Lithan Academy will be issued a Statement of Attainment (SOA) after successfully completing the module.