Big Data is the term given to data sets so large that they must be analyzed computationally to reveal patterns and trends. Today the entire IT industry is restructuring the way it maintains its databases. The data itself can be almost anything: email IDs, employee and client records, patients' blood groups, or a worldwide collection of driving licence numbers.
In simple words, Big Data is a set of techniques for managing large, scattered data and analyzing its behavior. It is among the latest technologies the whole world is moving onto, and it is expected to create enormous numbers of jobs and opportunities to start one's own business.
IBM says: "Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data."
The workshop content is an approximately equal mixture of lectures and hands-on labs. It runs for a minimum of 1-2 days. All students are expected to have at least a moderate knowledge of basic C programming.
Recommendation: It is strongly recommended that you bring your own laptop to the training, so that you can install and run programs for the optional hands-on experiments/exercises after the workshops.
Introduction to Big Data
- What is Big Data
- Big Data Opportunities
- Big Data Challenges
- Characteristics of Big Data
Introduction to Hadoop
- Hadoop Distributed File System
- Industries using Hadoop
- Data Locality
- Hadoop Architecture
- Map Reduce & HDFS
- Using the Hadoop single-node image (clone)
The Hadoop Distributed File System (HDFS)
- HDFS Design & Concepts
- Blocks, Name Nodes and Data Nodes
- HDFS High Availability and HDFS Federation
- The HDFS Command-Line Interface
- Anatomy of a File Read
- Anatomy of a File Write
- Block Placement Policy and Modes
- More detailed explanation of configuration files
- Metadata, FS Image, Edit Log, Secondary Name Node and Safe Mode
- How to add a new Data Node dynamically
- How to decommission a Data Node dynamically (without stopping the cluster)
- FSCK Utility (block report)
- How to override the default configuration at system level and programming level
- ZooKeeper Leader Election Algorithm
- Exercise and a small use case on HDFS
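As a rough illustration of the blocks and replication topics above, here is a toy Python sketch (not real HDFS code; the node names and the round-robin placement are simplifying assumptions) of how a file is split into fixed-size blocks and each block is assigned to several data nodes:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB
REPLICATION = 3                 # default replication factor

DATA_NODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical cluster

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

def place_blocks(blocks, nodes=DATA_NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.
    (Real HDFS placement is rack-aware; this only shows the idea.)"""
    placement = []
    for i, size in enumerate(blocks):
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        placement.append((size, replicas))
    return placement

# A 300 MB file occupies three blocks: 128 MB, 128 MB and 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
for size, replicas in place_blocks(blocks):
    print(size, replicas)
```

The Name Node keeps exactly this kind of block-to-node mapping as metadata, while the Data Nodes store the block contents themselves.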
Map Reduce
- Functional Programming Basics
- Map and Reduce Basics
- How Map Reduce Works
- Anatomy of a Map Reduce Job Run
- Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates, Job Completion, Failures
- Shuffling and Sorting
- Splits, Record Reader, Partition, Types of Partitions & Combiner
- Optimization Techniques -> Speculative Execution, JVM Reuse and Number of Slots
- Types of Schedulers and Counters
- Comparison between the old and new APIs at the code and architecture level
- Getting data from an RDBMS into HDFS using custom data types
- Distributed Cache and Hadoop Streaming (Python, Ruby and R)
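Hadoop Streaming lets any executable that reads stdin and writes stdout act as a mapper or reducer. As a minimal sketch of the map -> shuffle/sort -> reduce flow covered above, here is the classic word count as two Python functions (a stand-in for the actual streaming scripts, which would read from sys.stdin):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: sum the counts per word. The pairs must arrive
    sorted by key, which is what shuffle-and-sort guarantees."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

# Simulate the job on two "input splits"; sorted() plays the shuffle role.
pairs = sorted(mapper(["the quick brown fox", "the lazy dog"]))
counts = dict(reducer(pairs))
print(counts)  # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In a real streaming job the framework, not sorted(), performs the shuffle, and a combiner could run the same reducer logic on each mapper's local output first.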
Introduction to R
- History of R
- An Insight into R
- Data Structures and Data Types
Data Management and Data Cleaning
- Missing Value Treatment
- Outlier Treatment
- Sorting Datasets
- Merging Datasets
- Creating new variables
- Binning variables
- Reading datasets from other environments into R (importing)
- Writing datasets from the R environment to other environments (exporting)
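The workshop teaches these cleaning steps in R, but the logic is language-agnostic. A minimal Python sketch (standard library only, with hypothetical sample data) of missing-value imputation, outlier capping and sorting:

```python
import statistics

# Hypothetical sample: ages with a missing value (None) and an outlier (250).
ages = [23, 25, None, 27, 24, 26, 250]

# Missing value treatment: impute with the median of the observed values.
observed = [a for a in ages if a is not None]
median_age = statistics.median(observed)
filled = [median_age if a is None else a for a in ages]

# Outlier treatment: cap values outside 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
capped = [min(max(a, low), high) for a in filled]

# Sorting the cleaned dataset.
capped.sort()
print(capped)
```

The R equivalents (is.na(), median(), quantile(), sort()) follow the same shape; binning and merging would be further steps on the same data.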
Data Visualization in R
- Bar Chart
- Dot Plot
- Scatter Plot (3D)
- Spinning Scatter Plots
- Pie Chart
- Histogram (3D), including colourful ones
- Overlapping Histograms
- Plotting with Base and Lattice Graphics
- Plotting and Colouring
- Geo Charts
- Motion Charts
- Case Study with Data Management
Register for the Big Data Hadoop Workshop