Typical Data centre with Hadoop
Sqoop
- tool designed to transfer data between Hadoop and relational databases or mainframes
E.g.
- $ sqoop list-databases --connect jdbc:mysql://database.test.com/
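Listing databases only inspects the source; the actual transfer is done with `sqoop import`. A hedged sketch (the database name, credentials, table, and target directory below are hypothetical):

```shell
$ sqoop import \
    --connect jdbc:mysql://database.test.com/sales \
    --username dbuser -P \
    --table orders \
    --target-dir /data/orders
```

Sqoop runs the import as a MapReduce job, splitting the table across mappers and writing the rows as files under the target directory in HDFS.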
Pig
- high-level scripting layer over MapReduce
- runs on the client machine
- its simple, SQL-like scripting language is called Pig Latin
- Uses
- ETL data pipeline
- Research on raw data
- Iterative processing.
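A minimal Pig Latin sketch of an ETL-style pipeline (the input path, schema, and field names are hypothetical):

```
-- load a tab-delimited file from HDFS
logs = LOAD '/data/weblogs' AS (user:chararray, url:chararray, bytes:int);
-- keep only large responses
big  = FILTER logs BY bytes > 1024;
-- total bytes per user
grpd = GROUP big BY user;
out  = FOREACH grpd GENERATE group AS user, SUM(big.bytes) AS total;
STORE out INTO '/data/weblog_totals';
```

Each statement defines a relation; Pig compiles the whole script into one or more MapReduce jobs when `STORE` (or `DUMP`) triggers execution.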
Hive
- high-level abstraction over MapReduce
- turns HiveQL queries into MapReduce jobs
- SQL-like language (HiveQL)
- Hive makes analysis of data stored in Hadoop easier and more productive than writing MapReduce code by hand
- HiveQL statements are interpreted by Hive, which produces one or more MapReduce jobs and submits them for execution on the Hadoop cluster
- best suited to analyzing relatively static data
- higher response times (batch-oriented, not interactive)
- not suited to rapidly changing data
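A hedged HiveQL sketch, projecting a table over files already in HDFS (the table name, columns, and location are hypothetical):

```sql
-- external table: Hive reads the files in place, no data is copied
CREATE EXTERNAL TABLE weblogs (user STRING, url STRING, bytes INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';

-- this query is compiled into one or more MapReduce jobs
SELECT user, SUM(bytes) AS total
FROM weblogs
GROUP BY user;
```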
Impala
- uses a SQL dialect similar to HiveQL
- runs directly on the Hadoop cluster, without translating queries into MapReduce jobs
- Impala is meant for interactive, low-latency queries, whereas Hive is oriented toward batch processing
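The same kind of query can be run interactively from the command line with `impala-shell` (the daemon host and table name below are hypothetical):

```shell
$ impala-shell -i impalad-host:21000 \
    -q "SELECT user, COUNT(*) AS hits FROM weblogs GROUP BY user"
```

Because Impala daemons execute the query directly against the data nodes, results come back in seconds rather than waiting on MapReduce job startup.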
Real-time data to HDFS
Apache Flume is a system used for moving massive quantities of streaming data into HDFS.
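A Flume agent is wired together with a source, a channel, and a sink in a properties file. A minimal single-agent sketch (the agent name, log path, and HDFS path are hypothetical):

```
# agent a1: tail a log file and land events in HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

The source pushes events into the channel, which buffers them until the HDFS sink drains them into date-partitioned directories.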
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads requiring fast, iterative access to datasets.
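A minimal sketch of that in-memory model, as a word count entered in `spark-shell` (the input path is hypothetical; `sc` is the SparkContext the shell provides):

```scala
// read a file from HDFS and count words across the cluster
val lines  = sc.textFile("hdfs:///data/weblogs")
val counts = lines.flatMap(_.split("\\s+"))
                  .map(word => (word, 1))
                  .reduceByKey(_ + _)   // shuffle + sum, kept in memory
counts.take(10).foreach(println)
```

Unlike a chain of MapReduce jobs, the intermediate RDDs can stay cached in memory, which is what makes Spark suited to iterative workloads.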