Typical Data centre with Hadoop
Sqoop
- tool designed to transfer data between Hadoop and relational databases or mainframes
E.g.
- $ sqoop list-databases --connect jdbc:mysql://database.test.com/
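Listing databases only inspects the source; the actual transfer is done with `sqoop import`. A hedged sketch (the database name, credentials, table, and target directory below are hypothetical):

```shell
$ sqoop import \
    --connect jdbc:mysql://database.test.com/sales \
    --username dbuser -P \
    --table orders \
    --target-dir /data/orders
```

Sqoop runs the import as a MapReduce job, splitting the table across mappers and writing the rows as files under the target directory in HDFS.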
Pig
- high-level scripting layer over MapReduce
- runs on the client machine
- its simple, SQL-like scripting language is called Pig Latin
- Uses
- ETL data pipeline
- Research on raw data
- Iterative processing.
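A minimal Pig Latin sketch of an ETL-style pipeline (the input path, schema, and field names are hypothetical):

```
-- load a tab-delimited file from HDFS
logs = LOAD '/data/weblogs' AS (user:chararray, url:chararray, bytes:int);
-- keep only large responses
big  = FILTER logs BY bytes > 1024;
-- total bytes per user
grpd = GROUP big BY user;
out  = FOREACH grpd GENERATE group AS user, SUM(big.bytes) AS total;
STORE out INTO '/data/weblog_totals';
```

Each statement defines a relation; Pig compiles the whole script into one or more MapReduce jobs when `STORE` (or `DUMP`) triggers execution.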
Hive
- high-level abstraction over MapReduce
- turns HiveQL queries into MapReduce jobs
- SQL-like language (HiveQL)
- Hive makes analysis of data stored in Hadoop easier and more productive than writing MapReduce code by hand
- HiveQL statements are interpreted by Hive, which produces one or more MapReduce jobs and submits them for execution on the Hadoop cluster
- best suited to analyzing relatively static data
- higher response times (batch-oriented, not interactive)
- not suited to rapidly changing data
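A hedged HiveQL sketch, projecting a table over files already in HDFS (the table name, columns, and location are hypothetical):

```sql
-- external table: Hive reads the files in place, no data is copied
CREATE EXTERNAL TABLE weblogs (user STRING, url STRING, bytes INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';

-- this query is compiled into one or more MapReduce jobs
SELECT user, SUM(bytes) AS total
FROM weblogs
GROUP BY user;
```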
Impala
- uses a SQL dialect similar to HiveQL
- runs directly on the Hadoop cluster, without translating queries into MapReduce jobs
- Impala is meant for interactive, low-latency queries, whereas Hive is oriented toward batch processing
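The same kind of query can be run interactively from the command line with `impala-shell` (the daemon host and table name below are hypothetical):

```shell
$ impala-shell -i impalad-host:21000 \
    -q "SELECT user, COUNT(*) AS hits FROM weblogs GROUP BY user"
```

Because Impala daemons execute the query directly against the data nodes, results come back in seconds rather than waiting on MapReduce job startup.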
Real-time data to HDFS
Apache Flume is a system used for moving massive quantities of streaming data into HDFS.
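A Flume agent is wired together with a source, a channel, and a sink in a properties file. A minimal single-agent sketch (the agent name, log path, and HDFS path are hypothetical):

```
# agent a1: tail a log file and land events in HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

The source pushes events into the channel, which buffers them until the HDFS sink drains them into date-partitioned directories.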
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads requiring fast, iterative access to datasets.
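A minimal sketch of that in-memory model, as a word count entered in `spark-shell` (the input path is hypothetical; `sc` is the SparkContext the shell provides):

```scala
// read a file from HDFS and count words across the cluster
val lines  = sc.textFile("hdfs:///data/weblogs")
val counts = lines.flatMap(_.split("\\s+"))
                  .map(word => (word, 1))
                  .reduceByKey(_ + _)   // shuffle + sum, kept in memory
counts.take(10).foreach(println)
```

Unlike a chain of MapReduce jobs, the intermediate RDDs can stay cached in memory, which is what makes Spark suited to iterative workloads.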