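[US] Mark Grover, Ted Malaska, Jonathan ... (authors)
Publisher: Southeast University Press; ISBN: 9787564170011; Edition: 1; Product code: 12151372; Binding: paperback; Original title: Hadoop Application Architectures; Format: 16mo; Publication date: 2017-02-01; Paper: offset paper; Pages: 371; Word count: 490,000; Text language: English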
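Synopsis
Get expert-level guidance on designing end-to-end data management solutions with Apache Hadoop. While many other sources still stop at explaining how to use the numerous components of the Hadoop ecosystem, this practice-focused book leads you to think at the level of the overall architecture, which is essential for tying all the components together into a complete application tailored to your particular use case. To reinforce learning, Part II of Hadoop Application Architectures (reprint edition, English) provides detailed architecture case studies covering some of the most common Hadoop use cases.
Whether you are designing a new Hadoop application or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures (reprint edition, English) provides skilled guidance throughout the entire process.
- Factors to consider when storing and modeling data in Hadoop
- Practical guidance on moving data into and out of the system
- Data processing frameworks, including MapReduce, Spark, and Hive
- Common Hadoop processing patterns, such as removing duplicate records and windowing analysis
- Giraph, GraphX, and other tools for large graph processing on Hadoop
- Workflow orchestration and scheduling tools such as Apache Oozie
- Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
- Architecture examples for clickstream analysis, fraud detection, and data warehousing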
Table of Contents
Foreword
Preface
Part Ⅰ. Architectural Considerations for Hadoop Applications
1. Data Modeling in Hadoop
Data Storage Options
Standard File Formats
Hadoop File Types
Serialization Formats
Columnar Formats
Compression
HDFS Schema Design
Location of HDFS Files
Advanced HDFS Schema Design
HDFS Schema Design Summary
HBase Schema Design
Row Key
Timestamp
Hops
Tables and Regions
Using Columns
Using Column Families
Time-to-Live
Managing Metadata
What Is Metadata?
Why Care About Metadata?
Where to Store Metadata?
Examples of Managing Metadata
Limitations of the Hive Metastore and HCatalog
Other Ways of Storing Metadata
Conclusion
2. Data Movement
Data Ingestion Considerations
Timeliness of Data Ingestion
Incremental Updates
Access Patterns
Original Source System and Data Structure
Transformations
Network Bottlenecks
Network Security
Push or Pull
Failure Handling
Level of Complexity
Data Ingestion Options
File Transfers
Considerations for File Transfers versus Other Ingest Methods
Sqoop: Batch Transfer Between Hadoop and Relational Databases
Flume: Event-Based Data Collection and Processing
Kafka
Data Extraction
Conclusion
3. Processing Data in Hadoop
MapReduce
MapReduce Overview
Example for MapReduce
When to Use MapReduce
Spark
Spark Overview
Overview of Spark Components
Basic Spark Concepts
Benefits of Using Spark
Spark Example
When to Use Spark
Abstractions
Pig
Pig Example
When to Use Pig
Crunch
Crunch Example
When to Use Crunch
Cascading
Cascading Example
When to Use Cascading
Hive
Hive Overview
Example of Hive Code
When to Use Hive
Impala
Impala Overview
Speed-Oriented Design
Impala Example
When to Use Impala
Conclusion
4. Common Hadoop Processing Patterns
Pattern: Removing Duplicate Records by Primary Key
Data Generation for Deduplication Example
Code Example: Spark Deduplication in Scala
Code Example: Deduplication in SQL
Pattern: Windowing Analysis
Data Generation for Windowing Analysis Example
Code Example: Peaks and Valleys in Spark
Code Example: Peaks and Valleys in SQL
Pattern: Time Series Modifications
Use HBase and Versioning
Use HBase with a RowKey of RecordKey and StartTime
Use HDFS and Rewrite the Whole Table
Use Partitions on HDFS for Current and Historical Records
Data Generation for Time Series Example
Code Example: Time Series in Spark
Code Example: Time Series in SQL
Conclusion
5. Graph Processing on Hadoop
What Is a Graph?
What Is Graph Processing?
How Do You Process a Graph in a Distributed System?
The Bulk Synchronous Parallel Model
BSP by Example
Giraph
Read and Partition the Data
Batch Process the Graph with BSP
Write the Graph Back to Disk
Putting It All Together
When Should You Use Giraph?
GraphX
Just Another RDD
GraphX Pregel Interface
vprog()
sendMessage()
mergeMessage()
Which Tool to Use?
Conclusion
6. Orchestration
Why We Need Workflow Orchestration
The Limits of Scripting
The Enterprise Job Scheduler and Hadoop
Orchestration Frameworks in the Hadoop Ecosystem
Oozie Terminology
Oozie Overview
Oozie Workflow
Workflow Patterns
Point-to-Point Workflow
Fan-Out Workflow
Capture-and-Decide Workflow
Parameterizing Workflows
Classpath Definition
Scheduling Patterns
Frequency Scheduling
Time and Data Triggers
Executing Workflows
Conclusion
7. Near-Real-Time Processing with Hadoop
Stream Processing
Apache Storm
Storm High-Level Architecture
Storm Topologies
Tuples and Streams
Spouts and Bolts
Stream Groupings
Reliability of Storm Applications
Exactly-Once Processing
Fault Tolerance
Integrating Storm with HDFS
Integrating Storm with HBase
Storm Example: Simple Moving Average
Evaluating Storm
Trident
Trident Example: Simple Moving Average
Evaluating Trident
Spark Streaming
Overview of Spark Streaming
Spark Streaming Example: Simple Count
Spark Streaming Example: Multiple Inputs
Spark Streaming Example: Maintaining State
Spark Streaming Example: Windowing
Spark Streaming Example: Streaming versus ETL Code
Evaluating Spark Streaming
Flume Interceptors
Which Tool to Use?
Low-Latency Enrichment, Validation, Alerting, and Ingestion
NRT Counting, Rolling Averages, and Iterative Processing
Complex Data Pipelines
Conclusion
Part Ⅱ. Case Studies
8. Clickstream Analysis
Defining the Use Case
Using Hadoop for Clickstream Analysis
Design Overview
Storage
Ingestion
The Client Tier
The Collector Tier
Processing
Data Deduplication
Sessionization
Analyzing
Orchestration
Conclusion
9. Fraud Detection
Continuous Improvement
Taking Action
Architectural Requirements of Fraud Detection Systems
Introducing Our Use Case
High-Level Design
Client Architecture
Profile Storage and Retrieval
Caching
HBase Data Definition
Delivering Transaction Status: Approved or Denied?
Ingest
Path Between the Client and Flume
Near-Real-Time and Exploratory Analytics
Near-Real-Time Processing
Exploratory Analytics
What About Other Architectures?
Flume Interceptors
Kafka to Storm or Spark Streaming
External Business Rules Engine
Conclusion
10. Data Warehouse
Using Hadoop for Data Warehousing
Defining the Use Case
OLTP Schema
Data Warehouse: Introduction and Terminology
Data Warehousing with Hadoop
High-Level Design
Data Modeling and Storage
Ingestion
Data Processing and Access
Aggregations
Data Export
Orchestration
Conclusion
A. Joins in Impala
Index
Excerpt
Hadoop Application Architectures (reprint edition, English): Includes everything required for Hadoop applications to run, except data. This includes JAR files, Oozie workflow definitions, Hive HQL files, and more. The application code directory /app is used for application artifacts such as JARs for Oozie actions or Hive user-defined functions (UDFs). It is not always necessary to store such application artifacts in HDFS, but some Hadoop applications such as Oozie and Hive require storing shared code and configuration on HDFS so it can be used by code executing on any node of the cluster. This directory should have a subdirectory for each group and application, similar to the structure used in /etl. For a given application (say, Oozie), you would need a directory for each version of the artifacts you decide to store in HDFS, possibly tagging, via a symlink in HDFS, the latest artifact as latest and the currently used one as current. The directories containing the binary artifacts would be present under these versioned directories. This will look similar to: /app/<group>/<application>/<version>/<artifact directory>/<artifact>. To continue our previous example, the JAR for the latest build of our aggregate preferences process would be in a directory structure like /app/BI/clickstream/latest/aggregate-preferences/uber-aggregate-preferences.jar.
……
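The directory convention described in the excerpt can be sketched as a small path builder. This is only an illustration under stated assumptions: the function name `artifact_path` and its parameters are hypothetical and do not come from the book, and the code only assembles a path string; it does not talk to HDFS.

```python
from pathlib import PurePosixPath


def artifact_path(group, application, version, artifact_dir, artifact, root="/app"):
    """Build /app/<group>/<application>/<version>/<artifact directory>/<artifact>.

    Hypothetical helper: `version` may be a concrete version or a tag such as
    "latest" or "current", as the excerpt suggests.
    """
    parts = [group, application, version, artifact_dir, artifact]
    for part in parts:
        # Each component must be a single, non-empty path segment.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return str(PurePosixPath(root, *parts))


# The example path from the excerpt, rebuilt from its components.
example = artifact_path(
    "BI",
    "clickstream",
    "latest",
    "aggregate-preferences",
    "uber-aggregate-preferences.jar",
)
print(example)
```

Keeping the version as its own path segment is what makes the `latest` and `current` tags possible: a tag can point at a versioned directory without changing anything beneath it.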