
Apache HBaseCon Asia 2017, the first HBaseCon in Asia, is coming!

Published 2017-06-29 18:36 | Source: Huawei | Author: Huawei

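Abstract: HBaseCon is the official technical conference of the Apache HBase™ project, first held in 2012. Apache HBase is a distributed, scalable key-value database built on Apache Hadoop. It provides high-performance random reads and writes for big data workloads, and its design follows the Bigtable paper that Google published in 2006.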


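Date and time: 2017.08.04 08:00-18:00

Venue: International Conference Center, 3F, 13D, Tian'an Cloud Park (天安云谷), Huancheng Road, Bantian Subdistrict, Longgang District, Shenzhen, China

Audience: developers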

Apache HBaseCon Asia 2017 Agenda

Talk abstracts

Keynote

HBase 2.0.0

Michael Stack

HBase-2.0.0 has been a couple of years in the making. It is chock-a-block full of a long list of new features and fixes. In this session, the 2.0.0 release manager will perform the impossible, describing the release content inside the session time bounds.

HBase Practice At XiaoMi

Zheng Hu

We'll share some HBase experience at XiaoMi:

1. How we tuned G1GC for HBase clusters.

2. Development and performance of Async HBase Client.

 

Track 1

Offheap bucket cache success story and Offheaping the write path in HBase

Ramkrishna Vasudevan and Anoop Sam John

The first part of the talk covers the success story of deploying the latest improvements to offheap mode bucket cache in one of the biggest clusters at Alibaba.

It highlights how off heap read from bucket cache helped in improving the avg QPS and avoided the frequent dips in QPS due to GC.

The second part covers the efforts that went into making the HBase write path use offheap memory effectively, the various design changes in size accounting, and the performance gains that we achieved at the end of the task.

HBase Multi tenancy use cases and various solution

Bhupendra Jain

In a multi-tenant scenario, the biggest challenge is to achieve the QoS for each tenant without impacting the other tenants' workloads. This session will talk about the multi-tenancy use cases and challenges present in HBase. The session will cover in detail:

a) Achieving Multi tenancy with Single HBase cluster - Solutions, Pros and cons (RS Group, RPC Throttling, Quota etc.)

b) Achieving Multi tenancy with multiple HBase cluster - Solutions, Pros and cons.

Lift the ceiling of HBase throughputs

Yu Li and Lijin Bin

HBase is the core storage of Alibaba's search infrastructure and faces a big challenge in improving its throughput, which determines the processing speed of machine learning programs and thus the accuracy of the recommendations made. In this session we will talk about work done and in progress to increase both read and write throughput, as well as real performance on the past Singles' Day and the latest benchmark data from the laboratory.

HareQL: the development of a fast HBase query tool

Mon-Fong Mike Jiang, Kuan-Yu Hubert Fan-Chiang and Tienyu Rebecca Lin

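We have used HBase since 2011 as a store for structured big data, mainly for analyzing semiconductor manufacturing equipment parameters. To query the data efficiently, we developed an integrated Structured Query Language (SQL) interface. Our earliest approach was to (1) build our own GUI and (2) define our own SQL syntax, but this created a lot of extra work, especially in connecting the SQL parser to the corresponding HBase API calls.

To solve this problem, we took the Hive QL parser apart and used it as the core, integrating that part of its source code into the HareDB HBase Client. We also integrated HBase coprocessors to speed up queries. We have used this architecture in the big data systems of several semiconductor fabs, where it has delivered high query efficiency.

In addition, we integrated Kafka to handle streaming data ingestion and added a cube-building tool for presenting analysis results. These are problems and solutions we met one after another while building real big data systems, and we will share this whole development journey.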

Removable singularity: a story of HBase upgrade in Pinterest

Tianying Chang

HBase serves online-facing traffic at Pinterest, which means no downtime is allowed. However, we were on HBase 0.94. To upgrade to the latest version, we needed to figure out a way to live-upgrade while keeping the Pinterest site up. Recently, we successfully upgraded the HBase 0.94 cluster to 1.2 with no downtime. We made changes to both Asynchbase and the HBase server side. We will talk about what we did and how we did it. We will also talk about our findings from the configuration and performance tuning we did to achieve low latency.

HBase Disaster Recovery Solution at Huawei

Ashish Singhi

The HBase disaster recovery solution aims to maintain high availability of the HBase service when one HBase cluster suffers a disaster, with minimal user intervention. This session will introduce the HBase disaster recovery use cases and the various solutions adopted at Huawei, such as:

a) Cluster Read-Write mode

b) DDL operations synchronization with standby cluster

c) Mutation and bulk loaded data replication

d) Further challenges and pending work

Backup / Restore feature in HBase

Vladimir Rodionov and Ted Yu

Backup and restore functionality is crucial to achieving fault tolerance for data management systems.

In the talk, we are going to cover the newly merged backup and restore phases 2 and 3.

Previously, users could take snapshots to back up data. However, the associated execution cost may be high due to the flush across region servers. There was no incremental snapshot either.

Backup and restore functionality provides two types of backup:

Full backup – foundation for incremental backups

Incremental backup – can be periodic to capture changes over time
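
As a rough illustration of how the two backup types relate, here is a minimal Python sketch. It is a toy model, not the HBase backup API: the function names are hypothetical, a "table" is just a dict, and selecting changes by write timestamp is an assumption made for the example. Deletes are ignored.

```python
# Toy model: a "table" maps row key -> (value, write timestamp).
# All names are hypothetical; this is not the HBase backup API.

def full_backup(table):
    """Copy every row; this image is the base for later incrementals."""
    return dict(table)

def incremental_backup(table, since_ts):
    """Copy only rows written after the previous backup's timestamp."""
    return {k: v for k, v in table.items() if v[1] > since_ts}

def restore(full_image, incremental_images):
    """Replay the full image, then each incremental image in order."""
    restored = dict(full_image)
    for image in incremental_images:
        restored.update(image)
    return restored

table = {"row1": ("a", 1), "row2": ("b", 2)}
full = full_backup(table)                      # taken at timestamp 2
table["row2"] = ("b2", 3)                      # update after the full backup
table["row3"] = ("c", 4)                       # new row after the full backup
inc = incremental_backup(table, since_ts=2)    # holds only row2 and row3
restored = restore(full, [inc])                # equals the current table
```

The incremental image holds only the two rows changed since the full backup, which is why a full backup has to exist first.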

We'll cover three types of backup strategies:

Intra-cluster backup

backup on a separate HDFS archive cluster

backup involving Cloud or a Storage Vendor

Best practices for Backup-and-Restore will be presented next.

We'll explain concepts such as Backup Image, Backup Set with example commands of how they are used.

Mechanism for Incremental backups is covered next.

Finally we'll cover bulk load support for backup.

HBase on Beam

Jingcheng Du

Apache Beam is an open source, unified programming model for defining batch and streaming jobs that run on many execution engines. HBase on Beam is a connector that allows Beam to use HBase as a bounded data source and as a target data store for both batch and streaming data sets. With this connector, HBase can work directly with many batch and streaming engines, for example Spark, Flink, and Google Cloud Dataflow. In this session, I will introduce Apache Beam, the current implementation of HBase on Beam, and the future plans for it.

 

Track 2

HBase: recent improvement and practice at Alibaba

Wenlong Yang and Han Yang

AliHB, an HBase branch tailored to Alibaba Group's business characteristics and requirements, is widely used as a basic storage service supporting the online and nearline applications of companies across the Alibaba economy, such as taobao.com, tmall.com, alipay.com, and cainiao.com.

In this talk, we will share our experience of maintaining clusters totaling more than ten thousand nodes with high availability and at low cost:

1. An introduction to several typical scenarios at Alibaba

2. SQL (based on Apache Phoenix) improvements

3. Range-level data copy across clusters

4. Prefix-Bloomfilter for scan performance

5. Dual-Service based on async api, enabling concurrent access on two clusters for expected low latency

6. Some useful things for production.
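
The dual-service idea in item 5 can be illustrated with a minimal Python sketch: issue the same read to two clusters concurrently and return whichever answers first. This is an assumption about the general pattern, not AliHB's implementation. `read_from` is a hypothetical stand-in for an async client call, and the delays simulate cluster latency.

```python
import asyncio

async def read_from(cluster, key, delay):
    # Hypothetical stand-in for an async client read; `delay` simulates latency.
    await asyncio.sleep(delay)
    return f"{key}@{cluster}"

async def dual_read(key, delays):
    # Send the same read to every cluster at once.
    tasks = [asyncio.ensure_future(read_from(name, key, d))
             for name, d in delays.items()]
    # Return as soon as the first response arrives.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower request is no longer needed
    return done.pop().result()

# The standby cluster is faster in this simulated run, so its answer wins.
result = asyncio.run(dual_read("row1", {"primary": 0.2, "standby": 0.01}))
```

A real deployment would also have to decide how to treat a possibly stale answer from the second cluster, which this sketch does not address.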

Ecosystems with HBase and CloudTable service at Huawei

Jieshan Bi and Yanhui Zhong

1. CTBase: A light-weight HBase client for structured data.

1). Schematized table, more friendly for structured data storage.

2). Global secondary index for HBase.

3). HBase Query DSL. JSON based light-weight API.

4). Cluster table. Pre-joining with keys, a better solution for cross-table join queries from HBase.

2. Tagram: Distributed bitmap index implementation with HBase.

1). Distributed bitmap index for accelerating ad hoc queries on low-cardinality columns.

2). Powerful and flexible query API.

3). Tagram offers millisecond-level query latency.

3. CloudTable Service Introduction: HBase on Huawei cloud.
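
The bitmap index idea behind Tagram (item 2) can be illustrated with a minimal single-machine Python sketch. It is not Tagram's implementation: the function names and sample data are hypothetical, and Python integers stand in for bitmaps. Each distinct value of a low-cardinality column gets one bitmap, and a multi-condition query becomes a bitwise AND.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bitset;
    bit i is set when rows[i][column] equals that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return index

def matching_rows(bitmap):
    """Return the positions of the set bits in ascending order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical sample data with two low-cardinality columns.
rows = [
    {"gender": "F", "city": "Shenzhen"},
    {"gender": "M", "city": "Beijing"},
    {"gender": "F", "city": "Beijing"},
    {"gender": "F", "city": "Shenzhen"},
]
gender = build_bitmap_index(rows, "gender")
city = build_bitmap_index(rows, "city")

# gender = 'F' AND city = 'Shenzhen' is a single bitwise AND.
hits = matching_rows(gender["F"] & city["Shenzhen"])  # rows 0 and 3
```

The approach suits low-cardinality columns because the number of bitmaps equals the number of distinct values.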

Large scale data near-line loading method and architecture

Shuaifeng Zhou

When we load data into HBase in real time, we use the put/putlist interface. After receiving a put request, the regionserver writes the WAL, writes the data into the memory store, flushes the memory store to the disk store, and then compacts files again and again. That procedure consumes too many resources and degrades read/write performance. To solve the problem, we provide a near-line loading method and architecture that greatly increases loading bandwidth and reduces the impact on read operations.
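
The put path described above can be sketched as a toy Python model to show where the repeated work comes from. The class, thresholds, and counters are hypothetical and far simpler than a real regionserver. The sketch models only the put path, not the near-line loading method, whose design the abstract does not detail.

```python
class ToyRegion:
    """Toy model of the put path: WAL -> memory store -> flush -> compaction."""

    def __init__(self, flush_size=2, compact_at=3):
        self.wal = []            # append-only write-ahead log
        self.memstore = {}       # in-memory buffer of recent writes
        self.store_files = []    # flushed, immutable sorted files
        self.flush_size = flush_size
        self.compact_at = compact_at
        self.flushes = 0
        self.compactions = 0

    def put(self, key, value):
        self.wal.append((key, value))      # 1. write the WAL
        self.memstore[key] = value         # 2. write the memory store
        if len(self.memstore) >= self.flush_size:
            self._flush()                  # 3. flush when the buffer is full

    def _flush(self):
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}
        self.flushes += 1
        if len(self.store_files) >= self.compact_at:
            self._compact()                # 4. compact when files pile up

    def _compact(self):
        merged = {}
        for store_file in self.store_files:   # oldest first; newer values win
            merged.update(store_file)
        self.store_files = [dict(sorted(merged.items()))]
        self.compactions += 1

region = ToyRegion()
for i in range(12):
    region.put(f"row{i:02d}", i)
# 12 puts cause 6 flushes and 2 compactions with these thresholds.
```

Every put is written to the WAL and the memory store, then rewritten by a flush and again by each compaction that includes it, which is the resource cost the abstract refers to.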

HBase at JD

Xingbo Peng, Nan Zhang and Bang Wen

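1. Current scale

Over several years of development within JD's CTO organization, HBase has grown to clusters of more than 3,000 machines supporting more than 600 JD business systems. These clusters have been through multiple 618 and Double 11 shopping events. JD's CTO organization is a major HBase user.

2. Business scenarios

An introduction to typical HBase applications at JD, including monitoring, risk control, recommendation, and advertising.

3. High-availability improvements

Our work on high availability for HBase clusters, including cross-datacenter disaster recovery, multi-tenancy with resource groups, and cluster security.

4. Operations practice

Our practices in operating HBase clusters, including the HBase cluster monitoring system Mummut, the alerting system, integration of HBase clusters with the big data platform, business operations, and data migration.

5. Future outlook

Work we are doing now and plan to do on top of HBase, including Kylin, Phoenix, and containerized deployment.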

Synchronous replication for HBase

Shen Chunhui and Meng Qingyi

This talk will share the detailed implementation and production practice of synchronous replication between clusters on Alibaba's internal HBase branch.

It covers how to keep data consistent, how to switch client access between clusters automatically, and the related performance and monitoring.

An enterprise-grade big data platform based on HBase

Xinyu Zhang, Xueliang Chen and Zheng Fan

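The HBase-based big data platform has become a very important foundational data platform in China Life's new-generation integrated business processing system. The platform currently consolidates hundreds of terabytes of data, integrates the customer, business, and contact data of several hundred million customers into one unified data model, and derives thousands of customer tags from it. On top of the platform, we provide customers, sales agents, and internal managers with applications for sales support, customer service, and operations support. Apps and web pages offer search and query over many kinds of information, and deep learning models power data applications such as anti-fraud.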

HBase usage and practice at Hulu

Qianxi Zhang

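1. Hulu is one of the most popular online video sites in the United States, and Hulu Beijing is Hulu's second-largest R&D center. The big data infrastructure team in Beijing is responsible for developing and operating the big data infrastructure for the whole company.

2. An overview of HBase at Hulu

3. How Hulu uses HBase

4. The user profile system stores every user's basic information, user behavior, third-party DMP data, and machine learning result tags (hundreds of thousands of qualifiers). Spark and Spark Streaming read and write HBase data and run a variety of machine learning models, serving the company's video recommendation, targeted advertising, and marketing teams.

5. HBase optimizations at Hulu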

Apache HBase at Netease

Xinxin Fan and Hongxiang Jiang

First, we will give a brief introduction to the HBase service at Netease, including basic cluster information and the key HBase services. Then we will share some tips from our HBase tuning practice. Finally, we will introduce some improvements in our internal HBase version.

Building online HBase cluster of Zhihu based on Kubernetes

Zhiyong Bai

Zhihu uses HBase, a high-performance and scalable key-value database, as an online data store alongside MySQL and Redis. Zhihu's platform team has accumulated experience with container technology, and this time we built a flexible online HBase platform on Kubernetes. It can rapidly create multiple logically isolated HBase clusters on a shared physical cluster and provide customized service for different business needs. Combined with Consul and a DNS server, we implemented highly available access to HBase through a client written mainly in Python. This presentation shares the architecture of the online HBase platform at Zhihu and some practical experience from the production environment.

 
