
Introduction

As more and more companies embark on digital transformation, the big data industry has grown at an unprecedented pace. This prosperity has in turn brought unprecedented opportunities and challenges to the technologies of the big data ecosystem. When it comes to big data technology, Apache is a name everyone knows: the vast majority of big data open source technologies come from the Apache Software Foundation. Today I will introduce you to Apache's annual event: ApacheCon.

ApacheCon: The ASF's Official Global Conference Series

ApacheCon is the official global conference series of the Apache Software Foundation (ASF), held once a year. A prestigious open source gathering, it is one of the most anticipated conferences in the open source industry.

Since its inception in 1998, ApacheCon has attracted more than 350 technical projects and their communities. It brings together industry experts and speakers from around the world to share the latest global technology trends and practices and to discuss the technology of tomorrow. Technology enthusiasts can upgrade their technology stacks by following the latest trends and developments on each technology frontier.

However, for more than a decade ApacheCon has been held overseas. This year, the organizing committee is holding an online ApacheCon for the Asia-Pacific region: ApacheCon Asia. It features 140+ talks from speakers in China, Japan, India, the United States and elsewhere, spread across 14 forums including API/microservices, middleware, workflow and data governance, data visualization, observability, stream processing, messaging systems, IoT and Industrial IoT, integration, open source community/culture, and Web Server/Tomcat.

By participating in the Asia conference on August 6-8, 2021, you will get:

· The latest global technology trends and practice sharing
· Opportunities to exchange with 200+ top experts from home and abroad
· A 3-day event with 140+ talks, free to attend throughout

Conference official website: https://www.apachecon.com/acasia2021
Details of the conference agenda: https://apachecon.com/acasia2021/tracks.html

About the Big Data Forum

Big Data is one of Apache's most important topics, and the field is very lively this year. The forum covers top-level and incubating projects including Arrow, Atlas, Bigtop, CarbonData, Cassandra, DolphinScheduler, Doris (incubating), Druid, Flink, Hadoop, HBase, Hive, Hudi, Impala, Kylin, Kyuubi (incubating), Liminal (incubating), Nemo, Pinot, Pulsar, Spark and YuniKorn (incubating), as well as popular open source projects such as Milvus and openLooKeng. Over the 3-day event, attendees can catch up on the cutting-edge trends in these technologies, along with practical experience, principles, architecture analysis and other exciting content from front-line users.

Producer


Because big data is such a hot area and the agenda is packed across all 3 days, today we will walk you through the first day's sessions from top technology experts at home and abroad in detail.

The Big Data forum has also specially invited 3 hosts.

Big Data agenda highlights for August 6

Extending Impala - Common mistakes and best practices

Sharing guest: Manish Maheshwari
Time: 13:30 on August 6
Topic introduction:
Apache Impala is a complex engine that requires a comprehensive technical understanding to fully use it. In this talk, we will discuss ingestion best practices to maintain the scalability of Impala deployments, and admission control configurations that provide a consistent experience for end users. We will also conduct a high-level study of Impala's query profile, which is used as the first stop for any performance troubleshooting. In addition, we will discuss the mistakes that users and BI tools often make when interacting with Impala. Finally, we will discuss an ideal configuration to present all of the above in practice.
Guest introduction:

Manish Maheshwari
Principal Sales Engineer at Cloudera, with more than 15 years of experience building very large data warehouses and analytics solutions. He has extensive experience with Apache Hadoop, DI and BI tools, data mining and forecasting, data modeling, master data and metadata management, and dashboard tools, and is proficient in Hadoop, SAS, R, Informatica, Teradata and QlikView.

How the data platform of DBS (Development Bank of Singapore) uses Apache CarbonData to drive real-time insight and analytics

Sharing guests: Ravindra Pesala / Kumar Vishal
Time: 13:30 on August 6
Topic introduction:
DBS Bank (DBS) is a leading bank headquartered in Singapore. The bank already holds massive volumes of structured and unstructured data, which plays an important role in shaping its strategy. In 2020, DBS invested in a CarbonData-based data platform to drive real-time analytics and unlock insights from existing data across various sources. In this talk, we will introduce how DBS uses the Spark and Presto engines to move from a traditional data warehouse to a data lake based on CarbonData.
Guest introduction:

Ravindra Pesala
Senior Vice President of DBS Bank Singapore, Head of Big Data Platform
Apache CarbonData PMC
Leads the big data engineering platform, covering ingestion, compute, data access, streaming and metadata.


Kumar Vishal
Apache CarbonData PMC
Senior Big Data Engineer
Works on the big data engineering platform, covering ingestion, compute, data access and streaming.

Building a distributed, fault-tolerant and scalable analytics stack

Sharing guest: Nishant Bangarwa
Time: 14:10 on August 6
Topic introduction:
To date, Apache Druid deployments hold more than 50 trillion events, equivalent to more than 500 PB of raw data, and they continue to grow. In this talk, we will introduce the design of a distributed, fault-tolerant and scalable analytics stack and its challenges, and describe our path in developing Apache Druid into a powerful distributed, fault-tolerant and scalable analytics data store.
Guest introduction:

Nishant Bangarwa
Co-founder and head of engineering at Rilldata.
An active open source contributor; PMC member of Apache Druid and Apache Superset, and committer on Apache Calcite and Apache Hive.
Before Rilldata, he was a member of Cloudera's data warehouse team and the Metamarkets Druid team, responsible for managing large-scale Apache Druid deployments.
Bachelor of Computer Science, National Institute of Technology Kurukshetra, India.

How to achieve security in Apache Ozone

Sharing guests: Bharat Viswanadham / Banerjee
Time: 14:10 on August 6
Topic introduction:
Apache Ozone is a scalable, redundant, distributed object store for Hadoop, and it became an Apache top-level project in 2020. Ozone has two metadata services: the Storage Container Manager (SCM), which handles block/container allocation and replication, certificates and node management; and the Ozone Manager (OM), which manages metadata. In this talk, we will discuss how security is implemented in Ozone.
Guest introduction:
Bharat Viswanadham: A software engineering expert with more than 7 years of experience designing and building scalable, high-performance distributed storage systems. Apache Hadoop and Apache Ozone Committer & PMC member.
Banerjee: An expert in distributed storage systems with more than 8 years of experience. Committer & PMC member in the Apache Hadoop, Apache Ozone and Apache Ratis communities.

openLooKeng heuristic index framework: architecture analysis and application practice

Sharing guest: Li
Time: 14:50 on August 6
Topic introduction:
With the application and development of big data technology, data types keep multiplying, data is increasingly widely distributed, and query scenarios grow ever more complex, making data processing increasingly difficult. To improve the usability of big data, Huawei launched the openLooKeng data virtualization engine as an open source project.
openLooKeng provides a unified SQL interface with interactive query and analysis capabilities, and keeps evolving in areas such as cross-data-center/cloud queries, data source extension, performance, reliability and security, all to make big data simpler. This talk focuses on the openLooKeng heuristic index framework, the indexing technologies built on top of it, and the challenges of implementing and applying them.
Guest introduction:

Li
Ph.D. from Huazhong University of Science and Technology. Joined Huawei in June 2018. He currently focuses on performance optimization research for openLooKeng and is deeply involved in the design and implementation of its big data query and analysis engine architecture.

Kyuubi: NetEase's exploration and practical application of Serverless Spark scenarios

Sharing guest: Yao Qin
Time: 14:50 on August 6
Topic introduction:
This session covers the architecture, implementation principles and application scenarios of Kyuubi, NetEase's open source big data project, and uses real cases to show how Kyuubi helps businesses inside NetEase realize Serverless Spark capabilities, along with the process and thinking behind it. It also describes how we participated directly in the Spark open source community along the way, handling the related issues and feature optimizations upstream.
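For context on how applications reach those Serverless Spark capabilities: Kyuubi exposes a HiveServer2-compatible Thrift/JDBC endpoint, so plain SQL clients can run Spark workloads through it. Below is a minimal, hypothetical sketch of connecting from Java with the Hive JDBC driver; the host name and user are placeholders, 10009 is Kyuubi's default frontend port, and the Hive JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KyuubiJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Kyuubi speaks the HiveServer2 Thrift protocol, so the standard Hive
        // JDBC driver works as-is; "kyuubi-host" is a placeholder and 10009 is
        // Kyuubi's default frontend port (assumes default configuration).
        String url = "jdbc:hive2://kyuubi-host:10009/default";
        try (Connection conn = DriverManager.getConnection(url, "demo_user", "");
             Statement stmt = conn.createStatement();
             // The query below is executed by a Spark engine managed by Kyuubi.
             ResultSet rs = stmt.executeQuery("SELECT 1 AS probe")) {
            while (rs.next()) {
                System.out.println("probe = " + rs.getInt("probe"));
            }
        }
    }
}
```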
Guest introduction:

Yao Qin
Lead author of the Apache Kyuubi project
Apache Spark Committer
Apache Submarine Committer
Member of the NetEase big data team

Cross-data-source analytics at China Merchants Bank

Sharing guest: Wu Yimin
Time: 15:30 on August 6
Topic introduction:
China Merchants Bank (CMB) has PB-scale data stored in RDBMSs, NoSQL databases, object storage, and big data frameworks such as Apache Hadoop, Spark and Flink. Moving data between these different sources via ETL is very costly, so openLooKeng was introduced to connect the different data sources and process data in place across data centers and hybrid clouds.
This talk gives an overview of CMB's data processing engine, which can perform in-situ analysis on geographically remote data sources, and explains how we use openLooKeng features such as high availability, auto-scaling, built-in caching and index support to meet the reliability requirements of enterprise workloads.
Guest introduction:

Wu Yimin
Big data technology expert at China Merchants Bank, with 9 years of big data experience in fintech, responsible for the architecture design, implementation and maintenance of CMB's big data platform. openLooKeng PMC member.

The inside story of Apache Druid's storage and query engine

Sharing guest: Gian Merlino
Time: 15:30 on August 6
Topic introduction:
Apache Druid is an open source columnar database known for its scale and performance; its largest deployments span thousands of servers. But at any scale, high performance has to start from a good foundation. This talk explores those fundamentals by examining the inner workings of a single data server: how Apache Druid stores data, which compression methods it uses, how the storage engine connects to the query processing engine, and how the system handles resource management and multithreading.
Guest introduction:

Gian Merlino
Co-founder and CTO of Imply and one of Druid's main committers. He previously led the data ingestion team at Metamarkets and held a senior engineering position at Yahoo. Bachelor of Computer Science from the California Institute of Technology.

Speeding up big data analytics with Apache CarbonData indexes

Sharing guests: Akash R Nilugal / Kunal Kapoor
Time: 16:10 on August 6
Topic introduction:
Data in the 21st century is like oil in the 18th century: a huge, untapped and valuable asset, if processed intelligently. Storing and analyzing big data is challenging and expensive in both cost and time, and analytics solutions need to constantly adapt to keep up with exponential data growth. Apache CarbonData is a unified storage solution and file format designed to optimize query performance and thereby reduce analytics costs, and it has been adopted by more than 100 open source users. In databases, indexes are one of the main features that let queries avoid scanning every row. Inspired by this concept, Apache CarbonData supports custom indexes such as min/max, Bloom, Lucene, secondary indexes and materialized views to speed up row-level updates, deletes, OLAP and point queries. This presentation highlights CarbonData's custom index architecture and distributed index cache server, which help deliver faster query results, as well as future challenges and scope.
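To make the index feature above concrete, here is a small, hypothetical sketch (the table and column names are invented) of creating a Bloom filter index through CarbonData's SQL syntax from a Java Spark application; it assumes CarbonData 2.x with its Spark SQL extension on the classpath.

```java
import org.apache.spark.sql.SparkSession;

public class CarbonIndexSketch {
    public static void main(String[] args) {
        // Assumes the CarbonData 2.x jars are on the classpath; the extension
        // class name follows the CarbonData-Spark integration docs.
        SparkSession spark = SparkSession.builder()
                .appName("carbondata-index-sketch")
                .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
                .getOrCreate();

        // Build a Bloom filter index on a high-cardinality column so point
        // queries can skip data blocks that cannot contain the requested value.
        spark.sql("CREATE INDEX customer_bloom_idx "
                + "ON TABLE sales (customer_id) "
                + "AS 'bloomfilter'");

        // Filters on the indexed column can now prune data instead of scanning every row.
        spark.sql("SELECT * FROM sales WHERE customer_id = 'C-1001'").show();
    }
}
```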
Guest introduction:

Akash R Nilugal
Apache CarbonData PMC member & Committer
Senior technical leader on the cloud and AI/data platform team at the Huawei Bangalore Research Center.
5 years of experience in big data, with interests in big data index support, materialized views, big data CDC, Spark SQL query optimization, Spark Structured Streaming, and data lake and data warehouse features.

Kunal Kapoor
Apache CarbonData PMC member & Committer, system architect on the cloud and AI/data platform team at the Huawei Bangalore Research Center, mainly responsible for the distributed index cache server, Hive + CarbonData integration, pre-aggregation support, S3 support for CarbonData, CarbonData's secondary index, and Spark SQL query optimization in CarbonData.

Java-based big data machine learning solutions

Sharing guest: Lan Qing
Time: 16:10 on August 6
Topic introduction:
The success of machine learning (ML) applications depends on using big data, most of which arrives in unstructured formats and may be available offline or online. Although Python offers plenty of options for ML tasks, integrating Python applications into existing Java/Scala-based big data pipelines is quite challenging, and in Java/Scala there are few options that bridge the gap between processing big data and running ML workloads with the same library.
To solve these problems, we will use DJL, a machine learning framework for Java, to demonstrate big data ML solutions in Java. DJL supports a variety of ML engines, including TensorFlow, PyTorch, Apache MXNet (incubating), PaddlePaddle, ONNX Runtime and more. By combining it with Apache Flink and Apache Spark, users can easily build their online/offline ML pipelines. By the end of the session, the audience will be able to build easy-to-use, high-performance ML pipelines for all of these different scenarios.
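For a taste of what ML-adjacent code in Java looks like with DJL, here is a minimal, self-contained sketch using its NDArray API, the tensor abstraction such pipelines build on; the shapes and values are arbitrary and chosen purely for illustration.

```java
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class DjlNdArraySketch {
    public static void main(String[] args) {
        // NDManager owns the (possibly native) memory behind NDArrays; the
        // try-with-resources block releases it deterministically.
        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray features = manager.ones(new Shape(2, 3)); // 2x3 matrix of ones
            NDArray scaled = features.mul(10);                // element-wise multiply
            NDArray rowSums = scaled.sum(new int[] {1});      // sum along axis 1
            System.out.println(rowSums);                      // expected values: [30, 30]
        }
    }
}
```

The same NDArray objects can be produced inside Spark or Flink operators, in line with the online/offline pipelines the talk describes.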
Guest introduction:

Lan Qing
Software development engineer on Amazon's AWS machine learning platform, with deep expertise in production architectures for big data and machine learning applications.
One of the co-authors of DJL (djl.ai)
Apache MXNet PPMC
Master of Computer Engineering, Columbia University

Insight into the secrets of the open source community: best practices for data-driven community operations

Sharing guests: Zhong Jun / Jiang Yikun / Peng Lei
Time: 16:50 on August 6
Topic introduction:
When evaluating an open source community, data-driven insight and analysis of its current state is extremely valuable for helping the community grow healthily, so data-driven operations play a key role. In this session we will introduce best practices in data-driven community operations. This operations management system helps several of the most active open source communities in China (such as openEuler, openGauss, openLooKeng and MindSpore) measure community health, activity and other key indicators efficiently and scientifically. We will also use real cases from the openEuler community to describe how the data-driven operations system is implemented, how we used powerful Apache big data projects to build the first usable version (covering data storage, analysis, data insight and visualization), and the improvements we contributed back to the upstream Apache projects.
Guest introduction:

Zhong Jun
Has worked in open source communities for more than 6 years. Responsible for the digital operations systems of the openEuler, MindSpore, openGauss and openLooKeng projects. Core contributor to multiple communities: maintainer of the openEuler community Infra SIG, maintainer of the openGauss community Infra SIG, and core member of the OpenStack Manila project.


Jiang Yikun
A senior software engineer on Huawei's open source development team. He has participated in open source communities for more than 5 years, working on multi-architecture support and improvements for big data projects, and has five years of experience in cloud computing and big data optimization. He was previously a committer on an OpenStack storage project.


Peng Lei
A senior software engineer on Huawei's open source development team, working on MySQL multi-architecture support and improvements. He has five years of experience in SQL development and big data use, has studied MySQL internals including MySQL Group Replication, and has worked on the kernels of distributed databases. Two years of experience using big data projects such as Spark, Kafka and Hadoop.

Apache Hudi on AWS

Sharing guest: Fei Lianghong
Time: 16:50 on August 6
Topic introduction:
An introduction to Apache Hudi on AWS, covering an overview of Apache Hudi, common use cases, Hudi storage types, writing Hudi datasets, querying Hudi datasets, and some practical tips.
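To make the writing and querying parts concrete, here is a rough, hypothetical sketch using Hudi's Spark datasource from Java; the S3 paths, table name and field names are placeholders, and it assumes the Hudi Spark bundle is on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HudiWriteSketch {
    public static void main(String[] args) {
        // Kryo serialization is the setting Hudi's docs recommend for Spark jobs.
        SparkSession spark = SparkSession.builder()
                .appName("hudi-write-sketch")
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .getOrCreate();

        Dataset<Row> trips = spark.read().json("s3://my-bucket/input/trips.json");

        // Write a copy-on-write Hudi table; the record key, precombine and
        // partition fields below are illustrative column names, not from the talk.
        trips.write().format("hudi")
                .option("hoodie.table.name", "trips_cow")
                .option("hoodie.datasource.write.recordkey.field", "trip_id")
                .option("hoodie.datasource.write.precombine.field", "ts")
                .option("hoodie.datasource.write.partitionpath.field", "dt")
                .mode(SaveMode.Append)
                .save("s3://my-bucket/hudi/trips_cow");

        // Snapshot query: read the Hudi table back like any other Spark datasource.
        spark.read().format("hudi").load("s3://my-bucket/hudi/trips_cow")
                .createOrReplaceTempView("trips_cow");
        spark.sql("SELECT trip_id, dt FROM trips_cow WHERE dt = '2021-08-06'").show();
    }
}
```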
Guest introduction:

Fei Lianghong
Chief Developer Evangelist at Amazon Web Services (AWS)
He draws on 20 years of experience to support innovation and help start-ups and enterprises turn their ideas into reality. He focuses on software development and cloud-native architecture, as well as the technical and business impact of machine learning and data analytics. Before joining AWS, he worked at Apple and Microsoft. His interests include artificial intelligence, data science and photography.

That wraps up the first day of the ApacheCon Asia Big Data forum, so stay tuned for the big names on the second and third days!

After reading this, what are you waiting for? Hurry up and sign up!

How to register

ApacheCon Asia 2021: the first ApacheCon online conference for Asia
August 6-8, 2021
· 14 forums, 100+ technical projects
· 140+ talks
· Online exchange with technologists worldwide
· 3 days of around-the-clock sessions
· Free to attend throughout

We look forward to seeing you there!

Click here to register: https://hopin.com/events/apachecon-asia-2021

