Canal

Positioning: Canal provides incremental data subscription and consumption by parsing the database's incremental log. It currently supports mainly MySQL.

Principle:

  • Canal implements the slave side of the MySQL replication protocol: it presents itself to the master as a MySQL slave and sends it a dump request
  • The MySQL master receives the dump request and starts pushing its binary log to the slave (that is, to Canal)
  • Canal parses the binary log, which arrives as a raw byte stream, into structured objects
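
To make the last step concrete, here is a minimal Python sketch of decoding the fixed 19-byte header that precedes every event in a v4 MySQL binary log. It is not Canal's parser (Canal is written in Java); the names `EventHeader`, `parse_header`, and `event_name` are hypothetical, and only a handful of event type codes are listed.

```python
import struct
from typing import NamedTuple

# v4 binlog event header, little-endian, 19 bytes in total:
# timestamp(4) type_code(1) server_id(4) event_length(4) next_position(4) flags(2)
HEADER = struct.Struct("<IBIIIH")

# A small subset of event type codes, enough for illustration.
EVENT_NAMES = {2: "QUERY_EVENT", 4: "ROTATE_EVENT", 16: "XID_EVENT", 19: "TABLE_MAP_EVENT"}


class EventHeader(NamedTuple):
    timestamp: int
    type_code: int
    server_id: int
    event_length: int
    next_position: int
    flags: int


def parse_header(buf: bytes) -> EventHeader:
    """Turn the first 19 bytes of an event into a structured header."""
    if len(buf) < HEADER.size:
        raise ValueError("an event header needs at least 19 bytes")
    return EventHeader(*HEADER.unpack_from(buf))


def event_name(header: EventHeader) -> str:
    return EVENT_NAMES.get(header.type_code, f"UNKNOWN({header.type_code})")


# Build a synthetic header instead of reading a real binlog file.
sample = HEADER.pack(1700000000, 19, 1, 50, 354, 0)
header = parse_header(sample)
```

A real parser would go on to decode the event body according to `type_code`; the header alone already tells it how long the event is and where the next one starts.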

The parsing process can be roughly divided into the following steps:

  • The connection looks up the position of the last successfully parsed event (on first start, it uses the configured initial position or the database's current binlog position)
  • The connection is established and sends the BINLOG_DUMP command
  • MySQL starts pushing the binary log
  • The binlog parser parses the received binary log and supplements it with additional information
  • The result is passed to the EventSink module for storage; this call blocks until the data has been stored successfully
  • After storage succeeds, the binary log position is recorded periodically
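
The steps above can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: `FakeMaster` stands in for a MySQL master, a bounded `queue.Queue` stands in for the EventSink and store, and `checkpoint_every` models the periodic recording of the position. None of these names come from Canal.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    file: str
    offset: int


class FakeMaster:
    """Hypothetical stand-in for a MySQL master serving one binlog file."""

    def __init__(self, events):
        self.events = events  # list of (Position, bytes)

    def current_position(self):
        return Position("binlog.000001", 4)

    def dump(self, start):
        # Push every event located after the requested start position.
        for pos, payload in self.events:
            if pos.offset > start.offset:
                yield pos, payload


def run_parser(master, sink, saved=None, checkpoint_every=2):
    # Step 1: resume from the saved position, or from the master's current one.
    start = saved or master.current_position()
    checkpoint = start
    stored = 0
    # Steps 2-3: request the dump and receive pushed events.
    for pos, payload in master.dump(start):
        # Step 4: "parse" the event and attach extra information.
        entry = {"position": pos, "size": len(payload), "raw": payload}
        # Step 5: hand over to the sink; put() blocks while the queue is full.
        sink.put(entry)
        stored += 1
        # Step 6: record the position only every few stored events.
        if stored % checkpoint_every == 0:
            checkpoint = pos
    return checkpoint


events = [(Position("binlog.000001", off), b"row-change") for off in (100, 200, 300)]
sink = queue.Queue(maxsize=16)
checkpoint = run_parser(FakeMaster(events), sink)
```

In this sketch the checkpoint ends at offset 200 even though three events were stored, so a restart from the checkpoint would deliver the event at offset 300 again. That is the trade-off of recording the position periodically rather than per event: consumers have to tolerate duplicates.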

The EventSink module is responsible for:

  • Data filtering: supports wildcard filter patterns on table names, field contents, etc.
  • Data routing/distribution: handles the 1:n case (one parser feeds multiple stores)
  • Data merging: handles the n:1 case (multiple parsers feed one store)
  • Data processing: applies additional processing, such as a join, before the data enters the store
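
A minimal Python sketch of the first three responsibilities follows. It is not Canal's implementation: the entry layout (`schema`, `table`, `ts`), the comma-separated regular expressions over `schema.table`, and the `routes` mapping are all assumptions chosen to keep the example small.

```python
import heapq
import re
from collections import defaultdict


def make_filter(patterns: str):
    """Build a predicate from comma-separated regexes over 'schema.table'."""
    compiled = [re.compile(p) for p in patterns.split(",")]

    def accept(entry: dict) -> bool:
        name = f"{entry['schema']}.{entry['table']}"
        return any(c.fullmatch(name) for c in compiled)

    return accept


def merge(*streams):
    """n:1 merging: interleave already-sorted parser streams by timestamp."""
    return heapq.merge(*streams, key=lambda e: e["ts"])


def route(entries, accept, routes):
    """Filter entries, then fan each one out to its stores (1:n)."""
    stores = defaultdict(list)
    for entry in entries:
        if not accept(entry):
            continue
        # Schemas without an explicit route go to a single default store.
        for store in routes.get(entry["schema"], ["default"]):
            stores[store].append(entry["table"])
    return dict(stores)


parser_a = [{"schema": "shop", "table": "orders", "ts": 1},
            {"schema": "shop", "table": "tmp_import", "ts": 3}]
parser_b = [{"schema": "audit", "table": "log", "ts": 2}]

accept = make_filter(r"shop\.orders,audit\..*")
result = route(merge(parser_a, parser_b), accept, {"shop": ["cache", "search"]})
```

Here `shop.orders` is delivered to both the `cache` and `search` stores, `audit.log` falls through to `default`, and `shop.tmp_import` is filtered out.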

Maxwell

Canal is written in Java and is split into a server and a client. It has many derivative applications, runs stably, and is feature-rich; however, you have to write your own client to consume the data that Canal parses.

Maxwell's advantage over Canal is ease of use: it emits data changes directly as JSON strings, so no client needs to be written.
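
As a rough illustration of what consuming such output looks like, the Python sketch below decodes one change event with the standard `json` module. The sample line follows the general shape of Maxwell's row events (`database`, `table`, `type`, `data`, and `old` for updates), but the concrete values and the `describe` helper are made up for this example.

```python
import json

# One change event as a JSON string; values are invented for illustration.
line = (
    '{"database":"shop","table":"orders","type":"update","ts":1449786310,'
    '"data":{"id":1,"status":"paid"},"old":{"status":"new"}}'
)


def describe(raw: str):
    """Return (type, 'db.table', {column: (old, new)}) for one event."""
    event = json.loads(raw)
    target = f"{event['database']}.{event['table']}"
    # "old" is only present on updates and holds the previous column values.
    changed = {
        column: (previous, event["data"][column])
        for column, previous in event.get("old", {}).items()
    }
    return event["type"], target, changed


summary = describe(line)
```

Any tool that can read JSON lines can consume the stream in the same way, which is the ease-of-use point made above.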

Databus

Databus is a low-latency change data capture system that has become an integral part of LinkedIn's data processing pipeline. It addresses the fundamental requirement of reliably capturing, transporting, and processing primary data changes. Databus provides the following features:

  • Isolation between sources and consumers
  • Guaranteed in-order, at-least-once delivery with high availability
  • Consumption from any point in the change stream, including a full bootstrap of the entire data set
  • Partitioned consumption
  • Source consistency preservation
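
At-least-once delivery means a consumer may see the same change more than once, for example after a reconnect. A minimal Python sketch of a consumer that tolerates this is shown below, assuming every change carries a monotonically increasing sequence number and arrives in order. The class and field names are hypothetical and are not part of the Databus client API.

```python
class IdempotentConsumer:
    """Applies each change once, even if the stream redelivers it."""

    def __init__(self, start_after: int = 0):
        # Highest sequence number already applied; persisted in a real system.
        self.last_applied = start_after
        self.state = {}

    def on_event(self, seq: int, key: str, value) -> bool:
        # In-order delivery means anything at or below last_applied is a replay.
        if seq <= self.last_applied:
            return False
        self.state[key] = value
        self.last_applied = seq
        return True


consumer = IdempotentConsumer()
stream = [(1, "a", 1), (2, "b", 2), (2, "b", 2), (1, "a", 1), (3, "a", 3)]
applied = [consumer.on_event(*event) for event in stream]
```

The two redelivered events are skipped, and the final state reflects each change exactly once.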

Alibaba Cloud Data Transmission Service (DTS)

Data Transmission Service (DTS) is a data flow service provided by Alibaba Cloud that supports data exchange among various data sources such as RDBMS (relational databases), NoSQL, and OLAP. DTS offers several data transmission capabilities, including data migration, real-time data subscription, and real-time data synchronization. These cover business scenarios such as migration without downtime, remote disaster recovery, geo-distributed active-active (unitized) deployment, cross-border data synchronization, real-time data warehousing, query and report offloading, cache updates, and asynchronous message notification, helping you build a secure, scalable, and highly available data architecture.

Advantages

DTS supports data transmission between RDBMS, NoSQL, OLAP, and other data sources, and offers data migration, real-time data subscription, and real-time data synchronization. Compared with third-party data flow tools, DTS provides transmission links that are more diverse, faster, more secure, and more reliable. It also provides many convenience features that greatly simplify creating and managing transmission links.

Personal understanding: DTS behaves like a message queue that pushes the SQL objects it has wrapped; you can write your own service to parse those objects.

DTS eliminates the high cost of deploying and maintaining your own tooling. It is adapted to Alibaba Cloud RDS (online relational database), DRDS, and other products, and it keeps subscriptions highly available in scenarios such as binlog recovery, active/standby switchover, and VPC network switching. Targeted performance optimizations have also been made for RDS. For reasons of stability, performance, and cost, DTS is the recommended choice.

Original: https://blog.csdn.net/weixin_38071106/article/details/88547660


民工哥