This article was first published on the Nebula Graph Community official account.
Summer of Open Source
Open Source Software Supply Chain Lighting Plan - Summer 2021 (hereinafter referred to as "Open Source Summer") is a summer event for college students jointly organized by the Institute of Software of the Chinese Academy of Sciences and the openEuler community. It aims to encourage students to take an active part in the development and maintenance of open source software and to promote the growth of excellent open source communities. The Chinese Academy of Sciences has joined major domestic open source communities, including Nebula Graph, to offer projects on the development and maintenance of important open source software, with registration open to college students around the world. After choosing a project, students discuss the implementation approach with a community mentor and write a project proposal. The selected students then complete the development work as planned under the guidance of the community mentor and contribute the results to the community. Depending on the difficulty and completion of the project, participants receive a project bonus ranging from 6,000 to 12,000 issued by the organizer.
Official website of the event: https://summer.iscas.ac.cn/
This issue shares the project experience of Zheng Dongyang from the Nebula Graph community. His project: the graph database Nebula Graph supports the JDBC protocol.
Project information
Project name: Nebula Graph supports JDBC protocol
Project details
Connect Nebula Graph to the JDBC protocol: implement a Nebula JDBC driver and the related JDBC interfaces. Requirements: users can operate the Nebula service directly through the JDBC driver, and the project repo has unit tests that run automatically.
Introduction to Nebula Graph
Nebula Graph is a reliable, high-performance distributed graph database that scales linearly. It is the only graph database solution in the world that can store hundreds of billions of vertices and trillions of edges while providing millisecond query latency.
Features of Nebula Graph
- Open source: committed to working with the community to popularize graph databases and promote their development;
- Security: role-based access control ensures that only authorized users can access data;
- Extensibility: supports a range of ecosystem tools such as Spark, Hadoop, GraphX, and Plato;
- High performance: Nebula Graph achieves low-latency reads and writes while maintaining high throughput;
- Scalability: Nebula Graph scales linearly thanks to its shared-nothing distributed architecture;
- Compatible with openCypher: gradually becoming compatible with openCypher 9, so Cypher users can easily get started with Nebula Graph;
- High availability: supports multiple ways of recovering abnormal data to keep the service highly available in case of partial failure;
- Stable releases: tested in production by leading Internet companies such as JD.com, Meituan, and Xiaohongshu.
Nebula Graph has an active community and timely technical support. Official website: https://nebula-graph.com.cn. GitHub repository: https://github.com/vesoft-inc/nebula. You are welcome to follow and use Nebula Graph, become a contributor, and help advance the development of graph databases!
Project implementation
Plan description
In the early stage, learn the features of Nebula Graph and master its basic usage; investigate JDBC driver development, read the JDBC specification, and identify the interfaces that need to be implemented. In the middle stage, refer to the implementation of Neo4j's neo4j-jdbc (https://github.com/neo4j-contrib/neo4j-jdbc), clone the nebula-java project (https://github.com/vesoft-inc/nebula-java), and study its source code to understand its main logic and code style. In the later stage, reuse the existing nebula-java client to communicate with the database, write the code that implements the JDBC interfaces for Nebula Graph, and write unit tests.
Implementation description
The idea of this project is clear: implement a series of interfaces in the JDBC specification (mainly in the java.sql package) together with the methods they define. Taken together, the interfaces in the JDBC specification contain hundreds of methods. JDBC mainly targets traditional relational databases (RDBs). Nebula Graph, as a new-generation graph database, does not offer everything that mature relational databases do, but it has many new features that relational databases lack. As a result, for Nebula Graph the methods in the JDBC specification are both redundant (some do not need to be implemented) and insufficient (some methods are needed but are not defined in the relevant interfaces).
In the implementation, I first define abstract classes that directly implement the main interfaces in the specification, and then implement the important methods in concrete implementation classes, so that the implementation classes do not look cluttered when read. The methods are handled as follows:
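for (method : methods of the interface) {
    if (method BELONG_TO methods that need no concrete implementation) {
        // e.g. Statement::getGeneratedKeys()
        override it in the abstract class and throw a SQLFeatureNotSupportedException in the method body;
    } else if (method BELONG_TO methods that must be implemented but are not core methods) {
        // e.g. Statement::isClosed()
        override it in the abstract class;
    } else if (method BELONG_TO methods that must be implemented and are core methods) {
        // e.g. Statement::execute(String nGql)
        override it in the concrete implementation class;
    } else if (method BELONG_TO methods that are not defined in the interface but are needed) {
        // e.g. NebulaResult::getNode, getEdge, getPath (vertices, edges, and paths are concepts specific to graph databases)
        implement it in the concrete implementation class;
    }
}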
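The split between the abstract class and the concrete class can be illustrated with a minimal sketch. It is not the project's code: MiniStatement is a hypothetical, trimmed-down stand-in for java.sql.Statement (the real interface has far more methods), and AbstractMiniStatement, SketchStatement, and getLastStatement are names invented for this illustration.

```java
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

// Hypothetical stand-in for java.sql.Statement, reduced to four methods.
interface MiniStatement {
    Object getGeneratedKeys() throws SQLException;    // no concrete implementation needed
    boolean isClosed() throws SQLException;           // needed, but not a core method
    void close() throws SQLException;                 // needed, but not a core method
    boolean execute(String nGql) throws SQLException; // core method
}

// Abstract layer: rejects unsupported methods and handles simple bookkeeping.
abstract class AbstractMiniStatement implements MiniStatement {
    protected boolean closed = false;

    @Override
    public Object getGeneratedKeys() throws SQLException {
        throw new SQLFeatureNotSupportedException("getGeneratedKeys is not supported");
    }

    @Override
    public boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Concrete layer: only core methods and extensions that the interface does not define.
class SketchStatement extends AbstractMiniStatement {
    private String lastStatement;

    @Override
    public boolean execute(String nGql) throws SQLException {
        if (closed) {
            throw new SQLException("statement is closed");
        }
        lastStatement = nGql; // a real driver would send the statement to the server here
        return true;
    }

    // Extension method that is not part of the interface.
    public String getLastStatement() {
        return lastStatement;
    }
}
```

With this layout the concrete class stays short, and a caller that invokes an unsupported method gets the standard JDBC exception rather than a silent no-op.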
The main implements and extends relationships in the project are as follows (blue solid lines are extends relationships between classes, green solid lines are implements relationships between interfaces, and green dashed lines are implements relationships between abstract classes and interfaces).
Workflow and analysis of the main methods in the classes:
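// The user first registers the driver through NebulaDriver, which holds a NebulaPool used to obtain a Session for communicating with the database
// NebulaDriver provides two constructors: the no-argument constructor configures a default NebulaPool, and the constructor that takes a Properties argument allows a custom NebulaPool configuration
public NebulaDriver() throws SQLException {
    this.setDefaultPoolProperties();
    this.initNebulaPool();
    // Register this driver with DriverManager
    DriverManager.registerDriver(this);
}
public NebulaDriver(Properties poolProperties) throws SQLException {
    this.poolProperties = poolProperties;
    this.initNebulaPool();
    // Register this driver with DriverManager
    DriverManager.registerDriver(this);
}
// After registering the driver, the user can call DriverManager::getConnection(String url) to obtain a connection. The NebulaConnection constructor obtains a Session from the NebulaPool in NebulaDriver and then connects to the graph space specified in the url
// With a Connection, the user can call Connection::createStatement or Connection::prepareStatement to get a Statement or PreparedStatement object and call its execute method to send a command to the database. The result of the command is wrapped in a NebulaResult, whose getter methods return data of the different data types
// NebulaResult currently implements the following getter methods; each data type in Nebula Graph has a corresponding implementation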
public String getString();
public int getInt();
public long getLong();
public boolean getBoolean();
public double getDouble();
public java.sql.Date getDate();
public java.sql.Time getTime();
public DateTimeWrapper getDateTime();
public Node getNode();
public Relationship getEdge();
public PathWrapper getPath();
public List getList();
public Set getSet();
public Map getMap();
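The registration step in the constructors above can be tried without a running database. The following is a minimal sketch, not the project's NebulaDriver: StubNebulaDriver is a hypothetical class that only registers itself and matches URLs, and the jdbc:nebula:// prefix follows the connection string format used in this article. Only standard java.sql calls are used.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical stub: shows registration and URL matching only.
class StubNebulaDriver implements Driver {
    static final String URL_PREFIX = "jdbc:nebula://";

    StubNebulaDriver() throws SQLException {
        // Same idea as in the constructors above: the driver registers itself.
        DriverManager.registerDriver(this);
    }

    @Override
    public boolean acceptsURL(String url) {
        return url != null && url.startsWith(URL_PREFIX);
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url)) {
            return null; // JDBC contract: return null for URLs that belong to other drivers
        }
        // A real driver would build and return its Connection implementation here.
        throw new SQLFeatureNotSupportedException("stub driver cannot open connections");
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override
    public int getMajorVersion() {
        return 0;
    }

    @Override
    public int getMinorVersion() {
        return 1;
    }

    @Override
    public boolean jdbcCompliant() {
        return false;
    }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("java.util.logging is not used");
    }
}
```

After `new StubNebulaDriver()`, a call such as `DriverManager.getDriver("jdbc:nebula://127.0.0.1:9669/test")` finds the stub because DriverManager asks each registered driver whether it accepts the URL; the host, port, and space name in that URL are placeholders.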
Project progress
Work done
- Deployed Nebula Graph and mastered its basic usage;
- Read the JDBC specification document (https://download.oracle.com/otn-pub/jcp/jdbc-4_2-mrel2-spec/jdbc4.2-fr-spec.pdf?AuthParam=1628844546_d5f078af230e42dcbe0ba3d183af7495) and clarified the implementation requirements;
- Studied the nebula-java source code (https://github.com/vesoft-inc/nebula-java);
Complete the following implementations:
Problems encountered and solutions
How to communicate with the database:
In the early stage of the project I did not know how to communicate with the database. After studying the implementation of neo4j-jdbc from Neo4j, a peer vendor, my rough idea was to use an HTTP framework to communicate with the database through the Nebula Graph API. I then asked my mentor whether this idea was feasible, and the mentor told me that I could use the existing nebula-java client to communicate with Nebula Graph via RPC.
Questions about obtaining a Connection:
Some parameters in the NebulaPoolConfig class are configurable. My idea was to specify them in the connection string, for example: "jdbc:nebula://ip:port/graphSpace?maxConnsSize=10&reconnect=true".
After I consulted the mentor, the mentor suggested supporting two interfaces for obtaining a connection: one uses the default configuration, and the other lets the user specify the configuration, for example:
// default configuration
DriverManager.getConnection(url, username, password);
// customized configuration
DriverManager.getConnection(url, config);
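To illustrate the connection string idea, the following is a minimal sketch of how such a URL could be split into host, port, graph space, and options using java.net.URI. NebulaUrlSketch and its fields are hypothetical names for this illustration, not part of the driver, and the option names are simply the ones from the example string.

```java
import java.net.URI;
import java.util.Properties;

// Hypothetical helper that parses "jdbc:nebula://host:port/graphSpace?key=value&...".
final class NebulaUrlSketch {
    static final String JDBC_PREFIX = "jdbc:";

    final String host;
    final int port;
    final String graphSpace;
    final Properties options = new Properties();

    NebulaUrlSketch(String jdbcUrl) {
        if (jdbcUrl == null || !jdbcUrl.startsWith("jdbc:nebula://")) {
            throw new IllegalArgumentException("not a Nebula JDBC URL: " + jdbcUrl);
        }
        // Drop "jdbc:" so that java.net.URI can parse "nebula://host:port/space?k=v".
        URI uri = URI.create(jdbcUrl.substring(JDBC_PREFIX.length()));
        host = uri.getHost();
        port = uri.getPort();
        String path = uri.getPath() == null ? "" : uri.getPath();
        graphSpace = path.startsWith("/") ? path.substring(1) : path;

        // Copy each "key=value" pair of the query string into the options.
        String query = uri.getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    options.setProperty(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
    }
}
```

The resulting Properties object could be passed on as the configuration argument of the second getConnection form, so both ways of supplying configuration end up in the same place.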
Questions about PreparedStatement:
Relational databases support precompiled query statements. A PreparedStatement sends the SQL to the DBMS for precompilation and then passes in the parameters, which improves performance and prevents SQL injection attacks. Nebula Graph currently has no such feature, so the driver parses the nGQL locally and fills in the parameters, which is essentially the same as using a Statement.
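Filling in parameters locally can be sketched as follows. This is an illustration under assumptions, not the driver's code: NGqlParamSketch, fill, and toLiteral are hypothetical names, and the quoting rules (double-quoted strings with backslash escapes) are an assumption about nGQL literals rather than something stated in this article.

```java
// Hypothetical helper that substitutes '?' placeholders on the client side.
final class NGqlParamSketch {

    // Renders a Java value as a literal; the quoting rules are an assumption.
    static String toLiteral(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        String escaped = value.toString().replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Replaces each '?' that is outside a quoted string with the next parameter.
    static String fill(String template, Object... params) {
        StringBuilder out = new StringBuilder();
        char quote = 0; // 0 means "not inside a string literal"
        int next = 0;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (quote != 0) {
                out.append(c);
                if (c == '\\' && i + 1 < template.length()) {
                    out.append(template.charAt(++i)); // copy the escaped character verbatim
                } else if (c == quote) {
                    quote = 0; // closing quote
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
                out.append(c);
            } else if (c == '?') {
                if (next >= params.length) {
                    throw new IllegalArgumentException("missing parameter " + (next + 1));
                }
                out.append(toLiteral(params[next++]));
            } else {
                out.append(c);
            }
        }
        if (next != params.length) {
            throw new IllegalArgumentException("too many parameters");
        }
        return out.toString();
    }
}
```

Because the statement is assembled on the client, escaping string parameters is the only protection against injection in this approach, which is why the sketch escapes backslashes and quotes before inserting a value.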
nebula-java version issue:
At first the project depended on version 2.0.0 of nebula-java. In one query, I found that the returned path was inconsistent with the result shown in the console. After consulting the mentor, I learned that this was a bug in that version, so I switched to the latest 2.0.0-SNAPSHOT version.
updateCount issue:
Some methods in the JDBC interfaces must return updateCount, the number of records affected by the call, but the server currently does not return such statistics to the user. If the user inserts multiple vertices or edges in a single insert statement, the insert may partially succeed; the server only reports a failure, although the user may in fact be able to query part of the data. For now, updateCount is returned as 0, and a comment on the interface notes that it is not supported.
NebulaPool initialization problem:
In the beginning, I initialized the NebulaPool while initializing NebulaConnection and then obtained the Session, and I also confused the configuration of NebulaPool with the configuration of Session. With that design, the NebulaPool was reinitialized every time the user obtained a connection, which is unreasonable. After I submitted the code to GitLab for review, my mentor pointed out the error and suggested that I move the initialization and shutdown of the NebulaPool into NebulaDriver. I then added the two ways of initializing the NebulaPool: default configuration and custom configuration.
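The difference between the two designs can be shown with stand-in classes. The sketch below does not use the real NebulaPool or Session; FakePool, SketchDriver, and SketchConnection are hypothetical names, and the session is represented by a plain string.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a connection pool; counts how often it is initialized.
class FakePool {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();
    private boolean open;

    void init() {
        open = true; // in a real pool this step opens network connections
        INIT_COUNT.incrementAndGet();
    }

    String getSession(String user) {
        if (!open) {
            throw new IllegalStateException("pool is closed");
        }
        return "session-for-" + user;
    }

    void close() {
        open = false;
    }
}

// The driver owns the pool: it is initialized once and closed with the driver.
class SketchDriver {
    private final FakePool pool = new FakePool();

    SketchDriver() {
        pool.init();
    }

    SketchConnection connect(String user) {
        return new SketchConnection(pool.getSession(user));
    }

    void close() {
        pool.close();
    }
}

// A connection only borrows a session; it never initializes a pool itself.
class SketchConnection {
    final String session;

    SketchConnection(String session) {
        this.session = session;
    }
}
```

Opening several connections through one SketchDriver increases INIT_COUNT only once, whereas initializing the pool inside the connection constructor would increase it on every call.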
Follow-up work plan
- Complete the interface methods that should be implemented but are not yet implemented;
- Improve the code comments;
- Complete the unit tests;
- Write usage instructions.
Thanks
This event promoted the development of open source software and the building of excellent open source communities, increased the activity of open source projects, and advanced the open source ecosystem. Thanks to the organizers of Open Source Summer for providing this platform and opportunity.
Thanks to my mentor @laura.ding for carefully reviewing my PRs throughout the project and for the detailed guidance that showed me where I fell short. Thanks also to the @Nebula Graph community operations staff for sending me community merchandise. LUCKY!
This article is an original article by Zheng Dongyang.