This week, the first RISC-V China Summit was held at ShanghaiTech University. It is the first RISC-V summit of this scale held outside North America. At the summit, Bao Yungang, a professor at the University of Chinese Academy of Sciences and a researcher at the Institute of Computing Technology, Chinese Academy of Sciences, announced Xiangshan, a domestic open-source high-performance RISC-V processor core. Its architecture generations are code-named after lakes. The RTL of the first-generation architecture, "Yanqi Lake", was completed in April this year, and the chip is scheduled to tape out in July on TSMC's 28nm process. The second-generation architecture, "Nanhu" (South Lake), will use SMIC's 14nm process. Beijing Microcore participated in the first phase of the design work. The team is now recruiting partners to co-develop the second phase of the Xiangshan processor, and participating companies already include ByteDance.
Bao Yungang graduated from Nanjing University in 2003, received his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, in 2008, and was a postdoctoral fellow at Princeton University from 2010 to 2012. He is currently a researcher and assistant director of the Institute of Computing Technology, director of its Advanced Computer System Research Center, a professor and doctoral supervisor at the University of Chinese Academy of Sciences, and secretary general of the China Open Instruction Ecosystem (RISC-V) Alliance.
My report on Xiangshan on the afternoon of the 22nd was disrupted by a technical failure in the Zoom live stream, so many people could not hear it in full, which I regret. After some thought, I decided to post the slides directly, together with some of the considerations and ideas behind Xiangshan's development, to share with you.
This report mainly answers four questions:
- 1. Why did we build Xiangshan?
- 2. What level has Xiangshan reached?
- 3. How did we build Xiangshan?
- 4. How will Xiangshan develop in the future?
1. Why did we build Xiangshan?
RISC-V was born in 2010, 11 years ago. Today, hundreds of commercial or open-source RISC-V processor cores are registered on the RISC-V International website (link below). So why build another open-source high-performance RISC-V core?
RISC-V Exchange: Cores & SoCs - RISC-V International
On this question, we talked with many companies and did extensive research and analysis, which led us to conclude that the industry needs an open-source high-performance RISC-V core. At the same time, we kept asking ourselves: why does the CPU field have no open-source mainline like Linux? Linux was born in 1991, exactly 30 years ago. Today Linux is not only widely used in industry but also an innovation platform for academic operating-system research.
RISC-V is an open instruction set: anyone in the world can implement a RISC-V processor for free, either commercially or as open source. This is one of its biggest differences from the proprietary x86 and ARM instruction sets. Yet ten years on, RISC-V still has no open-source mainline like Linux. Berkeley's BOOM aims to be a high-performance open-source RISC-V core, but its code repository is relatively closed: the official recommendation is that others contact the BOOM team in advance to make sure their work does not conflict with the team's plans. According to GitHub's official statistics page, only 8 people have contributed more than 100 lines of changes to BOOM since January 2014. Because of this strict external contribution policy, participation in BOOM from the open-source community is low.
Therefore, Dr. Tang Dan of our team and I have long believed that we should build an open-source RISC-V core mainline like Linux: one that is widely used in industry and that helps academia experiment with innovative ideas. Most importantly, it should survive at least 30 years, as Linux has!
Thus, "Xiangshan" was born.
We spent more than a year on preparation: applying for funding, launching the "One Student One Chip" program, training talent, building a team, finding partners... During this period we received a great deal of support and help from many people. Academician Sun Ninghui helped us find funding from many sources; the University of Chinese Academy of Sciences fully supported the "One Student One Chip" program; Pengcheng Laboratory helped us establish a back-end physical design team; many old friends at the Institute of Computing Technology decided to join the open-source mainline; and many more whom I cannot list one by one.
Finally, Xiangshan officially launched: on June 11, 2020, we created its code repository on GitHub.
In just one year, 25 students and teachers took part in developing Xiangshan. 821 merges into the main branch, 3,296 commits, more than 50,000 lines of code, and more than 400 documents record Xiangshan's growth. Our philosophy is open code, open process, and open documentation. During this period, some companies participated directly in development and others expressed their intention to participate, all because they share the open-source philosophy and are willing to build Xiangshan together. This positive feedback from industry has given us great encouragement and confidence, and it lets us practice the "heavy-industry research model" with even more conviction.
About the "heavy-industry research model": in January 2020, I wrote an article, "Inspiration from Berkeley's Research Model", for Communications of the CCF (CCCF):
Yuan Lanfeng: CCCF Intro: Inspiration from Berkeley's Research Model | Bao Yungang
Looking back at Berkeley's research history, we find that over the past several decades they built a large number of prototype systems that not only advanced technology, sometimes even disrupting industry, but also trained generations of outstanding talent (many of whom won the Turing Award): the CALDIC system in the 1950s (Doug Engelbart), the Project Genie system in the 1960s (Butler Lampson and Chuck Thacker), the BSD Unix operating system and the INGRES database system (Michael Stonebraker) in the 1970s, RISC processors in the 1980s (David Patterson), the RAID storage system and the NOW cluster system in the 1990s... If I had to sum up Berkeley's research model in one sentence, it would be: be keen to build prototype systems that can truly change the status quo, even if that requires heavy engineering investment. Sun Ninghui called it the "heavy-industry research model".
We don't want the "heavy-industry research model" to remain talk on paper; we want to put it into practice.
2. What level has Xiangshan reached?
Xiangshan is an open-source RISC-V processor core whose architecture generations are code-named after lakes. The first-generation architecture is code-named "Yanqi Lake", a name chosen by students with a strong attachment to the University of Chinese Academy of Sciences, because they spent a year at its Yanqi Lake campus in Huairou. The "Yanqi Lake" RTL was completed in April 2021 and is planned to tape out in July on TSMC's 28nm process. Its current frequency is 1.3GHz.
The second-generation architecture is code-named "Nanhu" (South Lake), a tribute to the 100th anniversary of the founding of the Communist Party of China. "Nanhu" is planned to tape out by the end of this year on SMIC's 14nm process, with a target frequency of 2GHz.
Which open-source license should Xiangshan use? This question troubled us for a long time. We consulted Professor Minghui Zhou of Peking University, and the team drew up 4 candidate license schemes. After repeated comparisons and trade-offs, we finally chose scheme ① in the table below: the Mulan Permissive Software License, Version 2 (MulanPSL v2). I am very grateful to Professor Minghui Zhou of Peking University for the professional guidance!
Comparison of open-source license schemes (compiled by Xu Yinan)
The "Yanqi Lake" architecture is an out-of-order core with an 11-stage pipeline, 6-wide issue, and 4 memory access units. Its issue width is comparable to some high-end ARM cores, but it has not yet been fully optimized, so there is still a large gap in actual performance. We hope that through continuous iterative optimization ("Nanhu" --> "X Lake" --> "Y Lake" --> ...), Xiangshan will eventually reach the level of the ARM Cortex-A76.
We built a process-oriented automated regression testing framework on GitHub CI and, over the past six months, steadily expanded its test workloads: from cputest and riscv-tests to Linux, and then to SPEC CPU workloads. This automated regression framework safeguards and verifies the correctness of the chip.
Every big project has its exciting moments. This 30-second video captures the moment Xiangshan booted Linux/Debian on an FPGA, a small moment of joy for us.
Video link: Xiangshan starts Linux/Debian on FPGA
3. How did we build Xiangshan?
Early development moved very fast: we created the code repository on June 11; on July 6, less than a month later, the out-of-order pipeline was complete and ran CoreMark correctly; Linux booted correctly on September 12; and Debian booted correctly on October 22.
The better part of the following year went into structural optimization, performance tuning, and timing optimization, amounting almost to a rebuild of the Xiangshan architecture. The branch predictor is a typical example. Xiangshan's first branch predictor (BPU) was modeled on BOOM's, but back-end evaluation showed it could only reach 800MHz (TSMC 28nm). Gou Lingrui, who was in charge of the BPU design, kept optimizing its structure under the guidance of several teachers and finally raised the frequency to 1.4GHz.
During this period, team members took the initiative to develop a variety of optimization and debugging tools, which greatly accelerated optimization and verification. I really admire these post-90s young people: from work to life, one of their main driving forces is "saving time" (or, less politely, being lazy). For example, they would rather write a program that orders takeout automatically than open their phones and read the menu.
We made at least two important decisions in developing Xiangshan. The first was to adopt the agile hardware design language Chisel. Many people doubted or rejected Chisel, but after a thorough evaluation we decided to use it.
Our team started using Chisel in 2016, and at first the team was full of doubts. In 2018, we designed a quantitative comparison with two groups: two students designed an L2 cache module in Chisel, and an engineer designed the same module in Verilog. A series of quantitative comparisons led to three conclusions:
- Development efficiency with Chisel is much higher than with Verilog;
- For the same functionality, the Chisel code is only about 1/5 the size of the Verilog code (so the 50,000 lines of Chisel in Xiangshan are roughly equivalent to 250,000 lines of Verilog);
- The quality of Chisel designs is no worse than that of Verilog designs.
The results were later published in the Journal of Computer Research and Development in January 2019. During a recent visit to Huawei, I learned that these comparison results also contributed to the establishment of a Chisel development team there. Huawei is now a Chisel supporter as well.
In 2020, we completed an 8-core tagged-architecture RISC-V processor chip, built by modifying the Rocket core to support a tagging architecture and fabricated in TSMC 28nm. Due to time constraints, no detailed back-end optimization was performed, yet the chip still ran normally at 1.2GHz when it came back from the fab. This is an 8-core SoC of considerable complexity, and Chisel handled it. We therefore believe that Chisel can be used to develop complex chips.
In developing Xiangshan, our team has accumulated a wealth of Chisel development experience. Team members (Xu Yinan, Wang Kaifan, Lin Jiawei, Yu Zihao, Jin Yue) have prepared 6 reports, which will be presented at the CCC Workshop on June 25.
The other important decision was to place great emphasis on building processes and tools that support agile design.
Throughout Xiangshan's development, we have stressed the importance of processes, platforms, and infrastructure. I mostly played the role of cheerleader; it was the team members who turned the idea into concrete action.
To better support development and debugging in Chisel, to capture, reproduce, and locate bugs faster, and to evaluate the performance benefits of optimizations more accurately, the team developed more than ten distinctive tools. Together these tools support an agile development process for processor chips. Of course, this process is still fairly rudimentary and not yet systematic, and we look forward to more open-source developers joining us to improve it.
Here are a few examples of these tools. NEMU is a teaching emulator that Yu Zihao developed as an undergraduate at Nanjing University. During his Ph.D. at the Institute of Computing Technology, he has single-handedly kept improving and optimizing it, turning NEMU into a high-performance interpreter with efficiency close to QEMU's; it even boots Debian in 18.2% less time than QEMU (9.87s vs. 12.07s).
More importantly, because NEMU is an instruction interpreter, it can dynamically analyze every instruction; by contrast, QEMU translates code at basic-block granularity and cannot track each individual instruction. In fact, NEMU's interpreter mechanism has become the basis of Difftest, the correctness verification framework developed for Xiangshan. (Yu Zihao will introduce NEMU on the afternoon of June 23.)
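To make the interpreter-versus-translator distinction concrete, here is a toy interpreter written for this article; it is not NEMU's code. It fetches, decodes, and executes one instruction at a time and calls a hook after each one, which is the kind of per-instruction visibility that is harder to obtain from a translator that works on whole basic blocks. The three-opcode ISA (`addi`, `add`, `bne`, plus `halt`) and the names `ToyInterpreter` and `on_inst` are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical toy ISA: ("addi", rd, rs1, imm), ("add", rd, rs1, rs2),
# ("bne", rs1, rs2, offset), ("halt",). PCs are instruction indices.
Inst = Tuple


class ToyInterpreter:
    """Interprets one instruction at a time (NEMU-style idea, not NEMU's code)."""

    def __init__(self, program: List[Inst], nregs: int = 8) -> None:
        self.program = program
        self.regs = [0] * nregs
        self.pc = 0
        self.halted = False

    def _write(self, rd: int, value: int) -> None:
        if rd != 0:  # x0 is hard-wired to zero, as in RISC-V
            self.regs[rd] = value

    def step(self) -> Inst:
        """Fetch, decode and execute exactly one instruction."""
        inst = self.program[self.pc]
        op = inst[0]
        next_pc = self.pc + 1
        if op == "addi":
            _, rd, rs1, imm = inst
            self._write(rd, self.regs[rs1] + imm)
        elif op == "add":
            _, rd, rs1, rs2 = inst
            self._write(rd, self.regs[rs1] + self.regs[rs2])
        elif op == "bne":
            _, rs1, rs2, offset = inst
            if self.regs[rs1] != self.regs[rs2]:
                next_pc = self.pc + offset
        elif op == "halt":
            self.halted = True
            next_pc = self.pc
        else:
            raise ValueError(f"unknown opcode: {op}")
        self.pc = next_pc
        return inst

    def run(self, on_inst: Callable[[int, Inst, List[int]], None]) -> None:
        """Run to completion, calling on_inst(pc, inst, regs) after every instruction."""
        while not self.halted:
            pc = self.pc
            inst = self.step()
            on_inst(pc, inst, list(self.regs))


# Sum 3 + 2 + 1 into x2 with a countdown loop in x1.
PROGRAM = [
    ("addi", 1, 0, 3),   # 0: x1 = 3
    ("addi", 2, 0, 0),   # 1: x2 = 0
    ("add", 2, 2, 1),    # 2: x2 += x1
    ("addi", 1, 1, -1),  # 3: x1 -= 1
    ("bne", 1, 0, -2),   # 4: if x1 != 0, go back to 2
    ("halt",),           # 5
]

trace = []
cpu = ToyInterpreter(PROGRAM)
cpu.run(lambda pc, inst, regs: trace.append((pc, inst[0], regs[2])))
print(len(trace), cpu.regs[2])  # 12 6
```

Every one of the 12 executed instructions, including each loop iteration, shows up in `trace` with the register state right after it; a reference model used for differential testing needs exactly this granularity.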
The cache is a core module of the processor, and caches that support a coherence protocol are especially complex. The team therefore developed Agent Faker, a testing framework dedicated to verifying cache modules that support the TileLink coherence protocol, and it found several bugs in the cache modules. (Zhang Chuanqi will introduce this work on the morning of June 25.)
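The general idea of stress-testing a cache against a golden model can be illustrated with a heavily simplified Python sketch. It is not Agent Faker: there is no TileLink, no coherence, and only one requester. A hypothetical direct-mapped write-back cache, `ToyCache`, sits in front of a dictionary that plays the role of the fake downstream memory agent; random reads and writes are issued, and every read is checked against a flat reference memory. The `drop_writebacks` flag is an injected bug that evicts dirty lines without writing them back.

```python
import random
from typing import Dict, List, Optional, Tuple


class ToyCache:
    """Direct-mapped write-back cache over a dict that plays the downstream memory agent."""

    def __init__(self, memory: Dict[int, int], sets: int = 4, drop_writebacks: bool = False):
        self.mem = memory
        self.sets = sets
        self.drop_writebacks = drop_writebacks  # injected bug, for demonstration only
        self.lines: List[Optional[list]] = [None] * sets  # each line: [tag, value, dirty]

    def _lookup(self, addr: int) -> list:
        idx, tag = addr % self.sets, addr // self.sets
        line = self.lines[idx]
        if line is None or line[0] != tag:  # miss: evict the victim, then refill
            if line is not None and line[2] and not self.drop_writebacks:
                self.mem[line[0] * self.sets + idx] = line[1]  # write back dirty victim
            line = [tag, self.mem.get(addr, 0), False]
            self.lines[idx] = line
        return line

    def read(self, addr: int) -> int:
        return self._lookup(addr)[1]

    def write(self, addr: int, value: int) -> None:
        line = self._lookup(addr)
        line[1] = value
        line[2] = True  # mark dirty


def stress(cache: ToyCache, n_ops: int, n_addrs: int, seed: int) -> Optional[Tuple[int, int, int, int]]:
    """Random reads/writes checked against a flat golden memory.

    Returns (op_index, addr, got, expected) for the first wrong read, or None.
    """
    rng = random.Random(seed)
    golden: Dict[int, int] = {}
    for i in range(n_ops):
        addr = rng.randrange(n_addrs)
        if rng.random() < 0.5:
            value = rng.randrange(1 << 16)
            cache.write(addr, value)
            golden[addr] = value
        else:
            got, expected = cache.read(addr), golden.get(addr, 0)
            if got != expected:
                return i, addr, got, expected
    return None


print(stress(ToyCache({}), n_ops=10_000, n_addrs=16, seed=1))                        # None
print(stress(ToyCache({}, drop_writebacks=True), n_ops=10_000, n_addrs=16, seed=1))  # first mismatch
```

Using more addresses than cache sets is deliberate: it forces frequent evictions, which is where write-back bugs hide. A real coherence tester additionally needs several fake agents competing for the same lines and a checker aware of the protocol's permission states.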
Difftest is an online differential verification framework built on the NEMU instruction set emulator. On one side is the emulator, which provides the golden reference for processor execution; on the other side is a simulator running the RTL. During simulation, information such as committed instruction counts, interrupts, MMIO accesses, and microarchitectural state is sent to NEMU for comparison to determine whether the RTL implementation is correct.
Difftest was first implemented by Yu Zihao and later further optimized. One of the most important improvements is SMP-Difftest, which supports full-system simulation of multi-core SMP systems, including issues such as cache coherence and memory consistency that require hardware-software coordination. (Wang Kaifan will introduce Difftest on the afternoon of June 24.)
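The comparison loop at the heart of this kind of differential testing can be sketched in a few lines. In the toy Python version below, which is not Difftest's actual interface, the RTL side is represented by a list of commit records as a simulation might report them; a reference model executes the same program one instruction per commit, and the first disagreement is reported. The `Commit` record, the two-opcode ISA, and `difftest()` are hypothetical. The real framework compares much more state, and for events such as interrupts and MMIO reads the reference model typically has to follow the RTL rather than predict it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy ISA: ("addi", rd, rs1, imm) and ("add", rd, rs1, rs2).
Inst = Tuple[str, int, int, int]


@dataclass
class Commit:
    """One retired instruction as reported by the (simulated) RTL."""
    pc: int
    rd: int
    value: int


def ref_execute(regs: List[int], inst: Inst) -> Tuple[int, int]:
    """Golden model: execute one instruction, return (rd, value now in rd)."""
    op, rd, a, b = inst
    if op == "addi":
        value = regs[a] + b
    elif op == "add":
        value = regs[a] + regs[b]
    else:
        raise ValueError(f"unknown opcode: {op}")
    if rd != 0:  # x0 stays zero
        regs[rd] = value
    return rd, regs[rd]


def difftest(program: List[Inst], dut_commits: List[Commit], nregs: int = 8) -> Optional[str]:
    """Step the reference once per DUT commit; return the first mismatch, or None."""
    regs = [0] * nregs
    ref_pc = 0
    for n, c in enumerate(dut_commits):
        if c.pc != ref_pc:
            return f"commit {n}: pc mismatch, dut={c.pc} ref={ref_pc}"
        rd, value = ref_execute(regs, program[ref_pc])
        if (c.rd, c.value) != (rd, value):
            return f"commit {n} (pc={c.pc}): dut wrote x{c.rd}={c.value}, ref wrote x{rd}={value}"
        ref_pc += 1
    return None


PROGRAM = [("addi", 1, 0, 5), ("addi", 2, 0, 7), ("add", 3, 1, 2)]
good = [Commit(0, 1, 5), Commit(1, 2, 7), Commit(2, 3, 12)]
bad = [Commit(0, 1, 5), Commit(1, 2, 7), Commit(2, 3, 13)]  # injected bug in the adder
print(difftest(PROGRAM, good))  # None
print(difftest(PROGRAM, bad))   # commit 2 (pc=2): dut wrote x3=13, ref wrote x3=12
```

Because the check runs after every commit, a bug is reported at the first instruction where the two sides diverge, rather than thousands of cycles later when a program's final output goes wrong.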
Quickly capturing, reproducing, and locating bugs is a critical step in debugging, and a lot of time is spent at this stage. The team proposed an innovative lightweight simulation snapshot technique, LightSSS: the whole simulation program is treated as a process, and the fork mechanism is used to create child processes as snapshots. The parent process keeps executing while each child process stays suspended. When an error occurs in the parent process, debugging can resume from a child process. Compared with the Savable mechanism built into the Verilator simulator, LightSSS cuts the time to take a single snapshot by a factor of nearly 7000! (Yu Zihao will introduce LightSSS on the afternoon of June 23.)
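The fork-based idea can be sketched with POSIX `fork()`, here through Python's `os.fork` (Linux/macOS only). This is a toy written for this article, not LightSSS itself: `step()` stands in for one cycle of the RTL model, `fail_at` stands in for whatever check (an assertion or a Difftest mismatch) first fires, and the retention policy (keep the two most recent snapshots and wake the older one, so the replay covers at least one full interval) is an assumption. Each snapshot is a sleeping child process blocked on a pipe; because `fork` shares memory copy-on-write, taking one is cheap compared with serializing the whole simulator state.

```python
import os
import signal
from typing import List, Optional, Tuple


def step(state: int) -> int:
    """One cycle of a toy deterministic 'simulator' (stands in for the RTL model)."""
    return (state * 1103515245 + 12345) % (2 ** 31)


def _discard(snap: Tuple[int, int, int, int]) -> None:
    """Kill a sleeping snapshot process and release its pipes."""
    _, pid, wake_w, res_r = snap
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    os.close(wake_w)
    os.close(res_r)


def simulate(total_cycles: int, fail_at: int, interval: int,
             keep: int = 2) -> Optional[Tuple[int, List[int]]]:
    """Run with a fork()-based snapshot every `interval` cycles.

    On failure, wake the oldest retained snapshot, let it re-execute up to the
    failing cycle (where a real tool would enable tracing), and return
    (snapshot_cycle, replayed_cycles). Returns None if nothing fails.
    """
    state, cycle = 1, 0
    snaps: List[Tuple[int, int, int, int]] = []  # (cycle, pid, wake_w, res_r), oldest first
    while cycle < total_cycles:
        if cycle % interval == 0:
            wake_r, wake_w = os.pipe()
            res_r, res_w = os.pipe()
            pid = os.fork()
            if pid == 0:  # child: a frozen copy of the whole simulator process
                os.close(wake_w)
                os.close(res_r)
                if os.read(wake_r, 1) == b"g":  # sleep until the parent wakes us
                    replayed = []
                    while cycle != fail_at:  # deterministic re-execution
                        replayed.append(cycle)  # a real tool would dump waveforms here
                        state = step(state)
                        cycle += 1
                    os.write(res_w, ",".join(map(str, replayed)).encode())
                os._exit(0)
            os.close(wake_r)
            os.close(res_w)
            snaps.append((cycle, pid, wake_w, res_r))
            if len(snaps) > keep:  # drop the oldest snapshot
                _discard(snaps.pop(0))
        if cycle == fail_at:  # stand-in for an assertion or Difftest mismatch
            snap_cycle, pid, wake_w, res_r = snaps.pop(0)
            os.write(wake_w, b"g")
            data = b""
            while chunk := os.read(res_r, 4096):
                data += chunk
            os.waitpid(pid, 0)
            os.close(wake_w)
            os.close(res_r)
            for snap in snaps:
                _discard(snap)
            return snap_cycle, [int(x) for x in data.decode().split(",") if x]
        state = step(state)
        cycle += 1
    for snap in snaps:
        _discard(snap)
    return None


# Replays cycles 40..56 from the snapshot taken at cycle 40.
print(simulate(total_cycles=100, fail_at=57, interval=10))
```

The parent never pays for tracing during the long run; only the short window between the woken snapshot and the failure is re-executed in detail.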
Many people complain that Chisel designs are inconvenient to debug. Taking full advantage of Chisel's support for custom FIRRTL transforms, the team designed a new agile hardware debugging stack that turns waveform-based debugging into event-based debugging. We built a set of tools that extract higher-level semantics directly from waveforms and visualize them, and designed a dedicated language, Xiang, for this purpose. (Lin Jiawei will introduce this work on the afternoon of June 23.)
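As a rough illustration of moving from waveform-level to event-level debugging, the Python sketch below scans per-cycle samples of a valid/ready handshake and turns every cycle in which the handshake fires into a high-level event. It is not the team's tool or the Xiang language; the dictionary-of-lists waveform format and the signal names `commit_valid`, `commit_ready`, and `commit_pc` are hypothetical, and a real flow would read a VCD/FST dump and could use signal information recorded by FIRRTL transforms.

```python
from typing import Dict, List, Tuple

Wave = Dict[str, List[int]]  # signal name -> value sampled at each clock cycle


def extract_events(wave: Wave, valid: str, ready: str, payload: str) -> List[Tuple[int, int]]:
    """Return (cycle, payload) for every cycle in which the valid/ready handshake fires."""
    cycles = len(wave[valid])
    return [(c, wave[payload][c]) for c in range(cycles) if wave[valid][c] and wave[ready][c]]


wave = {
    "commit_valid": [0, 1, 1, 0, 1],
    "commit_ready": [1, 1, 0, 1, 1],
    "commit_pc": [0, 0x80000000, 0x80000004, 0, 0x80000004],
}
for cycle, pc in extract_events(wave, "commit_valid", "commit_ready", "commit_pc"):
    print(f"cycle {cycle}: commit pc=0x{pc:x}")
# cycle 1: commit pc=0x80000000
# cycle 4: commit pc=0x80000004
```

The stalled attempt at cycle 2 (valid high, ready low) produces no event; the same instruction is reported once, when it actually commits at cycle 4. Reading such an event list is usually much faster than inspecting the raw signals cycle by cycle.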
The most critical part of processor performance optimization is evaluating the performance benefit of an optimization quickly and accurately. If each evaluation takes several days, the efficiency of iterative optimization suffers badly. The team designed BetaPoint, an agile performance evaluation framework that combines three mechanisms (sampling, generic full-system checkpoints, and functional warmup) to estimate the processor's SPEC score within 10 hours. (Zhou Yaoyang will introduce BetaPoint on the evening of June 23.)
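The arithmetic that turns sampled measurements into a SPEC-style score can be sketched as follows. This is an illustration under stated assumptions, not BetaPoint's implementation: it assumes SimPoint-style sampling, in which each benchmark is represented by a few intervals with weights, and each interval's CPI is measured in detailed simulation after checkpoint restore and warmup. A benchmark's runtime is estimated as instructions × weighted CPI ÷ frequency, its ratio is the reference time ÷ estimated time, and the overall score is the geometric mean of the ratios, as in SPEC CPU. The benchmark names and all numbers are made up.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Benchmark:
    name: str
    ref_seconds: float                  # runtime on the SPEC reference machine
    total_insts: float                  # dynamic instruction count of the full run
    samples: List[Tuple[float, float]]  # (weight, CPI measured on one sampled interval)


def weighted_cpi(samples: List[Tuple[float, float]]) -> float:
    """Combine per-interval CPIs using the interval weights."""
    total_weight = sum(w for w, _ in samples)
    return sum(w * cpi for w, cpi in samples) / total_weight


def estimated_ratio(b: Benchmark, freq_hz: float) -> float:
    """SPEC-style ratio: reference time divided by estimated runtime."""
    est_seconds = b.total_insts * weighted_cpi(b.samples) / freq_hz
    return b.ref_seconds / est_seconds


def estimated_score(benchmarks: List[Benchmark], freq_hz: float) -> float:
    """Geometric mean of the per-benchmark ratios."""
    logs = [math.log(estimated_ratio(b, freq_hz)) for b in benchmarks]
    return math.exp(sum(logs) / len(logs))


suite = [
    Benchmark("bench_a", ref_seconds=1000.0, total_insts=2e12, samples=[(0.6, 0.8), (0.4, 1.3)]),
    Benchmark("bench_b", ref_seconds=4000.0, total_insts=1e12, samples=[(1.0, 0.5)]),
]
freq = 2e9  # 2 GHz
score = estimated_score(suite, freq)
print(f"score={score:.2f}, score/GHz={score / (freq / 1e9):.2f}")
```

With these invented numbers it prints `score=4.00, score/GHz=2.00`. Dividing by the clock frequency in GHz is how a points/GHz figure, such as the target quoted for "Nanhu" below, is normalized.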
The Xiangshan development team is presenting 22 technical reports at this summit. All of the presenters are post-90s: Gou Lingrui, Hu Bohan, Jin Yue, Li Xin, Liu Zhigang, Lin Jiawei, Wang Huaqiang, Wang Yinzhe, Wang Kaifan, Xu Yinan, Yu Zihao, Zhang Chuanqi, Zhang Fawang, Zhang Linjun, Zhang Zifei, Zhang Ziyue, Zhou Yaoyang, Zhou Yike, and Zou. In addition, many other students who took part in developing Xiangshan are not presenting this time; they, too, made irreplaceable contributions to Xiangshan.
4. How will Xiangshan develop in the future?
Xiangshan is currently developing its next-generation architecture, "Nanhu" (South Lake). The goal is to tape out by the end of this year on SMIC's 14nm process, with a frequency of 2GHz and a SPEC CPU score of 10 points/GHz. This is a very challenging goal that requires substantial optimization and improvement of the architecture.
A few days ago, the team made a special trip to South Lake in Jiaxing to discuss Xiangshan's future. Besides technology, we once again focused on processes and platforms. The agile design process and platform we have built so far support a development team of a little over 20 people, which is far from enough. What we need to consider now is how to build an open, standardized open-source process that can support an open-source development community of 2,000 people.
There are already successful examples of thousands of people developing open-source software together. But there is no precedent yet for thousands of people developing an open-source processor together, so we can only find our own way. We also look forward to guidance and suggestions from experts in all fields.
We have a wish: that Xiangshan will survive for 30 years. And we have a pact: to get together again in 30 years and see what Xiangshan has become. To make this wish come true, however, many problems and challenges remain to be solved.
We sincerely look forward to more partners joining the Xiangshan development team!
With the support of the Institute of Computing Technology, Chinese Academy of Sciences, and Pengcheng Laboratory, we developed Xiangshan, an open-source high-performance RISC-V processor core; the project is also supported by the China Open Instruction Ecosystem (RISC-V) Alliance and the Beijing Academy of Artificial Intelligence. I would also like to thank the senior experts at Beijing Microcore for their strong support during Xiangshan's development. They strongly endorse the open-source philosophy and were the first company to co-develop Xiangshan with us. I am very happy that more partners have joined the development of the "Nanhu" architecture. Thank you all for supporting Xiangshan.
You are welcome to contact us and join the Xiangshan open-source community!
5. Highlights
- A book about Xiangshan. Many people think CPU design is difficult and has a very high barrier to entry. We hope to lower that barrier, so we plan to publish a book in the spirit of the detailed walkthroughs of Berkeley's TCP/IP protocol stack and Mao Decao's Scenario Analysis of the Linux Kernel Source Code, using Xiangshan's source code to explain the details and know-how of CPU design. The book could also be written by the community in an open-source way; interested friends are welcome to participate.
- The "One Student One Chip" program pays off. Five students from the program's first cohort (Jin Yue, Wang Huaqiang, Wang Kaifan, Zhang Linjun, Zhang Zifei) joined Xiangshan development right after completing their undergraduate degrees and became technical mainstays. Many people ask when they will graduate; you will have to wait a while. They are only first-year graduate students now, so it will be another two years before any of them graduate.
- Development during the epidemic. In early June 2020, a COVID-19 outbreak suddenly appeared at Xinfadi in Beijing, and students could not return to Beijing. Thanks to the strong support of Pengcheng Laboratory in Shenzhen, the whole team gathered in Shenzhen for three months of closed-door development at the lab. During that period there were generally more than 150 commits per week, which turned out to be the most productive period of development.
- Xiangshan's logo. We tried many logo designs and finally voted for the one in the lower left corner. However, a friend kindly pointed out that the red leaves at Xiangshan (the Fragrant Hills in Beijing) are mainly sumac leaves. Fortunately, Xiangshan also has five-lobed maples, so we are settling on this logo for now.
A few last words
I was fortunate to serve as co-chair of the first RISC-V China Summit together with Mr. Qi (Qi Xiaoning) of Alibaba, but the real work behind the scenes was done by the teachers of the Institute of Software, Chinese Academy of Sciences, and ShanghaiTech University.
Because of the great uncertainty caused by the epidemic, the organizing committee stayed on high alert throughout. Initially, only about 1,500 in-person registrations were opened, and they filled up quickly; two further registration rounds were opened later, with only 200 places each, and the final total reached 2,600 (some people registered for both the main conference and a sub-forum).
Even so, because of the epidemic in Guangdong, we ultimately had to arrange for attendees to participate remotely. I apologize to the friends who were unable to register, and I thank those who registered but chose not to attend because of the epidemic. The summit provided 4 live-stream channels, broadcasting all 101 talks live, and video replays will follow. Some technical problems occurred during the live stream (even after debugging in the morning, errors still occurred in the afternoon), for which I apologize again.
Special thanks to Wu Wei and Wu Yanjun of the Institute of Software, Chinese Academy of Sciences, who worked tirelessly to prepare this summit and put an enormous amount of effort into it. Thanks also to Zhou Pingqiang, Dean of the School of Information Science and Technology at ShanghaiTech University, for coordinating local resources in Shanghai and fully supporting the smooth running of the summit. I also want to thank all the members of the preparatory team and the volunteers who contributed quietly behind the scenes!
Everyone came together because of RISC-V: the open source, openness, sharing, and co-governance that RISC-V brings are a consensus we all share, and RISC-V opens up unlimited possibilities for us. Today RISC-V is flourishing in China, and China is contributing more and more to the RISC-V ecosystem; the first RISC-V China Summit is the best illustration of this.