Intel Xeon Processors Demonstrate Superior AI Large Model Inference Performance in AISBench Tests
Recently, the fifth-generation Intel Xeon Scalable processors passed the Artificial Intelligence Server System Performance Test (AISBench) organised by the China National Institute of Electronic Technology Standardization. Intel became one of the fi…
Recently, the fifth-generation Intel® Xeon® Scalable processors passed the Artificial Intelligence Server System Performance Test (AISBench) organised by the China National Institute of Electronic Technology Standardization. Intel became one of the first companies to pass the AISBench Large Language Model (LLM) inference performance test.
Based on the relevant requirements of the national standard ‘AI Server System Performance Test Specification’ (draft for public comment), the Saixi Lab of the China National Institute of Electronic Technology Standardization completed the AI large-model inference performance and accuracy test of the fifth-generation Intel Xeon Scalable processor using the AISBench 2.0 test tool. In the test, the fifth-generation Intel Xeon demonstrated excellent inference performance on both the ChatGLM V2-6B (6 billion parameters) and Llama2-13B (13 billion parameters) models, meeting the real-time inference requirements of lightweight large language models.
In this single-machine performance test, with datasets constructed for a closed test scenario, the fifth-generation Intel Xeon-based server achieved the following results while meeting the normal human reading-speed requirement (generation latency of less than 100 milliseconds):
In general-purpose inference with the ChatGLM V2 model (6 billion parameters), throughput reached up to 2493 tokens per second at an input-output sequence length of 256, and up to 926 tokens per second at a sequence length of 2048.
In general-purpose inference with the Llama2 model (13 billion parameters), throughput reached up to 513 tokens per second at an input-output sequence length of 256, and up to 132 tokens per second at a sequence length of 2048.
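As a rough cross-check of these figures (a hedged sketch: it assumes the 100-millisecond bound is read as per-token generation latency for a single user stream, which the article does not state explicitly), the aggregate throughput implies roughly how many concurrent streams can be served at reading speed:

```python
# Back-of-the-envelope check relating the reported aggregate throughput
# to the quoted per-token latency bound. All tokens/s values are taken
# from the AISBench results above; the per-stream interpretation of the
# 100 ms bound is an assumption for illustration.
LATENCY_BOUND_S = 0.100                       # < 100 ms per generated token
per_user_tokens_per_s = 1 / LATENCY_BOUND_S   # 10 tokens/s per stream

aggregate = {  # (model, input-output sequence length) -> tokens/s
    ("ChatGLM V2-6B", 256): 2493,
    ("ChatGLM V2-6B", 2048): 926,
    ("Llama2-13B", 256): 513,
    ("Llama2-13B", 2048): 132,
}

for (model, seq_len), tps in aggregate.items():
    streams = tps / per_user_tokens_per_s     # concurrent users at reading speed
    print(f"{model} @ seq {seq_len}: ~{streams:.0f} concurrent streams")
```

Under that assumption, even the lowest figure (132 tokens per second on Llama2-13B at a sequence length of 2048) would still keep a double-digit number of streams above reading speed.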
As a general-purpose processor, the fifth-generation Intel Xeon delivers outstanding performance across key workloads such as AI, networking, storage, and databases. The AISBench 2.0 test results validate its excellent inference performance when running lightweight large language models, enabling customers to use Xeon-based servers to build a single general-purpose AI system spanning data preprocessing, model inference, and deployment, combining AI performance, efficiency, accuracy, and scalability. This also highlights Intel Xeon's ability to provide enterprises with ‘out-of-the-box’ capability: a portion of their AI workloads can be deployed on a general-purpose system, giving customers a better total cost of ownership (TCO).
Combining Hardware and Software, Xeon Demonstrates AI Advantages
Not only does Intel Xeon's built-in AI accelerator make it an ideal solution for running some AI workloads on general-purpose processors, but Intel also equips it with optimised, easy-to-program open software that lowers the barriers for customers and ecosystem partners to deploy a wide range of AI-based solutions in the datacentre, from the cloud to the intelligent edge.
The fifth-generation Intel Xeon Scalable processors take full advantage of system-level benefits (including cache, memory, etc.), and as a result, achieve significant improvements in inference speed.
Its built-in AI accelerator, Intel® Advanced Matrix Extensions (AMX), enables full utilisation of computing resources by providing a dedicated matrix-multiply acceleration unit (TMUL), as well as support for low-precision data types such as INT8 and BF16, resulting in a significant increase in computing efficiency.
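To illustrate why BF16 is attractive for inference: it keeps float32's 8-bit exponent (and thus its dynamic range) while truncating the mantissa to 7 bits, halving memory traffic per value. A minimal pure-Python sketch of the conversion (round-to-nearest-even, with NaN and overflow handling omitted for brevity; real hardware such as AMX performs this natively):

```python
import struct

def to_bf16_bits(x: float) -> int:
    # Reinterpret the float32 bit pattern and keep the top 16 bits,
    # rounding to nearest even on the discarded lower half.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return (bits + rounding) >> 16

def bf16_to_float(h: int) -> float:
    # Re-expand the 16-bit pattern to float32 by zero-padding the mantissa.
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]

x = 3.1415927
print(bf16_to_float(to_bf16_bits(x)))  # 3.140625 — ~0.03% error
```

The small relative error shown here is typically tolerable for neural-network activations and weights, which is why BF16 is a common inference and training format.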
xFasterTransformer (xFT) is a deeply optimised open-source solution provided by Intel for deploying large language models on CPU platforms. It offers both C++ and Python API interfaces, making it easy for users to adopt and to integrate into their own business frameworks.
About AISBench
AISBench is a set of performance benchmarks for AI computing products, led by the China National Institute of Electronic Technology Standardization. Similar to international benchmarks such as MLPerf, it is used to test a variety of AI computing product forms and supports a rich set of test scenarios, modes, types, and metrics.
Time:2024-11-18