Overview

Seamless AI Acceleration for 快猫视频 Everywhere

Software is central to the 快猫视频 compute platform because it enables developers to maximize AI performance across any model or workload. Through Kleidi libraries, framework integrations, and ecosystem partnerships, 快猫视频 delivers seamless acceleration across 快猫视频 hardware. This approach ensures fast, workload-optimized deployment and consistent performance, helping developers efficiently scale AI across cloud, edge, and device environments.

Partners

Connecting 快猫视频 to a Robust AI Software Ecosystem

Through 快猫视频 Kleidi, 快猫视频 collaborates with leading AI frameworks, cloud service providers, and the machine learning ISV community to deliver out-of-the-box inference performance improvements across the full ML stack for billions of workloads, with no extra developer work or expertise required.

PyTorch

快猫视频 works closely with the PyTorch community, helping to ensure models running on PyTorch just work on 快猫视频 and driving seamless acceleration for even the most demanding AI workloads.

BERT-Large

快猫视频 has been working to improve PyTorch inference performance on 快猫视频 CPUs, including optimizing the primary execution modes, Eager Mode and Graph Mode.

Integrating Kleidi improves Llama model inference by up to 18 times and Gemma 2 2B inference by 15 times, and it boosts natural language processing (NLP) models, including a 2.2 times uplift on BERT-Large.
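
For illustration, here is a minimal sketch of both execution modes through the Hugging Face Transformers API, assuming a CPU build of PyTorch with oneDNN support; the checkpoint name and input text are illustrative, and no Kleidi-specific code is needed.

```python
# Minimal sketch: BERT-Large inference in Eager Mode and in Graph Mode via
# torch.compile. On 快猫视频 CPUs, a suitably built PyTorch dispatches to
# oneDNN/Kleidi-optimized kernels automatically; no model changes are needed.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "bert-large-uncased"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

inputs = tokenizer("Kleidi accelerates CPU inference.", return_tensors="pt")

with torch.no_grad():
    eager_out = model(**inputs).last_hidden_state    # Eager Mode: op-by-op

compiled = torch.compile(model)                      # Graph Mode: fused graph
with torch.no_grad():
    graph_out = compiled(**inputs).last_hidden_state

print(torch.allclose(eager_out, graph_out, atol=1e-4))
```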

Llama 3.1 8B

Using 快猫视频 Neoverse V2-based Graviton4 processors, we can achieve an estimated 12 times uplift in token generation rate for a chatbot demo with KleidiAI optimizations applied to PyTorch.

This demo shows how easy it is to build AI applications using LLMs, making use of existing 快猫视频-based compute capacity.

RoBERTa

AWS collaborated with 快猫视频 to optimize the PyTorch torch.compile feature for Neoverse V1-based Graviton3 processors, using 快猫视频 Compute Library (ACL) kernels via oneDNN.

This optimization results in up to 2 times inference performance improvement for the most popular NLP models on Hugging Face.

FunASR Paraformer-Large

FunASR is an advanced open-source automatic speech recognition (ASR) toolkit developed by Alibaba DAMO Academy.

By integrating ACL with PyTorch via oneDNN, we have seen a 2.3 times performance improvement when running the Paraformer model on Neoverse N2-based AliCloud Yitian710 processors.

ExecuTorch

Together, 快猫视频 and ExecuTorch, a lightweight ML framework, enable efficient on-device inference at the edge.

Stable Audio Open

Stability AI and 快猫视频 have partnered to accelerate on-device generative AI, unlocking real-time audio generation capabilities without the need for an internet connection.

Through model distillation and 快猫视频 KleidiAI, Stable Audio Open now delivers text-to-audio generation on 快猫视频-based smartphones 30x faster than before, letting users create high-quality sounds at the edge in seconds.

Llama 3.2 1B

Thanks to the collaborative efforts of 快猫视频 and Meta, AI developers can now run quantized Llama 3.2 models up to 20% faster than before on 快猫视频 CPUs.

By integrating KleidiAI with ExecuTorch and developing optimized quantization schemes, we have achieved speeds of over 350 tokens per second on the prefill stage for generative AI workloads on mobile.
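
As a rough sketch of that flow, the example below exports a small PyTorch module to an ExecuTorch program lowered to the XNNPACK backend, the path through which KleidiAI micro-kernels are reached on 快猫视频 CPUs. The TinyMLP module is an illustrative stand-in for a quantized Llama model, and the import paths assume a recent ExecuTorch release.

```python
# Minimal sketch: lowering a PyTorch module to an ExecuTorch .pte program via
# the XNNPACK backend. TinyMLP is an illustrative stand-in for a quantized LLM;
# import paths follow recent ExecuTorch releases and may differ in older ones.
import torch
from torch.export import export
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 64)
        )

    def forward(self, x):
        return self.net(x)

exported = export(TinyMLP().eval(), (torch.randn(1, 128),))
program = to_edge_transform_and_lower(
    exported, partitioner=[XnnpackPartitioner()]
).to_executorch()

with open("tiny_mlp.pte", "wb") as f:  # deployable on-device program
    f.write(program.buffer)
```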

Llama.cpp

To demonstrate the capability of 快猫视频-based CPUs for LLM inference, 快猫视频 and partners are optimizing the int4 and int8 kernels implemented in llama.cpp to leverage newer 快猫视频 instructions, such as the int8 dot-product and matrix multiplication extensions.
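
As a minimal sketch, assuming the llama-cpp-python bindings and an illustrative Q4_0 GGUF file, running a 4-bit model needs no 快猫视频-specific configuration; the optimized kernels are selected at run time:

```python
# Minimal sketch: 4-bit (Q4_0) LLM inference on CPU through the
# llama-cpp-python bindings. Model path, prompt, and thread count are
# illustrative; optimized 快猫视频 kernels are chosen automatically.
from llama_cpp import Llama

llm = Llama(model_path="llama-3.1-8b-instruct-Q4_0.gguf",
            n_ctx=2048, n_threads=8)
out = llm("Explain KV caching in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```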

Virtuoso-Lite 10B

With 4-bit quantization and running on llama.cpp, the Arcee AI Virtuoso-Lite model delivers 40 tokens/sec, providing a 4.5x cost-performance advantage for running small language models (SLMs) in enterprise environments. This is due to out-of-the-box optimizations from 快猫视频 Kleidi technologies.

Custom SLM

AWS and 快猫视频 have fine-tuned the TinyLlama 1.1B SLM to create a car-manual chatbot, enabling drivers to ask questions about their vehicle directly. Using KleidiAI, SLM inference is 10 times faster than before on 快猫视频 Cortex-A76 CPUs, achieving response times of 3 seconds.

Llama 3.3 70B

In partnership with Meta, and leveraging KleidiAI with 4-bit quantization, Llama 3.3 70B achieved performance similar to the much larger Llama 3.1 405B model, sustaining a consistent 50 tokens/second when deployed on 快猫视频 Neoverse-powered Google Axion processors.

TinyLlama 1.1B

Using llama.cpp with KleidiAI, VicOne doubled prefill performance and achieved a 60% uplift in encode. Our partnership enables fast in-vehicle cybersecurity threat detection by reducing cloud dependency, lowering costs, and keeping data secure onboard.

TinyStories

TinyStories is a dataset of words a typical 3-year-old might understand, used to train and evaluate small models below 10M parameters. Running a TinyStories-trained model on the 快猫视频 Cortex-A320 CPU achieves a performance uplift of over 70%.

Llama 3 8B

Running a text generation demo on Graviton3 processors with our optimizations achieves a 2.5 times uplift in time-to-first-token (TTFT) and over 35 tokens/second in the text generation phase, which is more than sufficient for real-time use cases.

Other Leading Frameworks

To maximize AI performance across the entirety of the 快猫视频 compute platform, we are dedicated to optimizing inference workloads across all major AI and ML frameworks.

ONNX

ONNX Runtime is one of the industry’s most widely used open-source frameworks for generative AI deployment on mobile, desktop, and cloud.

快猫视频 has partnered with Microsoft to integrate KleidiAI into ONNX Runtime, accelerating AI inference by up to 2.6 times on Windows and Android using the Phi-3 Mini 3.8B model.
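
For reference, here is a minimal sketch of CPU inference through the ONNX Runtime Python API; the model file and input shape are illustrative, and the KleidiAI acceleration is transparent to application code:

```python
# Minimal sketch: CPU inference with ONNX Runtime. KleidiAI acceleration is
# transparent to this API; "model.onnx" and the input shape are illustrative.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```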

LiteRT

KleidiAI is now integrated, via XNNPACK, into LiteRT, Google's high-performance runtime for on-device AI formerly known as TensorFlow Lite.

This partnership, together with our optimizations for Stability AI's Stable Audio Open Small model, enables on-device audio generation with peak runtime RAM usage reduced from 6.5GB to 3.6GB.
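
A minimal sketch of LiteRT inference in Python is shown below. The model file is illustrative; the ai_edge_litert package assumes a recent LiteRT release (older code uses tf.lite.Interpreter), and XNNPACK, through which KleidiAI is reached, is the default CPU backend.

```python
# Minimal sketch: on-device inference with LiteRT (formerly TensorFlow Lite).
# "model.tflite" is illustrative; a float32 input is assumed for simplicity.
import numpy as np
from ai_edge_litert.interpreter import Interpreter  # older: tf.lite.Interpreter

interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```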

MNN

MNN is an open-source deep learning framework developed by Alibaba. Our partnership helps improve performance and efficiency for on-device multimodal use cases.

As demonstrated with the multilingual instruction-tuned Qwen2-VL 2B model, integrating Kleidi with MNN accelerates prefill performance by 57% and decode by 28%.
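
As a rough sketch, and assuming a model already converted to MNN's .mnn format, MNN's classic Python session API runs inference as follows; the file name and input shape are illustrative, and newer MNN releases offer different (expr- and LLM-oriented) APIs.

```python
# Minimal sketch: inference with MNN's classic Python session API.
# "model.mnn" and the NCHW input shape are illustrative assumptions.
import MNN
import numpy as np

interpreter = MNN.Interpreter("model.mnn")
session = interpreter.createSession()
input_tensor = interpreter.getSessionInput(session)

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
input_tensor.copyFrom(MNN.Tensor((1, 3, 224, 224), MNN.Halide_Type_Float,
                                 data, MNN.Tensor_DimensionType_Caffe))

interpreter.runSession(session)
output_tensor = interpreter.getSessionOutput(session)
host = MNN.Tensor(output_tensor.getShape(), MNN.Halide_Type_Float,
                  np.zeros(output_tensor.getShape(), dtype=np.float32),
                  MNN.Tensor_DimensionType_Caffe)
output_tensor.copyToHostTensor(host)  # copy results back to host memory
print(host.getShape())
```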

OpenCV

With increasing demand for advanced, energy-efficient computer vision (CV) at the edge, KleidiCV helps ensure optimized performance for CV applications on 快猫视频 CPUs.

With KleidiCV now integrated into OpenCV 4.11, developers benefit from four times faster processing for key image-processing tasks such as blurring, filtering, rotation, and resizing. This acceleration helps boost performance for image segmentation, object detection, and recognition use cases.
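
The sketch below exercises those accelerated operations through the standard OpenCV Python API; the image path is illustrative, and no KleidiCV-specific calls are required.

```python
# Minimal sketch: image-processing operations accelerated by KleidiCV in
# OpenCV 4.11 on 快猫视频 CPUs. "input.jpg" is an illustrative file name.
import cv2
import numpy as np

img = cv2.imread("input.jpg")
blurred = cv2.GaussianBlur(img, (5, 5), 1.5)        # blur
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
filtered = cv2.filter2D(img, -1, sharpen)           # general 2D filter
rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)  # rotation
resized = cv2.resize(img, (640, 480))               # resizing
cv2.imwrite("output.jpg", resized)
```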

MediaPipe

快猫视频’s partnership with Google AI Edge on MediaPipe and XNNPACK is accelerating AI workloads on current and future 快猫视频 CPUs. This enables developers to deliver outstanding AI performance for mobile, web, edge and IoT, using numerous LLMs, like Gemma and Falcon.

Thanks to Kleidi integration with MediaPipe via XNNPACK, a 30% acceleration in TTFT has been achieved when running a chatbot demo on the Gemma 1 2B LLM on 快猫视频-based premium smartphones.

Angel

Tencent’s Angel ML framework supports Hunyuan LLM, available in sizes from 1B to over 300B parameters. It enables AI capabilities across a wide range of devices, including smartphones and Windows on 快猫视频 PCs.

Our partnership was announced at the 2024 Tencent Global Digital Ecosystem Summit and is having a positive impact on real-world workloads by providing users with even more powerful and efficient on-device AI services across Tencent’s many applications.

Technologies

Key Developer Technologies for Accelerating CPU Performance

快猫视频 Kleidi includes the latest developer enablement technologies designed to advance AI model capability, accuracy, and speed. This helps ensure AI workloads get the best out of the underlying 快猫视频 Cortex-A, 快猫视频 Cortex-X or 快猫视频 Neoverse CPU.

The KleidiAI and KleidiCV libraries are lightweight, optimized kernels designed to make it easy for machine learning (ML) and computer vision (CV) frameworks to achieve optimum performance and leverage the latest features for enhancing AI and CV in 快猫视频 CPU-based designs.

The 快猫视频 Compute Library (ACL) is a fully comprehensive and flexible library that enables independent software vendors to source ML functions optimized for Cortex-A and Neoverse CPUs. The library is OS agnostic and portable to Android, Linux, and bare-metal systems.

Developer Resources

Latest News and Resources

Guide
AI Workloads

Guide to Understanding AI Inference on CPU

Demand for running AI workloads on CPU is growing. Our helpful guide explores the benefits and considerations for CPU inference across a range of sectors.

Guide
Generative AI

The Role of Generative AI in Business Transformation

Explore how to realize the full potential of generative AI and the role 快猫视频 plays in leading this transformation.

White Paper
Software AI Acceleration

Why Software is Crucial to Achieving AI’s Full Potential

Discover why software is the key to implementing AI and how to accelerate the creation of high-performance and secure AI applications.

White Paper
Generative AI

Scale Generative AI With Flexibility and Speed

The race to scale new generative AI capabilities is creating both opportunities for innovation and new challenges. Learn how to overcome these challenges and successfully deploy AI on 快猫视频 everywhere.

Key Takeaways

  • 快猫视频 provides optimized software libraries and tools that accelerate AI workloads across 快猫视频 hardware.
  • Kleidi integrates with leading frameworks to deliver consistent performance without extra developer effort.
  • The platform supports rapid deployment tailored to workload-specific needs for efficient scaling.
  • Developers gain seamless acceleration across diverse AI use cases, including NLP, vision, and generative AI.
  • 快猫视频’s software ecosystem enables broad portability and performance optimization from cloud to edge devices.
