Documentation

FlagOS

A unified, open-source system software stack designed for a variety of AI chips.

FlagOS Overview

FlagOS Core Libraries

FlagGems

A high-performance, general-purpose operator library implemented in the Triton programming language and its language extensions.

FlagTree

An open-source, unified compiler for multiple AI chips.

FlagScale

A comprehensive toolkit designed to support the entire lifecycle of large models.

FlagCX

A scalable and adaptive unified communication library for cross-chip environments.

Fused Operator Libraries

FlagGems-vllm

A high-performance deep learning operator library.

Multi-Domain Operator Libraries

FlagDNN

A deep neural network computing library targeting multiple chip backends.

FlagBLAS

A linear algebra computing library that implements the standard BLAS interface.

FlagFFT

A GPU FFT library whose kernels are JIT-compiled via Triton/TLE.

FlagSparse

A domain-specific operator library for sparse computation scenarios.

FlagTensor

A high-performance tensor-primitive library implemented in Triton.

FlagAudio

A multi-backend computing library for audio signal processing.

FlagOS Ecosystem Enablement Projects

vllm-plugin-FL

A plugin for the vLLM inference and serving framework, built on FlagOS's unified multi-chip backend, which includes the unified operator library FlagGems and the unified communication library FlagCX.

Megatron-LM-FL

A fork of Megatron-LM that introduces a plugin-based architecture for supporting diverse AI chips, built on top of FlagOS, a unified open-source AI system software stack.

TransformerEngine-FL

A fork of TransformerEngine that introduces a plugin-based architecture for supporting diverse AI chips, built on top of FlagOS, a unified open-source AI system software stack.

verl-FL

A fork of verl (Volcano Engine Reinforcement Learning for LLMs) that extends the upstream library with multi-chip/multi-hardware support via the FlagOS ecosystem.

PyTorch-Plugin-FL

A custom PyTorch device plugin based on the PrivateUse1 extension mechanism. It registers the high-performance Triton operators from FlagGems as the flagos device backend, providing unified multi-chip support.

sglang-plugin-FL

An out-of-tree (OOT) plugin for SGLang, built on FlagOS's unified multi-chip backend, which includes the unified operator library FlagGems and the unified communication library FlagCX. It extends SGLang's inference capabilities across diverse hardware platforms.

FlagOS Domain-Specific Projects

FlagOS-Robo

An integrated training and inference framework for AI models used in robotics, a field also known as embodied intelligence.

FlagQuantum

A high-performance distributed quantum statevector simulator built on PyTorch, enabling quantum circuit simulation across multiple GPUs with automatic sharding and resharding.

FlagOS Developer Tools

KernelGen

An operator auto-generation tool.

KernelGenBench

A benchmark framework for evaluating LLM and agent-based Triton kernel generation across multiple hardware platforms.

FlagOS Skills

Agent skills for FlagOS, compatible with Claude Code, Cursor, Codex, and any other agent that supports the Agent Skills standard.

Online Laboratory

An online laboratory providing cloud-based development environments.

FlagOS Platform Services

FlagRelease

An automated platform for the cross-chip migration and release of open-source large models.

FlagPerf

An integrated AI hardware evaluation engine.

FlagCICD

A CI/CD toolchain that streamlines large-model development across diverse AI chips, eliminating fragmentation and cutting adaptation costs.

Get Started with FlagOS

Join us in building an open AI chip development ecosystem.

FlagOS Homepage