vllm.v1.attention.ops.chunked_prefill_paged_decode

has_native_kv_cache_layout

has_native_kv_cache_layout(
    key_cache: Tensor, value_cache: Tensor
) -> bool

Return whether KV cache blocks can use the native ROCm pairing.

The C++ ops.paged_attention_rocm custom kernel requires each block to be contiguous in memory. Returns False for stride-padded hybrid layouts and for the unified KV cache (RFC #42082, see PagedAttention.split_kv_cache), routing them to Triton.

Source code in vllm/v1/attention/ops/chunked_prefill_paged_decode.py
def has_native_kv_cache_layout(
    key_cache: torch.Tensor,
    value_cache: torch.Tensor,
) -> bool:
    """Return whether KV cache blocks can use the native ROCm pairing.

    The C++ ``ops.paged_attention_rocm`` custom kernel requires each block
    to be contiguous in memory. Returns False for stride-padded hybrid
    layouts and for the unified KV cache (RFC #42082, see
    :meth:`PagedAttention.split_kv_cache`), routing them to Triton.
    """
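    # A cache is block-contiguous when the stride between consecutive blocks
    # (dim 0) equals the number of elements in one block (product of dims 1..).
    return (
        key_cache.stride(0) == key_cache.shape[1:].numel()
        and value_cache.stride(0) == value_cache.shape[1:].numel()
    )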
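
To make the stride check concrete, the following minimal sketch re-declares the function locally (so it runs without vLLM installed) and applies it to three CPU tensors. The shapes, the variable names, and the way the "unified" and "padded" tensors are built are assumptions chosen purely for illustration. They are not claimed to match the layouts vLLM actually allocates.

```python
import torch


def has_native_kv_cache_layout(
    key_cache: torch.Tensor,
    value_cache: torch.Tensor,
) -> bool:
    # Local copy of the check from the listing above.
    return (
        key_cache.stride(0) == key_cache.shape[1:].numel()
        and value_cache.stride(0) == value_cache.shape[1:].numel()
    )


# Hypothetical dimensions, for illustration only.
num_blocks, block_size, num_kv_heads, head_size = 4, 16, 8, 64
block_shape = (block_size, num_kv_heads, head_size)

# Case 1: separately allocated key and value caches. Each block is
# contiguous, so stride(0) equals the per-block element count.
k = torch.zeros(num_blocks, *block_shape)
v = torch.zeros(num_blocks, *block_shape)
native = has_native_kv_cache_layout(k, v)

# Case 2 (assumed layout): keys and values interleaved in one tensor and
# exposed as views. The block stride doubles, so the check fails.
kv = torch.zeros(num_blocks, 2, *block_shape)
unified = has_native_kv_cache_layout(kv[:, 0], kv[:, 1])

# Case 3 (assumed layout): each block carries extra trailing storage,
# imitating stride padding. The view drops the padding from the shape but
# keeps the larger block stride, so the check fails.
k_pad = torch.zeros(num_blocks, block_size + 1, num_kv_heads, head_size)[:, :block_size]
v_pad = torch.zeros(num_blocks, block_size + 1, num_kv_heads, head_size)[:, :block_size]
padded = has_native_kv_cache_layout(k_pad, v_pad)

print(native, unified, padded)
```

In this sketch only the first case satisfies the check. The other two would be routed to Triton according to the docstring.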