Core AI - OS 27 and WWDC26

Core AI for App Developers

Core AI is Apple's new on-device model deployment stack for OS 27 apps: Python and PyTorch tools to produce .aimodel assets, a Swift framework to run them, and Xcode tools for specialization, caching, profiling, and debugging.

June 15, 2026 · 8 minute read · WWDC26 session 324
Apple positions Core AI as the lower-level runtime and toolchain for bringing your own on-device models to Apple Silicon.

Highlights

Runtime layer

Use Core AI when you own the model asset; use Foundation Models when you want the language-model session API.

PyTorch path

coreai-torch converts exported PyTorch graphs into optimized .aimodel files.

Swift API

AIModel, InferenceFunction, and NDArray are the basic integration types.

First-run cost

Plan for model specialization, cache management, and ahead-of-time compilation before users tap the feature.

What Core AI Is

Core AI is not another prompt API. It is the deployment path for your own on-device models: convert or author a model, ship or download the .aimodel, specialize it for the user's Apple Silicon device, then run inference locally across CPU, GPU, and Neural Engine.

Requires OS 27 runtime

Foundation Models still matters. It is the ergonomic language-model API for sessions, streaming, tools, and structured output. Core AI sits below that when you bring your own model; Apple's Core AI Models runtime can expose a Core AI language model through the Foundation Models session shape.

import FoundationModels
import CoreAILanguageModels

@Generable
struct VocabCard {
    let term: String
    let definition: String
    let example: String
}

let model = try await CoreAILanguageModel(resourcesAt: qwen3ModelURL)
let session = LanguageModelSession(model: model)

let response = try await session.respond(
    to: "Create a short vocabulary card for sunflower",
    generating: VocabCard.self
)

let card = response.content

That is the handoff point: Core AI owns the local model asset, specialization, and hardware execution; Foundation Models owns the conversational session, transcript, tools, streaming, and typed generation.

Convert PyTorch to .aimodel

The shortest path starts in Python. Export the model with torch.export, run Core AI's decomposition table, convert with TorchConverter, then save an .aimodel asset that Xcode and the OS 27 runtime can understand.

Python 3.11+ and PyTorch 2.8+

Older toolchain fallback

If your build environment cannot run the Core AI Python packages yet, keep the model in its current deployment format and isolate conversion in a separate experiment branch or CI job. Do not commit an unverified conversion; compare numerics against the original PyTorch model first.

import torch
from coreai_torch import TorchConverter, get_decomp_table

model = MyModel().eval()
sample = (torch.randn(1, 10),)

exported = torch.export.export(model, args=sample)
exported = exported.run_decompositions(get_decomp_table())

ai_program = TorchConverter().add_exported_program(exported).to_coreai()
ai_program.optimize()
ai_program.save_asset("MyModel.aimodel")

Use coreai-opt before conversion when your memory, disk-size, latency, or power budget calls for compression. Its workflows cover quantization, palettization, pruning, calibration-based compression, and fine-tuning-based compression, all while the model stays in PyTorch until export.
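To make the size trade-off concrete, here is a minimal sketch of symmetric 8-bit weight quantization in plain Python. It is not the coreai-opt API: quantize_int8 and dequantize are hypothetical helpers, and the weights are made up. The sketch only illustrates what quantization does to a weight tensor and why the numerics need re-checking afterwards.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    peak = max((abs(w) for w in weights), default=0.0)
    # Assumption: one scale per tensor; real tools may use per-channel scales.
    scale = peak / 127.0 if peak > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.003, 1.0]
stored, scale = quantize_int8(weights)
restored = dequantize(stored, scale)

# Round-to-nearest keeps every weight within half a quantization step.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each stored value needs one byte instead of four, and the small weight 0.003 rounds to zero. That kind of loss is what a numerics check after compression is meant to catch.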

Inspect the Model in Xcode

Add the .aimodel file to the app target and open it in Xcode. The model viewer shows metadata, size, operation distribution, and function signatures. Those signatures are your contract: names, shapes, scalar types, dynamic dimensions, inputs, outputs, and states.

Xcode 27 tooling

This matters because Core AI does not hide tensor plumbing. If a function expects a 2D float32 NDArray named features, your app must provide that exact input.
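One way to catch a mismatch early is to record the expected signature next to the export script and validate sample inputs against it. The sketch below is plain Python with hypothetical names (TensorSpec, validate_input). It does not read a real .aimodel; it only illustrates the name, rank, and scalar-type checks the contract implies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical record of one input from the model's function signature."""
    name: str
    dtype: str
    shape: tuple[Optional[int], ...]  # None marks a dynamic dimension


def validate_input(spec: TensorSpec, name: str, dtype: str, shape: tuple[int, ...]) -> list[str]:
    """Return human-readable mismatches; an empty list means the input fits."""
    problems = []
    if name != spec.name:
        problems.append(f"name {name!r} != {spec.name!r}")
    if dtype != spec.dtype:
        problems.append(f"dtype {dtype} != {spec.dtype}")
    if len(shape) != len(spec.shape):
        problems.append(f"rank {len(shape)} != {len(spec.shape)}")
    else:
        for axis, (got, want) in enumerate(zip(shape, spec.shape)):
            # Dynamic dimensions accept any size.
            if want is not None and got != want:
                problems.append(f"axis {axis}: {got} != {want}")
    return problems


# A 2D float32 input named "features" with a dynamic batch dimension.
features_spec = TensorSpec(name="features", dtype="float32", shape=(None, 10))
ok = validate_input(features_spec, "features", "float32", (1, 10))
bad = validate_input(features_spec, "features", "float16", (10,))
```

Here ok is empty, while bad reports both the wrong scalar type and the wrong rank.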

AIModel, InferenceFunction, NDArray

The basic Swift flow is small: load the model, load a named inference function, prepare one or more NDArray inputs, run the function, then read the output arrays.

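Requires OS 27 runtime

import Foundation
import CoreAI

// App-defined errors; ModelError is not part of Core AI.
enum ModelError: Error {
    case missingFunction
    case missingOutput
}

struct ModelRunner {
    let function: InferenceFunction

    init(modelURL: URL) async throws {
        let model = try await AIModel(contentsOf: modelURL)
        // Throw instead of force-unwrapping so a missing function cannot crash the app.
        guard let function = try model.loadFunction(named: "main") else {
            throw ModelError.missingFunction
        }
        self.function = function
    }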

    func predict(input: NDArray) async throws -> NDArray {
        var outputs = try await function.run(inputs: ["features": input])
        guard let logits = outputs.remove("logits")?.ndArray else {
            throw ModelError.missingOutput
        }
        return logits
    }
}

For hot loops, move beyond the simple path: allocate NDArrays in the runtime's preferred layout, pre-allocate outputs, and pipeline work with asynchronous values when multiple inference functions compose.

Specialization, AOT, and Caches

A shipped .aimodel is a portable source representation. Before it runs, Core AI specializes it for the user's hardware and OS version. Large models can make that first load expensive, so do it during onboarding, after a model download, or behind an explicit "prepare" step rather than inside the main tap path.

OS 27 cache and specialization APIs
Xcode 27 AOT tools

let cache = AIModelCache.default

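if let model = try cache.model(for: modelURL, options: .default) {
    // Fast path: already specialized and cached.
    use(model)
} else {
    // Slow path: specialize once while the UI shows a preparing state.
    showPreparingState()
    try await AIModel.specialize(contentsOf: modelURL)
    // Then load with the same initializer ModelRunner uses and continue.
    let model = try await AIModel(contentsOf: modelURL)
    use(model)
}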

Ahead-of-time compilation moves part of specialization to the development machine. Apple's session shows coreai-build producing compiled variants for device architectures; the app can then download the right asset and leave less work for first launch.

Cache policy is part of product design. Core AI lets apps inspect the default model cache, explicitly specialize models, delete entries, tune persistence policy, and share a cache across apps in the same app group.

Stateful Inference

Transformer-style loops should not recompute the full history every step. Core AI supports model states: buffers that are read and updated in place during inference. In PyTorch, mutable buffers can become Core AI states during conversion; in Swift, you pass mutable views for those state NDArrays.

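Requires OS 27 runtime

import torch
from torch import nn

class Decoder(nn.Module):
    def __init__(self, layers: int, max_length: int, width: int):
        super().__init__()
        # Mutable buffers can become Core AI states during conversion.
        self.register_buffer("key_cache", torch.zeros(layers, max_length, width))
        self.register_buffer("value_cache", torch.zeros(layers, max_length, width))

// Swift: pass mutable views for the state NDArrays.
var states = InferenceFunction.MutableViews()
states.insert(&keyCache, for: "keyCache")
states.insert(&valueCache, for: "valueCache")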

let outputs = try await function.run(
    inputs: ["features": nextTokenFeatures],
    states: states
)

The payoff is latency stability: after the first token or frame, each step can consume the latest input plus cached history instead of a growing full-context tensor.
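The pattern can be sketched in plain Python. This is a toy, not Core AI code: the StepDecoder class, its preallocated key_cache, and the dot-product score are illustrative assumptions chosen so the example runs with the standard library alone. It shows a fixed-size buffer updated in place, with each step taking only the newest input.

```python
class StepDecoder:
    """Toy decoder that keeps per-step history in a preallocated buffer."""

    def __init__(self, max_length: int, width: int):
        # Fixed-size state, analogous to a registered buffer:
        # allocated once, then mutated in place.
        self.key_cache = [[0.0] * width for _ in range(max_length)]
        self.length = 0  # number of valid rows in the cache

    def step(self, features: list[float]) -> list[float]:
        """Consume one new input and score it against every cached row."""
        if self.length >= len(self.key_cache):
            raise RuntimeError("cache is full")
        if len(features) != len(self.key_cache[0]):
            raise ValueError("feature width does not match the cache")
        # Write the new row in place instead of rebuilding the history.
        self.key_cache[self.length][:] = features
        self.length += 1
        # Score the newest input against the valid part of the cache only.
        return [
            sum(q * k for q, k in zip(features, row))
            for row in self.key_cache[: self.length]
        ]


decoder = StepDecoder(max_length=4, width=2)
first = decoder.step([1.0, 0.0])   # one cached row
second = decoder.step([0.5, 2.0])  # two cached rows; the input size is unchanged
```

The caller passes the same two-element input at every step. The history lives in the buffer, so nothing outside the decoder has to concatenate or resend earlier inputs.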

Debug with Xcode and Instruments

Core AI has dedicated Xcode and Instruments support. Use the Core AI debug gauge to spot live activity, Instruments to see load, specialization, and inference latency, and the Core AI Debugger when converted numerics look wrong. The debugger can inspect intermediate tensor values and trace converted operations back to the Python source that produced them.

Xcode 27 tooling

Keep a small numerics test in the Python pipeline too. Run the same representative input through the PyTorch model and the converted Core AI model, then fail the conversion if the maximum delta exceeds the tolerance your feature can accept.
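A minimal sketch of such a gate, written in plain Python so it runs without either runtime installed. The helper names (max_abs_delta, check_conversion), the sample values, and the 1e-3 tolerance are illustrative assumptions, not part of any Core AI package. In a real pipeline the two lists would be the flattened outputs of the PyTorch model and the converted model for the same input.

```python
import math
from typing import Sequence


def max_abs_delta(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Return the largest element-wise absolute difference between two flat outputs."""
    if len(reference) != len(candidate):
        raise ValueError(f"length mismatch: {len(reference)} vs {len(candidate)}")
    worst = 0.0
    for ref, cand in zip(reference, candidate):
        if math.isnan(ref) or math.isnan(cand):
            # NaN never compares greater than anything, so treat it as a failure.
            return math.inf
        worst = max(worst, abs(ref - cand))
    return worst


def check_conversion(reference: Sequence[float], candidate: Sequence[float],
                     tolerance: float = 1e-3) -> None:
    """Raise if the converted output drifts further than the feature can accept."""
    delta = max_abs_delta(reference, candidate)
    if delta > tolerance:
        raise AssertionError(f"max delta {delta:.6g} exceeds tolerance {tolerance:.6g}")


# Hypothetical outputs for one representative input.
pytorch_logits = [0.12, -1.50, 3.25]
converted_logits = [0.1201, -1.5003, 3.2498]
check_conversion(pytorch_logits, converted_logits, tolerance=1e-3)
```

Raising from check_conversion makes the export script or CI job fail, which keeps an unverified .aimodel from being committed or shipped.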

Custom Kernels Are the Escape Hatch

Most apps should start with exported PyTorch models, Core AI Models recipes, or direct Core AI Python authoring. Custom op lowering, composite op externalization, Metal kernels, and Metal tensors are for unsupported operations or measured hot paths where the default graph is leaving performance on the table.

Core AI Python and Metal 4 tooling

In practice: keep tensor-level work contained in the model package. App code should usually call a small Swift wrapper rather than spread preprocessing, tokenization, mask extraction, and state layout details across view models.

Adoption Checklist

Decide whether this is a Foundation Models feature or a Core AI feature. If you only need Apple's model and the session API, stay higher-level. If you need your own model, use Core AI.
Treat conversion as a testable build artifact. Keep sample inputs, shape expectations, and PyTorch-vs-Core-AI numeric checks near the export script.
Move specialization out of the interactive flow. Download, specialize, and cache models when the user opts in or while a first-run screen is already showing.
Measure before using lower-level APIs. Preferred layouts, pre-allocated outputs, states, and custom kernels are worth it when Instruments shows the cost.

Sources