Core AI - OS 27 and WWDC26
Core AI for App Developers
Core AI is Apple's new on-device model deployment stack for OS 27 apps:
Python and PyTorch tools to produce .aimodel assets, a Swift framework
to run them, and Xcode tools for specialization, caching, profiling, and debugging.
Highlights
Use Core AI when you own the model asset; use Foundation Models when you want the language-model session API.
On the PyTorch path, coreai-torch converts exported PyTorch graphs into optimized .aimodel files.
AIModel, InferenceFunction, and NDArray are the basic integration types.
Plan for model specialization, cache management, and ahead-of-time compilation before users tap the feature.
What Core AI Is
Core AI is not another prompt API. It is the deployment path for your own on-device models:
convert or author a model, ship or download the .aimodel, specialize it for the user's
Apple Silicon device, then run inference locally across CPU, GPU, and Neural Engine.
Foundation Models still matters. It is the ergonomic language-model API for sessions, streaming, tools, and structured output. Core AI sits below that when you bring your own model; Apple's Core AI Models runtime can expose a Core AI language model through the Foundation Models session shape.
import FoundationModels
import CoreAILanguageModels

@Generable
struct VocabCard {
    let term: String
    let definition: String
    let example: String
}

let model = try await CoreAILanguageModel(resourcesAt: qwen3ModelURL)
let session = LanguageModelSession(model: model)
let response = try await session.respond(
    to: "Create a short vocabulary card for sunflower",
    generating: VocabCard.self
)
let card = response.content
That is the handoff point: Core AI owns the local model asset, specialization, and hardware execution; Foundation Models owns the conversational session, transcript, tools, streaming, and typed generation.
Convert PyTorch to .aimodel
The shortest path starts in Python. Export the model with torch.export, run Core AI's decomposition
table, convert with TorchConverter, then save an .aimodel asset that Xcode and the
OS 27 runtime can understand.
Older toolchain fallback
If your build environment cannot run the Core AI Python packages yet, keep the model in its current deployment format and isolate conversion in a separate experiment branch or CI job. Do not commit an unverified conversion; compare numerics against the original PyTorch model first.
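import torch
from coreai_torch import TorchConverter, get_decomp_table

# MyModel stands in for your own nn.Module; switch it to inference mode.
model = MyModel().eval()
sample = (torch.randn(1, 10),)

# Export the graph, then apply Core AI's decomposition table.
exported = torch.export.export(model, args=sample)
exported = exported.run_decompositions(get_decomp_table())

# Convert the exported program, optimize it, and save the .aimodel asset.
ai_program = TorchConverter().add_exported_program(exported).to_coreai()
ai_program.optimize()
ai_program.save_asset("MyModel.aimodel")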
Use coreai-opt before conversion when memory, disk size, latency, or power needs compression.
Its workflows cover quantization, palettization, pruning, calibration-based compression, and fine-tuning-based
compression while staying in PyTorch until export.
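To make the size-versus-accuracy trade-off concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It does not use coreai-opt and is not how that package is implemented; the function names `quantize_int8` and `dequantize_int8` are invented for this illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of a flat list of floats to int8 codes."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)

# The rounding error per weight is bounded by half a quantization step.
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantized, scale, worst_error)
```

Each weight is stored in one byte instead of four, at the cost of an error of at most half a quantization step per weight. Whether that error is acceptable is what the numerics comparison against the original PyTorch model should decide.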
Inspect the Model in Xcode
Add the .aimodel file to the app target and open it in Xcode. The model viewer shows metadata,
size, operation distribution, and function signatures. That signature is your contract: names, shapes,
scalar types, dynamic dimensions, inputs, outputs, and states.
This matters because Core AI does not hide tensor plumbing. If a function expects a 2D float32
NDArray named features, your app must provide that exact input.
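One way to keep that contract honest is to check it in the conversion pipeline before the app ever sees the asset. The sketch below is plain Python with hypothetical names (`TensorSpec`, `validate_input`); it does not read a real .aimodel signature and assumes you have copied the expected name, shape, and scalar type from the Xcode model viewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical record of one entry in a function signature."""
    name: str
    shape: tuple  # use None for a dynamic dimension
    dtype: str


def validate_input(spec, name, shape, dtype):
    """Return a list of human-readable mismatches; an empty list means OK."""
    problems = []
    if name != spec.name:
        problems.append(f"name: expected {spec.name!r}, got {name!r}")
    if dtype != spec.dtype:
        problems.append(f"dtype: expected {spec.dtype}, got {dtype}")
    if len(shape) != len(spec.shape):
        problems.append(f"rank: expected {len(spec.shape)}, got {len(shape)}")
    else:
        for axis, (expected, actual) in enumerate(zip(spec.shape, shape)):
            # None marks a dynamic dimension, so any size is accepted there.
            if expected is not None and expected != actual:
                problems.append(f"axis {axis}: expected {expected}, got {actual}")
    return problems


features_spec = TensorSpec(name="features", shape=(None, 10), dtype="float32")
ok = validate_input(features_spec, "features", (1, 10), "float32")
bad = validate_input(features_spec, "features", (10,), "float64")
print(ok, bad)
```

Returning a list of mismatches instead of raising on the first one makes the failure message useful when several parts of the signature drift at once.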
AIModel, InferenceFunction, NDArray
The basic Swift flow is small: load the model, load a named inference function, prepare one or more
NDArray inputs, run the function, then read the output arrays.
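import CoreAI

// Errors defined for this example; they are not part of Core AI.
enum ModelError: Error { case missingFunction, missingOutput }

struct ModelRunner {
    let function: InferenceFunction

    init(modelURL: URL) async throws {
        let model = try await AIModel(contentsOf: modelURL)
        // Throw instead of force-unwrapping so a missing function cannot crash the app.
        guard let function = try model.loadFunction(named: "main") else {
            throw ModelError.missingFunction
        }
        self.function = function
    }

    func predict(input: NDArray) async throws -> NDArray {
        // Input and output names must match the model's function signature.
        var outputs = try await function.run(inputs: ["features": input])
        guard let logits = outputs.remove("logits")?.ndArray else {
            throw ModelError.missingOutput
        }
        return logits
    }
}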
For hot loops, move beyond the simple path: allocate NDArrays in the runtime's preferred layout, pre-allocate outputs, and pipeline work with asynchronous values when multiple inference functions compose.
Specialization, AOT, and Caches
A shipped .aimodel is a portable source representation. Before it runs, Core AI specializes it
for the user's hardware and OS version. Large models can make that first load expensive, so do it during
onboarding, after a model download, or behind an explicit "prepare" step rather than inside the main tap path.
let cache = AIModelCache.default
if let model = try cache.model(for: modelURL, options: .default) {
    // Fast path: already specialized and cached.
    use(model)
} else {
    showPreparingState()
    try await AIModel.specialize(contentsOf: modelURL)
}
Ahead-of-time compilation moves part of specialization to the development machine. Apple's session shows
coreai-build producing compiled variants for device architectures; the app can then download
the right asset and leave less work for first launch.
Cache policy is part of product design. Core AI lets apps inspect the default model cache, explicitly specialize models, delete entries, tune persistence policy, and share a cache across apps in the same app group.
Stateful Inference
Transformer-style loops should not recompute the full history every step. Core AI supports model states: buffers that are read and updated in place during inference. In PyTorch, mutable buffers can become Core AI states during conversion; in Swift, you pass mutable views for those state NDArrays.
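import torch
from torch import nn

# layers, max_length, and width are model hyperparameters defined elsewhere.
class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Mutable buffers can become Core AI states during conversion.
        self.register_buffer("key_cache", torch.zeros(layers, max_length, width))
        self.register_buffer("value_cache", torch.zeros(layers, max_length, width))

// Swift: pass mutable views for the state NDArrays.
// Use the state names shown in the model's function signature.
var states = InferenceFunction.MutableViews()
states.insert(&keyCache, for: "keyCache")
states.insert(&valueCache, for: "valueCache")
let outputs = try await function.run(
    inputs: ["features": nextTokenFeatures],
    states: states
)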
The payoff is latency stability: after the first token or frame, each step can consume the latest input plus cached history instead of a growing full-context tensor.
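The difference is easy to see in a toy model. The sketch below is plain Python, not Core AI or PyTorch code: `project` is a stand-in for a key/value projection, and the two loops (`run_stateless`, `run_stateful`, both names invented here) count how often it runs with and without a preallocated, in-place cache.

```python
def project(token, counter):
    """Stand-in for an expensive per-token key/value projection."""
    counter["calls"] += 1
    return token * 0.5


def run_stateless(tokens):
    """Recompute the projection for the whole history at every step."""
    counter = {"calls": 0}
    outputs = []
    for step in range(1, len(tokens) + 1):
        keys = [project(t, counter) for t in tokens[:step]]
        outputs.append(sum(keys))
    return outputs, counter["calls"]


def run_stateful(tokens, max_length):
    """Project only the newest token and write it into a preallocated buffer."""
    counter = {"calls": 0}
    key_cache = [0.0] * max_length  # state buffer, updated in place
    outputs = []
    for position, token in enumerate(tokens):
        # Assumes len(tokens) <= max_length; a longer input would overflow the buffer.
        key_cache[position] = project(token, counter)
        outputs.append(sum(key_cache[: position + 1]))
    return outputs, counter["calls"]


tokens = [1.0, 2.0, 3.0, 4.0]
stateless_out, stateless_calls = run_stateless(tokens)
stateful_out, stateful_calls = run_stateful(tokens, max_length=8)
print(stateless_calls, stateful_calls)
```

Both loops produce the same outputs, but the stateless version performs 1 + 2 + ... + n projections while the stateful version performs n. The gap widens with every additional step, which is why per-step latency stays flat only in the stateful case.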
Debug with Xcode and Instruments
Core AI has dedicated Xcode and Instruments support. Use the Core AI debug gauge to spot live activity, Instruments to see load, specialization, and inference latency, and the Core AI Debugger when converted numerics look wrong. The debugger can inspect intermediate tensor values and trace converted operations back to the Python source that produced them.
Keep a small numerics test in the Python pipeline too. Run the same representative input through the PyTorch model and the converted Core AI model, then fail the conversion if the maximum delta exceeds the tolerance your feature can accept.
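A minimal sketch of such a gate, in plain Python over flat lists of floats: the helper names `max_abs_delta` and `assert_conversion_ok` are invented here, the sample values are made up for illustration, and in a real pipeline the two lists would come from flattening the PyTorch output and the converted model's output for the same input.

```python
import math


def max_abs_delta(reference, candidate):
    """Largest element-wise absolute difference between two flat outputs."""
    if len(reference) != len(candidate):
        raise ValueError(
            f"length mismatch: {len(reference)} vs {len(candidate)}"
        )
    worst = 0.0
    for ref, cand in zip(reference, candidate):
        # Treat NaN on either side as an automatic failure.
        if math.isnan(ref) or math.isnan(cand):
            return math.inf
        worst = max(worst, abs(ref - cand))
    return worst


def assert_conversion_ok(reference, candidate, tolerance):
    """Raise AssertionError when the converted output drifts past tolerance."""
    delta = max_abs_delta(reference, candidate)
    if delta > tolerance:
        raise AssertionError(
            f"max delta {delta:.3g} exceeds tolerance {tolerance:.3g}"
        )
    return delta


# Illustrative values only; not measured from a real model.
pytorch_logits = [0.1200, -1.5000, 2.2500]
converted_logits = [0.1201, -1.4998, 2.2500]
delta = assert_conversion_ok(pytorch_logits, converted_logits, tolerance=1e-3)
print(f"max delta: {delta:.3g}")
```

The tolerance is a product decision, not a constant: a classifier that only needs the top label can accept more drift than a model whose raw outputs feed another computation.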
Custom Kernels Are the Escape Hatch
Most apps should start with exported PyTorch models, Core AI Models recipes, or direct Core AI Python authoring. Custom op lowering, composite op externalization, Metal kernels, and Metal tensors are for unsupported operations or measured hot paths where the default graph is leaving performance on the table.
In practice: keep tensor-level work contained in the model package. App code should usually call a small Swift wrapper rather than spread preprocessing, tokenization, mask extraction, and state layout details across view models.
Adoption Checklist
Sources
- Apple Developer: Meet Core AI
- Apple Developer: Integrate on-device AI models into your app using Core AI
- Apple Developer: What's new in iOS 27
- Apple: Core AI PyTorch Extensions
- Apple: Core AI Python
- Apple: Core AI Optimization
- Apple Developer Documentation: Compiling Core AI models ahead of time
- Apple Developer Documentation: Managing model specialization and caching