April 1, 2026

Running local models on Macs gets faster with Ollama's MLX support

Apple Silicon Macs get a performance boost thanks to better unified memory usage.

TL;DR

  • Ollama now supports Apple's open-source MLX machine-learning framework.
  • Improved caching performance and support for Nvidia's NVFP4 4-bit floating-point format make memory use more efficient (see the quantization sketch after this list).
  • These changes promise significantly better performance on Macs with Apple Silicon chips (M1 or later).
  • The update is available in preview (Ollama 0.19) and currently supports Alibaba's Qwen3.5 (the 35 billion-parameter variant); a usage sketch follows this list.
  • Requires an Apple Silicon Mac with at least 32GB of RAM.
  • Ollama now utilizes the Neural Accelerators in Apple’s M5-series GPUs for faster token processing.
  • Local models are becoming good enough for tasks previously requiring paid cloud subscriptions, with added privacy benefits.
  • Apple's MLX exploits the unified memory shared between the GPU and CPU, so both can work on the same data without copies (see the MLX sketch after this list).
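
To make the NVFP4 bullet concrete, here is a toy Python sketch of block-scaled 4-bit quantization in the spirit of that format: FP4 (E2M1) elements sharing one scale per 16-value block. The E2M1 magnitude grid is standard; the block size and scaling rule here are simplified assumptions, and this is an illustration of the idea, not Ollama's or Nvidia's implementation.

    import numpy as np

    # The eight magnitudes representable by FP4 E2M1 (plus a sign bit).
    FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
    BLOCK = 16  # NVFP4 shares one scale across each 16-element block

    def quantize_block(x):
        # Map the block's largest magnitude onto 6.0, the largest FP4 value.
        scale = float(np.abs(x).max()) / 6.0
        if scale == 0.0:
            scale = 1.0  # all-zero block; any scale works
        # Snap each scaled magnitude to the nearest grid point, keeping the sign.
        idx = np.abs(np.abs(x / scale)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
        return np.sign(x) * FP4_GRID[idx] * scale  # dequantized approximation

    weights = np.random.randn(BLOCK).astype(np.float32)
    print("max abs error:", np.abs(weights - quantize_block(weights)).max())

Sixteen 4-bit elements plus one 8-bit block scale come to 72 bits, versus 256 bits for the same block in FP16, which is where the memory savings come from.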
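
As for trying the preview, a minimal usage sketch with the official ollama Python client might look like this. The ollama.chat call is the client's real API; the model tag "qwen3.5" is assumed from the article's description and could differ in the actual release.

    import ollama  # official Python client; talks to a locally running Ollama server

    response = ollama.chat(
        model="qwen3.5",  # tag assumed from the article (the 35B variant)
        messages=[{"role": "user", "content": "Explain unified memory in one sentence."}],
    )
    print(response.message.content)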
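
Finally, MLX's unified-memory design is visible directly in its Python API: arrays live in memory that both the CPU and GPU can reach, and an operation is assigned to a device with a stream argument instead of by copying data. A minimal sketch using real mlx.core calls:

    import mlx.core as mx

    a = mx.random.normal((4096, 4096))
    b = mx.random.normal((4096, 4096))

    # The same buffers are visible to both devices; no host/device copies.
    c_gpu = mx.matmul(a, b, stream=mx.gpu)  # schedule the matmul on the GPU
    c_cpu = mx.matmul(a, b, stream=mx.cpu)  # same arrays, scheduled on the CPU

    mx.eval(c_gpu, c_cpu)  # MLX is lazy; this forces the computation

That data locality (no transfers between separate CPU and GPU memory pools, no duplicated weights) is the property the article credits for the speedup on Apple Silicon.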
