April 1, 2026
Running local models on Macs gets faster with Ollama's MLX support
Apple Silicon Macs get a performance boost thanks to better unified memory usage.

TL;DR
- Ollama now supports Apple's open-source MLX framework for machine learning.
- Improved caching performance and support for Nvidia's NVFP4 4-bit floating-point format enhance memory efficiency.
- These changes promise significantly better performance on Macs with Apple Silicon chips (M1 or later).
- The update is available in preview (Ollama 0.19) and currently supports Alibaba's Qwen3.5 (the 35-billion-parameter variant); see the usage sketch below.
- Requires an Apple Silicon Mac with at least 32GB of RAM.
- Ollama now uses the Neural Accelerators in Apple's M5-series GPUs for faster token processing.
- Local models are becoming good enough for tasks previously requiring paid cloud subscriptions, with added privacy benefits.
- Apple's MLX optimizes access to the unified memory shared between the GPU and CPU, as illustrated in the sketch below.
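
The unified-memory point is concrete in MLX's own API: arrays live in memory that both the CPU and GPU can address, so an operation can be dispatched to either device without copying data first. Here is a minimal sketch, assuming the `mlx` Python package (`pip install mlx`) on an Apple Silicon Mac; it illustrates the framework's memory model, not Ollama's internal integration.

```python
# Minimal MLX sketch: arrays live in unified memory, so the same arrays
# can feed operations on the GPU or the CPU with no explicit transfer.
import mlx.core as mx

a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

# Dispatch the same matmul to each device via the `stream` argument;
# no .to(device)-style copy is needed because both share one memory pool.
c_gpu = mx.matmul(a, b, stream=mx.gpu)
c_cpu = mx.matmul(a, b, stream=mx.cpu)

# MLX is lazy: the work actually runs when the results are evaluated.
mx.eval(c_gpu, c_cpu)
print(c_gpu.shape, c_cpu.shape)  # (4096, 4096) (4096, 4096)
```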
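To try the preview end to end, here is a hedged sketch using Ollama's official Python client (`pip install ollama`) against a locally running Ollama instance. The model tag `qwen3.5` is an assumption drawn from this article, so check `ollama list` for the exact name before running.

```python
# Hedged sketch: chatting with a local model through Ollama's Python
# client (pip install ollama), with the Ollama 0.19 preview running.
# The tag "qwen3.5" is an assumption based on this article; verify the
# exact model name with `ollama list` first.
import ollama

response = ollama.chat(
    model="qwen3.5",
    messages=[{"role": "user", "content": "Explain unified memory in one sentence."}],
)
print(response["message"]["content"])
```

Everything in this exchange runs on-device, which is where the privacy benefit mentioned above comes from.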