April 1, 2026

Running local models on Macs gets faster with Ollama's MLX support

Apple Silicon Macs get a performance boost thanks to better unified memory usage.

TL;DR

  • Ollama now supports Apple's open-source MLX machine-learning framework.
  • Improved caching performance and support for Nvidia's NVFP4 4-bit floating-point format make memory use more efficient (see the quantization sketch after this list).
  • These changes promise significantly better performance on Macs with Apple Silicon chips (M1 or later).
  • The update is available in preview (Ollama 0.19) and currently supports Alibaba's Qwen3.5 (the 35 billion-parameter variant); a usage sketch follows this list.
  • Requires an Apple Silicon Mac with at least 32GB of RAM.
  • Ollama now utilizes the Neural Accelerators in Apple’s M5-series GPUs for faster token processing.
  • Local models are becoming good enough for tasks previously requiring paid cloud subscriptions, with added privacy benefits.
  • Apple's MLX exploits the unified memory shared between the GPU and CPU, so both can work on the same data without copies (see the MLX sketch after this list).
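
To make the NVFP4 bullet concrete, here is a toy Python sketch of block-scaled 4-bit quantization in the spirit of that format: FP4 (E2M1) elements sharing one scale per 16-value block. The E2M1 magnitude grid is standard; the block size and scaling rule here are simplified assumptions, and this is an illustration of the idea, not Ollama's or Nvidia's implementation.

    import numpy as np

    # The eight magnitudes representable by FP4 E2M1 (plus a sign bit).
    FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
    BLOCK = 16  # NVFP4 shares one scale across each 16-element block

    def quantize_block(x):
        # Map the block's largest magnitude onto 6.0, the largest FP4 value.
        scale = float(np.abs(x).max()) / 6.0
        if scale == 0.0:
            scale = 1.0  # all-zero block; any scale works
        # Snap each scaled magnitude to the nearest grid point, keeping the sign.
        idx = np.abs(np.abs(x / scale)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
        return np.sign(x) * FP4_GRID[idx] * scale  # dequantized approximation

    weights = np.random.randn(BLOCK).astype(np.float32)
    print("max abs error:", np.abs(weights - quantize_block(weights)).max())

Sixteen 4-bit elements plus one 8-bit block scale come to 72 bits, versus 256 bits for the same block in FP16, which is where the memory savings come from.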
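
As for trying the preview, a minimal usage sketch with the official ollama Python client might look like this. The ollama.chat call is the client's real API; the model tag "qwen3.5" is assumed from the article's description and could differ in the actual release.

    import ollama  # official Python client; talks to a locally running Ollama server

    response = ollama.chat(
        model="qwen3.5",  # tag assumed from the article (the 35B variant)
        messages=[{"role": "user", "content": "Explain unified memory in one sentence."}],
    )
    print(response.message.content)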
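
Finally, MLX's unified-memory design is visible directly in its Python API: arrays live in memory that both the CPU and GPU can reach, and an operation is assigned to a device with a stream argument instead of by copying data. A minimal sketch using real mlx.core calls:

    import mlx.core as mx

    a = mx.random.normal((4096, 4096))
    b = mx.random.normal((4096, 4096))

    # The same buffers are visible to both devices; no host/device copies.
    c_gpu = mx.matmul(a, b, stream=mx.gpu)  # schedule the matmul on the GPU
    c_cpu = mx.matmul(a, b, stream=mx.cpu)  # same arrays, scheduled on the CPU

    mx.eval(c_gpu, c_cpu)  # MLX is lazy; this forces the computation

That data locality (no transfers between separate CPU and GPU memory pools, no duplicated weights) is the property the article credits for the speedup on Apple Silicon.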
