# MLX
MLX is Apple’s native machine-learning framework for Apple Silicon. On M-series Macs it consistently outperforms GGUF-based runtimes.
> **Tip:** If you’re on an M5 Mac and Ollama crashes with “llama runner process has terminated”, MLX bypasses the bug entirely.
## Setup
1. Run `openagent --setup`.
2. Choose **Local → MLX**.
3. OpenAgent runs `pip3 install --user mlx-lm` and starts a server on `localhost:8080`.
Requires Python 3 and Apple Silicon (M1+).
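Both requirements can be checked from Python’s standard library. A minimal sketch (`mlx_supported` is a hypothetical helper for illustration, not part of OpenAgent):

```python
import platform
import sys

def mlx_supported():
    """Return True when running Python 3 on an Apple Silicon Mac.

    Hypothetical helper mirroring the stated requirements:
    Python 3 and Apple Silicon (M1 or later reports "arm64").
    """
    return (
        sys.version_info.major >= 3
        and sys.platform == "darwin"
        and platform.machine() == "arm64"
    )

if __name__ == "__main__":
    print("MLX supported:", mlx_supported())
```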
## Curated models
| Model | Size | RAM |
|---|---|---|
| `mlx-community/gemma-4-e2b-it-4bit` | 1.5 GB | 5 GB |
| `mlx-community/gemma-4-e4b-it-4bit` (default) | 3.0 GB | 5 GB |
| `mlx-community/gemma-4-26b-a4b-it-4bit` | 15 GB | 18 GB |
| `mlx-community/gemma-4-31b-it-4bit` | 18 GB | 20 GB |
| `mlx-community/Llama-3.2-3B-Instruct-4bit` | 1.8 GB | 8 GB |
| `mlx-community/Meta-Llama-3.1-8B-Instruct-4bit` | 4.5 GB | 16 GB |
| `mlx-community/Qwen2.5-Coder-7B-Instruct-4bit` | 4.3 GB | 16 GB |
| `mlx-community/Qwen2.5-Coder-32B-Instruct-4bit` | 18 GB | 32 GB |
| `mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit` | 9.0 GB | 24 GB |
## Custom models
Any model from the `mlx-community` Hugging Face org works. Paste the full repo path into the **Custom** option.
## How it runs
OpenAgent starts the MLX server in the background:
```shell
nohup python3 -m mlx_lm.server --model <id> --port 8080 > /tmp/openagent-mlx.log 2>&1 &
```
The server is OpenAI-compatible on `/v1/chat/completions`. Logs land in `/tmp/openagent-mlx.log`.
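Because the endpoint is OpenAI-compatible, any standard chat-completions client can talk to it. A stdlib-only sketch, assuming the server is running on `localhost:8080` (the `chat` and `build_chat_request` helpers and the `"default"` model value are illustrative, not part of OpenAgent):

```python
import json
import urllib.request

def build_chat_request(prompt, model="default"):
    # OpenAI-style chat-completions payload. Some local servers ignore the
    # "model" field and serve whatever model they were started with.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base_url="http://localhost:8080"):
    # POST the payload to the OpenAI-compatible endpoint and pull the
    # assistant's reply out of the first choice.
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```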
## Limitations
- **Tool use is best-effort.** MLX’s function-calling support varies by model; for agentic coding work, prefer Ollama or LM Studio if you can.
- **One model at a time.** Switching models requires restarting the server. The picker handles this automatically.
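Because a model switch restarts the server, scripts that drive it directly may want to wait for the port to come back before sending requests. A small stdlib sketch (`wait_for_server` is a hypothetical helper, not part of OpenAgent):

```python
import socket
import time

def wait_for_server(host="localhost", port=8080, timeout=30.0):
    # Poll until the server accepts TCP connections, or give up after
    # `timeout` seconds. Useful right after a model switch restarts it.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```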