
Beta Models (Engine B)

Last updated April 17, 2026

What Are Beta Models?

Beta models are the newest additions to TuneSalon. They run on a separate training engine (Engine B) that supports architectures our standard engine cannot handle: multimodal vision-language models and Mixture-of-Experts (MoE) models. The Beta label is honest: these models have been tested but the underlying libraries are moving fast, so expect occasional rough edges compared to the Standard lineup.

Why a Separate Engine?

Standard models (Engine A) use the classic HuggingFace PEFT stack with fp16 LoRA. It is battle-tested and works great for text-only dense models like Qwen3 and Mistral.

Beta models (Engine B) use Unsloth, which handles the quirks of modern architectures: MoE expert routing, fused 3D tensors, bf16 precision, and multimodal token handling. Keeping it as a separate engine means Beta improvements never destabilize the Standard path.
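The fp16-vs-bf16 distinction matters for training stability: fp16 has a small dynamic range (max ~65504), while bf16 keeps float32's full 8-bit exponent at reduced mantissa precision, so large activations and gradients don't overflow. A minimal stdlib-only sketch of the difference (bf16 is simulated here by truncating a float32's low 16 bits; real hardware rounds to nearest, and this is not Engine B's actual code):

```python
import struct

def to_bf16(x: float) -> float:
    """Simulate bfloat16: keep float32's sign, 8-bit exponent, and top
    7 mantissa bits by zeroing the low 16 bits of the 32-bit word.
    (Real bf16 rounds to nearest; truncation keeps the sketch simple.)"""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF_0000))[0]

activation = 70000.0  # larger than fp16's maximum of 65504

try:
    struct.pack("<e", activation)  # "<e" is IEEE fp16
    fp16_ok = True
except OverflowError:
    fp16_ok = False

print(fp16_ok)               # fp16 cannot represent this value at all
print(to_bf16(activation))   # bf16 loses precision but keeps the magnitude
```

The takeaway: bf16 trades mantissa bits for exponent range, which is usually the right trade during training of large models.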

The 4 Beta Models

Qwen3.5-27B (A100)

Dense · 27B parameters

Qwen's latest dense model. Strong reasoning and instruction following. A solid step up from Mistral-Small-24B if you want to stay on A100.

Gemma-4-26B-A4B (A100)

MoE · 26B total, 4B active

Google's Gemma 4 MoE. Runs 26B total parameters but only activates 4B per token, so inference is fast while quality stays high.

Gemma-4-31B (H200)

Dense · 31B parameters

Google's Gemma 4 dense flagship. Top-tier instruction following. Requires H200 for the VRAM headroom.

Qwen3.5-35B-A3B (H200)

MoE · 35B total, 3B active

The flagship. 35B total parameters with only 3B active per token, so it chats fast on H200. Best for demanding tasks where quality matters most.

All four are Apache 2.0 licensed and fully usable for commercial applications.
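To make the total-vs-active distinction concrete, here is a back-of-envelope comparison using the numbers from the list above (per-token compute for an MoE roughly tracks active parameters, not total; the exact speedup also depends on memory bandwidth and routing overhead):

```python
# (total_params_B, active_params_B), per the model list above
models = {
    "Qwen3.5-27B":     (27, 27),  # dense: every parameter is active
    "Gemma-4-26B-A4B": (26, 4),   # MoE
    "Gemma-4-31B":     (31, 31),  # dense
    "Qwen3.5-35B-A3B": (35, 3),   # MoE
}

active_fraction = {name: active / total
                   for name, (total, active) in models.items()}

for name, (total, active) in models.items():
    print(f"{name}: {active}B of {total}B active per token "
          f"({active_fraction[name]:.0%})")
```

The flagship activates under 10% of its parameters on any given token, which is why it can feel faster in chat than the smaller dense models.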

When to Pick a Beta Model

  • You want faster inference with high quality: pick an MoE model (Gemma-4-26B-A4B or Qwen3.5-35B-A3B). Only a fraction of parameters activate per token, so responses are quicker than their total size suggests.
  • You want the best dense model: Gemma-4-31B on H200 gives top-tier quality. Qwen3.5-27B on A100 is the sweet spot if you do not want to pay H200 rates.
  • You have tried Standard and want more capability: any Beta model is a meaningful jump from the Standard 24B tier.
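The "only a fraction activates" point comes from how MoE routing works: a small gating network scores every expert for each token, and only the top-k experts actually run. A toy, framework-free sketch (illustrative only; real routers use learned gates over hidden states, and the expert counts here are made up):

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8  # hypothetical experts in one MoE layer
TOP_K = 2        # experts that actually run per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits):
    """Pick the top-k experts for one token and renormalize their weights,
    so the chosen experts' outputs can be mixed into a single result."""
    probs = softmax(gate_logits)
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

# One "token": random gate logits stand in for a learned gating network.
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route(logits)
print(chosen)  # only TOP_K of the NUM_EXPERTS experts receive any weight
```

Since only the selected experts' weights are touched for a given token, per-token compute scales with k rather than with the total expert count.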

Trade-offs vs Standard

Standard (Engine A) vs Beta (Engine B):

  • Stability: Standard is very stable; Beta is stable with occasional quirks.
  • Architectures: Standard supports text-only dense models; Beta adds multimodal and MoE.
  • Training precision: fp16 on Standard; bf16 (via Unsloth) on Beta.
  • GGUF export cost: 5-50 credits (CPU) on Standard; 200-500 credits (GPU required) on Beta.
  • Adapter file size: small to medium on Standard; larger on Beta (MoE has many experts).
  • Chat on the site: supported on both.

GGUF export is the biggest cost difference. Standard models can be converted to GGUF on a cheap CPU container, but Beta models need a GPU (Unsloth requires it), which makes the export step noticeably more expensive.

Getting Started

Beta models show up in the Train tab alongside Standard models. Switch to the Beta sub-tab to see them. The training flow is identical: pick a model, upload your dataset, press Train.

Chat works the same way too. Load a Beta model in the Chat tab, apply your adapter, and talk to it. Multi-adapter loading, chat history, and GGUF export all work.