MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs

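1. Background

MoE LLMs (e.g., Qwen3-235B, DeepSeek-V3):

Sparse at inference time (only a few experts are activated)
But the total parameter count is very large → high deployment cost

Existing methods:

Expert pruning → noticeable performance loss
Low-rank decomposition → insufficient expressive capacity

Core problem:
MoE experts are redundant with one another, but the redundancy is not the kind that can simply be deleted.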

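2. Problem

Goals:

Reduce the parameter count of MoE models
Keep performance stable

Challenges:

Experts are highly specialized → they cannot simply be removed
Low-rank methods cannot capture the complex structure
Two things need to be modeled:
- What experts have in common
- How experts differ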

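3. Contributions

(1) Proposes Mixture-of-Basis-Experts

Experts are no longer stored independently
They are represented through shared bases

(2) Explicitly models expert redundancy

Each expert is decomposed into:
- A shared component (what experts have in common)
- An expert-specific component (how experts differ)

(3) High compression with low loss

Parameters ↓ 24%–30%
Accuracy ↓ 1%–2%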

4. Method

4.1 Core Idea

For each expert weight matrix:

W \approx A \cdot B

where:

A: expert-specific (small)
B: shared basis (large)

4.2 Mixture-of-Basis

Going one step further:

B = \sum_{i} \alpha_{i} B_{i}

$B_{i}$: shared basis matrices (shared within each layer)
$\alpha_{i}$: expert-specific coefficients

Each expert:

No longer stores a full matrix
Stores only:
- A small matrix A
- A set of coefficients α
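
The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `reconstruct_expert` and `param_counts`, the shapes (A is p×r, each basis is r×d), and the toy sizes are all assumptions made here for the example.

```python
import numpy as np

def reconstruct_expert(A, alpha, bases):
    """Rebuild one expert weight as W ≈ A @ (sum_i alpha[i] * bases[i])."""
    # bases: (m, r, d), shared by all experts in a layer (assumed shape)
    # alpha: (m,), coefficients owned by this expert
    B = np.tensordot(alpha, bases, axes=1)   # weighted sum of bases, (r, d)
    return A @ B                             # (p, d)

def param_counts(n_experts, p, d, r, m):
    """Parameters for one projection type in one layer: dense vs. factorized."""
    dense = n_experts * p * d
    # per expert: A (p*r) plus m coefficients; per layer: m shared bases (r*d)
    factorized = n_experts * (p * r + m) + m * r * d
    return dense, factorized

rng = np.random.default_rng(0)
n_experts, p, d, r, m = 8, 16, 32, 16, 2     # toy sizes, hypothetical
bases = rng.standard_normal((m, r, d))
A = rng.standard_normal((n_experts, p, r))
alpha = rng.standard_normal((n_experts, m))

W0 = reconstruct_expert(A[0], alpha[0], bases)
dense, factorized = param_counts(n_experts, p, d, r, m)
print(W0.shape, dense, factorized)
```

The saving comes from paying for the m bases once per layer instead of once per expert, so it grows with the number of experts. The toy sizes here only illustrate the bookkeeping and are not the configurations reported in the paper.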

4.3 Design Choices

(1) Compress only the up/gate matrices

The down projection is left unchanged

(2) Introduce a nonlinearity

SiLU / GELU / Tanh outperform ReLU

(3) Z-score normalization

Stabilizes basis learning
Adds no inference cost
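
The three design choices can be put together in one hedged sketch. Several things here are assumptions, since the notes do not spell them out: the nonlinearity is applied to the mixed basis before multiplying by A, the expert is a SwiGLU-style FFN, the Z-score statistics are a single scalar scale `sigma` with a mean small enough to drop, and all names and sizes are hypothetical.

```python
import numpy as np

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def mixed_basis(alpha, bases, act=silu):
    """Mix shared bases with expert coefficients, then apply a nonlinearity."""
    return act(np.tensordot(alpha, bases, axes=1))   # (r, d)

def fold_std(A, sigma):
    """Absorb the Z-score scale into A: sigma * (A @ B) == (sigma * A) @ B."""
    return sigma * A

def expert_forward(x, W_gate, W_up, W_down):
    """SwiGLU-style expert FFN; only W_gate and W_up come from the factorization."""
    return (silu(x @ W_gate.T) * (x @ W_up.T)) @ W_down.T

rng = np.random.default_rng(0)
p, d, r, m, sigma = 16, 32, 16, 2, 0.02       # toy, hypothetical values
bases_gate = rng.standard_normal((m, r, d))
bases_up = rng.standard_normal((m, r, d))
A_gate, A_up = rng.standard_normal((2, p, r))
alpha_gate, alpha_up = rng.standard_normal((2, m))
W_down = rng.standard_normal((d, p))          # kept dense, not factorized

B_gate = mixed_basis(alpha_gate, bases_gate)
B_up = mixed_basis(alpha_up, bases_up)
W_gate = fold_std(A_gate, sigma) @ B_gate     # scale folded in once, offline
W_up = fold_std(A_up, sigma) @ B_up

x = rng.standard_normal((4, d))
y = expert_forward(x, W_gate, W_up, W_down)
print(y.shape)
```

Because the scale is multiplied into A ahead of time, the forward pass does no extra work, which is one way to read the claim that normalization adds no inference cost.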

4.4 Advantages

Compared with:

MoLAE
D²-MoE

MoBE offers:

Lower reconstruction error
Greater expressive capacity
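
The notes do not say how reconstruction error is measured. One common choice, assumed here, is the relative Frobenius error between the original and the reconstructed weight. The sketch below uses truncated SVD purely as a stand-in low-rank baseline; it is not MoLAE or D²-MoE, and the function names are hypothetical.

```python
import numpy as np

def relative_error(W, W_hat):
    """Relative Frobenius error ||W - W_hat||_F / ||W||_F."""
    return np.linalg.norm(W - W_hat) / np.linalg.norm(W)

def truncated_svd(W, rank):
    """Best rank-`rank` approximation of W in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))             # toy weight, hypothetical
err_low = relative_error(W, truncated_svd(W, 4))
err_high = relative_error(W, truncated_svd(W, 12))
err_full = relative_error(W, truncated_svd(W, 16))
print(err_low, err_high, err_full)
```

Any factorization, including a basis-sharing one, can be scored the same way by passing its reconstructed matrix as `W_hat`.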

5. Experiments

Models

Qwen3-30B-A3B
Qwen3-235B-A22B
DeepSeek-V3
Kimi-K2

Main Results

(1) Compression

Parameters ↓ 24%–30%

(2) Performance

Average drop ≈ 1.4%

Comparison

Method	Compression rate	Performance
MoBE	High	Best
MoLAE	Medium	Worse
D²-MoE	Medium	Worst

Key Findings

Experts share a large amount of structure
Direct pruning is less effective than sharing

6. Key Insights

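Insight 1: Expert redundancy ≠ deletable

Experts have shared structure
Factorization should be considered

Insight 2: Vision expert sharing is possible

Compress vision-related experts specifically
Keep text experts unchanged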

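Insight 3: Pruning is not the only direction

Three routes are available:

Token pruning (FastMMoE)
Expert pruning (traditional)
Basis sharing (MoBE)

Insight 4: MoE compression can be combined with VLMs

Vision modules are usually more redundant
MoBE can be applied to them first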

7. One-Sentence Summary

The core idea of MoBE is:

The redundancy in MoE lies not in "superfluous experts" but in "the representation space shared among experts."
