Gradient Compression

This guide explains the built-in gradient compression options for the model segment exchange. Use these to reduce upload/download bandwidth per round.

Learn by doing?

See an example of MNIST running with gradient compression here!

Modes

  • none (default): no compression.

  • quantize8: per-tensor symmetric int8 quantization with a per-tensor scale; dequantized on receipt by the client/server (see the sketch after this list).

  • topk: sparsify to the top-K% largest-magnitude elements (keeps indices + values).

  • quantize8+topk or topk+quantize8: combines int8 quantization with Top-K sparsity.

  • auto: uses both int8 quantization and Top-K sparsity (if available).
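
To make these modes concrete, here is a minimal sketch of what quantize8 and topk conceptually do to a single tensor. It assumes only the descriptions above: the function names, field names, and plain-array representation are illustrative and are not the library's actual API or wire format.

// Sketch only: conceptual versions of quantize8 and topk on a plain array.

// quantize8: symmetric int8 quantization with one scale per tensor.
function quantize8(values) {
  const maxAbs = Math.max(...values.map(Math.abs), 1e-12);
  const scale = maxAbs / 127;                        // one scale for the whole tensor
  const q = Int8Array.from(values, v => Math.round(v / scale));
  return { scale, q };                               // what would be transmitted
}

function dequantize8({ scale, q }) {
  return Array.from(q, v => v * scale);              // applied on receipt
}

// topk: keep only the K% largest-magnitude entries (indices + values).
function topK(values, keepPercent) {
  const k = Math.max(1, Math.round(values.length * keepPercent / 100));
  const indices = values
    .map((_, i) => i)
    .sort((a, b) => Math.abs(values[b]) - Math.abs(values[a]))
    .slice(0, k);
  return { indices, values: indices.map(i => values[i]) };
}

const weights = [0.9, -0.02, 0.5, 0.001, -0.7, 0.3];
console.log(dequantize8(quantize8(weights))); // ≈ original, with small rounding error
console.log(topK(weights, 50));               // the 3 largest-magnitude entries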

How to set it (workloads)

In your training session configuration (supervised/unsupervised/RL), set the compression mode string after linking the session. For example:

session.js
const trainingSession = new SupervisedTrainingSession(
    ...
);
linkTrainingSession(trainingSession);

trainingSession.setCompressionMode('quantize8+topk'); // or 'none', 'topk', 'quantize8', 'auto'
trainingSession.setMetricsFunction(...);

If you don’t set it, none is used.
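
Because the mode is just a string passed to setCompressionMode, one convenient pattern is to drive it from configuration so you can compare bandwidth and accuracy without code changes. The environment variable name below is only an example:

// Hypothetical pattern: read the mode from an environment variable,
// falling back to the default of 'none' when it is not set.
const compressionMode = process.env.COMPRESSION_MODE || 'none';
trainingSession.setCompressionMode(compressionMode);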

What gets compressed

  • Model weight segments sent from the server to clients, and client updates sent back (per round).

  • Biases follow the same compression path as weights.

Caveats

  • Quantization is per-tensor; very small-magnitude tensors may lose fidelity. Compare accuracy against a run of your workflow without compression to decide whether this loss is acceptable; a quick offline check is sketched below.
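
If you want a rough feel for this error before committing to a mode, you can round-trip a layer's weights through the same per-tensor int8 scheme sketched earlier. The helper below is illustrative only and does not call the library; the sample values are placeholders.

// Rough offline fidelity check: quantize, dequantize, and report the
// worst-case absolute error for one tensor's values.
function roundTripError(values) {
  const scale = Math.max(...values.map(Math.abs), 1e-12) / 127;
  const restored = values.map(v => Math.round(v / scale) * scale);
  return Math.max(...values.map((v, i) => Math.abs(v - restored[i])));
}

const layerWeights = [0.012, -0.003, 0.0005, 0.02, -0.015];
console.log(roundTripError(layerWeights)); // bounded by scale / 2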

Recommendations

  • Start with quantize8 for bandwidth savings with minimal accuracy impact.

  • Leave the mode at none for very small models or when debugging convergence.

  • Use topk for more aggressive bandwidth reduction, at the cost of some accuracy.
