Gradient Compression
This guide explains the built-in gradient compression options for the model segment exchange. Use these to reduce upload/download bandwidth per round.
Learn by doing?
See an example of MNIST running with gradient compression here!
Modes
- none (default): no compression.
- quantize8: per-tensor symmetric int8 quantization with a per-tensor scale; dequantized on receipt by the client/server.
- topk: sparsify to the top-K% largest-magnitude elements (keeps indices and values).
- quantize8+topk or topk+quantize8: combine int8 quantization with Top-K sparsity.
- auto: uses both int8 and Top-K (if available).
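For intuition, here is a minimal sketch of what per-tensor symmetric int8 quantization does. It is illustrative only, not the library's internal implementation; the type and function names below are assumptions made for the example.

// Illustrative only: per-tensor symmetric int8 quantization, as the quantize8
// mode is described above. Names are made up for this sketch.
interface QuantizedTensor {
  scale: number;   // per-tensor scale factor
  data: Int8Array; // quantized values in [-127, 127]
}

function quantizeInt8(values: Float32Array): QuantizedTensor {
  // Symmetric quantization: one scale derived from the largest absolute value.
  let maxAbs = 0;
  for (const v of values) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs / 127 || 1; // guard against an all-zero tensor
  const data = new Int8Array(values.length);
  for (let i = 0; i < values.length; i++) {
    data[i] = Math.max(-127, Math.min(127, Math.round(values[i] / scale)));
  }
  return { scale, data };
}

function dequantizeInt8(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.data.length);
  for (let i = 0; i < q.data.length; i++) {
    out[i] = q.data[i] * q.scale;
  }
  return out;
}

On the wire, each quantized tensor costs one byte per element plus a single float for the scale, roughly a 4x reduction versus float32.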
How to set it (workloads)
In your training session configuration (supervised/unsupervised/RL), set the compression mode string after linking the session. For example:
const trainingSession = new SupervisedTrainingSession(
...
);
linkTrainingSession(trainingSession);
trainingSession.setCompressionMode('quantize8+topk'); // or 'none', 'topk', 'quantize8', 'auto'
trainingSession.setMetricsFunction(...);

If you don’t set it, none is used.
What gets compressed
Model weight segments sent from the server to clients, and client updates sent back to the server (per round).
Biases follow the same compression path as weights.
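When topk is enabled, a compressed segment conceptually carries only the indices and values of the kept elements; dropped elements are typically treated as zero when the dense tensor is rebuilt. Here is a minimal sketch of that idea (illustrative only; the structure and names below are assumptions, not the library's actual wire format).

// Illustrative only: Top-K sparsification keeping the K% largest-magnitude
// elements as (index, value) pairs. Names are made up for this sketch.
interface SparseUpdate {
  length: number;       // original tensor length, needed to rebuild the dense form
  indices: Uint32Array; // positions of the kept elements
  values: Float32Array; // kept values
}

function topK(values: Float32Array, keepFraction: number): SparseUpdate {
  const k = Math.max(1, Math.floor(values.length * keepFraction));
  // Rank positions by magnitude, keep the k largest, store indices in order.
  const kept = Array.from(values.keys())
    .sort((a, b) => Math.abs(values[b]) - Math.abs(values[a]))
    .slice(0, k)
    .sort((a, b) => a - b);
  return {
    length: values.length,
    indices: Uint32Array.from(kept),
    values: Float32Array.from(kept, (i) => values[i]),
  };
}

function densify(u: SparseUpdate): Float32Array {
  const out = new Float32Array(u.length); // dropped elements come back as zero
  u.indices.forEach((idx, j) => { out[idx] = u.values[j]; });
  return out;
}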
Caveats
Quantization is per-tensor; very small-magnitude tensors may lose fidelity. Compare accuracy against a run without compression to decide whether the loss is acceptable.
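As a hypothetical illustration of that fidelity loss (made-up numbers, not library output): with symmetric int8 quantization the per-tensor scale is maxAbs / 127, so any element smaller than roughly half that scale rounds to zero.

// One outlier stretches the scale; tiny values in the same tensor vanish.
const scale = 10.0 / 127;                       // maxAbs = 10.0 -> scale ≈ 0.0787
const small = Math.round(0.02 / scale) * scale; // 0.02 quantizes to 0
const large = Math.round(10.0 / scale) * scale; // 10.0 survives (≈ 10.0)
console.log(small, large);                      // 0 10 (approximately)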
Recommendations
- Start with quantize8 for bandwidth savings with minimal accuracy impact.
- Leave none for very small models or when debugging convergence.
- Use topk for more aggressive bandwidth reduction, at the cost of accuracy.