Unsupervised Learning with FedDDL

This guide shows you how to run an unsupervised (or self-supervised) workload on FedDDL. You supply the model, data arrays, and training hyperparameters; the platform handles client partitioning, rounds, and aggregation.

Learn by doing?

See the full example of training an autoencoder on the MNIST dataset.

What you need to provide

  • Model architecture: a compiled tf.Sequential (or similar) model. Optimizers are supported, but custom losses/metrics are not; use built-in TensorFlow.js losses and metrics such as meanSquaredError or cosineProximity.

  • Flattened dataset: input array sized to TOTAL_SAMPLES * INPUT_SIZE. For autoencoders, the target is usually the same as the input.

  • Test/validation split (optional): testInput (and testOutput if applicable) for evaluation.

  • Training config: TOTAL_SAMPLES, INPUT_SIZE, OUTPUT_SIZE (often equals INPUT_SIZE), TOTAL_ROUNDS, BATCH_SIZE, EPOCHS_PER_ROUND, MIN_CLIENTS_TO_START.
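The flattened-dataset requirement above can be sketched as follows. This is an illustrative example (the image data and normalization are placeholders, not part of the FedDDL API): 28x28 images are flattened into the single typed array the platform expects, and for an autoencoder the target is simply the input itself.

```javascript
// Sketch (illustrative names and data): flatten TOTAL_SAMPLES images of
// 28x28 pixels into one contiguous Float32Array of length
// TOTAL_SAMPLES * INPUT_SIZE.
const TOTAL_SAMPLES = 3;       // tiny demo; use your real sample count
const INPUT_SIZE = 28 * 28;    // 784 for MNIST-style images

// images: TOTAL_SAMPLES x 28 x 28 nested arrays (placeholder zeros here)
const images = Array.from({ length: TOTAL_SAMPLES }, () =>
  Array.from({ length: 28 }, () => new Array(28).fill(0))
);

const input = new Float32Array(TOTAL_SAMPLES * INPUT_SIZE);
images.forEach((img, i) => {
  img.flat().forEach((px, j) => {
    input[i * INPUT_SIZE + j] = px / 255; // normalize pixels to [0, 1]
  });
});

// For an autoencoder, the target is the input itself.
const output = input;
```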

Core class: UnsupervisedTrainingSession

Create one session per workload. It mirrors the supervised flow but uses the input as its own target.

Constructor signature:

new UnsupervisedTrainingSession(
	FINAL_OUTPUT_UNITS,   // matches last layer units
	TOTAL_SAMPLES,        // number of training examples
	INPUT_SIZE,           // flattened input length
	OUTPUT_SIZE,          // flattened target length (often same as input)
	TOTAL_ROUNDS,         // federated rounds
	BATCH_SIZE,           // per-client batch size
	EPOCHS_PER_ROUND,     // local epochs per round
	MIN_CLIENTS_TO_START, // quorum to begin training
	input, output,        // flattened training data (output optional if same as input)
	model,                // compiled tf model
	testInput, testOutput // optional eval data
)

Example workload (autoencoder-style)
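A minimal sketch of wiring everything together, using the constructor signature above. The hyperparameter values are illustrative, and the model-definition lines (standard TensorFlow.js calls) are shown as comments so the sketch stands alone; the session construction is guarded so it only runs where FedDDL has loaded UnsupervisedTrainingSession.

```javascript
// Illustrative autoencoder workload (values are placeholders).
const TOTAL_SAMPLES = 100;
const INPUT_SIZE = 784;
const OUTPUT_SIZE = INPUT_SIZE;       // autoencoder: reconstruct the input
const FINAL_OUTPUT_UNITS = OUTPUT_SIZE; // matches the last layer's units
const TOTAL_ROUNDS = 10;
const BATCH_SIZE = 32;
const EPOCHS_PER_ROUND = 1;
const MIN_CLIENTS_TO_START = 1;       // single-client demo

// Flattened training data: length must be TOTAL_SAMPLES * INPUT_SIZE.
const input = new Float32Array(TOTAL_SAMPLES * INPUT_SIZE);
const output = input;                 // target = input

// Model definition (requires @tensorflow/tfjs to be loaded):
// const model = tf.sequential();
// model.add(tf.layers.dense({ inputShape: [INPUT_SIZE], units: 64, activation: 'relu' }));
// model.add(tf.layers.dense({ units: FINAL_OUTPUT_UNITS, activation: 'sigmoid' }));
// model.compile({ optimizer: 'adam', loss: 'meanSquaredError' });

// Session construction (only runs inside the FedDDL environment):
if (typeof UnsupervisedTrainingSession !== 'undefined' &&
    typeof model !== 'undefined') {
  const session = new UnsupervisedTrainingSession(
    FINAL_OUTPUT_UNITS, TOTAL_SAMPLES, INPUT_SIZE, OUTPUT_SIZE,
    TOTAL_ROUNDS, BATCH_SIZE, EPOCHS_PER_ROUND, MIN_CLIENTS_TO_START,
    input, output, model
  );
}
```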

Data shape and partitioning

  • Data is partitioned per client by slicing the flat arrays; ensure arrays are contiguous and sized correctly.

  • input length must equal TOTAL_SAMPLES * INPUT_SIZE.
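The slicing idea above can be sketched in plain JavaScript. This is an illustration of the partitioning concept, not FedDDL's actual implementation: each client receives a contiguous run of complete samples from the flat array.

```javascript
// Illustrative per-client partitioning of a flat array (tiny numbers).
const TOTAL_SAMPLES = 6;
const INPUT_SIZE = 4;
const NUM_CLIENTS = 3;

// Flat training data, filled with its own indices for clarity.
const input = Float32Array.from(
  { length: TOTAL_SAMPLES * INPUT_SIZE }, (_, i) => i
);

const samplesPerClient = Math.floor(TOTAL_SAMPLES / NUM_CLIENTS);
const partitions = [];
for (let c = 0; c < NUM_CLIENTS; c++) {
  // Slice on sample boundaries so every client gets whole samples.
  const start = c * samplesPerClient * INPUT_SIZE;
  const end = start + samplesPerClient * INPUT_SIZE;
  partitions.push(input.subarray(start, end)); // view, no copy
}
```

Slicing on multiples of INPUT_SIZE is what makes the "contiguous and sized correctly" requirement matter: a flat array whose length is not TOTAL_SAMPLES * INPUT_SIZE would split a sample across two clients.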

Constraints and tips

  • Avoid custom losses/metrics; use built-in tf losses/metrics.

  • Metrics callback is optional and runs on the server; keep it lightweight.

  • Choose BATCH_SIZE and EPOCHS_PER_ROUND based on client capability; larger values increase on-device compute.

  • MIN_CLIENTS_TO_START gates the first round; set to 1 for single-client demos.
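A quick back-of-envelope for the BATCH_SIZE / EPOCHS_PER_ROUND tip above: the number of gradient steps a client performs per round, assuming samples are split evenly across clients (all numbers illustrative).

```javascript
// Illustrative estimate of per-client compute per round.
const TOTAL_SAMPLES = 60000;
const NUM_CLIENTS = 10;
const BATCH_SIZE = 32;
const EPOCHS_PER_ROUND = 2;

const samplesPerClient = Math.floor(TOTAL_SAMPLES / NUM_CLIENTS); // 6000
const batchesPerEpoch = Math.ceil(samplesPerClient / BATCH_SIZE); // 188
const stepsPerRound = batchesPerEpoch * EPOCHS_PER_ROUND;         // 376
```

Doubling EPOCHS_PER_ROUND or halving BATCH_SIZE roughly doubles on-device work per round, which is the trade-off to weigh against client capability.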

What runs where

  • Model training and metrics run on clients.

  • The server hosts the session, partitions data, and aggregates weight segments.
