Unsupervised Learning with FedDDL

This guide shows you how to run an unsupervised (or self-supervised) workload on FedDDL. You supply the model, data arrays, and training hyperparameters; the platform handles client partitioning, rounds, and aggregation.

Learn by doing?

See the full example of training an autoencoder on the MNIST dataset.

What you need to provide

  • Model architecture: a compiled tf.Sequential (or similar) model. Optimizers are supported, but custom losses/metrics are not; use built-in TensorFlow.js losses and metrics such as meanSquaredError or cosineProximity.

  • Flattened dataset: input array sized to TOTAL_SAMPLES * INPUT_SIZE. For autoencoders, the target is usually the same as the input.

  • Test/validation split (optional): testInput (and testOutput if applicable) for evaluation.

  • Training config: TOTAL_SAMPLES, INPUT_SIZE, OUTPUT_SIZE (often equals INPUT_SIZE), TOTAL_ROUNDS, BATCH_SIZE, EPOCHS_PER_ROUND, MIN_CLIENTS_TO_START.
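The flattened-dataset requirement above can be sketched as follows. This is an illustrative example (the image data and normalization are placeholders, not part of the FedDDL API): 28x28 images are flattened into the single typed array the platform expects, and for an autoencoder the target is simply the input itself.

```javascript
// Sketch (illustrative names and data): flatten TOTAL_SAMPLES images of
// 28x28 pixels into one contiguous Float32Array of length
// TOTAL_SAMPLES * INPUT_SIZE.
const TOTAL_SAMPLES = 3;       // tiny demo; use your real sample count
const INPUT_SIZE = 28 * 28;    // 784 for MNIST-style images

// images: TOTAL_SAMPLES x 28 x 28 nested arrays (placeholder zeros here)
const images = Array.from({ length: TOTAL_SAMPLES }, () =>
  Array.from({ length: 28 }, () => new Array(28).fill(0))
);

const input = new Float32Array(TOTAL_SAMPLES * INPUT_SIZE);
images.forEach((img, i) => {
  img.flat().forEach((px, j) => {
    input[i * INPUT_SIZE + j] = px / 255; // normalize pixels to [0, 1]
  });
});

// For an autoencoder, the target is the input itself.
const output = input;
```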

Core class: UnsupervisedTrainingSession

Create one session per workload. It mirrors the supervised flow but uses the input as its own target.

Constructor signature:

new UnsupervisedTrainingSession(
	FINAL_OUTPUT_UNITS,   // matches last layer units
	TOTAL_SAMPLES,        // number of training examples
	INPUT_SIZE,           // flattened input length
	OUTPUT_SIZE,          // flattened target length (often same as input)
	TOTAL_ROUNDS,         // federated rounds
	BATCH_SIZE,           // per-client batch size
	EPOCHS_PER_ROUND,     // local epochs per round
	MIN_CLIENTS_TO_START, // quorum to begin training
	input, output,        // flattened training data (output optional if same as input)
	model,                // compiled tf model
	testInput, testOutput // optional eval data
)

Example workload (autoencoder-style)
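A minimal sketch of wiring everything together, using the constructor signature above. The hyperparameter values are illustrative, and the model-definition lines (standard TensorFlow.js calls) are shown as comments so the sketch stands alone; the session construction is guarded so it only runs where FedDDL has loaded UnsupervisedTrainingSession.

```javascript
// Illustrative autoencoder workload (values are placeholders).
const TOTAL_SAMPLES = 100;
const INPUT_SIZE = 784;
const OUTPUT_SIZE = INPUT_SIZE;       // autoencoder: reconstruct the input
const FINAL_OUTPUT_UNITS = OUTPUT_SIZE; // matches the last layer's units
const TOTAL_ROUNDS = 10;
const BATCH_SIZE = 32;
const EPOCHS_PER_ROUND = 1;
const MIN_CLIENTS_TO_START = 1;       // single-client demo

// Flattened training data: length must be TOTAL_SAMPLES * INPUT_SIZE.
const input = new Float32Array(TOTAL_SAMPLES * INPUT_SIZE);
const output = input;                 // target = input

// Model definition (requires @tensorflow/tfjs to be loaded):
// const model = tf.sequential();
// model.add(tf.layers.dense({ inputShape: [INPUT_SIZE], units: 64, activation: 'relu' }));
// model.add(tf.layers.dense({ units: FINAL_OUTPUT_UNITS, activation: 'sigmoid' }));
// model.compile({ optimizer: 'adam', loss: 'meanSquaredError' });

// Session construction (only runs inside the FedDDL environment):
if (typeof UnsupervisedTrainingSession !== 'undefined' &&
    typeof model !== 'undefined') {
  const session = new UnsupervisedTrainingSession(
    FINAL_OUTPUT_UNITS, TOTAL_SAMPLES, INPUT_SIZE, OUTPUT_SIZE,
    TOTAL_ROUNDS, BATCH_SIZE, EPOCHS_PER_ROUND, MIN_CLIENTS_TO_START,
    input, output, model
  );
}
```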

Data shape and partitioning

  • Data is partitioned per client by slicing the flat arrays; ensure arrays are contiguous and sized correctly.

  • input length must equal TOTAL_SAMPLES * INPUT_SIZE.
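The slicing idea above can be sketched in plain JavaScript. This is an illustration of the partitioning concept, not FedDDL's actual implementation: each client receives a contiguous run of complete samples from the flat array.

```javascript
// Illustrative per-client partitioning of a flat array (tiny numbers).
const TOTAL_SAMPLES = 6;
const INPUT_SIZE = 4;
const NUM_CLIENTS = 3;

// Flat training data, filled with its own indices for clarity.
const input = Float32Array.from(
  { length: TOTAL_SAMPLES * INPUT_SIZE }, (_, i) => i
);

const samplesPerClient = Math.floor(TOTAL_SAMPLES / NUM_CLIENTS);
const partitions = [];
for (let c = 0; c < NUM_CLIENTS; c++) {
  // Slice on sample boundaries so every client gets whole samples.
  const start = c * samplesPerClient * INPUT_SIZE;
  const end = start + samplesPerClient * INPUT_SIZE;
  partitions.push(input.subarray(start, end)); // view, no copy
}
```

Slicing on multiples of INPUT_SIZE is what makes the "contiguous and sized correctly" requirement matter: a flat array whose length is not TOTAL_SAMPLES * INPUT_SIZE would split a sample across two clients.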

Constraints and tips

  • Avoid custom losses/metrics; use built-in tf losses/metrics.

  • Metrics callback is optional and runs on the server; keep it lightweight.

  • Choose BATCH_SIZE and EPOCHS_PER_ROUND based on client capability; larger values increase on-device compute.

  • MIN_CLIENTS_TO_START gates the first round; set to 1 for single-client demos.
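A quick back-of-envelope for the BATCH_SIZE / EPOCHS_PER_ROUND tip above: the number of gradient steps a client performs per round, assuming samples are split evenly across clients (all numbers illustrative).

```javascript
// Illustrative estimate of per-client compute per round.
const TOTAL_SAMPLES = 60000;
const NUM_CLIENTS = 10;
const BATCH_SIZE = 32;
const EPOCHS_PER_ROUND = 2;

const samplesPerClient = Math.floor(TOTAL_SAMPLES / NUM_CLIENTS); // 6000
const batchesPerEpoch = Math.ceil(samplesPerClient / BATCH_SIZE); // 188
const stepsPerRound = batchesPerEpoch * EPOCHS_PER_ROUND;         // 376
```

Doubling EPOCHS_PER_ROUND or halving BATCH_SIZE roughly doubles on-device work per round, which is the trade-off to weigh against client capability.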

What runs where

  • Model training and metrics run on clients.

  • The server hosts the session, partitions data, and aggregates weight segments.
