
Supervised Learning with FedDDL

This guide shows you how to set up a supervised learning workload on FedDDL. You only need to provide a model, dataset arrays, and a few training hyperparameters; the platform handles client partitioning, rounds, and aggregation.

Learn by doing?

See a full example of MNIST running on the Obit network here!

What you need to provide

  • Model architecture: a compiled tf.Sequential (or similar) model. Optimizers are supported; custom losses and metrics are not (use built-in TensorFlow.js losses and metrics).

  • Flattened dataset: input and output arrays sized to TOTAL_SAMPLES * INPUT_SIZE and TOTAL_SAMPLES * OUTPUT_SIZE respectively. Labels should already be encoded (e.g., one-hot for classification).

  • Test split (optional): testInput and testOutput arrays for evaluation.

  • Training config: TOTAL_SAMPLES, INPUT_SIZE, OUTPUT_SIZE, TOTAL_ROUNDS, BATCH_SIZE, EPOCHS_PER_ROUND, MIN_CLIENTS_TO_START.
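As a concrete illustration of the flattened layout, here is a minimal sketch in plain JavaScript. The sample data, sizes, and variable names are made up for illustration; only the sizing rules come from this guide.

```javascript
// Hypothetical example: 4 samples, 2 features each, 3 classes.
const TOTAL_SAMPLES = 4;
const INPUT_SIZE = 2;
const OUTPUT_SIZE = 3; // number of classes

const samples = [
  [0.1, 0.9],
  [0.4, 0.2],
  [0.8, 0.7],
  [0.3, 0.5],
];
const labels = [0, 2, 1, 2];

// Flatten inputs into a single array of length TOTAL_SAMPLES * INPUT_SIZE.
const input = samples.flat();

// One-hot encode labels into an array of length TOTAL_SAMPLES * OUTPUT_SIZE.
const output = labels.flatMap((label) => {
  const row = new Array(OUTPUT_SIZE).fill(0);
  row[label] = 1;
  return row;
});

console.log(input.length === TOTAL_SAMPLES * INPUT_SIZE);   // true
console.log(output.length === TOTAL_SAMPLES * OUTPUT_SIZE); // true
```

Whatever encoding you use, the invariant is the same: every sample occupies exactly INPUT_SIZE consecutive entries in input and OUTPUT_SIZE consecutive entries in output.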

Core class: SupervisedTrainingSession

Create one session per workload. It manages client partitioning, training rounds, and weight aggregation for that workload.

Constructor signature:
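The signature is not reproduced on this page. As a rough, unconfirmed sketch of the shape such a constructor might take — the parameter order and names below are assumptions assembled from the inputs listed above, not the documented FedDDL API:

```javascript
// Hypothetical sketch only — consult the FedDDL reference for the real signature.
new SupervisedTrainingSession(
  model,       // compiled tf.Sequential (or similar)
  input,       // flat array of length TOTAL_SAMPLES * INPUT_SIZE
  output,      // flat array of length TOTAL_SAMPLES * OUTPUT_SIZE
  config,      // { TOTAL_SAMPLES, INPUT_SIZE, OUTPUT_SIZE, TOTAL_ROUNDS,
               //   BATCH_SIZE, EPOCHS_PER_ROUND, MIN_CLIENTS_TO_START }
  testInput,   // optional test split
  testOutput,  // optional test split
)
```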

Minimal example
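The example body appears to be missing from this page. The sketch below shows the overall shape of a workload setup; the commented-out imports, model code, and session call are assumptions (illustrative names, not confirmed FedDDL API), while the config keys and sizing rules come from this guide.

```javascript
// Hypothetical sketch — commented lines show assumed, not confirmed, API usage.
// import * as tf from '@tensorflow/tfjs';

// Training config (values are illustrative, sized for MNIST-like data).
const config = {
  TOTAL_SAMPLES: 60000,
  INPUT_SIZE: 784,       // 28 x 28 pixels, flattened
  OUTPUT_SIZE: 10,       // one-hot over 10 digit classes
  TOTAL_ROUNDS: 20,
  BATCH_SIZE: 32,
  EPOCHS_PER_ROUND: 1,
  MIN_CLIENTS_TO_START: 2,
};

// A compiled model with built-in losses/metrics would be created here, e.g.:
// const model = tf.sequential();
// model.add(tf.layers.dense({ inputShape: [config.INPUT_SIZE], units: 64, activation: 'relu' }));
// model.add(tf.layers.dense({ units: config.OUTPUT_SIZE, activation: 'softmax' }));
// model.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy'] });

// Sanity-check the flattened arrays before handing them to the session.
function checkShapes(input, output, cfg) {
  return (
    input.length === cfg.TOTAL_SAMPLES * cfg.INPUT_SIZE &&
    output.length === cfg.TOTAL_SAMPLES * cfg.OUTPUT_SIZE
  );
}

// const session = new SupervisedTrainingSession(model, input, output, config);
// session.start();
```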

Data shape and partitioning

  • Data is partitioned per client by slicing the flat arrays; ensure arrays are contiguous and sized correctly.

  • The input array's length must equal TOTAL_SAMPLES * INPUT_SIZE.

  • The output array's length must equal TOTAL_SAMPLES * OUTPUT_SIZE (use one-hot labels for classification).
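To make the slicing concrete, here is a minimal sketch of how a flat array could be split across clients. The even split and the helper name are assumptions for illustration; FedDDL's actual partitioning strategy may differ.

```javascript
// Hypothetical sketch: split a flat array evenly across clients.
// FedDDL's real partitioner may assign samples differently.
function partitionFlat(flat, totalSamples, sampleSize, numClients) {
  const perClient = Math.floor(totalSamples / numClients);
  const parts = [];
  for (let c = 0; c < numClients; c += 1) {
    const start = c * perClient * sampleSize;
    const end = start + perClient * sampleSize;
    parts.push(flat.slice(start, end));
  }
  return parts;
}

// 6 samples of size 2, split across 3 clients -> 2 samples (4 values) each.
const flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
const parts = partitionFlat(flat, 6, 2, 3);
console.log(parts[0]); // [1, 2, 3, 4]
```

Note that slicing only works if samples are laid out contiguously, which is why the sizing rules above are strict.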

Constraints and tips

  • Avoid custom loss/metric functions; use built-in TensorFlow.js losses and metrics.

  • The metrics function is optional and runs on the server. Keep it lightweight to avoid logging overhead.

  • Choose BATCH_SIZE and EPOCHS_PER_ROUND based on your model's requirements; larger values increase the compute required on each client.
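As an example of a lightweight server-side metric, here is a sketch of accuracy computed over flat one-hot arrays. The function names and signature are illustrative, not FedDDL's metrics interface.

```javascript
// Hypothetical sketch: accuracy over flat one-hot predictions and labels.
function argmaxRow(flat, row, size) {
  let best = 0;
  for (let i = 1; i < size; i += 1) {
    if (flat[row * size + i] > flat[row * size + best]) best = i;
  }
  return best;
}

function accuracy(predictions, labels, totalSamples, outputSize) {
  let correct = 0;
  for (let s = 0; s < totalSamples; s += 1) {
    if (argmaxRow(predictions, s, outputSize) === argmaxRow(labels, s, outputSize)) {
      correct += 1;
    }
  }
  return correct / totalSamples;
}

// Two samples, three classes: first predicted correctly, second not.
const preds  = [0.7, 0.2, 0.1,  0.1, 0.6, 0.3];
const labels = [1, 0, 0,  0, 0, 1];
console.log(accuracy(preds, labels, 2, 3)); // 0.5
```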

What runs where

  • The server hosts the session, partitions data, and aggregates weight segments.

  • Each client receives its data partition and trains the model locally.
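Server-side aggregation in federated learning is commonly a weighted average of client weights (federated averaging). The sketch below illustrates that general technique over flat weight arrays; it is an assumption for intuition, not FedDDL's confirmed aggregation code.

```javascript
// Sketch of federated averaging over flat weight arrays.
// Each client's contribution is weighted by its sample count.
function federatedAverage(clientWeights, sampleCounts) {
  const totalSamples = sampleCounts.reduce((a, b) => a + b, 0);
  const merged = new Array(clientWeights[0].length).fill(0);
  clientWeights.forEach((weights, c) => {
    const share = sampleCounts[c] / totalSamples;
    weights.forEach((w, i) => {
      merged[i] += share * w;
    });
  });
  return merged;
}

// Two clients with equal sample counts: plain element-wise mean.
const merged = federatedAverage([[1, 2], [3, 4]], [10, 10]);
console.log(merged); // [2, 3]
```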
