Environment API
Learn by doing?
Check out our real CartPole demo as an example of how to write a custom Environment. You'll use the exact same APIs.
Writing a Custom Environment
This guide shows how to add your own RL environment to FedDDL. Environments export a class named `Environment` that implements a minimal `reset()`/`step()` contract used by algorithms (DQN, PPO, etc.) via `ReinforcementLearningSession` workloads.
Interface contract
- Export `Environment` (a named export).
- The constructor accepts an optional `config` object for tunable parameters (must be serializable if run in a worker).
- `reset()` returns the initial state (array/typed array/number list) and should also reset internal episode counters. It may be `async`; algorithms await it.
- `step(action)` returns `{ state, reward, done }` (you may also include `info` or `truncated`). It may be `async`; algorithms await it.
- Keep the class self-contained and deterministic given the same random seeds/config. Multiple instances may run in parallel (one per Web Worker by default when autovectorization is enabled), or in-thread if workers are disabled/unavailable.
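For reference, a `step()` result might look like this (a sketch; only `state`, `reward`, and `done` are required):

```js
// Illustrative step() result. Only state/reward/done are required;
// info and truncated are optional extras.
const result = {
  state: [0.01, -0.02, 0.3, 0.0], // fixed-length number array
  reward: 1.0,
  done: false,
  truncated: false, // optional: true only when stopped by a time limit
  info: {},         // optional: free-form diagnostics
};
```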
State and action spaces
- Discrete actions: document the mapping (e.g., `0=left, 1=idle, 2=right`).
- Continuous actions: accept numbers/arrays; clamp to safe ranges.
- State shape: fixed-length number array; keep ordering consistent and documented.
Whether your Environment uses a discrete or continuous action space, the `reset()` and `step()` methods should return the environment's internal state with a consistent shape and ordering.
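As an illustration of both styles, here is a sketch of action handling; the force values and ranges are made up for the example:

```js
// Sketch: a discrete action mapping (0=left, 1=idle, 2=right) and a clamped
// continuous action. Names and ranges are illustrative, not framework APIs.
const DISCRETE_FORCES = { 0: -1, 1: 0, 2: 1 };

function forceFromDiscrete(action) {
  if (!(action in DISCRETE_FORCES)) throw new Error(`Unknown action: ${action}`);
  return DISCRETE_FORCES[action];
}

function forceFromContinuous(action, min = -1, max = 1) {
  // Clamp so extreme policy outputs cannot push the simulation into NaNs.
  return Math.min(max, Math.max(min, Number(action)));
}
```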
Episode termination
- Set `done=true` when reaching a terminal condition (success/failure), or when exceeding `maxEpisodeSteps` if you enforce a horizon.
- Optionally include `{ truncated: true }` in the step result when stopping only due to the time limit.
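One way to assemble that result (a sketch; `maxEpisodeSteps` follows the convention above, the other names are illustrative):

```js
// Sketch: build a step() result that distinguishes a terminal state from a
// pure time-limit stop. Names besides state/reward/done/truncated are illustrative.
function buildStepResult(state, reward, terminal, steps, maxEpisodeSteps) {
  const timedOut = steps >= maxEpisodeSteps;
  const result = { state, reward, done: terminal || timedOut };
  if (timedOut && !terminal) result.truncated = true; // stopped only by the horizon
  return result;
}

// Episode hits a 500-step horizon without reaching a terminal state:
console.log(buildStepResult([0, 0], 1, false, 500, 500));
// → { state: [0, 0], reward: 1, done: true, truncated: true }
```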
Minimal skeleton
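A minimal sketch that satisfies the contract: a toy 1D environment where the agent walks along a line. The dynamics, rewards, and config fields are purely illustrative:

```js
// Toy 1D environment: the agent moves left/idle/right on a line and is
// rewarded for staying near the origin. Purely illustrative dynamics.
export class Environment {
  constructor(config = {}) {
    // Keep constructor parameters serializable so workers can instantiate copies.
    this.maxEpisodeSteps = config.maxEpisodeSteps ?? 200;
    this.bound = config.bound ?? 5;
    this.position = 0;
    this.steps = 0;
  }

  // State layout (documented for observability): [position]
  reset() {
    this.position = 0;
    this.steps = 0;
    return [this.position];
  }

  // Actions: 0=left, 1=idle, 2=right
  step(action) {
    this.steps += 1;
    this.position += [-1, 0, 1][action] ?? 0;
    // Clamp so the state stays finite and within a safe range.
    this.position = Math.max(-this.bound, Math.min(this.bound, this.position));

    const terminal = Math.abs(this.position) >= this.bound; // walked off the edge
    const timedOut = this.steps >= this.maxEpisodeSteps;
    return {
      state: [this.position],
      reward: terminal ? -1 : 1 - Math.abs(this.position) / this.bound,
      done: terminal || timedOut,
      ...(timedOut && !terminal ? { truncated: true } : {}),
    };
  }
}
```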
Tips and best practices
- Determinism: avoid shared globals; keep per-instance random draws (optionally expose a seed if you add RNG control — see the sketch after this list).
- Safety: clamp positions/velocities/actions to avoid NaNs or runaway values; return finite numbers only.
- Performance: keep `step` free of tensor ops; environments should use plain JS math to avoid GPU/CPU overhead.
- Parallelism: assume multiple `Environment` instances may be created (one per worker when `useWebWorkers` is true); do not rely on singleton state.
- Workers: environment code runs in a dedicated Web Worker when autovectorization + `useWebWorkers` are enabled. Avoid closing over DOM/window; rely on constructor config for inputs and import any dependencies inside the class.
- Observability: document state ordering and action meanings in comments; add `maxEpisodeSteps` to prevent infinite episodes.
- Async: `reset`/`step` can be async; algorithms already `await` them.
- Fallbacks: if workers are unavailable or fail, the runtime will automatically run envs in-thread.
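For the determinism point above, a small per-instance PRNG such as mulberry32 keeps episodes reproducible. A sketch, assuming you choose to accept a `seed` in the constructor config:

```js
// Sketch: per-instance seeded RNG (mulberry32, a well-known tiny PRNG).
// Passing a seed via config.seed is an assumption, not a framework requirement.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// In the constructor: this.rand = mulberry32(config.seed ?? 42);
// Then use this.rand() instead of Math.random() for all episode noise.
```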
Connecting to a Workload
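Hypothetical sketch only: the import path and option names below are assumptions for illustration, not FedDDL's confirmed API. In spirit, you hand the `Environment` class and its serializable config to a `ReinforcementLearningSession` workload:

```js
// Hypothetical sketch: option names and import path are assumptions.
import { ReinforcementLearningSession } from 'fedddl'; // illustrative import path
import { Environment } from './my-environment.js';

const session = new ReinforcementLearningSession({
  environment: Environment, // the class itself, so workers can instantiate copies
  environmentConfig: { maxEpisodeSteps: 200 }, // serializable constructor config
  algorithm: 'DQN',
});
```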