Environment API

Learn by doing?

Check out our real CartPole demo as an example of how to write a custom Environment. You'll use the exact same APIs.

Writing a Custom Environment

This guide shows how to add your own RL environment for FedDDL. Environments export a class named Environment that implements a minimal reset()/step() contract used by algorithms (DQN, PPO, etc.) via ReinforcementLearningSession workloads.

Interface contract

  • Export Environment (named export).

  • Constructor accepts an optional config object for tunable parameters (must be serializable if run in a worker).

  • reset() returns the initial state (array/typed array/number list). Should also reset internal episode counters. May be async and is awaited by algorithms.

  • step(action) returns { state, reward, done } (you may also include info or truncated). May be async and is awaited by algorithms.

  • Keep the class self-contained and deterministic given the same random seeds/config. Multiple instances may run in parallel (one per Web Worker by default when autovectorization is enabled), or in-thread if workers are disabled/unavailable.
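
Put together, the step contract can be expressed as a JSDoc sketch like the one below (the field names mirror the bullets above; this typedef is illustrative and not part of the library itself):

    /**
     * @typedef {Object} StepResult
     * @property {number[]} state        Next observation as a fixed-length number array.
     * @property {number}   reward       Scalar reward for the transition.
     * @property {boolean}  done         True when the episode has ended.
     * @property {boolean}  [truncated]  Optional: true when stopping only due to the time limit.
     * @property {Object}   [info]       Optional: extra diagnostics.
     */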

State and action spaces

  • Discrete actions: document the mapping (e.g., 0=left, 1=idle, 2=right).

  • Continuous actions: accept numbers/arrays; clamp to safe ranges.

  • State shape: fixed-length number array; keep ordering consistent and documented.

  • Whether your Environment uses a discrete or continuous action space, reset() and step() should return the environment's state with a consistent shape and ordering.
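
For example, documenting these choices in code might look like the following (the names and the state layout are illustrative, not part of the API):

    // Discrete actions: define the mapping once and reuse it everywhere.
    export const ACTIONS = { LEFT: 0, IDLE: 1, RIGHT: 2 };

    // Continuous actions: clamp incoming values to a safe range before using them.
    const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));
    // e.g. const force = clamp(action, -1, 1);

    // State layout (fixed order, fixed length):
    // [0] position, [1] velocity, [2] angle, [3] angularVelocity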

Episode termination

  • Set done=true when reaching terminal conditions (success/failure) or when exceeding maxEpisodeSteps if you enforce a horizon.

  • Optionally include { truncated: true } in the step result when stopping only due to time-limit.

Minimal skeleton
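
Below is a minimal sketch that satisfies the contract above; the dynamics, bounds, and reward are placeholders, so replace them with your own task.

    // MyEnvironment.js: a minimal sketch of the reset()/step() contract.
    // State: [position, velocity]; actions: 0 = push left, 1 = idle, 2 = push right.
    export class Environment {
      constructor(config = {}) {
        this.maxEpisodeSteps = config.maxEpisodeSteps ?? 500;
        this.forceMag = config.forceMag ?? 0.1;
        this.reset();
      }

      reset() {
        this.position = 0;
        this.velocity = 0;
        this.stepCount = 0;
        return [this.position, this.velocity];
      }

      step(action) {
        // Map the discrete action to a force and integrate simple dynamics.
        const force = (action - 1) * this.forceMag; // -forceMag, 0, +forceMag
        this.velocity = Math.max(-1, Math.min(1, this.velocity + force));
        this.position = Math.max(-5, Math.min(5, this.position + this.velocity));
        this.stepCount += 1;

        const reward = -Math.abs(this.position);     // placeholder: stay near the origin
        const failed = Math.abs(this.position) >= 5; // terminal condition
        const truncated = this.stepCount >= this.maxEpisodeSteps;

        return {
          state: [this.position, this.velocity],
          reward,
          done: failed || truncated,
          truncated,
        };
      }
    }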

Tips and best practices

  • Determinism: avoid shared globals; keep random draws per instance and optionally expose a seed if you add RNG control (see the seeded RNG sketch after this list).

  • Safety: clamp positions/velocities/actions to avoid NaNs or runaway values; return finite numbers only.

  • Performance: keep step() free of tensor ops; environments should use plain JS math to avoid GPU/CPU overhead.

  • Parallelism: assume multiple Environment instances may be created (one per worker when useWebWorkers is true); do not rely on singleton state.

  • Workers: environment code runs in a dedicated Web Worker when autovectorization + useWebWorkers are enabled. Avoid closing over DOM/window; rely on constructor config for inputs and import any dependencies inside the class.

  • Observability: document state ordering and action meanings in comments; add maxEpisodeSteps to prevent infinite episodes.

  • Async: reset/step can be async; algorithms already await them.

  • Fallbacks: if workers are unavailable or fail, the runtime will automatically run envs in-thread.
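
For the determinism tip above, a small per-instance seeded generator is usually enough; one common sketch (mulberry32) looks like this:

    // Per-instance seeded RNG (mulberry32): same seed + config => same rollouts.
    function mulberry32(seed) {
      let a = seed >>> 0;
      return function () {
        a = (a + 0x6D2B79F5) >>> 0;
        let t = a;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
      };
    }

    // In the constructor: this.rand = mulberry32(config.seed ?? 1234);
    // Then use this.rand() instead of Math.random() inside reset()/step().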

Connecting to a Workload

1. Place the file

Place the file under src/simulations/ and export Environment.

2. Import and pass to ReinforcementLearningSession

In your training script (e.g., example/rl/my-env-dqn.js), import it and pass it to ReinforcementLearningSession:
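
The sketch below assumes ReinforcementLearningSession accepts an options object; the option names (environment, algorithm, stateSize, actionCount, maxEpisodeSteps), the import paths, and the train() call are illustrative, so check them against your version of FedDDL and the existing RL examples.

    // example/rl/my-env-dqn.js (sketch; option names are assumptions, not the verified API)
    import { Environment } from '../../src/simulations/MyEnvironment.js'; // your env file
    import { ReinforcementLearningSession } from '../../src/index.js';    // adjust to the real entry point

    const STATE_SIZE = 2;   // length of the state array your env returns
    const ACTION_COUNT = 3; // number of discrete actions your env accepts

    const session = new ReinforcementLearningSession({
      environment: Environment, // pass the class; instances are created per worker
      algorithm: 'dqn',
      stateSize: STATE_SIZE,
      actionCount: ACTION_COUNT,
      maxEpisodeSteps: 500,
    });

    await session.train();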

3. Keep constants aligned

Keep STATE_SIZE and ACTION_COUNT aligned with your environment’s state length and action space.

By following this pattern you can drop in new environments (gridworlds, physics toys, games) and train them with any existing compatible algorithm.
