Environment API
Learn by doing?
Check out our real CartPole demo as an example of how to write a custom Environment. You'll use the exact same APIs.
Writing a Custom Environment
This guide shows how to add your own RL environment to FedDDL. Environments export a class named `Environment` that implements a minimal `reset()`/`step()` contract used by algorithms (DQN, PPO, etc.) via `ReinforcementLearningSession` workloads.
Interface contract
- Export `Environment` (a named export).
- The constructor accepts an optional `config` object for tunable parameters (must be serializable if run in a worker).
- `reset()` returns the initial state (array/typed array/number list) and should also reset internal episode counters. It may be `async`; algorithms await it.
- `step(action)` returns `{ state, reward, done }` (you may also include `info` or `truncated`). It may be `async`; algorithms await it.
- Keep the class self-contained and deterministic given the same random seeds/config. Multiple instances may run in parallel (one per Web Worker by default when autovectorization is enabled), or in-thread if workers are disabled/unavailable.
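For reference, a `step()` result might look like this (a sketch; only `state`, `reward`, and `done` are required):

```js
// Illustrative step() result. Only state/reward/done are required;
// info and truncated are optional extras.
const result = {
  state: [0.01, -0.02, 0.3, 0.0], // fixed-length number array
  reward: 1.0,
  done: false,
  truncated: false, // optional: true only when stopped by a time limit
  info: {},         // optional: free-form diagnostics
};
```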
State and action spaces
- Discrete actions: document the mapping (e.g., `0=left, 1=idle, 2=right`).
- Continuous actions: accept numbers/arrays; clamp to safe ranges.
- State shape: fixed-length number array; keep ordering consistent and documented.
Whether your Environment uses a discrete or continuous action space, the `reset()` and `step()` methods should return the environment's internal state with a consistent shape and ordering.
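As an illustration of both styles, here is a sketch of action handling; the force values and ranges are made up for the example:

```js
// Sketch: a discrete action mapping (0=left, 1=idle, 2=right) and a clamped
// continuous action. Names and ranges are illustrative, not framework APIs.
const DISCRETE_FORCES = { 0: -1, 1: 0, 2: 1 };

function forceFromDiscrete(action) {
  if (!(action in DISCRETE_FORCES)) throw new Error(`Unknown action: ${action}`);
  return DISCRETE_FORCES[action];
}

function forceFromContinuous(action, min = -1, max = 1) {
  // Clamp so extreme policy outputs cannot push the simulation into NaNs.
  return Math.min(max, Math.max(min, Number(action)));
}
```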
Episode termination
- Set `done=true` when reaching a terminal condition (success/failure), or when exceeding `maxEpisodeSteps` if you enforce a horizon.
- Optionally include `{ truncated: true }` in the step result when stopping only due to the time limit.
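One way to assemble that result (a sketch; `maxEpisodeSteps` follows the convention above, the other names are illustrative):

```js
// Sketch: build a step() result that distinguishes a terminal state from a
// pure time-limit stop. Names besides state/reward/done/truncated are illustrative.
function buildStepResult(state, reward, terminal, steps, maxEpisodeSteps) {
  const timedOut = steps >= maxEpisodeSteps;
  const result = { state, reward, done: terminal || timedOut };
  if (timedOut && !terminal) result.truncated = true; // stopped only by the horizon
  return result;
}

// Episode hits a 500-step horizon without reaching a terminal state:
console.log(buildStepResult([0, 0], 1, false, 500, 500));
// → { state: [0, 0], reward: 1, done: true, truncated: true }
```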
Minimal skeleton
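A minimal sketch that satisfies the contract: a toy 1D environment where the agent walks along a line. The dynamics, rewards, and config fields are purely illustrative:

```js
// Toy 1D environment: the agent moves left/idle/right on a line and is
// rewarded for staying near the origin. Purely illustrative dynamics.
export class Environment {
  constructor(config = {}) {
    // Keep constructor parameters serializable so workers can instantiate copies.
    this.maxEpisodeSteps = config.maxEpisodeSteps ?? 200;
    this.bound = config.bound ?? 5;
    this.position = 0;
    this.steps = 0;
  }

  // State layout (documented for observability): [position]
  reset() {
    this.position = 0;
    this.steps = 0;
    return [this.position];
  }

  // Actions: 0=left, 1=idle, 2=right
  step(action) {
    this.steps += 1;
    this.position += [-1, 0, 1][action] ?? 0;
    // Clamp so the state stays finite and within a safe range.
    this.position = Math.max(-this.bound, Math.min(this.bound, this.position));

    const terminal = Math.abs(this.position) >= this.bound; // walked off the edge
    const timedOut = this.steps >= this.maxEpisodeSteps;
    return {
      state: [this.position],
      reward: terminal ? -1 : 1 - Math.abs(this.position) / this.bound,
      done: terminal || timedOut,
      ...(timedOut && !terminal ? { truncated: true } : {}),
    };
  }
}
```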
Tips and best practices
- Determinism: avoid shared globals; keep per-instance random draws (optionally expose a seed if you add RNG control — see the sketch after this list).
- Safety: clamp positions/velocities/actions to avoid NaNs or runaway values; return finite numbers only.
- Performance: keep `step` free of tensor ops; environments should use plain JS math to avoid GPU/CPU overhead.
- Parallelism: assume multiple `Environment` instances may be created (one per worker when `useWebWorkers` is true); do not rely on singleton state.
- Workers: environment code runs in a dedicated Web Worker when autovectorization + `useWebWorkers` are enabled. Avoid closing over DOM/window; rely on constructor config for inputs and import any dependencies inside the class.
- Observability: document state ordering and action meanings in comments; add `maxEpisodeSteps` to prevent infinite episodes.
- Async: `reset`/`step` can be async; algorithms already `await` them.
- Fallbacks: if workers are unavailable or fail, the runtime will automatically run envs in-thread.
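For the determinism point above, a small per-instance PRNG such as mulberry32 keeps episodes reproducible. A sketch, assuming you choose to accept a `seed` in the constructor config:

```js
// Sketch: per-instance seeded RNG (mulberry32, a well-known tiny PRNG).
// Passing a seed via config.seed is an assumption, not a framework requirement.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// In the constructor: this.rand = mulberry32(config.seed ?? 42);
// Then use this.rand() instead of Math.random() for all episode noise.
```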
Connecting to a Workload
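Hypothetical sketch only: the import path and option names below are assumptions for illustration, not FedDDL's confirmed API. In spirit, you hand the `Environment` class and its serializable config to a `ReinforcementLearningSession` workload:

```js
// Hypothetical sketch: option names and import path are assumptions.
import { ReinforcementLearningSession } from 'fedddl'; // illustrative import path
import { Environment } from './my-environment.js';

const session = new ReinforcementLearningSession({
  environment: Environment, // the class itself, so workers can instantiate copies
  environmentConfig: { maxEpisodeSteps: 200 }, // serializable constructor config
  algorithm: 'DQN',
});
```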