# Transformers

{% hint style="info" %}
Note: Transformers.js is currently available for inference only. If you wish to train a model, check out the [Model Training](/for-research/workload-types/model-training.md) page.
{% endhint %}

{% hint style="warning" %}
This workload does not support batching.
{% endhint %}

Many projects need to run a diverse set of models, such as text-to-speech, speech-to-text, and image classification. A semi-complete list of supported tasks is available [here](https://github.com/xenova/transformers.js?tab=readme-ov-file#tasks); if your task is missing, implementing it is relatively simple. This workload is ideal for small and medium-sized models.

We use [Transformers.js](https://github.com/xenova/transformers.js?tab=readme-ov-file) to facilitate ONNX inference.
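
Assuming the payload format from the Example section carries over to other tasks, supporting a new task mostly means changing the `model` and `task` fields. The sketch is a hypothetical image classification payload in that same shape; `Xenova/vit-base-patch16-224` is an ONNX model from the Hugging Face Hub picked for illustration, and the workload's hooks would also need to serve image URLs instead of audio URLs.

```javascript
// Hypothetical payload for an image classification workload, mirroring the
// shape of the speech-to-text example. Only model and task change here;
// the prompts served by the hooks would have to be image URLs.
const imageClassificationPayload = {
    model: "Xenova/vit-base-patch16-224", // ONNX conversion of google/vit-base-patch16-224
    task: "image-classification",
    load_options: {
        device: "webgpu", // same WebGPU option as the speech-to-text example
    },
    runtime_options: {},
};

console.log(imageClassificationPayload.task);
```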

## Feature Support

Most model types are supported. The most notable exceptions are:

* LLMs with more than 1B parameters (use WebLLM instead)
* Large diffusion models (most are too large)
* Video classification

Check the full list of supported and unsupported model types [here](https://huggingface.co/docs/transformers.js/pipelines).
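
The limits above can be expressed as a quick pre-flight check before submitting a workload. This is a hypothetical helper, not part of the platform: it assumes LLMs are identified by the Hugging Face task id `text-generation` and video classification by `video-classification`, and it uses the 1B-parameter threshold from this page. Large diffusion models are not checked because no size cutoff is given.

```javascript
// Hypothetical pre-flight check encoding the limits listed on this page.
// Task names are assumed to follow Hugging Face task ids.
const MAX_LLM_PARAMETERS = 1e9; // "above 1B parameters" -> use WebLLM instead

function checkTransformersSupport({ task, parameters }) {
    if (task === "video-classification") {
        return { supported: false, reason: "video classification is not supported" };
    }
    if (task === "text-generation" && parameters > MAX_LLM_PARAMETERS) {
        return { supported: false, reason: "LLM above 1B parameters: use WebLLM instead" };
    }
    // Large diffusion models are also excluded, but without a stated size
    // cutoff they are not checked here.
    return { supported: true };
}

console.log(checkTransformersSupport({ task: "text-generation", parameters: 7e9 }));
console.log(checkTransformersSupport({ task: "automatic-speech-recognition", parameters: 39e6 }));
```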

## Use of experimental software

We use a [transformers.js beta version](https://github.com/xenova/transformers.js/pull/545) to enable WebGPU support. Note that it may cause unintended behavior; if you do not need WebGPU, we can switch your project back to the stable version of transformers.js on request.
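
If some of your participants may lack WebGPU, one option is to choose the device at load time. This is a hypothetical sketch: it assumes your workload code can see the browser's `navigator` object and that `load_options` is forwarded to Transformers.js unchanged. It falls back to `'wasm'`, the CPU runtime that transformers.js uses by default.

```javascript
// Hypothetical helper: pick a Transformers.js device for load_options.
// WebGPU is only available when the browser exposes navigator.gpu;
// otherwise fall back to the CPU (WASM) runtime.
function pickDevice(nav = globalThis.navigator) {
    return nav && nav.gpu ? "webgpu" : "wasm";
}

// Example: build load_options from whatever navigator is available.
const loadOptions = { device: pickDevice() };
console.log(loadOptions);
```

Falling back rather than failing keeps the workload usable on machines without WebGPU, at the cost of slower inference and a higher risk of out-of-memory errors on large models.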

## Example

Let's run OpenAI's Whisper model on two kinds of speech: one spoken very quickly and hard to follow, and a clear presidential speech. Note that `load_options` sets `device: 'webgpu'`: transformers.js defaults to the CPU (WASM) runtime, which is slower and can cause out-of-memory (OOM) errors with large models.

```javascript
// speech to text with whisper

import { markAllTasksDone } from "../modules/tools.js";

let prompts = [
    "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav",
    "https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav"
];
const responses = [];

// hook that supplies the next input: a random audio URL that has not been transcribed yet
function shaderResponse() {
    return prompts[Math.floor(Math.random() * prompts.length)];
}

// this function will be called when the user responds with outputs
function handleOutputs(prompt, outputs) {
    console.log("User responded with response", outputs);

    prompts = prompts.filter((p) => p !== prompt);
    responses.push({ prompt, outputs });
    if (prompts.length === 0) {
        // upload to your remote server here
        console.log("All prompts are completed", ...responses);
        markAllTasksDone();
    }
    console.log("Prompts left", prompts);
}

export default {
    type: "transformers",
    action: "speech to text",
    officialName: "transformers-testuniversity-test3",
    organization: "Test University",
    hooks: {
        shaderResponse,
        handleOutputs
    },
    payload: {
        // many other speech recognition models are supported: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.AutomaticSpeechRecognitionPipeline
        model: "Xenova/whisper-tiny.en",
        task: "automatic-speech-recognition",
        // the WebGPU device relies on the beta build from https://github.com/xenova/transformers.js/pull/545
        load_options: {
            device: 'webgpu',
            // optional: load half-precision weights
            //dtype: 'fp16',
        },
        runtime_options: {}
    }
}
```
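
To check the hook bookkeeping locally without the platform, you can drive a self-contained stand-in with fake outputs. This is a hypothetical sketch: `createPromptTracker` is not a platform API, and `onAllDone` plays the role of `markAllTasksDone`. Unlike the example, it ignores outputs for prompts that are no longer pending, so a prompt answered twice cannot record a duplicate response or trigger completion a second time.

```javascript
// Hypothetical stand-in for the example's hooks, for local testing only.
function createPromptTracker(prompts, onAllDone) {
    const pending = new Set(prompts);
    const responses = [];

    // Like shaderResponse: return a random prompt that still needs an answer.
    function next() {
        const remaining = [...pending];
        if (remaining.length === 0) return undefined;
        return remaining[Math.floor(Math.random() * remaining.length)];
    }

    // Like handleOutputs: record outputs and finish once nothing is pending.
    function handle(prompt, outputs) {
        // Skip duplicate or unknown prompts so completion fires only once.
        if (!pending.delete(prompt)) return;
        responses.push({ prompt, outputs });
        if (pending.size === 0) onAllDone(responses);
    }

    return { next, handle, responses };
}

// Usage: answer every prompt with a fake transcription.
let finished = null;
const tracker = createPromptTracker(["a.wav", "b.wav"], (r) => { finished = r; });
for (let p = tracker.next(); p !== undefined; p = tracker.next()) {
    tracker.handle(p, { text: `transcript of ${p}` });
}
console.log(finished.length); // 2
```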
