Transformers
(pretrained ONNX model interface connected to Hugging Face)
This workload does not support batching.
Many users need to run a diverse set of models: text-to-speech, speech-to-text, image classification, and more. A semi-complete list of workloads can be found here, but implementing a new, unsupported workload is relatively simple. For small and medium-sized models, this workload is a great fit!
We use Transformers.js to facilitate ONNX inference.
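For context, standalone Transformers.js inference is built around the pipeline API. The sketch below is ours, not part of the workload; the model name is just an illustrative pick from the Hugging Face Hub.

import { pipeline } from "@xenova/transformers";

// build a text-classification pipeline (downloads and caches the ONNX weights)
const classifier = await pipeline("text-classification", "Xenova/distilbert-base-uncased-finetuned-sst-2-english");

// run inference; resolves to e.g. [{ label: "POSITIVE", score: 0.99 }]
const output = await classifier("Transformers.js makes in-browser inference easy!");
console.log(output);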
Feature Support
Most model types are supported; the most notable exceptions are:
Large LLMs (above 1B parameters; use WebLLM instead)
Diffusion models (most are too large)
Video classification
Check the full list of supported and unsupported model types here. If the model type you want to run is unsupported, consider direct ONNX inference (see the sketch below).
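As a rough sketch of what direct ONNX inference looks like with onnxruntime-web (the model URL and input shape below are placeholders, not values this platform provides):

import * as ort from "onnxruntime-web";

// load a model; the URL is a hypothetical placeholder
const session = await ort.InferenceSession.create("https://example.com/model.onnx");

// build an input tensor; [1, 3, 224, 224] stands in for a typical image model's input shape
const data = new Float32Array(1 * 3 * 224 * 224);
const input = new ort.Tensor("float32", data, [1, 3, 224, 224]);

// feed the tensor under the model's first declared input name and run the session
const results = await session.run({ [session.inputNames[0]]: input });
console.log(results);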
Use of experimental software
We use a Transformers.js beta version to enable a WebGPU option. Note that it may cause unintended behavior; if you do not need WebGPU, we can switch your project back to the stable version of Transformers.js on request.
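Under the beta, the device is requested when a pipeline is created, which is what the load_options in the payload below map onto. A minimal sketch, assuming the alpha API from the PR linked in the example:

import { pipeline } from "@xenova/transformers";

// request the WebGPU backend (beta); the stable release runs on CPU (WASM)
const transcriber = await pipeline("automatic-speech-recognition", "Xenova/whisper-tiny.en", {
  device: "webgpu",
});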
Example
Let's run OpenAI's Whisper model on two kinds of speech: a rapid, hard-to-follow commercial read and a clear presidential speech. Note that load_options contains device: webgpu, as Transformers.js defaults to the CPU (WASM) runtime, which can cause OOM errors and slower inference on large models.
// speech to text with whisper
import { markAllTasksDone } from "../modules/tools.js";

let prompts = [
  "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav",
  "https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav"
];
const responses = [];

// picks the next prompt to hand out; completed prompts are removed
// from the pool in handleOutputs
function shaderResponse() {
  return prompts[Math.floor(Math.random() * prompts.length)];
}

// called when a user responds with outputs for a prompt
function handleOutputs(prompt, outputs) {
  console.log("Received outputs for prompt", prompt, outputs);
  prompts = prompts.filter((p) => p !== prompt);
  responses.push({ prompt, outputs });
  if (prompts.length === 0) {
    // upload to your remote server here
    console.log("All prompts are completed", ...responses);
    markAllTasksDone();
  }
  console.log("Prompts left", prompts);
}
export default {
  type: "transformers",
  action: "speech to text",
  officialName: "transformers-testuniversity-test3",
  organization: "Test University",
  hooks: {
    shaderResponse,
    handleOutputs
  },
  payload: {
    // many other models and tasks are supported: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.TranslationPipeline
    model: "Xenova/whisper-tiny.en",
    task: "automatic-speech-recognition",
    // we're using the alpha from https://github.com/xenova/transformers.js/pull/545
    load_options: {
      device: 'webgpu',
      // dtype: 'fp16',
    },
    runtime_options: {}
  }
}
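The "upload to your remote server here" comment is left for you to fill in. One hypothetical implementation, with a placeholder endpoint URL:

// hypothetical: POST the collected responses to a server you control
async function uploadResponses(responses) {
  await fetch("https://example.com/api/results", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(responses),
  });
}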