by Alexandre Alapetite on 2013-01-17; updated on 2013-01-21

Multimodality with the Web speech API

This is a demonstration of basic multimodality, combining mouse and speech input, using the Web Speech API Specification.

This is inspired by Bolt’s “Put-that-there” concept from 1980, Google’s demonstration of free-text speech recognition from 2013, and a similar demonstration I made in 2006 using another voice API.



Hit the Speak button, move your mouse, and say: (red|green|blue|yellow|orange|black|pink) (square|circle).
The corresponding shape will be generated at the position of your mouse pointer or finger.
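The combination can be sketched roughly as follows. The helper `parseCommand` and the inline drawing code are my own illustration, not this page's actual source, and Chrome's prefixed `webkitSpeechRecognition` constructor is assumed as a fallback where the unprefixed one is missing:

```javascript
// Parse a recognised transcript into a {colour, shape} command, or null.
function parseCommand(transcript) {
  const t = transcript.toLowerCase();
  const colour = (t.match(/red|green|blue|yellow|orange|black|pink/) || [])[0];
  const shape = (t.match(/square|circle/) || [])[0];
  return colour && shape ? { colour: colour, shape: shape } : null;
}

if (typeof window !== 'undefined') {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const recognition = new Recognition();
  recognition.continuous = true;
  recognition.lang = 'en-US';

  // Track the last known pointer position.
  let x = 0, y = 0;
  document.addEventListener('mousemove', (e) => { x = e.clientX; y = e.clientY; });

  recognition.onresult = (event) => {
    const transcript = event.results[event.results.length - 1][0].transcript;
    const command = parseCommand(transcript);
    if (command) {
      // Draw a 50px shape centred on the last pointer position.
      const el = document.createElement('div');
      el.style.cssText = 'position:absolute; width:50px; height:50px;' +
        'left:' + (x - 25) + 'px; top:' + (y - 25) + 'px;' +
        'background:' + command.colour + ';' +
        (command.shape === 'circle' ? 'border-radius:50%;' : '');
      document.body.appendChild(el);
    }
  };

  // In the real page this is triggered from the Speak button,
  // since microphone access requires a user action.
  recognition.start();
}
```

The fusion of the two modalities is deliberately simple: the pointer is sampled continuously, and the moment a spoken command is recognised, the most recent pointer position is used as the target.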

Technical notes

This demonstration currently uses a free-text speech recognition engine, and will be updated to use a grammar, which should improve recognition rates, once that part of the specification is finalised and implemented by browsers.
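The Web Speech API Specification defines a `SpeechGrammarList` interface that accepts grammars in JSGF form via `addFromString`. Engine support is limited, so the following is only a sketch of how the demo's vocabulary might be constrained once grammars work; the function name `attachGrammar` is my own:

```javascript
// A JSGF grammar restricted to the demo's command vocabulary.
const commandGrammar = '#JSGF V1.0; grammar command; ' +
  'public <command> = (red | green | blue | yellow | orange | black | pink) ' +
  '(square | circle);';

// Attach the grammar to a SpeechRecognition instance when the API is available.
function attachGrammar(recognition) {
  if (typeof window !== 'undefined' && window.SpeechGrammarList) {
    const grammars = new window.SpeechGrammarList();
    grammars.addFromString(commandGrammar, 1.0); // weight 1.0 = highest priority
    recognition.grammars = grammars;
  }
  return recognition;
}
```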

Be sure to use the HTTPS version of this page to avoid repeated permission prompts for microphone access.


Video of how it should work (in French):