What is the feature detection story for MediaStreamTrack support? #126
|
#2 sounds like a pretty straightforward option that doesn't require any changes to the shape of the API. In Chromium, on-device speech recognition support and MediaStreamTrack support are both behind the same feature flag, so they will launch at the same time. This feature association isn't necessarily guaranteed by other browsers in the future though. |
This worries me indeed.
If we go down this road, would it make sense to have a […]? Moreover, the spec should say whether […] |
We could have a source attribute, but then it would make the setSource(MediaStreamTrack) redundant as it would only be needed for feature detection purposes. What do you all think about renaming start(MediaStreamTrack) to startWithMediaStreamTrack(MediaStreamTrack)? |
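A minimal sketch of why the rename helps: if the method ships under the proposed name, feature detection reduces to a prototype check. The method name below is the one proposed in this thread, not a shipped API.

```js
// Sketch: detect support for the proposed startWithMediaStreamTrack() rename.
// Assumes the rename lands; the name is only a proposal at this point.
const Recognition = self.SpeechRecognition || self.webkitSpeechRecognition;
const supportsAudioTrackSource =
  !!Recognition && "startWithMediaStreamTrack" in Recognition.prototype;

if (supportsAudioTrackSource) {
  // recognition.startWithMediaStreamTrack(audioTrack) can be called safely.
}
```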
I'm fine with either Option 1 […].

3. Add a readonly attribute […]
|
Can we not use the standard […]? |
The following seems a bit too much to me, but why not?

4. Add a static isSourceSupported() method […]
|
gentle ping |
This is extensible, potentially providing an avenue to offer offline transcription ("faster-than-realtime") in the future, by adding a third source type (e.g. WHATWG Streams).
Considering the speed at which my voice-to-text transcription experiments (with various software and hardware) have been running, and the ongoing rate of progress of software and hardware, I think this is a good feature to consider. |
One thing I just realized is that […] |
I prototyped this in Chromium to see how this would work with https://chromium-review.googlesource.com/c/chromium/src/+/6179415/1 and it looks like Blink bindings are catching this type error before we can actually return a promise:

```js
await webkitSpeechRecognition.isSourceSupported(undefined);
// > {source: undefined, supported: true}

await webkitSpeechRecognition.isSourceSupported(1);
// > Uncaught TypeError: Failed to execute 'isSourceSupported' on 'SpeechRecognition':
// > The provided value is not of type '(MediaStreamTrack or undefined)'.
```

Is this pattern still good enough?

```js
try {
  const { supported } = await webkitSpeechRecognition.isSourceSupported(myAudioTrack);
  if (supported) {
    // MediaStreamTrack is supported by the Web Speech API.
  } else {
    throw Error('"¯\_(ツ)_/¯"');
  }
} catch (error) {
  // MediaStreamTrack is NOT supported by the Web Speech API.
}
```
|
Try modeling this after Web Codecs; this works: https://w3c.github.io/webcodecs/#audiodecoder-interface. |
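For context, the Web Codecs pattern referenced here is a static, promise-returning support query. A minimal usage example of that real API (the config values below are arbitrary):

```js
// Web Codecs support query: a static method takes a config dictionary and
// resolves with { supported, config }.
const { supported } = await AudioDecoder.isConfigSupported({
  codec: "opus",        // arbitrary example codec
  sampleRate: 48000,
  numberOfChannels: 1,
});
if (supported) {
  // The UA can decode this configuration.
}
```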
This model uses dictionary strings instead of `typedef (MediaStreamTrack or undefined)`.

```webidl
dictionary SpeechRecognitionSource {
  required DOMString source; // "mediastreamtrack" or "microphone"
};

dictionary SpeechRecognitionSourceSupport {
  boolean supported;
  SpeechRecognitionSource source;
};

partial interface SpeechRecognition {
  static Promise<SpeechRecognitionSourceSupport> isSourceSupported(SpeechRecognitionSource source);
};
```

```js
if ('isSourceSupported' in webkitSpeechRecognition) {
  const { supported } = await webkitSpeechRecognition.isSourceSupported({ source: "mediastreamtrack" });
  if (supported) {
    // MediaStreamTrack is supported by the Web Speech API.
  }
}
```
|
This is in line with what I see in APIs "around" our API here, such as media decoding and encoding, and WebRTC. Thanks a lot for taking the time to revise the proposal multiple times. Here are some simple changes to this interface: lower-case strings are indeed the recommended way to do "enums" on the Web, but we can make it explicit, e.g. we'd have:

```webidl
enum SpeechRecognitionSource {
  "microphone",
  "mediastreamtrack",
  "streams" // future extension w/ WHATWG Streams
};

dictionary SpeechRecognitionOptions {
  // Do we need more members here?
  required SpeechRecognitionSource source;
};

dictionary SpeechRecognitionSourceSupport {
  boolean supported;
  SpeechRecognitionOptions options;
};

partial interface SpeechRecognition {
  static Promise<SpeechRecognitionSourceSupport> isSourceSupported(SpeechRecognitionOptions options);
};
```

This shape works if we foresee more options being added (otherwise, we can cut a layer). I'm thinking of a language identifier, or a model size, etc. Do we also want to add provision besides […]? I understand it's a bit verbose, but I think the verbosity isn't unwarranted in this instance:
|
Non-recognized enum values will still throw a TypeError though, right?
To be on the safe side, I'm happy to add […] |
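A minimal sketch of what that guard could look like from the caller's side, assuming the enum-based isSourceSupported() shape proposed above (not shipped anywhere). Per WebIDL, an unrecognized enum value in the dictionary surfaces as a TypeError rejection, since the operation returns a promise.

```js
// Probe a source value the UA may not recognize. With the proposed
// enum-based dictionary, an unknown value rejects with a TypeError.
async function sourceSupported(source) {
  try {
    const { supported } = await SpeechRecognition.isSourceSupported({ source });
    return supported;
  } catch (error) {
    // e.g. TypeError for an enum value this UA doesn't know about.
    return false;
  }
}

// Usage: await sourceSupported("mediastreamtrack");
```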
IMO, while this approach offers the most extensibility, it seems a little over-engineered for future use cases that may never materialize and comes at the cost of a more complicated API surface. My preference would be for a simpler approach with either the readonly attribute or the new startWithMediaStreamTrack name. Given the historical rate of progress on the Web Speech API, I'm skeptical about if and when we'll need to extend the functionality of this API. But if we do want to add new functionality with additional sources and options, we may want to consider developing a new, modernized speech recognition API with Promises instead of callbacks. I've filed issue #130 to track this discussion.
PR #132 updates the specification to define the method for detecting on-device speech recognition availability as asynchronous. What do you all think? |
The WebIDL may seem complex at first glance, but from a web developer's perspective, it's actually quite straightforward. I recommend using either […] |
I was wrong indeed; I checked, and other specs use a […]. Firefox would implement faster-than-real-time transcription rather quickly, I assume, so I would still sit on the side of an extensible API. |
Let me summarize this then. We either go with the following:

```webidl
partial interface SpeechRecognition {
  undefined startWithMediaStreamTrack(MediaStreamTrack audioTrack);
};
```

Or the future-proof version:

```webidl
dictionary SpeechRecognitionOptions {
  required DOMString source; // "mediastreamtrack", "microphone", or "streams" in the future
};

dictionary SpeechRecognitionSourceSupport {
  boolean supported;
  SpeechRecognitionOptions options;
};

partial interface SpeechRecognition {
  static Promise<SpeechRecognitionSourceSupport> isSourceSupported(SpeechRecognitionOptions options);
};
```
|
I support the second API, thanks for shepherding this through. |
@eric-carlson Do you have opinions on this? |
I have a slight preference for the first API, but not enough to cause a fuss if you two prefer the second one :) If we do go with the second one, do we still need it to be asynchronous? Checking if the browser supports MediaStreamTrack (or streams) doesn't necessarily need to be tied to on-device speech recognition, though I suppose making it asynchronous provides browsers with the flexibility to do whatever they want to make that determination. |
Usually, we do not tend to add a feature detection API if there is a way to do it in JS. For instance, calling […]. It would be nice to have a more precise algorithm written in the spec, to get the exact order of the checks. And it would probably make sense to clarify that […] |
Thank you for jumping into the discussion @youennf! I've started playing with assumed exceptions at https://web-speech-mediastreamtrack.glitch.me/issue-126.html and it seems like we have plenty of work to do interop-wise if we're going this way ;) |
There's one thing that, in my opinion, prevents us from using exceptions: failure due to […] is reported asynchronously via an error event. It means we have to wait for an amount of time before deciding if an error event happened or not. This is bad in my opinion. The […]

For info, I've updated https://web-speech-mediastreamtrack.glitch.me/issue-126.html with plenty of tests to discover what we could do. The following screenshot showed Safari, Chrome, and Chrome Canary with MediaStreamTrack support (screenshot omitted).

And if you didn't want to dig in the code, here's the snippet I used:

```js
(async () => {
  const iframe = document.createElement("iframe");
  iframe.setAttribute("allow", "microphone 'none'");
  iframe.src = URL.createObjectURL(new Blob([], { type: "text/html" }));
  document.body.appendChild(iframe);

  const recognition = new iframe.contentWindow.webkitSpeechRecognition();
  const ac = new AudioContext();
  const mediaStreamDestination = ac.createMediaStreamDestination();
  const audioTrack = mediaStreamDestination.stream.getAudioTracks()[0];

  try {
    recognition.start(audioTrack);
    const errorPromise = new Promise((resolve) => {
      recognition.onerror = ({ error }) => {
        resolve(error);
      };
    });
    const timeoutPromise = new Promise((resolve) => setTimeout(resolve, 1000));
    const errorEvent = await Promise.race([errorPromise, timeoutPromise]);
    if (errorEvent) {
      console.log("Fail! recognition.start(audioTrack) fired error event");
    } else {
      console.log("Success! recognition.start(audioTrack) did not fire error event");
    }
  } catch (error) {
    console.log("Fail! recognition.start(audioTrack) should succeed");
  }
  iframe.parentNode.removeChild(iframe);
})();
```
|
w/ my Firefox implementer hat on, if you happen to be in a position in which it's easy for you to list somewhere the differences you see, we're happy to add them to our roadmap, and align. Our implementation has drifted some, and if we're doing work on it as part of local recognition, we might as well make it interoperable. |
Oh, I just found out Firefox had an implementation behind a flag.
@padenot What do you think of #126 (comment)? |
It's not enabled for a reason :-). We'll make sure it's compatible before shipping, w/ WPT and all.
As said previously, I support an […] |
Thanks for doing this!
Not really, we just need to check whether […]. This approach can probably support streams support detection with a locked ReadableStream. This feature detection will usually be asynchronous (we need to wait for the blob URL iframe to be loaded if the current frame can get microphone access). |
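A sketch of that asynchronous form, reusing the same iframe/start(0) probe as the snippets in this thread but waiting for the blob-URL iframe to finish loading first. The function name is only illustrative.

```js
// Async variant: wait for the blob-URL iframe (with microphone disallowed)
// to load, then probe whether start() rejects a non-MediaStreamTrack argument.
function isMediaStreamTrackSupportedAsync() {
  return new Promise((resolve) => {
    const iframe = document.createElement("iframe");
    iframe.setAttribute("allow", "microphone 'none'");
    iframe.src = URL.createObjectURL(new Blob([], { type: "text/html" }));
    iframe.onload = () => {
      const recognition = new iframe.contentWindow.webkitSpeechRecognition();
      let result;
      try {
        recognition.start(0);
        result = false;
      } catch (error) {
        result = error.name === "TypeError";
      }
      iframe.remove();
      resolve(result);
    };
    document.body.appendChild(iframe);
  });
}
```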
This approach looks good indeed @youennf!

```js
function isMediaStreamTrackSupported() {
  const iframe = document.createElement("iframe");
  iframe.setAttribute("allow", "microphone 'none'");
  iframe.src = URL.createObjectURL(new Blob([], { type: "text/html" }));
  document.body.appendChild(iframe);

  const recognition = new iframe.contentWindow.webkitSpeechRecognition();
  let result;
  try {
    recognition.start(0);
    result = false;
  } catch (error) {
    result = error.name == "TypeError";
  } finally {
    iframe.remove();
    return result;
  }
}
```

Note that Firefox's non-spec-compliant implementation also throws a TypeError, as it expects a MediaStream, but they will not ship this version.
Would MediaStreamTrack support detection conflict with ReadableStream at some point? Maybe it's a problem for the future only... |
|
Web Speech supports only audio-kind MediaStreamTracks, but I get what you meant. We would "just" pass a MediaStreamTrack, not 0, eventually. If @padenot is fine with the outcome, I think I can close this issue. |
gentle ping @padenot |
I strongly prefer an explicit and clear API that aligns with what the rest of the Web Platform does, especially around the same space, and considering that it is likely we'll add more features. I can live with doing this the hacky way, but it's nowhere near as nice. |
Do you have an example of an API that would be similar to this particular case? I know that, in many cases, the decision has been made not to expose this kind of API as long as there is a way for web pages to feature-detect it. |
Web Codecs |
These APIs are mostly targeted at exposing what the OS supports: the codecs, the camera capabilities... As an example, there is no API to tell a web page whether a UA implements https://w3c.github.io/webcodecs/#dom-videodecoderconfig-optimizeforlatency. I think the situation would be different if an OS restriction prevented UAs from implementing MediaStreamTrack+SpeechRecognition support while still allowing them to implement MediaStreamTrack and SpeechRecognition independently. |
An example of feature detection for each of the items I listed (non-exhaustive), to detect whether a UA implements something; these have nothing to do with OS support, and everything to do with what a UA at a particular version and on a particular platform supports: […]
Talked about at length in this thread, and we agreed to do it explicitly, in another API, because not being able to do it is a problem: […]
For the case at hand, recent advances in the field make it generally possible, but it requires significant engineering effort to integrate well into the web browser, so we expect some delay before it's available everywhere. It can also depend on the hardware and software, so we're very much in the situation you describe. An example would be that a cheap Windows laptop is going to have a hard time performing speech recognition in real time, but can do it offline without issue. As evoked in this thread, a WHATWG Streams API for speech recognition would then be supported, but not the MediaStreamTrack one. To come back to your last point, it's been the case for the longest time that one couldn't implement Web Speech API + […]. Case in point, Firefox already has […] |
Again, there is no API introduced to tell a web page whether the UA implements […]. This is unneeded, as the web page has other ways to know whether […]. The examples you give above are extension points, which warrant dedicated APIs. I'd be fine investigating how to solve this issue in a more generic way: determining overloads, optional method parameters... BTW, the detection script can probably be further simplified by removing the iframe before calling start(). |
You're right, the following detection script works, BUT I thought "detached" was not a concept in the specs world.

```js
function isMediaStreamTrackSupported() {
  const iframe = document.createElement("iframe");
  iframe.src = URL.createObjectURL(new Blob([], { type: "text/html" }));
  document.body.appendChild(iframe);

  const recognition = new iframe.contentWindow.webkitSpeechRecognition();
  iframe.remove();
  try {
    recognition.start(0);
    return false;
  } catch (error) {
    return error.name == "TypeError";
  }
}
```
|
Do we need setting […]? |
|
Setting […]
TIL "fully active", thanks! We should add this to the spec then.
@youennf @padenot @evanbliu It would be nice to find an agreement on whether we expose a new API to detect source support in the Web Speech API. Given that web developers can use the following snippet for now, I'm not opposed to delaying the addition of the future-proof isSourceSupported() API.

```js
function isMediaStreamTrackSupported() {
  const frame = document.body.appendChild(document.createElement("iframe"));
  const recognition = new frame.contentWindow.webkitSpeechRecognition();
  frame.remove();
  try {
    recognition.start(0);
    return false;
  } catch (error) {
    return error.name == "TypeError";
  }
}
```
|
It's not the most elegant solution, but I'm fine with deferring this for now; we can always add an isSourceSupported() in the future if/when we add new capabilities. |
Shall I close this issue for now? |
As a web developer, I wish there was a way to feature-detect before calling start(audioTrack), following the #118 changes. This is not possible for now, even after calling it. May I suggest some ideas?

1. Rename start(MediaStreamTrack audioTrack) (bikeshedding is welcome)
2. Tie MediaStreamTrack support to other properties

It may be appropriate to assume some of the changes to the Web Speech API will come with MediaStreamTrack support. I'm not sure yet which ones, though. Maybe SpeechRecognitionMode mode from #122?

FYI @evanbliu