DRAFT : Response to WHAT

Irfan Ali edited this page Nov 9, 2022 · 8 revisions

Thank you for a very helpful joint meeting during TPAC in Vancouver. APA is highly encouraged by our conversation and its outcomes. One of the specific next steps follows below. But first, let me ask you to also thank Mike Smith, Simon Peters, and the other WHAT participants who joined us. I would cc them, but I don't have current email addresses.

In this email, we'd like to confirm your recommendation on specifics of our work toward a normative approach for TTS-generated output that can be relied on to produce consistent results across multiple operating environments and user agents. This is the work of our Spoken Presentation Task Force, whose home wiki page is here:

https://www.w3.org/WAI/APA/task-forces/pronunciation/

In addition, the explainer document describes a standard mechanism that allows content authors to include spoken presentation guidance in HTML content. It identifies two approaches and enumerates the advantages and disadvantages of each.

https://w3c.github.io/pronunciation/explainer/

This was our second topic in Vancouver, under the title "Spoken Presentation" as logged here:

https://www.w3.org/2022/09/13-apa-minutes.html#t03

APA would like to request that WHAT consider elevating SSML to a status in HTML parallel to that currently provided for SVG. We believe this would be the most direct and productive approach for our various accessibility use cases, and we believe it would also be beneficial for non-accessibility use cases.

Our analysis indicates there are exactly 4 elements defined both by HTML and SSML for which we'd need to define disambiguation. We believe the first question should be to confirm our list. Is it correct? Or are there others? Once confirmed, we could take up what we might do to resolve the overlap.

The 4 overlapping elements are:

  • s
  • p
  • mark
  • sub

In addition to that, I would also add:

  • audio is an overlapping element. The expected behavior and use are quite different, as are the attribute sets.
  • desc is a child element of SSML audio and also overlaps with SVG desc (relevant since we are talking about first-class citizens of HTML).
  • While not name conflicts, some element pairs could arguably confuse authors about their semantics.

Here we can consider:

  • SSML emphasis and HTML em
  • SSML break and HTML br
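
As a cross-check on the name overlaps listed above, the comparison can be done mechanically. What follows is only a minimal Python sketch, not a normative analysis. Both element lists are transcribed by hand (from SSML 1.1 and the HTML Living Standard) and may be incomplete, and name_overlap is a hypothetical helper introduced here for illustration.

```python
# Element names transcribed by hand from SSML 1.1.
# Assumption: this list is complete; verify against the specification.
SSML_ELEMENTS = {
    "speak", "lexicon", "lookup", "meta", "metadata", "p", "s", "token",
    "w", "say-as", "phoneme", "sub", "lang", "voice", "emphasis", "break",
    "prosody", "audio", "mark", "desc",
}

# Element names transcribed by hand from the HTML Living Standard.
# Assumption: possibly incomplete; newer elements may be missing.
HTML_ELEMENTS = {
    "html", "head", "title", "base", "link", "meta", "style", "body",
    "article", "section", "nav", "aside", "h1", "h2", "h3", "h4", "h5",
    "h6", "hgroup", "header", "footer", "address", "p", "hr", "pre",
    "blockquote", "ol", "ul", "menu", "li", "dl", "dt", "dd", "figure",
    "figcaption", "main", "search", "div", "a", "em", "strong", "small",
    "s", "cite", "q", "dfn", "abbr", "ruby", "rt", "rp", "data", "time",
    "code", "var", "samp", "kbd", "sub", "sup", "i", "b", "u", "mark",
    "bdi", "bdo", "span", "br", "wbr", "ins", "del", "picture", "source",
    "img", "iframe", "embed", "object", "video", "audio", "track", "map",
    "area", "table", "caption", "colgroup", "col", "tbody", "thead",
    "tfoot", "tr", "td", "th", "form", "label", "input", "button",
    "select", "datalist", "optgroup", "option", "textarea", "output",
    "progress", "meter", "fieldset", "legend", "details", "summary",
    "dialog", "script", "noscript", "template", "slot", "canvas",
}


def name_overlap(first, second):
    """Return the element names present in both sets, sorted alphabetically."""
    return sorted(first & second)


if __name__ == "__main__":
    # Print the names that would need disambiguation under these assumptions.
    print(name_overlap(SSML_ELEMENTS, HTML_ELEMENTS))
```

With these hand-transcribed lists, the script reports audio, mark, meta, p, s, and sub. The appearance of meta suggests it may be worth checking when the list is confirmed against the specifications.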

Also consider that SVG and MathML are typically inline or block content islands, whereas common SSML usage (at least in our educational applications) is styling spoken presentation of individual words, typically occurring inline. What concerns me is the content bloat and authoring challenge of wrapping individual words with multiple elements to properly form the SSML markup. I see this occurring a lot with MathML in HTML when used to “style” a variable occurring in the text.

Please advise on the next steps. Shall we file a formal GitHub request for SSML in HTML, noting the above 4 pain points?