Vocal Accessibility Part 3: A critique of PAS 901:2025, and testing

In this post, I’m presenting a critique of PAS 901:2025 Vocal accessibility in system design. This follows on from the Part 2 post, where I quoted some of the content from the Code of Practice, particularly Part 5, System Design.

I need to preface the critique with two points. First, PAS 901:2025 is a really important document, the result of a lot of hard work, and the first time anyone has tried to describe and specify what Vocal Accessibility should entail. This critique is written in the spirit of wanting to take this valuable work and build on it. And second, I believe that PAS 901:2025 was developed with a design mindset, whereas I, with my Accessibility Consultant hat on, view it more through a testing and auditing lens.

Video: Vocal Accessibility Part 3, on Vimeo.

Critique 1: not open

The first critique concerns openness in all its forms. PAS 901:2025 is published as a DRM-protected PDF through the BSI shop, where you need to register an account, add it to your shopping basket and check out. It costs £0.00 (GBP). It is copyright “© The British Standards Institution 2025”, and carries the statement (or warning) “No copying without BSI permission except as permitted by copyright law”.

As the Code of Practice is published as a PDF behind the BSI shop’s registration wall, it is not possible to deep link to different sections. By contrast, this is straightforward with, for example, the Web Content Accessibility Guidelines, where you can link to the whole standard, to one principle, to a guideline or to an individual Success Criterion (W3C license).

It is worth noting that the PDF is untagged, and therefore not accessible to those using assistive technologies such as screen readers.

Finally, the process and governance around developing PAS 901:2025 were not open. There was no opportunity to give feedback on early drafts of the work and, as far as I know, the wider community first became aware of it when it was published in March 2025.

All of the above leads, I believe, to siloed thinking and to an inability to freely share, blog about, create bug reports for, improve and re-use PAS 901:2025 (including through self-censorship). Overall, this is likely to restrict usage and uptake.

Critique 2: sub-clauses not prioritized (too big!)

The second criticism concerns prioritization and the scale of PAS 901:2025. There are 60 sub-clauses in section 5 of the Code of Practice, with no indication of whether any of them should be considered more or less important than the others (there is no equivalent of WCAG’s conformance levels A, AA and AAA). This is similar in scale to the 57 Success Criteria at Levels A and AA in WCAG 2.2. However, WCAG tries to cover all disabilities and digital accessibility considerations (except voice/speech), while PAS 901:2025 covers only vocal accessibility.

This leads to a few questions and consequences. Do individual organizations have to research and assign priorities, given that it is hard to share information (see the openness critique above)? Would an audit of a speech-enabled system need to test all 60 sub-clauses, plus WCAG? (A big undertaking.) Again, this is likely to restrict usage and uptake.

Critique 3: objectivity / testable?

The third critique concerns objectivity and testability. Taking the Web Content Accessibility Guidelines as an example again, while there are grey areas such as informative versus decorative images, there are also many Success Criteria that are objectively testable, with thresholds and other quantitative criteria. Examples include Contrast (Minimum), Reflow, and Three Flashes or Below Threshold.
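To make “objectively testable” concrete, here is a minimal sketch in TypeScript of the relative-luminance and contrast-ratio calculation that WCAG defines, checked against the 4.5:1 threshold from Contrast (Minimum) for normal-size text. The formulas are WCAG’s; the function names and example colours are my own.

```typescript
// Relative luminance of an sRGB colour, per the WCAG 2.x definition.
function relativeLuminance(r: number, g: number, b: number): number {
  const channel = (c: number): number => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

// Contrast ratio between two luminances: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(l1: number, l2: number): number {
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// Objective pass/fail against the 4.5:1 threshold from Contrast (Minimum).
const ratio = contrastRatio(
  relativeLuminance(118, 118, 118), // grey text, #767676
  relativeLuminance(255, 255, 255), // white background
);
console.log(ratio.toFixed(2), ratio >= 4.5 ? "pass" : "fail"); // "4.54 pass"
```

No user needs to be present for this check: given two colours, the answer is the same every time, which is exactly the quality that many PAS 901:2025 sub-clauses currently lack.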

By contrast, sub-clause “5.4.16 Support for conversational interactions – Interrupting a user” in PAS 901:2025 talks about “sufficient time” without specifying or suggesting what that time should be, and “5.5.3 Fairness” states that “Systems should behave fairly with regard to protected characteristics” without defining what “fairness” means. Fairness is hopefully the ultimate aim of all accessibility work, including vocal accessibility; as a testable requirement, I suggest it is too high-level.

I should note that there are sub-clauses that appear to be objectively testable, for example, “5.1.1 Individualization”, which talks about multi-modal presentation and operation. Whether a system is multi-modal is hopefully a binary decision. Where sub-clauses are not objectively testable, however, testing and audits become difficult, which again is likely to restrict usage and uptake.

How to test?

When I presented at TechShare Pro 2025, colleagues were keen that I include a practical discussion of how to conduct testing for vocal accessibility. This very much remains a work in progress, but here are some thoughts.

As noted above, there are sub-clauses that can be tested in a binary (pass/fail) manner. Analysis is required to determine which sub-clauses can be objectively tested without further work.
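As a sketch of how that analysis might be recorded, the structure below distinguishes sub-clauses we believe are binary checks from those needing threshold research or user testing. The sub-clause numbers and titles are from PAS 901:2025; the categories, field names and outcomes are my own assumptions.

```typescript
// Possible ways of testing, and outcomes for, a single PAS 901:2025 sub-clause.
type TestMethod = "binary" | "needs-threshold-research" | "user-testing";
type Outcome = "pass" | "fail" | "not-yet-testable" | "not-applicable";

interface SubClauseResult {
  id: string;          // e.g. "5.1.1"
  title: string;
  method: TestMethod;  // how we believe this sub-clause can be tested
  outcome: Outcome;
  notes?: string;
}

// Hypothetical audit entries, echoing the examples discussed above.
const audit: SubClauseResult[] = [
  {
    id: "5.1.1",
    title: "Individualization",
    method: "binary",
    outcome: "pass",
    notes: "System offers multi-modal presentation and operation.",
  },
  {
    id: "5.4.16",
    title: "Support for conversational interactions – Interrupting a user",
    method: "needs-threshold-research",
    outcome: "not-yet-testable",
    notes: "\"Sufficient time\" has no specified value; a practical range is still to be determined.",
  },
];

console.log(audit.filter((r) => r.outcome === "fail").length, "failures so far");
```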

There are sub-clauses, as highlighted above, which require further research. Sub-clause 5.4.16, referenced above, talks about “sufficient time”. We can envisage this being achieved by giving the user the option to customize the recognition timeout(s) of a speech-enabled system. User research can help us determine a practical range of timeouts to offer the user.
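As a purely illustrative sketch of what “customizing the recognition timeout” could look like, the snippet below exposes user-adjustable timing preferences around a placeholder recognizer. The preference names, default values and the recognizeOnce stub are assumptions for illustration; they are not drawn from PAS 901:2025 or from any particular speech API.

```typescript
// User-adjustable timing preferences; the actual range offered to users
// would ideally come from the user research mentioned above.
interface SpeechTimingPreferences {
  silenceTimeoutMs: number;   // how long to listen before giving up
  endOfSpeechPauseMs: number; // pause length treated as "the user has finished"
}

const defaults: SpeechTimingPreferences = {
  silenceTimeoutMs: 8000,
  endOfSpeechPauseMs: 2000,
};

// Placeholder recognizer: a real system would call a speech API here.
// This stub simply resolves with null once the silence timeout elapses,
// so the example stays self-contained.
function recognizeOnce(prefs: SpeechTimingPreferences): Promise<string | null> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(null), prefs.silenceTimeoutMs),
  );
}

async function promptUser(
  prefs: SpeechTimingPreferences = defaults,
): Promise<string | null> {
  const transcript = await recognizeOnce(prefs);
  if (transcript === null) {
    // Rather than failing outright, offer more time or another modality.
    console.log("No speech detected; offer to extend the timeout or to type instead.");
  }
  return transcript;
}

// A user who needs longer could, for example, double the defaults.
promptUser({ silenceTimeoutMs: 16000, endOfSpeechPauseMs: 4000 });
```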

A number of sub-clauses deal with accuracy of speech recognition, and this throws up an interesting challenge, namely that testing recognition requires a “user input”. This is fundamentally different to things like testing colour contrast for WCAG. Contrast can be tested using software and algorithms, without a user being present to perceive the colours.

So, how are we to test speech recognition? Does it require accessibility professionals to “mimic” a range of non-typical speech, including stammers and stutters? No: this is clearly unacceptable, as well as being unfeasible.

One can envisage a bank of “standardized” audio samples that could be used for repeatable testing. Questions arise around how to collect samples, manage access, and handle data protection; we would need to guard against risks such as unauthorized voice cloning. We could collaborate with groups like the Speech Accessibility Project at the University of Illinois Urbana-Champaign. We would also need to ensure that the data used for testing is not the same as that used for training by software vendors (the familiar train/test/validation split).
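To make the sample-bank idea slightly more concrete, here is a sketch of the metadata each sample might carry. Every field name here is a guess at what such a bank would need; none of it comes from PAS 901:2025 or from the Speech Accessibility Project.

```typescript
// Metadata for one entry in a hypothetical bank of standardized audio samples.
interface SpeechSample {
  id: string;
  audioUri: string;                // points into access-controlled storage
  referenceTranscript: string;     // what the speaker actually said
  speechCharacteristics: string[]; // self-described, e.g. ["stammer"] or ["dysarthria"]
  consent: {
    testingUseOnly: boolean;       // guards against re-use such as voice cloning
    mayBeSharedWithVendors: boolean;
  };
  split: "test" | "validation";    // never "train": must not appear in vendors' training data
}

const example: SpeechSample = {
  id: "sample-0001",
  audioUri: "https://example.org/samples/0001.wav",
  referenceTranscript: "turn on the kitchen lights",
  speechCharacteristics: ["stammer"],
  consent: { testingUseOnly: true, mayBeSharedWithVendors: false },
  split: "test",
};
```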

There are unresolved questions relating to the range of speech types and disfluencies we can test with, and to how to produce the “minor” and “significant” failures of speech recognition that are mentioned in sub-clauses of PAS 901:2025 (5.5.5, 5.5.6, and so on).
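One plausible way to operationalize “minor” versus “significant” failures would be a word error rate (WER) computed against the reference transcripts in such a sample bank. The sketch below uses a standard word-level edit distance; the 5% and 20% thresholds are invented purely for illustration, as PAS 901:2025 does not specify any.

```typescript
// Word error rate: word-level Levenshtein distance divided by reference length.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  const d: number[][] = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
    }
  }
  return ref.length === 0 ? 0 : d[ref.length][hyp.length] / ref.length;
}

// Invented thresholds for illustration; PAS 901:2025 does not define these.
function classifyFailure(wer: number): "ok" | "minor" | "significant" {
  if (wer <= 0.05) return "ok";
  if (wer <= 0.2) return "minor";
  return "significant";
}

const wer = wordErrorRate("turn on the kitchen lights", "turn on the chicken lights");
console.log(wer.toFixed(2), classifyFailure(wer)); // "0.20 minor"
```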

And there is always an alternative to an audit-style approach, in the form of user testing with a diverse range of participants.

That concludes my critique of PAS 901:2025, and our discussion of testing challenges and methods.
