Imagine using the Internet as a blind person.
As an occasional web-developer, I had some awareness of the importance of accessibility for the web, but to be honest it was pretty superficial. You just add ALT tags to your images, make sure you can tab between all the controls on the page, and a screen-reader will sort out the rest, right?
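For anyone who hasn't done this themselves, the "superficial" version of accessibility I mean looks something like this (an illustrative fragment with made-up names, not from any real page):

```html
<!-- Alt text gives a screen-reader something to say for an image. -->
<img src="logo.png" alt="Acme Widgets home">

<!-- Native controls are keyboard-focusable by default… -->
<a href="/news">Latest news</a>
<button type="submit">Search</button>

<!-- …and tabindex="0" puts a custom widget into the tab order. -->
<div tabindex="0" role="button">Open menu</div>
```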
I went to an event in London a couple of weeks ago, where the reality was brought home to me.
Screen-readers are not as intelligent or as helpful as I’d assumed. They just read out everything on the page.
Imagine a typical modern web app… Facebook, for example. Start reading everything from the top left of the page, and carry on until you reach the bottom right. Imagine what that might be like.
The best analogy I can think of is to picture the worst possible automated phone menu experience. The sort where they read you a long list of almost-unintelligible options: “for blah-blah-blah, press 1, for blather-blather-blather, press 2, for something-or-other, press 3 …. for something-else-vague, press 9 …”
None of the options seem like an exact match for the task that you have in mind, and by the time you’ve got to the end, you can’t remember whether the option that sounded sort of vaguely similar was option 3 or option 4…
Imagine that for a web page. Apparently, a screen-reader can take three or four minutes to read out the contents of a typical web page today. Can you imagine an automated phone system that spent four minutes listing your options, then expected you to try and choose which one you wanted?
That’s the experience that many blind people face when trying to use modern web apps that we take for granted.
ALT tags are all well and good, but making a web page accessible isn’t the same as making it usable.
So… as geeks with a passion for technology and an interest in making the web useful to all, what can we do?
The Conversational Internet
That was the question posed by the RLSB – a society for the blind – at an event they hosted in London a couple of weeks ago.
The demo that they showed was a video of someone trying to perform a task on the web using a screen reader. Never mind the futurology concept bit towards the end for the moment – the important bit is in the first 01:30, with the demo of the problem.
They had a clear perspective of the problem, and a general vision of what could help.
They described it as “The Conversational Internet”. The question was whether it could be possible for blind users to interact with the web at a higher, much more task-oriented level.
Instead of having an endless list of links and fields on a page read out as options, why can’t a user just say what it is that they want to do? Clearly inspired by the promise in the marketing for the iPhone 4S’s Siri interface, the vision is kinda compelling.
I go to Facebook, but no longer have to try and navigate to the text entry box for making a status update by tabbing through every entry field in turn until I get to one that sounds like the right one. Instead I just say that I want to update my status.
Or I go to a website for my local leisure centre, and no longer have to listen to a list of every type of service and activity they provide. Instead I just ask what activities are available in my area at the weekend.
Is this possible? Can we do this?
This was the discussion that RLSB hosted – with representatives from tech companies like Google, Samsung, Cisco and IBM, consumer-facing businesses like the Post Office and RBS, academics from Universities like Queen Mary and Brighton, charities for the blind like RLSB and Vision Charity, and others.
My notes from the evening aren’t very complete, and I was remiss at noting who said what, so apologies to everyone that I will now fail to attribute, or just entirely misrepresent.
This was a very brief meeting. It was an opportunity to introduce the problem to us, and give us a short chance to bounce around some initial reactions. So these ideas were not fully-formed or thought through.
Many of the ideas discussed seemed, to me, to fit into one of three general approaches:
Automated natural language analysis of web pages
Can automated natural language analysis interpret both a user’s request, and the contents of a web page, in order to perform the requested task without needing the user to choose from a list of options? Think of how IBM Watson demonstrated the potential of interpreting questions in, and identifying answers from, natural unstructured text – but on a much smaller and more focused scale.
Custom task support using underlying APIs
What about ignoring the web page altogether and using the underlying APIs that many web apps make available nowadays to build a conversational interface? Think of how the iPhone’s Siri has been programmed to provide a voice interface to the core phone apps like messaging and reminders, by being programmed to use the same underlying APIs that the UI apps use. Think of the sort of thing that zypr are doing.
Web developers providing support for a conversational interface
What about using more semantic markup on web pages so that screen-readers could be more intelligent?
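As a sketch of what “more semantic markup” could mean: WAI-ARIA landmark roles (and the equivalent HTML5 elements) already let a screen-reader announce a page’s regions and jump straight to one, rather than reading everything in document order. A hypothetical fragment:

```html
<!-- Landmark roles give a screen-reader a map of the page: -->
<!-- it can announce "banner, navigation, search, main" and jump -->
<!-- directly, instead of reading every link in between. -->
<header role="banner">…</header>
<nav role="navigation" aria-label="Main menu">…</nav>
<form role="search">
  <label for="q">Search this site</label>
  <input id="q" type="search" name="q">
</form>
<main role="main">
  <h1>Status updates</h1>
  …
</main>
```

A smarter screen-reader could use this structure as the skeleton of a conversation, rather than a reading order.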
Or, what about using the existing VoiceXML standard, and making it a standard alternate interface to web pages? We already have an accepted norm that web pages can define alternate representations in their metadata, such as the location of an RSS or Atom feed. So we already have an existing way to specify the location of a VoiceXML service, and VoiceXML itself is a long-standing standard for defining voice-based interactions.
What if web developers used VoiceXML to define an alternate voice-interaction method for tasks that their site supports? Then a screen-reader could see when this was available, and use this instead of reading the contents of the page.
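To make that concrete: discovery could reuse the same link metadata we use for feeds. Note that the rel="alternate" convention for VoiceXML is the proposal here, not an existing standard, and the URLs and filenames below are made up:

```html
<link rel="alternate" type="application/voicexml+xml"
      href="https://example.com/voice/tasks.vxml">
```

The VoiceXML document it points at would define the site’s tasks as dialogues. A minimal sketch for the leisure-centre example, using VoiceXML’s built-in date field type:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- One form per task the site supports. -->
  <form id="find_activities">
    <field name="day" type="date">
      <prompt>Which day would you like to find activities for?</prompt>
    </field>
    <filled>
      <!-- Hand the collected answer to the site's back end. -->
      <submit next="https://example.com/voice/activities"
              namelist="day"/>
    </filled>
  </form>
</vxml>
```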
There were a lot of pros and cons discussed for all approaches (Remind me to hide when cameras appear, next time!).
The more automated approaches require the least rework by web developers, and so are less dependent on large-scale support to succeed. However, they are likely to be very technically complex. The less automated approaches seemed more technologically manageable, but would rely on enough web developers supporting the approach to be viable.
A number of hybrid approaches were discussed that could potentially take the best elements of each. For example, an automated approach could be prototyped to be good enough to work with a select number of web apps. If it can work well enough to demonstrate the concept for a dozen of the most-used web apps, maybe that demonstration could help drive the enthusiasm and support necessary to encourage broader adoption?
RLSB are looking to pull together a group to properly look into the area. What is already possible? What has already been done but could be brought together or even just better signposted to users? What technologies are emerging in the next few years that we could start trying out now?
I’ve tried to explain what I understood from the meeting I attended, but I’m not writing this on behalf of IBM or RLSB’s group. I’m writing this as an individual who thinks this is an exciting and ambitious idea with potential applications that go far beyond accessibility for the blind, and I’m writing this to find out what you think.
If you think you can help, let them know!