The Conversational Internet

Imagine using the Internet as a blind person.

As an occasional web developer, I had some awareness of the importance of accessibility for the web, but to be honest it was pretty superficial. You just add ALT tags to your images, make sure you can tab between all the controls on the page, and a screen-reader will sort out the rest, right?

I went to an event in London a couple of weeks ago, where the reality was brought home to me.

Screen-readers are not as intelligent or as helpful as I’d assumed. They just read out everything on the page.

Imagine a typical modern web app… for example, facebook. Start reading everything from the top left of the page, and carry on until you reach the bottom right. Imagine what that might be like.

The best analogy I can think of is to try and picture the worst possible automated phone menu experience. The sort where they read you a long list of almost-unintelligible options: “for blah-blah-blah, press 1, for blather-blather-blather, press 2, for something-or-other, press 3 … for something-else-vague, press 9 …”

None of the options seem like an exact match for the task that you have in mind, and by the time you’ve got to the end, you can’t remember whether the option that sounded sort of vaguely similar was option 3 or option 4…

Imagine that for a web page. Apparently, a screen-reader can take three or four minutes to read out the contents of a typical web page today. Can you imagine an automated phone system that spent four minutes listing your options, then expected you to try and choose which one you wanted?

That’s the experience that many blind people face when trying to use modern web apps that we take for granted.

ALT tags are all well and good, but making a web page accessible isn’t the same as making it usable.

So… as geeks with a passion for technology and an interest in making the web useful to all, what can we do?

The Conversational Internet

That was the question posed by the RLSB – a society for the blind – at an event they hosted in London a couple of weeks ago.

The demo that they showed was a video of someone trying to perform a task on the web using a screen-reader. Never mind the futurology concept towards the end for the moment – the important part is the first 1:30, which demonstrates the problem.

They had a clear perspective of the problem, and a general vision of what could help.

They described it as “The Conversational Internet”. The question was whether it could be possible for blind users to interact with the web at a higher, much more task-oriented level.

Instead of having an endless list of links and fields on a page read out as options, why can’t a user just say what it is that they want to do? Clearly inspired by the promise in the marketing for the iPhone 4S’s Siri interface, the vision is kinda compelling.

I go to facebook, but no longer have to try and navigate to the text entry box for making a status update by tabbing through every entry field in turn until I get to one that sounds like it’s the right one. Instead I just say that I want to update my status.

Or I go to a website for my local leisure centre, and no longer have to listen to a list of every type of service and activity they provide. Instead I just ask what activities are available in my area at the weekend.

Is this possible? Can we do this?

Some ideas

This was the discussion that RLSB hosted – with representatives from tech companies like Google, Samsung, Cisco and IBM, consumer-facing businesses like the Post Office and RBS, academics from universities like Queen Mary and Brighton, charities for the blind like RLSB and Vision Charity, and others.

My notes from the evening aren’t very complete, and I was remiss at noting who said what, so apologies to everyone that I will now fail to attribute, or just entirely misrepresent.

This was a very brief meeting. It was an opportunity to introduce the problem to us, and give us a short chance to bounce around some initial reactions. So these ideas were not fully-formed or thought through.

Many of the ideas discussed seemed, to me, to fit into one of three general approaches:

Automated natural language analysis of web pages

Can automated natural language analysis interpret both a user’s request, and the contents of a web page, in order to perform the requested task without needing the user to choose from a list of options? Think of how IBM Watson demonstrated the potential of interpreting questions in, and identifying answers from, natural unstructured text – but on a much smaller and more focused scale.
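
To make that concrete, here is a deliberately toy sketch of the automated approach: pull the clickable and fillable elements out of a page, then score them against the words in the user’s spoken request. The element names and the word-overlap scoring are purely illustrative; a real system would need something far closer to Watson-scale language analysis.

    from html.parser import HTMLParser

    class ActionCollector(HTMLParser):
        """Collects links and form fields, with whatever text describes them."""

        def __init__(self):
            super().__init__()
            self.actions = []          # (description, element) pairs
            self._current_link = None

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a":
                self._current_link = ""
            elif tag in ("input", "textarea", "select"):
                label = attrs.get("aria-label") or attrs.get("placeholder") or attrs.get("name", "")
                self.actions.append((label, tag))

        def handle_data(self, data):
            if self._current_link is not None:
                self._current_link += data

        def handle_endtag(self, tag):
            if tag == "a" and self._current_link is not None:
                self.actions.append((self._current_link.strip(), "a"))
                self._current_link = None

    def best_match(request, html):
        """Return the page element whose description best overlaps the request."""
        collector = ActionCollector()
        collector.feed(html)
        request_words = set(request.lower().split())
        scored = [
            (len(request_words & set(desc.lower().split())), desc, tag)
            for desc, tag in collector.actions if desc
        ]
        return max(scored, default=None)

    page = """
    <a href="/home">Home</a>
    <a href="/friends">Find friends</a>
    <textarea aria-label="Update your status"></textarea>
    """
    print(best_match("I want to update my status", page))
    # -> (2, 'Update your status', 'textarea')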

Custom task support using underlying APIs

What about ignoring the web page altogether and using the underlying APIs that many web apps make available nowadays to build a conversational interface? Think of how the iPhone’s Siri provides a voice interface to the core phone apps like messaging and reminders by using the same underlying APIs that the UI apps use. Think of the sort of thing that zypr are doing.
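
A minimal sketch of what that could look like, mapping a couple of spoken intents straight onto a web app’s REST API. The endpoint, intents and token handling here are hypothetical placeholders for illustration, not any real service’s API.

    import json
    import urllib.request

    API_BASE = "https://api.example.com"   # hypothetical web app API
    ACCESS_TOKEN = "user-access-token"     # obtained through the app's normal auth flow

    # Each intent maps a trigger phrase to the API call that performs the task.
    INTENTS = {
        "update my status": ("POST", "/me/status"),
        "read my messages": ("GET", "/me/messages"),
    }

    def handle_request(spoken_text, payload=None):
        """Pick the intent whose trigger phrase appears in the request,
        then call the underlying API instead of driving the web page."""
        for phrase, (method, path) in INTENTS.items():
            if phrase in spoken_text.lower():
                data = json.dumps(payload).encode() if payload else None
                req = urllib.request.Request(
                    API_BASE + path,
                    data=data,
                    method=method,
                    headers={
                        "Authorization": "Bearer " + ACCESS_TOKEN,
                        "Content-Type": "application/json",
                    },
                )
                with urllib.request.urlopen(req) as resp:
                    return json.load(resp)
        raise ValueError("Sorry, I don't know how to do that yet: " + spoken_text)

    # handle_request("I want to update my status",
    #                {"message": "Off to the leisure centre this weekend"})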

Web developers providing support for a conversational interface

What about using more semantic markup on web pages so that screen-readers could be more intelligent?
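
ARIA landmark roles are one existing example of that kind of markup. If pages labelled their regions with them, a screen-reader could start by offering a short, high-level summary of the page rather than reading everything. A rough sketch, where the landmark roles are standard ARIA but the summarising behaviour is the speculative part:

    from html.parser import HTMLParser

    # Standard ARIA landmark roles that identify the major regions of a page.
    LANDMARK_ROLES = {"banner", "navigation", "search", "main", "complementary", "contentinfo"}

    class LandmarkFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.landmarks = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if attrs.get("role") in LANDMARK_ROLES:
                # Prefer the author's label, fall back to the role name itself.
                self.landmarks.append(attrs.get("aria-label", attrs["role"]))

    finder = LandmarkFinder()
    finder.feed('<div role="navigation" aria-label="Main menu"></div>'
                '<div role="search"></div><div role="main"></div>')
    print("This page has:", ", ".join(finder.landmarks))
    # -> This page has: Main menu, search, main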

Or, what about using the existing VoiceXML standard, and making it a standard alternate interface to web pages? We already have an accepted norm that web pages can define alternate representations in their metadata – such as specifying the location of an RSS or Atom feed – so the same mechanism could specify the location of a VoiceXML service. And VoiceXML is a long-standing standard for defining voice-based interactions.

What if web developers used VoiceXML to define an alternate voice-interaction method for tasks that their site supports? Then a screen-reader could see when this was available, and use this instead of reading the contents of the page.
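
As a rough illustration, the discovery step could borrow the RSS/Atom autodiscovery convention: the page advertises a VoiceXML alternate in its metadata, and the screen-reader checks for it before falling back to reading the page. The rel/type convention below is an assumption made for the sketch, not an agreed standard, although application/voicexml+xml is the registered media type for VoiceXML.

    from html.parser import HTMLParser

    VOICEXML_TYPE = "application/voicexml+xml"

    class AlternateFinder(HTMLParser):
        """Looks for a declared VoiceXML alternate, RSS-autodiscovery style."""

        def __init__(self):
            super().__init__()
            self.voice_url = None

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if (tag == "link" and attrs.get("rel") == "alternate"
                    and attrs.get("type") == VOICEXML_TYPE):
                self.voice_url = attrs.get("href")

    page_head = ('<link rel="alternate" type="application/rss+xml" href="/feed.rss">'
                 '<link rel="alternate" type="application/voicexml+xml" href="/voice.vxml">')
    finder = AlternateFinder()
    finder.feed(page_head)
    if finder.voice_url:
        print("Voice interface available at", finder.voice_url)   # -> /voice.vxml
    else:
        print("No voice interface declared; fall back to reading the page")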

There were a lot of pros and cons discussed for all approaches (Remind me to hide when cameras appear, next time!).

The more automated approaches require the least rework by web developers, and so are less dependent on large-scale support to succeed. However, they are likely very technically complex. The less automated approaches seemed to be more technologically manageable, but would rely on enough web developers supporting the approach in order to be viable.

A number of hybrid approaches were discussed that could potentially take the best elements of each. For example, an automated approach could be prototyped to be good enough to work with a select number of web apps. If it can work well enough to demonstrate the concept for a dozen of the most-used web apps, maybe that demonstration could help drive the enthusiasm and support necessary to encourage broader adoption?

What now?

RLSB are looking to pull together a group to properly look into the area. What is already possible? What has already been done but could be brought together or even just better signposted to users? What technologies are emerging in the next few years that we could start trying out now?

I’ve tried to explain what I understood from the meeting I attended, but I’m not writing this on behalf of IBM or RLSB’s group. I’m writing this as an individual who thinks this is an exciting and ambitious idea with potential applications that go far beyond accessibility for the blind, and I’m writing this to find out what you think.

If you want to know what they really think, find the RLSB at www.rlsb.org.uk/2011/conversation and on twitter at @RLSBcharity.

If you think you can help, let them know!


8 Responses to “The Conversational Internet”

  1. Rob Hayden says:

    Interesting topic

    Seems to me that you could also cater for other physical disabilities with an approach that allowed voice control of defined tasks, navigation and search.

    It’s a bit surprising that after 20 years we don’t have standards to define inter- and intra-page navigation, page- and site-level tasks, and search. A first step should be to standardise those in a way which can be layered over existing sites without recoding. Automation can come later.

  2. dale says:

    It’s a good point… relying on a web developer to use a sensible tab order doesn’t seem like enough, does it?

    It still fundamentally relies on the user trying to build a mental model of the page structure, and keeping that in their head as they navigate through the items one tab at a time. I’m amazed that people manage at all.

  3. David Colbourn says:

    An API approach is good so long as you know the target. Semantic markup will need several context predicates and some commonly understood taxonomy, which may be an education issue; but the idea of bringing the human-machine interface closer to humans goes against this pre-knowledge on the part of the human. That is just training under another guise.

    The heavy lifting needs to be on the machine side, and I think this means AI self-learning or self-calibrating models, i.e. cognitive systems for relations. Can templates be shared by regional dialect and demographic? Sure, and Siri is a great start, but even they are reaching into personalized user results. Rather than reinventing the wheel over and over, the Open Agent Architecture may seem the way to go for specialties, but that will get back to nomenclature and educational requirements. A personalized assistant that learns will be required if the interface is to move closer to the average human, but the problems of the blind leading the blind, or unknown unknowns, will still need a way of linking a personalized assistant and an open agent or specialized vocabulary. We may not be able to get out of the teaching business and have the system do it all, but we are moving in that direction.

  4. […] written about this before but here is a […]

  5. […] I had the idea for “Conversational Internet” bouncing around my head. I’d been to an event hosted by a charity for the blind, RLSB, last December, where they’d pitched the vision, and there’d been a meeting and emails since then. By […]

  6. Alastair Somerville says:

    The issue of inability to skim visually and pick relevant information is much wider than visual impairment.

    As websites have become more graphically complex it is also an issue for people with cognitive impairments.

    The logical, sequential nature of most existing assistive technology is a problem as it is neither context nor user interest aware.

    I am not a web accessibility person (I work in accessible information and tactile design) but concepts like ARIA seem interesting.

    The ability to provide meta-signposts to users so they can find relevant information more easily.

    A complete review of how assistive technology discusses and then seeks information out would be a good thing too, and that is the conversational web you discuss.

  7. dale says:

    That’s an interesting perspective – I hadn’t thought of it like that. Thanks!

  8. […] year ago, I wrote about RLSB’s event which brought together a handful of representatives from tech companies, consumer-facing […]