Web technologies I saw at W4A


Last month I went to the International World Wide Web Conference for W4A. I saw a lot of cool web technologies and accessibility projects while I was there, so I thought I would share links to some of the more interesting bits.

There are too many to put in a single post, so I’ll write a few posts to cover them all.


Subtitles and transcripts came up a few times. One study presented looked at online video, comparing single-line subtitle captions overlaid on the video with multi-line off-screen transcripts adjacent to it.

It examined which is more effective from a variety of perspectives, including readability, reader enjoyment, the effect on understanding and so on. In summary, it found that overlaid captions are generally better, although transcripts are better for content which is more technical.

Real-time transcription from a stenographer at W4A

We had subtitles for all the talks and presentations. Impressively, a separate screen showed a live transcription of what the speaker was saying. This let deaf attendees follow the talks. For talks given in Portuguese, the English subtitles also let non-Portuguese speakers like me understand.

They did this by having live stenographers listening to an audio feed from the talks. This is apparently expensive, as stenography is a specialist skill, and it needs to be scheduled in advance. It’s perhaps only practical for larger conferences.

Legion Scribe

This was the motivation for one of the more impressive projects that I saw presented: Legion Scribe, which crowd-sources real-time captioning so that you don’t need an expert stenographer.

Instead, a real-time audio stream is chopped up into short bits, and divided amongst a number of people using Mechanical Turk. Each worker has to type the short phrase fragment they are given. The fragments overlap, so captions that each worker types can be stitched back together to form captions for the whole original audio stream.

All of this is done quickly enough to make the captions appear more or less in real-time.
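To make the stitching step concrete, here is a minimal sketch in Python. It is not Legion Scribe's actual algorithm: it assumes the fragments arrive in time order, that every worker typed their fragment perfectly, and that neighbouring fragments share at least one word. The function name `stitch` and the sample fragments are made up for illustration.

```python
def stitch(fragments: list[str]) -> str:
    """Join time-ordered fragments by merging overlapping words.

    For each new fragment, find the longest run of words that ends the
    caption so far and also starts the fragment, and append only the rest.
    """
    words: list[str] = []
    for fragment in fragments:
        new_words = fragment.lower().split()
        overlap = 0
        # Try the longest possible overlap first, then shrink.
        for size in range(min(len(words), len(new_words)), 0, -1):
            if words[-size:] == new_words[:size]:
                overlap = size
                break
        words.extend(new_words[overlap:])
    return " ".join(words)


# Hypothetical output from three workers, each hearing an overlapping slice.
fragments = [
    "the fragments overlap so captions",
    "so captions that each worker types",
    "each worker types can be stitched back together",
]
caption = stitch(fragments)
print(caption)
```

The real system presumably has to cope with typos, missed words and workers who disagree with each other, all of which this sketch ignores.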

Seriously impressive.

And they’re getting reasonable levels of coverage and accuracy. The system has been designed so that workers don’t need to be experts in the domain that they’re transcribing, as they’re only asked to type in a few words at a time, not whole passages. With enough people, it works. With at least seven workers, it approaches the coverage you can get with a professional stenographer.

Assuming that Mechanical Turk can provide a plentiful supply of workers, this would not only be cheaper than a stenographer, but would also let you start captioning at a moment’s notice, rather than needing to arrange for a stenographer in advance.

Map Reduce in the browser

Speaking of crowd-sourcing, the idea of splitting up a large computing task between a large number of volunteer computers isn’t new. SETI@home is perhaps the best known, while World Community Grid is a recent example from IBM.

But these need users to install custom client software to receive the task, perform it and submit the results.

One project showed how this could be done in web browsers. A large computing task is divided up into map reduce jobs, which are made available through a website. Each web browser that visits the website becomes a map reduce worker, running their task in the background using web workers. As long as the user remains on the site, their browser can continue to contribute to the overall task in the background, without the user having had to install custom client software.
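The shape of the idea can be sketched in a few lines. The project itself ran in browsers using web workers; the Python below is only an illustration of the split, map and reduce steps, with a thread pool standing in for visiting browsers. The word-count task and the names `map_job` and `reduce_results` are assumptions for the example, not anything from the project.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_job(chunk: str) -> Counter:
    """The piece of work handed to one worker: count words in one chunk."""
    return Counter(chunk.lower().split())


def reduce_results(partials: list[Counter]) -> Counter:
    """Combine the partial counts sent back by all the workers."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# The large task, already divided into independent chunks.
chunks = ["the web is big", "the browser is a worker", "a big web"]

# Each thread plays the part of one browser visiting the site.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_job, chunks))

word_counts = reduce_results(partial_counts)
print(word_counts.most_common(3))
```

The important property is that each map job is independent, so it doesn't matter which worker picks it up or in what order the results come back.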

It’s an elegant idea. Not all sites would be well suited to it, but there are plenty of web sites that I keep open all day (e.g. Gmail, Remember The Milk, Google Calendar, etc.) so I think the idea has potential.

Migrating browser sessions

An interesting project I saw showed how the state of a browser app could be migrated from one browser to another, potentially a different browser running on a different machine, or even a different platform.

This is more than just the client-server session, which you could migrate by transferring cookies. They’re transferring the entire state of dynamic AJAX-y pages: what bits are open, enabled, and so on, for any arbitrary web app.

Essentially, they started by wanting to be able to serialize the contents of window, so that it could be transferred to another browser and used to restore the page there.

That wouldn’t be enough. window doesn’t have access to local variables in functions, it wouldn’t have access to most event listeners such as those added with addEventListener, it wouldn’t have access to the contents of some HTML5 tags like canvas, it wouldn’t have access to events scheduled with setTimeout or setInterval, and so on.

Serializing window gets you the current state of the DOM, which is a good start, but it isn’t sufficient to transfer the state of most web apps.

A prototype system called Imagen shows how this could be done. Looking at how they’ve implemented it, they’ve had to resort to using a proxy server which intercepts JavaScript going to the browser and instruments it with enough additional calls to let them access all of the stuff that wouldn’t normally be in scope. This is enough for them to be able to serialize the entire state of the page.

I can see a lot of uses for this, such as in testing, debugging or service scenarios, as well as just the convenience of being able to resume work in progress as you move between devices.

Inferring constraints on REST API query parameters

Many web services include constraints and dependencies for the query parameters, such as “this option is always required”, “that parameter is optional”, or “you have to specify at least one of this or that”. For example, the Twitter API docs explain how you have to specify a user_id or screen_name when requesting a user timeline.
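As a rough illustration of what such rules look like once written down, here is a small Python sketch that checks a request against them. The `Constraints` class and the `violations` function are hypothetical names for this example, not the model used by any particular project or API. Only the user_id / screen_name rule comes from the example above, and the "count" parameter is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Constraints:
    """Hypothetical rule set for one API endpoint."""
    required: set[str] = field(default_factory=set)
    at_least_one_of: list[set[str]] = field(default_factory=list)


def violations(params: dict[str, str], rules: Constraints) -> list[str]:
    """Return a description of each rule that the request breaks."""
    present = set(params)
    problems = []
    for name in sorted(rules.required - present):
        problems.append(f"missing required parameter: {name}")
    for group in rules.at_least_one_of:
        if not group & present:
            problems.append("need at least one of: " + ", ".join(sorted(group)))
    return problems


# The user timeline rule: a user_id or a screen_name must be given.
timeline_rules = Constraints(at_least_one_of=[{"user_id", "screen_name"}])

print(violations({"count": "10"}, timeline_rules))
print(violations({"screen_name": "example"}, timeline_rules))
```

Anything not listed in a rule is treated as optional, which is why "count" on its own only triggers the "at least one of" complaint.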

One project I saw was an attempt to automatically infer these rules and dependencies through a combination of natural language processing, to recognise them in API documentation, and automated source code analysis of the sample code provided for web services. It combines these into an estimated model of the constraints in the REST API, which is then verified by submitting requests to the API.

They demonstrated it on APIs like Twitter, Flickr, Last.fm, and Amazon, and it was surprisingly effective.


Finally, there was a keynote talk on Wednesday by the founder of Duolingo.

reCAPTCHA, one of the speaker’s earlier projects, is particularly interesting because it uses a task that people need to do anyway (verify that they’re human) to crowd-source the completion of a task that needs to be done (digitise the text of old books that cannot be read by automated OCR).

Duolingo is similar. It takes a task that people need to do, which is to learn a new language, and uses that effort to translate texts into different languages.

It’s better explained by their demo video.

It’s been around for a little while, but I’d not come across it before. Since getting back from WWW, I’ve been trying it out. Even Grace has been using it to improve her French and seems to be getting on really well with it.

What else?

There were a lot of other cool projects and technologies that I saw, so I’ll follow this up with another post or two to share some more links.