The skills implications of Cognitive Computing

STEMtech is a conference about the education of science, technology, engineering and maths. The attendees are an interesting mix of people from education and policy makers, as well as people like me from industry.

This year, they invited me to do a talk. My slides are shared but they’ll make no sense by themselves. What follows is roughly what I think I said.

Today I’m going to be talking about how we teach children to use computers. I’m not going to be talking about the current provision for this. I’m not going to be talking about what we necessarily need to be teaching children today, or maybe even tomorrow – partly because that’s already being well covered in the rest of the agenda today.

Instead, I’d like to use this session to think a little further ahead.

Big changes are coming in computing, and I think we should start thinking about how we’ll need to respond to them.

Big Trak. I loved this thing.

We used to have this at school when I was a kid, and I think I remember those lessons more strongly than any other lesson I did at school.

It had a keypad on the back. We’d program in a series of instructions – drive forwards this much, turn this much, fire the laser!

This was computing to me when I was young. This was robotics. This was cool.

Not all schools used Big Trak, but a lot of schools used something like this.

Variations of Logo robot turtles have been widely used by many schools for many years, and are essentially the same thing – programming a series of instructions into a robot that follows them by driving around on the floor.

Fast forward to today, and I get to spend my Monday afternoons running a Code Club that I started in my local primary school: an after-school programming group. I get just an hour or two a week to see how kids learn about computers, what they think of them, how they approach them.

We use Scratch in Code Club, just as the school are now doing in lessons. It gives a visual, drag-and-drop way to build a sequence of instructions, which are followed by sprites moving around on the screen. This is how we explain programming and computers to children.

What strikes me doing this is how similar it is to what I was doing as a kid. It’s not exactly the same, but if you squint a bit and stand back a bit, it feels pretty similar. They’re both about showing children how to break an activity down into a series of steps.

And that’s to be expected. Programming itself is very similar today to what it was when I was at school. In some ways programming today is the same as it was five, ten, twenty years ago – it’s still about getting machines to perform tasks by getting people to define the sequence of instructions to follow. It’s only natural that this will be reflected in the way we introduce this to children.

But this is going to change.

We’re starting to see a future of computing that is going to be different, and we’re going to need new metaphors and new ways of introducing it and explaining it.

Computers have changed before.

We group the evolution of computers into eras, and they start in the 1800s in the era of tabulating machines.

I’m talking about systems like punched card machines.

Machines that could count more things, more quickly and more accurately than people possibly could.

Machines that were considered revolutionary because they enabled the US census in 1890 to be analysed before they needed to start the 1900 census.

Machines like the Hollerith tabulating machine were the computers of their day. This was the state of the art in technology.

Teaching computing then would’ve meant teaching about the capabilities of these machines. I don’t just mean the mechanics of feeding the cards into the machine, although that would’ve been part of it. I mean about looking at problems as collections of things that can be counted.

The era of tabulating machines was followed by the era of programmable machines, starting around about the 1940s with machines like Colossus.

Shifting between eras isn’t instant. We didn’t just turn off all the punched card machines one day and start using programmable computers. It was gradual, there was overlap – both in terms of there being a time when we were using both kinds of machine, but also in the way that some early programmable computers included elements of the types of systems that came before. We transitioned over time from one way of thinking about computers to another.

What was different about programmable computers was that you didn’t have to just give the data to a machine to process, you could also give it the instructions that you wanted it to carry out on that data.

When I think of early programmable computers, I think of the machines of the 1950s and 1960s. Huge machines that would fill a room.

But this is still the era we’re in today. Our computers today are faster, and smaller, and more powerful – and better in so many other ways. But fundamentally, architecturally, conceptually – our computers today work to the same principles as these early systems.

We think the third era of computing will be the era of cognitive machines. In the same way that the transition to programmable computers wasn’t instant, neither will this be. So it’s debatable whether we’re already in an era of cognitive machines, whether we’re starting to see early signs of it, or whether it’s something that we’re anticipating. Regardless of the exact date it starts, this is what we think will characterize the next generation of computers.

I should clarify what I mean by cognitive computing, because it’s perhaps not a term that has got mainstream awareness yet.

I’ve got a couple of examples to help me explain it.

What is 2+2?

If I give you the instruction or the question 2+2, what answer would you give me?

I assume that most of you would answer 4. Two plus two equals four.

That’s certainly the answer I’d expect from a programmable system – a system that has been hard-coded with instructions to follow, including instructions for how to handle adding numbers together.

But what if my question was in the context of social sciences, in a discussion about family structures. Then I probably would’ve been talking about the family structure of two adults and two children.

Or in the context of automotive engineering, I probably would’ve been talking about a layout of car seats with two front seats and two back seats.

Or in the context of card games, I might’ve meant a poker hand or a poker strategy.

The response I would’ve wanted will have been dependent on the context that my request was in. And this is the kind of behaviour that we would expect from a cognitive computer.

A programmable computer needs to be coded with the instructions to follow and the answer to return, and will always return that answer. Programmable computers are deterministic in this way. A calculator will always give me ‘4’, no matter how many times I ask 2+2.

Cognitive computers will be more probabilistic. They’ll likely return a panel of possible answers instead of a single answer, each one associated with some level of probability that it’s the right response. And this won’t be fixed, but will take the context of the question into account.

By context, I don’t just mean the situational context. It’s also about a knowledge of the things mentioned.

Consider this example, and what you think it means.

Policeman helps dog bite victim.

You’d probably assume that this means that there is a dog bite victim.

And the policeman helps him.

Policeman helps dog bite victim.

You probably wouldn’t assume that it means that the policeman helped the dog to bite the victim.

But that’s a valid way to parse the sentence.

So parsing isn’t all that you do – you use your knowledge of the things mentioned in the sentence to handle the ambiguity of the English language. You know that the police help people. You know it’d be unusual for a policeman to help a dog bite someone.

You use this knowledge to choose between the different possible interpretations of the sentence.

This is also the behaviour we’d expect from a cognitive computer.

Systems that interact with us in our own language. Instead of us having to use machine languages or machine interfaces to work with computers, we’ll work with cognitive computers in our own languages – languages like English.

Systems that are probabilistic rather than deterministic in the way they work and the answers that they return.

Systems that take context into account, and learn how to apply their knowledge to identify the most likely responses.

In 1997, an IBM computer beat the grandmaster Garry Kasparov at a game of chess. We did that as a demonstration of the progress that we’d made in a technical field (in that case, things like massively parallel computing) but also as a way of explaining its potential.

We did something similar to demonstrate and explain the progress that we’re making with cognitive computing, this time by entering a computer into a TV quiz show.

Jeopardy is a TV quiz show in the US – a few contestants, buzzing in when they know the answers to questions, and winning cash prizes. It’s less well-known in the UK, but it’s huge in the US – a show that’s been going since the 1960s.

They ask difficult questions. Complex, sometimes cryptic questions, with a variety of grammatical forms.

In 2011, we entered an IBM computer called Watson as a contestant. This was our first attempt at building a cognitive computer, and we wanted to show how it was different.

Watson went up against Brad Rutter and Ken Jennings – two of the best players to have ever gone on Jeopardy. These guys are household names in the US because of their performance on this show – these were the Garry Kasparovs of the TV quiz show world.

Watson had to compete in the same way any contestant would. This wasn’t a search engine returning hundreds of thousands of possible documents. It needed to understand complex specific questions, and be able to come up with a single specific answer in seconds to be the first to the buzzer.

Some examples of the kinds of questions it got.

This was actually the final question in the show.

It’s talking about “Dracula” – the answer is Bram Stoker.

This is another Jeopardy question, from a round called “Lincoln Blogs”. The answer is “his resignation” – Chase submitted his resignation to President Abraham Lincoln three times.

But you need an understanding of the question, and the contextual knowledge that a resignation is something you submit, in order to get that.

This is a question about Mount Everest – about who was the first person to climb Mount Everest. But it doesn’t say that. It isn’t a single-clause extractive question like “Who was the first person to climb Mount Everest?” which would be much easier to interpret.

Again, answering this question precisely depends on knowing something about George Mallory and what he is known for, and knowing what kind of thing you might be “first” at in this context.

Another Jeopardy question, this time in a round called “Before and After”.

The answer it’s looking for is “A Hard Day’s Night of the Living Dead”

“A Hard Day’s Night” answering the Beatles bit, and “Night of the Living Dead” being the Romero zombie film.

“Before and After” is the clue that they’re looking for something with that overlap between them.

This question is showing its age now, given recent events in Cuba.

But at the time, I think the four countries that the US wasn’t getting along with were Bhutan, Cuba, North Korea and Iran. And the question is asking which of those four countries is furthest north.

Answering this question correctly involves needing to recognise that there is a political element here (which countries does the US not have diplomatic relations with) and then a spatial one (which one is furthest north).

By the end of the shows, Watson had not only won, but had done so with a higher score than the two Jeopardy “grandmasters” combined.

Even within the limited constraints of the TV quiz show format, it gave us an insight into what the future might be like.

It showed what interaction with computers in the future will be like. Watson got the question from the quiz show host, understood it, buzzed in and spoke its answer.

The quiz show is just one way to try and explain this.

A more recent example is Chef Watson: a system that is learning about food and cooking, and using that to design new dishes and new meals.

You can give it some constraints, like that you’d like a chicken dish, or that you want something like a stir-fry, or that you’d like something influenced by a cuisine like Chinese. And you can tell it how adventurous and surprising you’d like it to be.

And it will design a new dish for you.

This isn’t about building a search engine for existing recipes.

Chefs from the Institute of Culinary Education (ICE) are working with Watson to create new recipes.

They’ve trained it by giving it a massive number of recipes to read. We didn’t manually prescribe types of dishes, types of cuisines and so on – we didn’t prescribe what characteristics we think an Indian dish for example would have. Instead, Watson had to learn that there is a type of cuisine like Caribbean and what that means as it came across a range of Caribbean recipes in what it’s read.

It was also given the chemical descriptions of a wide range of ingredients, and trained with a massive range of experiences of people’s reactions to specific flavour combinations.

Watson learned which combinations people liked, and which they didn’t – and identified connections and patterns in the molecular combinations behind that.

It’s about Watson as an assistant in the kitchen – making suggestions and offering ideas.

It’s helping us improve our understanding of creativity. It’s helping us improve our understanding of what is involved in being creative (something that we traditionally associate with being a very human thing), and explore opportunities for cognitive computing to help with this.

We’ve used Chef Watson in the same way that we used the quiz show: to help explain the potential behind cognitive computing.

Putting Watson and the Chefs onto a food truck and taking them to conferences and events is a way of making it real. Letting people try the food that they come up with is one way to try and start a conversation about how cognitive computing is going to be different.

Similarly, Watson has published a cook book of some of its recipes.

It’s kind of fun, and not something that I would’ve expected us to be doing a few years ago.

But we’re starting to find ways to explain cognitive computing through giving people tangible experiences.

Cognitive computing is still a very new and emerging concept, so it’s hard to find a single agreed definition of what it means.

But I’ve collected a few examples of how it’s being described in technology, industry and academia.

Forrester have described it as computers that learn, computers that can interact with us, and computers that make evidence-based recommendations.

IBM has talked about the way cognitive systems will transform the way people will interact with computers, and highlighted that these systems will draw on knowledge from massive amounts of data.

Gartner, who describe this as the smart machines era, say that this is going to be the most disruptive change in the history of IT, and talk about this enabling things that we didn’t think computers would be able to do.

MIT have talked about the collaborative nature of working with systems like these – like I tried to describe about Chef Watson, this is about systems working with people to create things that neither might have done separately.

The British Computer Society have talked about cognitive computing as systems that learn through experiences instead of following a prescribed sequence of tasks, and highlighted that these will handle massive amounts of information.

Sticking with this British Computer Society paper for a moment, they also highlight that there is a skills gap here.

For programmable systems, we need people who can understand the overall task a machine needs to achieve, and can identify and describe the specific steps needed to do that.

Working with cognitive systems will be different. We need people who can identify the learning and experiential opportunities that a system will need in order to learn how to achieve the task.

There will be a need for a generation of technologists who can work with systems like this.

Preparing Watson for Jeopardy is a good example of this. We didn’t try to pre-empt what questions might come up on the show and pull together a set of answers to look them up in (not that I think such an approach would be feasible or scalable).

Instead, Watson prepared for Jeopardy by reading and extracting an understanding from a wide range of sources. I don’t mean game-show-specific sources. I don’t mean tabular data, or structured data, or data that has been manually prepared to be machine readable. I’m talking about encyclopaedias and dictionaries, newspapers and magazines, books and much more. Hundreds of sources of text – stuff that has been written for use by people.

And it learned how to use this knowledge by playing Jeopardy matches. Lots and lots of Jeopardy matches, to give it the experiences it needs to learn how to use its knowledge, and when to use each of the many hundreds of strategies it has under the covers. Some questions are best handled in these ways, while other questions are better handled in other ways.

Watson learned how by playing the game, and got better through sparring matches with other previous champions from the Jeopardy TV show.

Since the TV show in 2011, a big focus for our work with Watson has been healthcare.

Jeopardy was about taking a wide range of general knowledge sources, letting Watson extract knowledge from that, and then giving it the experiences necessary to learn how to use that knowledge to do the task of playing a gameshow.

After Jeopardy, we started giving Watson medical sources – text books, journals, research papers, treatment guidelines, medical records, and then working with doctors and clinicians to give Watson the experiences necessary to use that knowledge to support doctors and nurses.

We’ve partnered with cancer centres like Memorial Sloan Kettering and MD Anderson to do this. What we need are partners who understand what the system will need to be able to do, and can work backwards from there to identify what knowledge it will need and what experiences the system will need to have in order to use that knowledge to do it.

In many ways, teaching hospitals are ideal partners for us in this because they do this for their medical students. And the metaphor of Watson going to medical school is one that seems to have stuck. It’s not quick – it’s taking roughly as long as it would take a person to go through medical school.

But it’s working. Watson is being used today, albeit at relatively small scales, by doctors and nurses in the treatment and diagnosis of some of the world’s toughest diseases.

It doesn’t have to be so dramatic though. I think cognitive computing will become a part of all of our lives, not just something exclusive to specialists like doctors.

I went to a conference in Twickenham last year, and one of the talks was about a project for a mobile phone retailer, trialling cognitive computing as a way to answer the questions they get from their customers.

I loved the way that the speaker described what they’re doing – working with Watson, rather than using it. And he described it as being like having a new member of staff to train, and needing to identify what that new member of staff would need to read, and what experiences of customer interactions they could give it to teach it how to support them.

Examples of this are all around us. In the same way that the early programmable systems built on the achievements and techniques that had come before them, cognitive computers are building on years of progress in fields like machine learning and natural language processing.

Google Translate is a great example of this.

You put in some text in one language, and it can translate that into another.

Unlike many of the translation systems that came before it, they didn’t build this just by collecting together linguists and getting them to prescribe the instructions for translating every word.

Instead, they trained a machine learning system to be able to do this, using sources like documents from the United Nations. The UN is a great source for this as they produce a lot of documents, and have to translate them into a wide range of languages for all their member nations.

What you’ve got is a large number of examples that this in one language means that in another.

Cognitive computing will need us to approach problems in this way – not trying to come up with all the answers ourselves, but being able to identify how to give a computer the experiences it will need in order to help.

I said that there is going to be a skills gap here, and we’re already starting to see it in the graduates that join us.

We’re starting to tackle that by working with Universities to introduce modules on cognitive computing into their courses, and giving them access to instances of Watson for use in student projects.

But there will come a point where we need to start introducing it earlier, into colleges and schools.

We need to start thinking about how we explain the computers of the future to children.

We need to think about what is the cognitive computing equivalent of Big Trak.

That experience made computers real to me as a kid. It inspired me. It made the concept and the potential come alive. What is going to do that for cognitive computers?

In the same way that systems like Logo and Scratch have given us the way to let children try out and play with the concepts behind programmable computers, we need a way to do this for cognitive systems.

Scratch has its palette of blocks to snap together. What is going to be the metaphor to explain systems that need to be trained?

We talk about computers that can think. It’s obviously a metaphor, and is true in many ways but doesn’t hold in others. I’m not trying to imply I think these are going to be systems with a consciousness any time soon.

But how far can we take the metaphor? As we need people who can work with systems that think and learn, this needs to take into account the way that computers learn.

At the risk of pushing the metaphor too far, we need an approach built around the psychology of these emerging systems.

I started by saying that big changes are coming in computing. Tomorrow’s children are going to have amazing, exciting, powerful systems to play with, and they’re going to grow up and use them to achieve fantastic things.

But first we’re going to need to figure out how to get them started.


Unpacking binary data from MQTT in Javascript

While doing a trawl of Stack Overflow for questions I might be able to help out with, I came across this interesting-looking question:

Receive binary with paho mqttws31.js

The question was how to unpack binary MQTT payloads into double precision floating point numbers in JavaScript when using the Paho MQTT over WebSockets client.

Normally I would just send floating point numbers as strings and parse them on the receiving end, but sending them as raw binary means much smaller messages, so I thought I’d see if I could help to find a solution.

A little bit of Googling turned up this link to the JavaScript typed arrays, which looked like it would probably be a step in the right direction. At that point I got called away to look at something else so I stuck a quick answer in with a link and the following code snippet.

function onMessageArrived(message) {
  var payload = message.payloadByte()
  var doubleView = new Float64Array(payload);
  var number = doubleView[0];
  console.log(number);
}

Towards the end of the day I managed to have a look back and there was a comment from the original poster saying that the sample didn’t work. At that point I decided to write a simple little testcase.
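A minimal sketch of one likely reason that first snippet could not have worked, even once the payload is read from the `payloadBytes` property that the later snippets use. It assumes the payload arrives as a `Uint8Array`, and the `bytes` array here is made-up sample data rather than a real message. Passing a typed array to the `Float64Array` constructor copies each element as a number, whereas passing the underlying `ArrayBuffer` reinterprets the raw bytes.

```javascript
// Eight made-up bytes standing in for an MQTT payload (an assumption for illustration).
var bytes = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);

// Passing a typed array copies the elements: each byte becomes its own double.
var copied = new Float64Array(bytes);
console.log(copied.length); // 8
console.log(copied[0]);     // 1

// Passing the ArrayBuffer creates a view: the eight bytes are read as one double.
var viewed = new Float64Array(bytes.buffer);
console.log(viewed.length); // 1
```

So the bytes have to reach the `Float64Array` as a buffer, not as a list of numbers.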

First up, a quick little Java app to generate the messages.

import java.nio.ByteBuffer;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class MessageSource {

  public static void main(String[] args) {
    try {
      MqttClient client = new MqttClient("tcp://localhost:1883", "doubleSource");
      client.connect();

      MqttMessage message = new MqttMessage();
      ByteBuffer buffer = ByteBuffer.allocate(8);
      buffer.putDouble(Math.PI);
      System.err.println(buffer.position() + "/" + buffer.limit());
      message.setPayload(buffer.array());
      client.publish("doubles", message);
      try {
        Thread.sleep(1000);
      } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
      }
      client.disconnect();
    } catch (MqttException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}

It turns out that using typed arrays is a little more complicated and requires a bit of work to populate the data structures properly. First you need to create an ArrayBuffer of the right size, then wrap it in a Uint8Array in order to populate it, before changing to the Float64Array. After a little bit of playing around I got to this:

function onMessageArrived(message) {
  var payload = message.payloadBytes;
  var length = payload.length;
  var buffer = new ArrayBuffer(length);
  var uint = new Uint8Array(buffer);
  // Copy the payload bytes into the buffer in the order they arrived
  for (var i = 0; i < length; i++) {
    uint[i] = payload[i];
  }
  // Reinterpret the bytes as doubles using the platform's byte order
  var doubleView = new Float64Array(uint.buffer);
  var number = doubleView[0];
  console.log("onMessageArrived:" + number);
}

But this was returning 3.207375630676366e-192 instead of Pi. A little more head scratching and the idea of checking the byte order kicked in:

function onMessageArrived(message) {
  var payload = message.payloadBytes;
  var length = payload.length;
  var buffer = new ArrayBuffer(length);
  var uint = new Uint8Array(buffer);
  // Copy the payload bytes into the buffer in reverse order
  for (var i = 0; i < length; i++) {
    uint[(length - 1) - i] = payload[i];
  }
  // Reinterpret the reversed bytes as doubles using the platform's byte order
  var doubleView = new Float64Array(uint.buffer);
  var number = doubleView[0];
  console.log("onMessageArrived:" + number);
}

This now gave an answer of 3.141592653589793 which looked a lot better. I still think there may be a cleaner way to do this using a DataView object, but that’s enough for a Friday night.
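The byte-order effect can be reproduced without a broker. This is a small sketch, not the code from the original answer: `encodeDoubleBigEndian` and `readDouble` are hypothetical helper names, and the encoder only stands in for what the Java app publishes by writing the double big-endian.

```javascript
// Stand-in for the Java publisher: write one double in big-endian byte order.
function encodeDoubleBigEndian(value) {
  var buffer = new ArrayBuffer(8);
  new DataView(buffer).setFloat64(0, value, false); // false = big-endian
  return new Uint8Array(buffer);
}

// Read the first 8 bytes as a double with an explicit byte order.
function readDouble(bytes, littleEndian) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getFloat64(0, littleEndian);
}

var payload = encodeDoubleBigEndian(Math.PI);
console.log(readDouble(payload, false)); // read as big-endian: matches what was written
console.log(readDouble(payload, true));  // read as little-endian: a tiny, wrong number
```

Reading the same eight bytes in the wrong order gives a value of the same tiny magnitude as the 3.207375630676366e-192 seen earlier, which is what pointed at byte order in the first place.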

EDIT:

Got up this morning having slept on it and came up with this:

function onMessageArrived(message) {
  var payload = message.payloadBytes;
  var length = payload.length;
  var buffer = new ArrayBuffer(length);
  var uint = new Uint8Array(buffer);
  // Copy the payload bytes into the buffer in the order they arrived
  for (var i = 0; i < length; i++) {
    uint[i] = payload[i];
  }
  // Read each 8-byte group as a big-endian double (littleEndian = false)
  var dataView = new DataView(uint.buffer);
  for (var j = 0; j < length / 8; j++) {
    console.log(dataView.getFloat64(j * 8, false));
  }
}

This better fits the original question in that it will decode an arbitrary-length array of doubles. Since Java’s ByteBuffer writes big-endian by default, we can set the little-endian flag to false to get the right conversion without having to re-order the array as we copy it into the buffer (which I’m pretty sure wouldn’t have worked for more than one value).
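As a possible further tidy-up, here is a sketch that skips the copy and points the DataView straight at the payload’s own bytes. It assumes `payloadBytes` is a `Uint8Array` (the loops above index it like one); `decodeDoubles` is a hypothetical helper name, and the sample payload is built locally rather than received over MQTT.

```javascript
// Decode a payload of packed 64-bit doubles without copying it first.
// Assumes payloadBytes is a Uint8Array.
function decodeDoubles(payloadBytes, littleEndian) {
  if (payloadBytes.byteLength % 8 !== 0) {
    throw new RangeError("payload length is not a multiple of 8 bytes");
  }
  // byteOffset matters if the payload is a slice of a larger buffer.
  var view = new DataView(
    payloadBytes.buffer, payloadBytes.byteOffset, payloadBytes.byteLength);
  var values = [];
  for (var offset = 0; offset < payloadBytes.byteLength; offset += 8) {
    values.push(view.getFloat64(offset, littleEndian === true));
  }
  return values;
}

// Build a two-value big-endian payload locally, standing in for the Java publisher.
var sample = new Uint8Array(16);
var writer = new DataView(sample.buffer);
writer.setFloat64(0, Math.PI, false);
writer.setFloat64(8, 2.5, false);
console.log(decodeDoubles(sample, false));
```

Inside `onMessageArrived` this would be called as `decodeDoubles(message.payloadBytes, false)`. Rejecting payloads whose length is not a multiple of eight is a design choice; the loop above quietly ignores any trailing bytes instead.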

Thinking Digital 2014

This week I went up to Newcastle for Thinking Digital.

It was the seventh Thinking Digital, but my first.

I’d seen a bunch of references to it being the UK’s answer to TED, the tickets aren’t cheap, videos from previous years look slick and professional, it’s held in The Sage which is a hugely impressive venue, they manage to get a great line-up of speakers, and the logistics in the run-up to the event were more organised than any event I’ve been to before.

So… I was expecting a cool and geeky, if faceless, serious, formal, and intimidating event.

I’d read it completely wrong. It’s absolutely a professionally run event. And there was no shortage of cool geekiness. But, more than that, the organizer, Herb Kim, has created a real sense of community in it. There’s a feeling of almost familial warmth amongst attendees who come year after year after year.

And they do it without being too cliquey. Everyone I spoke to was very friendly and welcoming, which made the few days a lot easier for an introvert like me. A few days being surrounded by and trying to talk to and socialise with several hundred smart brilliant people is the kind of thing I normally find hugely draining and more than a little daunting. But the crowd at TDC make it easier than most.

They value their time there, too. More than one person told me they’d paid for their own ticket and expenses to attend. I’m used to corporate-run conferences where everyone is paid for by their employer, or barcamps where people moan about being asked for a five pound deposit, so this surprised me.

The talks made for a fascinating and thought-provoking couple of days. I can’t do them justice here (when videos of the talks are available I’ll embed/link them here instead) but I want to give an idea of what the programme was like.

Jeni Tennison – Open Data Institute
Talked about the potential impact of open data on society, giving examples of how open data could be used to inform and widen access to debate.

Maik Maurer – Spritz
Demonstrated their speed-reading technology – streaming one word at a time in a fixed place, for fast reading on mobile and wearable devices.

Gerard Grech – Tech City
Talked about the role of Tech City as a feedback loop between Government and the tech community.

Meri Williams – ChromeRose
Talked about the lessons that people managers could learn from artificial intelligence in how to inspire, motivate, and enable geeks to achieve great things.

Aral Balkan – indie Phone
Gave an impassioned and stirring talk entitled “Free is a Lie” about the conflict between advertising-led business models, and users’ privacy and other interests.

David Griffiths – foam
Talked about using his background in the video game industry to combine crowd-sourcing and gaming to perform impressive citizen science projects.

Chi Onwurah – MP for Newcastle Central
Talked about the parallels between technology and politics as driving forces for change, and the aims of the current Digital Government Review.

Mariana Mazzucato – University of Sussex
Argued that the image of the private sector as entrepreneurial and public sector as meddling and restrictive is an unhelpful myth and made the case for a bolder, entrepreneurial state.

Erin McKean – Wordnik
Talked about the limitations of search as a model for accessing data and the need for discovery engines to find what you don’t know you want.

Blaise Aguera y Arcas – Google
Described the history of machine intelligence and his predictions about what the future of machine intelligence might look like.

Carl Ledbetter – Microsoft
Outlined the history and evolution of digital entertainment, and described the process that went into the design of the Xbox One.

Jennifer Gardy – BC Centre for Disease Control
Described our progress in increasing our understanding of the human genome, and where its complexity lies.

Peter Gregson – Cellist
Gave a representation of the genome work that Jennifer had described. Instead of a data visualisation, it was a sonification. Using a cello.

Sean Carasso – Falling Whistles
Told an inspiring story of how he came to learn about the terrible things happening in Congo, and how he went about trying to bring peace.

Conrad Bodman – The Barbican
Argued for recognition of the impact of digital tech on the arts, and described his projects to exhibit and showcase video games, animation, and digital effects.

Mark Dearnley – HMRC
Described the challenges and need for technology in what HMRC do, and their digital ambition for the future.

Xavier De Kesteller – Foster + Partners
Talked about an amazing project to build a base on the moon, using autonomous robots with 3D printing heads to print a building out of moon dust.

Susan Mulcahy – Imperial College London
Gave an energetic performance to describe the role of the red blood cell, and the science behind understanding brain injury.

Carlos Ulloa – HelloEnjoy
Showed what was possible using WebGL, bringing native 3D gaming to the browser without the need for plugins.

Jonathan O’Halloran – QuantuMD
Described his work to create a mobile genetic-testing device, and the potential that real-time epidemiology from a mobile device could bring.

Blaise Aguera y Arcas – Google
Talked about changes needed in society when more jobs are replaced by technology, and his observations about changes in gender dynamics.

Steve Mould – BBC
Gave an entertaining talk about how he discovered, and tried to understand the science behind, the bead chain fountain.

Tom Scott – Us Vs Th3m
Ended the conference with a fantastic performance showing what the impact of technology might be like in 2030.

Dale Lane – IBM
And I did a Watson talk. I really didn’t want it to seem like a sales pitch, so I tried to put it in a bigger context of being a step forwards in changing how we use computers. I talked about why I work on Watson, what motivates and inspires me about it, and why I think what we’re doing is difficult but hopefully valuable. And I walked through a short demo to explain the value I see in where we are even now. Annoying technical issues (Keynote + clicker + multiple screens = fail) aside, it went okay. It was a lot to try and fit into 20 minutes, so I talked fast. :-)

Overall…

It was a fantastic event, and one I’d wholeheartedly recommend.

If you can get to a future Thinking Digital, you absolutely should.

It’s one of the most thought-provoking and interesting couple of days I’ve had in a long time.

Full disclosure: As a speaker, I didn’t have to pay for a ticket to attend this event. My travel and accommodation costs were paid for by IBM.


W4A : Accessibility of the web

This is the last of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

Several presentations looked at how accessible the web is.

Web Accessibility Snapshot

In 2006, an audit was performed by Nomensa for the United Nations. They reviewed 100 popular websites for conformance to accessibility guidelines.

The results weren’t positive: 97% of sites didn’t meet WCAG level 1.

Obviously, conformance to guidelines doesn’t mean a site is accessible: it’s necessary, but not sufficient. Conformance can’t prove that a website is accessible, but there are some guidelines that we can be certain would break accessibility if not followed, so they are at least a useful starting point.

However, 2006 is a long time ago now, and the Internet has changed a lot since. One project, from colleagues of mine at IBM, is creating a more up to date picture of the state of the web. They analysed a thousand of the most popular websites (according to Alexa) as well as a random sampling of a thousand other sites.

(Interestingly, they found no statistically significant difference between conformance in the most popular websites and the randomly selected ones.)

Their intention is to perform this regularly, creating a Web Accessibility Snapshot, with regular updates on the status of accessibility of the web. It looks like it could become a valuable source of information.

Assessing accessibility

There was a lot of discussion about how to assess accessibility.

One paper argued there is an over-reliance on automated tools and a lack of awareness of the negative effects of this. They demonstrated a manual review of websites, comparing results with output from six popular tools. Their results showed how few accessibility problems automated tools discover.

Accurately assessing a website against accessibility guidelines doesn’t necessarily mean that you can prove a site is accessible or easy to use.

Some research presented suggests guidelines only cover a little over half of problems encountered by users. Usability studies suggest some websites that don’t meet guidelines may be easier to use than websites that do, as users may have effective coping strategies for (technically) non-compliant sites. This suggests we need a better way of assessing accessibility.

A better approach might be to observe users interact with a website and assess based on their experiences. One tool presented, WebTactics, showed an automated approach to assessing accessibility by observing a user and identifying behaviours they employ.

Another paper detailed how to add accessibility monitoring to a live website by adding additional JavaScript that captures and evaluates mouse clicks and button presses client-side before submitting them to a server for processing. Instead of requiring the user to perform predefined, and perhaps artificial, tasks, they hope to be able to discover tasks implicitly – that common tasks will emerge from the low-level actions that they collect.

Accessibility training

Given that most websites have some sort of accessibility problems, there was some talk about how this could be improved.

One project presented showed training that has been developed to raise awareness of how people with disabilities access the web, and the implications of the accessibility guidelines. It’s a practical course including hands-on assignments, and looks like it could be the sort of thing that could help web developers make a real difference.

Social Accessibility

Another project is using crowd-sourcing to improve web sites that already exist. Social Accessibility, another IBM project, enables volunteers to make web pages more accessible to the visually impaired.

It provides a mechanism for accessibility problems to be gathered directly from visually impaired users. Volunteers are then notified, and can respond using a tool that allows them to externally modify web pages to make them more accessible. It lets them publish metadata associated with the original web page. This can be applied to the web page for all visually impaired users who visit it in future using this tool, so that many users can benefit from the improvement.

cloud4all

Finally, a project called cloud4all is developing a roaming profile that stores your preferences in a way that multiple services can access. The focus is on accessibility – a user can store their accessibility needs in one place, and then interfaces can use this to adapt for them.


Dyslexia at W4A

This is the third of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

There were a few sessions presenting work done to improve understanding of how to better support people with dyslexia.

One interesting study investigated the effect of font size and line spacing on the readability of Wikipedia articles.

This was assessed in a variety of ways, some of which were based on the reader’s opinions, while others were based on measurements made of the reader during reading and of their understanding of the content after. The underlying question (can we make Wikipedia easier to read for dyslexics?) was compelling. It was also interesting to see this performed not on abstract passages of text, but in the context of using an actual website.

Accessibility isn’t just about the presentation but also the content itself. Another study looked at strategies for simplifying text that could make web pages more readable for dyslexic readers.

It compared the effectiveness of two strategies. The first was providing synonyms on demand – giving a reader a way to request an alternative for any word. The second was providing synonyms automatically – with complex words automatically replaced by simpler equivalents. Again, this was assessed in several ways, such as the speed of reading, the reader’s comprehension, the reader’s opinion of easiness, the effort it took (e.g. interpreting facial expression, etc.), fixation duration measured using eye tracking, and so on.
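To make the difference between the two strategies concrete, here is a minimal sketch in Python. It is my own illustration rather than anything from the study: the word list is a made-up stand-in for a real synonym resource, and both function names are hypothetical.

```python
import re

# Hypothetical lookup of complex words and simpler alternatives.
# A real system would need a proper synonym resource and word-sense handling.
SIMPLER = {"utilise": "use", "commence": "start", "approximately": "about"}

def synonym_on_demand(word, simpler=SIMPLER):
    """Strategy 1: the reader asks for an alternative to one word."""
    return simpler.get(word.lower())

def substitute_automatically(text, simpler=SIMPLER):
    """Strategy 2: every known complex word is swapped without asking."""
    def swap(match):
        word = match.group(0)
        replacement = simpler.get(word.lower())
        if replacement is None:
            return word
        # Keep the capital letter if the original word started a sentence.
        return replacement.capitalize() if word[0].isupper() else replacement
    return re.sub(r"[A-Za-z]+", swap, text)

simplified = substitute_automatically("Commence the test in approximately ten minutes.")
print(simplified)  # Start the test in about ten minutes.
```

With the on-demand strategy the original text is left untouched and the reader stays in control, while the automatic strategy changes what every reader sees.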

On a more practical note, there were also tools presented that are being created to help support people with dyslexia.

Firefixia is a Firefox toolbar extension being created by colleagues of mine in IBM. It provides options for users to customise the web page they are looking at, offering modifications that have been demonstrated to make it easier for dyslexic users.

Dyseggxia is an impressive looking iPad game that aims to support children with dyslexia through fun word games.


W4A : Future of screen readers

This is the second of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

Several of the projects that I saw showed glimpses of a possible future for screen readers.

I’ve written about screen readers before, and some of the challenges with using them.

Interactive SIGHT

One project interpreted pictures of charts or graphs and created a textual summary of the information shown in them.

I’m still amazed at this. It takes a picture of a graph, not the original raw data, and generates sensible summaries of what it shows.

For example, given this image:

It can generate:

This graphic is about United States. The graphic shows that United States at 35 thousand dollars is the third highest with respect to the dollar value of gross domestic product per capita 2001 among the countries listed. Luxembourg at 44.2 thousand dollars is the highest

or

The dollar value of gross domestic product per capita 2001 is 25 thousand dollars for Britain, which has the lowest dollar value of product per capita 2001. United States has 1.4 times more product per capita 2001 than Britain. The difference between the dollar value of gross domestic product per capita 2001 for United States and that for Britain is 10 thousand dollars.

The original version was able to process bar graphs, and was presented to W4A in 2010. What I saw was an extension that added support for line graphs.

Their focus is on the sort of graphics found in newspapers and magazines – informational, rather than scientific graphs. They want to be able to generate a high level summary, rather than a list of plot points that require the user to build a mental model in order to interpret.

For example:

The image shows a line graph. The line graph presents the number of Walmmart’s sales of leather jackets. The line graph shows a trend that changes. The changing trend consists of a rising trend from 1997 to 1999 followed by a falling trend through 2006. The first segment is the rising trend. The rising trend is steep. The rising trend has a starting value of 1890. The rising trend has an ending value of 36840. The second segment is the falling trend. The falling trend has a starting value of 36840. The falling trend has an ending value of 12606.

The image shows a line graph. The line graph presents the number of people who started smoking under the age of 18 in the US. The line graph shows a trend that changes. The changing trend consists of a rising trend from 1962 to 1966 followed by a falling trend through 1980. The first segment is the rising trend. The rising trend is steep. The second segment is the falling trend.

It’s able to interpret an image and recognise trends, recognise how noisy or smooth it is, recognise if the trend changes, and more. Impressive.
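The hard part is getting from a picture to the underlying trend, and I can’t show that here. But as a rough sketch of the last step only, assuming the points have already been extracted from the image, turning a rise-then-fall series into a sentence fragment might look something like this. The function name and the simple single-peak logic are my own assumptions, not the project’s method; the numbers are the ones from the leather jackets example above.

```python
def describe_trend(points):
    """Summarise (year, value) points that rise to a single peak and then fall."""
    years = [year for year, _ in points]
    values = [value for _, value in points]
    peak = values.index(max(values))
    if 0 < peak < len(points) - 1:
        return (f"a rising trend from {years[0]} to {years[peak]} "
                f"followed by a falling trend through {years[-1]}")
    # Peak at either end: treated as a single rising or falling trend.
    direction = "rising" if peak == len(points) - 1 else "falling"
    return f"a {direction} trend from {years[0]} to {years[-1]}"

# Start, peak and end values taken from the generated summary quoted above.
summary = describe_trend([(1997, 1890), (1999, 36840), (2006, 12606)])
print(summary)
```

A real system also has to cope with noisy lines and several changes of direction, which this sketch ignores.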

Interpreting data in tables

Another project demonstrated restructuring data tables in web pages to make them easier to explore with a screenreader.

They have an interesting approach of analysing an HTML table and reorganising it to make it more accessible, abstracting out complex sections into a series of menus.

For example, given a table such as this:

it can produce navigable menus such as this:

Even quite complex tables, with row and column spans, which would otherwise be quite difficult to interpret if read row-by-row by a screenreader, are made much more accessible.
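To show why spans make tables awkward to read row-by-row, here is a minimal Python sketch of one possible first step: expanding row and column spans into a plain rectangular grid, so every position has a value. This is my own illustration, not the project’s approach (which builds menus), and the table contents are made up.

```python
def expand_spans(rows):
    """Expand cells given as (text, rowspan, colspan) into a rectangular grid.

    Spanned cells are repeated in every grid position they cover.
    """
    grid = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    height = max(r for r, _ in grid) + 1
    width = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(width)] for r in range(height)]

# Made-up table: "Region" spans two rows, "Sales" spans two columns.
rows = [
    [("Region", 2, 1), ("Sales", 1, 2)],
    [("2012", 1, 1), ("2013", 1, 1)],
    [("North", 1, 1), ("10", 1, 1), ("12", 1, 1)],
]
grid = expand_spans(rows)
for line in grid:
    print(line)
```

Once the grid is rectangular, each data cell can be announced together with the headers above it, rather than leaving the listener to work out which column they are in.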

Capti web player

Another technology I saw demonstrated was the Capti web player.

Tools such as Instapaper and Read It Later have shown that we can take most web pages and extract the body text for the article on the page.

This capability should be ideal for visually impaired users, but the tools themselves are still quite difficult to use and integrate poorly with assistive technologies. Someone described them as obviously “designed by sighted people for sighted people”.

Capti combines this capability with an accessible media player, making it easy to navigate through an article, move through a list of articles, and so on. To a sighted user like me, it looked like they’ve mashed together Instapaper with an audiobook-type media player. I often listen to podcasts while I go running, and am a heavy user of Pocket and Safari’s reading list. So this looks ideal for me.

Multiple simultaneous audio streams

Finally, one fascinating project looked at how to make it quicker to scan large amounts of content with a screenreader to find a specific piece of information. I’ve written before that relying on a screenreader (which creates a sequential audio representation of the information on the page, starting at the beginning and going through the contents) can be tremendously time-consuming, and that it results in visually impaired users taking considerably more time to find information on the web.

This project investigated whether this could be improved by using multiple simultaneous sound sources.

It sounds mad, but they’re starting from observations such as the cocktail party effect – that in a noisy room with several conversations going on, we’re able to pick out a specific conversation that we want to listen for. Or that a student not paying attention in a lecture will hear if a lecturer says something like “this will be on the exam”.

They’re looking at a variety of approaches, such as separating the channels directionally, so one audio stream will sound like it’s coming from the left, while another is in front. Or having different voices, such as different genders, for the different streams. It’s an intriguing idea, and I’d love to see if it could be useful.


Web technologies I saw at W4A

WWW2013

Last month I went to the International World Wide Web Conference for w4a. I saw a lot of cool web technologies and accessibility projects while I was there, so thought I would share links to some of the more interesting bits.

There are too many to put in a single post, so I’ll write a few posts to cover them all.

Subtitles

Subtitles and transcripts came up a few times. One study presented looked at online video, comparing single-line subtitle captions overlaid on the video with multi-line off-screen transcripts adjacent to it.

It examined which is more effective from a variety of perspectives, including readability, reader enjoyment, the effect on understanding and so on. In summary, it found that overlaid captions are generally better, although transcripts are better for content which is more technical.

Real-time transcription from a stenographer at W4A

We had subtitles for all the talks and presentations. Impressively, a separate screen projected a live transcription of the speaker. For deaf attendees, it allowed them to follow what the speaker was saying. For talks given in Portuguese, the English subtitles allowed non-Portuguese speakers like me to understand.

They did this by having live stenographers listening to an audio feed from the talks. This is apparently expensive as stenography is a specialist skill, and it needs to be scheduled in advance. It’s perhaps only practical for larger conferences.

Legion Scribe

This was the motivation for one of the more impressive projects that I saw presented: Legion Scribe, which crowd-sourced real-time captioning so that you wouldn’t need an expert stenographer.

Instead, a real-time audio stream is chopped up into short bits, and divided amongst a number of people using Mechanical Turk. Each worker has to type the short phrase fragment they are given. The fragments overlap, so captions that each worker types can be stitched back together to form captions for the whole original audio stream.
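As a rough illustration of the stitching step, here is a minimal Python sketch that joins overlapping fragments by looking for the longest run of words shared between the end of what has been assembled so far and the start of the next fragment. It is my own simplification, not Legion Scribe’s algorithm: it assumes the fragments arrive in order and that workers typed the overlapping words identically, which real workers won’t always do.

```python
def stitch(fragments):
    """Join overlapping text fragments, dropping words repeated at each join."""
    merged = []
    for fragment in fragments:
        words = fragment.split()
        # Longest suffix of the text so far that matches a prefix of this fragment.
        overlap = 0
        for size in range(min(len(merged), len(words)), 0, -1):
            if merged[-size:] == words[:size]:
                overlap = size
                break
        merged.extend(words[overlap:])
    return " ".join(merged)

# Made-up fragments, as three different workers might have typed them.
fragments = [
    "the quick brown fox",
    "brown fox jumps over",
    "jumps over the lazy dog",
]
caption = stitch(fragments)
print(caption)  # the quick brown fox jumps over the lazy dog
```

Coping with typos, missed words and workers who disagree is where the real difficulty lies, and this sketch doesn’t attempt it.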

All of this is done quickly enough to make the captions appear more or less in real-time.

Seriously impressive.

And they’re getting reasonable levels of coverage and accuracy. The system has been designed so that workers don’t need to be experts in the domain that they’re transcribing, as they’re only asked to type in a few words at a time, not whole passages. With enough people, it works. If they have at least seven workers, it’s approaching the coverage you can get with a professional stenographer.

Assuming that Mechanical Turk can provide a plentiful supply of workers, then this would not only be cheaper than a stenographer, but also let you start captioning at a moment’s notice, rather than needing to arrange for a stenographer in advance.

Map Reduce in the browser

Speaking of crowd-sourcing, the idea of splitting up a large computing task between a large number of volunteer computers isn’t new. SETI@home is perhaps the best known, while World Community Grid is a recent example from IBM.

But these need users to install custom client software to receive the task, perform it and submit the results.

One project showed how this could be done in web browsers. A large computing task is divided up into map reduce jobs, which are made available through a website. Each web browser that visits the website becomes a map reduce worker, running their task in the background using web workers. As long as the user remains on the site, their browser can continue to contribute to the overall task in the background, without the user having had to install custom client software.
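To show the shape of the idea, here is a minimal map reduce sketch in Python, using a word count as the task. It is only an illustration of how a task splits into independent jobs and how the partial results are combined. In the project described, each job would run in a visiting browser using web workers; here everything runs in one process, and all of the function names are my own.

```python
from collections import Counter
from functools import reduce

def split_into_jobs(lines, job_size):
    """Divide the input into independent chunks, one per worker."""
    return [lines[i:i + job_size] for i in range(0, len(lines), job_size)]

def map_job(lines):
    """The work one visiting browser would do: count words in its chunk."""
    return Counter(word.lower() for line in lines for word in line.split())

def reduce_results(partials):
    """The work the server would do: merge the partial counts."""
    return reduce(lambda a, b: a + b, partials, Counter())

lines = ["to be or not to be", "that is the question", "to be is to do"]
jobs = split_into_jobs(lines, job_size=1)
partials = [map_job(job) for job in jobs]  # in the real idea, one per browser
totals = reduce_results(partials)
print(totals["to"], totals["be"], totals["is"])  # 4 3 2
```

Because each job only needs its own chunk, it doesn’t matter which visitor runs which job, or in what order the results come back.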

It’s an elegant idea. Not all sites would be well suited to it, but there are plenty of web sites that I keep open all day (e.g. GMail, Remember The Milk, Google Calendar, etc.) so I think the idea has potential.

Migrating browser sessions

An interesting project I saw showed how the state of a browser app could be migrated from one browser to another, potentially a different browser running on a different machine, or even a different platform.

This is more than just the client-server session, which you could migrate by transferring cookies. They’re transferring the entire state of dynamic AJAX-y pages: what bits are open, enabled, and so on, for any arbitrary web app.

Essentially, they started by wanting to be able to serialize the contents of window, so that it could be transferred to another browser and used to restore the state there.

That wouldn’t be enough. window doesn’t have access to local variables in functions, it wouldn’t have access to most event listeners such as those added with addEventListener, it wouldn’t have access to the contents of some HTML5 tags like canvas, it wouldn’t have access to events scheduled with setTimeout or setInterval, and so on.

Serializing window gets you the current state of the DOM which is a good start, but not sufficient to transfer the state for most web apps.

A prototype system called Imagen shows how this could be done. Looking at how they’ve implemented it, they’ve had to resort to using a proxy server which intercepts JavaScript going to the browser and instruments it with enough additional calls to let them access all of the stuff that wouldn’t normally be in scope. This is enough for them to be able to serialize the entire state of the page.

I can see a lot of uses for this, such as in testing, debugging or service scenarios, as well as just the convenience of being able to resume work in progress as you move between devices.

Inferring constraints on REST API query parameters

Many web services include constraints and dependencies for the query parameters. For example: “this option is always required”, “that parameter is optional”, or “you have to specify at least one of this or that”. The Twitter API docs, for instance, explain how you have to specify a user_id or screen_name when requesting a user timeline.

One project I saw was an attempt to automatically infer these rules and dependencies through a combination of natural language processing to recognise them in API documentation, and automated source code analysis of sample code provided for web services. It combines these into an estimated model of the constraints in the REST APIs, which are then verified by submitting requests to the API.
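As an illustration of what such a model of constraints might look like once it has been inferred, here is a minimal Python sketch that writes the rules down as data and checks a request against them. The rule format and the check_request function are my own assumptions, not the project’s output format; the only rule used is the user_id or screen_name one mentioned above.

```python
def check_request(params, required=(), at_least_one_of=()):
    """Return a list of human-readable violations (empty if there are none)."""
    problems = []
    for name in required:
        if name not in params:
            problems.append(f"missing required parameter: {name}")
    for group in at_least_one_of:
        if not any(name in params for name in group):
            problems.append("need at least one of: " + ", ".join(group))
    return problems

# Hypothetical encoding of the user timeline rule.
timeline_rules = {"at_least_one_of": [("user_id", "screen_name")]}

ok = check_request({"screen_name": "example"}, **timeline_rules)
bad = check_request({"count": "5"}, **timeline_rules)
print(ok)   # []
print(bad)  # ['need at least one of: user_id, screen_name']
```

A model like this is easy to verify in the way described: generate requests that should pass and requests that should fail, send them to the API, and see whether it agrees.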

They demonstrated it on APIs like Twitter, Flickr, Last.fm, and Amazon, and it was surprisingly effective.

duolingo

Finally, there was a keynote talk on Wednesday by the founder of Duolingo.

Captcha is particularly interesting because it uses a task that people need to do anyway (verify that they’re human) to crowd-source the completion of a task that needs to be done (digitise the text of old books that cannot be read by automated OCR).

Duolingo is similar. It takes a task that people need to do, which is to learn a new language, and uses that effort to translate texts into different languages.

It’s better explained by their demo video.

It’s been around for a little while, but I’d not come across it before. Since getting back from www, I’ve been trying it out. Even Grace has been using it to improve her French and seems to be getting on really well with it.

What else?

There were a lot of other cool projects and technologies that I saw, so I’ll follow this up with another post or two to share some more links.


Everybody Technology

This afternoon I went to Everybody Technology, an event to discuss the need for technology to be inclusive and made in a way that is “so smart, so simple and so powerful it works for everybody”.

A highlight of the afternoon was Stephen Hawking – perhaps one of the best examples of the power of technology to enable someone to reach their potential. He also supported the event by lending his voice to a promotional video which explains the idea better than I can.


“Who is Technology Made For?” (YouTube)

There were several speakers. I won’t do them justice, but I did jot a few notes…

Panel discussion with Rupert Goodwins (ZDNet UK) & Damon Rose (BBC)

They talked of the stigma of using “special” equipment created especially for the blind. There were examples where even when technology or tools exist that can help, people don’t always want to use them. Maybe because they feel embarrassed, or they don’t want to be different, or even that they’re struggling with feeling forced to join a group of people they don’t feel a part of.

They discussed how it was more acceptable to use technologies when they are “standard” and how some felt more comfortable using technology that doesn’t single them out as being different.

Someone noted how people can be embarrassed wearing a hearing aid to help them hear, whilst few people would be embarrassed to wear glasses to help them see. Why are some assistive technologies more culturally acceptable than others?

There was a lot of mention of iDevices and appreciation of assistive technology being delivered as iPhone apps. To everyone else, it’s an iPhone and doesn’t stand out as being different. In addition, the fact that it’s mass-manufactured has meant that an expensive collection of advanced sensors and processing capability can be made affordable. An equivalent device produced purely as an assistive technology would be prohibitively expensive. The iPhone sparked a smartphone revolution that made this technology affordable in a way that it wasn’t before.

There was also discussion about how the app culture removed barriers between potential users and developers. Affordable sensors and technology made widely available, combined with a low-cost delivery mechanism for software innovations, make possible innovations in assistive technology that would have been impossible a few years ago.

Presentation on accessible architecture by Paul Kalkhoven

This looked at parallels between buildings and software. Disability became accepted as important in architecture and you can’t build a new building without considering accessibility. This isn’t yet true of technology.

He talked of the conflicting interests of design and utility. When designing a building, you want it to be unique and different. However, you want it to be obvious. If you want to find a toilet or fire exit, you want to understand the layout immediately. The same applies to technology: we want to make something new and exciting. But there is an expectation that it should be usable without a manual. It needs to be accessible.

One observation I hadn’t really recognised: transport buildings lead the way for accessible architecture, often abiding by a common, albeit unwritten, set of standards.

He challenged us to consider what technologists could learn from their experience.

Presentation on talking TVs by Mark Vasey (Panasonic)

Voice guidance is included as standard in most new Panasonic TVs, offering text-to-speech guidance for complex TV menus.

Perhaps more interesting was how they made it happen. He talked about challenges such as the cost of development, licensing and royalties for a feature they include “for free”. There were challenges in marketing to a minority, without wanting to classify it as a specialist product, and without making sighted users think that they were paying for a feature they didn’t need or want.

Similar to the discussion of the iPhone’s impact, he explained how the only way they could do this and make it affordable was to make it standard. Making a specialist TV with accessibility features for the visually impaired would not have been affordable. Spreading the cost across their entire product line is what made it possible.


“Introduction to Voice Guidance on Panasonic talking TVs” (YouTube)

Presentation on Threedom Phone – Antony Ribot (Ribot)

Antony gave a thought-provoking presentation about their project to make the world’s simplest smartphone.

The smartphone revolution has been great for many, but isn’t suitable for everyone. For some, the controls are too small, or too fiddly, or just too complicated. What if we made a smartphone that had only three buttons? Could we provide the essential functions that people need on a device with three large, easy to press, easy to understand, buttons?

He had an example with him and made a convincing case that there is a need for a device like this, in a market where devices are racing to get more complicated.

Everybody Technology : rlsb.org.uk/everybody

A year ago, I wrote about RLSB’s event which brought together a handful of representatives from tech companies, consumer-facing businesses, Universities, and charities for the blind. We talked about a vision of a Conversational Internet.

A year later, and RLSB got together a couple of hundred people to talk about projects that had happened – both by them, such as the Conversational Internet prototype that I presented, and by others such as Panasonic’s collaboration with RNIB to produce Voice Guidance.

They talked about what comes next, establishing a new group to bring together technologists and designers with people who understand disabilities, to make real their vision where everyone is taken into consideration.

If you think this is something you can help with, either as a developer, designer, or someone who understands a disability, then why not join them.


Conversational Internet

tl;dr

We’ve built a prototype to show how we could interact with the Internet using a command-driven approach.

  • A screen reader, but one that uses machine learning and natural language processing, in order to better understand both what the user wants to do, and what the web page says.
  • One that can offer a conversational interface instead of just reading out everything on the page.

It’s a proof-of-concept, but it’s an exciting idea with a lot of potential and we’ve got a demo that shows it in action.

The problem : screen readers today

I’ve written about this before but here is a recap.

Visually impaired people can interact with the web using screen readers. These read out every element on a page.

The user has to make a mental model of the structure of the page as it’s read out, and keep this in their head as they arrow-key around the page.

For example, on a news site’s front page, once the screen reader has read out the page, you have to remember if the story you want is the fifth or sixth story in the list so you can tab the right number of times to get to it.

Imagine an automated telephone menu:
“for blah-blah-blah, press 1, for blather-blather-blather, press 2, for something-or-other, press 3 … for something-else-vague, press 9 …”

Imagine this menu was so long it took 15 minutes or more to read.

Imagine none of the options are an exact match for what you want. But by the time you get to the end, you can’t remember whether the closest match was the third or fourth, or fiftieth option.

The vision : a Conversational Internet

Software could be smarter.

If it understood more about the web page, it could describe it at a higher, task-oriented level. It could read out the relevant bits, instead of everything.

If it understood more about what the user wants to do, the user could just say that, instead of working out the manual navigation steps themselves.

The vision is software that can interpret web pages and offer a conversational interface to web browsing.
