It’s been about a year since the computer system IBM Watson entered a TV quiz show against two of the best people to play the game, and won.
You knew that already, right? If not, skip reading this and go watch some of the footage instead. It’ll be more interesting.
But what has been done with Watson since?
With all those updates, it’d be useful to bring some of it together into an overview of what sort of work has been taking place in the last year.
The research didn’t stop after Jeopardy!. Researchers continue to push the underlying technology forwards. For example, one of them is working on “improving Watson’s ability to deal with pairs of verbs that mean roughly the same thing but not exactly the same thing”.
IBM Research regularly work in collaboration with academia, and the Watson project was no exception. In February, IBM announced the universities that are contributing to the development of the project.
The University of Texas are “working to extend the capabilities of Watson… by developing a computational resource of common sense knowledge”.
The Rensselaer Polytechnic Institute are “working on a visualization component to visually explain to external audiences [Watson's] massively parallel analytics”.
The University at Albany developed an “interactive QA capability for sustained investigation” to be able to maintain the context for a series of questions and answers in a conversation, rather than take isolated individual questions like in Jeopardy!. IBM is working with UAlbany to integrate this capability into Watson for the future.
That’s not all. Other research areas being worked on with University partners such as Carnegie Mellon and MIT are outlined in the press release.
Healthcare and Watson – a little background
IBM’s main focus for the application of Watson is in healthcare.
Watson wasn’t specifically designed for Jeopardy! The quiz show was one demonstration of the architecture. Watson was given general purpose natural language texts like dictionaries and encyclopaedias, and used them to build itself a general knowledge from which it could answer quiz questions.
Post-Jeopardy!, instead of giving it general knowledge texts, we give it specialist texts (e.g. medical textbooks, journals, and research papers) and let it use them to build a specialist knowledge from which it can answer detailed medical questions.
That’s the goal: a system that uses the mountain of medical literature to build a question answering system able to be a physician’s assistant.
There are loads of statistics highlighting the need for this. For example, the amount of medical information doubles every five years. General practitioners spend about five hours a month reading journals – a fraction of the time that would be required to read the mountain of medical literature being published every day (one estimate suggested that over 50,000 papers were published on the topic of neuroscience alone in 2010).
Watson can ingest every medical text, study and journal published, and use them all to build a knowledge from which it could answer detailed diagnostic questions.
In March, the head of IBM Software described why healthcare was chosen as Watson’s first challenge: “healthcare was the perfect example of how Watson in its current form can be used… In healthcare, a patient defines … vague symptoms and a doctor has to consult data, experience and literature to figure out the cause. Tests are ordered. It’s complex…” In many ways, the medical approach of differential diagnosis is a good analogy for how Watson works, so bringing this to healthcare makes sense.
To make this vision a reality, IBM needs partners who understand healthcare. So in 2011 a number of partners were announced who understand the problems the medical profession deal with, and what workflows would fit around them.
Columbia University are helping identify issues in the practice of medicine where Watson may be able to contribute, and the University of Maryland are working to identify the best way Watson could interact with doctors.
Columbia University Medical Centre
About a year before Watson went on Jeopardy!, Herbert Chase – professor of clinical medicine at Columbia University – and two of his students started performing a series of tests of Watson’s ability to handle the medical domain.
They asked Watson thousands of medical diagnostic questions, and manually reviewed its answers. When Watson got a question wrong, they tried to work out how close it was and why, and reported their findings to the Watson team.
In May last year, a demonstration was given of Watson’s ability to assist with medical diagnoses. “Watson was gradually given information about a fictional patient with an eye problem. As more clues were unveiled — blurred vision, family history of arthritis, Connecticut residence — Watson’s suggested diagnoses evolved from uveitis to Behcet’s disease to Lyme disease. It gave the final diagnosis a 73 percent confidence rating.”
The demonstration isn’t online, but a shortened, similar walkthrough was used in a Watson keynote presentation last November.
It’s a compelling demo, showing how a conversational, interactive question-answering system could build up a picture of what is wrong with a patient to deliver a suggested diagnosis based on a wide variety of factors, without being limited to the common answers.
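The shape of that demo can be pictured with a toy model. To be clear, this is an illustration only – DeepQA’s actual evidence scoring is far more sophisticated, and the disease-to-symptom mappings below are invented for the sketch, not medical fact:

```python
# Toy incremental differential-diagnosis ranker.
# Each candidate diagnosis is associated with a set of evidence terms
# (these mappings are invented for illustration, not medical fact).
KNOWLEDGE = {
    "uveitis":          {"blurred vision", "eye pain"},
    "behcet's disease": {"blurred vision", "family history of arthritis",
                         "mouth ulcers"},
    "lyme disease":     {"blurred vision", "family history of arthritis",
                         "connecticut residence"},
}

def rank(evidence):
    """Score each diagnosis by the fraction of observed evidence it explains,
    then normalise the scores into rough confidence values."""
    scores = {d: len(evidence & terms) / len(evidence)
              for d, terms in KNOWLEDGE.items()} if evidence else {}
    total = sum(scores.values()) or 1.0
    return sorted(((d, s / total) for d, s in scores.items()),
                  key=lambda pair: -pair[1])

evidence = set()
for clue in ["blurred vision", "family history of arthritis",
             "connecticut residence"]:
    evidence.add(clue)
    top, confidence = rank(evidence)[0]
    print(f"after '{clue}': {top} ({confidence:.0%})")
```

Run this and the top suggestion evolves as each clue is added, mirroring the uveitis → Behcet’s → Lyme progression in the demo, with a confidence figure attached to each answer.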
Professor Chase has spoken of similar examples he has tried on Watson, such as “a tough case he had experienced as a young doctor: a woman in her thirties with severe muscle weakness, who had blood tests indicating a low level of phosphate and elevated alkaline phosphatase, an enzyme. Watson’s top suggestions were hyperparathyroidism and rickets. It also flagged the possibility of a rare form of rickets that is vitamin D resistant—which the woman indeed had.”
They’re also considering the impact of other sources of data Watson could use, such as anecdotal evidence. What would be the effect of including discussions from blogs, message boards and other online forums?
University of Maryland School of Medicine
At a similar time, about a year before Jeopardy!, IBM started working with Dr Eliot Siegel, professor at the University of Maryland.
They suggested what would be appropriate medical literature for Watson to learn from – essentially preparing a curriculum suitable for a medical Watson, such as all of Medline, PubMed, and dozens of selected textbooks.
They provided Watson with access to (anonymised) medical records, helping to teach Watson about unusual cases with rare diseases or unexpected symptoms, and how to match what it knows about diagnostics with the experiences of procedures, treatments and outcomes that follow.
There was a great interview with Dr Siegel in March, which I recommend watching. He has a fascinating perspective on the work being done to create a medical Watson system.
Maryland is a med school – they teach med students. In many ways, he describes their work with Watson in a similar way. They’re training Watson as a med student, albeit a new type of student. Siegel even talks about a need for a new branch of the medical school focused on training software rather than people.
It’s not about manually programming Watson with medical information. In fact, Dr Siegel highlights that this sort of work wasn’t possible before Watson because of Watson’s unique ability to ingest information in natural language by itself.
They identified what a medical Watson system would need to know, and are working out what it needs to be taught in order to get there. They also test Watson to measure its progress, asking it every question from board exams, and “the clinicopathological (CPC) puzzlers” that appear in each issue of the New England Journal of Medicine.
Siegel warns that teaching Watson may take a similar amount of time to teaching a human med student. In May, he suggested that they would be three to five years from a pilot test with doctors, and that widespread use of Watson as a diagnosis tool could be eight to ten years away. This view was echoed by the general manager of IBM Watson Systems, who said that “realizing the full value of Watson will take five, seven, or maybe as many as 10 years“.
In an interview in February, Dr Siegel outlined the vision for how a trained Watson system could help:
- Read electronic medical records, create summaries and call out what’s important
- Read all of the literature in a doctor’s specialty
- Check for drug interactions
- Comb through clinical data and suggest possible diagnoses and potential treatments
- Advise a doctor in real time when they’re meeting with a patient
In September, a partnership was announced with WellPoint, a health insurance company in the US. WellPoint is working with IBM to build Watson systems for healthcare.
A WellPoint executive gave a presentation in November outlining why they are involved. They think Watson could help in the face of rising costs of chronic medical treatment, the need to make healthcare affordable, and the under-utilisation of evidence-based medicine.
As a massive healthcare provider in the United States, and the largest member of an association providing health insurance to over 100 million people, WellPoint have access to a huge amount of medical data – a critical requirement for Watson’s development.
One of their first tasks was preparing their data, integrating and collecting data from several disparate sources, such as reference texts, results from clinical trials and studies, and historical medical records.
Having access to a lot of data is an asset, but with so much of it in unstructured written observations and doctors reports, they aren’t able to do as much with it as they would like. Watson could help them to improve the flow of data between the providers that they support.
Ultimately, their goal is similar to other partners’ – to work out how to train Watson for medicine. Their focus is on its potential for evidence-based diagnosis and treatment.
They’re working with IBM to identify how Watson can ingest domain-specific sources, and are performing training with their real (anonymised) case data.
They’re also comparing Watson’s performance with existing clinical decision support tools. Software-based clinical decision support systems are not new, and so it’s important to prove that Watson really is something different.
The WellPoint executive described the phases of this, likening the ingestion of medical literature to Watson attending medical school, and the training phase to Watson doing a medical residency.
They also have a number of other areas where they plan to help.
To make Watson’s technical innovation real, they see a need for other areas of innovation: innovation in business processes, innovation in payment methods, and working out the realities of ownership, intellectual property, privacy, permission and other policy issues for a system that uses very personal data from a wide variety of sources.
What about security? What about HIPAA compliance? What about liability? Where can anonymised data be used, and where can’t it? These are also the sorts of areas where WellPoint think they can help.
They’re going to perform two pilots.
One pilot will be using Watson as a clinical decision support tool for use by WellPoint’s nurses responsible for managing complex patient cases and reviewing treatment requests from medical providers.
The pilot will be to see if Watson could improve the efficiency of clinical review of complex cases – helping them to make more streamlined authorisation of services, and include more evidence and explanation in their approval responses. It will also be to see if they can make more personalised and patient-specific decisions.
The second pilot is to trial the use of Watson in oncology practices allowing doctors to access Watson through a web-based platform to support the evidence-based diagnosis, treatment, and coordination of care of cancer patients – in particular, breast, lung and colon cancer.
The decision to focus on oncology is described as being because of how complex it is as a field. Essentially, to be a useful validation of Watson, it needs to be a complex field with a high level of variability between treatments.
In addition, it’s an expensive area: cancer treatment costs are growing faster than other areas, so the potential benefits of Watson’s ability to make healthcare more affordable to provide are even greater.
After cancer, diabetes and cardiology are the next diseases being thought of as suitable projects.
WellPoint and Cedars-Sinai Medical Center
The first cancer clinic chosen by IBM and WellPoint for this second pilot was Cedars-Sinai’s cancer centre. Their historical data on cancer and current clinical records were ingested into a version of Watson being hosted by WellPoint. Cedars-Sinai doctors are piloting the use of Watson as an adviser for oncologists and providing feedback and advice on the development of applications built on the Watson system.
“Watson Advisers” is a term increasingly being used for solutions built on Watson. The Watson Oncology Adviser is the highest profile one, and with Cedars-Sinai and WellPoint, it is being taught about cancer using actual cases that have been solved, as well as what can be found in medical references.
For Jeopardy!, Watson used text-to-speech to give its answers. It didn’t use speech-to-text to get questions. It received questions as text files.
A future Watson could receive questions verbally. In February, IBM announced a partnership with the speech technology vendor Nuance Communications.
This isn’t the first time IBM and Nuance have worked together – in 2009, a joint initiative was announced. Nuance help bring IBM speech research to the market. Later in 2009, Nuance bought several patents relating to speech recognition from IBM.
IBM and Nuance have worked together in healthcare, too. In 2010, IBM and Nuance announced a partnership in speech recognition for getting doctor’s dictated text into the structured fields of electronic health records.
Nuance have a history of bringing speech recognition to healthcare, with their specialist software for understanding clinical language. Integrating this capability with Watson brings the potential of a Watson system that can understand what doctors say. Nuance described working with Watson as a next logical step for them, taking them from “recognizing what was said to understanding the intent”.
IBM and Nuance have started talking about hoping to have commercial tools from this collaboration ready in 18–24 months.
Ready for Watson
When Watson solutions become widely available, it will not be a trivial system for businesses to start using. There will be a need to identify and prepare data for Watson to ingest in order to build knowledge. There will be a need to perform training and testing – asking enough questions with known answers for Watson to build its machine learning models, and for administrators to validate the accuracy.
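That validation step – asking questions with known answers and measuring how the system does – can be sketched in a few lines. This is a generic evaluation harness, not IBM’s actual tooling; the `qa_system` callable is a stand-in for any system that returns an answer with a confidence score:

```python
# Sketch of the validation step: run questions with known answers through a
# QA system and measure how often its confident answers are right.
# `qa_system` is a stand-in -- any callable returning (answer, confidence).
def evaluate(qa_system, gold_set, confidence_threshold=0.5):
    attempted = correct = 0
    for question, expected in gold_set:
        answer, confidence = qa_system(question)
        if confidence < confidence_threshold:
            continue  # below threshold: the system would decline to answer
        attempted += 1
        if answer == expected:
            correct += 1
    precision = correct / attempted if attempted else 0.0
    coverage = attempted / len(gold_set)
    return precision, coverage

# Usage with a trivial stand-in system:
def toy_system(question):
    return ("42", 0.9) if "life" in question else ("unknown", 0.1)

gold = [("meaning of life?", "42"), ("capital of France?", "Paris")]
precision, coverage = evaluate(toy_system, gold)
print(precision, coverage)  # 1.0 0.5
```

The precision/coverage trade-off is the interesting part: a system that only answers when it is confident can be very precise while leaving many questions unattempted, which is exactly the behaviour Watson showed when deciding whether to buzz in.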
As a result, even though Watson isn’t commercially available today, there are customers starting to think about how they might prepare for it.
In October, IBM announced the first “Ready for Watson” offering. Ready for Watson offers companies the chance to use technologies with elements that are related to Watson and are commercially ready today. More importantly, they provide assurance that they’ll be compatible with future Watson solutions.
Customers can invest time in getting going with analytics, and get the benefits this can bring today. And in the longer term, this effort may serve as an on-ramp to a future Watson solution.
The first Ready for Watson offering to be announced was IBM Content and Predictive Analytics (ICPA) for Healthcare.
The content analytics uses the same type of natural language processing as Watson, helping to extract meaning from unstructured medical information. The predictive analytics can be used to identify deviations and root causes, and predict the probability of outcomes.
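As a flavour of what “extracting meaning from unstructured medical information” involves, here is a deliberately crude sketch. Real content analytics uses full NLP pipelines rather than regular expressions, and the clinical note and field names below are invented for the illustration:

```python
import re

# Toy illustration of pulling structured facts out of an unstructured
# clinical note. Real content analytics uses full NLP pipelines; the
# note and patterns here are invented for the sketch.
note = ("58 y/o male admitted with shortness of breath. History of "
        "congestive heart failure. BP 150/95. Discharged after 4 days; "
        "readmitted within 30 days.")

extracted = {
    "age": int(re.search(r"(\d+)\s*y/o", note).group(1)),
    "chf": bool(re.search(r"congestive heart failure", note, re.I)),
    "blood_pressure": re.search(r"BP\s*(\d+/\d+)", note).group(1),
    "readmitted_30d": bool(re.search(r"readmitted within 30 days", note)),
}
print(extracted)
```

Once observations like these are in structured form, the predictive side of the offering has something to work with – which is the gap between having data and being able to act on it.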
Seton Healthcare Family
One of the first users of ICPA was Seton Healthcare Family, a network of medical facilities throughout central Texas. Seton are using ICPA to identify root causes of hospital readmissions, and predict ways to decrease preventable multiple hospital visits (in particular, in patients with congestive heart failure).
This isn’t Watson, and it isn’t doing question answering. But it is using some of Watson’s techniques to perform deep content analysis on the 80% of medical data which is in unstructured natural language, and deriving useful, actionable knowledge from it.
In the first four months of use, it’s already produced promising results identifying possible factors that Seton are now investigating.
Call centres, retail and sales
Other industries are being considered for Watson. One is call centres: helping them cope with the millions of questions they receive, and come up with answers more quickly and effectively than they can with flow charts or FAQs.
In February, “IBM executives said they are in discussions with a major consumer electronics retailer to develop a version of Watson… able to interact… on a variety of subjects like buying decisions and technical support“.
In March, the head of IBM Software talked about the potential for small “baby Watson’s for specific institutional tasks like call-centers“.
The thinking is that Watson could be a tool to help call-centre staff get answers for their clients while still on the phone. Unfortunately, some of this was misinterpreted by articles talking about Watson being used to directly respond to callers or replacing telemarketers.
No announcements have been made about Watson for retail, however IBM has talked about a future vision of Watson’s question answering ability available to shop assistants through an in-store kiosk or mobile tablet.
Back to Jeopardy!
In terms of development, Jeopardy! is now behind Watson: no new work is being done to get better at game shows.
The stage that was used for the Watson Jeopardy! challenge was finally disassembled in October, with some of the pieces going to the Smithsonian Museum.
However, from time to time, Watson was tempted back into the game. At the end of February, Watson played an exhibition match with five US Congress members at an event focusing on the importance of IT for the US economy and the need for greater focus on education in science.
New Jersey Congressman Rush Holt beat Watson. Perhaps it was fortunate that this wasn’t the match that got televised.
In a similar vein, Watson visited universities and colleges to compete in Jeopardy! matches against student teams. In October, Watson was the showcase of a symposium with Harvard Business School and MIT’s School of Management.
This time, at least, Watson won.
This was part of a university tour, following visits to Carnegie Mellon and the University of Pittsburgh, with later visits to universities like Stanford and UC Berkeley in November. There was as much focus on getting business school students thinking about the potential of the technology as there was on the science students.
That said, these events came with a downside. The university games were run using a travelling version of Watson. A large number of Jeopardy! questions are run through “real” Watson before the event, and some custom code collects both Watson’s answer and the amount of time it took to come up with it.
In other words, we know what Watson would’ve answered for each question, and how quickly it would’ve been able to buzz in. This is then used by the travelling Watson simulator. It doesn’t need a cluster of servers to run, as it just has to buzz in after the pre-defined amount of time with the pre-defined answer.
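The replay idea is simple enough to sketch. All the names here are invented, and this is not IBM’s actual harness – just an illustration of “buzz in after a recorded delay with a recorded answer”:

```python
import time

# Sketch of the travelling-Watson replay idea: answers and buzz timings are
# precomputed on the real cluster, and the road version just replays them.
# All names here are invented; this is not IBM's actual harness.
class RecordedWatson:
    def __init__(self, recordings):
        # recordings: clue -> (answer, seconds the real system took to buzz)
        self.recordings = recordings

    def play(self, clue):
        answer, delay = self.recordings[clue]
        time.sleep(delay)   # wait exactly as long as the real system did
        return answer

road_watson = RecordedWatson({
    "This 'Father of Our Country' didn't really chop down a cherry tree":
        ("Who is George Washington?", 0.01),
})
print(road_watson.play(
    "This 'Father of Our Country' didn't really chop down a cherry tree"))
```

Because both the answer and the latency are taken from the real cluster, the replay behaves identically from a contestant’s point of view – which is the fairness argument the next paragraph makes.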
In terms of “competing against Watson”, it is equivalent to what would’ve happened if the rack of servers had been dragged to the events. However, it’s tricky to explain, and was not well described in some press articles, leading to some comments like:
“Waiiit a minute. All these months, news on these Watson appearances had me believing he (she, it) was conjuring answers in real-time … Yet this story reports … Watson had already come up with answers to the questions prior to [the event]… Am I to understand Watson gets his (her, its) questions ahead of time?!” and “Sounds like IBM cheated”
Lastly, development. There is development work needed any time you take a research project and turn it into a product.
Some is architectural: a system created for use for short periods answering questions from a single user (i.e. Alex Trebek) is productised into a system capable of running for long periods with multiple users, and made robust enough to cope with administrators who aren’t the PhDs who invented it and will do things wrong.
Some is usability – creating tooling necessary to turn a complex Research project into a system that can be administered, configured and used without a PhD or machine-learning expertise.
Watson on Jeopardy! was one demonstration of the DeepQA architecture. Creating a platform that can be adapted to take on other domains means working out the steps necessary to do that domain adaptation, and building procedures and tools to perform and verify it.
I couldn’t find any articles talking about this aspect of the work. It’s not terribly sexy, but it’s the area where I work so I think it’s interesting enough to mention.
That’s enough for now.
Is this a complete list? Nope. For one thing, I did virtually no research. I’ve written about stuff that I can remember. There are almost certainly things I’ve forgotten.
And there are bound to be projects I’m not aware of. There is some work that I’m aware of but haven’t seen press releases for, so have to keep quiet on for now.
But this is already long enough. My aim was to give a taste of the sort of stuff that Watson has been up to.
What happens next? Watch this space.