Watson Discovery - Gettysburg College
Summer 2017-Fall 2017
“Alexa, ask Gettysburg College Watson am I allowed to house a goldfish in my dorm room?”
Ever heard of Watson?
It’s possible that you have in recent commercials. IBM is showing off Watson as an excellent tool for healthcare, insurance, taxes, engineering, and even wine making!
Watson is an artificially intelligent supercomputer named after IBM’s founder, Thomas J. Watson. It is said that Watson is now made up of over 2,800 processing cores and has 16 terabytes of RAM. It is estimated that Watson can process 500 gigabytes of information a second, a size that is equivalent to one million books.
Watson was first introduced in 2011 as a Jeopardy! challenger to Ken Jennings and Brad Rutter (the two most successful players in Jeopardy! history). Watson was completely cut off from the internet, just as Jennings and Rutter were, so it could not simply “Google” the answers. It had to rely only on the information it had initially ingested, which was basically the entire Wikipedia encyclopedia. In a hard-fought battle, Watson bested the best and paved the way for a new kind of AI in the industry.
This supercomputer was not created just to win a game show, though. IBM had much bigger plans for Watson. IBM now offers multiple services that developers can use to make their applications smarter. One of these services lets you build a conversation bot: you specify what the bot should say to users, and it uses Natural Language Processing so that users can talk to it the way they would talk to a person.
So why would a small liberal arts college want to use Watson?
I’m glad you asked! We could use Watson as an analytics tool for our applicants to better understand the incoming pool, or maybe we could use Watson to create a curriculum plan that benefits the majority of students. The possibilities are endless. My advisor, Rodney Tosten, and I decided that a good first way to experiment with the AI was to create a question and answer service for the students and faculty of the college. In theory, the service would be trained on all of the content our gettysburg.edu website has to offer, and a user would be able to ask any question in natural language and get the correct answer as a response.
What was I able to accomplish?
I started this project as a full-time summer research job after my junior year of university. I was able to set my own working hours, mostly 40 hours a week, and could work right in my dorm room. About every week I would check in with my advisor to get his input and ask any questions I had before moving forward. The first couple of weeks involved a lot of research, mostly in the form of YouTube videos and long paragraphs of documentation, because I knew almost nothing about what Watson was or how I could possibly develop with it.
After many hours of research, we decided to continue and start implementing the features we were hoping to achieve. We thought the student handbook was a good place to start, since Watson was able to ingest a PDF directly to generate possible answers. We quickly discovered that, due to the formatting of the handbook, more work would be needed on our end to generate good and reasonable answer units. I used a couple of different third-party Java libraries to reformat the document into something that Watson could easily ingest and give good results from.
The second task was to generate possible questions in order to train Watson. This could have been done by hand, but that gets tedious when over 500 questions need to be generated, so I developed another Java GUI to generate possible questions. The concept was quite simple: take the first sentence of each answer unit and turn it into a question. I did this with the help of Stanford’s sentence parser, which let me rearrange verbs and nouns automatically. For example, the statement “Students are not permitted to enroll in a course for credit later than 10 class days after the beginning of the semester.” would turn into the question “Are students permitted to enroll in a course for credit later than 10 days after the beginning of the semester?”
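To make the idea concrete, here is a minimal sketch of the statement-to-question rearrangement in Java. It is not the tool I built: it swaps Stanford’s parser for a much cruder heuristic that looks for the first auxiliary verb in a small hand-picked list, moves it to the front, and drops a following “not”. The class name QuestionSketch, the method names, and the auxiliary list are all assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: a plain string heuristic, NOT the Stanford parser.
class QuestionSketch {
    // Assumed list of auxiliary verbs the heuristic knows how to move.
    private static final Set<String> AUXILIARIES = new HashSet<>(Arrays.asList(
            "is", "are", "was", "were", "can", "may", "must", "will", "should"));

    // Returns the text up to and including the first ". ", or the whole
    // text if there is none. Abbreviations such as "Dr. " will fool it.
    static String firstSentence(String answerUnit) {
        String text = answerUnit.trim();
        int end = text.indexOf(". ");
        return end >= 0 ? text.substring(0, end + 1) : text;
    }

    // Moves the first auxiliary verb to the front and drops a directly
    // following "not". Returns empty if no known auxiliary is found.
    static Optional<String> toQuestion(String sentence) {
        String body = sentence.trim().replaceAll("[.!?]+$", "");
        List<String> words = new ArrayList<>(Arrays.asList(body.split("\\s+")));
        // Start at 1 so there is always a subject word before the verb.
        for (int i = 1; i < words.size(); i++) {
            String word = words.get(i).toLowerCase();
            if (!AUXILIARIES.contains(word)) {
                continue;
            }
            words.remove(i);
            if (i < words.size() && words.get(i).equalsIgnoreCase("not")) {
                words.remove(i);
            }
            // Lowercase the old sentence start (this is wrong for proper nouns).
            String first = words.get(0);
            words.set(0, first.substring(0, 1).toLowerCase() + first.substring(1));
            String aux = word.substring(0, 1).toUpperCase() + word.substring(1);
            return Optional.of(aux + " " + String.join(" ", words) + "?");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String unit = "Students are not permitted to enroll in a course for credit "
                + "later than 10 class days after the beginning of the semester. "
                + "Exceptions require approval.";
        System.out.println(toQuestion(firstSentence(unit)).orElse("(no question)"));
    }
}
```

A heuristic like this only handles simple statements built around a single auxiliary verb, which is a big part of why a real parser is worth the effort. Sentences it cannot handle come back as an empty Optional, so they could be set aside and written by hand.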
Now, with both questions and answer units, Watson was ready to be trained for our use case. We hired about 10 students for one hour to use Watson’s GUI training software, where they rated whether or not the suggested answers were good for each question. We quickly realized that students were working on the same questions as other students at the same time, so we were not getting the widespread training we had been hoping for.
In the fall, IBM announced that it would be ending its Retrieve & Rank service, which I had been using, and merging it into its Discovery service. I spent some time trying to migrate the work I had accomplished in the summer, but there was no longer any need for it. In conclusion, the project for the most part ended in failure, but it has laid some of the foundation for another student to take over next summer and experiment with the new Watson service. Watson is still very young and will continue to change, and we will need to continue to adapt. I do believe a future student will be able to implement our use case of training Watson on our college website; it will just take a little more time.