Can IBM's Watson outsmart hackers?
Every year, some 720,000 blogs, 10,000 research papers, and data from countless malware varieties, viruses, and software vulnerabilities add to the massive, growing, and often messy collection of cybersecurity knowledge.
But because most of that information is in written form and not formally structured for data-crunching computers, much of it is never analyzed or dissected to help solve today's most pressing digital security problems.
Now, researchers at IBM want to see if they can use the company's Watson supercomputer to digest that data, in the hope that the machine can help humans outsmart malicious hackers.
If its winning performance on "Jeopardy!" is any indication, Watson's processing power may be a boon to an industry that is drowning in data and struggling to find and fix computer vulnerabilities more quickly.
"Security analysis is based upon the consumption of lots of data," said Jon Oltsik, an analyst at Enterprise Strategy Group, a tech research firm.
But since many cybersecurity professionals can't spend all day crunching data, "Watson is engineered to do this and actually learn as it does so. It can help sort through the noise and point analysts toward relevant content," he said.
Given the huge skills gap that exists in the security industry, most organizations do not have anywhere near the resources required to manually pore through data from outside sources and correlate it with the data generated by their own devices.
Applying machine learning technology to the problem offers a way to combine and extract value from a much broader and more diverse set of data than is possible today, says Caleb Barlow, vice president of IBM Security.
"Watson is an unstructured data engine," said Mr. Barlow, referring to the technology's ability to make sense of data that has not been specifically structured for use by computers. "It allows us to go look at things in blogs, wikis, video transcripts and bring that data into the context of trying to solve cybersecurity challenges."
IBM says its research shows that a staggering 80 percent of all security information on the Internet is in a form that cannot be easily consumed by modern security software tools. In fact, the average organization taps just 8 percent of the available data that is not generated by a network security product.
But before Watson begins analyzing cyberthreats, it'll need to learn the language of cybersecurity, Barlow said. Just as IBM researchers trained the supercomputer over a period of time to play "Jeopardy!," they now need to train it to look at documents and data and extract security intelligence from them.
That's a task that requires annotating and inputting huge volumes of security reports into the system and helping it identify the terms, definitions, and language associated with cybersecurity. The process is similar to an earlier Watson project in which the supercomputer learned to develop recipes from thousands of ingredients for a food truck at the South by Southwest festival in 2014.
Over the next several months, students from the California State Polytechnic University, Pomona, Pennsylvania State University, the Massachusetts Institute of Technology, New York University, and four other universities will process and input content into Watson from an average of 15,000 security documents per month.
"This isn't like developing a normal software development product," IBM's Barlow said. "It is much like teaching a child to read. We have to teach Watson how to read and understand security data. We have to teach it what an attack is, who an attacker is and what an indicator of compromise looks like."
Smart as Watson is, it can make mistakes, said Barlow. A case in point has been Watson's tendency to classify the term "ransomware" as a city. "We really had to go in and force the correction that ransomware is not a city."