If you wonder what the government has done for you lately, take a look at DeepDive. Developed in the same Defense Advanced Research Projects Agency (DARPA) program as IBM's Watson, DeepDive is in effect a free counterpart to Watson, and it is being released as open-source software.
Although it has never been pitted against Watson, DeepDive has gone up against a fleshier foe: the human being. The result: DeepDive beat or at least equaled humans in the time it took to complete an arduous cataloging task. These were no ordinary humans, but expert catalogers given the same task as DeepDive: reading technical journal articles and cataloging them by understanding their content.
“We tested DeepDive against humans performing the same tasks, and DeepDive came out ahead or at least equaled the efforts of the humans,” professor Shanan Peters, who supervised the testing, told EE Times.
Releasing DeepDive as free, open-source software was the idea of its primary programmer, Christopher Re.
“We started out as part of a machine reading project funded by DARPA in which Watson also participated,” Re, a professor at the University of Wisconsin, told EE Times. “Watson is a question-answering engine (although now it seems to be much bigger). [In contrast] DeepDive's goal is to extract lots of structured data” from unstructured data sources.
DeepDive combines probability-based learning algorithms with open-source tools such as MADlib and Impala (from Cloudera), along with low-level techniques such as Hogwild, some of which have also been used in Microsoft's Adam. To build DeepDive into an application, you should be familiar with SQL and Python.
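The article does not say how DeepDive applies Hogwild internally. As a rough illustration of the general idea, Hogwild is a lock-free way to parallelize stochastic gradient descent: several threads update one shared parameter vector without taking any locks, accepting occasional overwrites in exchange for speed. The minimal Python sketch below assumes a simple least-squares linear model as a stand-in; the function name `hogwild_sgd` and its parameters are hypothetical and are not part of DeepDive. Because of CPython's global interpreter lock, these threads interleave rather than run truly in parallel, so the sketch shows the lock-free update pattern, not the speedup.

```python
import threading

import numpy as np


def hogwild_sgd(X, y, n_threads=4, epochs=5, lr=0.01, seed=0):
    """Hypothetical sketch of Hogwild-style lock-free SGD for least squares.

    Every thread reads and writes the shared weight vector `w` with no lock.
    Not DeepDive's implementation; the linear model is an assumed stand-in.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # shared parameters, updated in place by all threads

    def worker(tid):
        # Each thread visits the samples in its own random order.
        rng = np.random.default_rng(seed + tid)
        for _ in range(epochs):
            for i in rng.permutation(n_samples):
                # Gradient of 0.5 * (x_i . w - y_i)^2, computed from a
                # possibly stale view of w (other threads may be writing).
                grad = (X[i] @ w - y[i]) * X[i]
                # In-place update with no lock: the core Hogwild idea.
                w[:] -= lr * grad

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

For example, fitting noiseless data generated from known weights should recover weights close to the originals, even though no thread ever waits for another:

```python
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = hogwild_sgd(X, y)
```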
(Image: University of Wisconsin-Madison)
“Underneath the covers, DeepDive is based on a probability model; this is a very principled, academic approach to building these systems, but the question for us was 'could it actually scale in practice?' Our biggest innovations in DeepDive have to do with giving it this ability to scale,” Re told us.
To read the rest of this article, visit EBN sister site EETimes.