Year of publication: 2010 Author: Soraya Abad-Mota Publisher: LAP Lambert Academic Publishing Pages: 120 ISBN: 9783838310329
Description
Organizations produce large numbers of documents. Often the contents of these documents do not reach the operational databases or data warehouses of the enterprise. With worldwide access to the web, these documents are made available to a wide audience, but browsing through them manually is cumbersome at best. The semantic web concept has opened up fascinating possibilities for making explicit the semantics of the terabytes of unstructured data available today. In this book, we define the Document Interrogation Architecture (DIA) to extract data from documents using information extraction techniques and to populate a database with the extracted data. The domain of the documents is represented with an ontology, which is the basis for the definition of an interrogation language with approximate query processing capabilities. With DIA, many organizations could take advantage of the contents of their documents. Therefore, this book should be particularly useful for computer...