Year of publication: 2010 Author: Jeremy Ellman Publisher: Pages: 228 ISBN: 3838338405
Description
This thesis addresses the problem of extracting a representation of a text's meaning from its content. The solution investigated is based on the use of Roget's thesaurus as an external knowledge source and can be used to analyse texts of any length or complexity. The resulting document representation can then be compared to others, producing a new method for text similarity assessment. All coherent texts contain embedded sequences of words that are related in meaning. These sequences can be detected by identifying simple relationships between the relevant thesaural entries in which the words are found. The identification of initial sequences drives the addition of further related words into conceptually related "lexical chains". Every coherent text contains many lexical chains of different lengths and strengths. These may be used to represent the broad subject matter of a text. By identifying the key concept of each chain, and relating this to its presence we may...
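The idea of grouping related words into lexical chains can be illustrated with a minimal sketch. It is not the method developed in the thesis: the tiny `TOY_THESAURUS`, its category labels, and the functions `build_chains`, `chain_profile`, and `cosine` are all hypothetical, invented here for illustration, and the only "relationship" used is two words sharing a category. Chain strength is approximated by chain length, and cosine similarity is a stand-in chosen for simplicity.

```python
import math
from collections import defaultdict

# Hypothetical toy thesaurus: word -> set of category labels.
# The labels are invented and are NOT Roget's actual categories.
TOY_THESAURUS = {
    "ship": {"vehicle", "navigation"},
    "boat": {"vehicle", "navigation"},
    "sail": {"navigation"},
    "harbour": {"navigation", "refuge"},
    "money": {"finance"},
    "bank": {"finance", "land"},
    "loan": {"finance"},
}


def build_chains(words, thesaurus):
    """Group words that share a thesaurus category into chains.

    Returns a dict mapping category -> list of words, keeping only
    categories that attract at least two words.
    """
    chains = defaultdict(list)
    for word in words:
        key = word.lower()
        # sorted() only makes the iteration order deterministic
        for category in sorted(thesaurus.get(key, ())):
            chains[category].append(key)
    return {cat: ws for cat, ws in chains.items() if len(ws) >= 2}


def chain_profile(chains):
    """Represent a text by the length of each chain (assumed strength)."""
    return {cat: len(ws) for cat, ws in chains.items()}


def cosine(p, q):
    """Cosine similarity between two category-weight profiles."""
    dot = sum(weight * q.get(cat, 0) for cat, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


text_a = "The ship left the harbour under sail while the bank approved a loan".split()
text_b = "A boat and a ship were tied up in the harbour".split()

profile_a = chain_profile(build_chains(text_a, TOY_THESAURUS))
profile_b = chain_profile(build_chains(text_b, TOY_THESAURUS))
similarity = cosine(profile_a, profile_b)
```

In this sketch the category label of each chain plays the role of its key concept, and two texts are compared through the categories their chains have in common. A real thesaurus offers richer relationships between entries than shared membership of a single category.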