Year of publication: 2013
Author: Jiayun Han
Publisher: LAP Lambert Academic Publishing
Pages: 68
ISBN: 9783659376221
Description
This project aims to build an efficient, scalable, portable, and trainable part-of-speech tagger. Using 98% of Penn Treebank-3 as training data, it first builds a raw tagger based on Bayes' theorem, a hidden Markov model, and the Viterbi algorithm. A reinforcement machine learning algorithm and contextual transformation rules are then applied to increase the tagger's accuracy. On the test data, the final tagger reaches an accuracy of 96.51% and processes about 26,000 words per second on a computer with 2 GB of RAM and two 3.00 GHz Pentium duo processors. Its portability and trainability are demonstrated by the tagger-maker's success in building a new tagger from a corpus annotated with a tagset different from that of Penn Treebank.
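To illustrate how a raw tagger of this kind can work, here is a minimal sketch of Viterbi decoding over a first-order (bigram) hidden Markov model. By Bayes' theorem, the most probable tag sequence given the words is the one maximizing P(words | tags) · P(tags), which an HMM factors into emission and transition probabilities. This is not the book's implementation: the tag set, all probabilities, and the unknown-word floor `UNK` below are hypothetical toy values, and a real tagger would estimate them from a corpus such as Penn Treebank-3.

```python
import math

# Hypothetical toy model: tags, start, transition, and emission probabilities.
TAGS = ["DT", "NN", "VBZ"]
START = {"DT": 0.6, "NN": 0.3, "VBZ": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VBZ": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VBZ": 0.6},
    "VBZ": {"DT": 0.5, "NN": 0.3, "VBZ": 0.2},
}
EMIT = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.5, "barks": 0.1, "cat": 0.4},
    "VBZ": {"barks": 0.6, "runs": 0.4},
}
UNK = 1e-6  # hypothetical floor for words a tag never emitted in training


def viterbi(words, tags=TAGS, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []
    # best[i][t]: log-probability of the best tag path ending in tag t at word i
    best = [{t: math.log(start[t]) + math.log(emit[t].get(words[0], UNK)) for t in tags}]
    # back[i][t]: previous tag on that best path (used for backtracking)
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the predecessor tag that maximizes path score times transition.
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + math.log(emit[t].get(words[i], UNK))
            back[i][t] = prev
    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi(["the", "dog", "barks"]))  # ['DT', 'NN', 'VBZ']
```

Working in log space avoids numeric underflow on long sentences; the contextual transformation rules mentioned above would then be applied as a separate correction pass over this raw output.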