In this thesis, I propose a novel assembly of techniques which can be applied to unstructured information sources such as news articles, news reports, or other text-based documents. While their algorithm can be extended to remove other non-content blocks, its efficacy for the general Web-cleaning problem has not been studied. Besides, their algorithm generates rules from training examples using a manually-specified procedure that states how the features ... Since a large percentage of dynamically-generated Web-documents have some form of underlying templates, Wrapper [81, ...
|Title||:||Automatic Text-based Explanation of Events|
|Publisher||:||ProQuest - 2005|