Boilerpipe -- Boilerplate Removal and Fulltext Extraction from HTML pages


The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended f...
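To make the idea of separating main content from surrounding clutter concrete, here is a minimal sketch in Java. It is a toy heuristic and not boilerpipe's own algorithm or API: the class name BoilerplateSketch, the method extractMainText, and both thresholds are hypothetical and chosen for illustration only. It assumes that blocks with few words or with mostly link text are boilerplate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy boilerplate filter for illustration; NOT boilerpipe's implementation. */
class BoilerplateSketch {
    // Hypothetical thresholds (assumptions, not taken from the library).
    static final int MIN_WORDS = 10;
    static final double MAX_LINK_DENSITY = 0.3;

    // Tags treated as block boundaries.
    private static final Pattern BLOCK_BREAK = Pattern.compile(
        "(?i)</?(p|div|li|ul|ol|td|tr|table|h[1-6]|br|body|html)\\b[^>]*>");
    // Anchor elements, used to measure how much of a block is link text.
    private static final Pattern ANCHOR = Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Counts whitespace-separated words after stripping any remaining tags. */
    static int countWords(String text) {
        String stripped = TAG.matcher(text).replaceAll(" ").trim();
        return stripped.isEmpty() ? 0 : stripped.split("\\s+").length;
    }

    /** Keeps blocks that are long enough and not dominated by link text. */
    static String extractMainText(String html) {
        List<String> kept = new ArrayList<>();
        for (String block : BLOCK_BREAK.split(html)) {
            int words = countWords(block);
            if (words < MIN_WORDS) {
                continue; // short blocks: menus, footers, captions
            }
            int linkWords = 0;
            Matcher m = ANCHOR.matcher(block);
            while (m.find()) {
                linkWords += countWords(m.group(1));
            }
            double linkDensity = (double) linkWords / words;
            if (linkDensity <= MAX_LINK_DENSITY) {
                String plain = TAG.matcher(block).replaceAll(" ").trim();
                kept.add(plain.replaceAll("\\s+", " "));
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String html =
            "<div class=\"nav\"><a href=\"/\">Home</a> <a href=\"/news\">News</a></div>"
            + "<p>The main article text is usually a long run of words with few links, "
            + "which is what this sketch keeps.</p>"
            + "<div class=\"footer\">Copyright 2010</div>";
        System.out.println(extractMainText(html));
    }
}
```

The sketch scores each block independently, which keeps it short. A real extractor would likely also consider neighbouring blocks and page structure.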

Version        Repository  Usages  Date
1.1.x  1.1.0   central             Nov 03, 2010
1.0.x  1.0.4   central             May 20, 2010