Source: boilerpipe
Section: java
Priority: optional
Maintainer: Debian Java Maintainers <pkg-java-maintainers@lists.alioth.debian.org>
Uploaders: Emmanuel Bourg <ebourg@apache.org>
Build-Depends:
 ant (>= 1.6.5),
 debhelper-compat (= 13),
 default-jdk,
 javahelper,
 libnekohtml-java,
 libxerces2-java,
 maven-repo-helper
Standards-Version: 4.5.1
Vcs-Git: https://salsa.debian.org/java-team/boilerpipe.git
Vcs-Browser: https://salsa.debian.org/java-team/boilerpipe
Homepage: https://github.com/kohlschutter/boilerpipe

Package: libboilerpipe-java
Architecture: all
Depends: libnekohtml-java, libxerces2-java, ${misc:Depends}
Description: Boilerplate removal and fulltext extraction from HTML pages
 The boilerpipe library provides algorithms to detect and remove the surplus
 "clutter" (boilerplate, templates) around the main textual content of a web
 page.
 .
 The library already provides specific strategies for common tasks (for example:
 news article extraction) and may also be easily extended for individual problem
 settings.
 .
 Extracting content is very fast (milliseconds), just needs the input document
 (no global or site-level information required) and is usually quite accurate.