File: control

golang-gopkg-neurosnap-sentences.v1 1.0.6-2
  • area: main
  • in suites: forky, sid
  • size: 19,816 kB
  • sloc: makefile: 80; python: 17
Source: golang-gopkg-neurosnap-sentences.v1
Standards-Version: 4.7.2
Maintainer: Debian Go Packaging Team <team+pkg-go@tracker.debian.org>
Uploaders:
 Anthony Fok <foka@debian.org>,
Section: golang
Testsuite: autopkgtest-pkg-go
Priority: optional
Build-Depends:
 debhelper-compat (= 13),
 dh-golang,
 golang-any,
Vcs-Browser: https://salsa.debian.org/go-team/packages/golang-gopkg-neurosnap-sentences.v1
Vcs-Git: https://salsa.debian.org/go-team/packages/golang-gopkg-neurosnap-sentences.v1.git
Homepage: https://gopkg.in/neurosnap/sentences.v1
XS-Go-Import-Path: gopkg.in/neurosnap/sentences.v1

Package: golang-gopkg-neurosnap-sentences.v1-dev
Architecture: all
Multi-Arch: foreign
Depends:
 ${shlibs:Depends},
 ${misc:Depends},
Description: Sentence tokenizer for Go
 A Go package that splits a block of text into a list of sentences.
 .
 This package attempts to support a multitude of languages: Czech,
 Danish, Dutch, English, Estonian, Finnish, French, German, Greek,
 Italian, Norwegian, Polish, Portuguese, Slovene, Spanish, Swedish,
 and Turkish.
 .
 An unsupervised multilingual sentence boundary detection library for Go.
 Its goal is to break any text into a list of sentences in multiple
 languages. The punkt system does this by training the tokenizer on text
 in the target language. Once the likelihoods of abbreviations,
 collocations, and sentence starters are known, finding sentence
 boundaries becomes easier.
 .
 Tokenizing text into sentences raises many problems, the main one being
 abbreviations. The punkt system is trained on text in the given language
 to determine whether a word is an abbreviation, the end of a sentence, or
 both. It applies both token- and type-based analysis to the text in two
 separate phases of annotation.
 .
 Original research article:
 http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=BAE5C34E5C3B9DC60DFC4D93B85D8BB1?doi=10.1.1.85.5017&rep=rep1&type=pdf