grc: Used the first part of the Iliad given that I doubt anyone's translating the UDHR
into ancient Greek anytime soon
nr: To try to distinguish xh, zu and nr from each other, temporarily set
MAXNGRAMS 2000
MAXNGRAMSYMBOL 4
while generating those three fingerprints as a bit of a bodge
rue: No UDHR translation available. Visited rue.wikipedia.org/wiki/Special:LongPages to get
a list of Rusyn pages on Wikipedia and picked the longest page which didn't seem
too contaminated with mathematical symbols or characters from different languages and scripts
sd: The UDHR for Sindhi is basically a picture of the text, so it can't be extracted. Instead visited
sd.wikipedia.org/wiki/Special:LongPages and picked the longest page which didn't seem
too contaminated with mathematical symbols or different scripts
shs: No UDHR translation available, nor any lengthy cohesive text. So the sample
text is cut-and-pasted phrases from http://www.firstvoices.ca/en/Secwepemc/phrase-book
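
The two settings in the nr note presumably cap how many n-grams a fingerprint keeps and how long each n-gram may be. A minimal Python sketch of that idea follows, assuming a TextCat-style rank-ordered character n-gram fingerprint. The function `fingerprint` and its parameters `max_ngrams` and `max_ngram_len` are hypothetical stand-ins for MAXNGRAMS and MAXNGRAMSYMBOL, not the library's actual code.

```python
from collections import Counter


def fingerprint(text, max_ngrams=2000, max_ngram_len=4):
    """Return the most frequent character n-grams of `text`, most frequent first.

    Hypothetical illustration only: max_ngrams plays the role of MAXNGRAMS
    and max_ngram_len the role of MAXNGRAMSYMBOL.
    """
    counts = Counter()
    for word in text.split():
        padded = f"_{word}_"  # mark word boundaries with underscores
        for n in range(1, max_ngram_len + 1):
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                if gram != "_":  # a lone boundary marker carries no information
                    counts[gram] += 1
    # Sort by descending count, then alphabetically so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:max_ngrams]]
```

Keeping more n-grams per fingerprint gives closely related languages such as xh, zu and nr more low-frequency n-grams on which they can differ, at the cost of larger fingerprint files.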
remaining languages with LibreOffice support that are still missing fingerprints:
these are easily confused with similar languages and just need to be unpicked
sdc-IT
sdn-IT
src-IT
sro-IT
ku-IQ
ku-IR
ku-SY
ku-TR
these are trickier
sat-IN
sma-SE
smj-NO
smj-SE
smn-FI
sms-FI
sjd-RU