#!/bin/bash
LANG="bul cat ces dan deu ell eng fin fra hun ind ita lav lit nld nor pol por ron rus slk slv spa srp swe tur ukr vie chi-sim chi-tra amh asm aze-cyrl bod bos ceb cym dzo fas gle guj hat iku jav kat kat-old kaz khm kir lao lat mar mya nep ori pan pus san sin srp-latn syr tgk tir uig urd uzb uzb-cyrl yid afr ara aze bel ben chr enm epo est eus frk frm glg heb hin hrv isl ita-old jpn kan kor mal mkd mlt msa spa-old sqi swa tam tel tha grc"
LANG_NEW="bre chi-sim-vert chi-tra-vert cos div fao fil fry gla hye jpn-vert kor-vert kmr ltz mon mri oci que snd sun tat ton yor"
SCRIPT="arab armn beng cans cher cyrl deva ethi frak geor grek gujr guru hans hans-vert hant hant-vert hang hang-vert hebr jpan jpan-vert knda khmr laoo latn mlym mymr orya sinh syrc taml telu thaa thai tibt viet"
CONTROL="control"
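# Start from a clean slate: remove stale .install files and reset the
# control file to its template before appending the generated stanzas.
rm -f *.install
cp -f control.in "${CONTROL}"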
# Filter a control stanza from stdin to stdout, adding the inter-package
# relationships that some language packs need.
# See https://github.com/tesseract-ocr/tessdata_best/pull/17
dependencies() {
    case "$1" in
        aze)
            sed 's/${misc:Depends}/&, tesseract-ocr-aze-cyrl (>= 4.0.9)/g'
            ;;
        uzb)
            sed 's/${misc:Depends}/&, tesseract-ocr-uzb-cyrl (>= 4.0.9)/g'
            ;;
        aze-cyrl)
            sed 's/Recommends.*/&, tesseract-ocr-aze (>= 4.0.9)/g'
            ;;
        uzb-cyrl)
            sed 's/Recommends.*/&, tesseract-ocr-uzb (>= 4.0.9)/g'
            ;;
        srp-latn)
            sed 's/${misc:Depends}/&, tesseract-ocr-srp (>= 4.0.9)/g'
            ;;
        *)
            # No extra relationships: pass the stanza through unchanged.
            cat
            ;;
    esac
}
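for i in ${LANG} ${LANG_NEW}; do
    # Look up the human-readable name for language code ${i} in lang.txt.
    j=$(grep "^${i}__" lang.txt | awk -F '__' '{print $2}')
    # The stanza starts with an empty line: control paragraphs must be
    # separated from one another by a blank line.
    dependencies "$i" >> "${CONTROL}" << EOF

Package: tesseract-ocr-${i}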
Architecture: all
Multi-Arch: foreign
Provides: tesseract-ocr-language, tesseract-ocr-lang
Depends: \${misc:Depends}
Recommends: tesseract-ocr (>= 4.9.9)
Breaks: tesseract-ocr (<< 4.9.9)
Replaces: tesseract-ocr-data (<< 2)
Description: tesseract-ocr language files for ${j}
 Tesseract is an open source Optical Character Recognition (OCR)
 Engine. It can be used directly, or (for programmers) using an API to
 extract printed text from images. This package contains the data
 needed for processing images in ${j} language.
EOF
    cat >> "tesseract-ocr-${i}.install" << EOF
$(echo ${i} | sed 's/-/_/g').* usr/share/tesseract-ocr/5/tessdata/
EOF
done
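# The stanza starts with an empty line: control paragraphs must be
# separated from one another by a blank line.
dependencies osd >> "${CONTROL}" << EOF

Package: tesseract-ocr-osd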
Architecture: all
Multi-Arch: foreign
Provides: tesseract-ocr-language, tesseract-ocr-lang
Depends: \${misc:Depends}
Recommends: tesseract-ocr (>= 4.9.9)
Breaks: tesseract-ocr (<< 4.9.9)
Replaces: tesseract-ocr-data (<< 2)
Description: tesseract-ocr language files for script and orientation
 Tesseract is an open source Optical Character Recognition (OCR)
 Engine. It can be used directly, or (for programmers) using an API to
 extract printed text from images. This package contains the data
 needed for identifying script and orientation.
EOF
cat >> tesseract-ocr-osd.install << EOF
osd.* usr/share/tesseract-ocr/5/tessdata/
EOF
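for i in ${SCRIPT}; do
    # Look up the human-readable name for script code ${i} in script.txt
    # (third field; the second field is the data file name).
    j=$(grep "^${i}__" script.txt | awk -F '__' '{print $3}')
    # The stanza starts with an empty line: control paragraphs must be
    # separated from one another by a blank line.
    dependencies "$i" >> "${CONTROL}" << EOF

Package: tesseract-ocr-script-${i}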
Architecture: all
Multi-Arch: foreign
Provides: tesseract-ocr-language, tesseract-ocr-lang
Depends: \${misc:Depends}
Recommends: tesseract-ocr (>= 4.9.9)
Breaks: tesseract-ocr (<< 4.9.9)
Replaces: tesseract-ocr-data (<< 2)
Description: tesseract-ocr data for ${j} script
 Tesseract is an open source Optical Character Recognition (OCR)
 Engine. It can be used directly, or (for programmers) using an API to
 extract printed text from images. This package contains the data
 needed for processing images in ${j} script.
EOF
    cat >> "tesseract-ocr-script-${i}.install" << EOF
script/$(grep "^${i}__" script.txt | awk -F '__' '{print $2}') usr/share/tesseract-ocr/5/tessdata/
EOF
done