<?xml version="1.0" encoding="utf-8"?>
<test>
<name>stemming</name>
<config>
indexer
{
mem_limit = 16M
}
searchd
{
<searchd_settings/>
workers = threads
}
source srctest
{
type = mysql
<sql_settings/>
sql_query = SELECT * FROM test_table
}
index test
{
source = srctest
path = <data_path/>/test
charset_table = -, 0..9, A..Z->a..z, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
morphology = stem_ru, stem_en
}
index morph0
{
source = srctest
path = <data_path/>/morph0
dict = keywords
min_prefix_len = 1
}
index morph1
{
source = srctest
path = <data_path/>/morph1
dict = keywords
min_prefix_len = 1
morphology = stem_en
}
source src_ru
{
type = mysql
<sql_settings/>
sql_query = SELECT *, 11 as idd FROM test_ru
sql_query_pre = SET NAMES utf8
sql_attr_uint = idd
}
index test_ru
{
source = src_ru
path = <data_path/>/test_ru
dict = keywords
morphology = stem_ru
charset_table = 0..9, A..Z->a..z, _, ., -, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0401->U+0435, U+0451->U+0435
}
index rt
{
type = rt
path = <data_path/>/rt
rt_field = title
rt_attr_uint = idd
dict = keywords
morphology = stem_ru
charset_table = 0..9, A..Z->a..z, _, ., -, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0401->U+0435, U+0451->U+0435
}
source src_stop
{
type = mysql
<sql_settings/>
sql_query = SELECT 1, 11 as idd, 'running far' as text UNION SELECT 2, 22 as idd, 'runs me' as text UNION SELECT 3, 33 as idd, 'rufus my friend' as text
sql_query_pre = SET NAMES utf8
sql_attr_uint = idd
}
index test_stop
{
source = src_stop
path = <data_path/>/test_stop
dict = keywords
morphology = stem_en
charset_table = 0..9, A..Z->a..z, _, ., -, a..z
stopwords = <this_test/>/stopwords.txt
min_prefix_len = 1
enable_star = 1
index_exact_words = 1
}
index test_stop1
{
source = src_stop
path = <data_path/>/test_stop1
dict = crc
morphology = stem_en
charset_table = 0..9, A..Z->a..z, _, ., -, a..z
stopwords = <this_test/>/stopwords.txt
min_prefix_len = 1
enable_star = 1
index_exact_words = 1
}
source src_metaphone
{
type = mysql
<sql_settings/>
sql_field_string = name
sql_query = SELECT 1 as id, '加藤郁子アーティスト片山耕アーティストピューロキッズアーティスト金子知恵アーティスト井上かなえアーティスト他知恵アーティスト井上かなえアーティスト他' as name
}
index test_metaphone
{
source = src_metaphone
path = <data_path/>/test_metaphone
morphology = metaphone
charset_table = U+21..U+29, U+30..U+999, U+1000..U+FFFF
}
</config>
<queries>
<!-- regression: a long query after a proximity length was not properly 0-terminated, so an overlong token was passed to the stemmer and caused a crash -->
<query mode="extended2" index="test morph0 morph1 test_ru rt test_stop test_stop1">"one two"~3 three</query>
<!-- regression: dict=keywords got broken by morphology -->
<query mode="extended2" index="morph1">=running</query>
<query mode="extended2" index="morph1">=run</query>
<query mode="extended2" index="morph1">running*</query>
<query mode="extended2" index="morph1">runnin*</query>
<query mode="extended2" index="morph1">run*</query>
<query mode="extended2" index="morph1">ru*</query>
<query mode="extended2" index="morph1">=runnin*</query>
<query mode="extended2" index="morph1">runnings*</query>
<query mode="extended2" index="morph1">runnin</query>
<query mode="extended2" index="morph1">running</query>
<query mode="extended2" index="morph1">run</query>
<query mode="extended2" index="morph0">=running</query>
<query mode="extended2" index="morph0">=run</query>
<query mode="extended2" index="morph0">running*</query>
<query mode="extended2" index="morph0">runnin*</query>
<query mode="extended2" index="morph0">run*</query>
<query mode="extended2" index="morph0">ru*</query>
<query mode="extended2" index="morph0">=runnin*</query>
<query mode="extended2" index="morph0">runnings*</query>
<query mode="extended2" index="morph0">runnin</query>
<query mode="extended2" index="morph0">running</query>
<query mode="extended2" index="morph0">run</query>
<!-- regression: stemmed word vs dict=keywords -->
<query mode="extended2" index="test_ru">ордеру АВАВ</query>
<!-- regression: stemmed exact form from stoplist vs dict=keywords prefix -->
<query mode="extended2" index="test_stop">run*</query>
<query mode="extended2" index="test_stop">ru*</query>
<query mode="extended2" index="test_stop">=runs</query>
<query mode="extended2" index="test_stop">runs</query>
<query mode="extended2" index="test_stop1">run*</query>
<query mode="extended2" index="test_stop1">ru*</query>
<query mode="extended2" index="test_stop1">=runs</query>
<query mode="extended2" index="test_stop1">runs</query>
</queries>
<sphqueries>
<!-- regression: 2byte vs 2byte + sbc -->
<sphinxql>CALL KEYWORDS ('то-тический', 'test')</sphinxql>
<sphinxql>CALL KEYWORDS ('тоЫтический', 'test')</sphinxql>
<!-- regression: stemmed word vs dict=keywords -->
<sphinxql>INSERT INTO rt (id, idd, title) VALUES ( 1, 1, 'testing that' ),( 2, 1, 'ордеру АВАВ/А85/007' )</sphinxql>
<sphinxql>SELECT * FROM test_ru WHERE MATCH('ордеру АВАВ')</sphinxql>
<sphinxql>show meta</sphinxql>
<sphinxql>SELECT * FROM rt WHERE MATCH('ордеру АВАВ')</sphinxql>
<sphinxql>show meta</sphinxql>
</sphqueries>
<db_create>
CREATE TABLE `test_table`
(
`document_id` int(11) NOT NULL default '0',
`body` varchar(255) NOT NULL default ''
)
</db_create>
<db_drop>
DROP TABLE IF EXISTS `test_table`
</db_drop>
<db_insert>
INSERT INTO `test_table` VALUES
( 1, 'and nothing else matters' ),
( 2, 'running into trouble' )
</db_insert>
<db_create>
CREATE TABLE `test_ru`
(
`document_id` int(11) NOT NULL default '0',
`body` varchar(255) CHARACTER SET UTF8 NOT NULL default ''
)
</db_create>
<db_drop>DROP TABLE IF EXISTS `test_ru`</db_drop>
<db_insert>SET NAMES utf8</db_insert>
<db_insert>INSERT INTO `test_ru` (document_id, body) VALUES ( 1, 'testing that' ), ( 2, 'ордеру АВАВ/А85/007' )</db_insert>
</test>