<?xml version="1.0" encoding="utf-8"?>
<test>
<name>snippets vs query highlighting</name>
<config>
indexer
{
mem_limit = 16M
}
searchd
{
<searchd_settings/>
}
source test
{
type = mysql
<sql_settings/>
sql_query = SELECT 1, 'title' as title, 'text' as text;
}
index test
{
source = test
path = <data_path/>/test
phrase_boundary = U+002C
phrase_boundary_step = 100
min_infix_len = 1
}
index hi1
{
source = test
path = <data_path/>/hi1
docinfo = extern
dict = keywords
}
index hi2
{
source = test
path = <data_path/>/hi2
docinfo = extern
dict = keywords
morphology = stem_enru
min_prefix_len = 1
index_exact_words = 1
html_strip = 1
}
</config>
<db_insert>select 1;</db_insert>
<custom_test><![CDATA[
$opts = array
(
'before_match' => '[B]',
'after_match' => '[A]',
'chunk_separator' => ' ... ',
'limit' => 255,
'around' => 2,
'query_mode' => 1
);
$text = 'Sphinx clusters scale to billions of documents, terabytes of data, and billions of queries per month.';
$queries = array
(
'^sphinx month$',
'^sphinx queries$',
'^clusters month$',
'^*inx *bytes',
'*i*',
'*on*',
'*s',
'"clusters scale"',
'"clusters do not scale"', // false claims don't get highlighted
'"of d*"',
'terabyte* << quer*',
'data << terabyte*',
'"sphinx scale"~3',
'"sphinx billions"~3',
'"silly documents"/1',
'"clusters scale to billions"',
'"queries per month" | month | "per month"',
'"of d*" | "of data"',
'"of data" -"of hedgedogs"',
'"documents terabytes"', // crosses boundary
'@title sphinx',
'@text sphinx',
'@text[3] sphinx',
'@text[3] documents',
'@text[7] documents',
// case shouldn't matter
'SPHINX',
'SPH*',
'*PHI*',
'*INX',
);
$results = array();
foreach ( $queries as $query )
{
$reply = $client->BuildExcerpts ( array($text), 'test', $query, $opts );
$results [] = $query;
$results [] = $reply;
}
// regression: fast-path query mode, starred (wildcard) vs regular term matches
$query = ' "*mmitt* u" | ommitt* | "committed u" ';
$results [] = $query;
$results [] = $client->BuildExcerpts ( array ( 'support is just committed to Sphinx code base' ), 'test', $query, array ( 'query_mode' => 1 ) );
$query = ' *ommitt* | "committed u" ';
$results [] = $query;
$results [] = $client->BuildExcerpts ( array ( 'support is just committed to Sphinx code base' ), 'test', $query, array ( 'query_mode' => 1 ) );
$query = ' *ommitt* committed u ';
$results [] = $query;
$results [] = $client->BuildExcerpts ( array ( 'support is just committed to Sphinx code base' ), 'test', $query, array ( 'query_mode' => 0 ) );
$query = ' committed* | "committed p" ';
$results [] = $query;
$results [] = $client->BuildExcerpts ( array ( 'support is just committed to Sphinx code base' ), 'test', $query, array ( 'query_mode' => 1 ) );
$query = ' committed* committed p ';
$results [] = $query;
$results [] = $client->BuildExcerpts ( array ( 'support is just committed to Sphinx code base' ), 'test', $query, array ( 'query_mode' => 0 ) );
$query = ' (support ("committed*")) ';
$results [] = $query;
$results [] = $client->BuildExcerpts ( array ( 'support is just committed to Sphinx code base' ), 'test', $query, array ( 'query_mode' => 1 ) );
$query = ' (support ("code*" | "code test")) ';
$results [] = $query;
$results [] = $client->BuildExcerpts ( array ( 'support is just committed to Sphinx code base' ), 'test', $query, array ( 'query_mode' => 1, 'limit' => 25 ) );
$doc = 'Prinal. Onenes din Pas onatif searst ang searst searst searst searst searst searst way as inge, as kin puble difute paii (for Unitio clas reappe Impand bants to a caly prommat to deady. A cous al fonsue abcingelonhe aaa bbb ccc abcingelonhe aaa bbb ccc abcingelonhe cc pheyse but the hing fiche lochns my produr in may bects of hatest herstat everre tor Scine face.';
$query = 'din abcingelonhe';
$results [] = $query;
$results [] = $client->BuildExcerpts ( array ( $doc ), 'test', $query, array ( 'limit' => 38 ) );
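// UTF-8 document (Russian: "test of highlighting the beginning of a document in utf8"); small limits exercise multibyte truncation
$doc = 'тест на подсветку начала документа в утф8';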
$query = 'din';
$results[] = $client->BuildExcerpts ( array ( $doc ), 'test', $query, array('limit' => 4) );
$results[] = $client->BuildExcerpts ( array ( $doc ), 'test', $query, array('limit' => 5) );
$results[] = $client->BuildExcerpts ( array ( $doc ), 'test', $query, array('limit' => 6) );
$results[] = $client->BuildExcerpts ( array ( $doc ), 'test', $query, array('limit' => 7) );
$results[] = $client->BuildExcerpts ( array ( $doc ), 'test', $query, array('limit' => 8) );
$doc = 'text starred some begin begin begin some starred text and more in between starred some text end';
$query = 'some starr* text';
$results[] = $client->BuildExcerpts ( array ( $doc ), 'test', $query, array('limit' => 35) );
$doc = 'begin the text right mid mid mid the right text end';
$query = 'the the right right text text';
$results[] = $client->BuildExcerpts ( array ( $doc ), 'test', $query, array('limit' => 30) );
// regression: passage generation vs the new snippet generation path
$doc = "Our company's core technology platform is based on Microsoft applications, including the Windows NT operating system and a SQL server relational database, all residing on scaleable hardware. The software is constructed using an advanced proprietary XML framework and resides on an N-tier architecture. The support of open systems allows integration with a large variety of existing commercial, proprietary and legacy applications.  Other applications, which are also operational in a Microsoft NT environment, have been developed using Power Builder and are dependent on an Oracle relational database. running fast and runs out";
$results[] = $client->BuildExcerpts(
array($doc),
'hi1','microsoft & xml',
array('query_mode'=>true,
'allow_empty' => true,
'before_match' => '<b>',
'after_match' => '</b>',
'chunk_separator' => '...',
'limit' => 5000,
'limit_words' => 50,
'limit_passages' =>5,
'around' => 25,
'exact_phrase' => false,
'force_all_words' => false,
)
);
// regression: removal of duplicate terms from the query
$query = 'on | on | the | software | an* | ap* | an* | al* | on | al* | running | runs | run';
$results[] = $client->BuildExcerpts ( array ( $doc ), 'hi2', $query, array('query_mode'=>true, 'limit'=>200) );
$results[] = $client->BuildExcerpts ( array ( $doc ), 'hi2', $query, array('query_mode'=>true, 'limit'=>0) );
$results[] = $client->BuildExcerpts ( array ( $doc ), 'hi2', $query, array('query_mode'=>false, 'limit'=>200) );
$results[] = $client->BuildExcerpts ( array ( $doc ), 'hi2', $query, array('query_mode'=>false, 'limit'=>0) );
]]></custom_test>
</test>