<sect1 id="rel">
<title>Relevancy
<indexterm><primary>Relevancy</primary></indexterm>
</title>
<sect2 id="rel-order">
<title>Ordering documents</title>
<para><application>mnoGoSearch</application> sorts results first by <literal>relevancy</literal>
and second by <literal>popularity rank</literal>.</para>
<sect3 id="relevancy"><title>Relevancy calculation</title>
<para>The relevancy of every document found is calculated as 100% multiplied by the cosine of the angle between the weight vector of the query
and the weight vector of the document. The number of vector coordinates equals the number of word forms in
the search query multiplied by the number of sections defined in <filename>indexer.conf</filename>. Each coordinate corresponds to
a word from the search query found in one of the document's sections. The value of this coordinate depends on the weight of that section,
defined by the <option>wf</option> parameter (see <xref linkend="search-changeweight"/>),
and on whether the word found is exactly the word from the search query, one of its word forms, or a synonym.
One additional coordinate equals the average distance between the searched words in the document. In the query vector, this coordinate is equal to 0.
</para>
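<para>
As a rough illustration of the cosine calculation, consider a hypothetical query with two word forms and
a configuration with two sections, which gives four word coordinates plus the distance coordinate.
All coordinate values below are invented for the example. The values that <application>mnoGoSearch</application>
actually uses depend on the <option>wf</option> weights and on how each word was matched.
</para>
<programlisting>
query vector     q = (1, 1, 1, 1, 0)      distance coordinate is 0 for the query
document vector  d = (1, 0, 1, 0, 0.5)    assumed values, for illustration only

q . d = 1*1 + 1*0 + 1*1 + 1*0 + 0*0.5 = 2
|q|   = sqrt(1 + 1 + 1 + 1 + 0)       = 2
|d|   = sqrt(1 + 0 + 1 + 0 + 0.25)    = 1.5

cos(q, d) = 2 / (2 * 1.5) = 0.667 (rounded)
relevancy = 100% * 0.667  = 66.7%
</programlisting>
<para>
In this sketch, a larger distance coordinate lengthens the document vector without increasing the dot product,
so the score drops. With a distance coordinate of 0, the same document would score about 70.7%.
</para>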
<para>
In the default configuration, search can produce quite small score values
because it expects that the words can be found in up to 256 document
sections at the same time. See the description of the <xref linkend="cmdref-numsections"/>
<filename>search.htm</filename> command for how to specify
the actual number of sections used and thus increase score values.
</para>
<para>
Other commands affecting document order and/or score value are:
<xref linkend="cmdref-datefactor"/>,
<xref linkend="cmdref-docsizeweight"/>,
<xref linkend="cmdref-mincoordfactor"/>,
<xref linkend="cmdref-numdistinctwordfactor"/>,
<xref linkend="cmdref-numwordfactor"/>,
<xref linkend="cmdref-worddistanceweight"/>.
</para>
</sect3>
<sect3 id="poprank"><title>Popularity rank<indexterm><primary>Popularity rank</primary></indexterm></title>
<para>
The popularity rank is calculated in two stages. In the first stage, the value of the <option>Weight</option> parameter
of every server is divided by the number of links from this server. This gives the weight of one link from this server.
In the second stage, the weights of all links pointing to a page are summed for every page.
This sum is the popularity rank of the page. Self links, i.e. links from a page
to itself, do not affect the popularity rank.
</para>
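<para>
The two stages can be traced by hand on a small example. The servers, link counts, and pages below are
hypothetical, and both servers are assumed to keep a <option>Weight</option> of 1.
</para>
<programlisting>
Stage 1: weight of one link from each server
  server A: Weight = 1, 4 outgoing links    ->  1 / 4 = 0.25 per link
  server B: Weight = 1, 2 outgoing links    ->  1 / 2 = 0.50 per link

Stage 2: popularity rank of a page = sum of the weights of the links pointing to it
  page P: one link from A, one link from B  ->  0.25 + 0.50 = 0.75
  page Q: two links from A                  ->  0.25 + 0.25 = 0.50
  page R: no incoming links                 ->  0
</programlisting>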
<para><indexterm><primary>Command</primary><secondary>Weight</secondary></indexterm>
By default, the value of the <option>Weight</option> parameter is 1 for all indexed servers.
You can change this value with the <command>Weight</command> command in the <filename>indexer.conf</filename> file or
directly in the <literal>server</literal> table, if you load the server configuration from this table.
</para>
<para>If you place the
<option><indexterm><primary>Command</primary><secondary>PopRankSkipSameSite</secondary></indexterm>PopRankSkipSameSite yes</option>
command in the <filename>indexer.conf</filename> file, the <command>indexer</command> will use only inter-site links (i.e. links from a page on
one site to a page on another site) when calculating the popularity rank.
</para>
<para>If you place the
<option><indexterm><primary>Command</primary><secondary>PopRankFeedBack</secondary></indexterm>PopRankFeedBack yes</option>
command in the <filename>indexer.conf</filename> file, the <command>indexer</command> will calculate the site weight before the page rank
calculation. To do that, the <command>indexer</command> sums the popularity ranks of all pages from the same site. If this sum is
greater than 1, the site weight is set to this sum; otherwise, the site weight is set to 1.
</para>
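<para>
Put differently, the site weight in this mode is the larger of 1 and the sum of the popularity ranks of
the site's pages. A small sketch with hypothetical sites and rank values:
</para>
<programlisting>
site A: page ranks 2.0 + 1.0 + 0.5 = 3.5   ->  greater than 1, site weight = 3.5
site B: page ranks 0.3 + 0.1       = 0.4   ->  not greater than 1, site weight = 1
</programlisting>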
<para>If you place the
<option><indexterm><primary>Command</primary><secondary>PopRankUseTracking</secondary></indexterm>PopRankUseTracking yes</option>
command in the <filename>indexer.conf</filename> file, the <command>indexer</command> will calculate the site weight as the number of
tracked queries restricted to this site.
</para>
<para>If you place the
<option><indexterm><primary>Command</primary><secondary>PopRankUseShowCnt</secondary></indexterm>PopRankUseShowCnt yes</option>
command in the <filename>search.htm</filename> file, then for every result shown to the user, the
corresponding <literal>url.shows</literal> value is increased by 1, provided that the relevancy of this result is greater than or equal to
the value specified by the
<option><indexterm><primary>Command</primary><secondary>PopRankShowCntRatio</secondary></indexterm>PopRankShowCntRatio</option>
command (the default value is 25.0).
If you place <option>PopRankUseShowCnt yes</option> in the <filename>indexer.conf</filename> file, the <command>indexer</command>
will add to the popularity rank of each URL its <literal>url.shows</literal> value multiplied by the value specified by the
<option><indexterm><primary>Command</primary><secondary>PopRankShowCntWeight</secondary></indexterm>PopRankShowCntWeight</option>
command (the default value is 0.01).
</para>
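<para>
The show-count commands could be combined as in the following sketch. The values given for
<option>PopRankShowCntRatio</option> and <option>PopRankShowCntWeight</option> are simply the defaults quoted above.
Placing <option>PopRankShowCntRatio</option> in <filename>search.htm</filename> and
<option>PopRankShowCntWeight</option> in <filename>indexer.conf</filename> is an assumption based on where each value is used.
The lines starting with <literal>#</literal> only label the file each group belongs to.
</para>
<programlisting>
# indexer.conf
PopRankUseShowCnt yes
PopRankShowCntWeight 0.01

# search.htm
PopRankUseShowCnt yes
PopRankShowCntRatio 25.0
</programlisting>
<para>
With these values, a hypothetical page whose <literal>url.shows</literal> value is 200 would gain
200 * 0.01 = 2.0 in popularity rank.
</para>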
</sect3>
</sect2>
<sect2 id="score-debug">
<title>Analyzing score values</title>
<para>Starting from version 3.3.7, you can debug the
score values calculated for the documents found. To
debug a score value, follow these steps:
<orderedlist>
<listitem>
Add this code at the bottom of the <literal><!--restop--></literal>
section of your search template:
<programlisting>
<!--restop-->
....
[DebugScore: $(DebugScore)]
<!--/restop-->
</programlisting>
</listitem>
<listitem>
Add this code at the bottom of the <literal><!--res--></literal>
section of your search template:
<programlisting>
<!--res-->
....
[ID=$(ID)]
<!--/res-->
</programlisting>
</listitem>
<listitem>
Open <program>search.cgi</program> in your browser and
run a search query consisting of multiple words.
You will see the document ID after the usual document information.
</listitem>
<listitem>
Choose a document you want to see score debug information for.
Remember its ID (let's say the ID is 100).
</listitem>
<listitem>
Go to your browser's location bar, add
<command>&DebugURLID=100</command>
at the very end of the URL and press Enter.
<note>
<para>
The URL will look approximately like this:
<programlisting>
http://hostname/cgi-bin/search.cgi?q=test+query&DebugURLID=100
</programlisting>
</para>
</note>
</listitem>
<listitem>
Find a line of this format in between the search form and the results:
<programlisting>
DebugScore: url_id=82 RDsum=98 distance=84 (84/1) minmax=0.99091089
density=0.00196271 numword=0.90135133 wordform=0.00000000
</programlisting>
This line gives you an idea of why the score for the chosen document is
too high or too low, and helps you fine-tune parameters
such as <xref linkend="cmdref-worddistanceweight"/>
or <xref linkend="cmdref-worddensityfactor"/>.
</listitem>
</orderedlist>
<note>
<para>
Score debugging currently works only for queries with multiple search
words. Queries with a single search word don't return debug information.
</para>
</note>
</para>
</sect2>
<sect2 id="rel-cwords">
<title>Crosswords
<indexterm><primary>Crosswords</primary></indexterm>
</title>
<para>This feature assigns the words found between
<literal><a href="xxx"></literal> and <literal></a></literal>
to the document that the link points to.
To enable Crosswords, use the <command><xref linkend="cmdref-crosswords"/>
<indexterm><primary>Command</primary><secondary>CrossWords</secondary></indexterm>
</command> command in
<filename>indexer.conf</filename> and
<filename>search.htm</filename>.</para>
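<para>
For example, suppose a hypothetical page <filename>a.html</filename> contains the link shown below.
With Crosswords enabled, the words <literal>annual</literal> and <literal>report</literal> are assigned to
<filename>b.html</filename>, the target of the link, rather than only to <filename>a.html</filename>.
The file names and the URL are invented for illustration.
</para>
<programlisting>
<a href="http://example.com/b.html">annual report</a>
</programlisting>
<para>
A minimal configuration sketch follows. It assumes that the command takes a <literal>yes</literal> argument
like the other switches in this chapter; see <xref linkend="cmdref-crosswords"/> for the exact syntax.
The same line would go into both <filename>indexer.conf</filename> and <filename>search.htm</filename>.
</para>
<programlisting>
CrossWords yes
</programlisting>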
</sect2>
<!-- sect2 id="rel-dr">
<title>$(Score) template variable</title>
<para>
<varname>$(Score)</varname> template variable displays number of words from the query found in a document.</para>
</sect2 -->
</sect1>