1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219
|
<html><head><title>C# bindings for Xapian</title></head>
<body>
<h1>C# bindings for Xapian</h1>
<p>
The C# bindings for Xapian are packaged in the <code>Xapian</code> namespace
and largely follow the C++ API, with the following differences and
additions. C# strings and other types are converted automatically
in the bindings, so generally it should just work as expected.
</p>
<p>
The <code>examples</code> subdirectory contains examples showing how to use the
C# bindings based on the simple examples from <code>xapian-examples</code>:
<a href="examples/SimpleIndex.cs">SimpleIndex.cs</a>,
<a href="examples/SimpleSearch.cs">SimpleSearch.cs</a>,
<a href="examples/SimpleExpand.cs">SimpleExpand.cs</a>.
</p>
<p>
Note: the passing of strings from C# into Xapian and back isn't currently
zero byte safe. If you try to handle string containing zero bytes, you'll
find they get truncated at the zero byte.
</p>
<h2>Unicode Support</h2>
<p>
In Xapian 1.0.0 and later, the Xapian::Stem, Xapian::QueryParser, and
Xapian::TermGenerator classes all assume text is in UTF-8. If you're
using Mono on UNIX with a UTF-8 locale (which is the default on most
modern Linux distributions), then Xapian appears to get passed Unicode
strings as UTF-8, so it should just work. We tested with Mono 1.2.3.1
using the Mono C# 2.0 compiler (gmcs).
</p>
<p>
However, Microsoft and Mono's C# implementations apparently take
rather different approaches to Unicode, and we've not tested with
Microsoft's implementation. If you try it, please report how well
it works (or how badly it fails...)
</p>
<h2>Method Naming Conventions</h2>
<p>
Methods are renamed to use the "CamelCase" capitalisation convention which C#
normally uses. So in C# you use <code>GetDescription</code> instead of
<code>get_description</code>.
</p>
<h2>Exceptions</h2>
<p>
Exceptions are thrown as SWIG exceptions instead of Xapian
exceptions. This isn't done well at the moment; in future we will
throw wrapped Xapian exceptions. For now, it's probably easier to
catch all exceptions and try to take appropriate action based on
their associated string.
</p>
<h2>Iterators</h2>
<p>
The C#-wrapped iterators work much like their C++ counterparts, with
operators "++", "--", "==", and "!=" overloaded. E.g.:
</p>
<pre>
Xapian.MSetIterator m = mset.begin();
while (m != mset.end()) {
// do something
++m;
}
</pre>
<h2>Iterator dereferencing</h2>
<p>
C++ iterators are often dereferenced to get information, eg
<code>(*it)</code>. In C# these are all mapped to named methods, as
follows:
</p>
<table title='Iterator deferencing methods'>
<thead><td>Iterator</td><td>Dereferencing method</td></thead>
<tr><td>PositionIterator</td> <td><code>GetTermPos()</code></td></tr>
<tr><td>PostingIterator</td> <td><code>GetDocId()</code></td></tr>
<tr><td>TermIterator</td> <td><code>GetTerm()</code></td></tr>
<tr><td>ValueIterator</td> <td><code>GetValue()</code></td></tr>
<tr><td>MSetIterator</td> <td><code>GetDocId()</code></td></tr>
<tr><td>ESetIterator</td> <td><code>GetTerm()</code></td></tr>
</table>
<p>
Other methods, such as <code>MSetIterator.GetDocument()</code>, are
available unchanged.
</p>
<h2>MSet</h2>
<p>
MSet objects have some additional methods to simplify access (these
work using the C++ array dereferencing):
</p>
<table title='MSet additional methods'>
<thead><td>Method name</td><td>Explanation</td></thead>
<tr><td><code>GetHit(index)</code></td><td>returns MSetIterator at index</td></tr>
<tr><td><code>GetDocumentPercentage(index)</code></td><td><code>ConvertToPercent(GetHit(index))</code></td></tr>
<tr><td><code>GetDocument(index)</code></td><td><code>GetHit(index).GetDocument()</code></td></tr>
<tr><td><code>GetDocumentId(index)</code></td><td><code>GetHit(index).GetDocId()</code></td></tr>
</table>
<h2>Non-Class Functions</h2>
<p>The C++ API contains a few non-class functions (the Database factory
functions, and some functions reporting version information), but C# doesn't
allow functions which aren't in a class so these are wrapped as static
member functions of abstract classes like so:
<ul>
<ul>
<li> <code>Xapian::version_string()</code> is wrapped as <code>Xapian.Version.String()</code>
<li> <code>Xapian::major_version()</code> is wrapped as <code>Xapian.Version.Major()</code>
<li> <code>Xapian::minor_version()</code> is wrapped as <code>Xapian.Version.Minor()</code>
<li> <code>Xapian::revision()</code> is wrapped as <code>Xapian.Version.Revision()</code>
</ul>
<ul>
<li> <code>Xapian::Auto::open_stub()</code> is wrapped as <code>Xapian.Auto.OpenStub()</code>
<li> <code>Xapian::Brass::open()</code> is wrapped as <code>Xapian.Brass.Open()</code>
<li> <code>Xapian::Chert::open()</code> is wrapped as <code>Xapian.Chert.Open()</code>
<li> <code>Xapian::Flint::open()</code> is wrapped as <code>Xapian.Flint.Open()</code>
<li> <code>Xapian::InMemory::open()</code> is wrapped as <code>Xapian.InMemory.Open()</code>
<li> <code>Xapian::Remote::open()</code> is wrapped as <code>Xapian.Remote.Open()</code> (both
the TCP and "program" versions are wrapped - the SWIG wrapper checks the parameter list to
decide which to call).
<li> <code>Xapian::Remote::open_writable()</code> is wrapped as <code>Xapian.Remote.OpenWritable()</code> (both
the TCP and "program" versions are wrapped - the SWIG wrapper checks the parameter list to
decide which to call).
</ul>
</ul>
<h2>Constants</h2>
<p>
The <code>Xapian::DB_*</code> constants are currently wrapped in a Xapian
class within the Xapian namespace, so have a double Xapian prefix!
So <code>Xapian::DB_CREATE_OR_OPEN</code> is available as
<code>Xapian.Xapian.DB_CREATE_OR_OPEN</code>.
The <code>Query::OP_*</code> constants are wrapped a little oddly too:
<code>Query::OP_OR</code> is wrapped as <code>Xapian.Query.op.OP_OR</code>.
Similarly, <code>QueryParser::STEM_SOME</code> as
<code>Xapian.QueryParser.stem_strategy.STEM_SOME</code>.
The naming here needs sorting out...
</p>
<h2>Query</h2>
<p>
In C++ there's a Xapian::Query constructor which takes a query operator and
start/end iterators specifying a number of terms or queries, plus an optional
parameter.
This isn't currently wrapped in C#.
<!-- FIXME implement this wrapping!
In C#, this is wrapped to accept any C# sequence (for
example a list or tuple) to give the terms/queries, and you can specify
a mixture of terms and queries if you wish. For example:
-->
</p>
<!--
<pre>
subq = xapian.Query(xapian.Query.OP_AND, "hello", "world")
q = xapian.Query(xapian.Query.OP_AND, [subq, "foo", xapian.Query("bar", 2)])
</pre>
-->
<h3>MatchAll and MatchNothing</h3>
<p>
These aren't yet wrapped for C#, but you can use <code>xapian.Query("")</code>
instead of MatchAll and <code>xapian.Query()</code> instead of MatchNothing.
</p>
<!-- FIXME: Need to define the custom output typemap to handle this if it
actually seems useful...
<h2>Enquire</h2>
<p>
There is an additional method <code>GetMatchingTerms()</code> which takes
an MSetIterator and returns a list of terms in the current query which
match the document given by that iterator. You may find this
more convenient than using the TermIterator directly.
</p>
-->
<h2>MatchDecider</h2>
<p>
Custom MatchDeciders can be created in C#; simply subclass
Xapian.MatchDecider, and define an
Apply method that will do the work. The simplest example (which does nothing
useful) would be as follows:
</p>
<pre>
class MyMatchDecider : Xapian.MatchDecider {
public override bool Apply(Xapian.Document doc) {
return true;
}
}
</pre>
<address>
Last updated $Date: 2005-12-12T02:56:23.742308Z $
</address>
</body>
</html>
|