<bmesing> I am currently pondering how to give the user a valid popcon database to use.
[19:43] <bmesing> It would require a package popcon-data, which regularly fetches the popcon data and would somehow need to trigger apt-xapian-index
[20:38] <enrico> indeed
[20:38] <enrico> a-x-i can already index popcon
[20:39] <enrico> the only thing to do is to download the data
[20:39] <enrico> uhm, I thought it could index popcon but I can't find the plugin, maybe it was some prototype in libept
[20:39] <enrico> anyway, it's trivial to index popcon in a-x-i
[20:40] <bmesing> question is, how (if) popcon-data should invoke axi
[20:40] <enrico> it shouldn't: it should just add a plugin to /usr/share/apt-xapian-index/plugins/ that reads the data and adds it to the indexed documents
[20:41] <enrico> similarly to how /usr/share/apt-xapian-index/plugins/sizes.py does
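Note: a rough sketch of what such a plugin could look like. It is modelled on the general shape that sizes.py appears to have; the method names (info, doc, init, index), the value name "popcon", the scores dictionary and the Fake* stand-ins are all assumptions for illustration and should be checked against the installed plugins before use.

```python
# Hypothetical a-x-i plugin sketch: attaches a popcon count to each indexed package.
# Assumption: plugins expose info()/doc()/init()/index() plus a module-level init().

class Popcon:
    def __init__(self, scores=None):
        # scores: package name -> installation count (hypothetical; a real plugin
        # would load this from whatever file popcon-data ships)
        self.scores = scores or {}
        self.slot = -1

    def info(self):
        # Declare the value this plugin would like a slot for.
        return dict(timestamp=0,
                    values=[dict(name="popcon", desc="popcon installation count")])

    def doc(self):
        return dict(name="Popcon", shortDesc="popcon data",
                    fullDoc="Adds the popcon installation count as a document value.")

    def init(self, info, progress):
        # Look up the slot assigned to our value; -1 means "not available".
        self.slot = info["values"].get("popcon", -1)

    def index(self, document, pkg):
        if self.slot == -1 or pkg.name not in self.scores:
            return
        # A real plugin would probably store xapian.sortable_serialise(count)
        # so the value can be sorted numerically; a plain string keeps this
        # sketch free of dependencies.
        document.add_value(self.slot, str(self.scores[pkg.name]))


def init():
    return Popcon()


# Stand-ins so the sketch can be exercised without xapian or python-apt.
class FakeDocument:
    def __init__(self):
        self.values = {}

    def add_value(self, slot, value):
        self.values[slot] = value


class FakePackage:
    def __init__(self, name):
        self.name = name
```

The point of the design is the one made above: popcon-data never invokes the indexer itself, it only drops a file like this next to the other plugins and the next index run picks it up.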
[20:41] <bmesing> Yeah, but then popcon data would only be available after the next cron job has run?
[20:42] <enrico> yes
[20:42] <bmesing> Besides, basically this would be two packages: "popcon-data" and "apt-xapian-popcon-plugin"
[20:42] <enrico> although popcon-data can run update-apt-xapian-index on its first install
[20:42] <enrico> no, you just do popcon-data which includes a plugin
[20:43] <bmesing> I tried to keep things conceptually clean, hence two packages..
[20:43] <enrico> just like software-center includes some xapian plugins
[20:43] <enrico> well, the plugin would be less than 1kb, no point making a package just for it
[20:43] <enrico> it'd be like packages providing hooks for logcheck, for example
[20:43] <enrico> you don't do apache and apache-logcheck
[20:43] <bmesing> ok, sounds about right.
Current version of libept:
git clone git://git.debian.org/debtags/libept.git
[19:56] <enrico> queries are better being generated by Xapian's QueryParser, which does a better job than libept did
[19:56] <bmesing> it's more like database design..
[19:57] <enrico> so now there's just an ept::axi namespace with 4 tiny functions to basically tell you where the Xapian database is and when it's last been updated
[19:58] <enrico> you can compare the axi timestamp and the apt timestamp to see if things are fully up to date, if you want
[19:59] <enrico> and if not, you call update-apt-xapian-index -u
[19:59] <enrico> if the index was not generated at all, you can run update-apt-xapian-index
[20:01] <enrico> the user runs packagesearch after install, and apt-xapian-index hasn't finished updating the index
[20:01] <enrico> in that case, you can display a "hang on a minute until the index is created" sort of message
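Note: the decision described above (compare the axi timestamp with the apt timestamp, then update or wait) can be sketched as follows. This is a minimal model, not the ept::axi API: the function names and the idea of passing plain timestamps are assumptions, and where the two timestamps come from is left to the caller.

```python
import os


def mtime_or_none(path):
    """Return the modification time of path, or None if it does not exist."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return None


def index_state(axi_timestamp, apt_timestamp):
    """Classify the index given two timestamps (None means no index yet)."""
    if axi_timestamp is None:
        return "missing"
    if axi_timestamp < apt_timestamp:
        return "stale"
    return "fresh"


# What a frontend might do in each case (hypothetical wording).
ACTIONS = {
    "missing": "run update-apt-xapian-index and show a 'hang on a minute' message",
    "stale": "run update-apt-xapian-index -u",
    "fresh": "nothing to do",
}
```

A caller would feed it something like index_state(mtime_or_none(index_path), apt_time), where index_path and apt_time are whatever the library reports.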
Xapian-Discussion
[20:40] <bmesing> Hi. I am done with ripping out ept::debtags. Is there a faster way to compute companion tags for a given tag-set than brute force (i.e. calculating the result-set for the tagset ANDed with each available tag and checking if it is empty)?
[20:43] <enrico> definitely doable with xapian, and you get better results, even
[20:43] <enrico> link coming
[20:43] <enrico> http://www.enricozini.org/2007/debtags/axi-query-expand/
[20:43] <bmesing> Better results? There is only one correct result.
[20:44] <enrico> better in that you get them sorted by relevance
[20:44] <enrico> so you build a query using your tags (and maybe even other keywords, if you like) and then ask for the expand set, filtered to only give you XT terms
[20:45] <bmesing> Ok, didn't yet know about expand set and filtering. Thanks. The result is complete?
[20:46] <enrico> I think so, if you keep pulling results for as long as Xapian gives them
[20:50] <bmesing> Thanks, I will try it out. Currently I have the brute-forcing, which is still quite fast (approx. 1s).
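Note: to make the two approaches concrete, here is a small in-memory model. It does not use Xapian's expand set; it only illustrates the brute-force definition of companion tags and a single-pass variant that also orders them by how often they co-occur (a crude stand-in for relevance). Tags other than role::program and the package-to-tag data are made up.

```python
def companion_tags_brute_force(package_tags, selected):
    """For every other tag, AND it with the selection and keep it if any package matches."""
    selected = set(selected)
    all_tags = set().union(*package_tags.values())
    result = set()
    for candidate in all_tags - selected:
        wanted = selected | {candidate}
        if any(wanted <= tags for tags in package_tags.values()):
            result.add(candidate)
    return result


def companion_tags(package_tags, selected):
    """Single pass over the matching packages; most frequent companions first."""
    selected = set(selected)
    counts = {}
    for tags in package_tags.values():
        if selected <= tags:
            for tag in tags - selected:
                counts[tag] = counts.get(tag, 0) + 1
    return sorted(counts, key=lambda t: (-counts[t], t))


# Illustrative data only.
PACKAGES = {
    "gimp": {"role::program", "use::editing", "works-with::image"},
    "mc": {"role::program", "use::browsing"},
    "libfoo": {"role::shared-lib"},
}
```

Both functions return the same set of tags; the second just avoids one result-set computation per available tag.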
Xapian-Discussion
[15:10] <enrico> you basically use libxapian instead of libept
[15:10] <enrico> which you almost already do
[15:11] <enrico> and with libxapian you get package names in the end, as the "document.get_data()"
[15:11] <bmesing> Ok, so I have to learn the xapian stuff. And besides I have to use libapt-pkg right?
[15:11] <enrico> yes. However, if you think ept::Apt is useful, just keep it for now
[15:12] <enrico> I'd like to remove the TextSearch bits from libept because I think they're useless, but I have less of a problem with ept::Apt
[15:12] <bmesing> Ok, so first I'll rip out everything except ept::apt?
[15:12] <enrico> yes, that'd be good
[15:12] <enrico> I think ept::apt is the most useful bit
[15:13] <bmesing> Are there any examples out there on how to use apt-xapian?
[15:13] <bmesing> Perhaps inside libept or so?
[15:13] <enrico> ept::debtags is less useful now with apt-xapian-index, so I'd like at least to get rid of all the debtags custom indexing code
[15:13] <enrico> examples? Sure. http://www.enricozini.org/2007/debtags/apt-xapian-index/
[15:13] <enrico> from there, every post links to a "next" post
[15:14] <enrico> it uses the python API, but it's very, very similar to the C++ one
[15:14] <bmesing> Great, I think I can go from there.
[15:14] <enrico> also, there's a "jibel" in #xapian who's redoing synaptic's quick search function with xapian
[15:15] <enrico> he's worked more with xapian's queryparser and had interesting results, but I don't know the details
[15:16] <bmesing> I would also like to take it the other way round - e.g. get the tags for a given package. As far as I understand it now, this is not what xapian is for?
[15:16] <enrico> you can do it with xapian
[15:17] <enrico> but if you don't have the xapian index, you can still do it by loading /var/lib/debtags/package-tags
[15:17] <bmesing> I'd really rather stick to libraries...
[15:18] <enrico> to get the tags with xapian, you search for "XPpackagename" and get all the terms starting with "XT" in the resulting document
[15:19] <enrico> but well, for now just get rid of ept::textsearch
[15:19] <enrico> then I'll see if I can get rid of the Debtags custom indexing in libept but still keep the functions to load /var/lib/debtags/package-tags
[15:19] <bmesing> Let's see if I am using that one at all..
[15:20] <bmesing> Ok, it is there, shouldn't be much of a problem..
[15:21] <bmesing> uhh, there is some xapian code in there, within a "#if 0" block. You did this some time ago I suppose ;)
[15:22] <enrico> you really can't use ept::textsearch without using xapian things
[15:22] <enrico> basically, in ept::Textsearch I put a kind of queryparser
[15:22] <enrico> but xapian has a better one
[15:22] <enrico> so...
[15:23] <enrico> in fact, ept::textsearch makes matters more complicated, because it introduces a useless layer and one in the end uses xapian anyway
[15:23] <enrico> and the xapian API is, IMHO, really quite nice. Doesn't need extra layers on top
[15:42] <bmesing> Do I understand it right that each package is a document and has some terms assigned to it (e.g. XTrole::program and Zgraphic)? Querying happens using those terms and returns a set of package names?
[15:51] <enrico> querying is a boolean expression of those terms, and returns a list of documents sorted by relevance
[15:51] <enrico> "document" is an object from which you can retrieve the indexed terms, as well as the "document data", which is an opaque (to xapian) piece of information, which in case of apt-xapian-index I fill with the package name
[15:51] <bmesing> What I meant was the idea of packages being the documents
[15:52] <enrico> "sorted by relevance" is very important: it means you don't have to make exact queries and you'll still get good results
[15:52] <bmesing> Ok, so I got this right
[15:52] <enrico> for example, for stemming you just build an OR query with the terms and their stemmed forms. The result will be the best match, that is the document that matches the most OR terms
[15:53] <enrico> so it's quite all right to put all sorts of things in OR, and let Xapian satisfy it the best it can
[15:53] <bmesing> I do not like too much hidden logic
[15:53] <enrico> OR queries really are approximate AND queries
[15:53] <enrico> it's not very hidden, really
[15:53] <enrico> its behaviour is predictable and reasonable
[15:53] <bmesing> No, but you might not get what you'd expect
[15:54] <enrico> just, it's important (imo) to consider xapian OR queries as approximate AND queries, because it allows you to be more creative
[15:54] <enrico> like, you can OR searches with sets of tags, and still get reasonable results if the description matches well but a tag is missing
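Note: the "OR as approximate AND" idea can be modelled in a few lines. This is a toy ranking by number of matched terms, not Xapian's actual relevance weighting, which is more refined than a plain count. The package names and terms in the index are hypothetical.

```python
def or_search(index, terms):
    """Rank documents by how many of the query terms they contain."""
    terms = set(terms)
    hits = []
    for name, doc_terms in index.items():
        score = len(terms & doc_terms)
        if score > 0:
            hits.append((score, name))
    # Best match first; ties broken alphabetically to keep the order stable.
    return [name for score, name in sorted(hits, key=lambda h: (-h[0], h[1]))]


def and_search(index, terms):
    """Strict AND for comparison: every term must be present."""
    terms = set(terms)
    return sorted(name for name, doc_terms in index.items() if terms <= doc_terms)


# Hypothetical index: description stems (Z...) and tags (XT...).
INDEX = {
    "fullmatch": {"Zimag", "Zedit", "XTworks-with::image"},
    "untagged": {"Zimag", "Zedit"},        # description fits, tag is missing
    "tagonly": {"XTworks-with::image"},    # tag fits, description does not
    "libfoo": {"XTrole::shared-lib"},
}

QUERY = ["Zimag", "Zedit", "XTworks-with::image"]
```

With this data the AND query only finds "fullmatch", while the OR query still lists "untagged" and "tagonly" after it, which is the behaviour described above for a package whose description matches but whose tag is missing.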
[15:55] <enrico> one of the examples in my blog series is first using the terms to look for tags, then add those tags to the proper package query. that way you get packages that make sense even if they don't contain the searched keywords in the description
[15:55] <enrico> (the "gimp is not an image editor" problem)
[15:56] <bmesing> Yeah, that's what you did with your smart search
[15:56] <enrico> xapian is fast enough that you can do all that on the fly, as one types
[15:57] <enrico> and, search-as-you-type you can just do it by taking the last, partially typed word, expand it with xapian as a prefix, then OR all the expanded terms
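Note: a small model of the search-as-you-type recipe just described (take the last, partially typed word, expand it as a prefix, OR the expansions with the complete words). It works on a plain sorted list of terms instead of a Xapian database; the function names and the term list are assumptions for illustration.

```python
import bisect


def expand_prefix(sorted_terms, prefix, limit=10):
    """Return up to `limit` terms from a sorted list that start with prefix."""
    start = bisect.bisect_left(sorted_terms, prefix)
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(out) >= limit:
            break
        out.append(term)
    return out


def as_you_type_terms(sorted_terms, typed):
    """Build the list of terms to OR together for the text typed so far."""
    words = typed.lower().split()
    if not words:
        return []
    if typed.endswith(" "):
        # The user finished the last word, nothing to expand.
        return words
    complete, partial = words[:-1], words[-1]
    return complete + expand_prefix(sorted_terms, partial)


# Illustrative term list; a real one would come from the index.
TERMS = sorted(["edit", "editor", "image", "imagemagick", "images"])
```

For example, typing "edit ima" yields the complete word "edit" plus every indexed term starting with "ima"; the caller would then OR all of them in one query.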
[15:57] <enrico> I think xapian's queryparser does that transparently if you do I don't know what
[15:57] <enrico> I really should study xapian's queryparser better. It does many useful things
[16:06] <bmesing> Damn, coding time finished, time to wake up my daughter :)
[16:07] <enrico> have fun :)
[16:08] <bmesing> Yeah, thanks for your help
[19:55] --- Disconnected (Connection reset by peer).
[21:09] <bmesing> Done
[21:10] <bmesing> I'll have to rip out the "Whole words only" and "Case sensitive" feature though. Probably they didn't make much sense after all
[21:13] <bmesing> And I'll have to tolower() all my search terms
[20:14] * Loaded log from Sun Mar 7 21:21:43 2010
[20:18] <bmesing> So you will keep the ept::debtags interface?
[20:19] <enrico> I'd like to keep it very, very minimal
[20:20] <bmesing> Main classes I am using are Vocabulary and Debtags, what do you want to change there?
[20:20] <enrico> Vocabulary can stay, but I intend to get rid of the index
[20:20] <enrico> Debtags can stay, but I intend to also get rid of the index and just load /var/lib/debtags/package-tags in memory
[20:21] <bmesing> By index you mean the int values (ID)?
[20:21] <enrico> yes
[20:21] <enrico> Debtags I'd like it to be basically a two-way string->set<string> mapping
[20:21] <enrico> Vocabulary, just indexed by facet or tag name
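Note: a minimal sketch of such a two-way string->set<string> mapping, built by reading lines in the style of /var/lib/debtags/package-tags. The line format assumed here ("package: tag1, tag2", optionally with several comma-separated packages before the colon) is an assumption, as are the function name and the sample lines.

```python
from collections import defaultdict


def parse_package_tags(lines):
    """Build package->tags and tag->packages mappings from 'pkg: tag, tag' lines."""
    pkg_to_tags = {}
    tag_to_pkgs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only: tags themselves contain "::".
        names, _, tags = line.partition(":")
        tagset = {t.strip() for t in tags.split(",") if t.strip()}
        for name in (n.strip() for n in names.split(",")):
            if not name:
                continue
            pkg_to_tags.setdefault(name, set()).update(tagset)
            for tag in tagset:
                tag_to_pkgs[tag].add(name)
    return pkg_to_tags, dict(tag_to_pkgs)


# Illustrative input only.
SAMPLE = [
    "mc: role::program, use::browsing",
    "gimp: role::program",
    "",
]
```

In real use the lines would come from iterating over the opened file; both directions then answer "tags of a package" and "packages with a tag" with a plain dictionary lookup.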
[20:22] <bmesing> Ok, in that case I believe there is no need for doing changes in my vocabulary
[20:23] <bmesing> s/vocabulary/program/
[20:23] <enrico> cool
[20:23] <enrico> also, it may take me 6 months to get any time to do it
[20:23] <bmesing> On the other hand, I could change the search to use Xapian, right?
[20:23] <enrico> so a good strategy is also to use libept for squeeze as it is, and postpone thinking about it until after squeeze
[20:24] <enrico> you could change the search to use xapian indeed
[20:24] <enrico> which is the best bet if you can still work (maybe without debtags) if xapian isn't there
[20:24] <bmesing> This way I would only depend on vocabulary and I would use a common search mechanism...
[20:24] <enrico> exactly
[20:24] <bmesing> I've added a depend on xapian
[20:25] <bmesing> I am also exiting packagesearch when the index is not present
[20:25] <bmesing> ... don't like too many ifs..
[20:25] <enrico> ok. which makes sense, because packagesearch is not intended for embedded platforms, which are pretty much the only platforms where a-x-i is too heavy to make sense
[20:26] <enrico> even aptitude's too heavy for my freerunner, to give an idea
[20:26] <bmesing> No, the GUI is too large for embedded ;)
[20:28] <bmesing> Great, I think I will try Xapian. Using the combined search means merging the search strings of both the apt and the debtags plugins in the central application... Just some implementation thoughts ;)
[20:33] <bmesing> Does every Debian system have the debtags database available or does this depend on debtags?
[20:36] <bmesing> And does the apt-xapian system access all sources listed in /etc/debtags/sources.list or only those from the package database?
[20:36] <enrico> it depends on debtags
[20:37] <enrico> apt-xapian gets data from the package database
[20:37] <enrico> or whatever is pulled by its plugins
[20:37] <enrico> but so far only the package database
[20:37] <bmesing> Ok, so if you have no debtags installed, you won't have a vocabulary, but the package tags from the database, right?
[20:56] <bmesing> which will also mean, that when I display the vocabulary and someone has added custom sources, the search results will be wrong, because the additional sources are not considered by xapian :(
[22:05] <enrico> with no debtags, you'll have the tags from the Packages file, but no vocabulary, yes
[22:05] <enrico> xapian considers all the package sources, though, because apt merges all of them in a single Packages file
[22:06] <enrico> update-apt-xapian-index' plugins simply iterate the package database with apt
[22:06] <bmesing> Yeah, but not the debtags sources
[22:06] <bmesing> I am totally getting rid of ept::Debtags now, because otherwise I will work with inconsistent data sets.
Xapian discussion
[12:38] <bmesing> enrico: Actually that being an index search, it is quite logical.
[12:38] <enrico> bmesing: it has upsides and downsides: you can search for "mc" easily, but you have a problem searching for foo in libfoo
[12:38] <bmesing> Probably some kind of tree structure.
[12:39] <enrico> bmesing: I'm exploring the idea of a debian-specific stemmer that would handle cases like libfoo, but I haven't found a decent solution on how to make the stemmer code usable from all sorts of languages
[12:39] <bmesing> I have to do some testing on the performance gain, if it is worth keeping it in.
[12:39] <enrico> programming languages, that is
[12:39] <enrico> bmesing: synaptic in experimental has apt-xapian-index search-as-you-type support
[12:40] <bmesing> enrico: That's cool.
[12:40] <enrico> bmesing: if you see apt-xapian-index, you should consider using it for search-as-you-type
[12:40] <enrico> bmesing: and for suggesting keywords and tags, possibly also while doing search-as-you-type (it's fast enough for that)
[12:40] <enrico> link coming...
[12:41] <enrico> http://www.enricozini.org/2007/debtags/axi-searchasyoutype.html
[12:41] <enrico> http://www.enricozini.org/2007/debtags/axi-query-expand.html
[12:41] <enrico> bmesing: ^
you can gdb it and set a breakpoint on __cxa_throw, to see where the exception is actually thrown
"break __cxa_throw"
# to generate the release version use "CONFIG += my_release" as argument for qmake on
# the command line - this was necessary because "release" is declared
# by default - so it must be removed manually using "CONFIG -= release"
# however using kdevelop this is not possible so I've created this workaround
libtagcoll
tagcoll
Vocabulary - can read the tag database including the implications and description, it maps the
tags to the associated data
Open questions for libapt-front
How to update the database?
- is it simply aptFront::cache::Cache.reopen()?
- will anybody be notified?
- what are the observers observing?
- what does aptFront::cache::component::Packages::packageByName return if there is no
such package?
getCompanionTags : shouldn't it contain the tag itself?