This file merges information from the various README.r[5-8] notes
shipped with Ben Jackson and Jay Carlson's "rogue" server patches.
The information below didn't fit well into the changelog format.
From README.r5:
1.8.0r5 is a collection of unofficial patches to Erik Ostrom's
LambdaMOO 1.8.0p6 server release. They're primarily bug fixes and
speedups. For logistical reasons they're packaged as a tar file
rather than as a collection of diffs.
It's difficult to measure MOO server performance. All we can say is
that some plausible synthetic benchmarks are now two to four times
faster. Users have noted that production systems running this code
feel much more responsive at computationally expensive tasks.
[...]
All files were run through GNU indent with settings given in
.indent.pro in an attempt to normalize coding style.
code_gen.c:
Fixed bizarre bug where uninitialized memory was accessed; usually
multiplied by zero immediately, so nobody ever noticed.
eval_env.c, db_io.c, objects.c, utils.c:
Type identifiers (TYPE_STR et al) now contain a bit flag indicating
whether additional work needs to be done when a Var of their type is
freed. This allows free_var to run inline without a case statement
when "simple" Vars are freed. Code to translate between the internal
TYPE_STR and the previous external representation added.
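The flag idea can be sketched roughly as follows. This is only an
illustration: the sk_ names, the flag value 0x80 and the numeric type
codes are assumptions made for the example, not the server's actual
definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout: the high bit marks types whose values own heap memory. */
#define SK_COMPLEX_FLAG 0x80
#define SK_DB_MASK      0x7f    /* strips the flag to recover the external code */

enum sk_type {
    SK_INT = 0,
    SK_OBJ = 1,
    SK_STR_DB = 2,                          /* external (database) code */
    SK_STR = SK_STR_DB | SK_COMPLEX_FLAG    /* internal code */
};

struct sk_var {
    enum sk_type type;
    union { long num; char *str; } v;
};

static int sk_complex_frees = 0;    /* counts slow-path calls, for illustration */

/* Slow path: only reached for types that carry the flag. */
static void sk_complex_free(struct sk_var *var)
{
    assert(var->type & SK_COMPLEX_FLAG);
    if (var->type == SK_STR)
        free(var->v.str);
    sk_complex_frees++;
}

/* Fast path: a single bit test instead of a switch over every type. */
static inline void sk_free_var(struct sk_var *var)
{
    if (var->type & SK_COMPLEX_FLAG)
        sk_complex_free(var);
}

/* Translation between internal and external type codes. */
static inline int sk_type_to_db(enum sk_type t)
{
    return t & SK_DB_MASK;
}

static inline enum sk_type sk_type_from_db(int code)
{
    return (enum sk_type) (code == SK_STR_DB ? SK_STR : code);
}
```

Freeing an integer or object number never leaves the inline test; only
flagged types pay for the function call.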
db_verbs.c, db_objects.c:
(This part is primarily Jay's fault, so we'll let him talk about it
using the first person.)
The verb lookup cache. Traditionally, the server has spent large
amounts of time searching for what verbcode to run. MOO verbs can
have aliases ($object_utils:descendents/descendants), incomplete
specification ($room:l*ook), and command-line verbs distinguished by
args...and verb definition order matters during lookup! These
features ruled out the naive speedup of just dumping all verbdefs in a
hash table per object.
I decided not to work too hard on improving the performance of command
line verb lookups. Any solution that addressed them looked to be many
times more complex than just fixing verbs calling verbs
(db_find_callable_verb), and the latter appeared more significant to
overall performance.
Originally I built a 7-element per-object table to cache lookups, but
this significantly inflated the server size relative to the
performance increase. If you're interested in this, it's in the
moo-cows archive as one of the steak patches.
My current solution to lookup performance is to build a global hash
table mapping
(hash(object_key x target_verbname), object_key, target_verbname)
=> (verbdef, handle)
used only for callable verb lookups.
Any action on the db that could affect the validity of this table
clears the whole table by calling
db_priv_affected_callable_verb_lookup(). Here's a list:
recycle()
renumber()
chparent(): in some circumstances
add_verb()
delete_verb()
set_verb_info(): name changes, flag changes
set_verb_args()
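A minimal sketch of such a table, assuming a simple chained hash. The
vc_ names, the hash function and the use of a plain int as object_key
are illustrative assumptions; only the chain count of 7507 comes from
these notes. Real verb names compare case-insensitively and may
contain wildcards; strcmp() keeps the sketch short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VC_SIZE 7507    /* number of hash chains */

struct vc_entry {
    unsigned hash;
    int object_key;
    char *verbname;
    void *verbdef;              /* NULL can stand for a cached failed lookup */
    struct vc_entry *next;
};

static struct vc_entry *vc_table[VC_SIZE];

/* Hypothetical hash mixing the object key with the verb name. */
static unsigned vc_hash(int object_key, const char *verbname)
{
    unsigned h = (unsigned) object_key;

    while (*verbname)
        h = h * 31u + (unsigned char) *verbname++;
    return h;
}

/* Returns 1 and stores the cached verbdef on a hit, 0 on a miss. */
static int vc_lookup(int object_key, const char *verbname, void **verbdef)
{
    unsigned h = vc_hash(object_key, verbname);
    struct vc_entry *e;

    for (e = vc_table[h % VC_SIZE]; e; e = e->next) {
        if (e->hash == h && e->object_key == object_key
            && strcmp(e->verbname, verbname) == 0) {
            *verbdef = e->verbdef;
            return 1;
        }
    }
    return 0;
}

static void vc_insert(int object_key, const char *verbname, void *verbdef)
{
    unsigned h = vc_hash(object_key, verbname);
    size_t len = strlen(verbname) + 1;
    struct vc_entry *e = malloc(sizeof *e);

    assert(e != NULL);
    e->verbname = malloc(len);
    assert(e->verbname != NULL);
    memcpy(e->verbname, verbname, len);
    e->hash = h;
    e->object_key = object_key;
    e->verbdef = verbdef;
    e->next = vc_table[h % VC_SIZE];
    vc_table[h % VC_SIZE] = e;
}

/* Any db change that could invalidate an entry drops every entry. */
static void vc_clear(void)
{
    size_t i;

    for (i = 0; i < VC_SIZE; i++) {
        while (vc_table[i]) {
            struct vc_entry *e = vc_table[i];

            vc_table[i] = e->next;
            free(e->verbname);
            free(e);
        }
    }
}
```

Clearing everything is blunt, but it means no bookkeeping is needed to
work out which entries a given add_verb() or chparent() touched.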
Since a good number of objects don't have verbs on them (inheriting
all behavior from parents) I decided to use "first parent with verbs"
as the object_key. This means that all those kids of $exit don't need
to have separate table entries for :invoke or whatever. All kids of a
player class get a single entry for :tell unless the player has verbs
on emself. (Sadly, on LambdaMOO, the lag reduction feature object
places a trivial :tell on anyone using it. Since the verb is
immediately at hand the lookup is short but unavoidable for every
player using it.)
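Computing the key amounts to a short walk up the parent chain. The
sketch below assumes a toy object array with a parent index and a verb
count; the struct and names are hypothetical, not the db layer's.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NOTHING (-1)

struct sk_object {
    int parent;     /* index of the parent object, or SK_NOTHING */
    int nverbs;     /* verbs defined directly on this object */
};

/* Walk upward to the first object that defines any verbs at all.
 * Returns SK_NOTHING if no ancestor has verbs. */
static int first_parent_with_verbs(const struct sk_object *objs, int oid)
{
    assert(objs != NULL);
    while (oid != SK_NOTHING && objs[oid].nverbs == 0)
        oid = objs[oid].parent;
    return oid;
}
```

Every verbless child of the same ancestor maps to the same key, so
they all share one table entry per verb name.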
Since I use "first parent with verbs" as object_key, chparent() does
not need to clear the table that often. If the object has no verbs,
it can't be mentioned in the table directly; however, if it has
children it could indirectly affect lookup of its kids that do have
verbs. Transient objects going through the usual
$recycler:_create()/$recycler:_recycle() life cycle avoid both of
these problems and in this release no longer trigger a flush.
For this release, Ben added negative caching---failed verb lookups are
stored in the table as well.
The table itself is implemented as a fixed number of hash chains. The
compiled-in default is 7507 (DEFAULT_VC_SIZE in db_verbs.c).
Statistics on occupancy are available through two new wiz-only
primitives. log_cache_stats() dumps formatted info into the server
log; verb_cache_stats() returns a list of the form:
{hits, negative_hits, misses, table_clears, histogram}
where histogram is a 17-element list. histogram[1] is the number of
chains with length 0; histogram[2] is the number of chains with length
1 and so on up to histogram[17] which counts the number of chains with
length of 16 or greater.
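The histogram could be gathered along these lines. The struct and
function names are made up for the example, and C arrays are 0-based,
so hist[0] here corresponds to histogram[1] at the MOO level.

```c
#include <assert.h>
#include <stddef.h>

#define SK_HIST_BUCKETS 17

struct sk_chain {
    struct sk_chain *next;
};

/* hist[n] counts chains of length n; the last bucket counts chains of
 * length 16 or more. */
static void sk_chain_histogram(struct sk_chain *const *table, size_t nchains,
                               int hist[SK_HIST_BUCKETS])
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < SK_HIST_BUCKETS; i++)
        hist[i] = 0;
    for (i = 0; i < nchains; i++) {
        const struct sk_chain *e;
        int len = 0;

        for (e = table[i]; e; e = e->next)
            len++;
        if (len > SK_HIST_BUCKETS - 1)
            len = SK_HIST_BUCKETS - 1;
        hist[len]++;
    }
}
```

A healthy table shows most chains in the first few buckets; a large
count in the last bucket suggests the table is too small or the hash
is clustering.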
hits, negative_hits, misses, and table_clears are counters only zeroed
at server start. The histogram is a snapshot of current cache
condition. If you're running a really busy server you can overflow
the hits counter in a few weeks; your server won't crash but values
reported by these functions will be wrong. Yes, LambdaMOO executes
*billions* of verbs in a typical run.
If you start fretting about how much memory the lookup table is using,
write a continuously running verb that forces one of the table clear
conditions.
extensions.c, db_tune.h:
The functions in extensions.c that provide verb cache stats need to
talk to the db layer's internals in order to gather information, but
they aren't part of the db layer proper. db_tune.h was invented as a
middle ground between db.h and db_private.h for source files that
needed access to implementation-specific interfaces provided by the db
layer.
Comments (and suggestions on a better name!) on this are solicited.
decompile.c, program.c:
When errors are thrown, the line number of the error is included in
the traceback information. Mapping between bytecode program counter
and line number is expensive, so each Program now maintains a single
pc->lineno cache entry---hopefully, programs that fail multiple times
usually fail on the same line.
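A single-entry cache of this kind might look like the sketch below.
The line_starts table and all sk_ names are assumptions standing in
for whatever the real pc-to-line mapping walks; the slow_lookups
counter exists only to make the effect visible.

```c
#include <assert.h>

struct sk_program {
    const int *line_starts;   /* hypothetical: first pc of line i+1 */
    int nlines;
    int cached_pc;            /* -1 while the cache is empty */
    int cached_lineno;
    int slow_lookups;         /* counts expensive scans, for illustration */
};

/* Stand-in for the expensive mapping: scan the whole table. */
static int sk_slow_find_line(struct sk_program *p, int pc)
{
    int line = 1, i;

    p->slow_lookups++;
    for (i = 0; i < p->nlines; i++)
        if (p->line_starts[i] <= pc)
            line = i + 1;
    return line;
}

/* Remember only the most recent (pc, lineno) pair. */
static int sk_find_line(struct sk_program *p, int pc)
{
    assert(pc >= 0);
    if (p->cached_pc != pc) {
        p->cached_lineno = sk_slow_find_line(p, pc);
        p->cached_pc = pc;
    }
    return p->cached_lineno;
}
```

One entry costs two ints per Program and pays off whenever the same
line raises the error repeatedly.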
eval_env.c, execute.c:
To avoid calling malloc()/free() as often, the server now keeps a
central pool of rt_stacks and rt_envs of given sizes. They revert to
malloc()/free() for large requests.
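The pooling scheme can be sketched as a free list for one common size,
with anything larger going straight to malloc()/free(). The threshold
of 50 slots and the sk_ names are assumptions for the example, not
values taken from the server.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_POOL_SLOTS 50    /* hypothetical "common" stack size */

struct sk_stack {
    struct sk_stack *next_free;
    size_t nslots;
    long slots[];           /* flexible array member; longs stand in for Vars */
};

static struct sk_stack *sk_pool = NULL;

static struct sk_stack *sk_alloc_stack(size_t nslots)
{
    struct sk_stack *s;

    if (nslots <= SK_POOL_SLOTS) {
        if (sk_pool != NULL) {          /* reuse a pooled stack */
            s = sk_pool;
            sk_pool = s->next_free;
            return s;
        }
        nslots = SK_POOL_SLOTS;         /* round up so it can be pooled later */
    }
    s = malloc(sizeof *s + nslots * sizeof s->slots[0]);
    assert(s != NULL);
    s->nslots = nslots;
    s->next_free = NULL;
    return s;
}

static void sk_free_stack(struct sk_stack *s)
{
    if (s->nslots == SK_POOL_SLOTS) {   /* pooled size: keep it for reuse */
        s->next_free = sk_pool;
        sk_pool = s;
    } else {
        free(s);                        /* large request: really free it */
    }
}
```

Small requests are rounded up to the pooled size, trading a little
memory for not touching the allocator on most verb calls.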
execute.c:
General optimization; Ben can write more extensively about this. One of
the more significant changes is that OP_IMM followed by OP_POP is "peephole
optimized"; this makes verb comments like
"$string_utils:from_list(l, [, separator])";
"Return a string etc";
do_some_work();
"and do some more work";
do_more_work();
much cheaper.
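As a toy illustration of the idea, the interpreter loop below peeks
past an immediate's operand and, if the next opcode is a pop, skips
both instructions without touching the stack. The opcode set and the
one-byte operand encoding are invented for the sketch, and doing the
check in the loop rather than in the compiler is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

enum sk_op { SK_OP_IMM, SK_OP_POP, SK_OP_DONE };

/* Runs a toy bytecode vector and returns how many pushes were performed. */
static int sk_run(const unsigned char *code, size_t len)
{
    size_t pc = 0;
    int depth = 0, pushes = 0;

    while (pc < len) {
        switch (code[pc]) {
        case SK_OP_IMM:                 /* one operand byte follows */
            assert(pc + 1 < len);
            if (pc + 2 < len && code[pc + 2] == SK_OP_POP) {
                pc += 3;                /* value is discarded at once: skip both */
                break;
            }
            depth++;
            pushes++;
            pc += 2;
            break;
        case SK_OP_POP:
            assert(depth > 0);
            depth--;
            pc++;
            break;
        default:                        /* SK_OP_DONE */
            return pushes;
        }
    }
    return pushes;
}
```

A string literal used as a comment compiles to exactly this
push-then-discard pair, which is why the shortcut matters for
heavily commented verbs.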
An important memory leak involving failed property lookups was closed.
execute.c, options.h:
Because very few sites actually use protected builtin properties and
using them is a very substantial performance hit, a new options.h
define, IGNORE_PROP_PROTECTED, allows them to be disabled at
compile-time. This is the default.
functions.c, server.c:
Doing property lookups per builtin function call to determine whether
the function needs the $server_options.protect_foo treatment is
extremely expensive. A protectedness flag was directly added to the
builtin function struct; the values of these flags are loaded from the
db at startup time, or whenever the new builtin function
load_server_options() is called.
list.c:
There's now a canonical empty list.
The regexp pattern cache wasn't storing the case_matters flag, causing
many patterns to be impossible to find in the cache.
decode_binary() was broken on systems where char is signed by default.
doinsert reallocs lists with refcount 1 when appending rather than
calling var_ref/free_var on all the elements. (The general case could
be sped up with memcpy as well.)
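The append case can be sketched as follows, with plain ints standing
in for Vars so that no per-element reference counting appears. The
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sk_list {
    int refcount;
    size_t len;
    int *items;     /* ints stand in for Vars */
};

/* Consumes the caller's reference to l and returns the resulting list. */
static struct sk_list *sk_list_append(struct sk_list *l, int value)
{
    struct sk_list *r;

    if (l->refcount == 1) {
        /* Sole owner: grow the storage in place; elements are untouched. */
        int *grown = realloc(l->items, (l->len + 1) * sizeof *grown);

        assert(grown != NULL);
        l->items = grown;
        l->items[l->len++] = value;
        return l;
    }
    /* Shared: build a copy and drop our reference to the original. */
    r = malloc(sizeof *r);
    assert(r != NULL);
    r->refcount = 1;
    r->len = l->len + 1;
    r->items = malloc(r->len * sizeof *r->items);
    assert(r->items != NULL);
    if (l->len > 0)
        memcpy(r->items, l->items, l->len * sizeof *r->items);
    r->items[l->len] = value;
    l->refcount--;
    return r;
}
```

With real Vars the shared branch would also have to take a reference
on every copied element, which is the cost the refcount-1 path avoids.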
my-types.h:
sys/time.h may be necessary for FD_ZERO et al definitions.
parse_cmd.c, storage.h:
parse_into_words was incorrectly allocating an array of (char *) as
M_STRING. This caused a million unaligned memory access warnings on
the Alpha. Created a new M_STRING_PTRS allocation class for this.
pattern.c:
fastmap was allocated with mymalloc() but freed with the normal
free(). Fixed.
ref_count.c:
Refcounts are now allocated as part of objects that can be
addref()'d. This allows macros to manipulate those counts and makes a
request for the current refcount of an object much cheaper. This
completely replaces the old hash table implementation.
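One common way to do this is to put the count in a small header just
in front of the memory handed to the caller. The sketch below assumes
that layout; the union, the macro and the function names are
illustrative, not the server's.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical header placed immediately before the caller-visible
 * memory. The extra members only force a conservative alignment. */
union sk_header {
    int refcount;
    double d;
    void *p;
    long l;
};

/* The count lives one header before the interior pointer. */
#define SK_REFCOUNT(ptr) (((union sk_header *) (ptr))[-1].refcount)

static void *sk_alloc_refcounted(size_t size)
{
    union sk_header *h = malloc(sizeof *h + size);

    assert(h != NULL);
    h->refcount = 1;
    return h + 1;       /* the caller only ever sees the interior pointer */
}

static void *sk_addref(void *ptr)
{
    SK_REFCOUNT(ptr)++;
    return ptr;
}

/* Drops one reference; frees the block and returns 0 when it was the last. */
static int sk_delref(void *ptr)
{
    if (--SK_REFCOUNT(ptr) == 0) {
        free((union sk_header *) ptr - 1);
        return 0;
    }
    return SK_REFCOUNT(ptr);
}
```

Reading or bumping a count is then a single memory access at a fixed
negative offset, with no table lookup.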
storage.c:
There's now a canonical empty string.
myrealloc(), the mymalloc/myfree analog of realloc(), is now available.
As a result of the changes, the memory debugging code is no longer
available. Also, since we now hold pointers to only the interior of
some allocated objects, tools such as Purify will claim a million
possible memory leaks.
tasks.c:
If a forked task was killed before it ever started, it leaked some
memory. Fixed.
utils.c:
var_refcount(Var v) added. Returns the refcount of any Var.
From README.r6:
The two big changes in r6 over r5 are:
o Bytecode optimizations to try to modify lists in-place whenever
possible. List manipulation and mutation should be orders of
magnitude faster in some cases.
o String "interning" during load; initially, there will be one and
only one in-memory copy of each identical string. (In JHCore that
means we only allocate memory for "do" once...)
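Interning can be sketched with a small chained hash table that hands
back the one stored copy of each distinct string. The table size, the
hash and the sk_ names are assumptions, and the sketch leaves out the
reference counting a real server would need to release strings later.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_INTERN_BUCKETS 1021

struct sk_intern {
    char *s;
    struct sk_intern *next;
};

static struct sk_intern *sk_intern_table[SK_INTERN_BUCKETS];

/* Returns the single shared copy of s, creating it on first sight. */
static const char *sk_intern(const char *s)
{
    size_t len = strlen(s) + 1;
    unsigned h = 0;
    const char *p;
    struct sk_intern *e;

    for (p = s; *p; p++)
        h = h * 31u + (unsigned char) *p;
    h %= SK_INTERN_BUCKETS;

    for (e = sk_intern_table[h]; e; e = e->next)
        if (strcmp(e->s, s) == 0)
            return e->s;            /* already stored: share it */

    e = malloc(sizeof *e);
    assert(e != NULL);
    e->s = malloc(len);
    assert(e->s != NULL);
    memcpy(e->s, s, len);
    e->next = sk_intern_table[h];
    sk_intern_table[h] = e;
    return e->s;
}
```

Two loads of "do" from different places in the database file come back
as the same pointer, so the text is stored once.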
From README.r7:
r7 fixes BYTECODE_REDUCE_REF. It's now safe to turn on.
[This turned out to be false.]
The default input and output buffer sizes in options.h are now 64k.
From README.r8:
r8 adds more fixes to BYTECODE_REDUCE_REF. It's now safe to turn on.
However, suspended tasks are a problem for switchover. From options.h:
* This option affects the length of certain bytecode sequences.
* Suspended tasks in a database from a server built with this option
* are not guaranteed to work with a server built without this option,
* and vice versa. It is safe to flip this switch only if there are
* no suspended tasks in the database you are loading. (It might work
* anyway, but hey, it's your database.) This restriction will be
* lifted in a future version of the server software. Consider this
* option as being BETA QUALITY until then.