1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
|
A couple of emails accompanying the patches from Jens Alfke.
From: Jens Alfke <jens@mooseyard.com>
Date: 13 January 2008 01:01:10 GMT
To: stig@brautaset.org
Subject: Cocoa JSON optimizations
Stig,
Thanks for writing the JSON framework for Cocoa!
> (Note that both our libraries puts up a really bad show compared to
> all the Perl modules Marc measured. A bit embarrassing that...)
I took a look at optimizing the generation of JSON. With a couple of
tweaks I made it about 11x faster on the large test string from Yahoo.
The patch (from SVN top-of-tree) is enclosed.
The main problem was that the way the code was structured, with each
level of recursion returning its result as a string, results in large
numbers of temporary NSStrings being generated and appended to each
other. A more optimal pattern for this is to instead pass an
NSMutableString to each generator method, which it can append to. That
way a single mutable string gets re-used.
The next bottleneck was the way -[NSString JSONFragmentWithOptions:]
iterates over every character in the string. Getting each character is
slow, and appending the characters one at a time to the output is even
slower. Fortunately this process is only necessary if the string has
characters that need escaping, so I built a static NSCharacterSet of
those characters, and then I test every string to see if it contains
any of them. If not, it can simply be appended as-is.
A lot of strings were still being escaped because they contained "/"
characters. I looked at the JSON RFC, and it says that only control
characters, double-quotes and backslashes need to be escaped, so I
took out the special case for "/". That helped.
Finally, I changed the array iterations in the NSArray and
NSDictionary methods to use the new Leopard "for...in" syntax (but
only if being compiled for Leopard.) That shaved a few percent off the
time, since for...in uses a new, more efficient iteration mechanism.
I didn't benchmark with any of the pretty-printing options turned on.
The code for those could be optimized to reduce the number of string
operations; but on the other hand, any client of the library wanting
maximum performance is probably going to turn off pretty-printing
anyway!
If my enthusiasm persists, I might look at the JSON parsing code too,
but no promises :)
--Jens
From: Jens Alfke <jens@mooseyard.com>
Date: 13 January 2008 06:15:19 GMT
To: stig@brautaset.org
Subject: Re: Cocoa JSON optimizations
Stig,
As promised, I looked at the parser as well. This was trickier to
optimize, but by throwing several tricks at it I got a nice 5x
speedup.
The main problem was just that NSScanner itself is really slow, for
some reason. So part of what I did was just to call -scanString: fewer
times. I wrote a faster -scanJSONChar: method that scans for a single
character, since that's what most of the -scanString: calls were
doing.
I also changed the top-level scanning sequence so that -scanJSONValue
gets the next non-whitespace character and then tests that to see
which type of entity to scan. That avoids a bunch of repeated scans
for a '{', a '[', a '"', etc.
I re-ordered some of the logic in -scanJSONObject: to reduce the
number of strings that get scanned for.
I restructured -scanJSONString: to append substrings in chunks instead
of character-by-character. I also used a lower-level string iterator
from CFString that's faster than -characterAtIndex, as well as another
CFString function to append unichars to a string.
This was fun, actually! Kind of like solving a logic puzzle. I kept on
running 'sample' and figuring there must be one more thing I could do
to shave off more time...
I think there's a bit more room, but it would involve rewriting the
code to stop using NSScanner at all. Instead, a couple of fast scan
functions based on CFStringInlineBuffer would do the job. I think this
would also make the code clearer, since I've left it as a mishmash of
NSScanner and direct string access.
The source changes were extensive, so the entire file is probably
clearer than a diff:
--Jens
|