Performance test comparing Cairo and Dakar on a large data set: Search "peter" in TWiki.org's Main web with 21K topics returns 1452 hits. Both searches were started simultaneously; the server had a load average of 1.7 ... 2.0 with some CPU cycles to spare. Measured twice, average in seconds:

         Start page load   End page load   Delta
Cairo:         12                27          15
Dakar:         30                58          28

Delta is the difference between "End page load" and "Start page load".

-- PTh

What is "Start page load"? "End page load"? Do you mean you ran both searches simultaneously on the same server?

AFAIK the only significant differences between Search in Cairo and Search in Dakar are:

  1. Pattern skin growth
  2. Sandbox

CC

I started both searches at the same time on the same server, twiki.org. That is, Cairo and Dakar had to work with the same server load.

Start page load: Time taken until browser receives first content. End page load: Time taken to fully load page. Tests were done using Firefox on Windows client.

Not sure why Dakar performs poorly on search. In Cairo and before, I spent a lot of time optimizing the search performance. Some of that might have been lost with the store refactor?

-- PTh

I think it's due to a combination of factors.

  1. Sandbox
  2. Pattern skin growth
  3. Permissions checking
  4. Results rendering

From what I have seen so far in analysing this, (2) and (3) are by far the biggest contributors to the slowdown. To avoid access control exceptions during searches, and to minimise security risks, Dakar checks topic level permissions before returning search results. And results are formatted using a simplified rendering engine that might be eating time.

I need to get time to analyse this more carefully. Right now I'm under pressure in other areas, so I don't know when this will be.

CC

Why would the size of the pattern skin slow down searching? Unless the results are rendered using the pattern skin.

-- AC

Latest results with 7493. Note that these numbers need to be taken with a grain of salt; they fluctuate based on server CPU load and network conditions.

Skin      Release   Start page load   End page load
Pattern   Cairo:          21                30
          Dakar:          27                62
          factor:         1.3               2.0
Classic   Cairo:          24                27
          Dakar:          54                79
          factor:         2.3               2.7

Dakar with the pattern skin now performs better, although it is still around 30% slower on "start page load".

I noticed that the gap to "end page load" is mainly due to browser rendering: the CPU of the notebook goes to 100% between start and end of page rendering. This is probably browser specific. Using Firefox (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0)

I am puzzled by the classic vs pattern numbers. Classic is actually slower than pattern here, i.e. the opposite of what happens when rendering a small topic.

-- PTh

"Racing" two processes simultaneously against each other on the same machine, even if it is multi-core, is a very poor way to make measurements.

A clear demonstration of this is to run something non-trivial from the command line. Run it a few times concurrently or in rapid sequence, even on an otherwise unloaded machine, and you will see a spread in timings.

cd ~apache/twiki/data/TWiki
cat *.txt | tr " " "\n" | tr "[:upper:]" "[:lower:]" | tr -d "[:punct:]" | sort | uniq | wc -l

When I run that as a single process under time I get

real   user  sys
3.07   3.01  0.05

I've deliberately run this on a slower machine to make the results more 'measurable'. On my dual core machine user and sys are both 0.00, so all I'm measuring is disk access time.

But when I run two in parallel (on the slower machine) I get

real   user  sys
5.90   3.01  0.06
6.15   3.01  0.06

and for five

real   user  sys
14.21  3.02  0.05
14.93  3.01  0.07
15.23  3.01  0.07
15.31  3.02  0.06
15.42  3.01  0.06
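
To put a number on the spread in the timings above, here is a minimal sketch; it is not part of the original measurements. `spread` is a hypothetical helper name. It reads one "real" time per line and prints the minimum, maximum, mean, and the mean relative to a single-process baseline.

```shell
# spread BASELINE: read one "real" time per line on stdin and print
# "min max mean ratio", where ratio is mean / BASELINE.
# (Hypothetical helper, for illustration only.)
spread() {
    awk -v base="$1" '
        NR == 1 { min = max = $1 }
        { if ($1 < min) min = $1
          if ($1 > max) max = $1
          sum += $1 }
        END { if (NR) printf "%.2f %.2f %.2f %.2f\n", min, max, sum / NR, sum / NR / base }'
}

# The five-process "real" times from above, against the 3.07 s single run.
printf '14.21\n14.93\n15.23\n15.31\n15.42\n' | spread 3.07
```

For the five-process run this gives a mean of 15.02 s, roughly 4.9 times the single-process figure, while user time stays at about 3.01 s per process. The extra wall-clock time comes from the processes competing with each other, not from the work itself.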

  • Now add in variations over the network.
  • Now add in variation at the browser doing the rendering with its caching of images, fonts etc

If you are going to run tests like this, you at least need to use wget or cURL.
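
A minimal sketch of what that could look like with cURL, assuming sequential rather than concurrent requests. The function names and the URL in the usage comment are hypothetical. `time_starttransfer` and `time_total` are cURL's own `--write-out` variables, and they correspond roughly to "start page load" and "end page load" without any browser rendering.

```shell
# time_fetch URL: print "first_byte total" in seconds for one request.
time_fetch() {
    curl -s -o /dev/null -w '%{time_starttransfer} %{time_total}\n' "$1"
}

# summarise: read "first_byte total" lines on stdin and print the average
# first-byte time, the average total time, and the average delta.
summarise() {
    awk '{ s += $1; t += $2; n++ }
         END { if (n) printf "%.2f %.2f %.2f\n", s / n, t / n, (t - s) / n }'
}

# bench URL RUNS: fetch URL RUNS times, one after another, then summarise.
bench() {
    i=0
    while [ "$i" -lt "$2" ]; do
        time_fetch "$1"
        i=$((i + 1))
    done | summarise
}

# Usage (hypothetical URL, substitute the search you want to measure):
#   bench 'http://example.com/twiki/bin/search/Main/?search=peter' 5
```

Running each release in turn on an otherwise quiet server, instead of racing them, keeps the two measurements from disturbing each other.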

I'm sorry, Peter, I don't see your test as meaningful. Your observation about 'end page load' pretty much says why you shouldn't be using a browser for making these measurements.

  • I disagree. TWiki is primarily used by humans, not machines. An increased gap between start & end page load time is an indication that the amount of data returned to the browser is larger and/or the browser is struggling with rendering the page (such as many nested tables). That is something that can be optimized so that users don't have to wait too long... -- PTh

-- AJA

While I agree that racing processes isn't good benchmarking practice, the fact is that Peter's results are borne out by my own benchmark runs. The reason pattern is faster for small topics is (I think) that "new" pattern sends fewer bytes to the browser. However, big pages take longer to compile, because of internationalisation and increased use of variables to build the pages.

  • Actually, it is the other way around. Pattern is slower than Classic for small topics, and Classic is slower than Pattern for the large search result (by a factor of 2 for Dakar / same range for Cairo). -- PTh

The hit to search was coming from two places, I think. The first is the Prefs handling: that was one of the first things I refactored, and I didn't appreciate that Perl is quite unlike any other language when it comes to optimising. (Perl is optimised for bad code, so operations that should be slow have been totally optimised, while good coding practice has been all but ignored. So I had to nasty up the code again to make it faster.) The second place is results rendering. I'm not quite sure yet why, but I'm investigating. It's really difficult without a decent profiler.

CC

Yeah, as anticipated the rendering in the search was eating time. Note that 99% of the search module is identical to Cairo; the code is so complex, and the reasons for some design decisions unfathomable, that I really didn't want to touch it.

You should compare formatted search and bookview as well, since they take different code paths.

SVN 7505

CC

According to my benchmarks (notoriously unreliable due to the load on my machine), Dakar is on a par with Cairo for "normal" PatternSkin searching over a large number of topics. Closing this; if further benchmarks indicate other issues, please re-open.

CC

ItemTemplate
Summary:       Search performance issue with large data set
ReportedBy:    PeterThoeny
AppliesTo:     Engine
Component:
Priority:      Urgent
CurrentState:  Closed
WaitingFor:
Checkins:      7482 7483 7484 7505 7507 7508
Topic revision: r15 - 2005-11-18 - CrawfordCurrie
 