Basically every file has to be checked to see if it's a directory, if I'm reading this right. Given that the standard WebLeftBar uses WEBLIST, this essentially renders hierarchical webs useless. See TWiki:Support.VerySlowPerformanceOfDakar.
sub getWebNames {
    my $this = shift;
    my $dir = $TWiki::cfg{DataDir}.'/'.$this->{web};
    if( opendir( DIR, $dir ) ) {
        my @tmpList =
          sort
          map { TWiki::Sandbox::untaintUnchecked( $_ ) }
          grep { !/$TWiki::cfg{NameFilter}/ &&
                 !/^\./ &&
                 -d $dir.'/'.$_ } readdir( DIR );
        closedir( DIR );
        return @tmpList;
    }
    return ();
}
-- ??
This can be made much faster with a grep { !/\.(txt|txt,v|lease)$/ } before the directory test.
Mindset of performance performance performance.
--
PTh
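A minimal sketch of the suggested pre-filter, working on plain names so it needs no file system. The helper name `prefilter_names` and the sample entries are hypothetical and not part of TWiki; only the regular expression comes from the comment above. The point is that a cheap pattern match removes the topic files, so the directory test only has to run on what is left.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TWiki): drop names that look like topic
# files so that the -d test only has to run on the remaining entries.
sub prefilter_names {
    my @names = @_;
    return grep { !/\.(txt|txt,v|lease)$/ } @names;
}

# Assumed sample listing: a web directory holds mostly topic files,
# and only a few entries can be sub-webs.
my @entries =
  ( 'WebHome.txt', 'WebHome.txt,v', 'WebHome.lease', 'SubWebA', 'SubWebB' );
my @candidates = prefilter_names(@entries);
print scalar(@candidates), ' of ', scalar(@entries),
  " entries still need a -d test\n";
```

Whether this pays off in practice depends on how expensive the directory test is compared with the pattern match, which is what the benchmarks later in this item measure.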
I doubt so, PTh, because all directories were opened and read anyway.
So why must TWiki discover all of its (sub)webs again and again? Creating a web is so infrequent compared to the number of pure view accesses. How about storing the list of all known webs in a plain text file and updating it when we create, delete, or rename a web? Maybe this could be generalized for similar time-space tradeoffs in the code.
--
MD
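A rough sketch of what such a plain-text cache could look like. Everything here is an assumption for illustration: the helper names `write_web_cache` and `read_web_cache`, the file name `webs.txt`, and the one-name-per-line format do not exist in TWiki. The write helper would be called from whatever code creates, deletes, or renames a web; views would only read.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical: store the known web names, one per line, sorted.
sub write_web_cache {
    my ( $cacheFile, @webs ) = @_;
    open( my $fh, '>', $cacheFile ) or die "Cannot write $cacheFile: $!";
    print $fh "$_\n" for sort @webs;
    close($fh) or die "Cannot close $cacheFile: $!";
    return;
}

# Hypothetical: return the cached names, or an empty list if there is no
# cache file yet (the caller would then fall back to scanning directories).
sub read_web_cache {
    my ($cacheFile) = @_;
    open( my $fh, '<', $cacheFile ) or return;
    chomp( my @webs = <$fh> );
    close($fh);
    return @webs;
}

# Usage with a throwaway directory and assumed web names.
my $cache = tempdir( CLEANUP => 1 ) . '/webs.txt';
write_web_cache( $cache, 'Main/Projects', 'Main' );
print join( ', ', read_web_cache($cache) ), "\n";
```

This sketch ignores locking and cannot tell a missing cache from an empty one; a real implementation would need to handle both, and would have to cope with webs created behind TWiki's back by copying directories.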
grep is evil. If you have 10,000 topics and only one sub-web, you are still going to have to perform that RE 10,000 times if you use grep. Filtering .txt is not going to make that "much" faster. Michael's suggestion of caching the subweb names makes much more sense.
CC
TWiki:Main.PeterJones confirmed that the one line fix makes TWiki significantly faster (as I expected):
"Yesterday I disabled sub-webs at 10:00 and today at 12:00 I re-enabled subwebs and added the line that was suggested. As you can see without subwebs performance is far better and is a little better than by enabling sub-webs with the added line."
But agreed, caching the web structure intelligently is an even faster solution.
--
PTh
I committed the one-line fix to TWiki 4 and DEVELOP, 10866.
--
PTh
Shall we close this item and open another one for a cache web names enhancement?
--
PTh
SVN 10881 has further speed improvement on WEBLIST: Grep away any name with a dot before applying name filter and directory check (web names cannot contain a dot by definition).
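The ordering described here can be sketched as a standalone function. This is not the code committed in SVN 10881; `list_subwebs` is a hypothetical name, and the `$nameFilter` argument stands in for `$TWiki::cfg{NameFilter}` with an assumed pattern. The dot test runs first because it is the cheapest check and discards every topic file, along with `.` and `..`.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical standalone version: reject names containing a dot first,
# then apply the name filter, and only then do the directory test.
sub list_subwebs {
    my ( $dir, $nameFilter ) = @_;
    opendir( my $dh, $dir ) or return ();
    my @webs = grep { !/\./ && !/$nameFilter/ && -d "$dir/$_" } readdir($dh);
    closedir($dh);
    @webs = sort @webs;
    return @webs;
}

# Usage with a throwaway directory: two topic files and one sub-web.
my $dir = tempdir( CLEANUP => 1 );
mkdir "$dir/SubWeb" or die "mkdir failed: $!";
for my $file ( 'WebHome.txt', 'WebHome.txt,v' ) {
    open( my $fh, '>', "$dir/$file" ) or die "Cannot create $file: $!";
    close($fh);
}

# The filter pattern below is an assumption, not TWiki's default.
print join( ', ', list_subwebs( $dir, qr/[\s;|<>]/ ) ), "\n";
```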
Benchmarks with a test topic that has a WEBLIST, and the usual sidebar with the weblist. The topic is in a big web with two sub webs. Benchmarks taken with 10 sequential requests (ab -n 10) on a web with 1K, 5K, 10K and 50K topics. Time is mean time per request in msec.
| | 1K | 5K | 10K | 50K |
| original 4.0.4, hier. webs disabled | 833 | 837 | 832 | 831 |
| original 4.0.4, hier. webs enabled | 938 (1.0) | 1152 (1.0) | 1399 (1.0) | 3503 (1.0) |
| SVN 10881 fix, hier. webs enabled | 880 (1.07) | 927 (1.24) | 986 (1.42) | 1432 (2.45) |
The number in parentheses is the speed-improvement factor relative to the original 4.0.4 with hierarchical webs enabled.
--
PTh
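For anyone who wants to re-derive the factors, they appear to be the original time divided by the fixed time, rounded to two decimals. A small sketch, with the timings copied from the table above; the helper name `speedup` is made up for this example.

```perl
use strict;
use warnings;

# Mean msec per request, hier. webs enabled, copied from the table above.
my @sizes    = ( '1K', '5K', '10K', '50K' );
my @original = ( 938, 1152, 1399, 3503 );
my @fixed    = ( 880, 927,  986,  1432 );

# Hypothetical helper: speed-improvement factor, rounded to two decimals.
sub speedup {
    my ( $before, $after ) = @_;
    return sprintf( '%.2f', $before / $after );
}

for my $i ( 0 .. $#sizes ) {
    printf "%-4s %s\n", $sizes[$i], speedup( $original[$i], $fixed[$i] );
}
```

This reproduces 1.07, 1.24, 1.42 and 2.45, so the benefit of the fix grows with the number of topics in the web.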
I am setting this to "Waiting for Release". If someone wants to cache the weblist, please open a new item.
--
PTh
Nice one! I really didn't expect it to have that much effect; but then I remembered how horrendously slow -d is in perl. Good catch!
CC
Closed in 4.0.5
KJL