[1]. We have observed many times that the indexing process (the kinoindex and kinoupdate scripts) breaks while indexing topic attachments. It would be a good idea to use the Error Perl module to capture the error and continue indexing.
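This is a minimal sketch of the per-attachment isolation suggested in [1]. It is written in Python for brevity; the add-on itself is Perl, where the Error module's try/catch would play the same role. `extract_text` and `add_to_index` are hypothetical stand-ins for the add-on's converter and KinoSearch indexer, not its real API.

```python
import logging

logger = logging.getLogger("kinoindex")


def index_attachments(attachments, extract_text, add_to_index):
    """Index each attachment; a failure on one file is logged and skipped.

    `extract_text` and `add_to_index` are hypothetical callables standing in
    for the add-on's converter and indexer. Returns the names that failed.
    """
    failed = []
    for name in attachments:
        try:
            text = extract_text(name)
            add_to_index(name, text)
        except Exception as err:  # broad on purpose: any converter crash
            logger.warning("Skipping %s: %s", name, err)
            failed.append(name)
    return failed
```

One crashing attachment therefore no longer aborts the run. The caller gets back the list of failures and can decide what to do with them.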
[2]. Change where the "Skip attachments" information is kept. Currently it is held in the KINOSEARCHINDEXSKIPATTACHMENTS variable. I think the best way would be to follow the architecture of TagMePlugin and use the directory {TWIKI_ROOT}/working/work_areas/SearchEngineKinoSearchAddOn to keep the file names. The following snippet gives a good idea; "/var/www/twiki" is TWIKI_ROOT in this example.
bash-3.2$ pwd
/var/www/twiki/working/work_areas/SearchEngineKinoSearchAddOn
bash-3.2$ ls
_skip_.Attachmentsweb.MyTopic.txt
bash-3.2$ cat _skip_.Attachmentsweb.MyTopic.txt
ProblematicAttachment.pdf
bash-3.2$
In this snippet, ProblematicAttachment.pdf is attached to the topic "MyTopic" in the "Attachmentsweb" web. kinoindex crashes while indexing this file, so it should be skipped.
This also means that some API is required to manage these files. SearchEngineKinoSearchPlugin (which is part of the add-on) would be the best place to handle this.
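This sketches how an indexer might consult the work-area layout proposed in [2]. It is in Python for brevity; the add-on itself is Perl. It assumes one attachment name per line in `_skip_.<Web>.<Topic>.txt`, as in the transcript above. The function names are hypothetical.

```python
from pathlib import Path


def skip_file_path(work_area, web, topic):
    """Per-topic skip list, named _skip_.<Web>.<Topic>.txt in the work area."""
    return Path(work_area) / f"_skip_.{web}.{topic}.txt"


def read_skip_list(work_area, web, topic):
    """Return the set of attachment names to skip (empty if there is no file)."""
    path = skip_file_path(work_area, web, topic)
    if not path.is_file():
        return set()
    with path.open(encoding="utf-8") as fh:
        # Ignore blank lines and surrounding whitespace.
        return {line.strip() for line in fh if line.strip()}


def attachments_to_index(attachments, work_area, web, topic):
    """Drop any attachment listed in the topic's skip file."""
    skip = read_skip_list(work_area, web, topic)
    return [name for name in attachments if name not in skip]
```

Keeping one file per topic means the plugin's management API only ever touches a single small file when a topic's skip list changes.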
[3]. The Error module mentioned in [1] should capture the error and update the work area mentioned in [2], so that the attachment is skipped the next time indexing runs.
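This is a sketch of [3] in Python; the add-on itself is Perl. When converting an attachment raises an exception, its name is appended to the topic's skip file so the next run leaves it out. The file naming follows the proposal in [2]. `record_skip`, `index_with_skip` and `extract_text` are hypothetical names.

```python
from pathlib import Path


def record_skip(work_area, web, topic, attachment):
    """Append `attachment` to _skip_.<Web>.<Topic>.txt unless it is already
    listed. Returns True if the file was changed."""
    area = Path(work_area)
    area.mkdir(parents=True, exist_ok=True)
    path = area / f"_skip_.{web}.{topic}.txt"
    text = path.read_text(encoding="utf-8") if path.is_file() else ""
    if attachment in {line.strip() for line in text.splitlines()}:
        return False
    # Keep one name per line even if the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(prefix + attachment + "\n")
    return True


def index_with_skip(work_area, web, topic, attachments, extract_text):
    """Convert each attachment; any exception is recorded in the skip file
    instead of aborting the whole run. Returns {name: text} for successes."""
    texts = {}
    for name in attachments:
        try:
            texts[name] = extract_text(name)
        except Exception:
            record_skip(work_area, web, topic, name)
    return texts
```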
[4]. On a fresh TWiki installation, the "index" directory is expected to be empty if "kinoindex" or "kinoupdate" has never been run. Modify the templates related to SearchEngineKinoSearchAddOn to display a meaningful message when someone runs a KinoSearch query.
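This sketches one possible check behind the message proposed in [4], in Python; the add-on itself is Perl. It treats a missing or empty index directory as "never indexed" and returns a message instead of running the query. The message text and `run_query` are assumptions.

```python
import os

# Hypothetical wording; the real template text is up to the add-on.
EMPTY_INDEX_MESSAGE = (
    "The search index has not been built yet. "
    "Please ask the administrator to run kinoindex."
)


def index_is_empty(index_dir):
    """True if the index directory is missing or has no entries, which
    suggests kinoindex/kinoupdate has never been run."""
    return not os.path.isdir(index_dir) or not os.listdir(index_dir)


def search_or_explain(index_dir, run_query, query):
    """Return (results, message). With no index yet, skip the query and
    return an explanatory message instead of letting the search fail."""
    if index_is_empty(index_dir):
        return [], EMPTY_INDEX_MESSAGE
    return run_query(query), None
```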
[5]. Provide some way to monitor the most recent "kinoindex" or "kinoupdate" logs through the browser. I think the best way would be to improve SearchEngineKinoSearchPlugin to display the last few lines of the log files, using Perl modules such as File::Tail to monitor them. If the log files do not show the line "Indexing Complete", that tells the "admin" that further action is needed.
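File::Tail is designed for following a file as it grows. For just showing the last few lines on a page, reading the tail of the file is enough. The following is a Python sketch; the add-on itself is Perl. It assumes the marker line is literally "Indexing Complete", as written in [5]; the real log wording may differ.

```python
from collections import deque

# Assumed marker text, taken from the wording in [5].
COMPLETE_MARKER = "Indexing Complete"


def last_lines(log_path, n=20):
    """Return the last `n` lines of the log, without trailing newlines."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        # A bounded deque keeps only the final n lines while streaming.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def log_status(log_path, n=20):
    """Return (tail, finished). finished is False when the marker is not in
    the tail, i.e. the admin should look into the last run."""
    tail = last_lines(log_path, n)
    finished = any(COMPLETE_MARKER in line for line in tail)
    return tail, finished
```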
[6]. The "rest" handlers included with SearchEngineKinoSearchPlugin provide a way to start indexing or updating from the browser. The current code allows anyone to run these jobs, unless there is a way to restrict them that I am not aware of. It may be worth adding access control so that only "admin" or members of the "TWikiAdminGroup" group can start these jobs.
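This sketches the access check proposed in [6], in Python; the add-on itself is Perl. How to obtain a user's group membership is left to a hypothetical `groups_of` callable. In the plugin, this information would come from TWiki's own user and group API.

```python
ADMIN_USERS = {"admin"}
ADMIN_GROUP = "TWikiAdminGroup"


class AccessDenied(Exception):
    """Raised when a non-admin tries to start an indexing job."""


def can_run_index_job(user, groups_of):
    """Allow the admin user or any member of TWikiAdminGroup.
    `groups_of` is a hypothetical callable: user -> set of group names."""
    return user in ADMIN_USERS or ADMIN_GROUP in groups_of(user)


def rest_start_index(user, groups_of, start_job):
    """Guard a hypothetical REST entry point before starting the job."""
    if not can_run_index_job(user, groups_of):
        raise AccessDenied(f"{user} may not start indexing jobs")
    return start_job()
```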
[7]. Provide some way to clean (empty) the "index" directory from the browser. This could go into SearchEngineKinoSearchPlugin, with access restricted to "admin" or "TWikiAdminGroup" members.
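This is a sketch of [7] in Python; the add-on itself is Perl. It removes everything inside the index directory but keeps the directory itself. It refuses to run unless the caller has already passed an admin check such as the one described in [6]. The function name is hypothetical.

```python
import os
import shutil


def empty_index_dir(index_dir, is_admin):
    """Delete everything inside `index_dir` but keep the directory itself.
    `is_admin` is the result of an earlier access check; refuse otherwise.
    Returns the number of top-level entries removed."""
    if not is_admin:
        raise PermissionError("only admin or TWikiAdminGroup members may clear the index")
    removed = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        # Recurse into real subdirectories; unlink files and symlinks.
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        removed += 1
    return removed
```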
-- TWiki:Main.SopanShewale - 30 Jan 2009
Added a module for converting docx files into text. The module depends on docx2txt.
-- TWiki:Main.SopanShewale - 11 Aug 2009
Added support for converting pptx and xlsx files. The xlsx support is based on the Spreadsheet::XLSX Perl module from CPAN. The pptx conversion depends on pptx2txt, which I plan to upload to sourceforge.net soon; it is currently attached to SearchEngineKinoSearchAddOnDev. Feedback on the tool is most welcome.
-- TWiki:Main.SopanShewale - 18 Aug 2009
In the process of building a new release of the add-on.
-- TWiki:Main.SopanShewale - 08 Oct 2009
Added support for the KINOSEARCH_ATTACHMENT_INDEX_SIZELIMIT variable.
-- TWiki:Main.SopanShewale - 30 Jan 2010