Item6177: Enhancements in SearchEngineKinoSearchAddOn

Item Form Data

AppliesTo: Extension
Component: SearchEngineKinoSearchAddOn
Priority: Enhancement
CurrentState: Confirmed
WaitingFor: TWiki:Main.SopanShewale
TargetRelease: n/a
ReleasedIn:

Detail

[1]. We have often seen indexing (the kinoindex and kinoupdate scripts) break while processing the attachments of topics. It would be a good idea to use the Perl Error module to catch the error and continue indexing.
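
To illustrate the idea in [1], here is a minimal sketch of "catch the error and continue". It is written in Python for brevity and is not the add-on's Perl code; `index_attachments` and the `index_one` callback are hypothetical names standing in for whatever kinoindex does per attachment.

```python
import logging

logger = logging.getLogger("kinoindex_sketch")


def index_attachments(attachments, index_one):
    """Index each attachment; log and collect failures instead of aborting.

    `index_one` is a hypothetical callback that indexes a single attachment
    and raises an exception when it cannot process the file.
    """
    failed = []
    for name in attachments:
        try:
            index_one(name)
        except Exception as err:
            # Deliberately broad: one bad file must not stop the whole run.
            logger.warning("Skipping %s: %s", name, err)
            failed.append(name)
    return failed
```

The returned list of failed names is what a later step could record so that those files are skipped on the next run.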

[2]. Change where the "skip attachments" information is kept. Currently this information is handled by the KINOSEARCHINDEXSKIPATTACHMENTS variable. I think the best way would be to follow the architecture of TagMePlugin: use the directory {TWIKI_ROOT}/working/work_areas/SearchEngineKinoSearchAddOn to keep the file names. The following snippet gives a good idea; "/var/www/twiki" is TWIKI_ROOT in this example.

bash-3.2$ pwd
/var/www/twiki/working/work_areas/SearchEngineKinoSearchAddOn
bash-3.2$ ls
_skip_.Attachmentsweb.MyTopic.txt
bash-3.2$ cat _skip_.Attachmentsweb.MyTopic.txt
ProblematicAttachment.pdf
bash-3.2$

In this snippet, ProblematicAttachment.pdf is attached to the topic "MyTopic" in the "Attachmentsweb" web. kinoindex crashes while indexing this file, so it should be skipped during indexing.

This also means that some API is required to manage these files. SearchEngineKinoSearchPlugin (which is part of the add-on) would be the best place to handle this.
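
A rough sketch of what such a management API for [2] might look like, in Python rather than the add-on's Perl. The file naming follows the snippet above (`_skip_.<Web>.<Topic>.txt`); the function names and the one-attachment-name-per-line format are assumptions made for illustration.

```python
from pathlib import Path


def skip_file(work_area, web, topic):
    """Path of the skip file for one topic, named as in the snippet above."""
    return Path(work_area) / f"_skip_.{web}.{topic}.txt"


def list_skipped(work_area, web, topic):
    """Return the attachment names to skip (assumed: one name per line)."""
    path = skip_file(work_area, web, topic)
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def add_skipped(work_area, web, topic, attachment):
    """Record an attachment as skipped; adding it twice has no effect."""
    names = list_skipped(work_area, web, topic)
    if attachment not in names:
        names.append(attachment)
        path = skip_file(work_area, web, topic)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("\n".join(names) + "\n")


def remove_skipped(work_area, web, topic, attachment):
    """Stop skipping an attachment; delete the file once it is empty."""
    names = [n for n in list_skipped(work_area, web, topic) if n != attachment]
    path = skip_file(work_area, web, topic)
    if names:
        path.write_text("\n".join(names) + "\n")
    elif path.exists():
        path.unlink()
```

For example, `add_skipped("/var/www/twiki/working/work_areas/SearchEngineKinoSearchAddOn", "Attachmentsweb", "MyTopic", "ProblematicAttachment.pdf")` would produce the file shown in the snippet.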

[3]. The error handling based on the Error module, as mentioned in [1], should capture the error and update the work area mentioned in [2], so that the failing attachment is skipped the next time indexing runs.
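
The feedback loop in [3] could look roughly like the following Python sketch (again an illustration, not the add-on's Perl code). `index_topic` and `index_one` are hypothetical names, and the skip file is assumed to hold one attachment name per line.

```python
from pathlib import Path


def index_topic(work_area, web, topic, attachments, index_one):
    """Index a topic's attachments, remembering failures for the next run."""
    skip_path = Path(work_area) / f"_skip_.{web}.{topic}.txt"
    skipped = set()
    if skip_path.exists():
        skipped = {line.strip() for line in skip_path.read_text().splitlines() if line.strip()}

    indexed = []
    for name in attachments:
        if name in skipped:
            continue  # failed in an earlier run, so do not try again
        try:
            index_one(name)
            indexed.append(name)
        except Exception:
            # Record the failure so the next run skips this attachment.
            skip_path.parent.mkdir(parents=True, exist_ok=True)
            with skip_path.open("a") as fh:
                fh.write(name + "\n")
    return indexed
```

On the first run a problematic file fails once and is written to the skip file; on later runs it is never handed to the indexer again until an admin removes it from the list.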

[4]. On fresh TWiki installations, the "index" directory is expected to be empty if "kinoindex" or "kinoupdate" has never been run. Modify the templates related to SearchEngineKinoSearchAddOn to display a meaningful message when someone runs a kinosearch query.
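
The check behind [4] might be as simple as the Python sketch below. The function name and the message wording are made up for illustration; the real change would live in the add-on's templates and Perl code.

```python
import os


def index_status_message(index_dir):
    """Return a user-facing message if the index looks unbuilt, else None."""
    # A missing or empty directory is taken to mean that neither kinoindex
    # nor kinoupdate has been run yet.
    if not os.path.isdir(index_dir) or not os.listdir(index_dir):
        return ("The search index has not been built yet. "
                "Please ask an administrator to run kinoindex.")
    return None
```

A search page could show this message instead of an empty result list or a raw error.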

[5]. Some way to monitor the recent "kinoindex" or "kinoupdate" logs through the browser. I think the best way would be to improve SearchEngineKinoSearchPlugin to display the last few lines of the log files. Use Perl modules like File::Tail to monitor the files. If the log files do not show the line "Indexing Complete", that tells the "admin" that further action is needed.
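
As a sketch of [5], the following Python reads the last few lines of a log and looks for the "Indexing Complete" marker. It reads the file once rather than following it the way File::Tail does, and the function names and the default of 10 lines are assumptions.

```python
from collections import deque


def tail_lines(path, count=10):
    """Return the last `count` lines of a text file, without newlines."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        # A bounded deque keeps only the final `count` lines in memory.
        return [line.rstrip("\n") for line in deque(fh, maxlen=count)]


def indexing_completed(path, count=10, marker="Indexing Complete"):
    """True if the marker appears in the last `count` lines of the log."""
    return any(marker in line for line in tail_lines(path, count))
```

A status page could print `tail_lines(...)` and flag the run for the admin whenever `indexing_completed(...)` is false.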

[6]. The "rest" handlers included with SearchEngineKinoSearchPlugin provide a way to start the index/update from the browser. The current code allows anyone to run these jobs, or maybe I am not aware of how to set up such restrictions. Consider adding access control, so that only "admin" or members of the "TWikiAdminGroup" group can start these jobs.
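
The access check proposed in [6] could be sketched as follows, in Python and independent of TWiki's own API. `may_run_index_job`, `start_index_job`, and the `group_members` mapping are hypothetical; a real handler would ask TWiki for the current user and group membership.

```python
def may_run_index_job(user, group_members, admin_user="admin",
                      admin_group="TWikiAdminGroup"):
    """True if `user` is the admin user or a member of the admin group.

    `group_members` is an assumed mapping of group name to member names.
    """
    return user == admin_user or user in group_members.get(admin_group, ())


def start_index_job(user, group_members, run_job):
    """Run the job only for authorized users; return a status message."""
    if not may_run_index_job(user, group_members):
        return "Access denied: only administrators may start indexing."
    run_job()
    return "Index job started."
```

Keeping the check in one function means the index and update handlers can share the same rule.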

[7]. Some way of cleaning/emptying the "index" directory from the browser? This could go into SearchEngineKinoSearchPlugin, with access restricted to "admin" or "TWikiAdminGroup" members.
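
For [7], emptying the directory while keeping the directory itself might look like this Python sketch. `clear_index_dir` is a hypothetical name, and the `is_admin` flag stands in for whatever access check the plugin would perform.

```python
import os
import shutil


def clear_index_dir(index_dir, is_admin):
    """Delete everything inside `index_dir`; return how many entries went."""
    if not is_admin:
        raise PermissionError("only administrators may clear the index")
    removed = 0
    for entry in os.listdir(index_dir):
        path = os.path.join(index_dir, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # real subdirectory: remove recursively
        else:
            os.remove(path)      # regular file or symlink
        removed += 1
    return removed
```

The directory itself is left in place, so a following kinoindex run starts from the same empty state as a fresh installation.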

-- TWiki:Main.SopanShewale - 30 Jan 2009

Added a module for converting docx files to text. The module depends on docx2txt.

-- TWiki:Main.SopanShewale - 11 Aug 2009

Added support for conversion of pptx and xlsx files. The xlsx support is based on the Spreadsheet::XLSX Perl module from CPAN.

The pptx conversion depends on pptx2txt. I am planning to upload that to the sourceforge.net repository soon. It is currently attached to SearchEngineKinoSearchAddOnDev.

Feedback on the tool is most welcome.

-- TWiki:Main.SopanShewale - 18 Aug 2009

In the process of building a new release of the add-on.

-- TWiki:Main.SopanShewale - 08 Oct 2009

Added support for the KINOSEARCH_ATTACHMENT_INDEX_SIZELIMIT variable.

-- TWiki:Main.SopanShewale - 30 Jan 2010

Topic revision: r19 - 2010-01-30 - SopanShewale
 