
Item7498: LdapContrib May 2014 Update

Item Form Data

AppliesTo: Extension
Component: LdapContrib
Priority: Normal
CurrentState: Closed
WaitingFor:
TargetRelease: n/a
ReleasedIn:


Detail

LdapContrib May 2014 Update

In this topic, I describe the changes implemented in the official LdapContrib add-on for TWiki as a result of my internship at CERN from February 2013 to April 2014. During this time I worked with the CERN TWiki, and LdapContrib in particular. CERN had requirements which led to the add-on being enhanced in numerous areas, as discussed here. Even though my appointment with CERN has ended, I will still maintain this add-on, especially in case of bugs related to the changes discussed here.

Do not hesitate to contact me: terje.andersen+twiki@gmailNOSPAM.com.

Overview

As a result of the changes to LdapContrib described in this document, configuration parameters and methods of the LdapContrib add-on have been added and removed.

In the following pictures, added and removed entities are marked in green and red respectively:

LdapContribChanges2.jpg

LdapContribChanges.jpg

Error in the picture: {Precache} is not replaced with {PreCache}. {Precache} is still in use, with possible values being all, existing or off instead of 1 (true) or 0 (false).

Sections explaining new and removed methods in TWiki::Contrib::LdapContrib:

  • emptyIgnoredGroups(), emptyIgnoredUsers(), getAllIgnoredGroups(), getAllIgnoredUsers(), getAllUnknownGroups(), getAllUnknownUsers(), removeIgnoredGroups() and removeIgnoredUsers():
    • Hopefully self-explanatory methods related to the ignored users and groups lists. These lists contain values that have previously been looked up in LDAP without result.
    • The lists exist to avoid spamming the LDAP servers by asking for these values again.
  • getDateOfLogin():
    • See section Adding created date and updated date for WikiNames in the LdapContrib database
  • getCacheTie() and untieCache():
    • See section Adding locking mechanisms to the database.
  • initCache() (Removed):
    • See sections Option to require CLI environment for database refresh and Preserving users which are removed from LDAP to not lose their WikiName.
  • getDnOfGroup()
    • See section Regular expressions for users and groups being looked up.
  • getRefreshMode()
    • See section Option to require CLI environment for database refresh.
  • getTWikiUserMapping()
    • See section Option to preserve existing users before LdapContrib.
  • isIgnoredGroup() and isIgnoredUser() are the result of a refactoring. Prior to the project, all ignored users and groups were fetched and then checked in place. We separated this logic to make the code more coherent: instead of spending several lines on determining whether a value is an ignored group or user, only one method call is needed.
  • isValidCacheGroupName() and isValidCacheLoginName():
    • See section Regular expressions for users and groups being looked up.
  • refreshIgnoredCache():
    • See section Refreshing ignored users and groups list at database refresh.

Changes to TWiki::Contrib::LdapContrib

Adding locking mechanisms to the database.

BerkeleyDB has no locking mechanism. When two parallel processes are doing a transaction of reads and writes to the LdapContrib database, this can lead to race conditions and a corrupted database.

The Perl module DB_File has been replaced with DB_File::Lock, and two methods, getCacheTie() and untieCache(), have been added to the LdapContrib class.

DB_File::Lock is already a part of the TWiki distribution by being present in /lib/CPAN/lib/DB/File/Lock.pm.

Also, every method reading from or writing to the LdapContrib database has been altered as follows:

  1. At the beginning of the method, the lock status is read.
  2. Before the method writes to or reads from the database:
    • If the method wants to read from the database, it sets a new read lock unless a lock was set prior to the method being called.
    • If the method wants to write to the database, it sets a new write lock unless a write lock was set prior to the method being called.
  3. Then, the lock is brought back to its initial state.

For example:

  1. Method A calls Method B.
  2. Method B sees that no lock is present. Since it is going to write to the database, it needs a write lock.
  3. Method B ties a write lock (exclusive lock).
  4. Some database reads and writes are done.
  5. Method B calls Method C.
  6. Method C uses the already existing write lock, even though it only reads from the database.
  7. Method C returns, leaving the lock as it found it.
  8. Method B does some additional logic.
  9. Method B brings the lock back to its initial state (no lock) and returns.

This ensures that from the moment Method B is called, an exclusive transaction is being carried out. The process can do multiple reads and writes spanning several methods without other processes being able to interfere. Note that if Method B and C only performed database reads, a read lock should have been used instead of a write lock.
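The nesting rule above can be sketched as follows. This is an illustration in Python rather than the add-on's Perl code, and it assumes a Unix system where fcntl.flock is available. The class LockedStore and the functions method_b and method_c are hypothetical names; the real add-on uses getCacheTie() and untieCache() on top of DB_File::Lock.

```python
import fcntl
import os
import tempfile

# Lock strength: a held lock satisfies any request of equal or lower rank.
RANK = {None: 0, "read": 1, "write": 2}


class LockedStore:
    """Hypothetical stand-in for the cache file and its lock state."""

    def __init__(self, path):
        self.path = path
        self.handle = None
        self.mode = None      # lock currently held by this process
        self.events = []      # trace of what happened, for illustration

    def tie(self, wanted):
        """Take `wanted` unless an equal or stronger lock is already held.

        Returns the lock state found on entry so the caller can restore it.
        """
        found = self.mode
        if RANK[wanted] > RANK[found]:
            if self.handle is None:
                self.handle = open(self.path, "a+")
            flag = fcntl.LOCK_EX if wanted == "write" else fcntl.LOCK_SH
            fcntl.flock(self.handle, flag)
            self.mode = wanted
            self.events.append("tie " + wanted)
        return found

    def restore(self, found):
        """Bring the lock back to the state found on entry."""
        if found == self.mode:
            return  # an outer method owns the lock, leave it alone
        if found is None:
            fcntl.flock(self.handle, fcntl.LOCK_UN)
            self.handle.close()
            self.handle = None
        else:
            fcntl.flock(self.handle, fcntl.LOCK_SH)  # downgrade to read
        self.mode = found
        self.events.append("restore " + str(found))


def method_c(store):
    found = store.tie("read")      # reuses Method B's write lock
    store.events.append("C reads")
    store.restore(found)           # no-op: the lock belongs to Method B


def method_b(store):
    found = store.tie("write")     # no lock present, so tie a write lock
    store.events.append("B writes")
    method_c(store)
    store.restore(found)           # back to "no lock"


lock_path = os.path.join(tempfile.mkdtemp(), "cache.lock")
store = LockedStore(lock_path)
method_b(store)
print(store.events)
```

Each method only remembers the lock state it found on entry, so the same method can be called with or without an outer lock.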

Preserving users which are removed from LDAP to not lose their WikiName.

By default, users who are deleted from LDAP are also deleted from the local LdapContrib database, provided that {Precache} is turned on. This follows from the way precaching is carried out. To avoid disturbing the users, a temporary database is created each time the database is refreshed. All users are then retrieved from LDAP and iterated over. Users who already have a WikiName stored locally keep it; users who do not get a new one. But the deleted users, who are present locally but not in LDAP, are never processed, since they are not part of the LDAP results. They are not written to the temporary database, and at the end of the refresh, when the temporary database replaces the production database, these users "perish".

Consequences:

  • The deleted WikiNames might be used in access control lists, and with the WikiName deleted, other users might take it in the future and inherit these access rights.
  • The deleted WikiNames might be present in content signatures, and with the WikiName deleted, other users might take it in the future and inherit ownership of the content.

Solution: Instead of creating a temporary database, and writing to it based on the users returned from LDAP, the following scheme will be followed:

  1. Write to memory during the refresh, instead of to a separate database file, in order not to disturb the users.
  2. When all the rules have been updated, the rules written to memory are dumped into the production database.
    • Any users already present in the production database but not in memory will be preserved.

The refreshCache() method was altered to achieve this, replacing the temporary database file with a HASH in memory.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • initCache(): Logic to create the temporary cache before refreshCache() is obsolete and removed.
  • refreshUsersCache()
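A minimal sketch of the in-memory scheme, written in Python for illustration and not taken from the add-on. The function refresh_users and the plain dictionaries standing in for the database are hypothetical.

```python
def refresh_users(production, ldap_users):
    """Build the refreshed rules in memory, then merge them into production.

    production: dict mapping login name -> WikiName (stand-in for the database)
    ldap_users: list of (login name, newly generated WikiName) from LDAP
    """
    fresh = {}
    for login, generated_wiki_name in ldap_users:
        # A user who already has a WikiName keeps it; others get the new one.
        fresh[login] = production.get(login, generated_wiki_name)
    # Merging, rather than replacing, preserves users missing from LDAP.
    production.update(fresh)
    return production


production = {"jdoe": "JohnDoe", "gone": "GoneUser"}
ldap_users = [("jdoe", "JohnDoe2"), ("newbie", "NewUser")]
print(refresh_users(production, ldap_users))
```

The user "gone" is no longer in LDAP but keeps its WikiName, so nobody else can take it later.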

Database is force-refreshed at first run

Previously, if you set {MaxCacheAge} to zero in configure immediately after setting up LdapContrib, the database would not refresh itself the first time you visited TWiki. You had to append ?refreshldap=on to the request, either from a browser or from the command line. There is a slight design change now: regardless of the {MaxCacheAge} setting, the database will be refreshed the first time it is created, provided that {Precache} is not disabled.

Adding created date and updated date for WikiNames in the LdapContrib database.

When a new user enters the LdapContrib database, it is given a WikiName. The user's email(s) and LDAP path are also stored. However, prior to this implementation, the time when a user entered the database was not stored. This is of interest to a TWiki administrator overseeing an LdapContrib database. Knowing when a user entered TWiki can prove useful when solving problems related to TWiki identity management.

By adding two possible database keys, U2CREATED::$loginName, and U2UPDATED::$loginName, information about when a user is created and updated can be stored.

U2CREATED is set whenever a user is created, and U2UPDATED whenever it is updated.

A getter, getDateOfLogin(), is implemented to retrieve the created and updated timestamps for a given user.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • getDateOfLogin(): Returns the created or the updated timestamp for a given user.
  • cacheUserFromEntry(): Sets U2UPDATED::$loginName if the $loginName is having its information updated, and U2CREATED::$loginName if it is being created.
  • removeUserFromCache(): Removes U2UPDATED::$loginName and U2CREATED::$loginName for a given $loginName when a user is being removed from the database.
  • renameWikiName(): Sets U2UPDATED::$loginName for a given $loginName when its WikiName is being renamed.
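The key scheme can be illustrated with a small Python sketch. A dictionary stands in for the database. The timestamp format (epoch seconds) and the kind parameter of get_date_of_login are assumptions made for this example, not the add-on's actual signature.

```python
import time


def cache_user(db, login, wiki_name, now=None):
    """Store a user and stamp U2CREATED or U2UPDATED accordingly."""
    now = int(time.time()) if now is None else now
    if "U2W::" + login in db:
        db["U2UPDATED::" + login] = now   # existing user: updated
    else:
        db["U2CREATED::" + login] = now   # new user: created
    db["U2W::" + login] = wiki_name


def remove_user(db, login):
    """Drop the user together with both timestamps."""
    for prefix in ("U2W::", "U2CREATED::", "U2UPDATED::"):
        db.pop(prefix + login, None)


def get_date_of_login(db, login, kind="created"):
    """Return the created or updated timestamp, or None if unknown."""
    prefix = "U2CREATED::" if kind == "created" else "U2UPDATED::"
    return db.get(prefix + login)


db = {}
cache_user(db, "jdoe", "JohnDoe", now=1000)
cache_user(db, "jdoe", "JohnDoe", now=2000)
print(get_date_of_login(db, "jdoe"), get_date_of_login(db, "jdoe", "updated"))
```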

Refreshing ignored users and groups list at database refresh.

If an LDAP search returns false for a value, the value is put in one of two ignore lists, UNKWNUSERS or UNKWNGROUPS, keeping LDAP from being asked for these values again. Occasionally, an invalid user or group may suddenly become valid, for example a group which was deleted from LDAP and then re-added.

With the design change described in the section above, these lists do not get reset at each database refresh. We could reset them, but I think a better solution is to remove only the entries which no longer belong there at each refresh, instead of purging the lists.

Solution: add the method refreshIgnoredCache() to be run at the end of each database refresh. It looks at all the values in UNKWNUSERS and UNKWNGROUPS and removes the entries which could be found when all users and groups were retrieved from LDAP during the refresh.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • refreshCache()
  • refreshIgnoredCache()
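In outline, the pruning step is a set difference. The Python sketch below is an illustration only; storing the ignore lists as sets under the keys UNKWNUSERS and UNKWNGROUPS is an assumption about representation, and refresh_ignored_cache is a hypothetical counterpart to refreshIgnoredCache().

```python
def refresh_ignored_cache(db, found_users, found_groups):
    """Remove ignore-list entries that turned up in LDAP during the refresh."""
    db["UNKWNUSERS"] = db.get("UNKWNUSERS", set()) - set(found_users)
    db["UNKWNGROUPS"] = db.get("UNKWNGROUPS", set()) - set(found_groups)


db = {
    "UNKWNUSERS": {"TerjeAndersen", "ghost"},
    "UNKWNGROUPS": {"old-group", "readded-group"},
}
refresh_ignored_cache(db, found_users=["jdoe"], found_groups=["readded-group"])
print(sorted(db["UNKWNUSERS"]), sorted(db["UNKWNGROUPS"]))
```

Entries that are still unknown stay in the lists, so LDAP is not asked for them again.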

Option to preserve existing users before LdapContrib.

Support for migration from TWikiUserMapping to LdapUserMapping is already in place in the LdapContrib add-on, but since some bugs have been fixed, I will explain the logic.

If {PreserveTWikiUserMapping} is set to true in the configuration, the users from Main.TWikiUsers will be taken into consideration when building the first database (and never again). If {PreserveWikiNames} is set, this also includes the login names from Main.TWikiUsers which have been deleted from LDAP.

So:

  • If the LdapContrib database is not created yet.
  • If the Main/TWikiUsers.txt topic is present with WikiName to login name mappings.
  • If {PreserveTWikiUserMapping} is true

Then:

  1. The Main.TWikiUsers.txt topic will be iterated over the first time LdapContrib is run, meaning the first time a user visits the TWiki with LdapContrib enabled.
    • Rules with invalid WikiNames (determined by TWiki::Func::isValidWikiWord) will be skipped.
    • Rules with invalid login names (determined by the {Exclude}, {NormalizeLoginNames} and {LoginPattern} configure flags) will be skipped.
    • Rules with WikiNames that have been seen before in the list will be skipped.
    • Rules with login names that have been seen before in the list will be skipped.
    • Rules with invalid dates will be skipped.
  2. All users will be retrieved from LDAP, but not stored immediately.
  3. All the users from Main/TWikiUsers.txt (including users deleted from LDAP, depending on the {PreserveWikiNames} flag) are added, then the other users from LDAP.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • _refreshUsersCache(): Gets the Main.TWikiUsers topic rules from getTWikiUserMapping() and stores them in the LdapContrib database prior to other users, so that they take precedence.
  • getTWikiUserMapping(): Reads the Main.TWikiUsers topic and returns all valid entries (no duplicate WikiName, no duplicate login name, and a valid WikiName, login name and date).
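The filtering rules can be sketched in Python as below. This is not the add-on's code. The two regular expressions are simplified stand-ins (for TWiki::Func::isValidWikiWord and for the {LoginPattern} check), the ISO date format is assumed for the example, and only entries that were kept count as "seen before".

```python
import re
from datetime import datetime

# Simplified stand-ins, assumed for this example only.
WIKIWORD = re.compile(r"^[A-Z]+[a-z0-9]+[A-Z]+[A-Za-z0-9]*$")
LOGIN = re.compile(r"^[a-z0-9]+$")


def valid_date(text):
    """True if text is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def get_user_mapping(rules):
    """Return the rules that survive all validity and duplicate checks."""
    seen_wiki, seen_login, kept = set(), set(), []
    for wiki, login, date in rules:
        if not WIKIWORD.match(wiki) or not LOGIN.match(login):
            continue  # invalid WikiName or login name
        if wiki in seen_wiki or login in seen_login:
            continue  # duplicate WikiName or login name
        if not valid_date(date):
            continue  # invalid date
        seen_wiki.add(wiki)
        seen_login.add(login)
        kept.append((wiki, login, date))
    return kept


rules = [
    ("JohnDoe", "jdoe", "2010-01-01"),
    ("johndoe", "jd2", "2010-01-01"),   # invalid WikiName
    ("JaneRoe", "JRoe", "2010-01-01"),  # invalid login name
    ("JohnDoe", "other", "2011-02-02"), # duplicate WikiName
    ("JimPoe", "jdoe", "2011-02-02"),   # duplicate login name
    ("AnnLee", "alee", "2012-02-31"),   # invalid date
    ("AnnLee", "alee", "2012-03-03"),
]
print(get_user_mapping(rules))
```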

Option to not add users during cache / database refresh.

In organizations where a large portion of the LDAP users use the local TWiki, it is acceptable to create WikiNames for all possible users. But in other cases, only a fraction of the LDAP users are using TWiki. If LDAP has 20000+ users, but only 1000 of them use TWiki, why generate a WikiName for all of them?

  • Because WikiNames are unique, this would increase the difficulty of finding a unique WikiName.
  • There would be an unneeded amount of data stored, since most of the WikiNames would not belong to users actually using TWiki.

One solution could be to turn precaching off, but this would also have undesirable consequences:

  • Existing users would not have their emails and LDAP DN paths updated.
  • Group memberships would not be updated.

Solution: Change the Precache configuration flag from a Boolean value to a string value with three choices: off, existing and all. off is the old false and all is the old true, while existing is a new value, where:

  1. All users are retrieved from LDAP.
  2. For each user, the refresh logic checks if the user is present in the local LdapContrib database.
    • If it is, the already existing WikiName is used.
    • If it is not, the user is skipped.
  3. The WikiName, along with emails and the LDAP DN Path is stored in the local database.

In order to achieve this, two simple conditional tests are implemented in the refreshUsersCache() method:

  • Does Precache equal existing?
  • Does this user already have a WikiName?

If Precache equals existing and the user does not already have a WikiName, the user is skipped. This leads to only existing users getting updated, with group memberships also updated.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • TWiki::Contrib::LdapContrib::refreshUsersCache(): If the configuration flag Precache equals existing, and the current user retrieved from LDAP is not already present in the LdapContrib database, skip the user.
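The effect of the three Precache values can be shown with a short Python sketch. It is an illustration under the assumption that the refresh receives a list of login names from LDAP; users_to_store is a hypothetical helper, not a method of the add-on.

```python
def users_to_store(precache, ldap_logins, existing_logins):
    """Return the LDAP login names that a refresh would write to the database."""
    if precache == "off":
        return []  # no precaching at all
    kept = []
    for login in ldap_logins:
        if precache == "existing" and login not in existing_logins:
            continue  # unknown user: do not generate a WikiName
        kept.append(login)
    return kept


ldap_logins = ["jdoe", "asmith", "newbie"]
existing = {"jdoe", "asmith"}
for mode in ("off", "existing", "all"):
    print(mode, users_to_store(mode, ldap_logins, existing))
```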

Option to backup the whole database after database refresh.

The configuration flags {BackupCacheFile} and {BackupFileAge} have been added.

  • If {BackupCacheFile} is true, the main database will be backed up after it is refreshed.
  • If {BackupFileAge} is set, the database will only be backed up when the newest backup file is older than {BackupFileAge}.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • refreshCache()
  • backupCacheFile()
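A rough Python sketch of the backup decision follows. The unit of {BackupFileAge} (seconds here), the backup directory and the backup file naming are all assumptions made for the example; the source does not specify them.

```python
import os
import shutil
import tempfile
import time


def backup_cache_file(cache_path, backup_dir, enabled, max_age):
    """Copy the cache file into backup_dir if the flags allow it.

    Returns the path of the new backup, or None if no backup was made.
    """
    if not enabled:
        return None
    backups = [os.path.join(backup_dir, n) for n in os.listdir(backup_dir)]
    if max_age and backups:
        newest = max(os.path.getmtime(p) for p in backups)
        if time.time() - newest < max_age:
            return None  # newest backup is still fresh enough
    target = os.path.join(backup_dir, "cache.%d.bak" % int(time.time()))
    shutil.copy(cache_path, target)  # copy() gives the backup a fresh mtime
    return target


work = tempfile.mkdtemp()
cache_path = os.path.join(work, "cache.db")
with open(cache_path, "w") as fh:
    fh.write("dummy cache contents")
backup_dir = os.path.join(work, "backups")
os.mkdir(backup_dir)

first = backup_cache_file(cache_path, backup_dir, enabled=True, max_age=3600)
second = backup_cache_file(cache_path, backup_dir, enabled=True, max_age=3600)
print(first is not None, second is None)
```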

Option to require CLI environment for database refresh.

Prior to this, any user could invoke a cache refresh by appending the HTTP GET parameter refreshldap to the TWiki URL they were visiting.

A cache refresh is potentially a heavy operation, and with a large LDAP database it may take up to several minutes.

Solution: To restrict access to refreshing the database, a new configuration flag is added: {CLIOnlyRefresh}. By default it is turned off, but if a TWiki administrator turns it on, the LdapContrib cache / database can only be refreshed when the refresh is requested from the command line.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • initCache(): Logic to determine refresh mode moved to method getRefreshMode().
  • getRefreshMode(): Method to determine if the database / cache should be refreshed or not.
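The gate added by {CLIOnlyRefresh} can be sketched as a small decision function. The Python below is illustrative only: refresh_allowed and its parameters are hypothetical, how the add-on detects a command-line environment is left out, and the real getRefreshMode() also considers other factors such as cache age.

```python
def refresh_allowed(cli_only_refresh, is_cli, refresh_requested):
    """Decide whether an explicitly requested refresh may go ahead."""
    if not refresh_requested:
        return False
    if cli_only_refresh and not is_cli:
        return False  # web requests with ?refreshldap=on are ignored
    return True


# Flag off: a browser request may refresh. Flag on: only the CLI may.
print(refresh_allowed(False, is_cli=False, refresh_requested=True))
print(refresh_allowed(True, is_cli=False, refresh_requested=True))
print(refresh_allowed(True, is_cli=True, refresh_requested=True))
```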

Regular expressions for users and groups being looked up.

I have seen that the TWiki engine has a tendency to ask for information about users and groups using values which are obviously neither. For example, we saw that the method getWikiNameOfLogin(TerjeAndersen) was called several times in a topic with author TerjeAndersen.

In some environments a value like TerjeAndersen may be a login name, but in my case it could not be, because my business had lowercase usernames.

By preventing invalid values from being queried against the LdapContrib database, we save resources, especially during peak hours in a large business where the database is very active.

Solution: add two configuration flags, {LoginPattern} and {GroupPattern}, along with two methods isValidCacheLoginName() and isValidCacheGroupName().

Each method checks the parameter passed to it against the regular expression in the respective configuration flag. If the test fails, and the value is determined not to be a valid user or group name, the method returns false. This way, a look-up in the LdapContrib database or on the LDAP server can be avoided, saving resources.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • isValidCacheLoginName()
  • isValidCacheGroupName()
  • All methods which ask either the LdapContrib database or LDAP for information about users or groups (by calling either isValidCacheLoginName() or isValidCacheGroupName()).
  • getDnOfGroup(): Previously, getDnOfLogin() was used to get the DN of a group, despite what the method name may indicate. However, getDnOfLogin() now checks the {LoginPattern} against the incoming parameter. So we need to create a separate method getDnOfGroup(), which checks the incoming parameter against {GroupPattern}.
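A minimal Python sketch of the pattern check follows. The pattern values are examples chosen for this illustration, not defaults of the add-on, and the lookups list exists only to show when the database is actually queried.

```python
import re

# Example values only; a site would set these in configure.
CONFIG = {
    "LoginPattern": r"^[a-z0-9]+$",
    "GroupPattern": r"^[a-z0-9-]+$",
}


def is_valid_cache_login_name(name):
    return re.search(CONFIG["LoginPattern"], name) is not None


def is_valid_cache_group_name(name):
    return re.search(CONFIG["GroupPattern"], name) is not None


def get_wiki_name_of_login(db, lookups, login):
    """Query the database only if the value can be a login name at all."""
    if not is_valid_cache_login_name(login):
        return None  # rejected early, no database access
    lookups.append(login)
    return db.get("U2W::" + login)


db = {"U2W::tandersen": "TerjeAndersen"}
lookups = []
print(get_wiki_name_of_login(db, lookups, "TerjeAndersen"))  # a WikiName, not a login
print(get_wiki_name_of_login(db, lookups, "tandersen"))
print(lookups)
```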

Adding database key-value pair for cache refresh process.

As discussed in the section Preserving users which are removed from LDAP to not lose their WikiName, we remove the use of a temporary database in order not to lose information.

However, when we do this, an important question can no longer be answered: Is the database being refreshed right now?

This question was previously answered by looking for the temporary database file. If it existed, the database was currently being refreshed. Without this check, two processes are in danger of refreshing the database simultaneously.

Solution: A new database key-value pair with key CACHEREFRESHPROCESS.

The value equals the process id of the process currently refreshing it. This way, a new process wanting to refresh the cache can check this value to determine if it is already being refreshed or not.

This affects the following methods in TWiki::Contrib::LdapContrib:

  • refreshCache(): Prior to a cache refresh, the CACHEREFRESHPROCESS will be set, unless it is already set. In that case, the refresh process aborts.
  • finish(): Before object destruction, the CACHEREFRESHPROCESS flag will be removed if it was set by the same process.
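The marker logic can be sketched in Python as follows. A dictionary stands in for the database, and begin_refresh and finish are hypothetical names. In a real setting the check and the write would have to happen under the same write lock to be safe against two processes checking at once.

```python
import os


def begin_refresh(db):
    """Claim the refresh; return False if another process already holds it."""
    if "CACHEREFRESHPROCESS" in db:
        return False  # abort: a refresh is already running
    db["CACHEREFRESHPROCESS"] = os.getpid()
    return True


def finish(db):
    """Remove the marker, but only if this process set it."""
    if db.get("CACHEREFRESHPROCESS") == os.getpid():
        del db["CACHEREFRESHPROCESS"]


db = {}
print(begin_refresh(db))   # this process claims the refresh
print(begin_refresh(db))   # a second attempt is refused
finish(db)
print("CACHEREFRESHPROCESS" in db)
```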

Changes to TWiki::Users::LdapUserMapping

Not querying LdapContrib database for the same information multiple times.

TWiki::Users::LdapUserMapping has a tendency to ask for the same information multiple times.

For example: what is the login name for WikiName TerjeAndersen?

Each time a question like that is asked, the database is queried and an answer is given.

Research has shown that this especially affects the methods getWikiNameOfLogin(), getLoginOfWikiName(), isGroup() and getGroupMembers() in TWiki::Contrib::LdapContrib.

Solution: By adding an extra layer of logic to this class, remembering an answer once it has been given by using a HASH, the stress on the LdapContrib database is reduced.

This affects the following methods in TWiki::Users::LdapUserMapping:

  • _getWikiNameOfLogin()
  • _getLoginOfWikiName()
  • _isGroup()
  • _getGroupMember()

All these methods follow the same logic. If the question has not been asked before, forward the request to TWiki::Contrib::LdapContrib. If it has, return the answer already given which is stored in memory.
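The remembering layer is plain memoization. The Python sketch below shows it for one of the four methods; the class name and the backend callable are hypothetical, and the add-on itself keeps the answers in a Perl hash.

```python
class MemoizingUserMapping:
    """Remembers answers so the backend is asked at most once per question."""

    def __init__(self, backend_lookup):
        self._backend_lookup = backend_lookup  # stand-in for LdapContrib
        self._wiki_name_of_login = {}

    def get_wiki_name_of_login(self, login):
        if login not in self._wiki_name_of_login:
            # First time this question is asked: forward it and remember.
            self._wiki_name_of_login[login] = self._backend_lookup(login)
        return self._wiki_name_of_login[login]


calls = []


def slow_lookup(login):
    calls.append(login)
    return {"tandersen": "TerjeAndersen"}.get(login)


mapping = MemoizingUserMapping(slow_lookup)
for _ in range(3):
    mapping.get_wiki_name_of_login("tandersen")
print(len(calls))
```

Negative answers (None) are remembered as well, so repeated questions about unknown names do not reach the database either.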

Decode CUIDs before querying the LdapContrib database.

TWiki stores META information with non-alphanumeric characters escaped. One example is the author, whose login name is escaped before it is stored. When TWiki later wants to know the WikiName for this login name, it asks the UserMappingManager with the login name not yet decoded.

TWiki::Users::LdapUserMapping did not decode the CUID before asking the LdapContrib database. If an encoded login name was asked for, an undefined result was returned, since login names are stored in the LdapContrib cache without non-alphanumeric characters being escaped.

Solution: Implement an extra layer of logic in the affected methods discussed in the previous section. The incoming login name is decoded (un-escaped) before it is checked against the LdapContrib database.
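To illustrate the decoding step, the Python sketch below assumes an escape scheme in which every non-alphanumeric ASCII character is written as an underscore followed by two lowercase hex digits. This scheme is an assumption made for the example; check the TWiki core for the exact encoding before relying on it.

```python
import re


def encode_cuid(login):
    """Escape every non-alphanumeric character as _XX (assumed scheme)."""
    return re.sub(r"[^a-zA-Z0-9]", lambda m: "_%02x" % ord(m.group(0)), login)


def decode_cuid(cuid):
    """Reverse encode_cuid: turn each _XX back into the original character."""
    return re.sub(r"_([0-9a-f]{2})", lambda m: chr(int(m.group(1), 16)), cuid)


def get_wiki_name_of_cuid(db, cuid):
    """Decode first, since login names are stored unescaped in the cache."""
    return db.get("U2W::" + decode_cuid(cuid))


db = {"U2W::terje.andersen": "TerjeAndersen"}
cuid = encode_cuid("terje.andersen")
print(cuid, db.get("U2W::" + cuid), get_wiki_name_of_cuid(db, cuid))
```

Looking up the encoded value directly finds nothing, which is the undefined result described above.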

ldapdbtest - new tool to run tests on the LdapContrib database

The LdapContrib database is a BerkeleyDB v1 database. It is very simple: you cannot set primary keys and foreign keys as in a relational database, which "arrests" you if you update one part of the database that another part depends on.

I ran into this numerous times during my work with LdapContrib. For example, if you want to change a login name, you have to change U2W, U2DN, U2EMAILS, U2CREATED, U2UPDATED, LOGINNAMES, W2U, DN2U etc. If you forget just one, the database becomes corrupt and unreliable.

So I created a tool, ldapdbtest, which does a number of tests on the LdapContrib database. If it uncovers something, it means that some logic writing to the database is wrong, and has to be investigated:

/twiki/bin# perl ../tools/ldapdbtest

Test 1: Checking that each WikiUser maps to only one LoginName:
  12536 WikiUsers counted ...
  No duplicates found.

Test 2: Checking that each LoginName maps to only one WikiUser:
  12536 LoginNames counted ...
  No duplicates found.

Test 3: Given U2W::$loginName = $wikiName, and W2U::$wikiName = $reverseLoginName, check that $loginName eq $reverseLoginName:
  12536 U2W and 12536 W2U entries tested , No errors found.

Test 4: Given U2DN::$loginName = $dn, and DN2U::$dn = $reverseLoginName, check that $loginName eq $reverseLoginName:
  4 U2DN and 4 DN2U entries tested , No errors found.

Test 5: Checking that each LoginName in LOGINNAMES has an U2W entry, and vice versa
  12536 LoginNames counted from LOGINNAMES, 12536 LoginNames counted from U2W::$LoginName
  No errors found.

Test 6: Checking that each WikiName in WIKINAMES has an W2U entry, and vice versa
  12536 WikiNames counted from WIKINAMES, 12536 WikiNames counted from W2U::$WikiName
  No errors found.

Test 7: Checking that each group in GROUPS has an GROUP entry, and vice versa
  2 groups counted from GROUPS, 2 groups counted from GROUPS::$group
  No errors found.

Test 8: If we find a $loginName in U2CREATED, U2UPDATED or U2EMAILS, control that the $loginName is present in U2W and in LOGINNAMES:
  login names found in 12536 U2CREATED, 1 U2UPDATED and 2 U2EMAIL entries. Every login name was found in U2W and Loginnames. No errors found.

Test 9: Test that each $timestamp found in U2CREATED::$user = $timestamp and U2UPDATE::$user = $timestamp is valid:
  12536 U2CREATED and 1 U2UPDATED entries tested , No errors found.

Test 10: Testing for unknown keys in cache:
  Valid keys (16): WIKINAMES LOGINNAMES GROUPS UNKWNUSERS UNKWNGROUPS GROUPS GROUP2UNCACHEDMEMBERSDN EMAIL2U U2EMAIL U2W W2U DN2U U2DN U2CREATED U2UPDATED LASTUPDATED
  37628 keys tested, No errors found.
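As an illustration of what such a test does, Test 3 above could be sketched in Python like this. The tool itself is a Perl script; the dictionary below stands in for the database, and check_u2w_w2u is a hypothetical name.

```python
def check_u2w_w2u(db):
    """Return login names whose U2W entry has no matching W2U entry."""
    errors = []
    for key, wiki_name in db.items():
        if not key.startswith("U2W::"):
            continue
        login = key[len("U2W::"):]
        # W2U::$wikiName must point back to the same login name.
        if db.get("W2U::" + wiki_name) != login:
            errors.append(login)
    return errors


db = {
    "U2W::jdoe": "JohnDoe",
    "W2U::JohnDoe": "jdoe",
    "U2W::asmith": "AnnSmith",
    "W2U::AnnSmith": "annsmith",   # stale entry after a rename
}
print(check_u2w_w2u(db) or "No errors found.")
```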

Changes to documentation topic

All the pod documentation for the Perl classes has been removed.

I believe that it cluttered the topic, and that it is unrealistic to expect people to update it whenever the pod documentation in the classes is changed.

People that are interested in the pod documentation should have the knowledge to open the class files themselves and have a look.

I also had a go at reorganizing the topic, making more use of sub and subsub sections to organize the content more logically. I added a package diagram for LdapContrib, and provided examples on how to write your own class writing to and reading from the database.

Further work

Refactoring LdapContrib

With roughly 2500 lines of code and 48 methods in it prior to these enhancements, one might be tempted to call TWiki::Contrib::LdapContrib a so-called god class.

It is clear that this class is in need of refactoring. It should be divided into a set of smaller classes instead of one class that does everything, ensuring:

  • Low coupling, where each class is logically independent from one another.
  • High cohesion, where the class variables and methods in each class are clearly belonging together.

Switching to a relational DBMS

In my opinion, BerkeleyDB is too simple for LdapContrib. Too many lines of code go into database control, which could have been reduced with a relational DBMS like MySQL.

ItemTemplate
Summary LdapContrib May 2014 Update
ReportedBy TWiki:Main.TerjeAndersen
Codebase 6.0.0, 5.1.4
SVN Range TWiki-6.0.1-trunk, Fri, 16 May 2014, build 27393
AppliesTo Extension
Component LdapContrib
Priority Normal
CurrentState Closed
WaitingFor

Checkins TWikirev:27462
TargetRelease n/a
ReleasedIn

Topic attachments
Attachment | Revision | Size | Date | Who
LdapContribChanges.jpg | r1 | 732.3 K | 2014-05-20 16:35 | TerjeAndersen
LdapContribChanges2.jpg | r1 | 451.3 K | 2014-05-20 16:36 | TerjeAndersen
Topic revision: r9 - 2014-06-10 - HideyoImazu
 