As I've tweeted, I have spent the last couple of days (and the weekend) helping out a customer that exceeded the hard 64 GB database size limit in Lotus Domino. Before discussing how we solved the problem and got the customer back in business, I would like you to think about how situations like this could be avoided. And avoiding it is key, as once you exceed the size you're doomed.
First -- how and why would a database platform EVER allow a database to cross a file size that makes it break? Why doesn't Domino start to complain at 50 GB and make the warnings progressively harder to ignore as the database gets closer to 64 GB? Why doesn't it refuse new data once it reaches 60 GB? I find it totally unacceptable that a software product allows a database to exceed a size it knows it cannot handle.
Now I know that there are considerations for such a warning and that it could be done in application code (e.g. the database script or the QueryOpen event), but it really isn't something an application developer should have to think about. It should also apply to backend logic, which doesn't lend itself to a UI computation. I also know that DDM or similar tools could warn about it, but that still doesn't change my stance. The 64 GB limit is a hard limit, and being warned before reaching (and exceeding) it shouldn't depend on me configuring a specific piece of functionality.
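For what it's worth, a minimal sketch of what such a database-script check could look like in LotusScript; the 50 GB threshold is my own choice, not an official value:

```lotusscript
' Postopen event in the Database Script: warn when the database file
' approaches the hard 64 GB limit. The 50 GB threshold is illustrative.
Sub Postopen(Source As NotesUIDatabase)
	Const WARN_GB = 50
	Dim db As NotesDatabase
	Dim sizeGb As Double
	Set db = Source.Database
	sizeGb = db.Size / 1073741824#   ' bytes to GB
	If sizeGb > WARN_GB Then
		Messagebox "This database is " & Format$(sizeGb, "#0.0") & _
			" GB and is approaching the hard 64 GB limit.", _
			48, "Database size warning"
	End If
End Sub
```

Of course, as argued above, this only covers UI access through the Notes client and does nothing for backend writes, which is exactly why it shouldn't be the application developer's job.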
Second -- having the option of keeping the view index in a location/file separate from the database would have helped. This has been brought up a number of times, including at Lotusphere Ask-The-Developers sessions. One could argue that externalizing the view index would just have postponed the problem, but the view index takes up a substantial amount of disk space for databases of this size.
Now on to how we saved the data.
The bottom line is that the customer was lucky. VERY lucky. The customer uses Cisco IP telephones and keeps a replica of the database in question on a secondary server for phone number lookup using a Java servlet. Due to the way the servlet is written, only a single, very small, view was built on the secondary server. This in turn meant that the database that had exceeded 64 GB on the primary server was "only" 55 GB on the secondary server. The database on the primary server was toast and gave out very interesting messages when attempting to access or run fixup on it:
**** DbMarkCorruptAgain(Both SB copies are corrupt)
So thank God they had the secondary server, otherwise the outcome of the story would have been far less pleasant. Using the secondary server we were able to:
Take the database offline (restrict access using ACL)
Purge all view indexes (using Ytria ViewEZ)
Create a database design only copy to hold archived documents
Delete all views to avoid them accidentally being built
Build a very simple view to prepare for data archiving
Write a LotusScript agent to archive documents (copy, then delete) from the database
Use Ytria ScanEZ to delete deletion stubs from the database (this works for them because the database isn't replicated to user workstations or laptops)
Do a compact to reclaim unused space
Make the database available on the primary server
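The archiving step above (copy, then delete) could be sketched roughly as follows; the view name "ArchiveView" and the archive database path are assumptions for illustration:

```lotusscript
' Rough sketch of a copy-then-delete archive agent.
' "ArchiveView" and the archive file path are hypothetical names.
Sub Initialize
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim archive As NotesDatabase
	Dim view As NotesView
	Dim doc As NotesDocument
	Dim nextDoc As NotesDocument
	Set db = session.CurrentDatabase
	Set archive = session.GetDatabase(db.Server, "archive\bigdb-archive.nsf")
	Set view = db.GetView("ArchiveView")
	Set doc = view.GetFirstDocument
	Do Until doc Is Nothing
		' Fetch the next document BEFORE removing the current one
		Set nextDoc = view.GetNextDocument(doc)
		Call doc.CopyToDatabase(archive)   ' copy first...
		Call doc.Remove(True)              ' ...then delete (leaves a stub)
		Set doc = nextDoc
	Loop
End Sub
```

Note that `Remove` leaves deletion stubs behind, which is why the ScanEZ stub-deletion step and the subsequent compact were needed to actually reclaim the space.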
Whew! They are now back in business after rebuilding the views in the database. They were lucky - VERY lucky. If they hadn't had that secondary replica, the data would probably have been lost, to much distress. To them and to me.
So what are the main takeaways from this?
UI check -- in the future, all databases I develop will have a database script that checks the database size to try to prevent situations like this
DAOS -- enable DAOS for databases to keep attachments out of the database and keep the size down
Monitoring -- monitor databases, either using DDM or other tools, to try to prevent situations like this
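If you'd rather not rely on DDM, a scheduled server agent along these lines could do basic size monitoring; the threshold and the use of the agent log via Print are my own choices:

```lotusscript
' Scheduled server agent sketch: walk all databases on the server
' and log any above a size threshold. Threshold is illustrative.
%INCLUDE "lsconst.lss"

Sub Initialize
	Const WARN_GB = 50
	Dim session As New NotesSession
	Dim dir As New NotesDbDirectory("")   ' current server
	Dim db As NotesDatabase
	Set db = dir.GetFirstDatabase(DATABASE)
	Do Until db Is Nothing
		If db.Open("", "") Then
			If db.Size / 1073741824# > WARN_GB Then
				Print "WARNING: " & db.FilePath & " is " & _
					Format$(db.Size / 1073741824#, "#0.0") & " GB"
			End If
		End If
		Set db = dir.GetNextDatabase
	Loop
End Sub
```

Unlike the UI check, this catches growth regardless of how documents are created, which is the point made earlier about backend logic.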
And so concludes a story from the field. Four days later, my hair having turned gray from watching copy/fixup/compact progress indicators, the customer is back in business and happy once again. Whew!!