CCM/FileNet search index fails in IBM Connections 4.5 due to special character
collaborationben    

The customer told me that the search index had never completed correctly since Connections was first deployed, and that users were now complaining that search results did not contain CCM documents.

The customer had tried recreating the index but to no avail and called me to take a look.

I first enabled trace on one of the infrastructure nodes (*=info: com.ibm.connections.search.index.indexing.*=all: com.ibm.connections.search.seedlist.*=all: com.ibm.connections.httpClient.*=all: com.ibm.connections.search.index.indexing.EcmFilesIndexer=all) as detailed in http://www-01.ibm.com/support/docview.wss?uid=swg21636559

I then created a background index, as detailed in "Creating a background index", and tailed trace.log and SystemOut.log. To create the background index I ran the following commands on the Windows server.

cd c:\IBM\WebSphere\AppServer\profiles\Dmgr01\bin

wsadmin.bat -lang jython -username wasadmin -password ********

execfile("searchAdmin.py")

SearchService.startBackgroundIndex("c:/IBM/Connections/background/crawl", "c:/IBM/Connections/background/extracted", "c:/IBM/Connections/background/index", "ecm_files")

I found that the indexing process stopped abruptly about 3,500 documents in, with roughly another 6,500 remaining.

[10/09/14 09:15:59:293 BST] 0000007a SeedlistPagin < com.ibm.connections.search.seedlist.parser.impl.SeedlistPaginationHandler resolve RETURN https://connections.acme.com/dm/atom/library/8DB6D184-AAF5-41F3-A28D-D1B7BEF17967%3BC11D230C-66A5-4CEB-8906-EAB19DFE0B8D/document/%7B5DEBC165-CDF6-4672-8300-A3345507867F%7D/media/%33%35%20%28%32%30%31%34%29%20%34%33%2d%38%35%20%54%68%65%20%53%79%73%74%65%6d%73%20%54%61%6e%74%6164%66?follow=true
[10/09/14 09:15:59:293 BST] 0000007a SystemErr     R   [Fatal Error] :23466:346: An invalid XML character (Unicode: 0x2) was found in the element content of the document.
[10/09/14 09:15:59:293 BST] 0000007a SeedlistEntry 2 com.ibm.connections.search.seedlist.crawler.impl.SeedlistEntryIterator hasNext CLFRW0063E: SAX parser error.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x2) was found in the element content of the document.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at com.ibm.connections.search.seedlist.crawler.impl.SeedlistPage.parse(SeedlistPage.java:86)
at com.ibm.connections.search.seedlist.crawler.impl.SeedlistEntryIterator.hasNext(SeedlistEntryIterator.java:102)
at com.ibm.connections.search.index.process.work.IndexingWork.run(IndexingWork.java:205)
at com.ibm.connections.search.index.process.initial.InitialProcess.index(InitialProcess.java:493)
at com.ibm.connections.search.index.process.initial.InitialProcess.index(InitialProcess.java:444)
at com.ibm.connections.search.index.process.initial.InitialProcess.run(InitialProcess.java:332)
at com.ibm.ws.asynchbeans.J2EEContext$RunProxy.run(J2EEContext.java:265)
at java.security.AccessController.doPrivileged(AccessController.java:229)
at com.ibm.ws.asynchbeans.J2EEContext.run(J2EEContext.java:1165)
at com.ibm.ws.asynchbeans.WorkWithExecutionContextImpl.go(WorkWithExecutionContextImpl.java:199)
at com.ibm.ws.asynchbeans.CJWorkItemImpl.run(CJWorkItemImpl.java:236)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1690)
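
The parser stops here because U+0002 is a control character, and XML 1.0 does not allow it in element content even when the rest of the document is well-formed. As a minimal sketch of the same behaviour outside Connections, the following uses Python's bundled parser (not Xerces, which is what the trace shows); the function name and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(xml_text):
    """Return True if Python's bundled XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Made-up description text with a stray U+0002 at the end.
    sample = "<summary>Quarterly report\x02</summary>"
    print(parses_cleanly(sample))
    print(parses_cleanly("<summary>Quarterly report</summary>"))
```

A single control character anywhere in a seedlist page is therefore enough to make the whole page unreadable, which matches the crawl stopping partway through.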

I took the URL (which has been edited here), logged in with an administrative account and was given a PDF. I initially believed that the contents of the document must have caused the problem, so I uploaded the same document to a 4.5 CR4 server I run in the lab, but I couldn't reproduce the problem.
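
The last path segment of that media URL is the document name, percent-encoded character by character. Decoding it gives a plain string that is easier to search for in other files. This is a small sketch using only the Python standard library; the function name and the example URL are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def document_name_from_media_url(url):
    """Return the decoded last path segment of a CCM media URL."""
    path = urlsplit(url).path               # drops the ?follow=true query string
    last_segment = path.rsplit("/", 1)[-1]  # split before decoding, so an encoded slash stays in the name
    return unquote(last_segment)


if __name__ == "__main__":
    # Hypothetical URL in the same shape as the one in the trace.
    example = (
        "https://connections.example.com/dm/atom/library/LIB/document/DOC"
        "/media/%54%68%65%20%53%79%73%74%65%6d%73?follow=true"
    )
    print(document_name_from_media_url(example))
```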

I raised a PMR, and support came back and said that the problem was likely to be due to a special character in the description and not in the document itself.

I looked at trace.log and found a reference to the seedlist XML file that was being processed at the time.

[10/09/14 09:52:26:121 BST] 0000007a SeedlistPersi > com.ibm.connections.search.seedlist.crawler.impl.SeedlistPersistenceManager getSeedlistDirs ENTRY ecm_files
[10/09/14 09:52:26:121 BST] 0000007a SeedlistPersi < com.ibm.connections.search.seedlist.crawler.impl.SeedlistPersistenceManager getSeedlistDirs RETURN ecm_files, [c:\IBM\Connections\background\crawl\seedlists-ecm_files-initial-1410267828454]
[10/09/14 09:52:26:121 BST] 0000007a SeedlistPersi < com.ibm.connections.search.seedlist.crawler.impl.SeedlistPersistenceManager getSeedlistDir RETURN c:\IBM\Connections\background\crawl\seedlists-ecm_files-initial-1410267828454
[10/09/14 09:52:26:121 BST] 0000007a SeedlistFetch 3   seedlistFile = [c:\IBM\Connections\background\crawl\seedlists-ecm_files-initial-1410267828454\1410267828454-00007.xml]
[10/09/14 09:52:26:121 BST] 0000007a SeedlistFetch 2   Retrieving seedlist content: https://connections.acme.com/dm/atom/seedlist/myserver?useLocalFS=true&Start=3500&Action=GetDocuments&Format=xml&Range=500
[10/09/14 09:52:26:121 BST] 0000007a SeedlistFetch 3   Retrieving seedlist from file: 1410267828454-00007.xml
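
On a busy trace.log these lines are easy to miss. The sketch below pulls out every seedlist file named in a "seedlistFile = [...]" entry, assuming the trace lines keep the format shown above; the function name is hypothetical and nothing here is part of Connections.

```python
import re

# Matches the bracketed path in a "seedlistFile = [...]" trace entry.
SEEDLIST_FILE = re.compile(r"seedlistFile = \[(?P<path>[^\]]+)\]")


def seedlist_files(trace_lines):
    """Return seedlist file paths in order of first appearance, without duplicates."""
    seen = []
    for line in trace_lines:
        match = SEEDLIST_FILE.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen
```

Passing an open file works as well as a list of strings, for example `seedlist_files(open("trace.log", errors="replace"))`. The last path returned is the page that was being read when the crawl stopped.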

I opened the XML file in Notepad++, searched for the document name that I had obtained from the URL earlier, and found a match. In one of the fields I saw the following.

1
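
Searching by document name finds one offender at a time. A seedlist page can also be scanned for every character that XML 1.0 disallows, which reports all of them in one pass. This is a rough sketch, assuming the seedlist file is UTF-8 text; the function name is hypothetical, and the allowed ranges are taken from the XML 1.0 definition of a character (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF).

```python
import re

# Any character outside the ranges that XML 1.0 allows.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Yield (line, column, code point) for each invalid character, 1-based."""
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            yield line_no, match.start() + 1, "U+%04X" % ord(match.group())


def scan_file(path):
    """Scan one seedlist file and return a list of findings."""
    # newline="" keeps line endings as they are, so line numbers are split on "\n" only.
    with open(path, encoding="utf-8", errors="replace", newline="") as handle:
        return list(find_invalid_xml_chars(handle.read()))
```

Calling `scan_file` on the seedlist XML file gives line and column positions that can be compared with the position in the SAX error, although different tools may count lines and columns slightly differently.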

I gave the customer the community and library that the document resided in, and the customer couldn't view the description data in the web browser. The customer made some changes to the field via the FileNet interface, and once the special character was removed the data showed in the web browser.

To check whether the index would be created correctly after this change, I ran the background index again but wrote the files to a new location. If you run the command again against the same location as the initial background index, it will fail because the seedlist is not recreated and the original special character is retained.

To speed things up, copy the extracted files from the previous location to the new extracted directory. This customer had over ten thousand CCM documents, so extracting them all again was time-consuming.
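
The copy can be done by hand in Explorer, but for repeated runs a few lines of script save time. This is only a sketch: the function name is hypothetical, the directory layout is whatever was passed to startBackgroundIndex, and `dirs_exist_ok` needs Python 3.8 or later.

```python
import shutil
from pathlib import Path


def seed_extracted_dir(previous_extracted, new_extracted):
    """Copy already-extracted files into the new extracted directory.

    Returns the number of files present in the destination afterwards.
    """
    src = Path(previous_extracted)
    dst = Path(new_extracted)
    # dirs_exist_ok lets the copy proceed if the new directory already exists.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for item in dst.rglob("*") if item.is_file())
```

For example, `seed_extracted_dir("c:/IBM/Connections/background/extracted", "c:/IBM/Connections/background2/extracted")`, where the second path is a made-up name for the new location.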

I had to repeat this process four times until all the special characters were removed. Once I had an INDEX.READY file, I repeated the process for all the applications by copying over the extracted files and using SearchService.startBackgroundIndex("c:/IBM/Connections/background/crawl", "c:/IBM/Connections/background/extracted", "c:/IBM/Connections/background/index", "all_configured"), which built an index successfully.

I then used the steps in the IBM wiki to replace the current index with the new one.

It turns out that the customer had used a scripted import facility to import all the documents into CCM, and this process introduced these characters.




---------------------
http://collaborationben.com/2014/09/22/ccmfilenet-search-index-fails-in-ibm-connections-4-5-due-to-special-character/
Sep 22, 2014