Spell Check Lucene in AEM 5.6.1

I am trying to implement spell check for a content search application. The application relies heavily on JCR queries built with QueryBuilder, and its users pay for the content.

They need smart search features, including a "Did you mean" feature like Google's.

Adobe has invested a good amount of effort and money in creating and updating its knowledge documents, but the documentation is still lacking for many modules that are not core to AEM.

The Salesforce connector is one such module; I believe it has never worked with the Unlimited edition of Salesforce. It is not only me: a couple of my friends promised customers out-of-the-box Salesforce integration using the Salesforce template. It works with the free Salesforce edition but never with the Unlimited edition. Adobe support and Adobe forum users are clueless, and people end up writing custom Salesforce REST API or WSDL code to talk to AEM. That's one story.

Coming back to the out-of-the-box spell check in AEM 5.6.1: I wanted to give search users a "Did you mean" feature, so I enabled the spell checker module in workspace.xml.

<SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
    <param name="path" value="${wsp.home}/index"/>
    <param name="resultFetchSize" value="50"/>
    <param name="indexingConfiguration" value="${wsp.home}/indexing_config.xml"/>
    <param name="tikaConfigPath" value="${wsp.home}/tika-config.xml"/>
    <param name="supportHighlighting" value="false"/>
    <param name="spellCheckerClass" value="com.day.crx.core.query.spell.CRXSpellChecker$OneMinuteRefreshInterval"/>
</SearchIndex>

I ran the indexing, which took a couple of hours, and finally this directory was created: crx-quickstart\repository\workspaces\crx.default\index\spellchecker.
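Before writing any query code, it can be worth confirming that the spell checker index really exists on disk. The following is only a minimal sketch using the JDK: the class name SpellIndexCheck and the method hasSpellIndex are made up for this post, and the default path simply assumes AEM was started from the directory that contains crx-quickstart.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of AEM: checks for <workspace home>/index/spellchecker.
class SpellIndexCheck {

    // Returns true if the spellchecker directory exists and contains at least one entry.
    static boolean hasSpellIndex(Path workspaceHome) throws IOException {
        Path dir = workspaceHome.resolve("index").resolve("spellchecker");
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.findAny().isPresent();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default workspace location relative to the AEM install directory.
        Path home = Paths.get(args.length > 0
                ? args[0]
                : "crx-quickstart/repository/workspaces/crx.default");
        System.out.println("Spell checker index present: " + hasSpellIndex(home));
    }
}
```

An existing, non-empty directory only tells you the index was written; it says nothing about the quality of the suggestions.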

I wrote JCR query code like this:

String termNew = term; // fall back to the original term if nothing is suggested
try {
    final QueryManager manager = session.getWorkspace().getQueryManager();
    Query query = manager.createQuery("/jcr:root[rep:spellcheck('" + term + "')]/(rep:spellcheck())", Query.XPATH);
    RowIterator rows = query.execute().getRows();
    // the above query will always return the root node no matter what string we check
    Row r = rows.nextRow();
    // get the result of the spell checking
    Value v = r.getValue("rep:spellcheck()");
    if (v != null) {
        termNew = v.getString();
    }
} catch (Exception ex) {
    // pass the exception itself so the stack trace is logged
    log.error("error caught in getSpelledChecked", ex);
}
System.out.println(" Source >> " + term + " Suggestion>> " + termNew);
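One thing to watch in the query above: the search term is concatenated straight into the XPath string literal, so a term containing a single quote (for example "user's guide") would break the statement. Here is a small sketch of building the query string more defensively. The class and method names are hypothetical, and doubling the single quote is my assumption about how JCR XPath string literals are escaped, so please verify it against your repository version.

```java
// Hypothetical helper for building the rep:spellcheck() XPath statement.
class SpellcheckQuery {

    // Assumption: inside a JCR XPath string literal, a single quote is escaped by doubling it.
    static String escapeForXPathLiteral(String term) {
        return term.replace("'", "''");
    }

    // Builds the same statement as the code above, with the term escaped.
    static String build(String term) {
        return "/jcr:root[rep:spellcheck('" + escapeForXPathLiteral(term) + "')]/(rep:spellcheck())";
    }

    public static void main(String[] args) {
        System.out.println(build("learnng"));
        System.out.println(build("user's guide"));
    }
}
```

The resulting string can be passed to manager.createQuery(..., Query.XPATH) exactly as before.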

And here is the funny output. Some of the suggestions are good, but "learning" comes back as "earning" and "manger" comes back as "wagner" :). This dictionary is not usable at all.

[Screenshot: spell check suggestion output]
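A likely reason for suggestions like "earning" for "learning" is that spell checkers of this kind rank candidates by string distance. Lucene's own SpellChecker uses Levenshtein distance by default; I am assuming the CRX spell checker behaves similarly, which I have not confirmed. The sketch below is a plain-Java edit distance, written only to illustrate how close these words are to each other; it is not the code AEM runs.

```java
// Illustrative only: classic Levenshtein edit distance with two rolling rows.
class EditDistanceDemo {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println("learning -> earning: " + levenshtein("learning", "earning")); // 1
        System.out.println("manger -> wagner: " + levenshtein("manger", "wagner"));       // 3
    }
}
```

"earning" is a single deletion away from "learning", so if the indexed content makes "earning" look like the better term, a distance-based ranker will happily offer it.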

Here is the most painful area.

Custom Spell Check Solution: "Did You Mean" in AEM

Since the out-of-the-box spell check solution in AEM does not seem usable, I decided to use the Lucene spell check API directly rather than the CQ5 search API, which is a wrapper on top of Lucene.

Here are the details of the existing CQ5 search bundle.
Bundle-Name: Social UGC Search Collections. This bundle provides the out-of-the-box spell check capabilities, and lucene-core-3.6.1.jar is included in it as a supporting JAR.

Here is the custom code:

package com.xyz.util;

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Dictionary {
    public static void main(String[] args) throws Exception {
        // directory where the spell checker index is stored
        File dir = new File("B:/Projects/download/dic");
        Directory directory = FSDirectory.open(dir);
        SpellChecker spellChecker = new SpellChecker(directory);
        // build the index from a plain text dictionary, one entry per line
        spellChecker.indexDictionary(
                new PlainTextDictionary(new File("B:/Projects/dictionary/fulldictionary00.txt")),
                new IndexWriterConfig(Version.LUCENE_CURRENT, new StandardAnalyzer(Version.LUCENE_CURRENT)),
                false);
        String wordForSuggestions = "mv money";
        int suggestionsNumber = 1;
        String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber);
        if (suggestions != null && suggestions.length > 0) {
            for (String word : suggestions) {
                System.out.println("Did you mean:" + word);
            }
        } else {
            System.out.println("No suggestions found for word:" + wordForSuggestions);
        }
        spellChecker.close();
    }
}

Dictionary sample

marketing on demand
compliancemax
enhanced trading
learning center
move money
compliance max

When you deploy the above code in an AEM OSGi bundle, you will find that packages such as org.apache.lucene.analysis.standard are not resolved. To resolve that, you need to make the following changes in your Maven POM:
<plugin>
    <groupId>org.apache.felix</groupId>
    <artifactId>maven-bundle-plugin</artifactId>
    <extensions>true</extensions>
    <configuration>
        <instructions>
            <Bundle-Category>xyz</Bundle-Category>
            <Import-Package>
                *
            </Import-Package>
            <Export-Package>
                com.xyz.*,org.apache.lucene.*,org.tartarus.snowball.*
            </Export-Package>
        </instructions>
    </configuration>
</plugin>

So now if you search for "mv money", the suggestion comes back as "move money".
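To get a feel for why a one-entry-per-line phrase dictionary can map "mv money" to "move money", here is a rough plain-Java approximation. It is not Lucene's algorithm: I have swapped in a much simpler technique, character-bigram overlap scored with the Dice coefficient, and the class name, method names and the 0.3 threshold are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in for a "did you mean" lookup; not the Lucene SpellChecker.
class PhraseSuggest {

    // Set of lower-cased character bigrams of s, spaces included.
    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        String t = s.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 1 < t.length(); i++) {
            grams.add(t.substring(i, i + 2));
        }
        return grams;
    }

    // Dice coefficient over bigram sets: 2 * |common| / (|A| + |B|).
    static double dice(String a, String b) {
        Set<String> ga = bigrams(a);
        Set<String> gb = bigrams(b);
        int total = ga.size() + gb.size();
        if (total == 0) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / total;
    }

    // Returns the best-scoring entry above minScore, or null if none qualifies.
    static String suggest(String input, List<String> dictionary, double minScore) {
        String best = null;
        double bestScore = minScore;
        for (String entry : dictionary) {
            double score = dice(input, entry);
            if (score > bestScore) {
                bestScore = score;
                best = entry;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList(
                "marketing on demand", "compliancemax", "enhanced trading",
                "learning center", "move money", "compliance max");
        System.out.println("Did you mean: " + suggest("mv money", dictionary, 0.3));
    }
}
```

With the sample dictionary above this picks "move money", because it shares most of its bigrams with "mv money" while the other entries share at most one. For real use, stay with the Lucene SpellChecker shown earlier.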

Hope this helps.
