Category: AEM

Spell Check Lucene in AEM 5.6.1

I am trying to implement spell check for a content search application. The application relies heavily on JCR queries through QueryBuilder, and its users pay for the content.

They need smart search features, including a “did you mean this” feature just like Google.

Adobe has invested a good amount of effort and money in creating and updating its knowledge documents, but they are still lacking for many modules that are not core to AEM.

The Salesforce connector is one such module, which I believe has never worked with the Unlimited edition of Salesforce. It is not only me: I talked to a couple of friends who had promised customers out-of-the-box Salesforce integration using the Salesforce template. It works with the free Salesforce edition but never with the Unlimited edition. Adobe support and Adobe forum users are clueless, and people end up writing custom Salesforce REST API or WSDL code to talk to AEM. That’s one story.

Coming back to the out-of-the-box spell check in AEM 5.6.1: I wanted to give search users a “did you mean this” feature, so I enabled the spellchecker module in workspace.xml.

<SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
    <param name="path" value="${wsp.home}/index"/>
    <param name="resultFetchSize" value="50"/>
    <param name="indexingConfiguration" value="${wsp.home}/indexing_config.xml"/>
    <param name="tikaConfigPath" value="${wsp.home}/tika-config.xml"/>
    <param name="supportHighlighting" value="false"/>
    <param name="spellCheckerClass" value="com.day.crx.core.query.spell.CRXSpellChecker$OneMinuteRefreshInterval"/>
</SearchIndex>

I ran the indexing, which took a couple of hours, and finally this directory was created: crx-quickstart\repository\workspaces\crx.default\index\spellchecker.

I wrote JCR query code like this:

// imports needed: javax.jcr.Value, javax.jcr.query.Query, javax.jcr.query.QueryManager,
// javax.jcr.query.Row, javax.jcr.query.RowIterator
String termNew = term; // fall back to the original term when there is no suggestion
try {
    final QueryManager manager = session.getWorkspace().getQueryManager();
    Query query = manager.createQuery("/jcr:root[rep:spellcheck('" + term + "')]/(rep:spellcheck())", Query.XPATH);
    RowIterator rows = query.execute().getRows();
    // the above query will always return the root node no matter what string we check
    Row r = rows.nextRow();
    // get the result of the spell checking
    Value v = r.getValue("rep:spellcheck()");
    if (v != null) {
        termNew = v.getString();
    }
} catch (Exception ex) {
    // pass the exception itself so the stack trace ends up in the log
    log.error("error caught in getSpelledChecked", ex);
}
System.out.println(" Source >> " + term + " Suggestion>> " + termNew);

And here is the funny output. Some of the suggestions are good, but “learning” comes back as “earning” and “manger” comes back as “wagner” :). This dictionary is not usable at all.

[Image: spell check suggestions]
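A side note on the query string used above: the search term is concatenated straight into the XPath statement, so a term containing an apostrophe would break the query. Here is a minimal sketch of a helper that builds the statement, assuming that a single quote inside a single-quoted XPath string literal is escaped by doubling it. The class and method names are made up for illustration and are not part of any AEM API.

```java
public class SpellcheckQuery {

    /**
     * Builds the rep:spellcheck XPath statement for a user-supplied term.
     * Assumption: an apostrophe inside a single-quoted XPath literal is
     * escaped by doubling it.
     */
    public static String build(String term) {
        String escaped = term.replace("'", "''");
        return "/jcr:root[rep:spellcheck('" + escaped + "')]/(rep:spellcheck())";
    }

    public static void main(String[] args) {
        System.out.println(build("manger"));
        System.out.println(build("user's guide"));
    }
}
```

The returned string can be passed to manager.createQuery(..., Query.XPATH) in place of the hand-concatenated one.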

Here is the most painful area.

Custom Spell Check Solution: “Did You Mean” in AEM

Since the out-of-the-box spell check in AEM does not seem usable, I decided to use the Lucene spell check API directly rather than the CQ5 search API, which is a wrapper on top of Lucene.

Here are the details of the existing CQ5 search bundle.
Bundle-Name: Social UGC Search Collections. This is the bundle that provides the out-of-the-box spell check capabilities, and lucene-core-3.6.1.jar is included in it as a supporting jar.

Here is the Custom Code

package com.xyz.util;

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Dictionary {
    public static void main(String[] args) throws Exception {
        // directory where the spell check index is written
        File dir = new File("B:/Projects/download/dic");
        Directory directory = FSDirectory.open(dir);
        SpellChecker spellChecker = new SpellChecker(directory);
        // index the plain-text dictionary file (one entry per line)
        spellChecker.indexDictionary(
                new PlainTextDictionary(new File("B:/Projects/dictionary/fulldictionary00.txt")),
                new IndexWriterConfig(Version.LUCENE_CURRENT, new StandardAnalyzer(Version.LUCENE_CURRENT)),
                false);
        String wordForSuggestions = "mv money";
        int suggestionsNumber = 1;
        String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber);
        if (suggestions != null && suggestions.length > 0) {
            for (String word : suggestions) {
                System.out.println("Did you mean: " + word);
            }
        } else {
            System.out.println("No suggestions found for word: " + wordForSuggestions);
        }
        // release the index resources held by the spell checker
        spellChecker.close();
    }
}

 

Dictionary sample

marketing on demand
compliancemax
enhanced trading
learning center
move money
compliance max

When you deploy the above code in an AEM OSGi bundle, you will find that packages such as org.apache.lucene.analysis.standard are not resolved. To resolve that, you need to make the following changes to the maven-bundle-plugin configuration in your Maven build:
<plugin>
    <groupId>org.apache.felix</groupId>
    <artifactId>maven-bundle-plugin</artifactId>
    <extensions>true</extensions>
    <configuration>
        <instructions>
            <Bundle-Category>xyz</Bundle-Category>
            <Import-Package>
                *
            </Import-Package>
            <Export-Package>
                com.xyz.*,org.apache.lucene.*,org.tartarus.snowball.*
            </Export-Package>
        </instructions>
    </configuration>
</plugin>

So now if you search for “mv money”, the suggestion comes back as “move money”.
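To get a feel for why “mv money” maps to “move money”, here is a plain-Java sketch that ranks the sample dictionary entries by Levenshtein edit distance. This is not Lucene’s implementation (as far as I know, SpellChecker combines an n-gram index with a string distance measure); it is only an illustration of the distance idea, and the class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;

public class EditDistanceSuggester {

    /** Classic dynamic-programming Levenshtein distance between two strings. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the dictionary entry with the smallest distance to the input. */
    public static String suggest(String input, List<String> dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String entry : dictionary) {
            int distance = levenshtein(input, entry);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = entry;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList(
                "marketing on demand", "compliancemax", "enhanced trading",
                "learning center", "move money", "compliance max");
        System.out.println("Did you mean: " + suggest("mv money", dictionary));
    }
}
```

“mv money” is only two insertions away from “move money”, while every other entry in the sample dictionary is much further away, so “move money” wins.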

Hope this helps.

 

 

 


Mid-Level Technology Solution Company Challenge: Embracing an Enterprise Application Framework

I worked at a start-up open source technology company for 7 years, and it was an amazing journey building platforms using Apache ServiceMix, Liferay Portal, JBoss middleware suites, Alfresco, and many others as integrated solutions to meet business needs for large banking and social care customers in the African, European, and Indian markets.

I recently had three days of in-house TIBCO training covering TIBCO BusinessWorks, EMS, Designer, ActiveSpaces, Spotfire, and other tools related to deployment, continuous integration, and logging frameworks. The team did a great job of delivering the basic building blocks and architectural nuts and bolts required to develop web services and RESTful services and to query databases using TIBCO Designer. There is no custom code; it is all ready-made, TIBCO-provided palettes and hops. I simply said wow, because I come from an ETL background, where ready-made hops and tools save developers time and reduce the risk of long, buggy code for file reading, transformation, database querying, and more.

Here is a sample service design in TIBCO Designer that reads some data from a database (no Java/.NET classes, no custom code):

TIBCO Designer

As a software solution architect, I am convinced that developing a service is much easier with TIBCO than writing custom Java or .NET code along with JSON and other transformation code. It helps reduce buggy code and gives a faster turnaround time for a solution. It can’t be denied that the initial development time will be higher for those who are new to this.

NoSQL/Data Grid (ActiveSpaces) in TIBCO

I have always built robust, scalable, low-cost solutions based on Liferay, Alfresco on JBoss, and more, with no software license costs but supporting thousands of concurrent public users. Map/reduce, Lucene, indexing, and Solr are basic ingredients of the solutions we developed. I am a keen follower of scalable platforms such as Amazon, Netflix, eBay, Priceline, etc.

ActiveSpaces is a new addition to the TIBCO suite to support a high number of concurrent users, competing with NoSQL, data grid, and cache technologies. TIBCO claims it runs on commodity servers, but in the demo I found they were running two nodes with 512 GB of memory, and I could not understand why they had designed the solution that way. Later I realized it is an in-memory, non-persistent solution where data resides in RAM, not on disk.

Challenge in Adopting a Non-Relational Approach?

I got a chance to speak with a few architects who have been developing a financial platform on Microsoft .NET for more than a decade, and in leadership meetings I very often hear about reducing downtime, upgrades, etc. I always ask myself why we can’t move to a new-generation database, and here is the reply:

“These new technologies and frameworks are a bad fit for our solution. We are very happy with our existing solution, and we can scale only with it. New solutions are very slow and have no documentation, so they are not acceptable, blah blah blah.”

Fitting Everything together?

One of the most important underlying philosophies is that one size does not fit all. For many years, traditional database servers have been used to store all types of data, regardless of whether every data type fits the relational data model. The general perception is that only media, analytics, and social media applications are suitable for NoSQL databases and that structured relational data is not, but that’s not correct.

The following can be implemented in existing software to scale it and reduce downtime.

  1. Memcached is used by both Amazon and Netflix for frequently requested content/data. The same can be achieved with ActiveSpaces to reduce load on the database and respond faster to user requests.
  2. Let us keep highly secure data in the RDBMS, but set it up active/active to reduce downtime and scale it.
  3. Move other kinds of information, for example user profiles, into a distributed NoSQL database and redesign the services.
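The first point is essentially the cache-aside pattern. Here is a minimal sketch of the idea in plain Java, using an in-process map as a stand-in for Memcached or a data grid. The class name and the loader function are hypothetical; a real setup would use the cache product’s own client and an expiry policy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CacheAside<K, V> {
    // stand-in for an external cache such as Memcached
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    // stand-in for the expensive database lookup
    private final Function<K, V> loader;

    public CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it from the backing store on a miss. */
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Drops a cached entry, e.g. after the underlying record was updated. */
    public void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger databaseHits = new AtomicInteger();
        CacheAside<String, String> profiles = new CacheAside<>(id -> {
            databaseHits.incrementAndGet(); // pretend this is a slow query
            return "profile-of-" + id;
        });
        profiles.get("user42");
        profiles.get("user42"); // served from the cache
        System.out.println("database hits: " + databaseHits.get());
    }
}
```

Only the first request for a key reaches the database; repeated requests are answered from memory until the entry is invalidated.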

 

Getting Started with AEM/CQ5 e-Commerce and Hybris

It took me a couple of weeks to get Adobe CQ5 and Hybris running together on my local desktop. The Adobe wiki is very useful, but I think it has not been updated for some time. Hybris is an eCommerce platform, now owned by SAP, and getting a license is not easy: you have to be either a customer or a partner. Thanks to Adobe, they have created an OSGi package of the Hybris server and shared it on Package Share, along with Geometrixx Outdoors eCommerce-related product content.

The Day wiki page http://dev.day.com/docs/en/cq/current/ecommerce/eCommerce-hybris.html is useful but needs to be updated with new information.

This is a really cool resource if you want to get started with the AEM eCommerce Integration Framework:
https://seminars.adobeconnect.com/_a227210/p85ixyvw3zp/?launcher=false&fcsContent=true&pbMode=normal

Brief synopsis to get started.

If you have a running CQ5 instance, e.g. 5.6.1, create Adobe CQ Package Share credentials, log in, and download the Hybris 500 MB+ package, which has a built-in Hybris server, and also the Hybris content package, which has the Hybris importer and certain content.

Install both packages in your local Package Manager. You can then access the Hybris console at http://localhost:9001/hmc/hybris and your CQ instance on port 4502.

Image

By default, the outdoor catalog is available in Hybris. You can manually synchronize that content from http://localhost:4502/etc/importers/hybris.html, which will recreate the product catalog under /etc/commerce/products. Remember that unless you are a customer or partner of Hybris, you will not get any resources or documentation about Hybris for functional knowledge.

Generate MVN Project

mvn -Padobe-public archetype:generate     -DarchetypeRepository=http://repo.adobe.com/nexus/content/groups/public/     -DarchetypeGroupId=com.day.jcr.vault     -DarchetypeArtifactId=multimodule-content-package-archetype     -DarchetypeVersion=1.0.2

Provide all the other information, i.e. artifact ID, name, group, and folders, and answer yes at the end; your Maven project structure will then be created.

Import Project in Eclipse

Download the code from GitHub at https://github.com/paolomoz/cq-commerce-impl-sample; in this case you don’t need to generate the archetype or create the Maven project. Build it and deploy it to your local CQ. Change the eCommerce provider in the JCR for Geometrixx Outdoors and your code will work.

 

 

JCR Content Model: Some Key Rules

Data Store

The data store holds large binaries. On write, these are streamed directly to the data store, and only an identifier referencing the binary is written to the Persistence Manager (PM) store. By providing this level of indirection, the data store ensures that large binaries are stored only once, even if they appear in multiple locations within the content in the PM store. In effect, the data store is an implementation detail of the PM store. Like the PM, the data store can be configured to store its data in a file system (the default) or in a database. The default minimum object length is 100 bytes; smaller objects are stored inline (not in the data store). The maximum value for this setting is 32000, because Java does not support strings longer than 64 KB in writeUTF.
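The “stored only once” behaviour can be pictured as a content-addressed store: the identifier is derived from the bytes themselves, so identical binaries end up with the same identifier. Below is a minimal in-memory sketch of that idea. The class name is made up and the choice of SHA-1 is an assumption for illustration, not a statement about how CRX computes its identifiers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class TinyDataStore {
    // identifier -> binary content; a real data store would use files or a database
    private final Map<String, byte[]> records = new HashMap<>();

    /** Stores the bytes once and returns an identifier derived from their content. */
    public String put(byte[] binary) {
        String id = hexDigest(binary);
        records.putIfAbsent(id, binary.clone());
        return id;
    }

    public byte[] get(String id) {
        return records.get(id);
    }

    public int size() {
        return records.size();
    }

    private static String hexDigest(byte[] data) {
        try {
            // SHA-1 is an assumption made for this sketch
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        TinyDataStore store = new TinyDataStore();
        byte[] brochure = "pretend this is a large PDF".getBytes(StandardCharsets.UTF_8);
        String first = store.put(brochure);          // referenced from one node
        String second = store.put(brochure.clone()); // same binary under another node
        System.out.println("same identifier: " + first.equals(second));
        System.out.println("records stored: " + store.size());
    }
}
```

The two nodes would each keep only the identifier, while the bytes themselves live in the store a single time.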

Cluster Journal

Whenever CRX writes data it first records the intended change in the journal. Maintaining the journal helps ensure data consistency and helps the system to recover quickly from crashes. As with the PM and data stores, the journal can be stored in a file system (the default) or in a database.
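The journal idea (record the intended change first, then apply it) can be sketched in a few lines of plain Java. This is only an illustration of the principle with a made-up class and an in-memory list standing in for the journal; it says nothing about CRX’s actual journal format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JournaledStore {
    // append-only record of intended changes (path, value)
    private final List<String[]> journal = new ArrayList<>();
    // the "real" data that the changes are applied to
    private final Map<String, String> data = new HashMap<>();

    public void write(String path, String value) {
        journal.add(new String[] {path, value}); // 1. record the intended change
        data.put(path, value);                   // 2. apply it
    }

    public String read(String path) {
        return data.get(path);
    }

    /** Rebuilds the state from the journal alone, as one would after a crash. */
    public Map<String, String> replay() {
        Map<String, String> rebuilt = new HashMap<>();
        for (String[] entry : journal) {
            rebuilt.put(entry[0], entry[1]);
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        JournaledStore store = new JournaledStore();
        store.write("/content/page/title", "Draft");
        store.write("/content/page/title", "Final");
        System.out.println(store.replay());
    }
}
```

Because every change is in the journal before it is applied, replaying the entries in order reproduces the latest state, which is also how another cluster node could catch up.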

Persistence Manager
Each workspace in the repository can be separately configured to store its data through a specific persistence manager (the class that manages the reading and writing of the data). Similarly, the repository-wide version store can also be independently configured to use a particular persistence manager. A number of different persistence managers are available, capable of storing data in a variety of file formats or relational databases.

Query Index

CRX’s inverted index is based on Apache Lucene. Key characteristics:

  • Most index updates are synchronous.
  • Long full-text extraction tasks are handled in the background.
  • Other cluster nodes update their indexes at the next cluster sync.
  • Everything is indexed by default.
  • You can tweak the indexing configuration for improvements in indexing functionality, performance, and disk usage.
  • There is one index per workspace (and one for the shared version store).
  • Indexes are not shared across a cluster; they are local to each cluster node.
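For readers who have not met the term, an inverted index maps each term to the set of nodes that contain it. Here is a toy sketch in plain Java, with a made-up class name and naive tokenization; Lucene’s real index is far more sophisticated.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TinyInvertedIndex {
    // term -> paths of the nodes whose text contains that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits the text into lowercase tokens and records the path under each one. */
    public void index(String path, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(path);
            }
        }
    }

    /** Returns the paths of all nodes containing the term (empty set if none). */
    public Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.index("/content/a", "Move money between accounts");
        index.index("/content/b", "Learning center: money basics");
        System.out.println(index.search("money"));
        System.out.println(index.search("learning"));
    }
}
```

A lookup is a single map access per term, which is why searching the index is fast even when the repository is large.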

Jackrabbit
The Apache Jackrabbit™ content repository is a fully conforming implementation of the Content Repository for Java Technology API (JCR), specified in JSR 170 and JSR 283. Note that the next release of the JCR specification is JSR 333, which is currently in progress.

Sending email Using Apache common and CQ5 mail API — Adobe CQ5

Apache Commons Email comes out of the box in the Adobe CQ5 platform as an OSGi bundle, and CQ has its own API on top of it that can be used to send all kinds of email:

  • Plain text
  • HTML Formatted
  • and any kind of attachment.

For code snippets to send email, please refer to the Commons Email user guide:

URL http://commons.apache.org/proper/commons-email/userguide.html

When you send email as plain text, you will not see much difference in formatting across different mail clients, but when it is HTML formatted you will find that it varies across Outlook, Gmail, Yahoo, and even smartphones.

Retest it and correct your HTML, CSS classes, etc. to ensure it looks uniform across all kinds of mail clients. Also, if your mail content has JavaScript or long hash-based query string data, these are not going to work in email. Remember, a mailbox is not a browser; it is an application, and mail servers apply their own set of security rules.

One of the primary reasons I wrote this blog post is that Outlook consistently showed an attachment named ATT00001.htm when we sent HTML-formatted email content, but the same attachment did not appear in Gmail or Yahoo. Refer to the image attached below.

Image

I referred to multiple posts and forums, and everybody has their own understanding and interpretation, but none of them worked for us.
A few URLs worth sharing here:

http://stackoverflow.com/questions/1821464/attachmentshtml-generated-with-commons-email-doesnt-show-in-some-email-clients

https://discussions.apple.com/message/10966873#10966873

What’s the Solution:

It’s a code issue and nothing else.

  • The Email object was not created in the right place.
  • Neither was the Session object.
  • The same goes for the email recipients.

Once the code was finally reviewed and rearranged, that attachment stopped appearing in Outlook.