Data Store
The data store holds large binaries. On write, these are streamed directly to the data store, and only an identifier referencing the binary is written to the Persistence Manager (PM) store. This level of indirection ensures that a large binary is stored only once, even if it appears in multiple locations within the content in the PM store. In effect, the data store is an implementation detail of the PM store. Like the PM, the data store can be configured to store its data in a file system (the default) or in a database. The default minimum object length is 100 bytes; smaller objects are stored inline (not in the data store). The maximum value for this setting is 32000 because Java does not support strings longer than 64 KB in writeUTF.
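The indirection described above can be illustrated with a minimal sketch. It is not the CRX implementation: the class name DataStoreSketch, the in-memory map, the "inline:" and "ref:" prefixes, and the choice of SHA-256 as the identifier are all assumptions made for illustration. It only shows how deriving the identifier from the content lets identical binaries share one stored record, and how values below the minimum length bypass the store.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the CRX or Jackrabbit data store.
final class DataStoreSketch {
    // Assumed threshold, mirroring the 100-byte default mentioned in the text.
    static final int MIN_RECORD_LENGTH = 100;

    // Identifier -> binary content; stands in for the file system or database.
    private final Map<String, byte[]> records = new HashMap<>();

    /** Returns what the PM store would keep: the inline value or a reference. */
    String write(byte[] content) {
        if (content.length < MIN_RECORD_LENGTH) {
            // Small values stay inline and never reach the data store.
            return "inline:" + new String(content, StandardCharsets.ISO_8859_1);
        }
        String id = digest(content);
        // Identical content yields the same identifier, so it is stored once.
        records.putIfAbsent(id, content.clone());
        return "ref:" + id;
    }

    int recordCount() {
        return records.size();
    }

    // Assumption: a SHA-256 hex digest serves as the content identifier.
    private static String digest(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, writing the same 500-byte array twice returns the same reference and leaves a single record in the store, while a 4-byte value is returned inline.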
Cluster Journal
Whenever CRX writes data, it first records the intended change in the journal. Maintaining the journal helps ensure data consistency and helps the system recover quickly from crashes. As with the PM and data stores, the journal can be stored in a file system (the default) or in a database.
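The record-first, apply-second ordering described above can be sketched as follows. This is a simplified illustration under stated assumptions, not CRX's journal format: the class JournalSketch, its in-memory list and map, and the crash() and recover() methods are hypothetical names introduced only to show why a journal written before the change makes recovery possible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the CRX journal implementation.
final class JournalSketch {
    // Append-only record of intended changes as {path, value} pairs.
    private final List<String[]> journal = new ArrayList<>();

    // Stands in for the persisted repository state.
    private final Map<String, String> store = new HashMap<>();

    void write(String path, String value) {
        journal.add(new String[] {path, value}); // 1. record the intended change
        store.put(path, value);                  // 2. then apply it
    }

    // Simulates a crash that loses applied state but leaves the journal intact.
    void crash() {
        store.clear();
    }

    // Replays the journal in order to rebuild a consistent state.
    void recover() {
        for (String[] entry : journal) {
            store.put(entry[0], entry[1]);
        }
    }

    String read(String path) {
        return store.get(path);
    }
}
```

Because every change reaches the journal before the store, replaying the journal after crash() restores each path to its last written value.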
Persistence Manager
Each workspace in the repository can be separately configured to store its data through a specific persistence manager (the class that manages the reading and writing of the data). Similarly, the repository-wide version store can also be independently configured to use a particular persistence manager. A number of different persistence managers are available, capable of storing data in a variety of file formats or relational databases.
Query Index
CRX’s inverted index is based on Apache Lucene. It has the following characteristics:
Most index updates are synchronous.
Long full-text extraction tasks are handled in the background.
Other cluster nodes update their indexes at the next cluster sync.
Everything is indexed by default.
You can tweak the indexing configuration to improve indexing functionality, performance, and disk usage.
There is one index per workspace (and one for the shared version store).
Indexes are not shared across a cluster; they are local to each cluster node.
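For readers unfamiliar with the term, an inverted index maps each term to the set of items that contain it, so a lookup by term does not need to scan every item. The sketch below illustrates the idea only. It does not use Lucene and does not reflect CRX's index layout; the class InvertedIndexSketch, its methods, and the simple tokenization on non-word characters are assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not Lucene and not the CRX query index.
final class InvertedIndexSketch {
    // Term -> paths of the nodes whose text contains that term.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits the text into lowercase terms and records the node under each one.
    void index(String nodePath, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(nodePath);
            }
        }
    }

    // Returns the paths indexed under the term, or an empty set if none.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

After indexing two nodes whose text both contain the word "binary", a search for that term returns both paths, regardless of the case used in the query.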
Jackrabbit
The Apache Jackrabbit™ content repository is a fully conforming implementation of the Content Repository for Java Technology API (JCR), specified in JSR 170 and JSR 283. The next release of the JCR specification is JSR 333, which is currently in progress.