Solr Schema and API

Solr is the default search engine of XWiki Standard and is based on Apache Solr. It supports full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document handling, and geo-spatial search. The plugin architecture allows different types of analyzers to be set up using XML configuration files.

Solr Schema

According to the XWiki Data Model, the supported entities are:

  • wiki
  • space
  • page
  • class 
  • object
  • object property
  • attachment

Entity types share some fields and also have type-specific fields.

Fields Shared by All Indexed Entities

The wiki, space and name information is shared because each indexed entity is either a document or held by a document. According to the Solr schema, these fields are:

Name | Description
id | A keyword field that holds a unique string which identifies a document across the index. This field is used for finding old versions of a document to be indexed.
type | The type of entity to be indexed: DOCUMENT, ATTACHMENT, OBJECT, OBJECT_PROPERTY.
wiki | A keyword field holding the name of the virtual wiki a document belongs to.
space | The name of the space the document belongs to. This field is analyzed and used for free text search. This field is deprecated since version 7.2. You should use the spaces multi-valued field instead.
spaces | The space names. This field is analyzed and thus mostly used for free text search. E.g. for a document A.B.C.Page the value is ['A', 'B', 'C'].
space_exact | The unanalyzed and not stored version of the document's space. We index the local space reference (e.g. A.B\.1.C) verbatim for exact matching.
space_facet | Dedicated field for hierarchical faceting on nested pages, used to implement a facet.prefix-based drill down. E.g. for a document A.B.C.Page this field will hold ['0/A.', '1/A.B.', '2/A.B.C.'].
space_prefix | A field used to match descendant documents. For instance, a query such as space_prefix:A.B will match the documents from space A.B and all its descendants like A.B.C. This is possible because this field holds the local references of all the ancestor spaces of a document, i.e. all the prefixes of the space reference. E.g. for a document A.B.C.Page this field will hold ['A', 'A.B', 'A.B.C']. As a consequence, searching for space_prefix:A.B will match A.B.C.Page. We don't use the PathHierarchyTokenizer because it doesn't support specifying an escaping character. Instead, we compute the values ourselves at index time as a workaround.
name | The name of the document. This field is analyzed and mostly used for free text search.
name_exact | The unanalyzed and not stored version of the document's name.
locale | The real/calculated locale of the document, i.e. the default locale in the default document entry case.
locales | The list of Locales covered by the entity. The list is dynamically determined from the list of enabled Locales and the various Locales of the associated wiki document.
language | The language of the document.
hidden | A document hidden flag. Only documents can be made hidden explicitly because attachments, objects and object properties are automatically hidden if the document that holds them is hidden.
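
The space_prefix and space_facet values described above can be reproduced with a small sketch. This is not XWiki's indexing code: the function names are hypothetical, and the sketch assumes that a backslash escapes the following character in a local space reference (as in the A.B\.1.C example).

```python
def split_space_reference(reference: str) -> list[str]:
    """Split a local space reference on unescaped dots.

    Assumption: a backslash escapes the next character, so 'B\\.1'
    stays a single segment. Escapes are kept verbatim in the output.
    """
    segments, current, i = [], "", 0
    while i < len(reference):
        if reference[i] == "\\" and i + 1 < len(reference):
            current += reference[i:i + 2]  # keep the escape sequence intact
            i += 2
        elif reference[i] == ".":
            segments.append(current)  # unescaped dot ends the segment
            current = ""
            i += 1
        else:
            current += reference[i]
            i += 1
    segments.append(current)
    return segments


def space_prefix_values(reference: str) -> list[str]:
    """All ancestor prefixes of the space reference, shortest first."""
    segments = split_space_reference(reference)
    return [".".join(segments[:depth + 1]) for depth in range(len(segments))]


def space_facet_values(reference: str) -> list[str]:
    """Depth-tagged prefixes in the '<depth>/<prefix>.' form used by space_facet."""
    prefixes = space_prefix_values(reference)
    return [f"{depth}/{prefix}." for depth, prefix in enumerate(prefixes)]


print(space_prefix_values("A.B.C"))
print(space_facet_values("A.B.C"))
print(space_prefix_values(r"A.B\.1.C"))
```

For the space A.B.C the first two calls produce the same lists as the table rows for space_prefix and space_facet. The third call shows why a plain split on dots would not be enough when a space name itself contains an escaped dot.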

Document Static Fields

Name | Description
fullname | The document full name: SpaceName.PageName.
title | A multilingual and virtual field representing the document title.
title_ | The localized title which is indexed based on the document locale. E.g. title_en.
title_sort | The dedicated field for sorting, which is necessary because analyzed fields cannot be used for sorting.
doccontent_ | The rendered document content, i.e. with no transformations executed (e.g. doccontent_pt_BR). This allows using a different boost value for document content than for the object content (objcontent) and the attachment content (attcontent).
doccontentraw_ |
version | The document version, which needs to be indexed in order to be able to detect whether the index is in sync with the database.
comment_ | The version summary. E.g. comment_en.
doclocale | The technical locale of the document.
author | The last author, stored verbatim. This field is used for faceting.
author_display | The analyzed last author. This field is used for free text search.
author_display_sort | The dedicated field used for sorting.
creator | The document creator, stored verbatim (unanalyzed) for faceting.
creator_display | The analyzed document creator, used for free text search.
date |
creationdate |

In order to be able to mix the single entity approach with the multiple entities one and to avoid joins, we have to duplicate information. This means that for each entity we have to index and duplicate information about other related entities.

Object Data

Name | Description
class | The type of objects stored by the document. E.g. XWiki.TagClass. You can also use object in search queries as an alias to class.
objcontent_ | A collection of values from all the properties of all the objects found on the indexed document.
property.spaceName.className.propertyName_sort* | A dedicated field for sorting on property values, which is needed because Solr doesn't support sorting on multivalued fields. E.g. property.Blog.BlogPostClass.publishDate_sortDate.
object.spaceName.className_ | A dynamic multivalued field indexing the entire content of the objects of the specified type. All values are indexed as localized text, using the document locale. E.g. object.Blog.BlogPostClass_fr.
property.spaceName.className.propertyName_ | A dynamic multivalued field indexing the value of the specified property. For static lists, both the raw value which is saved in the database and the display value are indexed. Property values are indexed based on their type. E.g.:
  • property.Blog.BlogPostClass.published_boolean
  • property.Blog.BlogPostClass.publishDate_date
  • property.Blog.BlogPostClass.category_string
  • property.Blog.BlogPostClass.summary_en
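
The naming convention for these dynamic property fields can be sketched as plain string building. The helper names below are hypothetical. The sort suffix rule ("sort" followed by the capitalized type name) is an assumption inferred from the two examples in this document, publishDate_sortDate and staticList1_sortString.

```python
def property_field(class_name: str, property_name: str, suffix: str) -> str:
    """Field name for a property value.

    `suffix` is either a data type (e.g. 'date') or a locale (e.g. 'en').
    """
    return f"property.{class_name}.{property_name}_{suffix}"


def property_sort_field(class_name: str, property_name: str, type_name: str) -> str:
    """Field name for sorting on a property value.

    Assumption: the suffix is 'sort' plus the capitalized type name,
    as in the '_sortDate' and '_sortString' examples.
    """
    capitalized = type_name[:1].upper() + type_name[1:]
    return f"property.{class_name}.{property_name}_sort{capitalized}"


print(property_field("Blog.BlogPostClass", "published", "boolean"))
print(property_field("Blog.BlogPostClass", "summary", "en"))
print(property_sort_field("Blog.BlogPostClass", "publishDate", "date"))
```

Building the names in one place like this keeps query code from scattering hand-typed field names, which are easy to get wrong because the suffix depends on the property type.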

Attachment Data

Name | Description
filename | The names of the files attached to the document. E.g. ['file.pdf', 'image.jpg'].
mimetype | A list of attachment media types. E.g. ['text/plain', 'image/png'].
attauthor | The absolute references of the users that uploaded the last version of each of the document attachments. This field is used for faceting. E.g. ['xwiki:XWiki.Admin', 'projects:XWiki.JaneDoe'].
attauthor_display | The real user names of the users that uploaded the last version of each of the document attachments. This field is used for free text search. E.g. ['Admin', 'Jane Doe'].
attdate | The dates when the last version of each attachment was uploaded.
attcontent_ | The content of each attachment, indexed based on the document locale. E.g. attcontent_en : ['content of first attachment', 'content of second attachment'].
attsize | The size of each attachment in bytes.

Attachment Static Fields

Name | Description
filename | The attachment file name. E.g. ['file.pdf'].
filename_sort | The attachment file name used for sorting.
mimetype | The attachment media type.
attversion | The attachment revision, which is used to tell whether the Solr index is in sync with the database.
attauthor | The absolute reference of the user that uploaded the last version of the attachment. This field is used for faceting. E.g. ['xwiki:XWiki.Admin', 'projects:XWiki.JaneDoe'].
attauthor_display | The real user name of the user that uploaded the last version of the attachment. This field is used for free text search. E.g. ['Admin', 'Jane Doe'].
attdate | The date when the last version of the attachment was uploaded.
attdate_sort | A dedicated field for sorting, which is needed because attdate is multivalued and Solr doesn't support sorting on multivalued fields.
attsize | The size of the attachment in bytes.
attsize_sort | A dedicated field for sorting, needed because attsize is multivalued.

Object and ObjectProperty Static Fields

Name | Description
class | The object type.
number | The object number, which identifies an object when there are multiple objects of the same type on a document.
objcontent_ | A collection of the values from all the properties of the indexed object. The format used is "propertyName : propertyValue". This field is analyzed based on the document locale. E.g. objcontent_ro.
property.propertyName_ | A dynamic multivalued field indexing the value of the specified property. For static lists, both the raw value which is saved in the database and the display value are indexed. Property values are indexed based on their type. E.g.:
  • property.published_boolean
  • property.publishDate_date
  • property.category_string
  • property.summary_en

Solr Search Query API

The Solr search query API is exposed using the Query Module API. 

Common Query Parameters

q

The q parameter is the main query for the request. Example: q=foo __INPUT__* bar.

fq

The fq parameter stands for Filter Query. It specifies a query that restricts the superset of returned documents without influencing the score.

Queries specified with fq are cached independently from the main query. Also, the fq parameter can be specified multiple times and documents will only be included in the result if they are in the intersection of the document sets resulting from each fq.

Example: 

fq=type:ATTACHMENT
fq=wiki:xwiki
fq=space_exact:Main
fq=class:FAQCode.FAQClass
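
When these parameters are sent as a URL query string, fq is simply repeated once per filter. The sketch below uses Python's standard urllib to show this. The function name and the query text "foo" are illustrative assumptions, and no particular endpoint is implied.

```python
from urllib.parse import parse_qs, urlencode


def build_query_string(main_query: str, filters: list[str]) -> str:
    """Encode q plus one fq parameter per filter query."""
    # doseq=True makes urlencode emit a separate 'fq=' pair for each list item
    return urlencode({"q": main_query, "fq": filters}, doseq=True)


filters = [
    "type:ATTACHMENT",
    "wiki:xwiki",
    "space_exact:Main",
    "class:FAQCode.FAQClass",
]
query_string = build_query_string("foo", filters)
print(query_string)

# Decoding the string gives back every filter, in order.
print(parse_qs(query_string)["fq"])
```

Because each fq is cached separately from the main query, keeping stable restrictions such as the wiki or the entity type in their own fq, rather than folding them into q, lets them be reused across different searches.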

fl

The fl parameter (Field List) is used to specify a set of fields to return, limiting the amount of information in the response. The set of fields to be returned can be specified as a space or comma separated list of field names.

The string score can also be used to indicate that the score of each document for the particular query should be returned as a field. 

The string * can be used to indicate all stored fields the document has. 

Example: fl=creator, creationdate, author, date, mimetype, attauthor, attdate, attsize

qf

The qf parameter stands for Query Fields. It specifies which fields to search and how much weight each one carries: a field followed by ^N has its matches boosted by a factor of N.

Example: qf=title^3 property.FAQCode.FAQClass.answer

sort

Sorting can be done:

  • by score
  • on any multiValued="false" indexed="true" field, provided the field is either non-tokenized or uses an Analyzer that only produces a single term
  • by index id using sort=_docid_ asc or sort=_docid_ desc.

A sort ordering must include:

  • a field name or the pseudo-field score followed by 
  • a white-space escaped as either + or %20 in URL strings followed by 
  • a sort direction: asc or desc

Multiple sort orderings can be separated by a comma: sort=<field name>+<direction>[,<field name>+<direction>]
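
The sort syntax above can be assembled and URL-escaped as follows. This is a minimal sketch with hypothetical helper names. The field title_sort comes from the document fields listed earlier, and the comma is kept unescaped only to mirror the pattern shown above.

```python
from urllib.parse import quote_plus


def build_sort(orderings: list[tuple[str, str]]) -> str:
    """Join (field, direction) pairs into a Solr sort value."""
    clauses = []
    for field, direction in orderings:
        if direction not in ("asc", "desc"):
            raise ValueError(f"invalid sort direction: {direction!r}")
        clauses.append(f"{field} {direction}")
    return ",".join(clauses)


def encode_sort(sort_value: str) -> str:
    """Escape the sort value for a URL: spaces become '+', commas are kept."""
    return "sort=" + quote_plus(sort_value, safe=",")


sort_value = build_sort([("score", "desc"), ("title_sort", "asc")])
print(sort_value)
print(encode_sort(sort_value))
```

Rejecting anything other than asc or desc up front keeps a malformed ordering from reaching Solr as part of the request.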

Custom Query Parameters

As shown in the solrconfig configuration file, there are 4 available query parameters.

Name | Description | Default Value
xwiki.multilingualFields | Represents the list of multilingual fields that will be expanded in the search query. This way, a user can write a query on the title field and all the title_<language> variations of the field will be used in the query. The list of languages for which a field is expanded is taken from the xwiki.supportedLocales query parameter. If this parameter is not defined, the ROOT locale is used instead. | title, doccontent, doccontentraw, comment, objcontent, propertyvalue, attcontent, property.*, object.*
xwiki.supportedLocales | The list of supported locales used to expand the fields specified by xwiki.multilingualFields in search queries. | An empty list, which means only the ROOT locale is used for expanding multilingual fields in search queries.
xwiki.typedDynamicFields | The list of typed non-string dynamic fields that will be expanded in the search query. The names of these fields are suffixed with the name of their data type. | property.*
xwiki.dynamicFieldTypes | Dynamic field definitions which allow using convention over configuration for fields via the specification of patterns to match field names. The complete list is available in the schema.xml file. | boolean, int, long, float, double, string, date
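
The expansion these parameters control can be pictured as generating one concrete field name per locale or per data type. The sketch below is only an illustration of that naming pattern, not the query parser itself: the function names are hypothetical, and the ROOT locale fallback is deliberately not modeled because its field suffix is not given here.

```python
# Data types listed as the default for xwiki.dynamicFieldTypes.
DYNAMIC_FIELD_TYPES = ["boolean", "int", "long", "float", "double", "string", "date"]


def expand_multilingual(field: str, supported_locales: list[str]) -> list[str]:
    """One '<field>_<locale>' name per supported locale.

    Returns an empty list for no locales; the ROOT locale fallback
    is intentionally left out of this sketch.
    """
    return [f"{field}_{locale}" for locale in supported_locales]


def expand_typed(field: str, types: list[str] = DYNAMIC_FIELD_TYPES) -> list[str]:
    """One '<field>_<type>' name per dynamic field type."""
    return [f"{field}_{type_name}" for type_name in types]


print(expand_multilingual("title", ["en", "pt_BR"]))
print(expand_typed("property.Blog.BlogPostClass.published"))
```

With the locales en and pt_BR, a query on title would cover title_en and title_pt_BR, which is why a user can query the virtual title field without knowing which locales are indexed.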

Examples

Faceting on Object Properties

To add facets for a "Test.TestClass" object, use the code below:

#set ($discard = $query.bindValue('facet.field', ['field2', 'property.Test.TestClass.staticList1_string']))

which will be triggered by the query object:Test.TestClass

The _string suffix means the property was indexed verbatim.

Sorting on Object Properties

You can sort the document search results based on a property value using the Query Module API:

#set ($discard = $query.bindValue('sort', "property.Test.TestClass.staticList1_sortString asc"))

where:

  • Test.TestClass is the name of the wiki class
  • staticList1 is the name of the "Test.TestClass" property of type Static List
  • sortString is a suffix representing the dynamic type used for sorting as explained in the above section

  
