Part 2 - Using Azure Search for website content search

Please follow the simple steps in Part 1 of this series to set up an Azure Search instance and configure your Sitecore instance to use it.

Before you start implementing search functionality, it is worth reading Features of Azure Search, Limitations of Azure Search, and Unsupported Azure Search features in the Sitecore documentation:

https://doc.sitecore.com/developers/90/platform-administration-and-architecture/en/sitecore-azure-search-overview.html

The most important limitation I noticed is this one:

Fields – an Azure Search index can only contain up to 1000 fields. This may be an issue for the Master and Web indexes that both have a default setup that starts with ~550 fields. If you reach the 1000 fields limit, create a new index that is specifically dedicated to indexing your custom templates and fields, then exclude your custom fields from the Master and Web indexes.

Most implementations will hit this limit once the custom templates used in the project are included in the index.
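The dedicated-index option mentioned in the quoted limitation could look roughly like the sketch below. It reuses the element structure of the out-of-the-box web index; the index id `xxx_custom_web_index` and the referenced `customCloudIndexConfiguration` are hypothetical names that you would have to define yourself, so treat this as an illustration rather than a tested configuration. The rest of this post takes the other route and restricts the fields in the existing indexes.

```xml
<!-- Sketch only: "xxx_custom_web_index" and "customCloudIndexConfiguration" are hypothetical names. -->
<indexes hint="list:AddIndex">
  <index id="xxx_custom_web_index" type="Sitecore.ContentSearch.Azure.CloudSearchProviderIndex, Sitecore.ContentSearch.Azure">
    <param desc="name">$(id)</param>
    <param desc="connectionStringName">cloud.search</param>
    <param desc="totalParallelServices">5</param>
    <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
    <!-- Assumed: a separate index configuration that includes only your custom fields -->
    <configuration ref="contentSearch/indexConfigurations/customCloudIndexConfiguration" />
    <schemaBuilder ref="contentSearch/searchServiceSchemaBuilder" />
    <searchService type="Sitecore.ContentSearch.Azure.Http.CompositeSearchService, Sitecore.ContentSearch.Azure" />
    <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsyncSingleInstance" />
    </strategies>
    <locations hint="list:AddCrawler">
      <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
        <Database>web</Database>
        <Root>/sitecore/content</Root>
      </crawler>
    </locations>
  </index>
</indexes>
```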

We can control which fields are included in the index by tweaking the Azure Search configuration files.

The default out-of-the-box configuration files can be found in the following folder on your instance: ….\Website\App_Config\Sitecore\ContentSearch.Azure

Create custom Default Index configuration

I created a custom patch configuration that is merged with the default index configuration that ships with Sitecore. It overrides the fields that are used in the web and master indexes.

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:search="http://www.sitecore.net/xmlconfig/search/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" >
<sitecore>
<settings>
<setting name="Indexing.FullRebuildItemCountThreshold">
<patch:attribute name="value">100001</patch:attribute>
</setting>
<setting name="ContentSearch.FullRebuildItemCountThreshold">
<patch:attribute name="value">100001</patch:attribute>
</setting>
</settings>
<contentSearch search:require="Azure">
<configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch" role:require="Standalone or ContentManagement">
<indexes hint="list:AddIndex">
<index id="sitecore_web_index" type="Sitecore.ContentSearch.Azure.CloudSearchProviderIndex, Sitecore.ContentSearch.Azure">
<param desc="name">$(id)</param>
<param desc="connectionStringName">cloud.search</param>
<param desc="totalParallelServices">5</param>
<!-- This initializes index property store. Id has to be set to the index id -->
<param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
<configuration ref="contentSearch/indexConfigurations/defaultCloudIndexConfiguration" />
<schemaBuilder ref="contentSearch/searchServiceSchemaBuilder" />
<searchService type="Sitecore.ContentSearch.Azure.Http.CompositeSearchService, Sitecore.ContentSearch.Azure" />
<strategies hint="list:AddStrategy">
<!-- NOTE: the order of these controls the execution order -->
<strategy role:require="Standalone OR ContentManagement" ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsyncSingleInstance" />
</strategies>
<commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
<policies hint="list:AddCommitPolicy">
<policy type="Sitecore.ContentSearch.ModificationCountCommitPolicy, Sitecore.ContentSearch">
<Limit>1000</Limit>
</policy>
</policies>
</commitPolicyExecutor>
<locations hint="list:AddCrawler">
<crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
<Database>web</Database>
<Root>/sitecore/content</Root>
</crawler>
</locations>
<enableItemLanguageFallback>false</enableItemLanguageFallback>
<enableFieldLanguageFallback>false</enableFieldLanguageFallback>
</index>
<index id="sitecore_master_index" type="Sitecore.ContentSearch.Azure.CloudSearchProviderIndex, Sitecore.ContentSearch.Azure">
<param desc="name">$(id)</param>
<param desc="connectionStringName">cloud.search</param>
<param desc="totalParallelServices">5</param>
<!-- This initializes index property store. Id has to be set to the index id -->
<param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
<configuration ref="contentSearch/indexConfigurations/defaultCloudIndexConfiguration" />
<schemaBuilder ref="contentSearch/searchServiceSchemaBuilder" />
<searchService type="Sitecore.ContentSearch.Azure.Http.CompositeSearchService, Sitecore.ContentSearch.Azure"/>
<strategies hint="list:AddStrategy">
<patch:delete />
</strategies>
<strategies hint="list:AddStrategy">
<!-- NOTE: the order of these controls the execution order -->
<strategy ref="contentSearch/indexUpdateStrategies/intervalAsyncMaster" />
</strategies>
<commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
<policies hint="list:AddCommitPolicy">
<policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
<policy type="Sitecore.ContentSearch.ModificationCountCommitPolicy, Sitecore.ContentSearch">
<Limit>1000</Limit>
</policy>
</policies>
</commitPolicyExecutor>
<locations hint="list:AddCrawler">
<crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
<Database>master</Database>
<Root>/sitecore</Root>
</crawler>
</locations>
<enableItemLanguageFallback>false</enableItemLanguageFallback>
<enableFieldLanguageFallback>false</enableFieldLanguageFallback>
</index>
</indexes>
</configuration>
<indexUpdateStrategies>
<syncMaster>
<CheckForThreshold>true</CheckForThreshold>
</syncMaster>
<intervalAsyncMaster type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
<param desc="database">master</param>
<param desc="interval">01:00:00</param>
<!-- Whether or not a full index rebuild should be triggered when the number of items in the history engine exceeds the number specified 
in ContentSearch.FullRebuildItemCountThreshold. -->
<CheckForThreshold>true</CheckForThreshold>
</intervalAsyncMaster>
<onPublishEndAsyncSingleInstance>
<CheckForThreshold>true</CheckForThreshold>
</onPublishEndAsyncSingleInstance>
</indexUpdateStrategies>
<indexConfigurations>

<!-- If no configuration is specified for an index, it uses the default configuration. The configurations are not merged if the index also has a
configuration. The system uses either the default configuration or the index configuration. -->
<defaultCloudIndexConfiguration type="Sitecore.ContentSearch.Azure.CloudIndexConfiguration, Sitecore.ContentSearch.Azure">

<!-- Should index Initialize() method be called as soon as the index is added or wait for an external trigger -->
<initializeOnAdd>true</initializeOnAdd>

<!-- DEFAULT FIELD MAPPING
The field map allows you to have full control over the way your data is stored in the index. This affects the way data is queried,
the performance of searching, and the way that data is retrieved and cast to a proper type in the API.
-->
<fieldMap type="Sitecore.ContentSearch.Azure.FieldMaps.CloudFieldMap, Sitecore.ContentSearch.Azure">
<!-- This setting allows you to map a field name in Sitecore to the index and store it in the appropriate way.
fieldName: The name of the field in the index.
retrievable: means that field value shows up in search results list
searchable: means that a field is included in full text search.
filterable: whether a field can be used in a filter navigation structure
facetable: whether a field can be used in a facet navigation structure
sortable: whether a field can be used in a sort navigation structure
boost: The boost factor globally indicates the relative importance of this field at index and query time.
type: The type to cast the value back to in the API.
settingType: The class implementation to store the information specified in all other fields.
-->
<fieldNames hint="raw:AddFieldByFieldName">
<field fieldName="Title" cloudFieldName="title" indexType="TOKENIZED" searchable="YES" retrievable="YES" facetable="YES" filterable="YES" sortable="YES" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure"/>
<field fieldName="_datasourcecontent" cloudFieldName="datasourcecontent" indexType="TOKENIZED" searchable="YES" retrievable="YES" facetable="NO" filterable="NO" sortable="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure" cloudAnalyzer="en.microsoft"/> 
</fieldNames>
</fieldMap>

<documentOptions type="Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilderOptions,Sitecore.ContentSearch.Azure">

<!-- This flag will index all fields by default. This allows new fields in your templates to automatically be included into the index.
You have two choices :

1) Set this to 'true' and place all the fields that you would like to remove in the 'ExcludeField' list.
2) Set to false and place all fields you would like to be indexed in the 'IncludeField' list below.
-->
<indexAllFields>false</indexAllFields>

<!-- COMPUTED INDEX FIELDS
This setting allows you to add fields to the index that contain values that are computed for the item that is being indexed.
You can specify the storageType and indextype for each computed index field in the <fieldMap><fieldNames> section.
-->
<fields hint="raw:AddComputedIndexField">
<!-- ContentFromDataSources includes single line, multiline text, rich text etc for all data source items of a page. -->
<field fieldName="_datasourcecontent" type="XXX.Business.Search.ComputedFields.ContentFromDataSources, XXX.Business.Search"/> 
</fields>

<include hint="list:AddIncludedField">
<!-- Title-->
<fieldId>{C06867FE-9A43-4C7D-B739-48780492D06F}</fieldId>
<!-- Forms field -->
<fieldId>{558F9307-EBAF-480D-88B5-DFE9E63A66DE}</fieldId>
<fieldId>{39C4902E-9960-4469-AEEF-E878E9C8218F}</fieldId>
</include>
</documentOptions>
</defaultCloudIndexConfiguration>
</indexConfigurations>
<cloudBatchBuilder type="Sitecore.ContentSearch.Azure.CloudBatchBuilder, Sitecore.ContentSearch.Azure">
<maxDocuments>10</maxDocuments>
</cloudBatchBuilder>
</contentSearch> 
</sitecore>
</configuration>

The web and master index configurations will use these fields, merged with the fields from the out-of-the-box index configuration.

The important flag set in the custom configuration is indexAllFields. It tells Sitecore whether to index every field by default or only the fields listed explicitly:

<!-- This flag will index all fields by default. This allows new fields in your templates to automatically be included into the index.
You have two choices :

1) Set this to 'true' and place all the fields that you would like to remove in the 'ExcludeField' list.
2) Set to false and place all fields you would like to be indexed in the 'IncludeField' list below.
-->
<indexAllFields>false</indexAllFields>

With that flag set to false, we tell Sitecore which fields to include in the indexes using the configuration below.

 <include hint="list:AddIncludedField">
<!-- Title-->
<fieldId>{C06867FE-9A43-4C7D-B739-48780492D06F}</fieldId>
<!-- Forms field -->
<fieldId>{558F9307-EBAF-480D-88B5-DFE9E63A66DE}</fieldId>
<fieldId>{39C4902E-9960-4469-AEEF-E878E9C8218F}</fieldId>
</include>
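For comparison, the first option described in the comment (index all fields and list the ones to leave out) might look like the sketch below. The GUID is a placeholder rather than a real field ID, and the child element name inside the list is only illustrative. Keep in mind that with this option every new template field lands in the index automatically, so the 1000-field limit is easier to reach.

```xml
<!-- Sketch of option 1: index everything, then exclude specific fields.
     The field GUID below is a placeholder, not a real field ID. -->
<documentOptions type="Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilderOptions,Sitecore.ContentSearch.Azure">
  <indexAllFields>true</indexAllFields>
  <exclude hint="list:AddExcludedField">
    <!-- Hypothetical custom field that should stay out of the index -->
    <fieldId>{00000000-0000-0000-0000-000000000000}</fieldId>
  </exclude>
</documentOptions>
```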

I have also included an example of adding a custom computed index field to the indexes.

<fields hint="raw:AddComputedIndexField">
<!-- ContentFromDataSources includes single line, multiline text, rich text etc for all data source items of a page. -->
<field fieldName="_datasourcecontent" type="XXX.Business.Search.ComputedFields.ContentFromDataSources, XXX.Business.Search"/> 
</fields>

Once the fields are defined, we can map them to Azure Search document fields using the configuration below.

https://doc.sitecore.com/developers/90/platform-administration-and-architecture/en/configure-azure-search.html#UUID-6db2d23d-88f1-df71-5144-82fd14c811d3_id__Map_the_Azure

<fieldMap type="Sitecore.ContentSearch.Azure.FieldMaps.CloudFieldMap, Sitecore.ContentSearch.Azure">
<!-- This setting allows you to map a field name in Sitecore to the index and store it in the appropriate way.
fieldName: The name of the field in the index.
retrievable: means that field value shows up in search results list
searchable: means that a field is included in full text search.
filterable: whether a field can be used in a filter navigation structure
facetable: whether a field can be used in a facet navigation structure
sortable: whether a field can be used in a sort navigation structure
boost: The boost factor globally indicates the relative importance of this field at index and query time.
type: The type to cast the value back to in the API.
settingType: The class implementation to store the information specified in all other fields.
-->
<fieldNames hint="raw:AddFieldByFieldName">
<field fieldName="Title" cloudFieldName="title" indexType="TOKENIZED" searchable="YES" retrievable="YES" facetable="YES" filterable="YES" sortable="YES" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure"/>
<field fieldName="_datasourcecontent" cloudFieldName="datasourcecontent" indexType="TOKENIZED" searchable="YES" retrievable="YES" facetable="NO" filterable="NO" sortable="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.Azure.CloudSearchFieldConfiguration, Sitecore.ContentSearch.Azure" cloudAnalyzer="en.microsoft"/> 
</fieldNames>
</fieldMap>

I will go through some of the other configuration settings in detail in a subsequent post.
