Sitecore Search: Custom Computed Field To Search For All Possible Content Page Data

High Score Labs News • Feb 2, 2018
Let’s imagine that you have a Sitecore content page with different renderings that populate data from data sources referenced by the current page item. Each data source may have several descendant items. In addition, each item might have different field types, for example list types or link types, which can point to other items in the content tree. You can see the challenge this presents to the developer: creating an information architecture that is maintainable yet searchable.
Then, let’s say that you also want a generic solution to search for pages with specific content. Out of the box, Sitecore does not provide this, so it requires developing a custom computed field.
The basic idea is to create a custom computed index field that contains all the useful page item data: the data from the page item’s fields, the data from the data sources of the page item’s renderings (if any), and the data from any descendant items of those data sources. Each field may reference other items, so we need to extract the data from those items as well. The following snippet provides an example of such a computed field:
using Sitecore.ContentSearch.ComputedFields;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Sitecore.ContentSearch;
using Sitecore.Data.Items;
using System.Text;
using Sitecore.Data.Fields;
using Sitecore.Diagnostics;
using Sitecore.Data;

namespace Sitecore.Foundation.Search.ComputedFields
{
    public class DeepContent : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            Item obj = (indexable as SitecoreIndexableItem);
            if (obj != null)
            {
                // extract content for the current item
                var @out = ExtractContent(obj);
                // extract content for the item's datasources
                @out.AddRange(ExtractDataSourceContent(obj));
                // join all the content
                var s = string.Join(" ", @out).Trim();
                // remove all the unwanted characters
                s = RemoveSpecialCharacters(s);
                if (!string.IsNullOrWhiteSpace(s))
                {
                    return s;
                }
            }
            return null;
        }

        private List<string> ExtractDataSourceContent(Item item)
        {
            var @out = new List<string>();
            // if the item has a layout
            if (item.Visualization.Layout != null)
            {
                var device = DeviceItem.ResolveDevice(item.Database);
                if (device != null)
                {
                    var renderings = item.Visualization.GetRenderings(device, false).ToList();
                    // if the item has renderings
                    if (renderings.Any())
                    {
                        foreach (var rendering in renderings)
                        {
                            // skip renderings that have no datasource set
                            var dataSource = rendering.Settings.DataSource;
                            if (string.IsNullOrWhiteSpace(dataSource))
                            {
                                continue;
                            }
                            // get the rendering datasource
                            var dsItem = item.Database.GetItem(dataSource, item.Language);
                            if (dsItem != null)
                            {
                                // extract content for the datasource
                                @out.AddRange(ExtractContent(dsItem));
                                // extract content for each datasource descendant
                                foreach (var _desc in dsItem.Axes.GetDescendants())
                                {
                                    @out.AddRange(ExtractContent(_desc));
                                }
                            }
                        }
                    }
                }
            }
            return @out;
        }

        private List<string> ExtractContent(Item item)
        {
            // extract content for all item fields, excluding standard fields
            var list = item.Fields.Where(field => IsDataField(field)).Select(f => ExtractFieldContent(item, f)).Where(x => !string.IsNullOrWhiteSpace(x)).ToList();
            return list;
        }

        private string ExtractFieldContent(Item item, Field field)
        {
            var @out = string.Empty;
            try
            {
                if (!string.IsNullOrWhiteSpace(field.Type))
                {
                    var fieldType = FieldTypeManager.GetFieldType(field.Type);
                    if (fieldType?.Type != null)
                    {
                        switch (fieldType.Type.FullName)
                        {
                            case "Sitecore.Data.Fields.HtmlField":
                                {
                                    // remove HTML tags from the Rich Text field value
                                    @out = StringUtil.RemoveTags(field.GetValue(true, true));
                                    break;
                                }
                            // extract content for link and list type fields
                            case "Sitecore.Data.Fields.LookupField":
                            case "Sitecore.Data.Fields.DatasourceField":
                            case "Sitecore.Data.Fields.MultilistField":
                            case "Sitecore.Data.Fields.GroupedDroplinkField":
                            case "Sitecore.Data.Fields.GroupedDroplistField":
                            case "Sitecore.Data.Fields.ReferenceField":
                                {
                                    var val = field.GetValue(true, true);
                                    if (!string.IsNullOrWhiteSpace(val))
                                    {
                                        // extract content for each referenced ID or path
                                        foreach (var sIdOrPath in val.Split("|".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
                                        {
                                            var sourceItem = item.Database.GetItem(sIdOrPath, item.Language);
                                            if (sourceItem != null)
                                            {
                                                var s = string.Join(" ", ExtractContent(sourceItem)).Trim();
                                                if (!string.IsNullOrWhiteSpace(s))
                                                {
                                                    @out += string.Concat(" ", s);
                                                }
                                            }
                                        }
                                    }
                                    break;
                                }
                            default:
                                {
                                    // for the remaining (simple) field types, just take the value
                                    @out = field.GetValue(true, true);
                                    break;
                                }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                Log.Error(ex.Message, ex, this);
            }
            return @out.Trim();
        }

        private bool IsDataField(Field field)
        {
            // standard fields start with "_" (Sitecore names them with a "__" prefix)
            // and we don't want the values of these fields to be indexed
            return !field.Name.StartsWith("_");
        }

        public static string RemoveSpecialCharacters(string str)
        {
            var sb = new StringBuilder();
            bool lastWasSpace = true; // true to eliminate leading spaces
            foreach (char c in str)
            {
                // keep only characters in the allowed ranges; skip everything else
                if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Ż') || (c >= 'a' && c <= 'ż') || c == '.' || c == ' ')
                {
                    // collapse repeated white space
                    if (char.IsWhiteSpace(c) && lastWasSpace)
                    {
                        continue;
                    }
                    sb.Append(c);
                    lastWasSpace = char.IsWhiteSpace(c);
                }
            }
            return sb.ToString();
        }
    }
}
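To get a feel for what actually ends up in the index, it can help to run the clean-up step on its own. The sketch below copies the character filter from RemoveSpecialCharacters into a small console program with no Sitecore references; the class name NormalizeDemo and the sample strings are assumptions made purely for illustration.

```csharp
using System;
using System.Text;

// Hypothetical stand-alone demo; the filter mirrors DeepContent.RemoveSpecialCharacters.
public static class NormalizeDemo
{
    public static string RemoveSpecialCharacters(string str)
    {
        var sb = new StringBuilder();
        bool lastWasSpace = true; // true so that leading spaces are dropped
        foreach (char c in str)
        {
            // keep digits, the code-point ranges 'A'..'Ż' and 'a'..'ż', dots and spaces
            bool keep = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Ż')
                     || (c >= 'a' && c <= 'ż') || c == '.' || c == ' ';
            if (!keep)
            {
                continue;
            }
            // collapse runs of spaces into a single space
            if (char.IsWhiteSpace(c) && lastWasSpace)
            {
                continue;
            }
            sb.Append(c);
            lastWasSpace = char.IsWhiteSpace(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // prints: Hello world Version 2.0 beta
        Console.WriteLine(RemoveSpecialCharacters("  Hello,   world! Version 2.0 (beta)"));
        // prints: email a_b
        Console.WriteLine(RemoveSpecialCharacters("e-mail, a_b"));
    }
}
```

Two things are worth noticing. Hyphens are stripped without leaving a space, so "e-mail" is indexed as "email". And because 'A'..'Ż' is a contiguous code-point range rather than a list of letters, characters that sit inside it, such as "_", pass through unchanged.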
Next, you will want to add this custom computed field to your search index. Let’s use the Lucene index as an example:
First, add the field to the documentOptions section of the index configuration:
<documentOptions type="Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilderOptions, Sitecore.ContentSearch.LuceneProvider">
    <indexAllFields>true</indexAllFields>
    <include hint="list:AddIncludedTemplate">
    </include>
    <include hint="list:AddIncludedField">
        <fieldId>_uniqueid</fieldId>
    </include>
    <exclude hint="list:AddExcludedTemplate">
    </exclude>
    <exclude hint="list:AddExcludedField">
    </exclude>
    <fields hint="raw:AddComputedIndexField">
        <field fieldName="deepcontent">Sitecore.Foundation.Search.ComputedFields.DeepContent, Sitecore.Foundation.Search</field>
    </fields>
</documentOptions>
Then add the field to the fieldMap section of the index configuration:
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
    <fieldNames hint="raw:AddFieldByFieldName">
        <field fieldName="deepcontent" storageType="YES" indexType="TOKENIZED" vectorType="WITH_POSITIONS_OFFSETS" boost="1f" emptyString="_EMPTY_" nullValue="_NULL_" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
            <Analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider"/>
        </field>
    </fieldNames>
</fieldMap>
Now you can add this field to your search index model and use it to search every piece of content on a page, even though that content is componentized into various renderings. The search index model, along with some search query examples, will be provided in the next article.
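In the meantime, here is a minimal sketch of what such a model and query might look like using the standard Sitecore ContentSearch LINQ API. It is not the model from the follow-up article: the names DeepContentResultItem, DeepContentSearch and Find, the index name "sitecore_web_index" and the result limit of 20 are all assumptions for illustration, and the code needs the Sitecore.ContentSearch assemblies to compile.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

namespace Sitecore.Foundation.Search.Models
{
    // Hypothetical result model: maps the "deepcontent" computed field to a property.
    public class DeepContentResultItem : SearchResultItem
    {
        [IndexField("deepcontent")]
        public string DeepContent { get; set; }
    }

    // Hypothetical helper that queries the computed field.
    public static class DeepContentSearch
    {
        // indexName is an assumption; use the index you added the computed field to.
        public static List<DeepContentResultItem> Find(string term, string indexName = "sitecore_web_index")
        {
            var index = ContentSearchManager.GetIndex(indexName);
            using (var context = index.CreateSearchContext())
            {
                return context.GetQueryable<DeepContentResultItem>()
                    .Where(x => x.DeepContent.Contains(term))
                    .Take(20) // arbitrary cap for the sketch
                    .ToList();
            }
        }
    }
}
```

How Contains matches in practice depends on the analyzer configured for the field in the fieldMap, so treat the query as a starting point and verify it against a rebuilt index.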