SharePoint 2013 Search – Continuous Crawling

One gorgeous gem of SharePoint 2013 is Continuous Crawling. Together with the Content by Search web part you can build awesome search-driven solutions. How does Continuous Crawling work? First, let’s take a moment and walk down memory lane…

SharePoint 2010

In SharePoint 2010 we have Full Crawls and Incremental Crawls. Both can be scheduled, and the Incremental Crawl in particular is often scheduled at short intervals. An Incremental Crawl retrieves and indexes all content that has changed since the last crawl. The results may not be what you expect, though: the index is not as up to date as you hoped. Imagine this situation:

You have scheduled an Incremental Crawl to run every 15 minutes.

In the first window (A, at 0 minutes) the crawler retrieves a content set and finishes processing it within 15 minutes. No problem here. The next window (B, at 15 minutes) starts, and this time the crawler retrieves a bigger content set that takes 20 minutes instead of 15. Now a problem arises. Because a new crawl cannot start while the previous one is still running, the crawl scheduled at 30 minutes is skipped, and we have to wait for the next window (C) at 45 minutes. So in this case 30 minutes pass between two Incremental Crawls instead of the scheduled 15. The bigger the content set the crawler has to process, the greater the risk of missing scheduled crawl windows, and the less up to date the search index becomes.
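To make the arithmetic of this scenario concrete, here is a minimal Python sketch of the skipped-window behaviour. It is a simplified model, not SharePoint's actual scheduler: the function name `simulate_incremental` is hypothetical, and the crawl durations (10, 20 and 10 minutes) are assumptions chosen to match the example, where A finishes within 15 minutes and B takes 20.

```python
from typing import List


def simulate_incremental(durations: List[int], interval: int, horizon: int) -> List[int]:
    """Return the start times (minutes) of the scheduled crawls that actually run.

    Hypothetical, simplified model of the scenario above: a crawl is scheduled
    every `interval` minutes, but a window is skipped while the previous crawl
    is still running. `durations` lists how long each crawl that does start
    takes, in order.
    """
    starts: List[int] = []
    busy_until = 0  # time at which the currently running crawl finishes
    remaining = iter(durations)
    for window in range(0, horizon + 1, interval):
        if window < busy_until:
            continue  # previous crawl still running: this window is missed
        duration = next(remaining, None)
        if duration is None:
            break  # no more crawls to model
        starts.append(window)
        busy_until = window + duration
    return starts


starts = simulate_incremental([10, 20, 10], interval=15, horizon=45)
print(starts)  # [0, 15, 45]
gaps = [b - a for a, b in zip(starts, starts[1:])]
print(gaps)    # [15, 30]
```

The window at 30 minutes never appears in the output, which is exactly the 30-minute gap between B and C described above.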

SharePoint 2013

Now Continuous Crawling comes to the rescue. With Continuous Crawling you no longer have to schedule crawls. A new crawl that picks up changes from the SharePoint sites is started every 15 minutes, whether or not the previous crawl has finished, so several crawls can run in parallel. Fifteen minutes is the default interval, and it can be changed using PowerShell. In our previous situation, when a crawl takes more than 15 minutes to process its content set, another crawl simply starts alongside it instead of a window being skipped.
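The same durations can be fed into a sketch of the continuous model to show the difference. Again this is a simplified illustration, not SharePoint's implementation: the function name `simulate_continuous` is hypothetical, and it assumes each crawl starts exactly one interval after the previous one (15 minutes, the default mentioned above) regardless of whether earlier crawls are still running.

```python
from typing import List, Tuple


def simulate_continuous(durations: List[int], interval: int = 15) -> Tuple[List[int], int]:
    """Return crawl start times (minutes) and the peak number of crawls running at once.

    Hypothetical, simplified model: crawl k starts at k * interval no matter
    whether earlier crawls have finished, so long crawls overlap instead of
    causing missed windows.
    """
    crawls = [(k * interval, k * interval + d) for k, d in enumerate(durations)]
    starts = [start for start, _ in crawls]
    # Concurrency only increases at start times, so count the crawls running
    # at each start (a crawl that ends exactly at time t no longer counts).
    peak = max(
        (sum(1 for s, e in crawls if s <= t < e) for t in starts),
        default=0,
    )
    return starts, peak


print(simulate_continuous([10, 20, 10]))  # ([0, 15, 30], 2)
```

No window is lost: the crawl at 30 minutes starts while the 20-minute crawl from 15 minutes is still running, so two crawls overlap. That overlap is also why continuous crawling can put more load on the crawl server.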

A second crawl can even start only a few minutes after the first one. This way your newly added or changed data is available through search almost immediately. How cool is that!

Now, a few things to bear in mind. Continuous crawls can only be enabled for content sources based on SharePoint sites.

Also, running several crawls in parallel can be resource intensive for the server that hosts the crawlers, so plan your capacity accordingly.

Summary

With continuous crawling, new and changed data becomes available for search almost immediately. No more issues with incremental crawl schedules and large content sets that cause scheduled crawl windows to be missed. Combining continuous crawls with, for example, the Content by Search web part gives you lots of opportunities to build awesome search-driven solutions.

Some useful references to TechNet articles:

SharePoint 2010:

http://technet.microsoft.com/en-us/library/cc280343(v=office.14).aspx#section1
http://technet.microsoft.com/en-US/library/ee792876(v=office.14).aspx

SharePoint 2013:

http://technet.microsoft.com/en-us/library/jj219802.aspx