SharePoint Indexing Performance Tuning Tips
Many factors in the SharePoint crawling process can impact indexing performance, and there are steps you can take to improve it. Here are the common causes and their resolutions:
- Indexing Performance is set to Reduced - a common mistake on the configuration screen for the index service. Go to Central Administration > Operations > Services on Server > Office SharePoint Server Search Service Settings and set it to Maximum.
- Number of Connections - by default the indexer runs a limited number of simultaneous threads per host (usually 6). You can increase this manually by adding a Crawler Impact Rule for each host; setting a large file server up to 64 connections can really improve speed. Note that this number is only a suggestion to SharePoint - it also considers other factors, such as the number of processors (8 x number of processors). Also watch your network for bottlenecks and for RPC errors in your logs; dial the setting back if you see those.
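SharePoint's internal scheduling logic is not documented, but the interaction between an impact rule and the processor-based ceiling mentioned above can be sketched as follows. This is purely illustrative; the function name and the exact min() behavior are my assumptions, not a documented formula:

```python
# Illustrative sketch only: SharePoint's real crawl scheduler is internal
# and undocumented. This models the idea that a Crawler Impact Rule is a
# *suggestion*, capped by a processor-based ceiling (8 threads per processor).

def effective_crawl_threads(impact_rule_limit: int, num_processors: int) -> int:
    """Return the thread count the indexer might actually use for a host."""
    processor_ceiling = 8 * num_processors  # the 8 x #procs heuristic above
    return min(impact_rule_limit, processor_ceiling)

# A 64-connection impact rule on a 4-processor indexer is capped at 32.
print(effective_crawl_threads(64, 4))   # -> 32
# On an 8-processor box the full 64 connections can be used.
print(effective_crawl_threads(64, 8))   # -> 64
```

This is why bumping an impact rule on a small indexer may show no improvement: the hardware ceiling, not the rule, is the binding limit.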
- Crawled systems are slow or hosted on remote networks - not much can be done here, other than moving those files closer to the indexer.
- Overlapping Crawls - SharePoint gives priority to the first running crawl, so if you are already indexing one system it will hold up the indexing of a second and increase crawl times.
- Solution: Schedule your crawls so there is no overlap. Full crawls take the longest, so run those exclusively.
- IFilter Issues - the Adobe PDF IFilter can only filter one file at a time, which slows crawls down, and it has a high reject rate for newer PDFs.
- Solution: Use a retail PDF IFilter from pdflib.com or Foxit.
- Not Enough Memory Allocated to the Filter Process - one aspect of the crawling process is that when the filter daemons (mssdmn.exe) use too much memory, they are automatically terminated and restarted. There is a wind-up time when this happens, which can slow down your crawling. The current default quota is quite low (around 100 MB), so it is easy to trip when filtering large files. You can and should increase the memory allocation by adjusting the following registry keys:
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager: set DedicatedFilterProcessMemoryQuota = 200000000 Decimal
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager: set FilterProcessMemoryQuota = 200000000 Decimal
- Bad File Retries - a registry setting controls the number of times a file is retried on error. The default is 100, which can severely slow down incremental crawls. The retry count can be adjusted with this key:
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager: set DeleteOnErrorInterval = 4 Decimal
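The three registry changes above can be applied in one step with a .reg file like the following sketch. Note that .reg DWORD values are written in hex (0x0BEBC200 = 200,000,000 decimal); back up the key first and restart the search service after importing:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager]
"DedicatedFilterProcessMemoryQuota"=dword:0BEBC200
"FilterProcessMemoryQuota"=dword:0BEBC200
"DeleteOnErrorInterval"=dword:00000004
```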
- General Architecture Issues - ensure that you have at least 2 GB of free memory available before your crawl even starts, and that you have at least 2 physical processors available.
- Disk Health - the nature of the indexing process causes extensive file system fragmentation on both the index server and the database server. Schedule defragmentation routinely and after every full crawl, and always ensure you have enough disk space.
- Run a 64-bit OS - the jury is still out on this one; I personally haven't seen much difference as long as there is enough memory and the processor types are the same, but Microsoft recommends it for large deployments.
- Proper SQL Server Configuration (new) - for large indexes (>5 million items) you will need to plan ahead for the correct SQL Server configuration in order to scale to these numbers. One table in particular grows at 40x the number of items and can severely hinder performance if you do not treat your SQL environment like a very large data warehouse. Here are my recommendations based on experience:
1. RAID 10 direct-attached storage only - minimum 4 arrays (16 disks).
2. Multiple file groups - pre-allocate all database files, partition them onto dedicated separate arrays, and assign one array each for:
 a. Indexes for the SharedServices1_search DB
 b. Temp and system databases/tables
 c. Transaction log for the SharedServices1_search DB
 d. Table content for the SharedServices1_search DB
 i. For every 5 million items, add an additional file from a dedicated drive in a dedicated file group. Content and load are spread across the files, which improves performance.
3. When initially crawling, be sure to pause your crawls every day or so and rebuild/reorganize the indexes on the SharedServices1_search database (especially the indexes on the MSSDocProps table).
NEW FROM MS: http://technet.microsoft.com/en-us/library/cc298801.aspx - a whitepaper on just this topic!
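Steps 2 and 3 above can be sketched in T-SQL as follows. This is a sketch under assumptions, not a definitive script: the database name (SharedServices1_search) and table name (MSSDocProps) come from this post, but the filegroup name, file name, path, and size are hypothetical placeholders - adjust all of them to match your SSP and storage layout, and pause the crawl before rebuilding:

```sql
-- Step 2 sketch: add a dedicated filegroup with a pre-allocated file on its
-- own array. Filegroup/file names, path, and size are placeholders.
ALTER DATABASE SharedServices1_search ADD FILEGROUP SearchContentFG1;
GO
ALTER DATABASE SharedServices1_search
ADD FILE (
    NAME = SearchContent1,
    FILENAME = 'E:\SQLData\SearchContent1.ndf',  -- dedicated array
    SIZE = 50GB
) TO FILEGROUP SearchContentFG1;
GO

-- Step 3 sketch: with the crawl paused, rebuild the property-store indexes.
USE SharedServices1_search;
GO
ALTER INDEX ALL ON dbo.MSSDocProps REBUILD;
GO
-- Or, for a lighter-weight pass, reorganize instead of rebuilding:
ALTER INDEX ALL ON dbo.MSSDocProps REORGANIZE;
GO
```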
NOTE: It is a good idea to open Performance Monitor (perfmon) and watch the gatherer statistics while indexing. There is a statistic called Performance Level that reflects the actual level the indexer is running at, where 5 is maximum and 3 is reduced. Even if you set everything to maximum, the indexer may decide to run at reduced anyway based on factors it does not expose.
This is a good read too: http://technet.microsoft.com/en-us/library/cc262574.aspx (Estimate performance and capacity requirements for search environments)
Here is another good read from Microsoft: http://technet.microsoft.com/en-us/library/cc850696.aspx (Best practices for Search in Office SharePoint Server)