One of the most common missed configuration settings in SQL Server is the cost threshold for parallelism. By default it is set to 5, meaning that when SQL estimates that the cost of a query will be greater than 5 running a serial plan, SQL will split the plan into parallel streams in an effort to make the query return faster.

In theory, this is a great thing. We can benefit from additional processors and make our queries run faster. The problem is that this estimate is based on how long a query would take to run on hardware that is 20 years old. Microsoft hints at this here by saying that the cost estimate “refers to an estimated elapsed time in seconds required to run the serial plan on a ‘specific hardware configuration’.”

The general consensus as to why it is set this low is that a single developer’s workstation was used as the hardware configuration baseline when developing the optimizer for SQL Server 7. There are numerous blog posts about this; however, I cannot confirm the story or the specific configuration of the hardware. If this belief is correct, it can be concluded that the default value of 5 is equal to a query that would have to run for 5 seconds if the query were to run with a serial plan on the developer’s machine 20+ years ago. Running the same query on current hardware is likely to take .1 seconds or less while still having an estimated cost of 5 or greater.

Given that there is a cost in parallel processing, both in splitting the stream and re-combining it once processing is complete, the default of 5 can actually make a large number of queries run slower than if there were a more appropriate value set. This default value essentially can cause a large number of fast queries to run in parallel, that would be faster if they run serially.

In addition to the overhead of parallelism, having this value set so low can create additional issues such as high CPU usage and excessive CXPACKET wait types. This single setting can also cause the server to appear to be under memory pressure and cause longer read and write times on the disk. By changing this setting, you can often resolve CPU, RAM, and disk bottlenecks. Given the speed and size of today’s hardware, a better default setting is 45 or 50 for the cost threshold for parallelism.

This setting is configurable on the fly and does not require a restart of services. You can adjust this setting to meet your needs based on your hardware profile without causing or needing an outage. In my opinion, this setting is one of the most important and is one of the most often overlooked. To make this change you can run the script below.

sp_configure 'show advanced options', 1;
GO
RECONFIGURE; 
GO
sp_configure 'cost threshold for parallelism', 50;
GO
RECONFIGURE;
GO

In some instances, you may still see high CXPACKET waits even after setting the cost threshold. In this case, you may need to drill a little deeper and may also need to evaluate your MAXDOP setting. Jonathan Kehayias wrote a fantastic blog titled Tuning ‘cost threshold for parallelism’ from the Plan Cache. He provided a script that among other things would query the plan cache and give you the execution count and estimated subtree cost of plans that have gone parallel. I have modified his script to take the top 50% of plans that have a high usage (larger number of executions) and then take the average estimated subtree cost to get what a more finely tuned value would be of the higher cost plans that are most often executed. The value that this script returns will give you a more specific value to use if one of the default values does not seem to be giving you the results you desire.


WITH XMLNAMESPACES   
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')  
SELECT round(avg(cast(statementsubtreecost as decimal(18,2))),0)
from (SELECT  top 50 percent
     n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost, 
     ecp.usecounts, 
     ecp.size_in_bytes 
FROM sys.dm_exec_cached_plans AS ecp 
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp 
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n) 
WHERE  n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1 
Order by ecp.usecounts desc
) t1

As with anything you should always test in a non-production environment. That being said, adjusting the cost threshold for parallelism can make a significant performance improvement on your system by a simple on the fly change and at no time do I recommend leaving this at the default value.