What is Runstats?
Runstats is a utility in Db2 that is used to collect statistics about the data in Db2 tables and indexes. This statistical information is critical to Db2 database performance. Runstats is executed on a table by table basis, so scripting it is quite common.
One of the powers of SQL is that you don’t have to know where the data is when you query it. All you have to know are things like the table or view name and the column names. You don’t have to know what disk, or SAN, or LUN, or sector. This means that when a query is executed, Db2 must decide how to access the actual data on disk and how to bring it together to form one result set that looks like the described result set.
Db2 has a powerful cost-based optimizer that makes these decisions. Dozens or hundreds of potential access plans are evaluated based on how expensive the optimizer thinks they will be, and the optimizer then chooses the cheapest access plan. The most critical input to this process is the statistical information about the data being returned. A cost cannot be accurately estimated if there is no estimate of rows in the table.
Runstats can be done fully online, though the activity may impact server resources like CPU utilization, and may also impact buffer pools as it involves full table scans. Even during high-impact or high-volume times, the cost of runstats is usually worth the impact.
Db2 offers automatic runstats that can be useful. However, there are some situations where they do not kick in when we need them to. It is therefore critical to augment automatic runstats with targeted runstats after high-impact operations, and to regularly make sure that statistics are updated on everything. Automatic runstats is configured by default in new databases. The following parameters in the database configuration should be on for automatic runstats:
Automatic maintenance (AUTO_MAINT) = ON Automatic table maintenance (AUTO_TBL_MAINT) = ON Automatic runstats (AUTO_RUNSTATS) = ON Real-time statistics (AUTO_STMT_STATS) = ON Statistical views (AUTO_STATS_VIEWS) = ON
If using automatic runstats, it makes sense to either do a full runstats on a periodic basis (daily, weekly, or monthly) or to do a periodic runstats targeted to hit tables that have not had runstats done in a while. There are some non-changing tables, particularly in analytics environments, where it may make sense to never do runstats. They are the exception, and not the rule.
You can also control when automatic runstats happen and exclude tables by using a policy file.
This is a useful query for understanding, overall, how current runstats are within a database:
SELECT date(stats_time) AS stats_date , COUNT(*) AS num_tables FROM syscat.tables WHERE (type='T' OR (type='V' AND SUBSTR(property,13,1) = 'Y')) GROUP BY date(stats_time) WITH UR STATS_DATE NUM_TABLES ---------- ----------- 08/08/2017 1 12/10/2017 41 12/12/2017 1 12/15/2017 2 12/16/2017 2 12/17/2017 1 01/10/2018 2 01/16/2018 1 01/17/2018 2 - 121 10 record(s) selected.
The output above tells us that 121 tables or statistical views in this database have never had runstats done on them. The others have various runstats times spanning over a few months. Note that this doesn’t dig deeper, but is a good starting point for evaluating runstats on a database and knowing where to dig deeper.
It is also possible to list out all runstats actions taken, whether they are automated or they are manual over a specified period of time:
select pid, tid, substr(eventtype, 1, 10), substr(objtype, 1, 30) as objtype, substr(objname_qualifier, 1, 20) as objschema, substr(objname, 1, 10) as objname, substr(first_eventqualifier, 1, 26) as event1, substr(second_eventqualifiertype, 1, 2) as event2_type, substr(second_eventqualifier, 1, 20) as event2, substr(third_eventqualifiertype, 1, 6) as event3_type, substr(third_eventqualifier, 1, 15) as event3, substr(eventstate, 1, 20) as eventstate from table(sysproc.pd_get_diag_hist ('optstats', 'EX', 'NONE', current_timestamp - 3 days, cast(null as timestamp))) as sl where eventstate='success' order by timestamp(varchar(substr(first_eventqualifier, 1, 26), 26)); PID TID 3 OBJTYPE OBJSCHEMA OBJNAME EVENT1 EVENT2_TYPE EVENT2 EVENT3_TYPE EVENT3 EVENTSTATE -------------------- -------------------- ---------- ------------------------------ -------------------- ---------- -------------------------- ----------- -------------------- ----------- --------------- -------------------- 41353228 172976 COLLECT TABLE AND INDEX STATS Db2NT IDC_PRIVAT 2017-03-02-04.00.27.162299 BY User - - success 41353228 172976 COLLECT TABLE AND INDEX STATS Db2NT IDC_SPEC_P 2017-03-02-04.00.27.669589 BY User - - success 41353228 271592 COLLECT TABLE AND INDEX STATS Db2NT INTERNET_S 2017-03-02-04.08.35.500213 BY User - - success 41353228 303750 COLLECT TABLE AND INDEX STATS Db2NT ICARO_MAX_ 2017-03-02-04.22.13.016770 BY User - - success
There are a vast number of options available to make statistics more detailed and relevant or to make execution of runstats faster or less impactful. The full syntax for the runstats command is available in the IBM Db2 Knowledge Center. Full coverage of every option is beyond the scope of this article.
Generally using syntax like this is a good default place to start:
runstats on table schema.table with distribution and detailed indexes all;
If doing runstats on statistical views, the proper syntax on 9.7 is:
runstats on table schema.view with distribution;
With 10.1 and higher, the following syntax works for statistical views:
runstats on view schema.view with distribution;
Distribution statistics and detailed index statistics give the best set of data for Db2 to work with. Sampling can be used at the table and/or the index level to speed up runstats. Sampling is acceptable to use if runstats simply do not complete within a reasonable time. Depending on database size, runstats may take hours to run for all tables.
The runstats utility can also be throttled. This will cause it to run slower, but to have less impact on the database server while it is running.
Special Statistical Techniques
Column group statistics can be used to calculate correct distribution statistics on a group of columns within a table. These can solve specific problems related to the optimizer incorrectly estimating the cardinality for a section of a query. If this is used, then a statistical profile should be set for the table. This will allow scripts and utilities to respect the special statistics used.
Statistical views fill a similar role to support distribution statistics that may have specific relationships across more than one table. They require regular runstats to make sure the data is current.
Best Practices for Runstats
- Automated Runstats should minimally be enabled through setting the database configuration parameters for AUTO_MAINT,AUTO_TBL_MAINT, and AUTO_STMT_STATS. It is optional to enable AUTO_RUNSTATS and AUTO_STATS_VIEWS.
- If full automatic statistics are used, check for missed tables on a monthly or weekly basis and do runstats on them.
- If full automatic statistics are not used, script daily or weekly runstats for all tables following the scripting best practices below.
- Runstats should be collected on system tables
- Runstats should be executed for the whole database before any REORGCHK operation
- Runstats should be executed for a table after any REORG operation
- Statistics profiles should be used for any tables needing a runstats command different than the standard. It doesn’t hurt to set statistics profiles for everything.
- Distribution statistics should generally be collected
- Consider excepting volatile tables from runstats.
Runstats Scripting Best Practices
The best runstats script should:
- Generate a list of tables to do runstats on in a dynamic fashion so that any new tables are picked up
- Respect any statistics profiles
- Handle table names or schema names with mixed case or space
- Allow for exclusion of specific tables
- Generate runstats both on tables and statistical views
- Perform an rbind after collecting statistics
- Log the time that runstats starts and stops on each table
- Minimally log the output of each command. Ideally handle errors properly and retry for certain errors.
Following these best practices for regular runstats will prevent performance degradation. It will also make sure that when a performance problem occurs, you have a stable foundation from which to investigate and engage expert help. With XTIVIA’s Virtual-DBA services, we are happy to implement our scripts and best practices for keeping statistics up to date in your databases.