Wednesday, 13 February 2008

Notification Server Basic Health Check

Article ID: 33202

Question

What steps should be taken to perform a basic Notification Server performance health check?


Answer

The following steps focus on eliminating Notification Server console timeouts and slow responses, but they also apply to many other Notification Server performance problems.

  1. Upgrade to the latest version of the solution that is experiencing a timeout or performance issue. This applies particularly to Patch Management Solution and Software Delivery Solution: prior versions of both had trouble handling very large tables or contained inefficient SQL queries.
  2. Ensure that the used space in the database is appropriate to the managed computer count. A general rule of thumb is 1 MB of space per managed computer (5,000 computers equals approx. 5 GB of space in the database). Don’t forget that the physical file size of the database will be larger than the actual space used. Use the appropriate SQL management tool to see actual space used in the database.
    See article 21310, "How do I determine what the database table sizes are per solution."
    If used data space is higher than expected, check the individual table sizes and prune the Event tables as necessary. Extremely large event tables indicate an ongoing database timeout problem during the nightly purging process. Several Notification Server reports depend upon aggregating event data, and millions of rows from 6 or more months ago will drastically increase report generation times.
  3. Check the SQL Index fragmentation levels (article 18828 for SQL 2000 and article 25784 for SQL 2005) and rebuild indexes as necessary. Heavily fragmented indexes can have a severe impact on performance. Rebuilding indexes will also help free wasted space in the database.
  4. Review IIS logs for heavy traffic consumers. Any standard IIS log parser can aggregate visits by IP address and URL. Any IP address visiting the same URL more than 100 times over an 8-hour period indicates an unhealthy managed agent, a broken Notification Server Web service, or an agent configuration policy with an overly aggressive interval.
  5. Review agent configuration intervals and collection update intervals. The top two sources of processing load are agent configuration requests and collection rebuilds.
    • For production purposes, Altiris agent configuration intervals should be no less than 1 hour, with most enterprise environments using 4–6 hour check-in intervals.
    • For production purposes, Delta and Policy Change collection update intervals should be no less than 30 minutes, with most enterprise environments using 1–3 hour update intervals. A general rule of thumb is to update collections twice as frequently as the Altiris Agent checks in. Stagger the start times on the collection update schedules by 10–15 minutes to avoid concurrency problems. The full collection update schedule should remain at once per day.
  6. Review the SQL tuning and configuration articles as listed in article 30821, "NSEs are not being processed and the Altiris Console is too slow."
  7. For very large environments, also review article 26878, "Common problems for very large environments."
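
The IIS log review in step 4 can be approximated with a short script when no log parser is at hand. The following is a minimal sketch, not a supported tool. It assumes the logs use the W3C extended format with a `#Fields:` header that includes `date`, `time`, `c-ip`, and `cs-uri-stem`, and it approximates the 8-hour period with fixed windows (00:00, 08:00, 16:00) rather than a sliding one. The function name `heavy_consumers` is hypothetical.

```python
from collections import Counter
from datetime import datetime

THRESHOLD = 100    # hits per client/URL pair, taken from step 4
WINDOW_HOURS = 8   # observation period, taken from step 4


def heavy_consumers(lines, threshold=THRESHOLD, window_hours=WINDOW_HOURS):
    """Return {(window_start, client_ip, url): hits} for pairs above threshold."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns of the rows that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        parts = line.split()
        if len(parts) != len(fields):
            continue  # skip malformed rows
        row = dict(zip(fields, parts))
        try:
            stamp = datetime.strptime(
                row["date"] + " " + row["time"], "%Y-%m-%d %H:%M:%S"
            )
            client_ip = row["c-ip"]
            url = row["cs-uri-stem"].lower()
        except (KeyError, ValueError):
            continue  # required column missing or timestamp unreadable
        # Round the timestamp down to the start of its fixed window.
        window_start = stamp.replace(
            hour=stamp.hour - stamp.hour % window_hours, minute=0, second=0
        )
        counts[(window_start, client_ip, url)] += 1
    return {key: hits for key, hits in counts.items() if hits > threshold}
```

Pass the function an open log file (or any iterable of lines) and sort the result by hit count to see the busiest clients first. Any entry it returns is a candidate for the agent, Web service, or policy-interval problems described in step 4.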

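The rules of thumb in steps 2 and 5 are simple enough to check mechanically. The sketch below encodes them as stated in this article; the function names and the 15-minute default gap are illustrative assumptions, and the values would have to be read from the Notification Server console by hand, since no Altiris API is used here.

```python
from datetime import datetime, timedelta

MB_PER_COMPUTER = 1          # step 2 rule of thumb
MIN_AGENT_MINUTES = 60       # step 5: agent configuration interval floor
MIN_COLLECTION_MINUTES = 30  # step 5: delta / policy-change update floor


def expected_db_mb(managed_computers):
    """Expected used database space in MB for a given computer count."""
    return managed_computers * MB_PER_COMPUTER


def interval_warnings(agent_minutes, collection_minutes):
    """Return a list of rule-of-thumb violations (empty if none)."""
    warnings = []
    if agent_minutes < MIN_AGENT_MINUTES:
        warnings.append("agent configuration interval is under 1 hour")
    if collection_minutes < MIN_COLLECTION_MINUTES:
        warnings.append("collection update interval is under 30 minutes")
    if collection_minutes > agent_minutes / 2:
        warnings.append("collections update less than twice as often as agents check in")
    return warnings


def staggered_starts(first_start, schedule_count, gap_minutes=15):
    """Spread collection update start times gap_minutes apart."""
    return [
        first_start + timedelta(minutes=gap_minutes * i)
        for i in range(schedule_count)
    ]
```

For example, `expected_db_mb(5000)` gives 5000 MB, which matches the approximate 5 GB figure in step 2, and `interval_warnings(240, 120)` returns an empty list for a 4-hour agent interval paired with a 2-hour collection update interval.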