Please use docs.servicenow.com for the latest documentation.

This site is for reference purposes only and may not be accurate for the latest ServiceNow version

Troubleshooting Performance

Note: This article applies to Fuji. For more current information, see Platform Performance at http://docs.servicenow.com

Role required
Functionality described here requires the Admin role.

Overview

The perceived performance of your ServiceNow instance is made up of these components.

  • Application Server response: Time for the application server to process a request and render the resultant page
  • Network latency and throughput: Time for the network to pass your request to the server and the response back
  • Browser rendering and parsing: Time for your browser to render the HTML and parse/execute JavaScript
  • Instance Cache: The amount of system resources available for processing

This document outlines basic troubleshooting steps for isolating the source of slow response times.

Transaction Log Response Times

Checking application server and network response is easy. The instance automatically logs the vital statistics of every transaction it processes, and that information is available to you as a system administrator. To look at the log, navigate to System Logs > Transaction Log. This will bring up a list of transactions. If you would like to see the average response time of all transactions brought back by the list filter, right-click on the column name in the list header, select Configure > List Calculations (Personalize > List Calculations in versions prior to Fuji), and select the Average value check box.

In practice, you'll probably want to limit the list to the transactions that took place during the time period you are interested in. The default filter on the module returns only today's transactions.

For each completed transaction, the following information is available (times are in milliseconds):

  • Date/time, User ID, IP address and URL of the transaction
  • Total response time (does not include the browser time, which the server doesn't know)
  • Network time (network transmission time, both from and to the user)
  • SQL time (time spent executing SQL commands)
  • SQL count (number of SQL commands executed)
  • Business rule time (time spent processing business rules)
  • Business rule count (number of business rules executed)
  • Output length (how many bytes the transaction returned, after any compression)

Response Time on Forms

A response time indicator may appear at the bottom right of forms and lists. This indicator provides the processing time, including the total time and the time for each step, for a completed transaction. The following example shows the response time for retrieving a filtered list in a demo instance.

Response Time Indicator


The response time text is:

Response time(ms): 499, server: 155 network: 172 browser: 172

In this example, the transaction took the following amount of processing time:

  • 499 milliseconds total time
  • 155 milliseconds on the server
  • 172 milliseconds moving data across the network
  • 172 milliseconds in the browser, rendering the HTML and parsing and executing JavaScript
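If you collect response time text from several users, a small script can split it into its components for comparison. The following is a minimal sketch that assumes the text has exactly the format shown in the example above; the function names are illustrative and are not part of ServiceNow.

```python
import re

# Pattern for the indicator text in the format shown in the example above.
_INDICATOR = re.compile(
    r"Response time\(ms\):\s*(?P<total>\d+),\s*"
    r"server:\s*(?P<server>\d+)\s+"
    r"network:\s*(?P<network>\d+)\s+"
    r"browser:\s*(?P<browser>\d+)"
)


def parse_response_time(text):
    """Return a dict of timings in milliseconds, or None if the text does not match."""
    match = _INDICATOR.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def largest_component(timings):
    """Return the name of the component (server, network, browser) with the most time."""
    parts = {name: ms for name, ms in timings.items() if name != "total"}
    return max(parts, key=parts.get)


sample = "Response time(ms): 499, server: 155 network: 172 browser: 172"
timings = parse_response_time(sample)
print(timings)
```

In the example, the three components add up to the total (155 + 172 + 172 = 499), which is a quick sanity check that the text was parsed correctly.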

Response time appears on most pages. However, it does not appear for simple operations (such as paging through a set of records or changing the sort order of a list) or for the first transaction in a session.

To hide the response time, click the clock icon. Click the clock icon again to show the response time.

Point to the clock to view a tool tip with the response time.

To view a detailed breakdown of the browser processing time on forms, click browser.

Form Browser Response Time

Administrators can disable the response time by setting the glide.ui.response_time property to false.

What to look for

Generally speaking, you want to look for one of two things:

  1. A period during which all transactions took unusually long, e.g. "from 11:00 AM through 11:20 AM all my transactions, which normally take 1 second or so, instead took an average of 15 seconds". That usually indicates some sort of unusual load running on that app server during that period (a large report, the backup window, an LDAP refresh, etc.).
  2. A specific transaction that repeatedly took an unusually long time, e.g. "every list of all closed incidents, sorted by short description, took a long time". That usually indicates a particular transaction that put an unusual database load on the system (such as causing it to sort 500,000 records on an unindexed field).
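If you copy transaction log rows out of the instance, both patterns can be checked with a short script. The sketch below is an illustration under stated assumptions, not a ServiceNow feature: each row is assumed to be a (timestamp, URL, response time in ms) tuple, the function names are hypothetical, and the window size, slowdown factor, and thresholds are arbitrary defaults you would tune.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from statistics import mean, median


def slow_windows(transactions, window_minutes=5, factor=5.0):
    """Pattern 1: find time windows whose average response time is unusually high.

    transactions: iterable of (datetime, url, response_ms) tuples (assumed layout).
    window_minutes should divide 60 evenly so windows align to the hour.
    Returns a sorted list of (window_start, average_ms) for windows whose
    average exceeds factor times the overall median response time.
    """
    transactions = list(transactions)
    if not transactions:
        return []
    baseline = median(ms for _, _, ms in transactions)
    buckets = defaultdict(list)
    for when, _, ms in transactions:
        # Truncate the timestamp to the start of its window.
        minute = when.minute - when.minute % window_minutes
        start = when.replace(minute=minute, second=0, microsecond=0)
        buckets[start].append(ms)
    return sorted(
        (start, mean(values))
        for start, values in buckets.items()
        if mean(values) > factor * baseline
    )


def slow_urls(transactions, slow_ms=10_000, min_count=3):
    """Pattern 2: find URLs that were slow at least min_count times."""
    counts = Counter(url for _, url, ms in transactions if ms >= slow_ms)
    return sorted(url for url, count in counts.items() if count >= min_count)
```

The median is used as the baseline because a handful of very slow transactions would inflate a mean and hide the very window you are looking for.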

Client Transaction Timings Plugin

The Client Transaction Timings Plugin provides extra information on the amount of time spent on both the client and server side, and by the browser and network. This not only helps find long-running processes, but provides information on where in the process the performance issue may be caused.

What to do

  1. If you find a window of slow response time, look for a particular transaction (or transactions) that spans the entire window, e.g. "It was slow for six minutes, and there was this one six-minute-long transaction that ran the whole time". Usually that particular transaction is the one that's loading down the system. Often, but not always, these sorts of things can be resolved by adding indexes to the database to make that transaction faster, although certain types of queries are always going to be slower than others, regardless of indexing.
  2. Ensure that a cache flush is not being run during business hours. Scheduled cache flushes, using cache.do, can affect overall performance and degrade system response times. Cache flushes are intended to prevent older data from interfering with changes and updates. Cache flushes should not be run during business hours and automatically triggered cache flushes are not recommended. Cache flushes are performed automatically when using update sets.
  3. If you can't find any specific culprits, but you're still seeing overall slow response time, you can contact us and we can see if there's anything globally going on with the application server hardware itself.

Network Response Times

Troubleshooting a poor network response time can be tricky, but there are certain quick tests you can perform. One clear indicator of a network issue is if users in one location have very good performance, and users in another location have very poor performance. That tells you the server and application are fine, since the only meaningful difference in that case is the network (assuming browser settings are identical).

Ping Times

The coarsest measure of network response time is a ping, which measures the total time for a packet to make it from the source machine to the target and back again. To do a ping in Windows, bring up a command window (DOS prompt) and type:

ping -t <yourinstancename>.service-now.com

A sample output:

image:PingResponse1.png

What's a normal ping time?

Generally, you'd like to see something under 100ms if you're stateside or 150ms if you're in Europe or Asia. In practice, though, anything less than 250ms is probably not worth worrying about as it's not generally a major component in your perceived response time.
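The rule of thumb above can be written down as a small helper, which is handy if you gather ping averages from several offices. This is a minimal sketch: the region keys and the labels it returns are illustrative choices, and only the 100 ms, 150 ms, and 250 ms thresholds come from the text.

```python
# Thresholds from the guidance above; the region keys are illustrative names.
GOOD_MS = {"us": 100, "europe_asia": 150}
WORRY_MS = 250


def assess_ping(avg_ms, region="us"):
    """Classify an average ping time (in ms) using the rule of thumb above."""
    if avg_ms < GOOD_MS[region]:
        return "good"
    if avg_ms < WORRY_MS:
        return "acceptable"
    return "investigate"


for office, avg_ms, region in [("Office A", 32, "us"), ("Office B", 310, "europe_asia")]:
    print(office, assess_ping(avg_ms, region))
```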

Traceroute

If you *are* seeing slow ping times, you can (usually) run a traceroute. I say usually because some networks refuse to forward ICMP, and as such your traceroute request may not work. Nonetheless, if it does work, it's a great tool for identifying network bottlenecks. To run a traceroute on Windows, bring up a command window and run:

tracert <yourinstancename>.service-now.com

A sample output:

 C:\dev\mysql5\bin>tracert mycompany.service-now.com
 Tracing route to mycompany.service-now.com [70.87.98.130]
 over a maximum of 30 hops:
 1     1 ms     1 ms     1 ms  12.192.116.193
 2     4 ms     4 ms     4 ms  12.116.227.37
 3    32 ms    32 ms    32 ms  gbr1-p90.sd2ca.ip.att.net [12.123.145.178]
 4    33 ms    33 ms    33 ms  tbr1-p013503.phmaz.ip.att.net [12.122.2.142]
 5    34 ms    33 ms    33 ms  tbr2-cl1521.phmaz.ip.att.net [12.122.10.194]
 6    32 ms    33 ms    33 ms  tbr2-cl1592.dlstx.ip.att.net [12.122.10.81]
 7    31 ms    50 ms    31 ms  gar1-p370.dlrtx.ip.att.net [12.123.16.173]
 8    31 ms    31 ms    31 ms  12.119.136.14
 9    31 ms    31 ms    31 ms  te9-1.dsr02.dllstx3.theplanet.com [70.87.253.22]
10    37 ms    37 ms    37 ms  vl41.dsr01.dllstx4.theplanet.com [70.85.127.83]
11    31 ms    37 ms    31 ms  gi1-0-1.car16.dllstx4.theplanet.com [67.18.116.67]
12    32 ms    32 ms    32 ms  70.87.98.130
Trace complete.

How to read it

Each line in the traceroute represents a network step between the source machine and the destination machine. In the traceroute above, there were a total of 12 steps required to get my network traffic from my laptop to <yourinstancename>.service-now.com.

  • The leftmost column is the step number
  • The next three columns are round-trip latency measurements (three probes are sent per step so you can judge the average)
  • The last column is the machine we're hopping to

For example, from rows #1 and #2 above, I can tell:

 1     1 ms     1 ms     1 ms  12.192.116.193
 2     4 ms     4 ms     4 ms  12.116.227.37

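Row 1 shows that the first step, 12.192.116.193, answered in about 1 ms. Row 2 shows that the round trip from my machine to the second step, 12.116.227.37, took about 4 ms (on average). Each time is measured from the source machine, not from the previous step, so the step from row 1 to row 2 added roughly 3 ms.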

What to look for

Generally with a traceroute you're looking for individual steps that take a long time (like 500 ms for a particular hop). You're also looking for steps that show an asterisk (*) instead of a step time, e.g.,

 1     100 ms   *        500 ms  12.192.116.193

The asterisk indicates that a particular packet failed to make it, which can indicate network problems on that particular hop. Note that this is also what you'll see if that particular router is set to not forward ICMP, so this can potentially be a false alarm if all three latency times for a step are asterisks.
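For long traces it can help to scan the output automatically. The sketch below assumes the Windows tracert layout shown in the sample output above (step number, three probe columns, then the host); the function names and the 500 ms default are illustrative choices, not part of any tool.

```python
import re

# One probe column is either an asterisk or a time such as "31 ms" or "<1 ms".
_PROBE = r"(\*|<?\d+\s*ms)"
_HOP = re.compile(rf"^\s*(\d+)\s+{_PROBE}\s+{_PROBE}\s+{_PROBE}\s+(.+?)\s*$")


def parse_hop(line):
    """Parse one tracert line into (step, [ms or None, ...], host), or None."""
    match = _HOP.match(line)
    if match is None:
        return None  # header, footer, or blank line
    probes = []
    for raw in match.group(2, 3, 4):
        if raw == "*":
            probes.append(None)  # this probe got no reply
        else:
            probes.append(int(re.sub(r"\D", "", raw)))
    return int(match.group(1)), probes, match.group(5)


def suspicious_hops(lines, slow_ms=500):
    """Return (step, host, reasons) for steps with lost or slow probes."""
    flagged = []
    for line in lines:
        hop = parse_hop(line)
        if hop is None:
            continue
        number, probes, host = hop
        times = [ms for ms in probes if ms is not None]
        lost = len(probes) - len(times)
        reasons = []
        if lost == len(probes):
            # All three probes lost: possibly just a router that filters ICMP.
            reasons.append("no replies")
        elif lost:
            reasons.append("packet loss")
        if times and max(times) >= slow_ms:
            reasons.append("slow")
        if reasons:
            flagged.append((number, host, reasons))
    return flagged


print(suspicious_hops([" 1     100 ms   *        500 ms  12.192.116.193"]))
```

Steps reported as "no replies" are kept separate from partial packet loss because, as noted above, three asterisks may simply mean the router does not answer ICMP.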

Browser settings affect performance

Compression is important

Modern web pages get pretty big. For example, the home page on www.cnn.com is about 104k, while Amazon.com's is about 150k. ServiceNow's pages are no different from anybody else's, and as such they run the gamut from fairly small (10k or so for the login page) to quite large (> 500k for a list of 100 incidents with many columns).

In order to speed performance, most browsers have the ability to accept compressed data from an application server so that we don't have to send a full 500k of data over the wire. Instead, the browser indicates "I can accept compressed data if you can send it". The app server will then compress the response, taking our aforementioned 500k document down to about 20k.

Compression is enabled by default on all ServiceNow application servers, which means that we'll always send you compressed data if your browser tells us it'll accept it. There are browser settings that dictate whether or not your browser will inform us properly that it's willing to accept compressed responses.
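In HTTP terms, the browser signals its willingness with the Accept-Encoding request header, and the server marks a compressed response with Content-Encoding. To get a feel for why repetitive HTML compresses so well, the following minimal sketch builds a synthetic list page and compresses it with gzip. The markup is made up for illustration and is not real ServiceNow output, so the ratio you see will differ from the figures quoted above.

```python
import gzip


def build_list_page(rows=100):
    """Build a synthetic, repetitive HTML table standing in for a list page."""
    cells = "".join(
        f"<tr class='list_row'><td>INC{n:07d}</td>"
        f"<td>Sample short description {n}</td><td>Closed</td></tr>\n"
        for n in range(rows)
    )
    return f"<html><body><table>\n{cells}</table></body></html>".encode("utf-8")


page = build_list_page()
compressed = gzip.compress(page)
ratio = len(compressed) / len(page)
print(f"original: {len(page)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.1%}")
```

The savings come from the repeated tags and attribute names in every row, which is also why large lists benefit more from compression than small pages such as the login page.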

IE 6 and 7

To make sure your browser asks for compressed data, bring up Tools -> Internet Options and make sure that the following two checkboxes are set in the Advanced tab (HTTP 1.1 settings subsection):

  • Use HTTP 1.1
  • Use HTTP 1.1 through a proxy server

image:InternetOptions1.png

Frequently, though, it is a proxy or edge device in the customer environment that disables gzip compression. Re-enabling gzip compression on that device also speeds up interactions.

Caching items from https locations is essential

Your organization may have an Internet Explorer policy to never cache items from an https location. Such a policy causes each and every interaction to re-fetch a large amount of JavaScript and images from our server.

IE has an option that reads, "Do not save encrypted pages to disk". The Microsoft default for this option is off, and for good reason. If you do not cache https pages, then each and every interaction with the server must re-fetch a large amount of JavaScript and images. This will absolutely kill your performance.

If you are able to test with the option turned off, you should see response times similar to Firefox once your cache is loaded.

Here is a Microsoft article on the topic: http://support.microsoft.com/kb/260650

IE 6 and 7

This option is in Tools -> Internet Options in the Advanced tab (Security subsection):

image:InternetOptions2.png


Cache Effects on Performance

Anytime you purge and rebuild the instance cache there is a performance degradation. Avoid or minimize actions during core business hours that cause a purge and rebuild of the instance cache such as: