I have been spending copious amounts of time trying to speed up our Web server. The results have been heartening and the experience typical of performance engineering. It's one of the most amazing activities in software development: incredibly frustrating when you cannot speed something up, but an adrenaline rush when you can, and the first person I bother when it works is poor Rajiv.
I am trying to log the various performance enhancements - some are just tips and some are things I discovered or used. Some you know, perhaps, some you don't.
- Socket Writes : The fewer the better. Buffer your socket writes so you never inadvertently write multiple times to the socket. Copying to the buffer is much less expensive than a socket write, especially for small data.
- Socket Reads : Similarly, read as much as you can from the socket in one shot. Smaller reads will result in lower performance. These rules are true of File streams as well.
- Socket Connections : Again, the fewer the better. Creating a connection is extremely expensive. You can serve many more requests per second over a kept-alive connection than when you have to create a connection for each request.
- Socket properties : Setting socket properties is expensive, e.g. java.net.Socket.setSoTimeout. Avoid setting socket properties per request; see if you can set them once per connection instead [a kept-alive connection carries multiple requests].
- String Operations : If you are writing extremely performance-sensitive code such as a highly benchmarked web server :-) then keep Strings to a minimum. I cannot stress this enough. If you have a large character stream that you need to tokenize, parse, etc., use references into character arrays to reduce String operations. Create a class that takes a reference to this character array, a start offset and a length, and you can create lots of instances of this class to point to pieces of the character stream instead of String objects, which make copies of the characters. String concatenations can be quite expensive too.
- Avoid data copies : This is obvious, isn't it? However, so many methods try to be safe by making data copies that it happens without one even knowing it. Examples are String and ByteArrayOutputStream.toByteArray(). Don't get me wrong, there's nothing wrong with these classes; it's just that sometimes one doesn't realize that these methods are making data copies, which affects performance.
- Set instead of List : A common programming mistake is choosing the wrong data structure for contains(). List.contains() is an O(n) operation, whereas HashSet.contains() is O(1) on average. So, if order does not matter to you, use a Set.
- Object Pooling : Some objects are expensive to create, e.g. large arrays. Use pools of these objects so they can be reused, thereby avoiding GC lags and, primarily, creation costs. Interestingly, PrintWriter, if pooled, shows considerable performance improvement. The creation of the object is expensive because of a call to get the line separator in its constructor.
- Buffering : You can buffer at unexpected places. For example, you may have a logger thread that asynchronously logs certain frequently running activities (e.g. access logs), and notifying the writing thread every time will be expensive. It might help considerably to collect a few entries and then notify the thread.
- Integer.toString() is much, much faster than integer + "".
- try-finallies result in interesting performance degradation in Sun JVMs. Read about it here and Rajiv's excellent follow-up on it here.
- Lazy instantiation : Don't create something until you need it.
- The usual optimizations always apply - loop optimizations, moving loop-invariant code out of the loop, unused variables, repetitive processing, etc.
- Perceived performance : [I would love to collect my thoughts on that one sometime; we have had some very interesting experiences with perceived performance enhancements over the years.] The sooner you respond so that the browser starts to refresh [if memory serves me right, a 50-millisecond response is perceived as instantaneous by a human being], the more responsive the web server looks, even though the total time for the response might be the same as if all the data were returned in a single write.
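
To make the Socket Writes tip concrete, here is a minimal sketch. It is not our server's code: CountingOutputStream is a hypothetical stand-in for a socket's output stream that simply counts how many times the underlying write is invoked, and the response text and the 8192-byte buffer size are made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a socket stream: counts calls instead of sending bytes. */
final class CountingOutputStream extends OutputStream {
    int writeCalls = 0;

    @Override public void write(int b) { writeCalls++; }

    @Override public void write(byte[] b, int off, int len) { writeCalls++; }
}

final class BufferedWriteDemo {
    /** Writes a tiny response in four separate pieces, as careless code might. */
    static void writeResponse(OutputStream out) throws IOException {
        out.write("HTTP/1.1 200 OK\r\n".getBytes(StandardCharsets.US_ASCII));
        out.write("Content-Length: 2\r\n".getBytes(StandardCharsets.US_ASCII));
        out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
        out.write("ok".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    /** Number of underlying writes when every piece goes straight to the "socket". */
    static int countUnbuffered() {
        try {
            CountingOutputStream sink = new CountingOutputStream();
            writeResponse(sink);
            return sink.writeCalls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of underlying writes when the pieces are collected in a buffer first. */
    static int countBuffered() {
        try {
            CountingOutputStream sink = new CountingOutputStream();
            writeResponse(new BufferedOutputStream(sink, 8192));
            return sink.writeCalls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a real connection you would wrap socket.getOutputStream() the same way and flush once per response.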
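
The String Operations tip might look something like the sketch below. CharSlice and its split helper are hypothetical names invented for this example, not classes from our server; the point is only that each token is a (buffer, start, length) view and no characters are copied until toString() is called.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical read-only view over part of a shared char[]; copies nothing. */
final class CharSlice implements CharSequence {
    private final char[] buf;
    private final int start;
    private final int len;

    CharSlice(char[] buf, int start, int len) {
        if (start < 0 || len < 0 || start > buf.length - len) {
            throw new IndexOutOfBoundsException();
        }
        this.buf = buf;
        this.start = start;
        this.len = len;
    }

    @Override public int length() { return len; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= len) throw new IndexOutOfBoundsException();
        return buf[start + index];
    }

    @Override public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > len || from > to) throw new IndexOutOfBoundsException();
        return new CharSlice(buf, start + from, to - from);
    }

    /** This is the only place characters are copied; call it only when a String is required. */
    @Override public String toString() { return new String(buf, start, len); }

    /** Splits the buffer on a separator, producing views rather than Strings. */
    static List<CharSlice> split(char[] buf, char sep) {
        List<CharSlice> out = new ArrayList<>();
        int tokenStart = 0;
        for (int i = 0; i <= buf.length; i++) {
            if (i == buf.length || buf[i] == sep) {
                out.add(new CharSlice(buf, tokenStart, i - tokenStart));
                tokenStart = i + 1;
            }
        }
        return out;
    }
}
```

For example, CharSlice.split("GET /index.html HTTP/1.1".toCharArray(), ' ') yields three views over the same array. The trade-off is that the views keep the whole buffer alive, so this suits short-lived parsing such as a request line.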
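
For the Set instead of List tip, a small illustration. MethodLookup and its list of HTTP methods are made up for this example; both lookups return the same answers, but the list version scans every element while the HashSet version hashes straight to the entry.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical lookup of allowed HTTP methods, done two ways. */
final class MethodLookup {
    private static final List<String> AS_LIST =
            Arrays.asList("GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS");

    // Built once; order is irrelevant for a membership test.
    private static final Set<String> AS_SET = new HashSet<>(AS_LIST);

    /** O(n): walks the list comparing each element. */
    static boolean allowedViaList(String method) {
        return AS_LIST.contains(method);
    }

    /** O(1) on average: a single hash lookup. */
    static boolean allowedViaSet(String method) {
        return AS_SET.contains(method);
    }
}
```

With six elements the difference is negligible; it starts to matter as the collection grows or the check sits on a hot path.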
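
And a rough sketch of the Object Pooling tip for large arrays. BufferPool is a hypothetical class written for this post, with deliberate simplifications: the pool is unbounded, and buffers are not cleared on release, so callers must not assume zeroed contents.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical pool of fixed-size byte arrays, safe to share between threads. */
final class BufferPool {
    private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
    private final int bufferSize;

    BufferPool(int bufferSize) {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        this.bufferSize = bufferSize;
    }

    /** Reuses a pooled buffer if one is available, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] b = free.poll();
        return (b != null) ? b : new byte[bufferSize];
    }

    /** Returns a buffer to the pool; buffers of the wrong size are simply dropped. */
    void release(byte[] b) {
        if (b != null && b.length == bufferSize) {
            free.offer(b);
        }
    }
}
```

Typical use is acquire() at the start of a request and release() in a finally block at the end. Whether this beats plain allocation depends on the JVM and the object size, so measure before and after.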