Saturday, August 27, 2005

A Strange Situation With Buffered Socket Stream

A common technique to buffer socket output stream is to wrap it in a BufferedOutputStream

OutputStream os = new BufferedOutputStream(socket.getOutputStream(), BUF_SIZE);

where BUF_SIZE is the size of the buffer to be used. This works great except in one strange situation where it seems to fail.

Before I get any further, I would like to stress that this piece of information would only be useful to people for who network performance is critical. In applications where it is not, it is best to leave things as they are. Also, this is not related to NIO and might not be an issue in that case. If any one is aware of this please post it in the comments.

Consider an application that writes a small amount of data (say the content length of the data to follow, just a few bytes), followed by larger chunks. (HTTP Chunked encoding is a good example). BufferedOutputStream's write(byte b[], int off, int len) has the following implementation of buffering -


0 if (len >= buf.length) {
1 /* If the request length exceeds the size of the output buffer,
2 flush the output buffer and then write the data directly.
3 In this way buffered streams will cascade harmlessly. */
4 flushBuffer();
5 out.write(b, off, len);
6 return;
7 }
8 if (len > buf.length - count) {
9 flushBuffer();
10 }
11 System.arraycopy(b, off, buf, count, len);
12 count += len;


The if conditions in lines 0 and 8 are for cases where the data being written does not fit into the buffer.

Now, consider our case -
a) Application writes 4 bytes - buffer has 4 bytes.
b) Application writes a larger chunk that exceeds buffer capacity. To prevent needless data copies, BufferedOutputStream flushes its buffer (writing only 4 bytes to the socket stream) and then either copies or writes the chunk based on the conditions in lines 0 and 8.

If this repeats then you see, on the socket stream, writes of size - 4, CHUNK_SIZE, 4, CHUNK_SIZE. (The assumption is that CHUNK_SIZE is atleast BUF_SIZE - 4 for this behaviour) - net effect - the advantage of buffering is lost.

If the application cannot control CHUNK_SIZE, one solution is to override the write in BufferedOutputStream and ensure that a full BUF_SIZE worth of data is written and the remaining (CHUNK_SIZE - BUF_SIZE) is filled in to the buffer.

Monday, May 02, 2005

Reflections On Debugging

This rather pedagogically worded article is a collection of my thoughts on debugging Java software, the programming patterns I have used, some useful API and techniques.

What it is not - it is definitely not complete in terms of information on debugging, its techniques, styles, etc. It is primarily a list of things that have worked for me time and again and a few tools that I keep in my toolkit to use when the situation demands it. I think they will be of use to you as well.

I have been fortunate to work in environments where I touched upon various facets of Java, used various APIs and generally did extremely satisfying work. In all these years, debugging has stood out as an activity that everybody has to perform almost as much as they code or design. I have noticed time and again that being able to debug well is an extremeley useful skill. It can be learnt over time and honed and it is to a large extent the ability to match problem patterns to past issues. People like Rajiv can uncannily pinpoint a problem's cause when they hear its description. This ability comes from years of experience and the intent to learn from every new debugging experience.

Debugging

Debugging is the act of locating and fixing a flaw in software. A flaw can manifest itself in multiple ways. Sometimes it is apparent such as cases when the program crashes or does not do the intended action or does not return the intended result. Sometimes it is hard to say what's wrong when a program does not return or the CPU keeps processing something or when the program does something unexpected in addition to the right action. Debugging, of course, is the action we take post having seen a flaw.

Isolating the problem to code : Identifying where to look for a problem

The problem or flaw appears as a failure of the software to do something it should have. When you encounter a flaw, to debug it, you need to form a mental model of the code to identify where is the code that failed. Debugging largely follows the process of elimination and this process is helped by any symptoms that you can find.

When you have the piece of code that failed you try and find the cause by asking and answering questions - what is occuring? What possible causes could result in this problem? E.g. If something should've happened and it didn't perhaps the code was not reached. Why would the code not be reached? Maybe the if condition under which the method gets called did not evaluate to true or perhaps the if (something != null) check was called when something had the value null.

Another example - If there is an exception then there is additional information about the location of failure and the steps that led to it. The type of Exception will tell you the nature of failure. So if you see a ConcurrentModificationException thrown by an Iterator's next() method you will have to -
a) Find out under what conditions this happens.
b) How could these conditions have been created in your program?
c) Maybe you removed something using list.remove() in your loop, or perhaps you passed reference to the list to some other thread which is modifying it.
Once you have a mental picture of the surroundings of the problem and why it might be occuring, it’s a matter of eliminating the reasons one by one starting from the most likely cause.

Even if you intend to use a debugger, this is a necessary step. You have to always backtrack mentally from the point of failure to locate all possiible causes of failure. A lot of debugging skill relies on this one ability alone.

Reading an Exception

Java Exceptions have a lot of information in them and should be a very well understood to debug problems. Often, I have noticed programmers use the following template for exceptions.
try {
// do stuff here
} catch(Exception ex) {
System.out.println(ex);
}
If you wish to print the exception to know when a problem has occurred then you must consider using ex.printStackTrace(). There are multiple advantages -
A) When using System.out.println(ex), several times there is no message printed other than the class name of the exception that occurred. Imagine having this piece of code in multiple locations, how will you ever know which catch handler printed java.lang.NullPointerException.
B) An exception when printed stands out in a log file or the console. It is several lines long and just the pattern of an exception stack trace print is so different from other message, it is much easier to find than an exception message that looks like other logging statements.
C) Following JDK1.4, chained exceptions get printed as well and you don't have to manage them manually. Root cause gets carried along with the exception.
D) Lastly and most importantly, the stack trace contains a wealth of information that can be used to create a mental picture of what happened.

There have been lots of times when I have looked at a stack trace and said - it should not have come here and been able to trace the problem to a wrong check in an earlier part of the code. You must know how to read an exception. Here's a Java exception printed out.

: Output generated by System.out.println() :
java.lang.ArrayIndexOutOfBoundsException: 0


: Output generated by ex.printStackTrace() :
java.lang.ArrayIndexOutOfBoundsException: 0
at com.sonicsw.tools.test.ThrowException.processArgs(ThrowException.java:32)
at com.sonicsw.tools.test.ThrowException.main(ThrowException.java:21)



1) java.lang.ArrayIndexOutOfBoundsException: 0
The first part of printStackTrace is to do a print out the exception similar to the System.out.println so already you have gotten that for free. This part of the exception stack trace is formatted according to the type of exception and the information printed varies from exception to exception. Some exceptions print nothing more than the class of the exception. Some exceptions (specially custom ones) print a lot of context information that led to this exception.
2) at com.sonicsw.tools.test.ThrowException.processArgs(ThrowException.java:32)
The rest of the exception is the stack trace starting with the location that threw the exception at the top and the caller of the method in which the exception was thrown below it and so on until the executing thread's run method or the main method. The information provided on this line consists of -
the fully qualified class name - com.sonicsw.tools.test.ThrowException
the method - processArgs
the file - ThrowException.java
line number - 32.
Obviously, these are great nuggets of information. In some cases, when compilation does not include debugging information, you can end up with stack traces that do not have line numbers. That’s usually a bummer but atleast you have the stack of methods to locate where the problem occurred.

Other variations for method names are <clinit> for a static initializer - (this is also an example of exception chaining - note the "Caused by:")
java.lang.ExceptionInInitializerError
Caused by: java.lang.IllegalArgumentException
at com.sonicsw.tools.test.ThrowException.<clinit>
(ThrowException.java:21)

<init> for constructors and initializers
java.lang.IllegalArgumentException
at com.sonicsw.tools.test.ThrowException.<init>
(ThrowException.java:20)
at com.sonicsw.tools.test.ThrowException.main(ThrowException.java:24)


and the $number convention for anonymous classes -
java.lang.IllegalArgumentException
at com.sonicsw.tools.test.ThrowException$1.actionPerformed(ThrowException.java:25)
at com.sonicsw.tools.test.ThrowException.
(ThrowException.java:23)
at com.sonicsw.tools.test.ThrowException.main(ThrowException.java:31)


Ofcourse, if the line number is available it makes it a lot easier.

Sometimes, to understand why an exception occurred, you have to understand how control got to that point. The easy exceptions are always the ones which have a localized problem and you can easily catch the problem by the exception. But things can get really hard. E.g. The culprit could lie several methods below and might not even be caused in this thread of execution. A previous event might have generated a bad value which was stored in a field and was picked up by this thread of execution in which the exception occurred. To get to the bottom of these, you need to create a mental back map of methods and events that might have occurred. It always helps to see what methods were called to get to the point of failure.

Trapping Exceptions

Sometimes, something you expected to execute does not occur and there is no information on why it failed. For some reason, control got transferred out of your method. Once you have eliminated any if conditions that are failing it could be an exception getting thrown from somewhere deep inside your code. No exception gets printed though. This usually happens, when you are implementing a piece that fits into a framework. Perhaps, the framework has a logging switch that’s set too low for exceptions to get printed. Or perhaps, the framework is faulty and is not printing exceptions being thrown by overridden methods. Perhaps, the framework is failing because of the unexpected exception being thrown by your method. In such cases, it is best to eliminate this possibility by wrapping the entire method by a try-catch block. For such debugging situations, I prefer to wrap with a try … catch(Throwable t) block because you want to be sure that no Exception or Error is being thrown. Sometimes, when Errors get thrown (such as a NoClassDefFoundError because of a faulty classpath) it will slip through all catch(Exception) blocks.

Debugging is often an exercise in eliminating possibilities and locating the faulty piece of code. This is one technique to achieve that.

When to use Thread.dumpStack

Another neat tool to keep in your toolkit is using Thread.dumpStack() or equivalently new Exception().printStackTrace(). What either of these methods do is to do a printStackTrace() at this line without actually throwing an exception. The usual reason you do this is because you want to know what caused control of execution to come to this point. Problems caused by a method being called unexpectedly can be identified by this.

Using VM Thread dumps and understanding them

My biggest complaint with a lot of experienced Java developers is that they have never heard of the most amazing debugging tool called the VM Thread dump. You can use this technique in innovative ways -
A) To detect deadlocks
B) To diagnose UI hanging problems
C) To diagnose slow UI issues
D) To diagnose spinning/infinite loops
E) For quick and dirty profiling
F) To get an understanding of what the VM is doing at that instant

I cannot do better than this excellent article on this topic available here.

Classpath problems

Another class of problems are classpath issues. There are times when you are not sure if there is another version of a class in the classpath that is getting picked up before yours. Usually a result of a bad environment setup. To eliminate this possibility a simple check is to add a print statement to see if your new code gets picked up. If it isn't getting picked up you need to locate the other class that is getting picked up. One neat API in Java that allows you to locate where a class is being picked up is

Class.getProtectionDomain().getCodeSource().getLocation()

In most cases, depending on the class loader being used you will get the location of the class which is being executed and you can correct your environment setup.

When to use a debugger and when to print

There are times when you should use a debugger and there are times when a print statement is more useful. When you can use what is very well described by the brilliant pieces of work mentioned in the references section. The point I want to make here is that you must know what can be achieved using a debugger. It is an extremely powerful tool and can reduce debugging time quite a bit. It is applicable in many cases but is not suitable for a certain set of problems such as threading issues or issues that show up in long running tasks in which case a log with good print statements is essential.

Good debug printing

Once you have a mental check-list of what the causes might be, you need to eliminate them. Debug prints can tell you quickly if your assumption is right or wrong. Ensure that your debug prints are not causing any side effects inadvertantly. E.g.

System.out.println("value.getCode() = " + value.getCode());
if (value == null)
return;
switch(value.getCode()) {

}

It is important to print sufficient information about the object you are interested in. Perhaps your code is falling through a switch statement without firing any of the case clauses. You will need to print the value of the switch condition. Think a little before you decide to type in any debug print statements. Often when using good debug print statements, the code gets peppered at useful locations with debug statements which can be switched off with a boolean for future use. An easy to use pattern is

private static boolean DEBUG = Boolean.getBoolean("<classname>.debug").booleanValue();

So e.g. if the class is com.test.ArgumentsProcessor then you would write
private static boolean DEBUG = Boolean.getBoolean("com.test.ArgumentsProcessor.debug").booleanValue();

The advantage is that you can switch on debugging for this class without recompiling anything by specifying -Dcom.test.ArgumentsProcessor.debug=true when starting your VM.
E.g. java -Dcom.test.ArgumentsProcessor.debug=true <main-class> <args>

However, the compiler will not remove your debug statements during code optimization in this case.

Using Logging

There is not much that I can add on using Logging that is not already covered by a vast amount of material. You can take a look at the References section below for other information on this subject. The point I would like to make here is that when you are writing print statements that go to a log for debugging purposes, maybe as a patch to a customer to diagnose a particularly tricky problem, think about how the log file might look when it is sent to you. What we think the output would be like when we write log print statements changes dramatically in a live system with multiple threads executing the same log messages. Maybe you need to print the Thread id to bunch all logs or an operation together or perhaps some business data structure id needs to printed with each statement to understand what's happening. You may need to print out the execution path leading to the suspected problem location to know what conditions caused control flow to get there. Another thing to keep in mind is that log files can get so verbose and have so many messages that it becomes very difficult to scan them for problems later.

References

The Art of Programming (Brian Kernighan, Rob Pike) Highly recommended reading
How to be a Programmer (Robert L Read) The first chapter is on debugging and is really well written.

Friday, April 22, 2005

Swing fighting back

Being a veteran Swing programmer and a recent Eclipse plugin developer, I seem to have seen either side of the UI toolkit war.

I have a few interesting observations I would like to share.

Swing Knowledge
We have been trying to hire a few developers with past UI experience (Swing experience is more easily available) but I am apalled at how little the people we have talked to know about Swing. Most people seem to hack in a UI and know just about enough to manage to get the UI working. How a renderer works and what you can and cannot do with it is not understood. Neither did these guys understand that you can write your own TableModel implementation and wire it to your data model and it doesn't have to always be backed by ArrayLists or arrays. I agree that these are not the most knowledgeable Swing programmers out there but when they see a DefaultTableModel they just do whatever they need to to populate it and get their JTable working.

JFace (MV framework over SWT) on the other hand enforces an interface such as a IContentProvider (this has its own limitations though) which needs to be implemented and keeps the model to itself. This is interesting because people then don't think in terms of populating Vectors but consider returning their business objects. I suspect that this might really help in writing better models to their UI. I haven't been able to verify this because of the complete lack of SWT profiles that we have received.

Swing's fight back
I have done a lot of Swing programming in the past and this goes past the regular hack-a-UI. I have been involved in several interesting Swing UIs (IDEs, applications, etc). I have had my share of issues with Swing but over time you learn to work around them. SWT definitely looks good and there are enough articles describing the virtues and problems with either. What is really nice is to see how strongly Swing is fighting back. Several Swing developers are openly defending false accusations (primarily performance related) and there is serious effort at fixing other issues that have remained unfixed over years (probably because there was always something more important to do).
  • NetBeans is being treated by Sun as their flagship Swing product and the competition has really done it (and Swing) a lot of good.
  • Take a look at the new Desktop Java features.
  • Some very important perceived performance fixes such as this.
Swing Performance has been steadily improving and this is an interesting comment on the EclipseWiki
For most of the time that Eclipse was under development, all of the above were very true. It has only been a very recent development (~JDK 1.4) that Swing has gotten fast enough to be acceptable. The look and feel emulations built into Swing are still lacking in terms of platform fidelity (although this is mitigated somewhat by the existence of high-quality 3rd party Swing look and feels).
available here.

All in all I think we'll see very strong improvements in Swing over the next few releases. I am really looking forward to it.

Swing Image
What has still not happened and I am hoping that it will be fixed in the future is that Sun should make sure that any UI they dole out is really good and this also goes for things like Windows Control Panel Java Applet which doesn't look very polished. Sun cannot have any excuses for shipping badly designed UI. Swing's image is affected by such UI and it can potentially undo (to an extent) all the hard work being done.


Check this list of Swing best practises out.