Saturday, August 27, 2005

A Strange Situation With Buffered Socket Streams

A common technique for buffering a socket's output stream is to wrap it in a BufferedOutputStream:

OutputStream os = new BufferedOutputStream(socket.getOutputStream(), BUF_SIZE);

where BUF_SIZE is the size of the buffer to be used. This works great, except in one strange situation where the buffering quietly stops doing its job.

Before I go any further, I would like to stress that this information is only useful to people for whom network performance is critical. In applications where it is not, it is best to leave things as they are. Also, this concerns java.io streams, not NIO, and the problem might not arise with NIO. If anyone knows whether it does, please post it in the comments.

Consider an application that writes a small amount of data (say, the length of the data to follow, just a few bytes) followed by a larger chunk, over and over. HTTP chunked encoding is a good example. BufferedOutputStream's write(byte b[], int off, int len) implements buffering as follows:

0  if (len >= buf.length) {
1      /* If the request length exceeds the size of the output buffer,
2         flush the output buffer and then write the data directly.
3         In this way buffered streams will cascade harmlessly. */
4      flushBuffer();
5      out.write(b, off, len);
6      return;
7  }
8  if (len > buf.length - count) {
9      flushBuffer();
10 }
11 System.arraycopy(b, off, buf, count, len);
12 count += len;

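The if conditions on lines 0 and 8 handle data that does not fit into the buffer. Line 0 catches a write at least as large as the whole buffer (flush, then write straight to the underlying stream); line 8 catches a write that does not fit into the space left (flush, then copy into the emptied buffer).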

Now consider our case:
a) The application writes 4 bytes; the buffer now holds 4 bytes.
b) The application writes a larger chunk that does not fit into the remaining space. Instead of topping up the buffer, BufferedOutputStream flushes it (writing only 4 bytes to the socket stream) and then either writes the chunk directly (line 0) or copies it into the emptied buffer (line 8).

If this repeats, the socket stream sees writes of size 4, CHUNK_SIZE, 4, CHUNK_SIZE, and so on. (This happens whenever CHUNK_SIZE is greater than BUF_SIZE - 4.) The net effect is that the advantage of buffering is lost.
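To see the pattern for yourself, here is a small sketch that puts a JDK BufferedOutputStream in front of a fake "socket" stream and records the size of every write() that reaches it. RecordingOutputStream, run() and the sizes 1024 and 1022 are hypothetical choices for illustration; the first call exercises line 0 (chunk at least as large as the buffer) and the second exercises line 8 (chunk just over BUF_SIZE - 4).

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class WriteSizeDemo {

    /** Hypothetical stand-in for a socket stream: discards the data, records the size of each write() call. */
    static class RecordingOutputStream extends OutputStream {
        final List<Integer> writeSizes = new ArrayList<>();

        @Override
        public void write(int b) {
            writeSizes.add(1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeSizes.add(len);
        }
    }

    /** Writes a 4-byte header followed by a chunk, 'rounds' times; returns the write sizes seen downstream. */
    static List<Integer> run(int bufSize, int chunkSize, int rounds) throws IOException {
        RecordingOutputStream sink = new RecordingOutputStream();
        OutputStream os = new BufferedOutputStream(sink, bufSize);
        byte[] header = new byte[4];          // e.g. the length of the chunk that follows
        byte[] chunk = new byte[chunkSize];
        for (int i = 0; i < rounds; i++) {
            os.write(header);
            os.write(chunk);
        }
        os.flush();
        return sink.writeSizes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run(1024, 1024, 2)); // [4, 1024, 4, 1024]: chunk written directly (line 0)
        System.out.println(run(1024, 1022, 2)); // [4, 1022, 4, 1022]: chunk copied, but buffer flushed early (line 8)
    }
}
```

In both cases every 4-byte header reaches the socket stream as a write of its own.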

If the application cannot control CHUNK_SIZE, one solution is to override write in BufferedOutputStream so that the chunk first tops up the buffer, a full BUF_SIZE worth of data is written to the socket, and the remaining CHUNK_SIZE - (BUF_SIZE - 4) bytes are copied into the buffer.
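A minimal sketch of that override follows, assuming a subclass is acceptable in your code base. FillingBufferedOutputStream is a hypothetical name; it relies on BufferedOutputStream's protected buf and count fields and FilterOutputStream's protected out field. It writes a partly filled buffer only from flush(), and when the buffer is empty it sends whole multiples of the buffer size directly, without copying. RecordingOutputStream and run() are again hypothetical test helpers.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class FillingBufferDemo {

    /** Hypothetical variant that tops up its buffer from the incoming chunk and only sends full buffers. */
    static class FillingBufferedOutputStream extends BufferedOutputStream {

        FillingBufferedOutputStream(OutputStream out, int size) {
            super(out, size);
        }

        @Override
        public synchronized void write(byte[] b, int off, int len) throws IOException {
            if (off < 0 || len < 0 || len > b.length - off) {
                throw new IndexOutOfBoundsException();
            }
            while (len > 0) {
                if (count == 0 && len >= buf.length) {
                    // Buffer is empty: send whole multiples of the buffer size directly, without copying.
                    int direct = len - (len % buf.length);
                    out.write(b, off, direct);
                    off += direct;
                    len -= direct;
                } else {
                    // Top up the buffer; send it only once it is completely full.
                    int n = Math.min(buf.length - count, len);
                    System.arraycopy(b, off, buf, count, n);
                    count += n;
                    off += n;
                    len -= n;
                    if (count == buf.length) {
                        out.write(buf, 0, count);
                        count = 0;
                    }
                }
            }
        }
    }

    /** Hypothetical stand-in for a socket stream: discards the data, records the size of each write() call. */
    static class RecordingOutputStream extends OutputStream {
        final List<Integer> writeSizes = new ArrayList<>();

        @Override
        public void write(int b) {
            writeSizes.add(1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeSizes.add(len);
        }
    }

    /** Writes a 4-byte header followed by a chunk, 'rounds' times; returns the write sizes seen downstream. */
    static List<Integer> run(int bufSize, int chunkSize, int rounds) throws IOException {
        RecordingOutputStream sink = new RecordingOutputStream();
        OutputStream os = new FillingBufferedOutputStream(sink, bufSize);
        byte[] header = new byte[4];
        byte[] chunk = new byte[chunkSize];
        for (int i = 0; i < rounds; i++) {
            os.write(header);
            os.write(chunk);
        }
        os.flush();
        return sink.writeSizes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run(1024, 1024, 2)); // [1024, 1024, 8]
        System.out.println(run(1024, 3000, 1)); // [1024, 1024, 956]
    }
}
```

Every write except the final flush is now exactly BUF_SIZE bytes. The price is an extra array copy of up to BUF_SIZE bytes per chunk to top up the buffer, which is the trade-off between copies and flushes that the stock implementation appears to avoid.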


rs said...

A very nice observation!

I wonder why the author, Arthur, chose to implement BufferedOutputStream that way. Perhaps he assumed that two array copies are more expensive than two flushes?!

Maybe you should log a bug with Sun and see their response!


Anonymous said...

Can you write this same thing in English?