Bug ID: JDK-4131655 java.io.InputStreamReader performance: Factor of five speed penalty

See the attachment for the code that generated these measurements; you may
need to comment out the test for "UTF16Reader" to compile and run it.  The
test data is generated by the "gen.java" file attached to bug 4131647,
which turned up its own set of bugs ... Note that the custom readers took
about half an hour to write and debug.  Admittedly, decoding UTF-8 will be
slower than decoding UTF-16, but that does not justify a FACTOR OF FIVE
(or more) difference in speed.

---------------------------
From xxxx Sat Apr 18 15:56:36 1998
To: xxxx
Subject: Reader performance
Cc: xxxx

You'd asked for numbers when I asked you about performance problems in
the Reader/Writer framework, and here are some ugly ones.

Each of these (single) runs read 1M chars of XML data (basically, this
was randomly generated UNICODE, with some XML framing) from files cached
in memory.  The "read" loop was "read a 1K block, then read 512 characters
one at a time" until the end of the data was reached.
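
For anyone reproducing the measurement, the access pattern above can be
sketched in Java as follows.  This is not the attached test code.  The
class name MixedReadLoop and the method drain() are hypothetical, and the
sketch only counts characters; timing is left to the caller.

```java
import java.io.IOException;
import java.io.Reader;

public class MixedReadLoop {
    /**
     * Hypothetical sketch of the benchmark loop: read one 1K block, then
     * read 512 characters one at a time, repeating until end of data.
     * Returns the total number of characters consumed.
     */
    public static long drain(Reader r) throws IOException {
        char[] block = new char[1024];
        long total = 0;
        while (true) {
            // Bulk read; may return fewer than 1024 chars near the end.
            int n = r.read(block, 0, block.length);
            if (n < 0) return total;
            total += n;
            // Character-at-a-time reads: the path the report says is slow.
            for (int i = 0; i < 512; i++) {
                if (r.read() < 0) return total;
                total++;
            }
        }
    }
}
```

To time it, wrap the same bytes in, for example,
new InputStreamReader(in, "UTF-16LE") and in a custom reader, then call
drain() on each.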

    InputStreamReader, "UnicodeLittle"  16.34 sec (JDK 1.1.5)
    InputStreamReader, "UnicodeLittle"  17.94 sec (JDK 1.2 beta4)

    Custom "UnicodeLittleReader"         3.86 sec (JDK 1.1.5)
    Custom "UnicodeLittleReader"         3.77 sec (JDK 1.2 beta4)

    InputStreamReader, "UTF8"           24.82 sec (JDK 1.1.5)
    InputStreamReader, "UTF8"           25.63 sec (JDK 1.2 beta4)

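The size of the penalty can be read straight off the measurements above.
Divide each InputStreamReader time by the custom UnicodeLittleReader time
on the same JDK.  As in the report itself, the UTF-8 rows are compared
against the UTF-16 custom reader.  The ratios do not depend on the units.

```latex
\begin{aligned}
\text{UnicodeLittle:}\quad & \tfrac{16.34}{3.86} \approx 4.2 \ (\text{JDK 1.1.5}),
  & \tfrac{17.94}{3.77} \approx 4.8 \ (\text{JDK 1.2 beta4}) \\
\text{UTF8:}\quad & \tfrac{24.82}{3.86} \approx 6.4 \ (\text{JDK 1.1.5}),
  & \tfrac{25.63}{3.77} \approx 6.8 \ (\text{JDK 1.2 beta4})
\end{aligned}
```

So the penalty runs from roughly 4.2x to 6.8x.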
The custom reader does the obvious things.  Notably, it doesn't allocate a
garbage character array on each character-at-a-time read, and it adds no
superfluous method-call overhead for block reads.  The character converter
object framework seemingly precludes both.
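
A reader along those lines might look like the minimal sketch below.  It
is not the attached UnicodeLittleReader.  SimpleUtf16LEReader is a
hypothetical name, and the sketch assumes headerless little-endian UTF-16.
It does no byte-order-mark handling and no surrogate validation, and it
silently drops a trailing odd byte.  For contrast, java.io.Reader's
default read() allocates a one-element char array and delegates to the
block read on every call.  This class overrides read() so that neither
path allocates.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

/** Hypothetical sketch: little-endian UTF-16 decoding with no per-call allocation. */
public class SimpleUtf16LEReader extends Reader {
    private final InputStream in;
    private final byte[] buf = new byte[8192]; // reused for every read
    private int pos = 0;   // index of next unread byte in buf
    private int limit = 0; // number of valid bytes in buf

    public SimpleUtf16LEReader(InputStream in) {
        this.in = in;
    }

    // Ensures at least two bytes are buffered; returns false at end of stream.
    private boolean fill() throws IOException {
        if (limit - pos >= 2) return true;
        int left = limit - pos;        // 0 or 1 leftover bytes
        if (left == 1) buf[0] = buf[pos];
        pos = 0;
        limit = left;
        while (limit < 2) {
            int n = in.read(buf, limit, buf.length - limit);
            if (n < 0) return false;   // an odd trailing byte is dropped
            limit += n;
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        if (!fill()) return -1;
        int c = (buf[pos] & 0xff) | ((buf[pos + 1] & 0xff) << 8);
        pos += 2;
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // Argument validation omitted for brevity.
        if (len == 0) return 0;
        if (!fill()) return -1;
        int n = 0;
        // Decode whatever is already buffered; blocking happens only in fill().
        while (n < len && limit - pos >= 2) {
            cbuf[off + n++] = (char) ((buf[pos] & 0xff) | ((buf[pos + 1] & 0xff) << 8));
            pos += 2;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

The single-character path here is a bounds check, two array loads and a
shift.  That is the kind of "obvious stuff" the report contrasts with
allocating per call.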

If the character-at-a-time reads were removed, the times were roughly five
seconds to read the Unicode via InputStreamReader and eleven for UTF-8,
while the custom reader ran about 10% faster than it had with the mixed
reads.  That is, even for bulk reads the custom reader is still on the
order of 25% faster.

For comparison, one XML parser, which doesn't use Readers because
of their performance, read ** AND PARSED ** the two files in only
two seconds more than the JDK's bulk-read cases took ...
 
It's no wonder the people designing XML APIs are steering away from
the java.io.Reader classes.  That is worrisome, since all XML
data is UNICODE.
 
- xxxx

<UPDATE>
<AUTHOR> david.brownell@Eng 1998-06-29 </AUTHOR>

Software REWRITTEN to use the bulk reads can get acceptable
performance even with this speed penalty.  In fact, I've
now done so, and the result outperforms the fastest of the
third-party XML processing engines.
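
One way to restructure consuming code around bulk reads is to pull large
blocks through read(char[], int, int) and hand out single characters from
a local array.  Below is a minimal sketch.  It is not the author's parser
code, and BlockCharSource is a hypothetical name.  java.io.BufferedReader
provides similar buffering, though each of its read() calls synchronizes
on a lock.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper: block reads underneath, cheap per-character access on top. */
public class BlockCharSource {
    private final Reader in;
    private final char[] buf;
    private int pos = 0; // next character to hand out
    private int len = 0; // number of valid characters in buf

    public BlockCharSource(Reader in, int blockSize) {
        this.in = in;
        this.buf = new char[blockSize];
    }

    /** Returns the next character, or -1 at end of stream. */
    public int next() throws IOException {
        if (pos == len) {
            int n;
            do {
                // Defensive: a well-behaved Reader never returns 0 here.
                n = in.read(buf, 0, buf.length);
            } while (n == 0);
            if (n < 0) return -1;
            pos = 0;
            len = n;
        }
        return buf[pos++];
    }
}
```

This only helps when you control the consuming code.  That is the
limitation the update goes on to raise.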

However, for other applications I still think this is a
pretty severe problem.   Not everyone has complete control
over all of their input data sources.

</UPDATE>