[H-GEN] Which is better?

Adrian Sutton adrian at intencha.com
Tue Apr 27 21:10:48 EDT 2004


> It turns out the same thing affects C#.  It is particularly noticeable
> in things like bringing up a form, an operation that does 1000's of
> memory allocations.  The interesting thing about C# is that you can't
> blame the GUI this time - it's all done by Windows.  If you assume the
> JIT is as good as Java's (that possibly is a big assumption), then that
> only leaves one thing, doesn't it?  Memory allocation.

It leaves a heck of a lot more than memory allocation.  Firstly, if the 
GUI is entirely done by Windows then memory allocation isn't a factor, 
because it's all handled by the same code that's used if you write the 
application in C.  Secondly, converting types from one language to 
another incurs a major performance hit.  In this case, you have to 
bridge between .Net's runtime and the native Windows libraries; with 
Java you have to use JNI.  Both incur a performance penalty that would 
be large enough to explain the slower speed.  The only way to find out 
what is actually taking the time is to profile the application in 
question.

>
>> If you haven't profiled it, you don't know what you're talking about.
>
> Excellent advice.  So lets profile it, shall we?  God knows, there have
> been enough people preaching about "premature" optimisation on this
> thread, so lets actually do some testing rather than just waffling on.
> At the very end of this post is a java program.  It runs some tests and
> times them.  Under java 1.4.1 this is its output:
>
>   dim unoptimised:     32212
>   dim optimised:        1482
>   format unoptimised: 343447
>   format optimised:    90374
>   Array unoptimised:    1570
>   Array harry:          1439
>   Array aj:             1456
>   Array david1:         1432
>   Array david2:         1457
>   Array david3:         1523

None of these times mean anything.  Firstly, you're benchmarking, not 
profiling.  Profiling is the process of objectively measuring the time 
a program spends in particular sections in order to identify 
performance problems.  Benchmarking, on the other hand, is the process 
of writing simple, non-real-world tests and timing them.  Benchmarks 
are not useful, particularly with Java.

The reason benchmarks are so useless with Java, and the reason your 
results are completely meaningless, is that the behaviour of the 
garbage collector is completely uncontrolled.  In your tests you are 
creating a whole bunch of objects but not controlling when, or if, the 
garbage collector will run.  Thus, at any point in any of the tests the 
garbage collector may kick in for either a partial sweep or a full 
sweep, which will most likely dwarf the time taken actually running the 
test.  Further to this, the JIT may kick in at any time and completely 
change the actual code that's being run.  Also consider that the first 
test to run must load the relevant classes, whereas the later tests do 
not pay that cost because the classes are already loaded.  Finally, add 
in the fact that System.currentTimeMillis() is notoriously inaccurate 
(its granularity can be tens of milliseconds on some platforms).
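If you must micro-benchmark, you can at least reduce some of these effects.  Here's a minimal sketch, assuming a modern JVM with System.nanoTime() available; the class and method names are mine, and the numbers it prints still shouldn't be trusted as anything more than rough indications:

```java
public class WarmedBenchmark {
    // The code under test.  It returns its result so the JIT can't
    // eliminate the loop as dead code.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up: run the code enough times that the JIT compiles it
        // before we start timing.
        for (int i = 0; i < 10; i++) {
            work();
        }
        // A hint only -- the GC may still run in the middle of a test.
        System.gc();
        long best = Long.MAX_VALUE;
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            long result = work();
            long elapsed = System.nanoTime() - start;
            if (elapsed < best) best = elapsed;
            System.out.println("run " + run + ": " + elapsed
                    + " ns (result " + result + ")");
        }
        System.out.println("best: " + best + " ns");
    }
}
```

Even with the warm-up, the GC and the JIT can still interfere at any point, which is exactly why profiling the real application beats benchmarks.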


The actual times you've received show no significant difference between 
any of the array tests.  The format tests are really a test of doing a 
whole lot of string parsing compared to doing no string parsing 
(surprise: doing no string parsing is faster).  That leaves the 
Dimension tests.  Now, if you were an optimising compiler and came 
across the code:

>     for (int i = 0; i < 10000; i += 1) {
>         continue;
>     }

Would you run it?  No - this code will most likely be removed entirely 
once the JIT kicks in on that piece of code.  Thus, you're comparing 
creating 10000 Dimension objects to doing nothing, and the result is no 
surprise.  Further to this, in the first test you are benchmarking both 
the time required to perform memory allocation (your intended 
benchmark) and the time required to garbage collect.  Depending on the 
size of the young generation you may well have overflowed it and 
spilled into the old generation, requiring a full GC run rather than 
the faster young-generation collection.

Worst of all, the dim unoptimized test must first initialize the entire 
AWT libraries before it can even begin, which would explain the entire 
difference in that test.
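To make such a comparison even slightly meaningful you'd have to give the loop an observable effect, so the JIT can't delete it.  A sketch of the usual trick (the class and method names are mine): accumulate something from each allocated object and return it.

```java
import java.awt.Dimension;

public class DimAllocation {
    // Allocate `count` Dimension objects and consume each one, so the
    // allocations can't be optimised away as dead code.
    public static long allocate(int count) {
        long checksum = 0;
        for (int i = 0; i < count; i++) {
            Dimension d = new Dimension(i, i);
            checksum += d.width;   // use the object: not dead code
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Printing the result forces the work to actually happen.
        System.out.println(allocate(10000));
    }
}
```

Even then, the first call still pays for loading the Dimension class and initialising AWT, so you'd need a warm-up pass before timing anything.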

>   - Harries optimisation speed up by a factor of 14%.  This was
>     surprising to me.  It means the JIT was not doing any
>     global flow analysis.

In your specific example, on your specific platform, with your specific 
amount of RAM and with your specific CPU speed and architecture.  The 
fact that Java 1.4.2 adds further optimization strategies to the JIT 
should not go unnoticed either.  As mentioned above however, it most 
likely means that the garbage collector didn't kick in during Harry's 
test but left those objects to be cleaned up during a different test.

>   - Aj's optimisation was not quite as good as Harries, but still
>     an improvement.  The old C coding tricks still work in java
>     evidently.  Harry - if you still feel the urge to optimise
>     your for loops use this.  It doesn't harm readability and is
>     faster.

This most definitely does harm readability.  Most programmers will 
expect iteration over an array to occur incrementally, not 
decrementally.  By reversing the order of the iteration you break that 
expectation and make the programmer stop and think about exactly what 
the code does, because they no longer recognise the pattern of a for 
loop iterating over an array.  And all this for a speed-up of a tenth 
of a second, even if you consider the measurements above to be valid.
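For concreteness, here are the two forms side by side (names are mine).  They visit exactly the same elements; the reversed form only saves re-reading the length field each iteration, which a decent JIT will typically hoist out of the loop anyway:

```java
public class LoopDirection {
    // The idiomatic form every Java programmer recognises at a glance.
    public static int sumForward(int[] arr) {
        int sum = 0;
        for (int i = 0; i < arr.length; i++) {
            sum += arr[i];
        }
        return sum;
    }

    // The "optimised" form: counts down so the loop test compares
    // against zero instead of arr.length.
    public static int sumBackward(int[] arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; i--) {
            sum += arr[i];
        }
        return sum;
    }
}
```

Both compute the same result; the only guaranteed difference is that the second one makes the reader pause.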

>   - The real surprise came from testing David's conjecture.  I would
>     of predicted that given a modern compiler there should be no
>     difference.  The intent of the code should be obvious to any
>     compiler.  Yet the results differ markedly.  This puts a dent
>     in the argument that optimisation should be left to the compiler.
>     I am not sure Java is doing any.  A far better argument is that
>     machines run so quickly nowadays that optimising your code is not
>     worth the effort.

There is no significant difference in those times.  Further to this, 
the point at which Java performs its optimisations is undefined; there 
is no way to tell whether it would have optimised the code further or 
not.  Assuming that compilers and JITs don't optimise code is just 
plain wrong, given how many people are employed (by Sun, in this case) 
to implement exactly these optimisation routines.

> Anyway - the lesson I take from all this is that it is still worth
> applying coding patterns like AJ's.  Its gives you a minor speed up to
> your program at no expense in readability.

I would definitely disagree.  There is no indication of any performance 
benefit, and by not using the most common form for a construct you do 
incur a readability penalty.

> All bullshit aside memory
> allocation in Java is a relatively expensive operation, and like all
> expensive operations should be reduced where possible.

I continue to disagree strongly on this point.  None of these figures 
provide evidence to support that theory.  The only way to identify 
performance bottlenecks in your program is to take the real program and 
profile it.  As I mentioned previously, there has only been a single 
case in my experience where the bottleneck turned out to be memory 
related, and in that case it was garbage collection rather than memory 
allocation.

>   Not at the
> expense of good design, of course, but then a good algorithm tend to
> avoid expensive operations so the two goals don't totally conflict.
> Unfortunately, the term "at the expense of" is subjective, and a feel
> for it only comes with experience.

If it doesn't need to be optimised, don't.  Write the clearest code you 
can, follow a good design and select your algorithms carefully.  It is 
possible to analyse the performance of algorithms in mathematical terms 
(O(n), O(n^2), etc.).  Trying to micro-optimise code before you know 
where the real bottlenecks are is a waste of your time and detracts 
from the readability of the code.
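Choosing the right algorithm dwarfs any micro-optimisation.  A classic sketch (class and method names are mine): building a string with repeated concatenation is O(n^2), because each += copies everything accumulated so far, while a StringBuffer append is amortised O(1), making the whole loop O(n).

```java
public class Concat {
    // O(n^2): each += allocates a new String and copies all the
    // characters built up so far.
    public static String quadratic(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "x";
        }
        return s;
    }

    // O(n): StringBuffer appends into a growable internal buffer,
    // so each append is amortised constant time.
    public static String linear(int n) {
        StringBuffer sb = new StringBuffer(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }
}
```

Both produce identical strings; no amount of loop-counter trickery in the first version will close the gap that the second one gets for free from a better algorithm.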

> By the by, I wrote the "dim unoptimised" test in C.  It ran 5,000 times
> faster than the Java version.  Source code appears at the end.

This doesn't surprise me at all.  Here's what the C code does:

* Check the RAM is available, if not crash.
* Mark this section of RAM as in use.
* Mark this section of RAM as not in use.

Here's what the Java version does:

* Check the RAM is available, if not:
     * Run a partial sweep of the garbage collector.  If there is still 
not enough ram, run a full GC.
     * Call finalize() on all objects which are marked for deletion, 
incurring the cost of a dynamic method lookup.
     * Check to see if this object is referenced with any weak or soft 
references.  If so, add the object to the queue for that reference.
     * Mark the section of RAM used by each object as not in use.
* Mark this section of RAM as in use.
* Create the Object structure within that RAM.
* Create the Dimension structure in RAM, including its two int fields 
(width and height).
* Execute the static block in Dimension (including initializing the 
GUI):
    static {
        /* ensure that the necessary native libraries are loaded */
        Toolkit.loadLibraries();
        if (!GraphicsEnvironment.isHeadless()) {
            initIDs();
        }
    }
* Call the constructor Dimension().
* Call the constructor Dimension(int, int) - called by Dimension().
* Perform two integer assignments.
* Pass a reference to the new Object to the garbage collector or to a 
parent object so it can be collected later.
* When the loop is finished, mark the entire amount of RAM allocated to 
the JVM as not in use and exit.

At any point in that process the GC may kick in and perform the 
indented actions.  It becomes exceptionally clear that memory 
allocation is the least of your worries.
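Note too that most of the items in that list - class loading, the static block, loading the native libraries - are one-off costs, paid on the first allocation only.  A rough sketch that makes this visible (class and method names are mine; the absolute numbers are meaningless, the point is first >> second):

```java
import java.awt.Dimension;

public class FirstAllocation {
    // Returns {first allocation time, second allocation time} in ms.
    static long[] timeTwo() {
        long t0 = System.currentTimeMillis();
        Dimension first = new Dimension();   // loads the class, runs the static block
        long t1 = System.currentTimeMillis();
        Dimension second = new Dimension();  // pays only the allocation itself
        long t2 = System.currentTimeMillis();
        System.out.println("first: " + (t1 - t0)
                + " ms, second: " + (t2 - t1) + " ms");
        return new long[] { t1 - t0, t2 - t1 };
    }

    public static void main(String[] args) {
        timeTwo();
    }
}
```

So a benchmark that includes the very first `new Dimension()` is mostly measuring class loading and AWT initialisation, not memory allocation.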

And just as I'm about to send this:

On Wed, 2004-04-28 at 10:14, Russell Stuart wrote:
>> By the by, I wrote the "dim unoptimised" test in C.  It ran 5,000
>> times faster than the Java version.  Source code appears at the end.
>
> Dam, dam, dam.  It didn't run 5,000 times faster.  It ran about 30%
> slower(!).  I forgot about the nested for loops in the Java version.

Says it all really. :)

Regards,

Adrian Sutton.
----------------------------------------------
Intencha "tomorrow's technology today"
Ph: 38478913 0422236329
Suite 8/29 Oatland Crescent
Holland Park West 4121
Australia QLD
www.intencha.com
