[H-GEN] Which is better?

Greg Black gjb at gbch.net
Thu Apr 29 06:50:37 EDT 2004


On 2004-04-29, Andrae Muys wrote:
> Russell Stuart wrote:

> >So I gave Harry a tip that would be helpful - beware that memory
> >allocation in Java is not a cheap operation, like say i = i + 1.  The
> >"new" operator is not a VM op code - it is a function call, and a fairly
> >expensive one at that.  This statement is not contentious.
>                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> I boggle that you have the chutzpah to suggest that the very subject of 
> our disagreement is 'not contentious'.  Doubly so given I have already 
> posted one article stating the exact opposite.

OK boys, I think it's time to illustrate for the benefit of
anybody who happens to be following along just what is meant by
those of us who belong to the school that shuns optimisation.

> > God knows why.  It is an essential part of the
> >art of programming.  You have to strive to write readable and tight
> >code.
> 
> And here we see the fundamental disconnect.  I never strive to achieve 
> "readable and tight code"; on the contary I strive to achieve "correct 
> and readable code"[1].  I will actively avoid doing *anything* that 
> might compromise correctness or readability in the name of optimisation. 
>  The only exception to this rule is to consider the complexity order of 
> any algorithm I choose, and even then I will try to take Kernighan and 
> Pike's advice: "try brute force first".

I'm with Andrae here, completely.

Today I've been working (in between other things) on a little C
program that I will get to run exactly once when it's correct.
Naturally, it's not a candidate for any kind of optimisation,
but it must make its single run correctly.

Since it has to be correct and since it will process about half
a million important files, I set up a test that involves about
3,500 files so I can verify it.  And, since this discussion has
involved several mentions of the value of profiling as a prelude
to any optimisation, I thought I'd profile this program.  Here
are the interesting parts[1] of the profiler's output:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 73.2       0.32     0.32     3442     0.09     0.09  _link [4]
 13.3       0.38     0.06     3453     0.02     0.02  _stat [5]
  6.2       0.41     0.03                             .mcount (57)
  3.8       0.42     0.02    13795     0.00     0.00  vfprintf [6]
  1.1       0.43     0.00    44799     0.00     0.00  __sfvwrite [8]
  0.9       0.43     0.00    79231     0.00     0.00  memcpy [10]
  0.7       0.44     0.00     3442     0.00     0.10  process_file [3]
  0.2       0.44     0.00    13795     0.00     0.00  localeconv [11]
  0.2       0.44     0.00    13768     0.00     0.00  __ultoa [13]
  0.2       0.44     0.00     3448     0.00     0.00  strspn [12]
  0.2       0.44     0.00        1     0.98   413.05  process_dir [2]
  0.0       0.44     0.00    44799     0.00     0.00  __sprint [9]
  0.0       0.44     0.00    17231     0.00     0.00  strcmp [25]
  0.0       0.44     0.00    13786     0.00     0.00  snprintf [7]
  0.0       0.44     0.00     3457     0.00     0.00  readdir [26]
  0.0       0.44     0.00       26     0.00     0.00  imalloc <cycle 1> [27]
  0.0       0.44     0.00       22     0.00     0.00  malloc_bytes <cycle 1> [28]
  0.0       0.44     0.00       21     0.00     0.00  ifree [29]
  0.0       0.44     0.00       17     0.00     0.00  _getdirentries [174]
  0.0       0.44     0.00       16     0.00     0.00  malloc [30]
  0.0       0.44     0.00       15     0.00     0.00  free [31]
  0.0       0.44     0.00       12     0.00     0.00  _mkdir [175]

Immediately we see that the system calls link(2) and stat(2)
take up 86.5% of the run time (and we also see that we processed
3442 files in less than half a second, so we don't need to speed
it up).  The link and stat cannot be avoided -- each file has to
be stat'd as part of the algorithm that determines its new name;
and each file has to be linked into a new directory, depending
on the results of the stat.  The next 6.2% is overhead for the
profiler, so we ignore that.
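
The profile above shows a process_file() that accounts for
almost all of the run time, and in sketch form it presumably
does something like the following.  This is only an illustration
under my own assumptions -- the real naming scheme isn't shown
here, so the snprintf line is a hypothetical stand-in; the shape
of it, one stat(2) and one link(2) per file, is the point:

  #include <sys/stat.h>
  #include <stdio.h>
  #include <unistd.h>

  static int process_file(const char *path)
  {
      struct stat st;
      char newpath[1024];

      if (stat(path, &st) == -1) {          /* one stat(2) per file */
          perror(path);
          return -1;
      }

      /* hypothetical: derive the new name from the stat results */
      snprintf(newpath, sizeof newpath, "newdir/%ld-%s",
               (long)st.st_mtime, path);

      if (link(path, newpath) == -1) {      /* one link(2) per file */
          perror(newpath);
          return -1;
      }
      return 0;
  }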

The printf family stuff is used to manufacture the new names
from templates.  Although it could be made to run faster by
using something less expensive, even an order of magnitude
improvement would only save us around 1 or 2 seconds when we
scale up to the 500,000 file run, and even I can't write a
replacement for printf in 2 seconds, so any gain there would be
a big loss.  And the rest of it obviously cannot be improved in
any meaningful way.
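
For concreteness, the name manufacturing amounts to something
like the following -- the template and the fields filled into it
are made up here, since the real ones aren't shown in this
message; the point is only that the string assembly goes through
the printf family (hence the snprintf/vfprintf entries above):

  #include <stdio.h>

  int main(void)
  {
      char name[256];

      /* hypothetical template and fields */
      snprintf(name, sizeof name, "newdir/%04d/%s", 2004, "example.dat");
      printf("%s\n", name);
      return 0;
  }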

So, even if this thing were going to be a bit slow and I had
profiled it to look for a way to speed it up, my profiling has
demonstrated conclusively that no optimisation on earth can
possibly help.  If you haven't done this kind of thing, it's
time to start doing it.  Once you see the profiles of your code,
you'll have some facts to replace your opinions with.
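
For anyone who hasn't: compile with profiling enabled (cc -pg on
most Unix systems), run the program once so it writes gmon.out,
then point gprof at the binary and gmon.out to get a table like
the one above.  A toy program to practise on might look like
this -- it's entirely made up and just gives the profiler
something to measure:

  #include <stdio.h>
  #include <string.h>

  /* build: cc -pg -o demo demo.c; run: ./demo; then: gprof demo gmon.out */

  static unsigned long busy_work(unsigned long n)
  {
      unsigned long sum = 0;
      char src[256], dst[256];

      memset(src, 'x', sizeof src);
      while (n--) {
          memcpy(dst, src, sizeof dst);   /* cheap work, repeated a lot */
          sum += dst[n % sizeof dst];
      }
      return sum;
  }

  int main(void)
  {
      printf("%lu\n", busy_work(2000000UL));
      return 0;
  }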

Cheers, Greg

[1] There is actually another 700+ lines of output from the
    profiler, but the extract above has all we need for this
    analysis.