[H-GEN] Which is better?

Russell Stuart russell at stuart.id.au
Thu Apr 29 23:41:44 EDT 2004


On Thu, 2004-04-29 at 20:50, Greg Black wrote:
> OK boys, I think it's time to illustrate for the benefit of
> anybody who happens to be following along just what is meant by
> those of us who belong to the school that shuns optimisation.

Warning: what follows is not really a response to Greg's post.  It's
just a ramble about personal history.

This use of prof brings back memories, old memories.  A long time ago, I
remember stumbling across an article about using prof by one of the
C/Unix luminaries.  Intrigued, I decided to give it a go myself.  I recall the
advice in the article was to optimise until disk I/O dominated.  I was
surprised at the time to discover how little I knew about where my
program was spending its time.  Just doing the exercise taught me a lot
about the compilers I was using and the environment I ran things under.

This was in the days of machines with less than 1M of memory, running at
under 8MHz.  As you might imagine, most things needed speeding up,
and I became a prof convert.  So it was a sad day when I discovered
that prof was becoming less useful.  Ultimately, it became a complete
waste of time.

What changed was the systems I was working on.  They grew bigger and
more complex.  They went from thousands of lines, to tens of thousands, to
hundreds of thousands.  And they had multiple streams of execution (similar
to co-routines).  Before, if it appeared that a lot of time was
spent in printf, there might only be 10 or so places that called it, so it
wasn't hard to figure out who was doing the damage - just like Greg did
in his example.  Now there were hundreds or thousands of places printf
was called from, and what's worse, there might be hundreds of places those
places were called from, and so on.  In fact, because of the modularity of
the system, unless you knew the code well it was hard to even guess who was
calling what.  All this meant that determining why printf was at the top of
the list was damned near impossible.

Also, the simple rule about "stop when disk I/O dominates" no longer
worked.  This was a commercial program.  The excuse "I know it runs
slowly, but it's only because of the disk I/O" didn't wash.  If disk I/O
was a problem, it was time to do something differently and reduce the
disk I/O.

Not that we gave up on profiling easily.  We tried instrumenting the
programs by compiling to assembler and hacking the assembler about to
record more information.  We intercepted cret and csav, and wrote
programs to analyse the memory.  We never did discover a way that worked,
nor did we find something we could buy.

The profiling problem looks deceptively simple.  But there is a hint as
to why we failed even in the profile Greg produced:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 73.2       0.32     0.32     3442     0.09     0.09  _link [4]
 13.3       0.38     0.06     3453     0.02     0.02  _stat [5]
  6.2       0.41     0.03                             .mcount (57)
  3.8       0.42     0.02    13795     0.00     0.00  vfprintf [6]
  1.1       0.43     0.00    44799     0.00     0.00  __sfvwrite [8]

See the line ".mcount"?  That is actually the profiler itself.  It
consumes run time too, of course.  If you try to track more
information, the time you spend in the ".mcount" equivalent goes
up.  Notice it is already ranked third in this trace.  What's worse, as the
size of the program goes up, the time spent in each of its routines goes
down - reasonably enough, as the total CPU time gets spread over more
routines.  Push this too far and the time spent in ".mcount"
dominates everything else.  You can try to deduct the time spent in
".mcount", but this proves hard too.  It gets spread over many places,
and the timer tick trick used by the profiler has a granularity:
sometimes it attributes the time to the wrong routine.  So when you deduct
the .mcount time, you might be deducting it from the wrong thing.  In the
end we never did get a result we could trust.

So we stopped using profiling.  But we didn't stop optimising. 
Optimising without a profiler is like debugging without a debugger.  It's
harder to do and takes more time, and you end up having to know and
understand your program a lot better to make it work.  You also need a
very solid understanding of the compiler and environment you are using. 
If you don't, you tend to waste a lot of time making changes that have no
effect.  Ironically, this understanding is something I gained from doing
profiling in the first place.

So now I don't profile.  The new languages I use - Java / C# - don't even
come with a profiler.  Odd, isn't it?  20 years ago the C compiler I
used came with "cc -p".  The newer languages don't.  This is sad.  I
obviously don't understand the languages I am using now nearly as well as
I understood C+Unix, and I can't profile either.






More information about the General mailing list