[H-GEN] C code.
Anthony Towns
aj at humbug.org.au
Mon Oct 27 09:27:46 EST 1997
-----BEGIN PGP SIGNED MESSAGE-----
On Mon, 27 Oct 1997, Andrae Muys wrote:
> Well after that last C post I thought I'd solicit comments on my own
> coding style (constructive please, flames < /dev/random > /dev/null).
What fun! :)
> Anyone who's interested can have a look at
> http://www.uq.edu.au/~cmamuys/source_downloads/hash.[ch] constructive
> critism appreciated.
Personally, I'd change HASH_REPORT() to something like:
#ifdef HASH_DEBUG
#define HASH_REPORT(str) fprintf(stderr, "DEBUG: %s\n", str )
#else
#define HASH_REPORT(str) (void)0
#endif
1) Using %s is a small gain: it lets you use non-literal strings, and
it also means you don't need "DEBUG: " stored n times in the
program/memory, only once. It's only loss is that it won't give you a
compile time error for HASH_REPORT(1), although adding the -Wall -W
arguments to gcc will give you a warning. Which is why C++ prefers
inline functions. I'm under the impression C9X (the up and coming
review of C) will allow inline functions, among other things like
"for(int i=...)".
2) Removing the trailing ";" makes usage more natural:
HASH_REPORT(str)
becomes no more valid than
printf("%s", str)
3) The "\n" was added because you're really going to have a newline
each time: having `DEBUG: xyzzyDEBUG: foo' would look pretty weird.
4) Including the "(void)0" null statement just seems more tidy.
Wrt your "typedef struct _IntHashTable IntHashTable;", I personally
prefer
typedef struct IntHashTable {
/* definition */
} IntHashTable;
as my declaration. It doesn't introduce any unwieldy underscores, and
since the namespaces are treated separately also don't cause a
problem.
If you're going for C++ compatability, though, that won't (IIRC) work.
You could do:
#ifndef __cplusplus
typedef
#endif
struct xyzzy { ... }
#ifndef __cplusplus
xyzzy
#endif
;
for absolute compatability, but personally I just separate the two
languages totally and use C idioms in C and C++ idioms in C++.
In declarations, I tend to put the "*" next to the type rather than
the name. ie I use "char* pch;" instead of "char *pch;"
My way works if you think about it as relating to the type of 'pch'
and read it as "pointer": "pch is a char pointer".
The other way works even better, in that it matches C's precedence
rules correctly (eg "char* a, b" won't make to "char*"s), and "char
*pch" can be sensibly read "pch, when dereferenced, is a char".
This is _much_ more in keeping with C, but for me just doesn't say
what I want to say in the way I want to say it.
In your module description, rather than say "it permits any void* to
be stored", I would have said "it permits pointers to any data to be
stored". I misunderstood that the first time round expecting it to act
more like qsort and work with the actual objects pointed to rather
than just the pointers.
It may or may not be worth letting "intHashDel" accept data==NULL --
that would allow you to just get the job done if the hash table was
just used as a fast means of getting at data you already know about
elsewhere and not the only storage of it.
There's no indication of when "ERROR" might be returned or what should
be done about it in the header file.
Okay.
You return ERROR for precondition violations. I'd use assert(): it
gets ignored if you #define NDEBUG so you don't waste time once you've
got your program working properly; and it prints out the exact error
when the code dies.
The other uses of ERROR (when malloc fails) seem perfectly sensible
OTOH.
I would have called "IntHashErrorStrings" "IntHashResponseStrings",
and probably left off the "s", as well so it reads
return IntHashErrorString[1];
Also, there's no real point to the [4] in the definition of IHES's in
the C file. It's just a nuisance when you decide to add another.
In C, I tend not to use the StudlyCaps naming convention (or the
MaimingConvention as I called it in one of my C assignments...) but
rather to use all lower case, with words separated by underscores. I
find the fits in with the standard library much more nicely.
There no need to cast malloc's return value. It's no big deal, but
without the case it will give you an error if you neglect to include
stdlib.h, and (void*)'s are _guaranteed_ to be automatically typecast
correctly for you.
In C, at least. C++ is different, but you'd be using new and delete
there anyway, and you'd get a warning/error for not having a prototype
for malloc anyway.
The description in your header file "stored as a mod/bucket" didn't
make sense to me. You latter state that "the array is hashed as key %
size". This might have been nicer in the description at the top to
clarify the modules description for the clueless.
In the description of intHashDelEntry, you should mention that a NULL
pointer is returned if there is no entry associated with the key.
You might like to consider using RCS or CVS for your revisioning, btw.
Is there an introductory talk scheduled for either/both of these, does
anyone know?
Comments on the meanings of fields in your structures might be nice:
What's the deal with IntHashTable.size and .len, for example? Possibly
"size/len" and "used" would be better names? Similarly for "touched".
fprintf'ing on your malloc fails might not be the nicest thing to do:
returning an error status and letting the caller cope seems more
user-oriented (I've had NetWare programs telling me "malloc failed!".
So? What good does that do me? It's not like there's not enough memory
in the computer -- there's just not enough in the heap. And it's not
like Novell are nice enough to give me their source so I can
recompile... Ah. Free Software)
Using HASH_REPORT to note where the problem's happening might make
sense though.
I notice you use calloc to allocate the hash table itself, then go to
the effort to intialize it by hand immediately after. This seems
wasteful. Hmmm... Since you rely on the NULL's being there switching
to malloc would be sensible, IMO.
(and while I'm guessing you know this, others might not, so: no, you
can't just use calloc and assume all the pointers will be valid NULL
pointers. See the clc FAQ for details, but in short the NULL pointer
and an array of sizeof(void*) '\0's are specifically not guaranteed to
be the same.)
The debugging info when searching the list should have #ifdef
HASH_DEBUG around them.
prev->touched doesn't ever seem to be used? It's incremented a bit,
but that's it. I assume this is debugging information? #ifdef
HASH_DEBUG?
The "find an entry" thing seems overly complicated... I don't think it
actually _is_, but it certainly seems that way.
BTW, you have:
if ( entry->key < key ) {
...
} else if ( entry->key == key ) {
...
} else if ( entry->key > key ) {
...
}
I prefer to code the last condition as
} else { /* entry->key > key */
I notice you do the same elsewhere.
Hmmm.
I would probably have arranged the loop the other way around, so the
case "entry->key < key" is handled last, and since the others both
return, it wouldn't have needed an else, and hence would have been at
the same level as the loop body. As it seemes like the general case
for the loop body, this would seem a better way of writing it. ie:
while ( entry != NULL ) {
if ( entry->key == key ) {
...
return EXIST;
} else if ( entry->key > key ) {
...
return NEXIST;
}
/* find the next one */
prev = entry;
entry = entry->next;
}
You're also doing malloc's work twice. I'm wondering if a loop like:
while ( entry != NULL && entry->key < key ) {
/* find the next one */
prev = entry;
entry = entry->next;
}
if ( entry == NULL || entry->key > key ) {
/* insert. next entry is "entry" */
} else {
/* replace */
}
wouldn't be even better.
IMO going for minimal repetition of code (ie none, this includes
the conditions in if's), and minimal use of variables (ie not have
state variables) often results in the best code, as long as you don't
resort to misusing the looping structure entirely. (eg having a
condition and a break statement in your loop)
Following that rule, we note that the above checks for (entry == NULL)
twice (the key comparison is okay: you need to differentiate <, >, and
== which requires two tests). A data structure that is terminated by a
maximal key /may/ work better, but getting a working termination can
be hellishly difficult when every value is valid somewhere.
Hey. I like that. The #ifdef TEST around main() is a very nice trick.
You might like to conditionally include some printing functions to
display the contents of the hash table. Something like:
void intHashPrint( IntHashTable *table, void
(*printfn)(void*) );
(that is, it takes a hash table that it doesn't modify and a function
that will display an object stored in the table)
But apart from those one or two little things, it seems pretty cool.
Cheers,
aj
- --
Anthony Towns <aj at humbug.org.au> <http://student.uq.edu.au/~s343676/>
I don't speak for anyone save myself. PGP encrypted mail preferred.
``Do. Not. Taunt. The. Happy. Fun. Novak.
It will get gruesome, quickly.'' -- John Dilick, rasfwr-j
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: ascii
Comment: Key available at http://student.uq.edu.au/~s343676/aj_key.asc
iQCVAwUBNFSk5+RRvX9xctrtAQG/ZgP7BMry7VIJRTFTweZcvp6kChshjtAUKRNo
/j5HhE5Nsr5I7jM/KW/SYHIGrOq3R6fVW90CKwDe/gtwp73giQz0FGaAHWrQ9sg6
XQoDIjbXTV6miwGPVvgETgK7SNqsoignaSfGN6U8i3njgoeuA8L6RdiQjlIuGLvz
hEF4C6HPsOU=
=7hFw
-----END PGP SIGNATURE-----
----------------------- HUMBUG General List --------------------------------
echo "unsubscribe general" | mail majordomo at humbug.org.au # To Unsubscribe
More information about the General
mailing list