End result of Wiki-ish system design + final question
Thanks to all who responded for sharing their feedback and experience
regarding my Wiki-like thing -- I think the result is a much better
system! I have just one question, which I'll put up front -- at the
end, I will describe my more-or-less "final" setup.
The question is: I have a mod_perl handler whose whole job is to
examine a file, regenerate it if necessary, and then issue a redirect
to an unrelated URI. Is it stupid to do it this way:
my $regen = checkFile($file_id);            # does this file need rebuilding?
Other::Module::regen($file_id) if $regen;   # this can take a while
$r->headers_out->set(Location => $some_other_file);
$r->status(Apache::HTTP_MOVED_TEMPORARILY);
return Apache::HTTP_MOVED_TEMPORARILY;      # send the 302
My concern is that if Other::Module::regen takes a long time, mod_perl
has to wait around before it can return the redirect. Since the
redirect doesn't depend on the regenerated file, ideally the handler
would redirect immediately and then do the regen, if necessary, in the
background. My initial reaction was to shell out to a plain perl
process to do the regen, but that means firing up another interpreter
and process every time. Even though it would run in the background, it
still bugs me. Maybe the solution is to have 5 or 10 perl processes
fire up and stay resident as daemons, processing these background
regen requests?
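Something like this is what I have in mind for the daemon idea -- just
a rough sketch, where the spool directory, the numeric file ids and
the one-second polling are all assumptions rather than anything I've
actually built:

#!/usr/bin/perl
# Sketch of a persistent regen worker: run 5 or 10 of these as daemons,
# and have the mod_perl handler simply create "$spool/$file_id" instead
# of calling Other::Module::regen inline.
use strict;
use warnings;
use Other::Module ();

my $spool = '/var/spool/wiki-regen';    # assumed queue directory

while (1) {
    opendir my $dh, $spool or die "can't read $spool: $!";
    my @jobs = grep { /^\d+$/ } readdir $dh;   # assumes numeric file ids
    closedir $dh;

    for my $file_id (@jobs) {
        # rename() is atomic, so only one worker claims a given job
        my $claimed = "$spool/$file_id.working.$$";
        next unless rename "$spool/$file_id", $claimed;
        Other::Module::regen($file_id);
        unlink $claimed;
    }
    sleep 1 unless @jobs;    # idle politely when the queue is empty
}

The handler would then just touch the job file and return the 302
immediately, so a slow regen never holds up a request.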
Anyway, here's the system:
To very briefly restate the problem--err, I mean, application: it's a
Wiki-like system that must match terms without delimiters, and must be
able to match sub-words (i.e., "net" in "network"). A list of links
between the nodes must also be maintained, along with votes on their
popularity, which are counted once per link-hit per user per day. When
a new node is added, any other nodes that contain its name should be
updated "pretty soon" to display the link. Oh yeah: the users can enter
an arbitrary URI in their prefs to use as the CSS file for the site. :)
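(Just to make the sub-word matching concrete, this is roughly what I
mean -- the node titles and text here are made up:)

use strict;
use warnings;

# Made-up node titles; "net" should also match inside "network".
my @terms = ('net', 'perl', 'squid');

# One case-insensitive alternation, longest titles first, so that if
# both "network" and "net" were node titles the longer one would win.
my $pattern = join '|', map { quotemeta }
              sort { length $b <=> length $a } @terms;

my $text = 'Our network is fronted by Squid.';
while ($text =~ /($pattern)/gi) {
    print "matched term: $1\n";    # prints "net", then "Squid"
}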
I wound up with a hybrid log/live solution. Here are the main points:
1. The nodes themselves are static HTML pages served by Squid / thttpd.
2. The URI to the page's CSS file points to a mod_perl script, with one
argument: the node id.
3. This mod_perl script looks up the URI to the user's CSS file, and
also examines the node indicated by the argument. If the node's last
cache validation was before some new nodes were added, the node's text
is searched for the new terms. If one of the terms appears, the HTML
is regenerated (otherwise, the cache file is just marked as
up-to-date). Then the handler redirects to the chosen CSS file (there's
a rough sketch of this handler after the list).
4. Once a day, right before the thttpd log files are rotated, a
"normal" Perl script combs through the hits and builds up a hash of the
unique link-hits by user. Then the database is incremented by those
values.
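For the curious, step 3 boils down to roughly this -- a simplified
sketch, where lookup_user_css is a made-up stand-in for the real prefs
lookup, and I've assumed the cache check and term scan live behind
checkFile, as in the snippet at the top:

package Wiki::CSSRedirect;
# Rough sketch of the step-3 handler (mod_perl 1 style). The static
# pages point at it via something like
#   <link rel="stylesheet" href="/css?42">
use strict;
use warnings;
use Apache::Constants qw(REDIRECT);
use Other::Module ();

sub handler {
    my $r = shift;
    my ($node_id) = ($r->args || '') =~ /(\d+)/;  # the single node-id argument

    my $css_uri = lookup_user_css($r);      # stand-in for the real prefs lookup
    Other::Module::regen($node_id) if checkFile($node_id);

    $r->headers_out->set(Location => $css_uri);
    return REDIRECT;    # 302 to the user's chosen stylesheet
}

1;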
I like this system, because it takes 100% of the content-serving load
off mod_perl, which only issues redirects. And if the mod_perl process
takes a little while for some reason, at least the user can still see
the HTML while they wait, sans stylesheet. And log-file analysis is
definitely the way to go for hit parsing.
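In case it's useful to anyone, the step-4 log pass works out to
roughly the following -- a simplified sketch, where the
/go/<link_id>?u=<user_id> hit URLs, the log location and the
links/votes schema are placeholders rather than my exact setup:

#!/usr/bin/perl
# Daily pass over the thttpd access log: collapse hits into one vote
# per link per user, then bump the counts in the database.
use strict;
use warnings;
use DBI;

my %seen;    # link_id => { user_id => 1 }  (one vote per user per day)

open my $log, '<', '/var/log/thttpd/access.log' or die "can't read log: $!";
while (my $line = <$log>) {
    # Pull the link id and user id out of a common-log-format line.
    next unless $line =~ m{"GET /go/(\d+)\?u=(\w+) };
    $seen{$1}{$2} = 1;    # duplicate hits collapse here
}
close $log;

# Placeholder DSN, credentials and schema.
my $dbh = DBI->connect('dbi:mysql:wiki', 'user', 'pass', { RaiseError => 1 });
my $sth = $dbh->prepare('UPDATE links SET votes = votes + ? WHERE link_id = ?');
for my $link_id (keys %seen) {
    $sth->execute(scalar keys %{ $seen{$link_id} }, $link_id);
}
$dbh->disconnect;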
Thanks again!
- ben