End result of Wiki-ish system design + final question


From: ben syverson
Subject: End result of Wiki-ish system design + final question
Date: 03:44 on 13 Feb 2005
Thanks to all who responded for sharing their feedback and experience 
regarding my Wiki-like thing -- I think the result is a much better 
system! I have just one question, which I'll put up front -- at the 
end, I will describe my more-or-less "final" setup.

The question is this: I have a mod_perl handler whose whole job is to 
examine a file, regenerate it if necessary, and then redirect Apache to 
an unrelated URI. Is it stupid to do it this way:
use Apache::Const -compile => qw(HTTP_MOVED_TEMPORARILY);  # mod_perl 2-style constants

my $regen = checkFile($file_id);
Other::Module::regen($file_id) if $regen;
$r->headers_out->set(Location => $some_other_file);
$r->status(Apache::HTTP_MOVED_TEMPORARILY);
return Apache::HTTP_MOVED_TEMPORARILY;

My concern is that if Other::Module::regen takes a long time, mod_perl 
has to wait around before returning the redirect. Since the redirect 
doesn't depend on the regenerated file, ideally, the handler would 
redirect immediately, and then regen if necessary in the background. My 
initial reaction was to make a system call to a standalone perl to do 
the regen, but that means firing up another interpreter and process. 
Even though it would run in the background, it still bugs me. Maybe the 
solution is to have 5 or 10 perl processes fire up and stay open as 
daemons, processing these background regen requests?
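
One idea I'm toying with (if the phase ordering works the way I think 
it does) is to push the regen into a PerlCleanupHandler, which should 
only run after the response has gone back to the client -- no extra 
process, no daemon pool. A rough sketch, reusing checkFile() and 
Other::Module::regen() from above:

if (checkFile($file_id)) {
    # Defer the slow regen to the cleanup phase, which runs after the
    # response (the redirect) has already gone out to the client.
    $r->push_handlers(PerlCleanupHandler => sub {
        Other::Module::regen($file_id);
        return Apache::OK;    # OK would need to be in the Apache::Const -compile list too
    });
}
$r->headers_out->set(Location => $some_other_file);
return Apache::HTTP_MOVED_TEMPORARILY;

The child process is still tied up while the regen runs, of course, 
but at least the browser isn't waiting on it.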


Anyway, here's the system:
To very briefly restate the problem--err, I mean, application: It's a 
Wiki-like system that must match terms without delimiters, and must be 
able to match sub-words (i.e., "net" in "network"). A list of links 
between the nodes must also be maintained, along with votes on their 
popularity, which are counted once per link-hit per user per day. When 
a new node is added, any other nodes which contain its name should be 
updated "pretty soon" to display the link. Oh yeah: the users can enter 
an arbitrary URI in their prefs to use as the CSS file for the site. :)
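
Just to make the sub-word requirement concrete: the term scan is plain 
case-insensitive substring matching, with no word-boundary anchors, so 
a node named "net" matches inside "network". Roughly like this, where 
@new_terms, $node_text, $node_id and regen_html() are stand-ins for 
the real data and calls:

# Does any newly added node name appear anywhere in this node's text?
# Plain substring semantics -- no \b word boundaries -- so "net"
# matches inside "network".
if (@new_terms) {
    my $alternation = join '|', map { quotemeta } @new_terms;
    if ($node_text =~ /$alternation/i) {
        regen_html($node_id);    # rebuild this node's static HTML
    }
}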

I wound up with a hybrid log/live solution. Here are the main points:

1. The nodes themselves are static HTML pages served by Squid / thttpd.

2. The URI to the page's CSS file points to a mod_perl script, with one 
argument: the node id.

3. This mod_perl script looks up the URI to the user's CSS file, and 
also examines the node indicated by the argument. If the node's last 
cache validation predates any newly added nodes, the node's text is 
searched for the new terms. If one of the terms appears, the HTML is 
regenerated (otherwise, the cache file is marked as up-to-date). Then 
the handler redirects to the chosen CSS file (there's a sketch of this 
step just after the list).

4. Once a day, right before the thttpd log files are rotated, a 
"normal" Perl script combs through the hits and builds up a hash of the 
unique link-hits by user. Then the link-vote counts in the database are 
incremented by those per-link totals.
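
In outline, step 3 is just a timestamp comparison plus the substring 
scan from earlier. Here last_validated(), newest_node_addition(), 
contains_new_terms(), mark_validated() and css_uri_for_user() are 
hand-waving over the real database lookups (and over however the user 
gets identified):

my $node_id = $r->args;    # the query string is just the node id

# Only bother if nodes have been added since this node's cache was
# last validated.
if (last_validated($node_id) < newest_node_addition()) {
    if (contains_new_terms($node_id)) {    # the substring scan from earlier
        Other::Module::regen($node_id);    # rebuild the static HTML
    }
    mark_validated($node_id);              # cache is current either way
}

# Then send the browser off to the user's chosen stylesheet, exactly
# as in the snippet at the top.
$r->headers_out->set(Location => css_uri_for_user($r));
return Apache::HTTP_MOVED_TEMPORARILY;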

I like this system, because it takes 100% of the content-serving load 
off of mod_perl, which only issues redirects. And if the mod_perl 
process takes a little while for some reason, at least the user can 
still see the HTML while they wait, sans-stylesheet. And log-file 
analysis is definitely the way to go for hit parsing.
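
For the record, the nightly log pass is nothing fancy. A sketch, 
assuming thttpd's common log format -- the log path, the 
/link/<from>/<to> URI shape, how the user shows up in the log, the DSN 
and the links table are all made up for the sake of the example:

use strict;
use warnings;
use DBI;

# Connect to whatever the real database is.
my $dbh = DBI->connect('dbi:SQLite:dbname=wiki.db', '', '', { RaiseError => 1 });

my %seen;    # user => "from/to" => 1, so each user counts once per link per day

open my $log, '<', '/var/log/thttpd/access.log' or die "can't read log: $!";
while (my $line = <$log>) {
    # Common log format: host ident authuser [date] "request" status bytes
    my ($host, undef, $auth_user, undef, $request) =
        $line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)"/;
    next unless defined $request;
    next unless $request =~ m{^GET /link/(\d+)/(\d+)};    # hypothetical link-hit URI
    my ($from, $to) = ($1, $2);
    my $user = $auth_user ne '-' ? $auth_user : $host;    # however users are identified
    $seen{$user}{"$from/$to"} = 1;
}
close $log;

# Collapse to one vote per user per link, then bump the counts.
my %votes;
for my $user (keys %seen) {
    $votes{$_}++ for keys %{ $seen{$user} };
}
while (my ($link, $count) = each %votes) {
    my ($from, $to) = split m{/}, $link;
    $dbh->do('UPDATE links SET votes = votes + ? WHERE from_node = ? AND to_node = ?',
             undef, $count, $from, $to);
}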

Thanks again!

- ben

