Re: Logging user's movements
ben syverson wrote:
> The way the system works now, it is live. Every time a page is
> generated, it stores the most recent node ID along with the cached file.
> The next time the page is viewed, it checks to see what node is the most
> recent, and compares it against what was the newest when the file was
> cached. If they're the same, nothing has changed, and the cache file is
> served. If they're different, the system looks through the node
> additions that happened since the node was cached, and sees if the
> original node's text contains any of those node names. If it does, it
> regenerates, recaches and serves the page. Otherwise, it revalidates the
> cache file by storing the new most recent node ID with the old cache
> file, and serves it up.
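If I'm reading that right, the flow is roughly this (a sketch in Perl
with hypothetical helper names, not your actual code):

  sub serve_page {
      my ($node) = @_;
      my $cache     = read_cache($node->{id});  # { html => ..., last_id => ... }
      my $newest_id = newest_node_id();         # most recent node ID right now

      unless ($cache) {                         # nothing cached yet: build it
          my $html = render($node);
          write_cache($node->{id}, $html, $newest_id);
          return $html;
      }
      return $cache->{html} if $cache->{last_id} == $newest_id;

      # Did any node added since caching show up in this page's text?
      my @new_names = node_names_since($cache->{last_id});
      if (grep { index($node->{text}, $_) >= 0 } @new_names) {
          my $html = render($node);             # regenerate and recache
          write_cache($node->{id}, $html, $newest_id);
          return $html;
      }

      # Otherwise revalidate: same HTML, stamped with the new ID
      write_cache($node->{id}, $cache->{html}, $newest_id);
      return $cache->{html};
  }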
This is not a bad approach, but there's room for refinement.
> The problem with this is that 99% of the time, the document won't
> contain any of the new node names, so mod_perl is wasting most of its
> time serving up cached HTML.
It sounds like the problem is not so much that mod_perl is serving
cached HTML, since that is easily offloaded to a reverse proxy server,
but rather that your entire cache gets invalidated whenever anyone
creates a new node, and mod_perl has to spend time rechecking pages
that usually don't actually need to be regenerated.
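(For the proxy part, the usual two-server setup would do: a lightweight
front-end Apache proxying to the mod_perl back-end with something like

  ProxyPass        / http://127.0.0.1:8080/
  ProxyPassReverse / http://127.0.0.1:8080/

where 8080 is whatever port your mod_perl server listens on.)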
I think you could improve this a great deal just by changing your cache
invalidation system. When someone creates a new node, rather than
assuming that anything which was cached before the most recent addition
is now invalid, try to figure out which nodes are truly affected and
just invalidate their caches. The way I would do this is by adding
full-text search over your data, using something like MySQL's
FULLTEXT indexes, which let you index new documents on the fly rather
than rebuilding the whole index. Then, when someone adds
a new node called "Dinosaurs", you do a search for all nodes that
contain the word "Dinosaurs" and invalidate their caches.
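Concretely, that might look something like this (assuming a MyISAM
"nodes" table, which MySQL's full-text search requires, and hypothetical
column and helper names):

  ALTER TABLE nodes ADD FULLTEXT (body);

and then in the code that handles a new node:

  my $ids = $dbh->selectcol_arrayref(
      'SELECT id FROM nodes WHERE MATCH (body) AGAINST (?)',
      undef, $new_name,
  );
  invalidate_cache($_) for @$ids;    # e.g. unlink the cache files

One thing to watch: by default MySQL won't index words shorter than
four characters (ft_min_word_len), so very short node names would slip
through.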
If you want to improve response time even more, you can have a cron job
that periodically rebuilds anything without a cached version. Since you
will be invalidating small sets of pages now when someone adds a new
node rather than the entire site, this will only need to operate on a
few pages each time it runs. Of course this is not practical if you
often have people adding new nodes that need to be linked to from
thousands of pages.
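The cron job itself could be as simple as this (hypothetical helpers
again):

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Rebuild anything whose cache file was removed by invalidation.
  for my $id (all_node_ids()) {
      next if -e cache_path($id);    # still cached, skip it
      write_cache($id, render_node($id), newest_node_id());
  }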
- Perrin