Re: Logging user's movements
ben syverson wrote:
> The way the system works now, it is live. Every time a page is
> generated, it stores the most recent node ID along with the cached file.
> The next time the page is viewed, it checks to see what node is the most
> recent, and compares it against what was the newest when the file was
> cached. If they're the same, nothing has changed, and the cache file is
> served. If they're different, the system looks through the node
> additions that happened since the node was cached, and sees if the
> original node's text contains any of those node names. If it does, it
> regenerates, recaches and serves the page. Otherwise, it revalidates the
> cache file by storing the new most recent node ID with the old cache
> file, and serves it up.
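If I'm reading that right, the flow is roughly this (a sketch in Perl
with hypothetical helper names, not your actual code):

  sub serve_page {
      my ($node) = @_;
      my $cache     = read_cache($node->{id});  # { html => ..., last_id => ... }
      my $newest_id = newest_node_id();         # most recent node ID right now

      unless ($cache) {                         # nothing cached yet: build it
          my $html = render($node);
          write_cache($node->{id}, $html, $newest_id);
          return $html;
      }
      return $cache->{html} if $cache->{last_id} == $newest_id;

      # Did any node added since caching show up in this page's text?
      my @new_names = node_names_since($cache->{last_id});
      if (grep { index($node->{text}, $_) >= 0 } @new_names) {
          my $html = render($node);             # regenerate and recache
          write_cache($node->{id}, $html, $newest_id);
          return $html;
      }

      # Otherwise revalidate: same HTML, stamped with the new ID
      write_cache($node->{id}, $cache->{html}, $newest_id);
      return $cache->{html};
  }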
This is not a bad approach, but there's room for refinement.
> The problem with this is that 99% of the time, the document won't
> contain any of the new node names, so mod_perl is wasting most of its
> time serving up cached HTML.
It sounds like the problem is not so much that mod_perl is serving
cached HTML, since that is easily offloaded to a reverse proxy server,
but rather that your entire cache gets invalidated whenever anyone
creates a new node, and mod_perl has to spend time rechecking pages
that usually don't actually need to be regenerated.
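(For the proxy part, the usual two-server setup would do: a lightweight
front-end Apache proxying to the mod_perl back-end with something like

  ProxyPass        / http://127.0.0.1:8080/
  ProxyPassReverse / http://127.0.0.1:8080/

where 8080 is whatever port your mod_perl server listens on.)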
I think you could improve this a great deal just by changing your cache
invalidation system. When someone creates a new node, rather than
assuming that anything which was cached before the most recent addition
is now invalid, try to figure out which nodes are truly affected and
just invalidate their caches. The way I would do this is by adding
full-text search over your data, using something like MySQL's
FULLTEXT indexes, which let you index new documents on the fly rather
than rebuilding the whole index. Then, when someone adds
a new node called "Dinosaurs", you do a search for all nodes that
contain the word "Dinosaurs" and invalidate their caches.
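Concretely, that might look something like this (assuming a MyISAM
"nodes" table, which MySQL's full-text search requires, and hypothetical
column and helper names):

  ALTER TABLE nodes ADD FULLTEXT (body);

and then in the code that handles a new node:

  my $ids = $dbh->selectcol_arrayref(
      'SELECT id FROM nodes WHERE MATCH (body) AGAINST (?)',
      undef, $new_name,
  );
  invalidate_cache($_) for @$ids;    # e.g. unlink the cache files

One thing to watch: by default MySQL won't index words shorter than
four characters (ft_min_word_len), so very short node names would slip
through.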
If you want to improve response time even more, you can have a cron job
that periodically rebuilds anything without a cached version. Since you
will be invalidating small sets of pages now when someone adds a new
node rather than the entire site, this will only need to operate on a
few pages each time it runs. Of course this is not practical if you
often have people adding new nodes that need to be linked to from
thousands of pages.
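The cron job itself could be as simple as this (hypothetical helpers
again):

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Rebuild anything whose cache file was removed by invalidation.
  for my $id (all_node_ids()) {
      next if -e cache_path($id);    # still cached, skip it
      write_cache($id, render_node($id), newest_node_id());
  }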
- Perrin