Re: Logging user's movements


From: Perrin Harkins
Subject: Re: Logging user's movements
Date: 17:04 on 06 Feb 2005
ben syverson wrote:
> That's not how it works. The entire cache IS invalidated when a new node 
> is added.

What I'm saying is that you only invalidate the entire cache right now 
because you have no way of telling which nodes are affected by the 
change.  If you had a full-text index, you could efficiently determine 
which nodes are affected by a change and only invalidate them.
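To make that concrete, here is a rough sketch in Python. Everything in it is hypothetical: the names `build_index` and `invalidate_for_new_node`, the plain dict standing in for the cache, and the assumption that node names are single words. The index maps each word to the set of nodes whose text contains it, so adding a node means one lookup and evicting only the pages that mention it.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Deliberately simple tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())

def build_index(nodes):
    """Map each word to the set of node ids whose text contains it."""
    index = defaultdict(set)
    for node_id, text in nodes.items():
        for word in tokenize(text):
            index[word].add(node_id)
    return index

def invalidate_for_new_node(index, cache, new_name):
    """Evict cached pages only for nodes that mention new_name (single-word names assumed)."""
    # dict.get does not create an entry in the defaultdict.
    affected = index.get(new_name.lower(), set())
    for node_id in affected:
        cache.pop(node_id, None)  # regenerated on the next request
    return set(affected)
```

Nodes that never mention the new name keep their cached pages untouched.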

> But when you request one of the nodes, it checks to see what 
> the new nodes are. It then searches the node text for those new node 
> names. If there are no matches, it revalidates the cache file (without 
> regenerating it), and serves it. Otherwise, it regenerates the node.
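For comparison, the request-time check you describe looks roughly like this. It's a sketch with made-up names (`serve`, `render`, `cached_at`, `created_at`), and timestamps are plain integers for simplicity. The thing to notice is that every request pays for a scan of the node's text against every node name added since the page was cached.

```python
import re

def mentions(text, name):
    # Whole-word, case-insensitive match of a node name in a page's text.
    return re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE) is not None

def serve(node_id, now, texts, cache, cached_at, created_at, render):
    """Serve a node, revalidating or regenerating its cached page at request time."""
    since = cached_at.get(node_id)
    if node_id in cache and since is not None:
        # Names of all nodes created after this page was last cached.
        new_names = [name for name, t in created_at.items() if t > since]
        if not any(mentions(texts[node_id], name) for name in new_names):
            cached_at[node_id] = now  # revalidate without regenerating
            return cache[node_id]
    cache[node_id] = render(node_id)  # regenerate
    cached_at[node_id] = now
    return cache[node_id]
```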

Yes, I understood all of that.  That's what I meant by "regenerates." 
I'm suggesting an approach that lets you skip revalidating, since the 
cache would only be invalidated on documents that actually contained 
matches.

> But if you have 1,000,000 documents (or even 10,000), do you really want 
> to search through every single document every time a node is added?

Have you ever used an inverted word index?  It's what full-text 
search is usually based on.  Searching a million documents efficiently 
should be no big deal.  You also only have to do this as part of the job 
of creating a new node.  You don't need to do it when serving files.
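The reason it scales is that a lookup only touches the posting lists for the words in the query, not the documents themselves. Here's a hypothetical sketch for multi-word node names: intersect the posting lists (smallest first) to get candidates, then check word order only within that usually small candidate set. The class and method names are mine, not from any particular library.

```python
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"\w+", text.lower())

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> node ids containing it
        self.texts = {}                   # node id -> text, for phrase verification

    def add(self, node_id, text):
        self.texts[node_id] = text
        for word in tokenize(text):
            self.postings[word].add(node_id)

    def find_phrase(self, phrase):
        """Return ids of nodes containing phrase, examining only candidate nodes."""
        words = tokenize(phrase)
        if not words:
            return set()
        # Smallest posting list first so the intersection shrinks fastest.
        lists = sorted((self.postings.get(w, set()) for w in words), key=len)
        candidates = set.intersection(*lists)
        # Verify adjacency and order only for the candidates.
        pattern = re.compile(r"\b" + r"\W+".join(map(re.escape, words)) + r"\b",
                             re.IGNORECASE)
        return {nid for nid in candidates if pattern.search(self.texts[nid])}
```

Both documents in a query like "full text" may contain both words, but only the ones with the words adjacent and in order survive the final check.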

> Furthermore, do you really want every document loaded into the MySQL 
> database?

I suggested MySQL as an easy starting point, since it allows incremental 
updates to the text index.  There are many things you could use, and 
some will have more compact storage than others.
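Whatever engine you pick, an "incremental update" amounts to something like the following hypothetical in-memory sketch: when a node's text changes, only the postings for words that were added or dropped are touched, rather than rebuilding the whole index. This illustrates the idea only; it says nothing about how MySQL or any other engine stores its index.

```python
import re
from collections import defaultdict

def word_set(text):
    return set(re.findall(r"\w+", text.lower()))

class IncrementalIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> node ids
        self.words_of = {}                # node id -> words currently indexed

    def upsert(self, node_id, text):
        """Re-index one node, touching only the postings for words that changed."""
        old = self.words_of.get(node_id, set())
        new = word_set(text)
        for word in old - new:
            self.postings[word].discard(node_id)
            if not self.postings[word]:
                del self.postings[word]  # keep the index compact
        for word in new - old:
            self.postings[word].add(node_id)
        self.words_of[node_id] = new

    def remove(self, node_id):
        self.upsert(node_id, "")  # drops every posting for this node
        del self.words_of[node_id]
```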

> My thinking is that if you have many documents, odds are only a small 
> subset are being actively viewed, so it doesn't make sense to keep those 
> unpopular documents constantly up-to-date...

You can use this approach for invalidation and still wait until the 
pages are requested to regenerate them.
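In other words, invalidation happens eagerly in the node-creation job and regeneration stays lazy. A minimal sketch, assuming a hypothetical `render` callback and a `PageCache` class of my own invention: the creation job marks the affected ids stale, and serving does no scanning at all, since each page is either known-stale or known-good.

```python
class PageCache:
    """Invalidate eagerly when a node is added; regenerate lazily on request."""

    def __init__(self, render):
        self.render = render  # hypothetical callback: node_id -> page HTML
        self.pages = {}       # node_id -> cached page
        self.stale = set()    # node ids known to need regeneration

    def invalidate(self, node_ids):
        # Called from the node-creation job with the ids the text index reported.
        self.stale.update(node_ids)

    def get(self, node_id):
        # No revalidation scan: regenerate only if marked stale or never cached.
        if node_id in self.stale or node_id not in self.pages:
            self.pages[node_id] = self.render(node_id)
            self.stale.discard(node_id)
        return self.pages[node_id]
```

Unpopular pages that get invalidated simply sit there marked stale until someone asks for them.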

If the system is running fast enough and isn't having scalability 
problems, there's no reason to make changes like the ones I'm 
describing.  I thought you were concerned about the time wasted 
revalidating unchanged documents, and this approach would eliminate that.

- Perrin
