Re: Logging user's movements
[Date index for 2005/02/06]
ben syverson wrote:
> That's not how it works. The entire cache IS invalidated when a new node
> is added.
What I'm saying is that you only invalidate the entire cache right now
because you have no way of telling which nodes are affected by the
change. If you had a full-text index, you could efficiently determine
which nodes are affected by a change and only invalidate them.
> But when you request one of the nodes, it checks to see what
> the new nodes are. It then searches the node text for those new node
> names. If there are no matches, it revalidates the cache file (without
> regenerating it), and serves it. Otherwise, it regenerates the node.
Yes, I understood all of that. That's what I meant by "regenerates."
I'm suggesting an approach that lets you skip revalidating, since the
cache would only be invalidated on documents that actually contained
matches.
> But if you have 1,000,000 documents (or even 10,000), do you really want
> to search through every single document every time a node is added?
Have you ever used an inverted word index? This is what full-text
search is usually based on. With one in place, searching a million
documents should be no big deal. You also only have to do this as part
of the job of creating a new node. You don't need to do it when serving
files.
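To make the idea concrete, here is a minimal in-memory sketch of an inverted word index. It is only an illustration, not how MySQL or any particular engine implements it; the names InvertedIndex, add and nodes_mentioning are made up for this example, and the tokenizer is deliberately crude.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase alphanumeric words (crude on purpose)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InvertedIndex:
    """Hypothetical in-memory index mapping each word to the nodes containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of node ids

    def add(self, node_id, text):
        # Record this node under every distinct word in its text.
        for word in tokenize(text):
            self.postings[word].add(node_id)

    def nodes_mentioning(self, name):
        # Return nodes containing every word of the name. For multi-word
        # names this yields candidates only: the words may not be adjacent,
        # so a real system would confirm the phrase in the node text.
        words = tokenize(name)
        if not words:
            return set()
        return set.intersection(*(self.postings.get(w, set()) for w in words))
```

The lookup touches only the posting sets for the words in the new node's name, so its cost depends on how many nodes contain those words rather than on the total number of documents.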
> Furthermore, do you really want every document loaded into the MySQL
> database?
I suggested MySQL as an easy starting point, since it allows incremental
updates to the text index. There are many things you could use, and
some will have more compact storage than others.
> My thinking is that if you have many documents, odds are only a small
> subset are being actively viewed, so it doesn't make sense to keep those
> unpopular documents constantly up-to-date...
You can use this approach for invalidation and still wait until the
pages are requested to regenerate them.
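As a rough sketch of that combination (again only an illustration under assumptions: LazyCache, add_node and serve are invented names, node names are assumed to be single words, and render stands in for whatever actually builds a page):

```python
import re
from collections import defaultdict


class LazyCache:
    """Hypothetical cache: invalidate on node creation, regenerate on request."""

    def __init__(self, render):
        self.render = render            # assumed function: node text -> page
        self.texts = {}                 # node name -> source text
        self.pages = {}                 # node name -> cached rendered page
        self.words = defaultdict(set)   # word -> names of nodes containing it
        self.render_count = 0           # how many times a page was regenerated

    def add_node(self, name, text):
        # Drop cached pages only for nodes whose text mentions the new name.
        # Nothing is regenerated here; that waits until someone asks.
        for affected in self.words.get(name.lower(), set()):
            self.pages.pop(affected, None)
        self.texts[name] = text
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.words[word].add(name)

    def serve(self, name):
        # A page still in the cache is known to be valid, so there is no
        # revalidation step. Only missing pages are regenerated.
        if name not in self.pages:
            self.pages[name] = self.render(self.texts[name])
            self.render_count += 1
        return self.pages[name]
```

Unpopular documents that mention a new node just lose their cache entry and are never rebuilt unless somebody requests them.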
If the system is running fast enough and not having scalability
problems, there's no reason for you to get into making changes like what
I'm describing. I thought you were concerned about the time wasted by
revalidating unchanged documents, and this approach would eliminate that.
- Perrin