Re: Using TEMP columns when the object is already in memory

From: Charles Bailey
Subject: Re: Using TEMP columns when the object is already in memory
Date: 21:08 on 12 May 2004
--On Wednesday, May 12, 2004 3:37 PM -0400 Perrin Harkins <perrin@xxxx.xxx> 
wrote:

> On Wed, 2004-05-12 at 14:36, Charles Bailey wrote:
>> There's also the larger question of why someone's calling _init() when
>> the  object already exists.
>
> No one calls _init() directly, but if you do a search() or retrieve() or
> anything else that returns CDBI objects it will call _init() internally
> for each one.  That's why I put the single instance code there, at Tim
> Bunce's suggestion.

Right -- I'm sorry; I was thinking of _create, rather than _init.  Now that
I've been straightened out, I'd recommend the inverse of your patch's
behavior when new data arrives, though I don't see a good general solution.
I'm considering these cases:

  - If the data passed to _init matches that in the cached object, not
    overwriting data in the object is always correct, so I'll only consider
    cases below where the data passed to _init is different from that
    in the cached object.
  - Arrival at _init through _create:  Since the cached object already
    exists, creating another object with the same key is an error (since
    CDBI requires unique keys).  Options include throwing an exception
    (though it'd be relatively expensive to ascertain that there was in
    fact a difference between the cache and the current call, and that the
    call came through _create()), treating it as an implicit
    find_or_create() (i.e. not overwriting the cached data), or treating it
    as an implicit update (i.e. overwriting the cached data).  While the
    first option seems to me the best in theory, I think it's too expensive
    in practice.  I can see good arguments for either of the other options,
    but if you treat it as a silent update, you need to be sure it gets back
    to the DB (or at least complains if the DB isn't updated).
  - Arrival at _init through sth_to_objects (search/retrieve):  If the
    cached object differs from the values in the DB, then an update is
    pending (assuming the cache hasn't been addled), so the cached data
    is "correct", and shouldn't be overwritten.  We're still left with
    the case that a search "succeeded" in finding an object it
    shouldn't have, because the pending update didn't get to the DB yet.
  - Arrival at _init through construct or a relative: This boils down to
    one of the two cases above, depending on the caller's intent.  The
    docs for construct and friends imply that their params should come from
    the DB (select-like behavior), but there's no reason they can't be used
    to generate objects de novo, as long as the application calls update
    at some point.

Basically, if the caller's intent was to "find" an object, returning the 
cached values more accurately reflects the "current state" of the object, 
at the risk that its attributes may not match those used to search for the 
object.  If the intent was to "create" an object, then the entire 
occurrence is a logic error.  If the intent was to "create if not exists" 
an object, then arguably the extant object is the "right" choice, unless 
the "if not exists" select succeeded based on an attribute which has 
changed, in which case it's an error to use the cache at all.  (That's 
still a race, of course, but avoiding it requires a lot of thinking about 
whether to use the cache, and I think it's too expensive to impose that on 
every fetch for the rare occurrences of such a collision.)
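
To make the cases above concrete, here is a minimal sketch of the policy
I have in mind.  It is Python rather than Perl and is not Class::DBI code:
Record, LIVE, init(), and the via/create_policy parameters are all
hypothetical names made up for illustration.  It assumes the caller can
say whether it arrived through a select-like path or a create-like path.

```python
# Illustrative sketch only; none of these names exist in Class::DBI.

class Record:
    def __init__(self, key, data):
        self.key = key
        self.data = dict(data)
        self.dirty = False  # True when changes still have to reach the DB


LIVE = {}  # primary key -> Record already in memory


def init(key, data, via, create_policy="keep"):
    """Return the in-memory object for key, building one if none exists.

    via:           "select" (search/retrieve) or "create"
    create_policy: "keep"   -> implicit find_or_create, cache wins
                   "update" -> implicit update, new data wins
    """
    cached = LIVE.get(key)
    if cached is None:
        obj = Record(key, data)
        LIVE[key] = obj
        return obj

    # Same data as the cache: leaving the object alone is always correct.
    if cached.data == data:
        return cached

    if via == "create" and create_policy == "update":
        # Silent update: overwrite, and remember it must get back to the DB.
        cached.data.update(data)
        cached.dirty = True
        return cached

    # Select path, or create treated as find_or_create: a difference means
    # an update is pending, so the cached values are the "correct" ones.
    return cached
```

With this, a search that returns a stale row hands back the edited
in-memory object untouched, and only the explicit "update" policy on the
create path overwrites it (and flags it as needing to be written out).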

On a practical level, we might just minimize these collisions if 
_attribute_store() and _attribute_set() invalidated the cached object. 
There's still a race between a pending update and a select, but if the 
application turns off autoupdate (or multiple applications/threads access 
the class concurrently), it has to account for this.  At least it doesn't 
result in a select or create being surprised by values it didn't pass in, 
or an update having its legs yanked out from under it by an ill-timed 
select.
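
As a rough illustration of the invalidation idea (again in Python, with
hypothetical names; Record.set() merely stands in for what
_attribute_store() and _attribute_set() would do, and is not how CDBI
implements them):

```python
# Illustrative sketch only; none of these names exist in Class::DBI.

LIVE = {}  # primary key -> Record currently shared through the cache


class Record:
    def __init__(self, key, data):
        self.key = key
        self._data = dict(data)

    def get(self, name):
        return self._data[name]

    def set(self, name, value):
        self._data[name] = value
        # Invalidate: once modified, this object is no longer handed out
        # to later selects or creates.
        LIVE.pop(self.key, None)


def fetch(key, row):
    """Select-like path: reuse the cached object, else build from the row."""
    obj = LIVE.get(key)
    if obj is None:
        obj = Record(key, row)
        LIVE[key] = obj
    return obj
```

Here an object with a pending change drops out of the cache, so a later
fetch gets exactly the values it passed in, and the modified object keeps
its own values until the application calls update.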

--
Regards,
Charles Bailey  < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania
