Re: Using TEMP columns when the object is already in memory
--On Wednesday, May 12, 2004 3:37 PM -0400 Perrin Harkins <perrin@xxxx.xxx>
wrote:
> On Wed, 2004-05-12 at 14:36, Charles Bailey wrote:
>> There's also the larger question of why someone's calling _init() when
>> the object already exists.
>
> No one calls _init() directly, but if you do a search() or retrieve() or
> anything else that returns CDBI objects it will call _init() internally
> for each one. That's why I put the single instance code there, at Tim
> Bunce's suggestion.
Right -- I'm sorry; I was thinking of _create rather than _init. Now that
I've been straightened out, I'd recommend the inverse of your patch's
behavior with new data, though I don't see a good general solution. I'm
considering these cases:
- If the data passed to _init matches that in the cached object, not
overwriting data in the object is always correct, so I'll only consider
cases below where the data passed to _init is different from that
in the cached object.
- Arrival at _init through _create: Since the cached object already
exists, creating another object with the same key is an error (since
CDBI requires nonduplicated keys). Options include throwing an
exception (though it'd
be relatively expensive to ascertain that there was in fact a difference
between the cache and the current call, and that the call came through
_create()), treating it as an implicit find_or_create() (i.e. not over-
writing the cached data), or treating it as an implicit update (i.e.
overwriting the cached data). While the first option seems to me the
best in theory, I think it's too expensive in practice. I can see good
arguments for either of the other options, but if you treat it as a
silent update, you need to be sure it gets back to the DB (or at least
complains if the DB isn't updated).
- Arrival at _init through sth_to_objects (search/retrieve): If the
cached object differs from the values in the DB, then an update is
pending (assuming the cache hasn't been addled), so the cached data
is "correct", and shouldn't be overwritten. We're still left with
the case that a search "succeeded" in finding an object it
shouldn't have, because the pending update didn't get to the DB yet.
- Arrival at _init through construct or a relative: This boils down to
one of the two cases above, depending on the caller's intent. The
docs for construct and friends imply that their params should come from
the DB (select-like behavior), but there's no reason they can't be used
to generate objects de novo, as long as the application calls update
at some point.
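To make the cases above concrete, here is a rough sketch of the decision as a toy model in Python. It is not Class::DBI code: the class LiveObjectCache, its init() method, the "via" argument and the three policy names are all made up for illustration, and a plain dict stands in for a CDBI object.

```python
class LiveObjectCache:
    """Toy single-instance cache keyed by primary key (hypothetical, not CDBI)."""

    POLICIES = ("error", "keep", "update")

    def __init__(self, on_create_collision="keep"):
        if on_create_collision not in self.POLICIES:
            raise ValueError("unknown policy: %r" % (on_create_collision,))
        self.on_create_collision = on_create_collision
        self.objects = {}  # primary key -> dict of column values

    def init(self, key, data, via):
        """Return the one live object for `key`.

        `via` names the assumed call path: "search" or "create".
        """
        if via not in ("search", "create"):
            raise ValueError("unknown call path: %r" % (via,))
        cached = self.objects.get(key)
        if cached is None:
            # First sighting of this key: cache a copy of the incoming data.
            self.objects[key] = dict(data)
            return self.objects[key]
        if via == "search" or cached == data:
            # Matching data, or a select racing a pending update:
            # the cached values win and are not overwritten.
            return cached
        # A create collided with a live object holding different data.
        if self.on_create_collision == "error":
            raise KeyError("duplicate key %r" % (key,))
        if self.on_create_collision == "update":
            # Implicit update; a real version would also have to make
            # sure the change reaches the DB.
            cached.update(data)
        return cached  # "keep" behaves like an implicit find_or_create
```

With the "keep" policy, a search that arrives while an update is pending hands back the same object with the pending value intact; with "update", a second create for the same key overwrites the cached columns in place.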
Basically, if the caller's intent was to "find" an object, returning the
cached values more accurately reflects the "current state" of the object,
at the risk that its attributes may not match those used to search for the
object. If the intent was to "create" an object, then the entire
occurrence is a logic error. If the intent was to "create if not exists"
an object, then arguably the extant object is the "right" choice, unless
the "if not exists" select succeeded based on an attribute which has
changed, in which case it's an error to use the cache at all. (That's
still a race, of course, but avoiding it requires a lot of thinking about
whether to use the cache, and I think it's too expensive to impose that on
every fetch for the rare occurrences of such a collision.)
On a practical level, we might just minimize these collisions if
_attribute_store() and _attribute_set() invalidated the cached object.
There's still a race between a pending update and a select, but if the
application turns off autoupdate (or multiple applications/threads access
the class concurrently), it has to account for this. At least it doesn't
result in a select or create being surprised by values it didn't pass in,
or an update having its legs yanked out from under it by an ill-timed
select.
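As a rough illustration of the invalidation idea, here is a toy model in Python. It is a sketch only, not CDBI code: Record, set() and fetch() are hypothetical stand-ins for a CDBI object, _attribute_set() and the search/retrieve path, and the cache is a plain dict.

```python
class Record:
    """Hypothetical stand-in for a CDBI object held in a live-object cache."""

    def __init__(self, cache, key, data):
        self._cache = cache
        self.key = key
        self.data = dict(data)
        self.dirty = False

    def set(self, column, value):
        # Stand-in for _attribute_set(): record the change, then drop
        # this object from the cache so later fetches don't collide with it.
        self.data[column] = value
        self.dirty = True
        if self._cache.get(self.key) is self:
            del self._cache[self.key]


def fetch(cache, key, row):
    """Stand-in for search/retrieve: reuse the cached object if one exists."""
    obj = cache.get(key)
    if obj is None:
        obj = cache[key] = Record(cache, key, row)
    return obj


cache = {}
first = fetch(cache, 1, {"name": "old"})
first.set("name", "new")                   # pending update; entry invalidated
second = fetch(cache, 1, {"name": "old"})  # select sees what the DB has
```

Here the select gets exactly the values it read from the DB and the pending update keeps its own copy untouched. The two objects disagree until the update is written, which is the remaining race the application has to handle.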
--
Regards,
Charles Bailey < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania