Re: retrieve() hits db even if obj present in %Live_Objects.
>From: Perrin Harkins <perrin@xxxx.xxx>
>To: Todd Lorenz <trlorenz@xxxxxxx.xxx>
>CC: cdbi-talk@xxxxxx.xxxxx.xxx
>Subject: Re: retrieve() hits db even if obj present in %Live_Objects.
>Date: Thu, 24 Jun 2004 14:42:55 -0400
>
>On Thu, 2004-06-24 at 14:23, Todd Lorenz wrote:
> > db connections should be made only when a
> > needed item is not found in the cache -- and should be closed as soon as any
> > db lookup is complete. I'd like also to be able to disable lookups that are
> > *not* id-based (to avoid db connections), and to disable inserts, deletes,
> > and updates to the db.
>
>Those are pretty unusual requirements. Class::DBI may be the wrong
>thing to use for this. I would look at a simple hash-based thing like
>BerkeleyDB or Cache::FastMmap instead.
>
Thanks for the reply. I know you guys stay busy, and I really appreciate
your time.

Now, then... I wouldn't have thought these requirements would be that
unusual, really. There must be cases where people need to cache their data
in a semi-permanent state, not only for speed, but to reduce connections.
(Disabling the non-id-based lookups -- yeah, that's unusual. Just looking
for ways to prevent users of my classes from getting connections where I
don't want them to.)

Anyway, I was using BerkeleyDB originally, but just serializing the objects
themselves wound up being much easier. (Either way, to populate the cache,
I'd still have to connect, grab, and disconnect. And I realize that I could
store a serialized hash of the object's attributes along with its class, and
later use that hash to construct a new object; but why not just serialize
the object?)
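
For anyone following along, here is a minimal sketch of what "just serialize the object" could look like with the core Storable module. My::Album is a hypothetical stand-in (a plain blessed hash), not a real Class::DBI class, so treat this as an illustration rather than the code actually in use:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

{
    # Hypothetical stand-in for a CDBI class: a blessed hash with one accessor.
    package My::Album;
    sub new   { my ($class, %args) = @_; return bless {%args}, $class }
    sub title { return $_[0]{title} }
}

my $album  = My::Album->new(id => 100, title => 'Point of Know Return');
my $frozen = freeze($album);   # byte string, usable as a cache value
my $copy   = thaw($frozen);    # comes back blessed into My::Album

print ref($copy), ': ', $copy->title, "\n";
```

The thawed copy keeps its class, so its methods work without rebuilding the object from an attribute hash.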
(I should note that I want CDBI objects versus straight hashes (if that's
what you meant) because it's really nice to have a single, uniform interface
to the data, no matter what part of the system you're on, where you can
expect an instance of CDBI::Album and CDBI::Album::Cacheable to do basically
similar things.)

My project is a large one, and my objects have a well-defined lifecycle.
About half the code is devoted to using these objects in their early,
volatile state, where they're basically being created and tweaked by a user
through a UI; the other half is devoted to using them in a more stable,
finalized form, where they are used heavily on a distributed system.

Regarding the highly convenient CDBI relationship model: CDBI is *almost*
there for being able to do $album->artist, where $album and $artist are
both serialized, and completely separate from one another. The relationship
logic is pretty much ready to go in CDBI; with a straight-hash BerkeleyDB
cache implementation, I would have had to reproduce it if I really wanted it.
And I really did.

> > _do_search() suffers from (and underlies) the same problem that retrieve()
> > has -- Tim's patch will fix the problem with retrieve(), specifically, as I
> > understand. Still, will _do_search() be able to recognize an id-only lookup,
> > and try the cache before calling sql_Retrieve()? (Not that I care, actually,
> > if retrieve() works and I can disable non-id-based searches)
>
>What kind of action would result in _do_search() being called for an
>ID-only lookup other than a retrieve() call?

Nothing to worry about, really. Just something dumb like search(id => 5). I
brought it up only because I wasn't positive that retrieve() was the only
method that would call search() internally with just ids.
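
To make the search(id => 5) case concrete, here is a rough sketch of the check I have in mind: spot an id-only lookup and try the cache before touching the database. Everything here (%cache, db_search, is_id_only, cached_search) is a hypothetical name made up for illustration; none of it is CDBI's actual _do_search() or %Live_Objects code:

```perl
use strict;
use warnings;

# Hypothetical in-memory cache keyed by "class|id".
my %cache   = ('My::Album|5' => { id => 5, title => 'Point of Know Return' });
my $db_hits = 0;

# Stand-in for the real database query; it only counts how often it runs.
sub db_search {
    my ($class, %where) = @_;
    $db_hits++;
    return ();
}

# True when the only search criterion is the primary key column.
sub is_id_only {
    my (%where) = @_;
    return keys(%where) == 1 && exists $where{id};
}

# Try the cache for id-only lookups; fall through to the database otherwise.
sub cached_search {
    my ($class, %where) = @_;
    if (is_id_only(%where)) {
        my $hit = $cache{"$class|$where{id}"};
        return $hit if $hit;
    }
    return db_search($class, %where);
}

my ($album) = cached_search('My::Album', id => 5);
print "$album->{title} (db hits: $db_hits)\n";

cached_search('My::Album', title => 'Point of Know Return');
print "db hits after a non-id search: $db_hits\n";
```

The first call is answered from the cache without a database hit; the second is not id-only, so it falls through to db_search().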
> > Another problem is has_a, which causes foreign-key attributes to be inflated
> > in place upon object instantiation, via _simple_bless. This is undesirable
> > for my purpose, because I'd much rather have 10 frozen albums that all
> > "point" to one artist, rather than 10 frozen albums that also carry 10
> > frozen copies of the same artist.
>
>You just need to customize the serialization on your objects so that
>they don't store that data. Check the Storable docs.

Talking about CDBI in the regular, non-serializing sense, here: I'd think
you'd want to store the flat ids as foreign-key attributes rather than the
inflated objects, anyway. You've got the object cache, now, so why not use
it? It might be a cleaner implementation always to dip into the cache for
id-based lookups; always to retrieve(), in fact. It would also yield a
cleaner Dump of objects:
$VAR1 = bless( {
    'id' => '100',
    'title' => 'Point of Know Return',
    'label' => 'Kirschner',
    'artist' => '20',
}, 'CDBI::Album' );
...where "artist" number 20 is, of course, "Kansas," which is a completely
separate object, which you can grab with a simple retrieve() on your cache.
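
A small sketch of that idea, assuming a hypothetical two-function cache (cache_put and cache_lookup) and stand-in classes My::Album and My::Artist rather than real CDBI classes: the attribute stays a flat id, and the accessor goes through the single lookup function on demand.

```perl
use strict;
use warnings;

# Hypothetical object cache keyed by class, then id.
my %cache;
sub cache_put {
    my ($obj) = @_;
    $cache{ ref($obj) }{ $obj->{id} } = $obj;
    return $obj;
}
sub cache_lookup {
    my ($class, $id) = @_;
    return $cache{$class}{$id};
}

{
    package My::Artist;
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub name { return $_[0]{name} }
}

{
    package My::Album;
    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # The stored attribute is just the id; the related object is
    # fetched from the cache only when someone asks for it.
    sub artist {
        my ($self) = @_;
        return main::cache_lookup('My::Artist', $self->{artist});
    }
}

cache_put(My::Artist->new(id => 20, name => 'Kansas'));
my $album = cache_put(
    My::Album->new(id => 100, title => 'Point of Know Return', artist => 20)
);

print $album->artist->name, "\n";   # the related object, via the cache
print $album->{artist}, "\n";       # the stored attribute is still the flat id
```

Ten albums by the same artist would then share one cached My::Artist object instead of each carrying its own copy.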
Currently, under has_a, you'd wind up with:
$VAR1 = bless( {
    'id' => '100',
    'title' => 'Point of Know Return',
    'label' => 'Kirschner',
    'artist' => bless( {
        'id' => '20',
        'name' => 'Kansas',
        'style' => 'prog. rock',
    }, 'CDBI::Artist' ),
}, 'CDBI::Album' );
...which, getting back to serializing, is not how you'd want to store your
album, especially if you have lots of other albums by the same artist. You'd
have to go out of your way to massage the object into a serializable form
(whether through Storable or however), and it just doesn't seem like it
ought to be necessary. After a bit of a tweak -- storing ids as attributes
instead of inflated objects, and looking to the cache through a single
lookup function for doing id-based lookups -- I think it might become easier
to write caching layers for CDBI, whatever the implementation.
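
For comparison, the Storable route would mean giving each class its own hooks. The sketch below uses Storable's documented STORABLE_freeze and STORABLE_thaw hook methods to swap the inflated artist for its id at freeze time; My::Album and My::Artist are hypothetical stand-ins, and the hard-coded "artist" key is an assumption for illustration, not something read from has_a metadata:

```perl
use strict;
use warnings;
use Storable ();

{
    package My::Artist;
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
}

{
    package My::Album;
    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # Called by Storable while serializing an album. Copy the attributes
    # and replace the inflated artist object with its id, so the artist
    # is not frozen along with the album.
    sub STORABLE_freeze {
        my ($self, $cloning) = @_;
        my %flat = %$self;
        $flat{artist} = $flat{artist}{id} if ref $flat{artist};
        return Storable::nfreeze(\%flat);
    }

    # Called by Storable on an empty, already-blessed object during thaw.
    sub STORABLE_thaw {
        my ($self, $cloning, $serialized) = @_;
        %$self = %{ Storable::thaw($serialized) };
        return;
    }
}

my $artist = My::Artist->new(id => 20, name => 'Kansas');
my $album  = My::Album->new(
    id => 100, title => 'Point of Know Return', artist => $artist,
);

my $copy = Storable::thaw(Storable::freeze($album));
print ref($copy), ": $copy->{title} by artist id $copy->{artist}\n";
```

It works, but every class with a has_a relationship needs this kind of per-class massaging, which is the extra effort I'd rather avoid.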
> > An unrelated-but-related problem: How to set up a CDBI::Album::BASE class,
> > from which an industrial-strength CDBI::Album (your rank-and-file CDBI
> > class) and a lighter CDBI::Album::Cacheable class (containing some, but not
> > all, of the methods/data available to CDBI::Album) derive. Some of the
> > current CDBI class setup routines make this kind of thing tricky. (So if
> > anyone's read this far, and is remotely interested, I can blather on about
> > that, as well.)
>
>Go ahead, but please start a separate thread for it.
>
>- Perrin
>