Re: retrieve() hits db even if obj present in %Live_Objects.


From: Todd Lorenz
Subject: Re: retrieve() hits db even if obj present in %Live_Objects.
Date: 20:44 on 24 Jun 2004

>From: Perrin Harkins <perrin@xxxx.xxx>
>To: Todd Lorenz <trlorenz@xxxxxxx.xxx>
>CC: cdbi-talk@xxxxxx.xxxxx.xxx
>Subject: Re: retrieve() hits db even if obj present in %Live_Objects.
>Date: Thu, 24 Jun 2004 14:42:55 -0400
>
>On Thu, 2004-06-24 at 14:23, Todd Lorenz wrote:
> > db connections should be made only when a needed item is not found in
> > the cache -- and should be closed as soon as any db lookup is complete.
> > I'd like also to be able to disable lookups that are *not* id-based (to
> > avoid db connections), and to disable inserts, deletes, and updates to
> > the db.
>
>Those are pretty unusual requirements.  Class::DBI may be the wrong
>thing to use for this.  I would look at a simple hash-based thing like
>BerkeleyDB or Cache::FastMmap instead.
>

Thanks for the reply. I know you guys stay busy, and I really appreciate 
your time.

Now, then... I wouldn't have thought these requirements would be that 
unusual, really. There must be cases where people need to cache their data 
in a semi-permanent state, not only for speed, but to reduce connections. 
(Disabling the non-id-based lookups -- yeah, that's unusual. Just looking 
for ways to prevent users of my classes from getting connections where I 
don't want them to.)

Anyway, I was using BerkeleyDB originally, but just serializing the objects 
themselves wound up being much easier. (Either way, to populate the cache, 
I'd still have to connect, grab, and disconnect. And I realize that I could 
store a serialized hash of the object's attributes along with its class, and 
later use that hash to construct a new object; but why not just serialize 
the object?)
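
To make the two options concrete, here is a minimal sketch using Storable. My::Album is a hypothetical stand-in, not a real Class::DBI class (a real CDBI object carries more internal state), so treat this as an illustration of the trade-off only:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in for a CDBI class.
package My::Album;
sub new   { my ($class, %attr) = @_; return bless {%attr}, $class }
sub title { return $_[0]{title} }

package main;

my $album = My::Album->new(id => 100, title => 'Point of Know Return');

# Option 1: freeze the object itself; thaw() restores the blessing.
my $frozen_obj = freeze($album);
my $copy       = thaw($frozen_obj);

# Option 2: freeze a plain hash of attributes plus the class name,
# then rebuild the object by hand through the constructor.
my $frozen_hash = freeze({ class => ref($album), attr => {%$album} });
my $record      = thaw($frozen_hash);
my $rebuilt     = $record->{class}->new(%{ $record->{attr} });

print ref($copy),    ': ', $copy->title,    "\n";
print ref($rebuilt), ': ', $rebuilt->title, "\n";
```

Both routes end with a usable My::Album; the first simply skips the rebuild step.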

(I should note that I want CDBI objects versus straight hashes (if that's 
what you meant) because it's really nice to have a single, uniform interface 
to the data, no matter what part of the system you're on, where you can 
expect an instance of CDBI::Album and CDBI::Album::Cacheable to do basically 
similar things.)

My project is a large one, and my objects have a well-defined lifecycle. 
About half the code is devoted to using these objects in their early, 
volatile state, where they're basically being created and tweaked by a user 
through a UI; the other half is devoted to using them in a more stable, 
finalized form, where they are used heavily on a distributed system.

Regarding the highly convenient CDBI relationship model: CDBI is *almost*
there for being able to do $album->artist, where $album and $artist are
both serialized, and completely separate from one another. The relationship
logic is pretty much ready to go in CDBI; with a straight-hash BerkeleyDB
cache implementation, I would have had to reproduce it if I really wanted it.
And I really did.

> > _do_search() suffers from (and underlies) the same problem that
> > retrieve() has -- Tim's patch will fix the problem with retrieve(),
> > specifically, as I understand. Still, will _do_search() be able to
> > recognize an id-only lookup, and try the cache before calling
> > sql_Retrieve()? (Not that I care, actually, if retrieve() works and I
> > can disable non-id-based searches)
>
>What kind of action would result in _do_search() being called for an
>ID-only lookup other than a retrieve() call?

Nothing to worry about, really. Just something dumb like search(id => 5). I 
brought it up only because I wasn't positive that retrieve() was the only 
method that would call search() internally with just ids.
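
For illustration, this is roughly the check I have in mind. It is not how Class::DBI's _do_search() works; %cache, db_search(), is_id_only(), and cached_search() are all hypothetical names, and the "db" here is just a counter:

```perl
use strict;
use warnings;

# Hypothetical in-memory cache keyed by id; stands in for %Live_Objects.
my %cache   = (5 => { id => 5, title => 'Leftoverture' });
my $db_hits = 0;

# Hypothetical stand-in for the real database search; finds nothing.
sub db_search { my (%where) = @_; $db_hits++; return () }

# True when the only search criterion is the primary key column.
sub is_id_only {
    my ($pk, %where) = @_;
    return keys(%where) == 1 && exists $where{$pk};
}

# Try the cache for id-only searches; fall through to the db otherwise.
sub cached_search {
    my (%where) = @_;
    if (is_id_only('id', %where) && exists $cache{ $where{id} }) {
        return $cache{ $where{id} };
    }
    return db_search(%where);
}

my ($hit) = cached_search(id => 5);                 # served from the cache
my @miss  = cached_search(title => 'Leftoverture'); # goes to the "db"
print "title: $hit->{title}, db hits: $db_hits\n";
```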

> > Another problem is has_a, which causes foreign-key attributes to be
> > inflated in place upon object instantiation, via _simple_bless. This is
> > undesirable for my purpose, because I'd much rather have 10 frozen
> > albums that all "point" to one artist, rather than 10 frozen albums
> > that also carry 10 frozen copies of the same artist.
>
>You just need to customize the serialization on your objects so that
>they don't store that data.  Check the Storable docs.
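
For reference, a minimal sketch of what that customization might look like with Storable's STORABLE_freeze and STORABLE_thaw hooks. My::Artist and My::Album are hypothetical plain classes, not Class::DBI classes, and the choice to store only the artist's id is an assumption about what one would want to keep:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

package My::Artist;   # hypothetical class
sub new { my ($class, %a) = @_; return bless {%a}, $class }

package My::Album;    # hypothetical class
sub new { my ($class, %a) = @_; return bless {%a}, $class }

# Storable calls this hook instead of walking the object itself.
# We hand back a frozen copy in which the artist is reduced to its id.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    my %flat = %$self;
    $flat{artist} = $flat{artist}{id} if ref $flat{artist};
    return Storable::freeze(\%flat);
}

# Storable passes the string back; refill the already-blessed object.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized) = @_;
    %$self = %{ Storable::thaw($serialized) };
    return;
}

package main;

my $kansas = My::Artist->new(id => 20, name => 'Kansas');
my $album  = My::Album->new(id => 100, title => 'Point of Know Return',
                            artist => $kansas);

my $thawed = thaw(freeze($album));
print "artist stored as: $thawed->{artist}\n";
```

The live $album keeps its inflated artist; only the frozen form is flattened.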

Talking about CDBI in the regular, non-serializing sense, here: I'd think 
you'd rather store the flat ids as foreign-key attributes rather than the 
inflated objects, anyway. You've got the object cache, now, so why not use 
it? It might be a cleaner implementation always to dip into the cache for 
id-based lookups; always to retrieve(), in fact. It would also yield a 
cleaner Dump of objects:

$VAR1 = bless( {
    'id'     => '100',
    'title'  => 'Point of Know Return',
    'label'  => 'Kirshner',
    'artist' => '20',
}, 'CDBI::Album' );

...where "artist" number 20 is, of course, "Kansas," which is a completely
separate object, which you can grab with a simple retrieve() on your cache.

Currently, under has_a, you'd wind up with:

$VAR1 = bless( {
    'id'     => '100',
    'title'  => 'Point of Know Return',
    'label'  => 'Kirshner',
    'artist' => bless( {
        'id'    => '20',
        'name'  => 'Kansas',
        'style' => 'prog. rock',
    }, 'CDBI::Artist' ),
}, 'CDBI::Album' );

...which, getting back to serializing, is not how you'd want to store your 
album, especially if you have lots of other albums by the same artist. You'd 
have to go out of your way to massage the object into a serializable form 
(whether through Storable or however), and it just doesn't seem like it 
ought to be necessary. After a bit of a tweak -- storing ids as attributes 
instead of inflated objects, and looking to the cache through a single 
lookup function for doing id-based lookups -- I think it might become easier 
to write caching layers for CDBI, whatever the implementation.
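
A rough sketch of that tweak, outside of CDBI entirely: My::Base, My::Artist, My::Album, and the %cache hash are hypothetical, and %cache stands in for whatever actually backs the cache (%Live_Objects, BerkeleyDB, frozen files). The point is only that the album holds a flat id and every relationship goes through one lookup function:

```perl
use strict;
use warnings;

# Hypothetical cache: class name => id => object.
my %cache;

package My::Base;
sub new {
    my ($class, %attr) = @_;
    my $self = bless {%attr}, $class;
    $cache{$class}{ $self->{id} } = $self;   # register in the cache
    return $self;
}
# The single id-based lookup every relationship goes through.
sub retrieve { my ($class, $id) = @_; return $cache{$class}{$id} }

package My::Artist;
our @ISA = ('My::Base');
sub name { return $_[0]{name} }

package My::Album;
our @ISA = ('My::Base');
# 'artist' holds a flat id; inflate on access, not on construction.
sub artist { return My::Artist->retrieve($_[0]{artist}) }

package main;

My::Artist->new(id => 20, name => 'Kansas');
my @albums = map { My::Album->new(id => $_, artist => 20) } 100 .. 102;

# Every album resolves to the same single artist object.
print $albums[0]->artist->name, "\n";
```

Freezing any of these albums would then store just the id 20, not a copy of the artist.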

> > An unrelated-but-related problem: How to set up a CDBI::Album::BASE
> > class, from which an industrial-strength CDBI::Album (your
> > rank-and-file CDBI class) and a lighter CDBI::Album::Cacheable class
> > (containing some, but not all, of the methods/data available to
> > CDBI::Album) derive. Some of the current CDBI class setup routines make
> > this kind of thing tricky. (So if anyone's read this far, and is
> > remotely interested, I can blather on about that, as well.)
>
>Go ahead, but please start a separate thread for it.
>
>- Perrin
>
