Re: Re: Performance Query
On Tue, 2004-09-07 at 03:30, tom@xxxxxxxx.xxx wrote:
> My understanding is that it will be split into something like this (pseudocode):
> <pre>
> while (SELECT id, name, group_id, type_id, access_id FROM USER) {
>     SELECT name FROM GROUP g WHERE g.id = group_id
>     SELECT name FROM TYPE t WHERE t.id = type_id
>     SELECT name FROM ACCESS a WHERE a.id = access_id
> }
> </pre>
>
> The first SQL is 1 query, with the overhead of 1 round trip, retrieving X
> rows; the second loop is 1+X*3 queries retrieving X+1 rows. My worry is
> that if I start retrieving a few thousand rows, instead of 1 query I have
> 3000+ queries hitting the database. Make this a web site with a few
> concurrent visitors and suddenly I am hitting the database with 10,000
> queries? To me that is a performance issue.
When I worked at eToys.com, we had a very normalized database schema. I
built the model class for fetching product data from it. Starting with
the assumption that a few larger queries are faster than many small
queries (almost always true, at least with Oracle, which is what we
used), I wrote the queries to group the fetches as much as possible with
joins, just as you did with your non-CDBI query. We had many tables to
fetch data from and this worked pretty well.
Later, we needed to use the product data in search results and on
catalog browsing pages. On those pages, only a small subset of the
total product data was required. Fetching all of the data for the 10 or
20 products on a page was slow, so we changed the way things worked to
allow smaller chunks of data to be fetched separately. Eventually we
ended up with something like a lazy-loading system, except that it
required the client code to ask for specific subsets and thus was very
tightly coupled in a bad way. It also broke some of my grouped queries,
because they were doing extra joins that were only needed in the few
places that used that data, and slowing everything down for no good
reason.
Class::DBI solves this problem well by allowing you to specify column
groups and by lazy-loading related objects. Most of the time, you don't
actually need all of that related data, and fetching it just makes
things slow.
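As a rough sketch (class, table, and column names here are just guesses based
on your pseudocode), the setup looks something like this:
<pre>
package My::User;
use base 'My::DBI';   # your Class::DBI subclass that sets up the connection

My::User->table('user');

# Columns in the Essential group are fetched on retrieve; columns in any
# other group are lazy-loaded the first time one of them is accessed.
My::User->columns(Primary   => 'id');
My::User->columns(Essential => qw/name/);
My::User->columns(Others    => qw/group_id type_id access_id/);

# has_a turns each foreign key into the related object, but the related
# row isn't fetched until you actually use it.
My::User->has_a(group_id  => 'My::Group');
My::User->has_a(type_id   => 'My::Type');
My::User->has_a(access_id => 'My::Access');

1;
</pre>
With that, My::User->retrieve_all only touches the USER table, and
$user->group_id->name doesn't cost you the GROUP query unless you actually
call it.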
If you have a situation where you *know* that you will always use all of
that data, you can just do a complex query like the one you showed and
put the data into CDBI objects. I posted some code to do this on the
list a while back. If you want to do this and need help, just ask.
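The rough shape of it is something like this (not the exact code from that
post, and again the names are guesses from your pseudocode):
<pre>
package My::User;

# TEMP columns live only on the objects, not in the table; they hold the
# extra values pulled in by the join.
My::User->columns(TEMP => qw/group_name type_name access_name/);

# One joined query instead of 1+X*3. (GROUP is a reserved word, so a real
# table with that name would need quoting or renaming.)
My::User->set_sql(with_names => qq{
    SELECT u.id, u.name, u.group_id, u.type_id, u.access_id,
           g.name AS group_name, t.name AS type_name, a.name AS access_name
    FROM   user u, group g, type t, access a
    WHERE  u.group_id  = g.id
    AND    u.type_id   = t.id
    AND    u.access_id = a.id
});

# With a reasonably recent Class::DBI, set_sql() on a SELECT also generates
# a search_with_names() method that runs the query and inflates each row
# into a My::User object (sth_to_objects() does the same job by hand).
my @users = My::User->search_with_names;
</pre>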
- Perrin