best stategry: check before create

[prev] [thread] [next] [Date index for 2005/03/18]

From: Ofer Nave
Subject: best stategry: check before create
Date: 07:10 on 18 Mar 2005
Still a beginner, now using CDBI for a WikiNews project.  Need some 
strategy advice.

Scenario Part 1:  There is a page that lists new articles by title.  I 
am writing a crawler that will hit that page every so often (say, 5 
minutes) and parse the list of titles.  I then want to update a table by 
creating new article records for articles that I don't already have 
stored (in other words, articles added since the last crawl).  I will 
have numeric IDs (mysql auto_number) for faster table joins, but the 
title/date combo is the way I determine if the article is already in the DB.

Now, the 'find_or_create' method is very cool, but I have two concerns:
1) It will hit the database for every title.  I could avoid this by 
building a quick hash of all titles already in my DB and checking myself 
in perl, then calling 'create' for the new ones, but that's less CDBI-ish.
2) It uses the same data structure to both 'find' and 'create' - meaning 
you can't say "just match on title, but if you don't find the title, 
then create it with that title as well as this body and this other misc 
data".  Since it returns the object, I could then set the non-matchable 
fields afterwards, and then call 'update', but that's two DB hits 
instead of one.  Not a big deal, but I am wondering if there's a more 
elegant CDBI solution that I don't know of.

Scenario Part 2: Each article has several categories associated with 
it.  As I add articles, I will want to add their categorization, but 
first I will want to make sure the categories exist, and if not add 
them.  Really, this is the same problem as Scenario Part 1, but in this 
case, the less-efficient solution is far more less-efficient (if that 
makes sense).

Here's the schema and class construction as a reference:

---

CREATE TABLE article (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title      VARCHAR(250) NOT NULL,
  body       TEXT         NULL,
  pub_date   DATE         NOT NULL,

  PRIMARY KEY( id )
);

CREATE TABLE categorization (
  article    INT UNSIGNED NOT NULL,
  category   INT UNSIGNED NOT NULL,

  PRIMARY KEY( article, category )
);

CREATE TABLE category (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name       VARCHAR(250) NOT NULL,

  PRIMARY KEY( id )
);

---

package WikiNews::Article;
use base 'WikiNews::base';

WikiNews::Article->table( 'article' );
WikiNews::Article->columns( All => qw/ id title body pub_date / );

WikiNews::Article->has_many( categories => [ 'WikiNews::Categorization' 
=> 'category' ] );

###############################################################################
package WikiNews::Categorization;
use base 'WikiNews::base';

WikiNews::Categorization->table( 'categorization' );
WikiNews::Categorization->columns( Primary => qw/ article category / );

WikiNews::Categorization->has_a( article  => 'WikiNews::Article'  );
WikiNews::Categorization->has_a( category => 'WikiNews::Category' );

###############################################################################
package WikiNews::Category;
use base 'WikiNews::base';

WikiNews::Category->table( 'category' );
WikiNews::Category->columns( All => qw/ id name / );

WikiNews::Article->has_many( articles => [ 'WikiNews::Categorization' => 
'article' ] );

-ofer

best stategry: check before create
Ofer Nave 07:10 on 18 Mar 2005

Re: best stategry: check before create
Ofer Nave 07:14 on 18 Mar 2005

Re: best stategry: check before create
Jim Mozley 08:21 on 18 Mar 2005

Re: best stategry: check before create
Ofer Nave 08:34 on 18 Mar 2005

Re: best stategry: check before create
Perrin Harkins 15:58 on 18 Mar 2005

Re: best stategry: check before create
Ofer Nave 22:04 on 18 Mar 2005

Re: best stategry: check before create
=?ISO-8859-1?Q?Ask_Bj=F8rn_Hansen?= 07:22 on 18 Mar 2005

Re: best stategry: check before create
Ofer Nave 07:36 on 18 Mar 2005

Generated at 15:09 on 27 Mar 2005 by mariachi v0.52