best stategry: check before create
[prev]
[thread]
[next]
[Date index for 2005/03/18]
Still a beginner, now using CDBI for a WikiNews project. Need some
strategy advice.
Scenario Part 1: There is a page that lists new articles by title. I
am writing a crawler that will hit that page every so often (say, 5
minutes) and parse the list of titles. I then want to update a table by
creating new article records for articles that I don't already have
stored (in other words, articles added since the last crawl). I will
have numeric IDs (mysql auto_number) for faster table joins, but the
title/date combo is the way I determine if the article is already in the DB.
Now, the 'find_or_create' method is very cool, but I have two concerns:
1) It will hit the database for every title. I could avoid this by
building a quick hash of all titles already in my DB and checking myself
in perl, then calling 'create' for the new ones, but that's less CDBI-ish.
2) It uses the same data structure to both 'find' and 'create' - meaning
you can't say "just match on title, but if you don't find the title,
then create it with that title as well as this body and this other misc
data". Since it returns the object, I could then set the non-matchable
fields afterwards, and then call 'update', but that's two DB hits
instead of one. Not a big deal, but I am wondering if there's a more
elegant CDBI solution that I don't know of.
Scenario Part 2: Each article has several categories associated with
it. As I add articles, I will want to add their categorization, but
first I will want to make sure the categories exist, and if not add
them. Really, this is the same problem as Scenario Part 1, but in this
case, the less-efficient solution is far more less-efficient (if that
makes sense).
Here's the schema and class construction as a reference:
---
CREATE TABLE article (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
title VARCHAR(250) NOT NULL,
body TEXT NULL,
pub_date DATE NOT NULL,
PRIMARY KEY( id )
);
CREATE TABLE categorization (
article INT UNSIGNED NOT NULL,
category INT UNSIGNED NOT NULL,
PRIMARY KEY( article, category )
);
CREATE TABLE category (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(250) NOT NULL,
PRIMARY KEY( id )
);
---
package WikiNews::Article;
use base 'WikiNews::base';
WikiNews::Article->table( 'article' );
WikiNews::Article->columns( All => qw/ id title body pub_date / );
WikiNews::Article->has_many( categories => [ 'WikiNews::Categorization'
=> 'category' ] );
###############################################################################
package WikiNews::Categorization;
use base 'WikiNews::base';
WikiNews::Categorization->table( 'categorization' );
WikiNews::Categorization->columns( Primary => qw/ article category / );
WikiNews::Categorization->has_a( article => 'WikiNews::Article' );
WikiNews::Categorization->has_a( category => 'WikiNews::Category' );
###############################################################################
package WikiNews::Category;
use base 'WikiNews::base';
WikiNews::Category->table( 'category' );
WikiNews::Category->columns( All => qw/ id name / );
WikiNews::Article->has_many( articles => [ 'WikiNews::Categorization' =>
'article' ] );
-ofer
|
best stategry: check before create
Ofer Nave 07:10 on 18 Mar 2005
|