autogenerating has_a and has_many from foreign keys
[prev]
[thread]
[next]
[Date index for 2004/12/13]
I'm not sure why nobody has coded this yet (that I could see),
but I had to solve a problem over the last few days, and like
most problems I solve, I also wrote a column about it.
*** PLEASE NOTE THAT THIS DOES NOT CONSTITUTE A "PUBLISHING" OF THIS ARTICLE
*** THIS ARTICLE IS INCLUDED FOR REVIEW PURPOSES ONLY
*** Please send all comments in a timely manner to merlyn@xxxxxxxxxx.xxx
*** All other uses STRICTLY FORBIDDEN to the degree allowed by copyright
=head1 Linux Magazine Column 67 (Feb 2005)
[suggested title: "Creating Class::DBI relationships automatically"]
I've recently been contracted by Geekcruises
(C<http://www.geekcruises.com>) to update their online booking and
administration forms. They're using a simple CGI architecture, with
a lot of hand-rolled SQL to talk to their PostgreSQL database.
My first design decision was to adopt a more generic flexible web
abstraction, creating C<CGI::Prototype> as a result. They've allowed
me to upload that to the CPAN, and I'll be writing more about that in
the future. For the "model" part of the abstraction, I've settled in
on the wildly popular C<Class::DBI> framework. And that brings us to
today's problem and solution.
The existing database consists of 50 tables, with typical foreign key
mappings. For example, a C<person_cruise> record links a C<person>
with a C<cruise>. In C<Class::DBI>, setting up these foreign_key
relationships so that they could be easily accessed looked simple enough:
I'd just have to set up a bunch of statements like:
GC::DB::PersonCruise->has_a(person_id => 'GC::DB::Person');
GC::DB::Person->has_many(personcruises => 'GC::DB::PersonCruises');
GC::DB::PersonCruise->has_a(cruise_id => 'GC::DB::Cruise');
GC::DB::Cruise->has_many(personcruises => 'GC::DB::PersonCruises');
And that was fine, for the first 10 or 15 records. And then it
started to get boring. And then tedious. And then error-prone. I
started to think "hey, wait a minute! PostgreSQL knows these
relationships! Why am I reentering that information?" And then "Oh,
someone else must've done this already!"
When I started looking in the CPAN for some reusable solutions, I
quickly found C<Class::DBI::Loader>. Joy! This module solves the
problem of querying the database for all the records and their
columns, including determining the auto increment sequences. That's
good, because that was already starting to be frustrating as well.
But C<Class::DBI::Loader> did nothing about the foreign keys. So, as
I kept looking, I stumbled upon C<Class::DBI::Loader::Relationship>,
which started to look promising (I could use English instead of Perl),
but alas, even the example has a bug in it, and I just couldn't get it
to work like I wanted. Also, again, why am I specifying all these
foreign keys when the database itself already knows?
After staring at the C<DBD::Pg> manpage for about a half hour, and
trying out little snippets in the debugger, I was finally able to coax
the database into telling me about its foreign keys. Then I took some
of the ideas from C<Class::DBI::Loader::Relationship> to turn those
into C<has_a> and C<has_many> calls (including the idea of
automatically making plurals). And after a bit of hacking, I had my
automatic C<Class::DBI> generator.
However, I ran into one other snag. The database had used
PostgreSQL's table inheritance for the payment types. A specific
credit card, money order, or check payment inherited from a generic
payment record, so that an SQL query could ask for "any payment"
versus "any credit card payment" easily. Since these child record
types didn't have an explicit foreign-key relationship with the
linking records, I had to add a means of faking it in my code.
The result of all of this is in [listing one below]. Lines 1 and 94
define this as a package C<GC::DB>, which I will C<require> or C<use>
in my application.
Lines 4 through 90 create a lexical scope so that I can define
temporary variables during the setup that won't apply to any of the
later code.
Lines 6 and 7 set up the database connection, user, and password for
this database. They can be overridden for testing purposes by setting
C<@GC::DB::CONNECTION> before bringing in this file. More on that in
a future column as well.
Lines 9 and 10 bring in two CPAN modules. I'm using C<require> rather
than C<use> because I don't want any imports to pollute C<GC::DB>
class.
Line 12 turns on debug tracing mode. This was helpful when I was
developing, to see what relationships were actually being generated.
Lines 14 through 18 use C<Class::DBI::Loader> to create the 50
C<Class::DBI> classes reflecting the 50 records of the database. The
C<namespace> parameter gives C<GC::DB::> prefixes to these new
classes. This call also sets up all the C<sequence> calls so that key
generation works properly.
Lines 19 and 20 extract out the classes and tables just generated.
I'm presuming the two lists correlate, but have no assurance of that
from the docs. In fact, the docs don't even mention these two
methods.
Line 22 gets me a database handle by calling C<db_Main> on the first
class. Any class would have done, as they all share the same
database.
Lines 26 to 29 add our current class to the C<@ISA> list for each of
the created classes. I want to be able to define methods that all of
my created classes understand, and this was the fastest way to do
that.
Lines 33 and 34 set up the C<%class_of> hash, so that I can go from a
table name to a class name. Again, I'm presuming that the two lists
are correlated properly.
Line 36 patches up a plural that didn't come out quite right for me.
The class named C<GC::DB::PaidTo> was getting a linking method called
C<paidtoes>, which was humorous but expectedly hard to remember. So,
I just whacked the pluralizer on the head with an additional fact, and
it's all good.
Lines 38 to 60 define a "lexical subroutine" as a coderef stored in a
lexical variable. Again, I'm trying hard not to pollute the C<GC::DB>
namespace. This subroutine takes a table name, a linking column in
that table, and the other table to which the column links, and turns
those into the right C<has_a> and C<has_many> calls.
Lines 41 and 42 use the class name hash to get the corresponding class
names for the two table names.
Lines 44 through 48 provide a debugging trace of the C<has_a> call
that we're about to execute in line 49. This creates our forward
link (many to one).
Lines 51 and 52 determine the plural reverse link by first extracting
the "basename" of the class, then lowercasing that, then pluralizing
that.
Lines 54 through 58 provide a debugging trace of the C<has_many> call
that we're about to execute in line 59. This creates our reverse link
(one to many).
And now that we have that subroutine to do the dirty work, we just
need to figure out how to call it!
Lines 64 through 68 define a hash to map a table name into all of its
child tables, which apparently we cannot determine automatically from
the C<DBI> layer. Oh well.
Lines 70 through 89 loop through all the tables we've seen. Line
71 traces the particular table being examined.
Line 72 uses the C<DBI> abstraction to get the foreign keys for a
given table (indicated as the sixth positional parameter). The return
value from C<foreign_key_info> is a statement handle that can be used
with the normal C<DBI> methods to extract the data. If there are no
foreign keys in this table, C<$sth> is undefined, so we need to skip
the rest of the loop.
Lines 73 through 87 loop through the results from getting the details
of the foreign keys as an array of hashrefs. Each C<$res> is a
hashref for a particular foreign key mapping.
Lines 75 and 76 extract the linking column and the referenced table
from the result.
Lines 78 and 79 deal with the PostgreSQL inheritance by possibly
replacing a single table name with all of its related tables. Most
of the time, C<@tables> will consist of only C<$table>, but when
C<$table> was C<payment>, we'll instead have the entire list.
Lines 81 through 85 execute the C<$has_a_many> subroutine for each
combination of source table, foreign table, and column that we've
now noted.
Line 92 is where I will be adding other things for each class. For
example, if I need to add a method to the C<GC::DB::PersonCruise> record,
I'll merely add:
{
package GC::DB::PersonCruise;
sub my_extra_method {
...
}
}
to the mix, keeping all of my database model code in this one file.
The lesson I'm reminded here is that when things seem tedious, it's
time to get the computer to do more of the work. And I'm happy that I
have a language like Perl that can generate and execute code on the
fly, including late-binding the method calls to the objects and
classes created at run-time. Try that in Java. Not gonna happen.
Until next time, enjoy!
=head2 Listings
=1= package GC::DB;
=2= use strict;
=3=
=4= BEGIN {
=5=
=6= our @CONNECTION;
=7= @CONNECTION = qw(dbi:Pg:dbname=gcdb Luser Guessnot) unless @CONNECTION;
=8=
=9= require Class::DBI::Loader;
=10= require Lingua::EN::Inflect;
=11=
=12= my $DEBUG = 0;
=13=
=14= my $l = Class::DBI::Loader->new(dsn => $CONNECTION[0],
=15= user => $CONNECTION[1],
=16= password => $CONNECTION[2],
=17= namespace => __PACKAGE__,
=18= );
=19= my @classes = $l->classes;
=20= my @tables = $l->tables;
=21=
=22= my $dbh = $classes[0]->db_Main;
=23=
=24= ## add mixin of us
=25=
=26= for my $class (@classes) {
=27= no strict 'refs';
=28= push @{$class . "::ISA"}, __PACKAGE__;
=29= }
=30=
=31= ## set up the has_a/has_many from the foreign keys
=32=
=33= my %class_of;
=34= @class_of{@tables} = @classes;
=35=
=36= Lingua::EN::Inflect::def_noun('paidto','paidtos'); # PL() gets this wrong
=37=
=38= my $has_a_many = sub { # no lexical subs yet!
=39= my ($table, $column, $other) = @_;
=40=
=41= my $table_class = $class_of{$table};
=42= my $other_class = $class_of{$other};
=43=
=44= warn sprintf("%s->has_a(%s => %s)\n",
=45= $table_class,
=46= $column,
=47= $other_class,
=48= ) if $DEBUG;
=49= $table_class->has_a($column => $other_class);
=50=
=51= my ($table_class_base) = $table_class =~ /.*::(.*)/ or die;
=52= my $plural = Lingua::EN::Inflect::PL(lc $table_class_base);
=53=
=54= warn sprintf("%s->has_many(%s => %s)\n",
=55= $other_class,
=56= $plural,
=57= $table_class,
=58= ) if $DEBUG;
=59= $other_class->has_many($plural => $table_class);
=60= };
=61=
=62= ## deal with Pg inheritance
=63=
=64= my %inherits =
=65= (
=66= payment => [qw(payment payment_cc payment_check
=67= payment_money_order payment_wire)],
=68= );
=69=
=70= for my $table (@tables) {
=71= warn "$table:\n" if $DEBUG;
=72= if (my $sth = $dbh->foreign_key_info('','','','','',$table)) {
=73= for my $res (@{$sth->fetchall_arrayref({})}) {
=74=
=75= my $column = $res->{FK_COLUMN_NAME};
=76= my $other = $res->{UK_TABLE_NAME};
=77=
=78= my @tables = @{$inherits{$table} || [$table]};
=79= my @others = @{$inherits{$other} || [$other]};
=80=
=81= for my $table (@tables) {
=82= for my $other (@others) {
=83= $has_a_many->($table, $column, $other);
=84= }
=85= }
=86=
=87= }
=88= }
=89= }
=90= }
=91=
=92= ## other GC::DB::* things go here
=93=
=94= 1;