Re: blessing db data as utf8
[Date index for 2004/06/10]
On Thu, Jun 10, 2004 at 12:18:42PM +0300, Gaal Yahas wrote:
> On Thu, Jun 10, 2004 at 09:51:06AM +0100, Tim Bunce wrote:
> > This isn't a good way to check for utf8:
> >
> > +int is_high_bit_set(char *val) {
> > + while (*val++)
> > + if (*val & 0x80) return 1;
> > + return 0;
> > +}
> >
> > because it makes it hard for any latin-1 data to coexist.
> > The perl guts probably has a function to check for well-formed utf8
> > and that should be used instead.
>
> This function is only used as an optimization. The actual decision is here:
>
> + if (imp_dbh->enable_utf8 &&
> + is_high_bit_set(col) && is_utf8_string(col, len))
> + SvUTF8_on(sv);
Ah, okay.
> That said, bad things are going to happen sooner or later if a table has
> both latin-1 and utf8 data.
I'm thinking more about different fields having either latin-1 or utf8 data.
> But now that I think of it, I'm not sure the call to is_high_bit_set is
> a good idea there, since SvUTF8_on() on a pure (7-bit) ASCII string shouldn't
> do any harm
It does add overhead (and is actually harmful on 5.6.x, where many
utf8 bugs lurk), so the check is worthwhile.
> and may even be more correct if the string is later concatenated
> with utf8 data.
No, perl will do-the-right-thing.
> I'm not sure what the cleanest way would be to go about this in the long run
> (whose responsibility it is to say what is and what isn't utf8) but the patch
> addresses an immediate need for people with utf8-only data. Maybe this problem
> would go away in mysql 4.1; I'd prefer not to wait.
Something along these lines is needed. But it does require careful thought.
Tim.