Re: blessing db data as utf8


From: Tim Bunce
Subject: Re: blessing db data as utf8
Date: 18:01 on 10 Jun 2004
On Thu, Jun 10, 2004 at 12:18:42PM +0300, Gaal Yahas wrote:
> On Thu, Jun 10, 2004 at 09:51:06AM +0100, Tim Bunce wrote:
> > This isn't a good way to check for utf8:
> > 
> > +int is_high_bit_set(char *val) {
> > +    for (; *val; val++)  /* test every byte, including the first */
> > +      if (*val & 0x80) return 1;
> > +    return 0;
> > +}
> > 
> > because it makes it hard for any latin-1 data to coexist.
> > The perl guts probably have a function to check for well-formed utf8
> > and that should be used instead.
> 
> This function is only used as an optimization. The actual decision is here:
> 
> +        if (imp_dbh->enable_utf8 &&
> +            is_high_bit_set(col) && is_utf8_string(col, len))
> +          SvUTF8_on(sv);

Ah, okay.
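
For readers following along outside the perl source tree, here is a minimal standalone sketch of the two-stage check quoted above: a cheap high-bit scan as a pre-filter, then a full well-formedness test. The names has_high_bit, utf8_well_formed and should_flag_utf8 are hypothetical. utf8_well_formed is only a simplified stand-in for perl's is_utf8_string() and is not guaranteed to accept exactly the same byte sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if any byte in buf[0..len) has its high bit set.
 * Each byte is tested before the index advances. */
static int has_high_bit(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] & 0x80)
            return 1;
    return 0;
}

/* Hypothetical stand-in for perl's is_utf8_string(): accepts 1- to 4-byte
 * sequences and rejects anything malformed. */
static int utf8_well_formed(const unsigned char *buf, size_t len)
{
    /* Smallest code point that legitimately needs n continuation bytes. */
    static const unsigned long min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    assert(buf != NULL);
    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t n;                      /* continuation bytes expected */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;                 /* stray continuation or invalid lead */

        if (len - i - 1 < n) return 0; /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                  /* overlong, out of range, surrogate */
        i += n + 1;
    }
    return 1;
}

/* Mirror of the quoted decision: flag only when the feature is enabled,
 * the data is not pure ASCII, and the bytes are well-formed UTF-8. */
static int should_flag_utf8(int enable_utf8, const unsigned char *buf, size_t len)
{
    return enable_utf8 && has_high_bit(buf, len) && utf8_well_formed(buf, len);
}
```

With this sketch, pure ASCII such as "hello" is never flagged, the latin-1 bytes E9 61 62 63 fail the well-formedness test, and "caf" followed by C3 A9 is flagged.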

> That said, bad things are going to happen sooner or later if a table has
> both latin-1 and utf8 data.

I'm thinking more about different fields having either latin-1 or utf8 data.
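
To illustrate why mixed latin-1 and utf8 fields are awkward for any content-based test, here is a small sketch (the function names are hypothetical) showing that the same bytes can be legitimate under both encodings. The two bytes C3 A9 are the two characters "Ã©" in latin-1 and the single character "é" in utf8, so inspecting the bytes alone cannot say which one a field was meant to hold.

```c
#include <assert.h>
#include <stddef.h>

/* Characters in the buffer if every byte is one latin-1 character. */
static size_t count_chars_latin1(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    (void)buf;
    return len;
}

/* Characters in the buffer if it is well-formed utf8: every byte that is
 * not a continuation byte (10xxxxxx) starts a new character. */
static size_t count_chars_utf8(const unsigned char *buf, size_t len)
{
    size_t n = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if ((buf[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the buffer C3 A9, count_chars_latin1 gives 2 and count_chars_utf8 gives 1. Both readings are valid, which is why the encoding of a field ultimately has to be declared by something other than the data itself.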

> But now that I think of it, I'm not sure the call to is_high_bit_set is
> a good idea there, since SvUTF8_on() on a pure (7 bit) ASCII string shouldn't
> do any harm

Setting the flag on pure ASCII does add overhead (and is actually
harmful on 5.6.x, where many utf8 bugs lurk), so the check is worthwhile.

> and may even be more correct if the string is later concatenated
> with utf8 data.

No, perl will do-the-right-thing.
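
As a rough illustration of what "the right thing" amounts to: when a string without the utf8 flag is concatenated with a flagged one, perl upgrades the unflagged bytes as if they were latin-1 (the perl API call for this is sv_utf8_upgrade). The sketch below shows that re-encoding in plain C. latin1_to_utf8 is a hypothetical helper, not perl's implementation, and it assumes an ASCII-based platform.

```c
#include <assert.h>
#include <stddef.h>

/* Re-encode len latin-1 bytes from in[] as utf8 into out[].
 * out must have room for 2 * len bytes. Returns the bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                                /* ASCII unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6)); /* lead byte */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    return o;
}
```

Pure ASCII passes through byte for byte, so an unflagged 7-bit string gives the same result after upgrading as it would if it had been flagged from the start.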

> I'm not sure what the cleanest way would be to go about this in the long run
> (whose responsibility it is to say what is and what isn't utf8) but the patch
> addresses an immediate need for people with utf8-only data. Maybe this problem
> would go away in mysql 4.1; I'd prefer not to wait.

Something along these lines is needed. But it does require careful thought.

Tim.
