Re: blessing db data as utf8


From: Gaal Yahas
Subject: Re: blessing db data as utf8
Date: 09:18 on 10 Jun 2004
On Thu, Jun 10, 2004 at 09:51:06AM +0100, Tim Bunce wrote:
> This isn't a good way to check for utf8:
> 
> +int is_high_bit_set(char *val) {
> +    while (*val++)
> +      if (*val & 0x80) return 1;
> +    return 0;
> +}
> 
> because it makes it hard for any latin-1 data to coexist.
> The perl guts probably have a function to check for well-formed utf8
> and that should be used instead.

This function is only used as an optimization. The actual decision is here:

+        if (imp_dbh->enable_utf8 &&
+            is_high_bit_set(col) && is_utf8_string(col, len))
+          SvUTF8_on(sv);
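
As a side note on the pre-check itself: the quoted loop increments val before testing it, so the first byte is never examined. That happens not to change the combined result, since a well-formed multi-byte UTF-8 sequence always has a high-bit continuation byte after its lead byte, but a length-based scan is easier to reason about. A minimal sketch follows; the name has_high_bit and the explicit length parameter are my own and not part of the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical replacement for is_high_bit_set (not in the patch).
 * Returns 1 if any of the first len bytes has its high bit set, else 0.
 * Takes an explicit length, so it examines byte 0 and does not depend
 * on a NUL terminator. */
static int has_high_bit(const char *s, size_t len)
{
    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        /* Cast so the test behaves the same whether char is signed or not. */
        if ((unsigned char)s[i] & 0x80)
            return 1;
    }
    return 0;
}
```

With this, the condition would read has_high_bit(col, len) && is_utf8_string(col, len), still assuming col and len are the column buffer and its byte length as in the patch.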

That said, bad things are going to happen sooner or later if a table has
both latin-1 and utf8 data.

But now that I think of it, I'm not sure the call to is_high_bit_set is
a good idea there, since SvUTF8_on() on a pure (7 bit) ASCII string shouldn't
do any harm and may even be more correct if the string is later concatenated
with utf8 data.
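
To illustrate why dropping the pre-check should be safe, here is a standalone sketch of a well-formedness check. It is only a simplified stand-in for perl's is_utf8_string (it is stricter, rejecting surrogates and anything above U+10FFFF), and the name looks_like_utf8 is made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for is_utf8_string, for illustration only.
 * Returns 1 if the first len bytes of s are well-formed UTF-8. */
static int looks_like_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL);
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded code point, smallest legal value */

        if (c < 0x80) { i++; continue; }   /* 7-bit ASCII is always valid */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                     /* stray continuation or bad lead */

        if (len - i - 1 < need) return 0;  /* sequence truncated */
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}
```

A pure ASCII string passes, so without the high-bit test it would simply get the flag too. A lone latin-1 byte such as 0xE9 fails, but the bytes 0xC3 0xA9 pass whether they were meant as utf8 or as two latin-1 characters, which is the mixed-table problem mentioned above.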

I'm not sure what the cleanest way to go about this would be in the long run
(whose responsibility it is to say what is and what isn't utf8), but the patch
addresses an immediate need for people with utf8-only data. Maybe this problem
will go away in MySQL 4.1; I'd prefer not to wait.

-- 
Gaal Yahas <gaal@xxxxxx.xxx>
http://gaal.livejournal.com/
