Re: blessing db data as utf8
On Thu, Jun 10, 2004 at 09:51:06AM +0100, Tim Bunce wrote:
> This isn't a good way to check for utf8:
>
> +int is_high_bit_set(char *val) {
> + while (*val++)
> + if (*val & 0x80) return 1;
> + return 0;
> +}
>
> because it make it hard for any latin-1 data to coexist.
> The perl guts probably has a function to check for well-formed utf8
> and that should be used instead.
This function is only used as an optimization. The actual decision is here:
+    if (imp_dbh->enable_utf8 &&
+        is_high_bit_set(col) && is_utf8_string(col, len))
+        SvUTF8_on(sv);
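
As an aside on the scan itself: as quoted, the loop increments the pointer before testing it, so the first byte is never examined. A minimal sketch of a version that tests every byte might look like the following. The name has_high_bit is hypothetical and not part of the patch.

```c
/* Hypothetical replacement for the quoted is_high_bit_set().
 * Returns 1 if any byte of the NUL-terminated string has its
 * high bit set (i.e. is outside 7-bit ASCII), 0 otherwise. */
int has_high_bit(const char *val) {
    for (; *val; val++) {
        /* Test the current byte before advancing, so the first
         * byte of the string is covered as well. */
        if ((unsigned char)*val & 0x80)
            return 1;
    }
    return 0;
}
```

With this form, a one-byte string such as "\xe9" is reported as having a high bit set, which the quoted loop would miss.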
That said, bad things are going to happen sooner or later if a table has
both latin-1 and utf8 data.
But now that I think of it, I'm not sure the call to is_high_bit_set is
a good idea there, since SvUTF8_on() on a pure (7 bit) ASCII string shouldn't
do any harm and may even be more correct if the string is later concatenated
with utf8 data.
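
To illustrate why dropping the high-bit guard would only change the outcome for pure ASCII strings, here is a rough standalone sketch of a well-formedness check. It is a stand-in written for this example (strict UTF-8 in the RFC 3629 sense), not Perl's is_utf8_string, which may differ at the edges; the name looks_like_utf8 is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a UTF-8 well-formedness check.
 * Returns 1 if the len bytes at s are well-formed UTF-8
 * (no overlongs, no surrogates, nothing above U+10FFFF). */
int looks_like_utf8(const unsigned char *s, size_t len) {
    size_t i = 0, k, need;
    while (i < len) {
        unsigned char c = s[i];
        /* Allowed range for the first continuation byte. */
        unsigned char lo = 0x80, hi = 0xBF;

        if (c < 0x80) {            /* 7-bit ASCII is always fine */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
            if (c == 0xE0) lo = 0xA0;   /* reject overlong forms */
            if (c == 0xED) hi = 0x9F;   /* reject surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
            if (c == 0xF0) lo = 0x90;   /* reject overlong forms */
            if (c == 0xF4) hi = 0x8F;   /* reject > U+10FFFF */
        } else {
            return 0;              /* stray continuation or bad lead */
        }

        if (len - i <= need)       /* sequence is truncated */
            return 0;
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (k = 2; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += need + 1;
    }
    return 1;
}
```

A 7-bit string such as "hello" passes, so without the guard it would simply get the flag set. A latin-1 string such as "caf\xe9" fails here, although some latin-1 byte sequences do happen to be well-formed utf8, which is one way mixed tables can go wrong.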
I'm not sure what the cleanest way to go about this would be in the long run
(whose responsibility it is to say what is and what isn't utf8), but the patch
addresses an immediate need for people with utf8-only data. Maybe this problem
will go away in MySQL 4.1; I'd prefer not to wait.
--
Gaal Yahas <gaal@xxxxxx.xxx>
http://gaal.livejournal.com/