0000117  LDMud 3.3  Efuns  public  2011-09-21 13:27
Summary: 0000117: Lossy mode for convert_charset()
Description: Would it be possible to extend convert_charset() so that it optionally runs in a lossy mode, in which characters that cannot be converted to the target charset are replaced by "?" or a specifiable string, instead of the function aborting completely?

Otherwise we cannot use the function for, e.g., UTF-8 to ISO-8859-1 conversion of text entered by users (e.g. via say).
2004-09-20 22:46

reporter   ~0000170

The tricky part is to find out how many of the input characters mess up the conversion.

Various modes are imaginable:

  convert_charset(in, from_cs, to_cs, 1): if the conversion aborts on an unexpected sequence, the first input character is removed from in, a '?' is appended to the result, and the conversion begins again. Repeat ad nauseam.

  convert_charset(in, from_cs, to_cs, fun): The function fun() receives the remaining in-string as argument, and returns an array consisting of ({ "string to add to the result", "new remaining in-string" }).
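
The first mode can be sketched against plain iconv(3) in C, the driver's language. This is only an illustration under stated assumptions, not the driver's implementation: lossy_convert() and ensure_room() are hypothetical names, a "character" is taken to mean one byte (so an unconvertible multi-byte sequence may yield several replacement strings), and stateful target encodings are not flushed.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: make sure *buf can hold at least `need` bytes. */
static int ensure_room(char **buf, size_t *cap, size_t need)
{
    if (need <= *cap)
        return 0;
    size_t new_cap = *cap;
    while (new_cap < need)
        new_cap *= 2;
    char *tmp = realloc(*buf, new_cap);
    if (tmp == NULL)
        return -1;
    *buf = tmp;
    *cap = new_cap;
    return 0;
}

/* Hypothetical sketch of mode 1: convert `in` from from_cs to to_cs.
 * Whenever iconv() rejects the input, drop one input byte, append
 * `repl` to the result and resume the conversion.
 * Returns a malloc()ed, NUL-terminated string, or NULL on failure. */
char *lossy_convert(const char *in, size_t in_len,
                    const char *from_cs, const char *to_cs,
                    const char *repl)
{
    assert(in != NULL && repl != NULL);

    iconv_t cd = iconv_open(to_cs, from_cs);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t repl_len = strlen(repl);
    size_t cap = in_len + 16;          /* initial guess, grown on demand */
    size_t used = 0;
    char *out = malloc(cap);
    char *p_in = (char *)in;           /* iconv() does not modify the data */
    size_t in_left = in_len;

    while (out != NULL && in_left > 0) {
        char *p_out = out + used;
        size_t out_left = cap - used - 1;   /* keep room for the NUL */
        size_t rc = iconv(cd, &p_in, &in_left, &p_out, &out_left);
        int err = errno;
        used = (size_t)(p_out - out);
        if (rc != (size_t)-1)
            break;                          /* everything converted */

        int failed = 0;
        if (err == E2BIG) {
            failed = ensure_room(&out, &cap, cap + 1);
        } else if (err == EILSEQ || err == EINVAL) {
            failed = ensure_room(&out, &cap, used + repl_len + 1);
            if (!failed) {
                memcpy(out + used, repl, repl_len);
                used += repl_len;
                p_in++;                     /* skip the offending byte */
                in_left--;
                iconv(cd, NULL, NULL, NULL, NULL);  /* reset the state */
            }
        } else {
            failed = 1;                     /* unexpected error */
        }
        if (failed) {
            free(out);
            out = NULL;
        }
    }

    if (out != NULL)
        out[used] = '\0';
    iconv_close(cd);
    return out;
}
```

Under these assumptions, lossy_convert("a\xFF" "b", 3, "UTF-8", "ISO-8859-1", "?") should give "a?b", because 0xFF is never valid in UTF-8. Passing an empty replacement string drops the offending bytes instead. The second mode would swap the fixed replacement for a callback that decides how much input to skip and what to append.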


2004-10-14 14:54

reporter   ~0000202

Last edited: 2004-10-14 15:05

We had this problem with IRC users switching between latin1 and utf8 charsets. We now return -1 instead of throwing an exception, and handle the problem at the 'mudlib' level.

Brutal diff of strfuns.c (adjust the return type in func_spec accordingly)... see next entry (edit does not seem to like attaching files)

--- projects/psyc/ldmud/3-3/src/strfuns.c 2004-04-28 05:57:59.000000000 +0200
+++ 3-3/src/strfuns.c 2004-10-14 21:49:11.000000000 +0200
@@ -483,22 +483,42 @@

             if (errno == EILSEQ)
             {
+#if 0
                 error("convert_charset(): Invalid character sequence at index %ld\n", (long)(pIn - get_txt(in_str)));
                 /* NOTREACHED */
+#endif
+                free_string_svalue(sp--);
+                free_string_svalue(sp--);
+                free_string_svalue(sp);
+                put_number(sp, -1);
                 return sp;
             }

             if (errno == EINVAL)
             {
+#if 0
                 error("convert_charset(): Incomplete character sequence at index %ld\n", (long)(pIn - get_txt(in_str)));
                 /* NOTREACHED */
+#endif
+                free_string_svalue(sp--);
+                free_string_svalue(sp--);
+                free_string_svalue(sp);
+                put_number(sp, -1);
                 return sp;
             }
+#if 0
             error("convert_charset(): Error %d at index %ld\n"
                   , errno, (long)(pIn - get_txt(in_str))
                   );
             /* NOTREACHED */
+#endif
+            free_string_svalue(sp--);
+            free_string_svalue(sp--);
+            free_string_svalue(sp);
+            put_number(sp, -1);
             return sp;
         } /* if (rc < 0) */
     } /* while (in_left) */



2005-04-01 06:13

reporter   ~0000358

Just wondering about the status of this enhancement request. Are you waiting for feedback, did you forget about it, or did you decide not to change the behaviour?

Regarding the various modes you suggested: Both are ok, but the first one would definitely already solve my problem, while probably being simpler to implement.


2005-05-04 08:54

reporter   ~0000361

Note: looking at the implementation of the iconv program (iconv_prog.c) that comes with glibc may be helpful, as it has a -c switch ('omit invalid characters from output').


2005-06-26 17:48

reporter   ~0000380

You can use convert_charset(your_string, "UTF-8", "ISO-8859-1//TRANSLIT") instead. iconv will then replace unconvertible characters with "?" or something else, depending on the iconv implementation.


2006-03-06 19:40

reporter   ~0000492

Hi.. I have tried //TRANSLIT and it didn't help at all. Sigh. :(
Now I'm using catch(), but since these failures happen rather often,
it is a costly solution. I'm using catch() with the nolog flag. Does
it skip line number calculation in that case?


2011-02-19 20:52

administrator   ~0001999

Out of curiosity: is it still the case that appending "//TRANSLIT" does not work? And on which platforms/libiconv versions is that the case? At least on my system, it does work.
Another possibility might be also "//IGNORE".
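
At the iconv(3) level, these suffixes are simply part of the target charset name passed to iconv_open(). A minimal C sketch to probe what a given platform does with them follows; convert_once() is a hypothetical helper, not driver code. glibc and GNU libiconv document both suffixes, but other implementations may reject or ignore them, so the result for unconvertible input is deliberately left open here.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: one-shot conversion into a caller-supplied buffer.
 * to_cs may carry a suffix such as "//TRANSLIT" or "//IGNORE".
 * Returns the number of bytes written, or -1 if iconv_open() or
 * iconv() reports an error. */
long convert_once(const char *in, size_t in_len,
                  const char *from_cs, const char *to_cs,
                  char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to_cs, from_cs);
    if (cd == (iconv_t)-1)
        return -1;              /* charset or suffix not supported here */

    char *p_in = (char *)in;    /* iconv() does not modify the data */
    char *p_out = out;
    size_t in_left = in_len;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &p_in, &in_left, &p_out, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;              /* EILSEQ, EINVAL or E2BIG */
    return (long)(p_out - out);
}
```

Calling convert_once() with "ISO-8859-1", "ISO-8859-1//TRANSLIT" and "ISO-8859-1//IGNORE" on the same UTF-8 input shows whether the local iconv fails, substitutes, or drops the unconvertible characters.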


2011-09-21 13:27

administrator   ~0002061

Since there was no other feedback: I believe that with the current iconv() the desired effect can be achieved with //TRANSLIT and/or //IGNORE, and we therefore don't need to change anything.
If not, please re-open or tell me.

