View Issue Details

ID: 0000876
Project: LDMud 3.6
Category: Efuns
View Status: public
Last Update: 2020-09-01 20:41
Reporter: iago4
Assigned To: Gnomi
Priority: low
Severity: feature
Reproducibility: always
Status: resolved
Resolution: fixed
Fixed in Version: 3.6.3
Summary: 0000876: Unicode character width efun would still be handy in 3.6
Description: I have been looking through the Unicode implementation in the 3.6.x series and see that everything I wanted (reported as iago3) has been implemented seamlessly and simply, with one exception.

I believe there is still value in an efun that returns the display width of a string, that is, how many on-screen columns the string occupies. Functions that fit or wrap text to a user's display could use it. A Unicode character can occupy more or fewer than one fixed-width column, so counting characters with sizeof() does not give the same result.
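To illustrate the difference between character count and column count, here is a minimal, self-contained sketch. It does not use the driver's API or the C library's wcwidth(). The helper names (decode_utf8, simple_width, measure_text) are hypothetical, and the width table is a deliberately tiny assumption covering only a few ranges, not the full Unicode width data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s.
 * Stores the code point in *cp and returns the number of bytes consumed.
 * Malformed lead or continuation bytes are consumed one byte at a time
 * as U+FFFD. Overlong encodings are not rejected in this sketch. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Simplified width table (an assumption for illustration only):
 * 0 columns for combining marks U+0300..U+036F,
 * 2 columns for a few common wide ranges, 1 column otherwise. */
static int simple_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F) return 0;
    if ((cp >= 0x1100 && cp <= 0x115F) ||  /* Hangul Jamo initial consonants */
        (cp >= 0x3040 && cp <= 0x9FFF) ||  /* hiragana through CJK ideographs */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||  /* Hangul syllables */
        (cp >= 0xFF01 && cp <= 0xFF60))    /* fullwidth forms */
        return 2;
    return 1;
}

/* Count code points and display columns of a NUL-terminated UTF-8 string. */
static void measure_text(const char *utf8, size_t *chars, size_t *columns)
{
    const unsigned char *p = (const unsigned char *)utf8;
    assert(utf8 != NULL && chars != NULL && columns != NULL);
    *chars = 0;
    *columns = 0;
    while (*p) {
        uint32_t cp;
        p += decode_utf8(p, &cp);
        (*chars)++;
        *columns += (size_t)simple_width(cp);
    }
}
```

With this sketch, measure_text("e\xCC\x81", &chars, &columns) (an "e" followed by a combining acute accent) reports two characters but one column, and the two-ideograph string "\xE6\x97\xA5\xE6\x9C\xAC" reports two characters but four columns. Text-wrapping code would therefore need to budget by columns rather than by characters.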

I had initially reported this with all of the other Unicode feature request bugs in bug#434, but it was closed with the others when the 3.6 Unicode plan was implemented. I have included a working sample implementation for an efun that accepts either a string or an array of int Unicode code points, and returns the width. I call it wcswidth() because that is the name of the underlying C function, but you could call it something simpler. If you want to avoid creating another new efun, you could add this functionality to the existing widthof() efun, which only accepts mappings right now.
Additional Information:
/*-------------------------------------------------------------------------*/
#include <wchar.h>

svalue_t *
f_wcswidth (svalue_t * sp)

/* EFUN wcswidth()
 *
 * int wcswidth(string str|int *)
 *
 * Returns the number of screen columns the given string or array of wide characters will take up
 */

{
    size_t charwidth, wcdest_width, i;
    wcdest_width=0;
    if (sp->type == T_STRING) {
        size_t wcdest_len, orig_len, len;
        wchar_t *wcdest, *current_wchar;
        char *orig_txt, *tmp_txt, **orig_txt_ptr;
        orig_len=mstrsize(sp->u.str);
        orig_txt=get_txt(sp->u.str);
        orig_txt_ptr=malloc(sizeof(char *));
        wcdest=malloc((orig_len+1)*sizeof(wchar_t));
        wmemset(wcdest,(wchar_t)0,orig_len+1);
        wcdest_len=0;
        len=0;
        tmp_txt=orig_txt;
        *orig_txt_ptr=orig_txt;
        while(len<orig_len) {
            wcdest_len+=mbsrtowcs(wcdest+wcdest_len, (const char **)orig_txt_ptr, orig_len-len, (mbstate_t *)NULL);
            len+=strlen(tmp_txt);
            if(len<orig_len) {
                len++;
                wcdest_len++;
                tmp_txt=orig_txt+len;
                *orig_txt_ptr=tmp_txt;
            }
        }
        current_wchar=wcdest;
        for(i=0;i<wcdest_len;i++) {
            if(*current_wchar=='\t') charwidth=8; /* tabs count as eight columns wide -- why? more accurate than zero! */
            else if(*current_wchar>=0xf2040 && *current_wchar<=0xf204f) charwidth=0; /* for tengwar PUA implementation */
            else charwidth=wcwidth(*current_wchar);
            if(charwidth>0) wcdest_width+=charwidth;
            current_wchar++;
        }
        free(wcdest);
        free(orig_txt_ptr);
        free_string_svalue(sp);
        put_number(sp, (p_int)wcdest_width);
    } else if(sp->type == T_POINTER) {
        vector_t *vec;
        svalue_t *svp;
        vec=sp->u.vec;
        for(i = 0, svp = vec->item; ++i <= VEC_SIZE(vec); svp++) {
            if(svp->type == T_NUMBER) {
                if(svp->u.number=='\t') charwidth=8; /* tabs count as eight columns wide -- why? more accurate than zero! */
                else if(svp->u.number>=0xf2040 && svp->u.number<=0xf204f) charwidth=0; /* for tengwar PUA implementation */
                else charwidth=wcwidth(svp->u.number);
                if(charwidth>0) wcdest_width+=charwidth;
            }
        }
        free_svalue(sp);
        put_number(sp,wcdest_width);
    }
    return sp;
} /* f_wcswidth() */
Tags: No tags attached.

Activities

iago4

2020-04-27 18:30

reporter   ~0002523

I should add that the character-specific code for Private Use Area Tengwar characters should probably be removed from a driver implementation and handled on the mudlib-side. The character-specific code for tabs should probably remain.

iago4

2020-07-16 15:45

reporter   ~0002533

Bugfix for the previous code (it checked for a size_t negative value that would never happen):

/*-------------------------------------------------------------------------*/
#include <wchar.h>

svalue_t *
f_wcswidth (svalue_t * sp)

/* EFUN wcswidth()
 *
 * int wcswidth(string str|int *)
 *
 * Returns the number of screen columns the given string or array of wide characters will take up
 */

{
    size_t charwidth, wcdest_width, i;
    wcdest_width=0;
    if (sp->type == T_STRING) {
        size_t wcdest_len, orig_len, len;
        wchar_t *wcdest, *current_wchar;
        char *orig_txt, *tmp_txt, **orig_txt_ptr;
        orig_len=mstrsize(sp->u.str);
        orig_txt=get_txt(sp->u.str);
        orig_txt_ptr=malloc(sizeof(char *));
        wcdest=malloc((orig_len+1)*sizeof(wchar_t));
        wmemset(wcdest,(wchar_t)0,orig_len+1);
        wcdest_len=0;
        len=0;
        tmp_txt=orig_txt;
        *orig_txt_ptr=orig_txt;
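        /* The string may contain embedded NUL bytes, so convert it one
         * NUL-terminated segment at a time. */
        while(len<orig_len) {
            size_t converted=mbsrtowcs(wcdest+wcdest_len, (const char **)orig_txt_ptr, orig_len-len, (mbstate_t *)NULL);
            if(converted==(size_t)-1) break; /* invalid multibyte sequence: count only what was converted so far */
            wcdest_len+=converted;
            len+=strlen(tmp_txt);
            if(len<orig_len) {
                len++;        /* skip the embedded NUL byte */
                wcdest_len++; /* keep the matching zero wchar_t already in wcdest */
                tmp_txt=orig_txt+len;
                *orig_txt_ptr=tmp_txt;
            }
        }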
        current_wchar=wcdest;
        for(i=0;i<wcdest_len;i++) {
            if(*current_wchar=='\t') charwidth=8; /* tabs count as eight columns wide -- why? more accurate than zero! */
            else if(*current_wchar>=0xf2040 && *current_wchar<=0xf204f) charwidth=0; /* for tengwar PUA implementation */
            else charwidth=wcwidth(*current_wchar);
            if(charwidth!=(size_t)-1) wcdest_width+=charwidth;
            current_wchar++;
        }
        free(wcdest);
        free(orig_txt_ptr);
        free_string_svalue(sp);
        put_number(sp, (p_int)wcdest_width);
    } else if(sp->type == T_POINTER) {
        vector_t *vec;
        svalue_t *svp;
        vec=sp->u.vec;
        for(i = 0, svp = vec->item; ++i <= VEC_SIZE(vec); svp++) {
            if(svp->type == T_NUMBER) {
                if(svp->u.number=='\t') charwidth=8; /* tabs count as eight columns wide -- why? more accurate than zero! */
                else if(svp->u.number>=0xf2040 && svp->u.number<=0xf204f) charwidth=0; /* for tengwar PUA implementation */
                else charwidth=wcwidth(svp->u.number);
                if(charwidth!=(size_t)-1) wcdest_width+=charwidth;
            }
        }
        free_svalue(sp);
        put_number(sp,wcdest_width);
    }
    return sp;
} /* f_wcswidth() */
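
The signedness problem this note refers to can be reproduced in isolation. The sketch below is not driver code. The function names are hypothetical, and the int parameter w stands in for the return value of wcwidth(), which is -1 for non-printable characters. It contrasts the original check, the revised check, and a third option that simply keeps the value signed.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the first version: the int result is stored in a size_t,
 * so -1 becomes SIZE_MAX and still passes the "> 0" test. */
static size_t add_width_buggy(size_t total, int w)
{
    size_t charwidth = (size_t)w;
    if (charwidth > 0) total += charwidth;  /* adds SIZE_MAX when w == -1 */
    return total;
}

/* Mirrors the revised check: compare against (size_t)-1 explicitly. */
static size_t add_width_fixed(size_t total, int w)
{
    size_t charwidth = (size_t)w;
    if (charwidth != (size_t)-1) total += charwidth;
    return total;
}

/* Alternative (an assumption, not taken from the report): keep the
 * result in a signed int so the "> 0" test behaves as intended. */
static size_t add_width_signed(size_t total, int w)
{
    if (w > 0) total += (size_t)w;
    return total;
}
```

With a running total of 10 and w == -1, the first variant wraps around to 9, while the other two leave the total at 10.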

Gnomi

2020-07-16 15:51

manager   ~0002534

The current master (not yet released) has a new efun text_width(), which calculates the on-screen width of a given text.

Gnomi

2020-09-01 20:41

manager   ~0002554

The new efun text_width() calculates the displayed width for a string.

Issue History

Date Modified Username Field Change
2020-04-26 06:57 iago4 New Issue
2020-04-27 18:30 iago4 Note Added: 0002523
2020-07-16 15:45 iago4 Note Added: 0002533
2020-07-16 15:51 Gnomi Note Added: 0002534
2020-09-01 20:40 Gnomi Assigned To => Gnomi
2020-09-01 20:40 Gnomi Status new => assigned
2020-09-01 20:40 Gnomi Project LDMud => LDMud 3.6
2020-09-01 20:41 Gnomi Status assigned => resolved
2020-09-01 20:41 Gnomi Resolution open => fixed
2020-09-01 20:41 Gnomi Fixed in Version => 3.6.3
2020-09-01 20:41 Gnomi Note Added: 0002554