View Issue Details

ID: 0000478
Project: LDMud
Category: Efuns
View Status: public
Last Update: 2012-02-17 20:17
Reporter: bubbs
Assigned To: Gnomi
Priority: normal
Severity: tweak
Reproducibility: always
Status: closed
Resolution: fixed
Summary: 0000478: option to prevent regreplace() replacing '&' in replaced text
Description: regreplace() performs a number of simple substitutions on the replacement text.

From the man page:
 \1 - \9    replaced with the matched pattern in
            the first - the ninth pair of ()
 &          replaced with the whole matched pattern
 \&         replaced with &
 \\         replaced with \
            (required for multiple \ and in combination
             with the listed patterns; you don't need
             to escape every single \)

It would be handy, on occasion, to be able to prevent this substitution. Adding a new bit flag to the existing bit-flag argument to regreplace() would seem logical.
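
When the replacement text is a constant, the \& rule from the man page excerpt above already gives a way out: the ampersand can be escaped by hand. A minimal sketch (the function name is hypothetical, chosen only for illustration):

```lpc
// Sketch: escape the ampersand by hand when the replacement is a constant.
// In LPC source, "\\&" is the two characters \& once the literal is parsed,
// and the man page says \& is replaced with a plain &.
string demo_literal_ampersand()   // hypothetical name, not part of any mudlib
{
    return regreplace("echo $*", "\\$\\*", "wibble \\&", 1);
}
```

By the \& rule quoted from the man page, this should come out as "echo wibble &". It does not help when the replacement string is only known at run time, which is the case this report is about.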

Steps To Reproduce: regreplace("echo $*", "\\$\\*", "wibble &", 1)

returns
  "echo wibble $*"
Additional Information: To prevent this unwanted substitution, I currently run an additional regreplace() over the replacement string passed to the main regreplace(). In the above case:

  regreplace("echo $*", "\\$\\*",
    regreplace("wibble &", "&|\\\\&|\\\\\\\\|\\\\[1-9]",
      (: switch($1) {
           case "&" : return "\\\\&";
           case "\\&" : return "\\\\\\\\&";
           case "\\\\": return "\\\\\\\\\\\\\\\\";
           default : return "\\\\\\" +$1;
         }
      :), 1),
    1)

returns
  "echo wibble &"
Tags: No tags attached.

Activities

menaures

2006-10-07 20:19

reporter   ~0000517

For the UNItopia Mudlib I wrote an escape_string() Simul Efun which can escape patterns for get_dir, regexp, regreplace, and PCRE. It costs an additional regreplace() call for the escaping, but it has an advantage over a simple bit flag: sometimes you want only part of the pattern escaped, not the whole thing. It also supports wildcards, i.e. it can leave * (any number of characters) and ? (a single character) functional, and the wildcards themselves can in turn be escaped. Performance-wise the escaping is not much of an issue if the regular expressions that do it live in one central place, because the driver can then cache them.

Here's the Simul Efun. The PCRE part was only just added and has not been properly tested yet. The documentation was written in German; I can translate it if it's not obvious anyway. The function assumes that the MUD uses traditional regexp by default, not PCRE.

closure escape_pcre_wildcard;
closure escape_regexp_wildcard;
closure escape_regexp_replace;
closure escape_getdir;

/*
FUNCTION: escape_string
DECLARATION: varargs string escape_string(string str, int mode)
DESCRIPTION:
The function inserts escape characters (\) into the input string 'str'
according to various rules and returns the result. The rules used
depend on 'mode'.

The components of the bit flag 'mode':

   ESCAPE_REGEXP: (default)
       The returned string is a regexp pattern that matches the
       input string 'str', even when 'str' is part of a
       larger string.

   ESCAPE_PCRE:
       Use instead of ESCAPE_REGEXP for PCRE-compatible patterns.

   ESCAPE_WILDCARD:
       The returned pattern honours simple wildcards:

           ? stands for any single character
           * stands for any number of characters (including none)

       The wildcards themselves may be escaped in the input string.

          \? stands for ?
          \* stands for *

   ESCAPE_EXACT:
       Instead of searching a string for a matching substring,
       the returned pattern matches only the complete string.

   ESCAPE_REPLACE:
       A replace pattern is returned instead of a match pattern.
       With it, regreplace() inserts exactly the input string, even
       if it contains special characters.

   ESCAPE_GETDIR:
       The returned string makes get_dir() match files exactly as
       given in the input; wildcards and escapes are ignored.

EXAMPLES:
    escape_string("(foo|bar)")
        -> The pattern matches "(foo|bar)", "bla (foo|bar) bla"

    escape_string("(foo|bar)", ESCAPE_PCRE)
        -> The same, when using PCRE.

    escape_string("(foo|bar)", ESCAPE_EXACT)
        -> The pattern matches only "(foo|bar)".

    escape_string("bla*", ESCAPE_WILDCARD)
        -> The pattern matches "bla", "blafasel", "/pfad/zum/blablubb"

    escape_string("bla*", ESCAPE_WILDCARD|ESCAPE_EXACT)
        -> Now the pattern no longer matches "/pfad/zum/blablubb"

    escape_string("\\1&\\2&\\3", ESCAPE_REPLACE)
        -> Used as a replace pattern, \1&\2&\3 is inserted literally.

SEE ALSO: regexp, regreplace, get_dir
GROUPS: string, simul_efun
*/
varargs string escape_string(string str, int mode)
{
#define ESCAPE_LAMBDA(x) lambda( ({'s}), ({#'||, ({#'[, x, 's}), "\\\\&"}) )

    string ret;

    if(!stringp(str))
    {
        raise_error("Bad argument 1 to escape_string(): not a string\n");
    }

    if(mode & ESCAPE_GETDIR)
    {
        escape_getdir ||= ESCAPE_LAMBDA(
            ([ "\\\\": "\\\\\\\\\\\\\\\\", // escaped backslash
               "\\": "\\\\\\\\", // plain backslash
               "\\?": "\\\\\\\\\\\\?", // escaped ?
               "\\*": "\\\\\\\\\\\\*", // escaped *
            ]) );

        ret = regreplace(str, "\\\\\\\\|\\\\[?*]|[?*\\\\]", escape_getdir, 1);
    }

    else if(mode & ESCAPE_REPLACE)
    {
        // No differences between traditional regexp and PCRE?
        escape_regexp_replace ||= ESCAPE_LAMBDA(
            ([ "\\\\": "\\\\\\\\\\\\\\\\", // escaped backslash
               "\\": "\\\\\\\\", // plain backslash
               "\\&": "\\\\\\\\\\\\\\&", // escaped &
            ]) );

        ret = regreplace(str, "\\\\\\\\|\\\\[0-9&]|[&\\\\]",
                              escape_regexp_replace, 1);
    }

    else if(mode & ESCAPE_PCRE)
    {
        if(mode & ESCAPE_WILDCARD)
        {
            escape_pcre_wildcard ||= ESCAPE_LAMBDA(
                ([ "\\\\": "\\\\", // escaped backslash
                   "\\E": "\\\\E\\\\\\\\E\\\\Q", // PCRE quote
                   "\\*": "*", // escaped *
                   "*": "\\\\E.*\\\\Q", // wildcard *
                   "\\?": "?", // escaped ?
                   "?": "\\\\E.\\\\Q" // wildcard ?
                ]) );
               
            ret = regreplace(str, "\\\\\\\\|\\\\E|\\\\[*?]|[\\\\?*]",
                                  escape_pcre_wildcard, 1);
        }

        else
        {
            ret = regreplace(str, "\\\\E", "\\\\E\\\\\\\\E\\\\Q", 1);
        }

        if(mode & ESCAPE_EXACT)
        {
            ret = sprintf("^\\Q%s\\E$", ret);
        }

        else
        {
            ret = sprintf("\\Q%s\\E", ret);
        }
    }

    else // mode & ESCAPE_REGEXP
    {
        if(mode & ESCAPE_WILDCARD)
        {
            // Initialise the closure if that has not been done yet.
            escape_regexp_wildcard ||= ESCAPE_LAMBDA(
                ([ "\\\\": "\\\\\\\\", // escaped backslash
                   "\\*": "\\\\*", // escaped *
                   "*": ".*", // wildcard *
                   "\\?": "?", // escaped ?
                   "?": "." // wildcard ?
                ]) );

            // Replace regexp characters, but allow wildcards.
            ret = regreplace(str, "\\\\\\\\|\\\\[<>*?]|[\\\\?*.^$|()+[\\]]",
                                  escape_regexp_wildcard, 1);
        }

        else
        {
            // Escape regexp characters.
            ret = regreplace(str, "\\\\[<>]|[\\\\*.^$|()+[\\]]", "\\\\&", 1);
        }

        if(mode & ESCAPE_EXACT)
        {
            // Exact match: add ^ and $.
            ret = sprintf("^%s$", ret);
        }
    }

    return ret;

#undef ESCAPE_LAMBDA
}
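
As a usage sketch for the wildcard mode: the helper below is hypothetical, and it assumes that the ESCAPE_* constants are defined elsewhere (they are not part of the snippet above) and that escape_string() is installed as a simul-efun.

```lpc
// Sketch only: filter a list of names with a user-supplied glob.
// Assumes ESCAPE_WILDCARD and ESCAPE_EXACT come from a header that is
// not shown in this note.
string *filter_names(string *names, string user_glob)   // hypothetical helper
{
    // Treat * and ? in user_glob as wildcards, take everything else
    // literally, and anchor the pattern so it must cover the whole name.
    string pattern = escape_string(user_glob, ESCAPE_WILDCARD | ESCAPE_EXACT);

    // regexp() returns the elements of 'names' that match 'pattern'.
    return regexp(names, pattern);
}
```

Going by the examples in the function's documentation, a glob of "bla*" should keep "bla" and "blafasel" and drop "/pfad/zum/blablubb".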

Gnomi

2012-02-17 16:40

manager   ~0002100

Can I add it to the ldlib repository? (https://github.com/ldmud/ldlib/, ISC license, which is a simplified 2-clause-BSD license.)

menaures

2012-02-17 17:07

reporter   ~0002101

yeah sure

Gnomi

2012-02-17 18:32

manager   ~0002102

The need to prevent special characters from doing their work arises on several occasions (e.g. both arguments of regreplace, or get_dir). I don't want to introduce a flag each time special characters are involved.

I added Menaures' escape_string() to the ldlib repository for everyone to use. This solution is more flexible and performs the task. So closing this bug as won't fix. Please re-open if you disagree.

bubbs

2012-02-17 19:15

reporter   ~0002103

Thanks for the escape_* functions Menaures - very handy. I'd need an English version of the manual to understand the functions (without picking through the code).

I've reopened since I'm not sure the escape_* functions help with the replacement argument of regreplace(), as opposed to the regexp argument. They may well do; since I don't read German I can't tell (and my apologies if they do).

An alternative solution to the general problem would be a generic plain string substitution function. On Timewarp mudlibs we use subst(), which recognises the following method signatures:

string subst(string, string, string);
string subst(string, string, closure);
string subst(string, string*, string*);
string subst(string, string*, mapping);
string subst(string, string*, closure);
string subst(string, mapping);
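
The simplest of these forms, subst(string, string, string), could be approximated as follows. This is a hypothetical sketch under a different name, not the Timewarp implementation (which is not shown here), and it relies on LDMud's explode() keeping leading and trailing empty pieces.

```lpc
// Hypothetical sketch of a plain, regexp-free substitution.
string plain_subst(string str, string from, string to)
{
    // An empty 'from' would make explode() split 'str' into single
    // characters, so return the input unchanged in that case.
    if (from == "")
        return str;

    // explode() cuts 'str' at every literal occurrence of 'from';
    // implode() joins the pieces again with 'to' in between.
    return implode(explode(str, from), to);
}
```

Since neither argument is interpreted as a pattern, plain_subst("echo $*", "$*", "wibble &") should yield "echo wibble &" without any escaping.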


Cheers to all.

Gnomi

2012-02-17 19:26

manager   ~0002104

The ldlib repository contains an English version of the documentation.
In your example you need escape_string("wibble &", ESCAPE_REPLACE).
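
Put together with the original call, that suggestion might look like the sketch below. The function name is hypothetical, and the code assumes escape_string() and ESCAPE_REPLACE are available as in the ldlib version.

```lpc
// Sketch: escape the replacement text once, then use it as the
// replace pattern of the original regreplace() call.
string demo_escaped_replacement()   // hypothetical name
{
    string repl = escape_string("wibble &", ESCAPE_REPLACE);

    return regreplace("echo $*", "\\$\\*", repl, 1);
}
```

The intent is the same result as the nested regreplace() workaround in the report, "echo wibble &", with the escaping rules kept in one place.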

guest

2012-02-17 20:17

viewer   ~0002105

Ah, cool. Closed.

Issue History

Date Modified     Username  Field                 Change
2006-07-10 13:59  bubbs     New Issue
2006-10-07 20:19  menaures  Note Added: 0000517
2012-02-17 16:40  Gnomi     Note Added: 0002100
2012-02-17 17:07  menaures  Note Added: 0002101
2012-02-17 18:32  Gnomi     Note Added: 0002102
2012-02-17 18:32  Gnomi     Status                new => closed
2012-02-17 18:32  Gnomi     Assigned To           => Gnomi
2012-02-17 18:32  Gnomi     Resolution            open => won't fix
2012-02-17 19:15  bubbs     Note Added: 0002103
2012-02-17 19:15  bubbs     Status                closed => feedback
2012-02-17 19:15  bubbs     Resolution            won't fix => reopened
2012-02-17 19:26  Gnomi     Note Added: 0002104
2012-02-17 20:17  guest     Note Added: 0002105
2012-02-17 20:17  guest     Status                feedback => closed
2012-02-17 20:17  guest     Resolution            reopened => fixed