View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0000478 | LDMud | Efuns | public | 2006-07-10 13:59 | 2012-02-17 20:17 |
| Reporter | bubbs | Assigned To | Gnomi | ||
| Priority | normal | Severity | tweak | Reproducibility | always |
| Status | closed | Resolution | fixed | ||
| Summary | 0000478: option to prevent regreplace() replacing '&' in replaced text | ||||
| Description | regreplace() performs a number of simple substitutions on the replacement text. From the man page: \1 - \9 are replaced with the matched pattern in the first - the ninth pair of (); & is replaced with the whole matched pattern; \& is replaced with &; \\ is replaced with \ (required for multiple \ and in combination with the listed patterns; you don't need to escape every single \). It would be handy, on occasion, to be able to prevent this substitution. Adding a new bit flag to the existing bit-flag argument of regreplace() would seem logical. | ||||
| Steps To Reproduce | regreplace("echo $*", "\\$\\*", "wibble &", 1) returns "echo wibble $*" | ||||
| Additional Information | To prevent this unwanted substitution, I am currently performing an additional regreplace() on the replacement string of the target regreplace(). In the above case: regreplace("echo $*", "\\$\\*", regreplace("wibble &", "&|\\\\&|\\\\\\\\|\\\\[1-9]", (: switch($1) { case "&" : return "\\\\&"; case "\\&" : return "\\\\\\\\&"; case "\\\\": return "\\\\\\\\\\\\\\\\"; default : return "\\\\\\" +$1; } :), 1), 1) returns "echo wibble &" | ||||
| Tags | No tags attached. | ||||
| External Data (URL) | |||||
|
|
For the UNItopia Mudlib I wrote an escape_string() simul efun which can escape patterns for get_dir, regexp, regreplace, and PCRE. Although it means an additional regreplace call for the escaping, it has an advantage over a simple bit flag: sometimes you only want part of the pattern to be escaped, not the whole thing. It also supports wildcards, i.e. it can leave * (any number of characters) and ? (a single character) functional, and the wildcards themselves can in turn be escaped.

Performance-wise the escaping is not much of an issue if the regular expressions that do it live in one central place, because the driver can then cache them.

Here's the simul efun. The PCRE part was only just added and has not been properly tested yet. The documentation comment was originally written in German and is given in English here. The function assumes that the MUD uses traditional regexp, not PCRE, by default.

    closure escape_pcre_wildcard;
    closure escape_regexp_wildcard;
    closure escape_regexp_replace;
    closure escape_getdir;

    /*
    FUNCTION: escape_string
    DECLARATION: varargs string escape_string(string str, int mode)
    DESCRIPTION:
    The function inserts escape characters (\) into the input string 'str'
    according to various rules and returns the result.
    The rules used depend on 'mode'.

    The components of the bit flag 'mode':

    ESCAPE_REGEXP: (default)
        The returned string is a regexp pattern that matches the input
        string 'str', even if 'str' is part of a larger string.

    ESCAPE_PCRE:
        Use instead of ESCAPE_REGEXP for PCRE-compatible patterns.

    ESCAPE_WILDCARD:
        The returned pattern honours simple wildcards:
            ? stands for any single character
            * stands for any number of characters (including none)
        The wildcards may themselves be escaped in the input string:
            \? stands for ?
            \* stands for *

    ESCAPE_EXACT:
        Instead of searching a string for a matching substring, the
        returned pattern matches only the complete string.

    ESCAPE_REPLACE:
        A replacement pattern is returned rather than a match pattern.
        With it, regreplace() inserts exactly the input string, even if
        that string contains special characters.

    ESCAPE_GETDIR:
        The returned string makes get_dir() match files exactly as given
        in the input; wildcards and escapes are ignored.

    EXAMPLES:
        escape_string("(foo|bar)")
        -> The pattern matches "(foo|bar)", "bla (foo|bar) bla"

        escape_string("(foo|bar)", ESCAPE_PCRE)
        -> The same, when using PCRE.

        escape_string("(foo|bar)", ESCAPE_EXACT)
        -> The pattern matches only "(foo|bar)".

        escape_string("bla*", ESCAPE_WILDCARD)
        -> The pattern matches "bla", "blafasel", "/pfad/zum/blablubb"

        escape_string("bla*", ESCAPE_WILDCARD|ESCAPE_EXACT)
        -> Now the pattern no longer matches "/pfad/zum/blablubb"

        escape_string("\\1&\\2&\\3", ESCAPE_REPLACE)
        -> Used as a replacement pattern, \1&\2&\3 is inserted.

    SEE ALSO: regexp, regreplace, get_dir
    GROUPS: string, simul_efun
    */
    varargs string escape_string(string str, int mode)
    {
    #define ESCAPE_LAMBDA(x) lambda( ({'s}), ({#'||, ({#'[, x, 's}), "\\\\&"}) )
        string ret;

        if(!stringp(str))
        {
            raise_error("Bad argument 1 to escape_string(): not a string\n");
        }

        if(mode & ESCAPE_GETDIR)
        {
            escape_getdir ||= ESCAPE_LAMBDA( ([
                "\\\\": "\\\\\\\\\\\\\\\\", // escaped backslash
                "\\": "\\\\\\\\",           // plain backslash
                "\\?": "\\\\\\\\\\\\?",     // escaped ?
                "\\*": "\\\\\\\\\\\\*",     // escaped *
            ]) );
            ret = regreplace(str, "\\\\\\\\|\\\\[?*]|[?*\\\\]", escape_getdir, 1);
        }
        else if(mode & ESCAPE_REPLACE)
        {
            // No differences between traditional regexp and PCRE?
            escape_regexp_replace ||= ESCAPE_LAMBDA( ([
                "\\\\": "\\\\\\\\\\\\\\\\", // escaped backslash
                "\\": "\\\\\\\\",           // plain backslash
                "\\&": "\\\\\\\\\\\\\\&",   // escaped &
            ]) );
            ret = regreplace(str, "\\\\\\\\|\\\\[0-9&]|[&\\\\]", escape_regexp_replace, 1);
        }
        else if(mode & ESCAPE_PCRE)
        {
            if(mode & ESCAPE_WILDCARD)
            {
                escape_pcre_wildcard ||= ESCAPE_LAMBDA( ([
                    "\\\\": "\\\\",               // escaped backslash
                    "\\E": "\\\\E\\\\\\\\E\\\\Q", // PCRE quote
                    "\\*": "*",                   // escaped *
                    "*": "\\\\E.*\\\\Q",          // wildcard *
                    "\\?": "?",                   // escaped ?
                    "?": "\\\\E.\\\\Q"            // wildcard ?
                ]) );
                ret = regreplace(str, "\\\\\\\\|\\\\E|\\\\[*?]|[\\\\?*]", escape_pcre_wildcard, 1);
            }
            else
            {
                ret = regreplace(str, "\\\\E", "\\\\E\\\\\\\\E\\\\Q", 1);
            }

            if(mode & ESCAPE_EXACT)
            {
                ret = sprintf("^\\Q%s\\E$", ret);
            }
            else
            {
                ret = sprintf("\\Q%s\\E", ret);
            }
        }
        else // mode & ESCAPE_REGEXP
        {
            if(mode & ESCAPE_WILDCARD)
            {
                // Initialise the closure if that has not been done yet.
                escape_regexp_wildcard ||= ESCAPE_LAMBDA( ([
                    "\\\\": "\\\\\\\\", // escaped backslash
                    "\\*": "\\\\*",     // escaped *
                    "*": ".*",          // wildcard *
                    "\\?": "?",         // escaped ?
                    "?": "."            // wildcard ?
                ]) );
                // Replace regexp characters, allow wildcards.
                ret = regreplace(str, "\\\\\\\\|\\\\[<>*?]|[\\\\?*.^$|()+[\\]]", escape_regexp_wildcard, 1);
            }
            else
            {
                // Escape regexp characters.
                ret = regreplace(str, "\\\\[<>]|[\\\\*.^$|()+[\\]]", "\\\\&", 1);
            }

            if(mode & ESCAPE_EXACT)
            {
                // Exact match: add ^ and $.
                ret = sprintf("^%s$", ret);
            }
        }

        return ret;
    #undef ESCAPE_LAMBDA
    }

|
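To illustrate the wildcard mode described in the note above, here is a minimal sketch that filters a list of names against user-supplied input. It assumes escape_string() is installed as a simul efun and that the ESCAPE_* flags are defined somewhere in the mudlib (the note does not show their definitions); the helper name match_names() is hypothetical.

```lpc
// Sketch only: assumes escape_string() and the ESCAPE_* flags from the
// note above are available in the mudlib. match_names() is a
// hypothetical helper, not part of any existing library.
string *match_names(string *names, string wildcard)
{
    // Only * and ? keep their wildcard meaning; per the documentation
    // comment, the rest of the input is matched literally, and
    // ESCAPE_EXACT anchors the pattern to the whole string.
    string pattern = escape_string(wildcard, ESCAPE_WILDCARD | ESCAPE_EXACT);

    // regexp() returns those elements of 'names' that match 'pattern'.
    return regexp(names, pattern);
}
```

Building the pattern once and passing it to regexp() keeps the escaping in a single place, which fits the caching remark in the note.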
|
|
Can I add it to the ldlib repository? (https://github.com/ldmud/ldlib/, ISC license, which is a simplified 2-clause-BSD license.) |
|
|
yeah sure |
|
|
The need to prevent special characters from doing their work arises on several occasions (e.g. both arguments of regreplace, or get_dir). I don't want to introduce a flag each time special characters are involved. I added Menaures' escape_string() to the ldlib repository for everyone to use. This solution is more flexible and does the job, so I am closing this bug as won't fix. Please re-open if you disagree. |
|
|
Thanks for the escape_* functions, Menaures; very handy. I'd need an English version of the manual to understand the functions (without picking through the code). I've reopened since I'm not sure the escape_* functions help with the replacement argument of regreplace(), as opposed to the regexp argument. They may well do; since I don't read German I can't tell (my apologies if they do). An alternative solution to the general problem would be a generic plain-string substitution function. On Timewarp mudlibs we use subst(), which recognises the following signatures: string subst(string, string, string); string subst(string, string, closure); string subst(string, string*, string*); string subst(string, string*, mapping); string subst(string, string*, closure); string subst(string, mapping); Cheers to all. |
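The simplest of the signatures listed above, plain string for plain string, can be sketched with the explode()/implode() idiom. This is not the Timewarp implementation, which the thread does not show; plain_subst() is a hypothetical name used only for illustration.

```lpc
// Hypothetical helper, not the Timewarp subst(): replace every
// occurrence of 'from' in 'str' with 'to', treating both as plain text.
string plain_subst(string str, string from, string to)
{
    // explode() with an empty delimiter splits into single characters,
    // so an empty search string leaves the input unchanged here.
    if (from == "")
        return str;

    // Splitting on 'from' and joining with 'to' involves no regexp or
    // replacement-pattern syntax, so "&" and "\" in 'to' stay literal.
    return implode(explode(str, from), to);
}
```

With the example from the report, plain_subst("echo $*", "$*", "wibble &") yields "echo wibble &", since neither argument is interpreted as a pattern.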
|
|
The ldlib repository contains an English version of the documentation. In your example you need escape_string("wibble &", ESCAPE_REPLACE). |
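Put together with the call from the original report, that suggestion might look like the sketch below. It assumes escape_string() and ESCAPE_REPLACE from the ldlib repository are available; the wrapper replace_dollar_star() is a hypothetical name.

```lpc
// Sketch only: assumes escape_string() and ESCAPE_REPLACE are provided
// by the mudlib (e.g. taken from the ldlib repository).
// replace_dollar_star() is a hypothetical wrapper for illustration.
string replace_dollar_star(string text, string literal)
{
    // "\\$\\*" is the regexp \$\* and matches the literal text "$*".
    // Escaping 'literal' first is meant to stop regreplace() from
    // expanding &, \&, \\ and \1 - \9 inside it.
    return regreplace(text, "\\$\\*",
                      escape_string(literal, ESCAPE_REPLACE), 1);
}
```

Going by the report, replace_dollar_star("echo $*", "wibble &") should then give "echo wibble &" rather than "echo wibble $*".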
|
|
Ah, cool. Closed. |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2006-07-10 13:59 | bubbs | New Issue | |
| 2006-10-07 20:19 | menaures | Note Added: 0000517 | |
| 2012-02-17 16:40 | Gnomi | Note Added: 0002100 | |
| 2012-02-17 17:07 | menaures | Note Added: 0002101 | |
| 2012-02-17 18:32 | Gnomi | Note Added: 0002102 | |
| 2012-02-17 18:32 | Gnomi | Status | new => closed |
| 2012-02-17 18:32 | Gnomi | Assigned To | => Gnomi |
| 2012-02-17 18:32 | Gnomi | Resolution | open => won't fix |
| 2012-02-17 19:15 | bubbs | Note Added: 0002103 | |
| 2012-02-17 19:15 | bubbs | Status | closed => feedback |
| 2012-02-17 19:15 | bubbs | Resolution | won't fix => reopened |
| 2012-02-17 19:26 | Gnomi | Note Added: 0002104 | |
| 2012-02-17 20:17 | guest | Note Added: 0002105 | |
| 2012-02-17 20:17 | guest | Status | feedback => closed |
| 2012-02-17 20:17 | guest | Resolution | reopened => fixed |