View Issue Details

ID: 0000709
Project: LDMud 3.3
Category: LPC Compiler/Preprocessor
View Status: public
Last Update: 2009-12-22 10:15
Reporter: Wildcat
Assigned To: Gnomi
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: x86_64
OS: CentOS
OS Version: 5.3
Product Version: 3.3.718
Target Version: 3.3.720
Fixed in Version: 3.3.720
Summary: 0000709: Repeatable crash in 64 bit build of 3.3.718
Description: Greetings,

I just upgraded from an ancient driver to the most recent driver and I'm getting a repeatable crash loading a specific NPC. As I try to narrow it down the stack changes: it's always a null string dereference during the cloning, but the objects in the path can change. In this example it was /bin/damage/mino.

#0 0x0000000000450cf5 in mstr_mem_size (s=0x0) at mstrings.h:126
#1 0x00000000004517cf in ref_mstring (s=0x0) at mstrings.h:187
#2 0x00000000004588e0 in eval_instruction (
    first_instruction=0x24a41ca "b\001\002cq", initial_sp=0x7801f0)
    at interpret.c:8443
#3 0x000000000046ea66 in apply_low (fun=0xbafbc8, ob=0x10dfd68, num_arg=1,
    b_ign_prot=true, allowRefs=false) at interpret.c:17096
#4 0x000000000046ebf1 in int_apply (fun=0xbafbc8, ob=0x10dfd68, num_arg=1,
    b_ign_prot=true, b_use_default=true) at interpret.c:17174
#5 0x000000000046f046 in sapply_int (fun=0xbafbc8, ob=0x10dfd68, num_arg=1,
    b_find_static=true, b_use_default=true) at interpret.c:17335
#6 0x0000000000495002 in reset_object (ob=0x10dfd68, arg=5) at object.c:899
#7 0x00000000004dffa0 in load_object (lname=0x7d0860 "bin/damage/mino",
    create_super=false, depth=0, isMasterObj=false, chain=0x0)
    at simulate.c:2120
#8 0x00000000004e0842 in lookfor_object (str=0x108e950, bLoad=true)
    at simulate.c:2388
#9 0x00000000004e473e in f_load_object (sp=0x7801a0) at simulate.c:4449
#10 0x0000000000457eb5 in eval_instruction (
    first_instruction=0xbfd3b2 "b\001\001\037", initial_sp=0x780190)
    at interpret.c:8175
#11 0x00000000004679b7 in eval_instruction (
    first_instruction=0x19777da "?\002\001\020,\00375l\001\031?\0035\n",
    initial_sp=0x780100) at interpret.c:14943
#12 0x000000000046ea66 in apply_low (fun=0xbdb118, ob=0x250ea58, num_arg=1,
    b_ign_prot=false, allowRefs=false) at interpret.c:17096
#13 0x000000000046ebf1 in int_apply (fun=0xbdb118, ob=0x250ea58, num_arg=1,
    b_ign_prot=false, b_use_default=true) at interpret.c:17174
#14 0x000000000046a57e in eval_instruction (
    first_instruction=0x7fff99e5e0f0 "?\030E", initial_sp=0x780100)
    at interpret.c:16444
#15 0x0000000000470df5 in int_call_lambda (lsvp=0x7800d0, num_arg=3,
    allowRefs=false, external=true) at interpret.c:18354
#16 0x0000000000474c65 in v_apply (sp=0x780100, num_arg=4) at interpret.c:20588
#17 0x0000000000458696 in eval_instruction (
    first_instruction=0xbffa2c "c\037", initial_sp=0x7800b0)
    at interpret.c:8374
#18 0x00000000004dc8c4 in catch_instruction (flags=0, offset=12,
    i_sp=0x8159b0, i_pc=0xbffa2c "c\037", i_fp=0x780070, reserve_cost=2000,
    i_context=0x0) at simulate.c:449
#19 0x000000000045a66b in eval_instruction (
    first_instruction=0xbffa22 "b\002\002?\003Yc ", initial_sp=0x7800a0)
    at interpret.c:9593
#20 0x000000000047048f in int_call_lambda (lsvp=0x780050, num_arg=2,
    allowRefs=false, external=true) at interpret.c:18075
#21 0x00000000004e5f63 in v_limited (sp=0x780080, num_arg=4) at simulate.c:5228
#22 0x0000000000458696 in eval_instruction (
    first_instruction=0xbffa82 "b\002\001\002\b\016?\206\001",
    initial_sp=0x780030) at interpret.c:8374
#23 0x00000000004679b7 in eval_instruction (
    first_instruction=0xb66d22 "b\002\006?\a", initial_sp=0x77fff0)
    at interpret.c:14943
#24 0x000000000046e646 in apply_low (fun=0xb88010, ob=0xbadb60, num_arg=2,
    b_ign_prot=false, allowRefs=false) at interpret.c:16983
#25 0x000000000046ebf1 in int_apply (fun=0xb88010, ob=0xbadb60, num_arg=2,
    b_ign_prot=false, b_use_default=true) at interpret.c:17174
#26 0x000000000046a57e in eval_instruction (
    first_instruction=0x12285a2 "b\001\003?\a", initial_sp=0x77ff40)
    at interpret.c:16444
#27 0x000000000046e646 in apply_low (fun=0xc02e08, ob=0x11e7aa8, num_arg=1,
    b_ign_prot=false, allowRefs=false) at interpret.c:16983
#28 0x000000000046ebf1 in int_apply (fun=0xc02e08, ob=0x11e7aa8, num_arg=1,
    b_ign_prot=false, b_use_default=true) at interpret.c:17174
#29 0x000000000046f046 in sapply_int (fun=0xc02e08, ob=0x11e7aa8, num_arg=1,
    b_find_static=false, b_use_default=true) at interpret.c:17335
#30 0x0000000000408968 in parse_command (buff=0x7fff99e62e30 "clone drguard",
    from_efun=false) at actions.c:1068
#31 0x0000000000409282 in execute_command (str=0x7fff99e62e30 "clone drguard",
    ob=0x11e7aa8) at actions.c:1269
#32 0x00000000004119f7 in backend () at backend.c:677
#33 0x00000000004819ef in main (argc=2, argv=0x7fff99e64888) at main.c:673

This only occurs if the driver is compiled on a 64 bit machine. I have a centos 5.3 i386 machine I do development on locally which doesn't exhibit the problem, but a centos 5.3 x64 does. Since it's so reproducible I can pretty much do whatever is needed to try to debug it more.
Tags: No tags attached.

Relationships

has duplicate 0000708 (resolved, Gnomi): Illegal instruction encountered

Activities

zesstra

2009-12-21 04:07

administrator   ~0001653

Always nice to have some reproducible problem. ;-)
Could you supply us with the executable, the core dump and the source code of the crashing program? Using a driver compiled with -O0 and -ggdb3 would be best.
(If the package gets big, we could exchange it via FTP or similar.)

zesstra

2009-12-21 04:38

administrator   ~0001654

Ok, on second thought: as it is not a single program that exhibits this behaviour, it may be a better idea to look at your mudlib. Do you have a public (minimal) mudlib that triggers the crash and that we could use to reproduce it in our development environment?
Additionally, could you please add config.h, machine.h, Makefile and the output of 'gcc -dumpmachine' and 'gcc -dumpspecs' as well?
I use a driver compiled for x86_64 as well, and it usually does not crash. So I think there has to be some significant difference in either your build environment or your mudlib.

Wildcat

2009-12-21 12:55

reporter   ~0001655

I've tarred up ldmud, a core dump, output of dumpmachine, dumpspecs, config.h, machine.h, Makefile into a tarball that can be found at http://www.thebigwave.net/709/709.tar.gz

Unfortunately the mudlib is extremely large and custom. It started life as a stock LP 2.4.5 in 1990 and has gone through different drivers along the way while still being compat mode. I'm pretty sure there aren't many like it left around. The specific program that I found crashing, 'drguard.c', hasn't been modified since '01, when I think we were on an Amylaar driver. I notice that it even used:
string sChat;
sChat = allocate(2);
sChat[0] = "String";
sChat[1] = "String";
type nomenclature; however, changing that to string* sChat += does not avert the crash.

Is there an easy way to dump all programs that are being compiled? I could perhaps form a minimum mudlib if I can track down what's being loaded easily but there are several layers involved.

Given how reproducible it is (I just run a test driver/lib on another port and crash it all the time there), I can do whatever debugging you need done, as I'm a professional game developer in my other life with experience shipping Linux-based MMOs.

zesstra

2009-12-21 15:11

administrator   ~0001656

Thanks for the data.
Sometimes muds have a small, public version of their core lib without any secret objects and data. We could have checked whether this is affected as well and used it as a starting point for a testcase. (Also, if we consider the possibility that programs are mis-compiled, it helps to know what the compiled bytecode should look like.)
However, the driver writes a list of all programs compiled to stdout if you start it with the command line option -c.

Are you sure you are using 3.3.718? I checked out 718 from our repository, but the line numbers in interpret.c don't match. Line 8443 is the opening { in CASE(F_FLOAT);. Are there any modifications/patches in use?

zesstra

2009-12-21 16:00

administrator   ~0001657

Ok, Gnomi told me that this seems to be 3.3.719.
Then another thing: could you try 3.3.718 (and maybe even older releases) as well? If this bug was introduced recently, we may be able to limit the problem to one release.

If you manage to assemble a test case, I could use git bisect to find the right revision, that would be even better. We could then trace the compilation as well.

Gnomi had a first look at the bytecode, which seems to be wrong. He needs the instr.h and the source of a crashing program.

Wildcat

2009-12-21 16:59

reporter   ~0001658

Yes it is 719, I'll try on 718 and a smattering of older releases. I'm also putting together a public mudlib with as much trimmed out as possible. One quick question, is there an option for 'deterministic random' I can easily turn on? The case has the NPC spawn a random race but I can trim out a few hundred files if it's always the same race. Actually an interesting point/question, I'm going to add the NPC in question to the end of the autoload and see if it crashes on startup with it listed there, this would reduce the set of files needed dramatically.

Another random thing: while you can get all of the info from config.h, I figured I'd mention the configure line I run with:
./configure --prefix=/usr/users --enable-compat-mode --enable-erq=xerq \
    --with-erq-debug=0 --with-read-file-max-size=300000 \
    --with-master-name=obj/master --with-max-array-size=0 \
    --with-max-mapping-size=0 --with-max-mapping-keys=0 \
    --with-max-players=100 --with-max-cost=5000000 \
    --with-hard-malloc-limit=0 --enable-use-mysql --enable-use-mccp \
    --enable-use-pcre=builtin --enable-use-xml=xml2 --enable-use-tls \
    --with-portno=2777 LDFLAGS=-L/usr/lib64/mysql

The driver is also stock 719 other than the configure script listed above, and ldd is returning:
        linux-vdso.so.1 => (0x00007fff7bffe000)
        /lib64/rtkaio/librt.so.1 (0x00007f1b73cae000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x000000396b000000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003969c00000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x000000396b400000)
        libssl.so.6 => /lib64/libssl.so.6 (0x000000325e200000)
        libcrypto.so.6 => /lib64/libcrypto.so.6 (0x000000325ce00000)
        libmysqlclient.so.15 => /usr/lib64/mysql/libmysqlclient.so.15 (0x000000325e600000)
        libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x0000003971000000)
        libz.so.1 => /usr/lib64/libz.so.1 (0x000000396a000000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003969000000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003969800000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003968c00000)
        libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x000000325d200000)
        libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x000000325da00000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x000000396e400000)
        libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x000000325de00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003969400000)
        libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x000000325d600000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x000000396b800000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x000000396bc00000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x000000325ca00000)
        libsepol.so.1 => /lib64/libsepol.so.1 (0x000000396a800000)

in case that comes up at all.

Ok, adding it to the startup file repros the crash 100% of the time and I just hardcoded the race to always be the same which also is reproing 100% of the time. Test mudlib should be forthcoming...

Ok, at http://www.thebigwave.net/crashlib.tgz is a minimal mudlib that reproduces it. You'll need to start the driver with -D PUBLIC_MUDLIB (used in master, quest, and commandd I believe) and it should start up and crash instantly given the aforementioned executable.

The instrs.h that was used to compile the debug version given before is at http://www.thebigwave.net/instrs.h

I think that's pretty much everything you could ask for/need, otherwise I'll be happy to supply more information.

Gnomi

2009-12-21 17:14

manager   ~0001659

I tried the minimal mudlib and it complains about missing files in /obj/simul_efuns/ (included by /obj/simul_efun.c).

Wildcat

2009-12-21 17:27

reporter   ~0001660

I was afraid of that. There are some cases where includes are used and -c doesn't catch that. I'll grep the mudlib quickly for includes and add anything that's not in a sys directory...

I did a single pass through it and updated the lib at http://www.thebigwave.net/crashlib.tgz. I'll go and start a VM to verify it as well...

Wildcat

2009-12-21 17:39

reporter   ~0001661

Ok, I verified that the crashlib.tgz that's there now (size 213927, date stamp Dec 21 14:33) boots without an error until the point of the crash (while using -D PUBLIC_MUDLIB).

zesstra

2009-12-21 18:01

administrator   ~0001662

Thank you very much for the testcase. I can reproduce it on my system (and a different platform) and isolated the revision which introduced the error.

2009-12-21 20:40

 

bug709.diff (1,528 bytes)   
Index: trunk/src/version.sh
===================================================================
--- trunk/src/version.sh	(Revision 2618)
+++ trunk/src/version.sh	(Arbeitskopie)
@@ -17,7 +17,7 @@
 # A timestamp, to be used by bumpversion and other scripts.
 # It can be used, for example, to 'touch' this file on every build, thus
 # forcing revision control systems to add it on every checkin automatically.
-version_stamp="2009-05-30 12:00:00"
+version_stamp="Di 22. Dez 02:25:01 CET 2009"
 
 # The version number information
 version_micro=719
Index: trunk/src/prolang.y
===================================================================
--- trunk/src/prolang.y	(Revision 2618)
+++ trunk/src/prolang.y	(Arbeitskopie)
@@ -12997,7 +12997,12 @@
               }
 
               CURRENT_PROGRAM_SIZE--;
-              last_expression--;
+
+              /* If last_expression lies within the program area
+               * that was moved one bytecode adjust it accordingly.
+               */
+              if(last_expression > $<function_call_head>2.start)
+                  last_expression--;
           }
 
           argument_level--;
@@ -13268,6 +13273,12 @@
               }
 
               CURRENT_PROGRAM_SIZE--;
+
+              /* If last_expression lies within the program area
+               * that was moved one bytecode adjust it accordingly.
+               */
+              if(last_expression > $<function_call_head>4.start)
+                  last_expression--;
           }
 
           argument_level--;

Gnomi

2009-12-21 20:45

manager   ~0001663

I attached a patch that fixes this case. Unfortunately it doesn't seem to apply to 0000683 as well.

The problem was that the compiler adjusted last_expression wrongly after it had moved some parts of the program, so that last_expression might point to an argument instead of an instruction (and when that argument happened to be 0x14 (F_NUMBER) and the distance between last_expression and CURRENT_PROGRAM_SIZE happened to be 9, the compiler did some optimizations it had better not have done).
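
To illustrate the idea behind the patch, here is a minimal C sketch of the fix-up rule. It is not the code from prolang.y: program_t, remove_byte and the 64-byte buffer are hypothetical names made up for this example, and only the condition on last_expression mirrors the attached diff. It assumes a byte is dropped at offset start and everything behind it moves down by one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the compiler's program buffer. */
typedef struct {
    unsigned char code[64];    /* generated bytecode */
    size_t size;               /* plays the role of CURRENT_PROGRAM_SIZE */
    ptrdiff_t last_expression; /* offset of the last expression's opcode */
} program_t;

/* Drop the byte at offset `start`, moving the tail down by one byte.
 * last_expression is decremented only if it pointed into the moved tail;
 * decrementing it unconditionally (the old behaviour) could leave it
 * pointing at an argument byte instead of an opcode.
 */
void remove_byte(program_t *p, size_t start)
{
    assert(start < p->size);

    memmove(p->code + start, p->code + start + 1, p->size - start - 1);
    p->size--;

    if (p->last_expression > (ptrdiff_t)start)
        p->last_expression--;
}
```

With last_expression at offset 1 and a byte removed at offset 2, the marker stays at 1. With last_expression at offset 4, it moves to 3 together with the byte it refers to.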

Gnomi

2009-12-22 10:15

manager   ~0001664

Bugfix committed as r2809.

Issue History

Date Modified Username Field Change
2009-12-20 22:05 Wildcat New Issue
2009-12-21 03:59 zesstra Project LDMud => LDMud 3.3
2009-12-21 04:07 zesstra Note Added: 0001653
2009-12-21 04:07 zesstra Status new => acknowledged
2009-12-21 04:07 zesstra OS => CentOS
2009-12-21 04:07 zesstra OS Version => 5.3
2009-12-21 04:07 zesstra Platform => x86_64
2009-12-21 04:07 zesstra Target Version => 3.3.720
2009-12-21 04:38 zesstra Note Added: 0001654
2009-12-21 12:55 Wildcat Note Added: 0001655
2009-12-21 15:11 zesstra Note Added: 0001656
2009-12-21 16:00 zesstra Note Added: 0001657
2009-12-21 16:59 Wildcat Note Added: 0001658
2009-12-21 17:14 Gnomi Note Added: 0001659
2009-12-21 17:27 Wildcat Note Added: 0001660
2009-12-21 17:39 Wildcat Note Added: 0001661
2009-12-21 18:01 zesstra Note Added: 0001662
2009-12-21 18:01 zesstra Status acknowledged => confirmed
2009-12-21 20:40 Gnomi File Added: bug709.diff
2009-12-21 20:45 Gnomi Note Added: 0001663
2009-12-22 10:15 Gnomi Note Added: 0001664
2009-12-22 10:15 Gnomi Status confirmed => resolved
2009-12-22 10:15 Gnomi Fixed in Version => 3.3.720
2009-12-22 10:15 Gnomi Resolution open => fixed
2009-12-22 10:15 Gnomi Assigned To => Gnomi
2009-12-22 10:17 Gnomi Relationship added has duplicate 0000708