View Issue Details

ID: 0000574
Project: LDMud 3.3
Category: Runtime
View Status: public
Last Update: 2009-10-29 04:21
Reporter: wedsall
Assigned To:
Priority: normal
Severity: crash
Reproducibility: random
Status: closed
Resolution: unable to reproduce
Product Version: 3.3.717
Summary: 0000574: random crashes possibly memory related
Description: Mud crashes randomly -- seems to happen near the beginning of the boot. A reboot of the server seemed to clear up the mess after 3 consecutive crashes, all within 15 minutes to 5 hours.

I captured the log files from 2 crashes. Here is the stderr from crash 1:
2008.09.14 03:01:38 write socket (compressed): wrote 1460, should be 1024.
2008.09.14 03:01:44 write socket (compressed): wrote 2920, should be 108.
2008.09.14 03:02:04 write socket: wrote 267, should be 1024.
2008.09.14 03:02:04 write socket: wrote 872, should be 1024.
2008.09.14 03:02:04 write socket: wrote 266, should be 1024.
2008.09.14 03:02:04 write socket: wrote 873, should be 1024.
2008.09.14 03:02:04 write socket: wrote 873, should be 1024.
2008.09.14 03:02:04 write socket: wrote 873, should be 1024.
2008.09.14 03:02:04 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 436, should be 1023.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 872, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 1005, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:06 comm: write EWOULDBLOCK. Message discarded.
2008.09.14 03:02:16 write socket: wrote 569, should be 1024.
2008.09.14 03:02:16 write socket: wrote 721, should be 1024.
2008.09.14 03:02:16 write socket: wrote 873, should be 1024.
2008.09.14 03:02:16 write socket: wrote 721, should be 1024.
..
[xerq] read: Success
2008.09.14 05:09:31 [xerq] Demon exiting.

Here is some stdout from crash 1:
2008.09.14 02:38:08 Error in master_ob->valid_read()
2008.09.14 02:38:08 eval_cost too big 2100022
2008.09.14 02:38:08 Caught error: Too long evaluation. Execution aborted.
.. I believe the above was my fault, and I repaired it.

2008.09.14 03:37:40 ... execution continues.
2008.09.14 03:37:40 MCCP-DEBUG: 'obj/player#991' mccp ended
]/ / / / / / / /
] ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
]/ / / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / / / / / /
]^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
]/ / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^
^
]/ / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^
] / / / / / /
] ^ ^ ^ ^ ^ ^
] / / / / / / /
] ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / /
]^ ^ ^ ^ ^ ^ ^ ^ ^ ^
] / / / / / / /
/
]^ ^ ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / /

] ^ ^ ^ ^ ^ ^ ^ ^ ^
]/ / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
]/ / / / / / / /
] ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
]/ / / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
]/ / / / / / / /
] ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
] / / / / / / / / / / / / /
] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
2008.09.14 03:45:20 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
..This rain looking message came through twice :)

2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:55:29 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:55:29 ... execution continues.
2008.09.14 04:55:51 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:55:51 ... execution continues.
2008.09.14 04:56:25 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:56:25 ... execution continues.
2008.09.14 04:57:57 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:57:57 ... execution continues.
2008.09.14 04:58:06 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:58:06 ... execution continues.
2008.09.14 04:59:11 MCCP-DEBUG: 'secure/login/login#3815' mccp started (86)
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expec
ted 'string/array/object'.
2008.09.14 04:59:14 ... execution con


Nothing much in crash 1's debug log.


Here is the debug log from crash 2:
51') line 1209
' weapon_hit' in 'guilds/devil/objs/soulharvester.c' ('guilds/devil/objs/soulharvester#8151') line 266
' master_hit' in ' obj/monster.c' ('domains/areas/varrak/tree/monsters/pixie#9405') line 1359
' master_hit' in ' obj/living.c' ('domains/areas/varrak/tree/monsters/pixie#9405') line 1650
' CATCH' in ('domains/areas/varrak/tree/monsters/pixie#9405')
' store_damage' in 'obj/daemon/damage_d.c' (' obj/daemon/damage_d') line 81
2008.09.14 20:44:01 ... execution continues.
2008.09.14 20:44:30 mem_free: block 0xe4f6be8 magic match failed for slab e4f5600: size 4294967280, expected 2360aab8, found e4f6e88
2008.09.14 20:44:30 Dump of the call chain:
No program to trace.


Here is the stdout from crash 2:
8fa410a: 18 2 clit (2: 3)
8fa410c: 46 / (3: 4)
8fa410d: 50 < (2: 3)
8fa410e: 107 10 branch_when_zero (1: 2)
8fa4110: 31 0 local (0: 1)
8fa4112: 18 2 clit (1: 2)
8fa4114: 46 / (2: 3)
8fa4115: 119 166 push_identifier_lvalue (1: 2)
8fa4117: 79 (void)+= (2: 3)
8fa4118: 106 branch (0: 1) line 211
8fa411f: 8 166 identifier (0: 1) line 213
8fa4121: 14 35000 number (1: 2)
8fa4126: 48 > (2: 3)
8fa4127: 107 branch_when_zero (1: 2)
8fa4131: 98 save_arg_frame (0: 1) line 216
8fa4132: 10 9 cstring0 (1: 2)
8fa4134: 16 const1 (2: 3)
8fa4135: 415 7 call_out (3: 4)
8fa4137: 15 const0 (1: 2)
8fa4138: 99 restore_arg_frame (2: 3)
8fa4139: 93 pop_value (1: 2)
8fa413a: 25 return0 (0: 1) line 217
secure/simul_efun secure/simul_efun.c line 773
8d80f6e: 97 770 clear_locals (0: 4) line 773
8d80f71: 31 1 local (0: 4) line 776
8d80f73: 107 112 branch_when_zero (1: 5)
8d80f75: 8 8 identifier (0: 4) line 777
8d80f77: 31 0 local (1: 5)
8d80f79: 124 1 push_local_variable_lvalue (2: 6)
8d80f7b: 38 --x (3: 7)
8d80f7c: 62 index (3: 7)
8d80f7d: 62 index (2: 6)
8d80f7e: 124 4 push_local_variable_lvalue (1: 5)
8d80f80: 41 = (2: 6)
8d80f81: 198 pointerp (1: 5)
8d80f82: 39 9 && (1: 5)
8d80f84: 98 save_arg_frame (0: 4)
8d80f85: 31 4 local (1: 5)
8d80f87: 15 const0 (2: 6)
8d80f88: 446 38 member (3: 7)
8d80f8a: 99 restore_arg_frame (2: 6)
8d80f8b: 15 const0 (1: 5)
8d80f8c: 49 >= (2: 6)
8d80f8d: 107 branch_when_zero (1: 5)
8d80fca: 31 4 local (0: 4) line 785
8d80fcc: 108 branch_when_non_zero (1: 5)
8d80fd8: 98 save_arg_frame (0: 4) line 787
8d80fd9: 10 236 cstring0 (1: 5)
8d80fdb: 16 const1 (2: 6)
8d80fdc: 31 0 local (3: 7)
8d80fde: 31 1 local (4: 8)
8d80fe0: 415 7 call_out (5: 9)
8d80fe2: 15 const0 (1: 5)
8d80fe3: 99 restore_arg_frame (2: 6)
8d80fe4: 93 pop_value (1: 5)
8d80fe5: 25 return0 (0: 4) line 789
domains/areas/movalia/rooms/people4 <lambda ?> line 0
8d59853: 98 save_arg_frame (0: -1) line 0
8d59854: 173 0 lambda_cconstant (1: 0)
8d59856: 172 previous_object0 (2: 1)
8d59857: 207 this_object (3: 2)
8d59858: 430 funcall (4: 3)
secure/master secure/master.c line 373
8d54e16: 31 0 local (0: 4) line 373
8d54e18: 107 38 branch_when_zero (1: 5)
8d54e1a: 97 258 clear_locals (0: 4) line 375
8d54e1d: 98 save_arg_frame (0: 4)
8d54e1e: 15 const0 (1: 5)
8d54e1f: 185 no_warn_deprecated (2: 6)
8d54e20: 22 61628 closure (2: 6) line 376
8d54e25: 31 1 local (3: 7)
8d54e27: 10 50 cstring0 (4: 8)
8d54e29: 16 const1 (5: 9)
8d54e2a: 167 4 aggregate (6: 10)
8d54e2d: 393 50 unbound_lambda (3: 7)
8d54e2f: 31 0 local (2: 6)
8d54e31: 413 5 bind_lambda (3: 7)
8d54e33: 99 restore_arg_frame (2: 6)
8d54e34: 124 2 push_local_variable_lvalue (1: 5)
8d54e36: 42 (void)= (2: 6)
8d54e37: 98 save_arg_frame (0: 4) line 377
8d54e38: 31 2 local (1: 5)
8d54e3a: 430 funcall (2: 6)
domains/areas/movalia/rooms/people4 <lambda ?> line 0
a8a004b: 98 save_arg_frame (0: 6) line 0
a8a004c: 173 0 lambda_cconstant (1: 7)
a8a004e: 173 1 lambda_cconstant (2: 8)
a8a0050: 16 const1 (3: 9)
a8a0051: 188 call_other (4: 10)
domains/areas/movalia/rooms/people4 domains/areas/movalia/rooms/people4.c line 6
a4dfa9a: 98 save_arg_frame (0: 10) line 6
a4dfa9b: 31 0 local (1: 11)
a4dfa9d: 112 call_inherited (2: 12)
domains/areas/movalia/rooms/people4 room/room.c line 160
8eedfae: 31 0 local (0: 12) line 160
8eedfb0: 107 1 branch_when_zero (1: 13)
8eedfb2: 25 return0 (0: 12)
domains/areas/movalia/rooms/people4 domains/areas/movalia/rooms/people4.c line 6
a4dfaa2: 99 restore_arg_frame (2: 12) line 6
a4dfaa3: 93 pop_value (1: 11)
a4dfaa4: 31 0 local (0: 10) line 7
a4dfaa6: 107 1 branch_when_zero (1: 11)
a4dfaa8: 25 return0 (0: 10)
domains/areas/movalia/rooms/people4 <lambda ?> line 0
a8a0052: 99 restore_arg_frame (2: 8) line 0
a8a0053: 24 return (1: 7)
secure/master secure/master.c line 377
8d54e3c: 99 restore_arg_frame (2: 6) line 377
8d54e3d: 93 pop_value (1: 5)
8d54e3e: 106 branch (0: 4) line 379
8d54e49: 31 1 local (0: 4) line 382
8d54e4b: 108 6401 branch_when_non_zero (1: 5)
8d54e4e: 98 save_arg_frame (0: 4) line 385
8d54e4f: 31 1 local (1: 5)
8d54e51: 10 51 cstring0 (2: 6)
8d54e53: 10 52 cstring0 (3: 7)
8d54e55: 226 15 time (4: 8)
8d54e57: 188 call_other (5: 9)
8d54e58: 99 restore_arg_frame (2: 6)
8d54e59: 93 pop_value (1: 5)
8d54e5a: 98 save_arg_frame (0: 4) line 386
8d54e5b: 31 1 local (1: 5)
8d54e5d: 10 53 cstring0 (2: 6)
8d54e5f: 10 54 cstring0 (3: 7)
8d54e61: 188 call_other (4: 8)
8d54e62: 99 restore_arg_frame (2: 6)
8d54e63: 40 5 || (1: 5)
8d54e65: 14 900 number (0: 4)
8d54e6a: 24 return (1: 5)
domains/areas/movalia/rooms/people4 <lambda ?> line 0
8d5985a: 99 restore_arg_frame (2: 1) line 0
8d5985b: 24 return (1: 0)
guilds/demoniser/rooms/ds2 <lambda ?> line 0
8d59853: 98 save_arg_frame (0: -1)
8d59854: 173 0 lambda_cconstant (1: 0)
8d59856: 172 previous_object0 (2: 1)
8d59857: 207 this_object (3: 2)
8d59858: 430 funcall (4: 3)
secure/master secure/master.c line 373
8d54e16: 31 0 local (0: 4) line 373
8d54e18: 107 38 branch_when_zero (1: 5)
8d54e1a: 97 258 clear_locals (0: 4) line 375
8d54e1d: 98 save_arg_frame (0: 4)
8d54e1e: 15 const0 (1: 5)
8d54e1f: 185 no_warn_deprecated (2: 6)
8d54e20: 22 61628 closure (2: 6) line 376
8d54e25: 31 1 local (3: 7)
8d54e27: 10 50 cstring0 (4: 8)
8d54e29: 16 const1 (5: 9)
8d54e2a: 167 4 aggregate (6: 10)
8d54e2d: 393 50 unbound_lambda (3: 7)
8d54e2f: 31 0 local (2: 6)
8d54e31: 413 5 bind_lambda (3: 7)
8d54e33: 99 restore_arg_frame (2: 6)
8d54e34: 124 2 push_local_variable_lvalue (1: 5)
8d54e36: 42 (void)= (2: 6)
8d54e37: 98 save_arg_frame (0: 4) line 377
8d54e38: 31 2 local (1: 5)
8d54e3a: 430 funcall (2: 6)
guilds/demoniser/rooms/ds2 <lambda ?> line 0
a8a004b: 98 save_arg_frame (0: 6) line 0
a8a004c: 173 0 lambda_cconstant (1: 7)
a8a004e: 173 1 lambda_cconstant (2: 8)
a8a0050: 16 const1 (3: 9)
a8a0051: 188 call_other (4: 10)
guilds/demoniser/rooms/ds2 guilds/demoniser/rooms/ds2.c line 20
10e5c096: 8 23 identifier (0: 10) line 20
10e5c098: 108 branch_when_non_zero (1: 11)
10e5c0a6: 98 save_arg_frame (0: 10) line 25
10e5c0a7: 31 0 local (1: 11)
10e5c0a9: 112 call_inherited (2: 12)
guilds/demoniser/rooms/ds2 guilds/demoniser/rooms/shadowroom.c line 15
110077ea: 98 save_arg_frame (0: 12) line 15
110077eb: 31 0 local (1: 13)
110077ed: 112 call_inherited (2: 14)
guilds/demoniser/rooms/ds2 room/room.c line 160
8eedfae: 31 0 local (0: 14) line 160
8eedfb0: 107 1 branch_when_zero (1: 15)
8eedfb2: 25 return0 (0: 14)
guilds/demoniser/rooms/ds2 guilds/demoniser/rooms/shadowroom.c line 15
110077f2: 99 restore_arg_frame (2: 14) line 15
110077f3: 93 pop_value (1: 13)
110077f4: 31 0 local (0: 12) line 16
110077f6: 107 1 branch_when_zero (1: 13)
110077f8: 25 return0 (0: 12)
guilds/demoniser/rooms/ds2 guilds/demoniser/rooms/ds2.c line 25
10e5c0ae: 99 restore_arg_frame (2: 12) line 25
10e5c0af: 93 pop_value (1: 11)
10e5c0b0: 31 0 local (0: 10) line 26
10e5c0b2: 107 1 branch_when_zero (1: 11)
10e5c0b4: 25 return0 (0: 10)
guilds/demoniser/rooms/ds2 <lambda ?> line 0
a8a0052: 99 restore_arg_frame (2: 8) line 0
a8a0053: 24 return (1: 7)
secure/master secure/master.c line 377
8d54e3c: 99 restore_arg_frame (2: 6) line 377
8d54e3d: 93 pop_value (1: 5)
8d54e3e: 106 branch (0: 4) line 379
8d54e49: 31 1 local (0: 4) line 382
8d54e4b: 108 6401 branch_when_non_zero (1: 5)
8d54e4e: 98 save_arg_frame (0: 4) line 385
8d54e4f: 31 1 local (1: 5)
8d54e51: 10 51 cstring0 (2: 6)
8d54e53: 10 52 cstring0 (3: 7)
8d54e55: 226 15 time (4: 8)
8d54e57: 188 call_other (5: 9)
8d54e58: 99 restore_arg_frame (2: 6)
8d54e59: 93 pop_value (1: 5)
8d54e5a: 98 save_arg_frame (0: 4) line 386
8d54e5b: 31 1 local (1: 5)
8d54e5d: 10 53 cstring0 (2: 6)
8d54e5f: 10 54 cstring0 (3: 7)
8d54e61: 188 call_other (4: 8)
8d54e62: 99 restore_arg_frame (2: 6)
8d54e63: 40 5 || (1: 5)
8d54e65: 14 900 number (0: 4)
8d54e6a: 24 return (1: 5)
guilds/demoniser/rooms/ds2 <lambda ?> line 0
8d5985a: 99 restore_arg_frame (2: 1) line 0
8d5985b: 24 return (1: 0)
a6b7071: 124 2 42 98 208 10 48 98
No program to trace.
2008.09.14 20:44:30 LDMud aborting on fatal error.

and finally the stderr from crash 2:
2008.09.14 05:10:01 [xerq] XERQ Aug 15 2006: Path 'erq', debuglevel 0
2008.09.14 05:10:01 [xerq] Demon started
2008.09.14 05:10:04 Failed to load file: 'players/parisboy/tokyo/shoquest'.
2008.09.14 05:10:04 Failed to load file: 'players/tiberius/quest_object'.
2008.09.14 06:26:17 write socket: wrote 417, should be 1024.
2008.09.14 06:32:47 Failed to load file: 'players/rapier/objects/cball'.
2008.09.14 07:35:21 write socket: wrote 79, should be 681.
2008.09.14 07:44:09 Failed to load file: 'players/undertaker/items/bluegem'.
2008.09.14 07:44:09 Failed to load file: 'players/undertaker/items/bluegem'.
2008.09.14 08:05:38 write socket: wrote 418, should be 1024.
2008.09.14 08:10:42 write socket: wrote 417, should be 1024.
2008.09.14 08:14:11 Failed to load file: 'players/undertaker/items/bluegem'.
2008.09.14 08:17:48 write socket: wrote 416, should be 1024.
2008.09.14 09:08:44 write socket: wrote 275, should be 579.
2008.09.14 09:09:20 write socket: wrote 416, should be 1024.
2008.09.14 16:08:20 obj/living/soul.c line 1552: syntax error before ' ob = find'.
2008.09.14 16:08:20 obj/living/soul.c line 1553: Bad assignment: illegal lhs (target) before end of line.
2008.09.14 16:08:20 Error in loading object: 'obj/living/soul'.
2008.09.14 20:35:50 guilds/demoniser/rooms/quest/indoorquestroom.c line 51: Warning: casting a value to its own type: int before ' 2)'.
2008.09.14 20:44:30 mem_free: block 0xe4f6be8 magic match failed for slab e4f5600: size 4294967280, expected 2360aab8, found e4f6e88
[xerq] read: Success
2008.09.14 20:44:31 [xerq] Demon exiting.
Additional Information:
> Another crash, this time with different log data.
Ok, then right at the beginning: Did you get a core dump? ;-)

> Ldmud debug log (with some mud paths cut out):
> 2008.09.14 20:44:30 mem_free: block 0xe4f6be8 magic match failed for
> slab e4f5600: size 4294967280, expected 2360aab8, found e4f6e88
> 2008.09.14 20:44:30 Dump of the call chain:
> No program to trace.
Ok, there are several possibilities here I think. One is that the
allocator tried to free a slab whose memory was (at least partially)
corrupted/overwritten by someone else. Each slab has a magic value at
the beginning to detect exactly that kind of problem. The size
information of the block was corrupted as well. Another one would be
that someone called xfree() for a pointer which doesn't point to the
beginning of a memory block previously allocated by xalloc().
The memory allocator then calls fatal(), which usually dumps the LPC
stack trace, but apparently there was none to print, which suggests
that the driver was in the backend cycle and not executing some LPC
program, I guess.
fatal() also calls dump_core(), which takes care of dumping the core if
allowed.
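
To illustrate the first possibility, here is a minimal toy sketch of a magic-word check. Everything in it is hypothetical (struct toy_slab, TOY_MAGIC and the helper names are not the driver's own); the constant is simply borrowed from the "expected" field of the log, and the real allocator may well derive its magic values differently.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: one fixed magic value, copied from the log's "expected" field. */
#define TOY_MAGIC 0x2360aab8u

/* Hypothetical slab header followed by some payload; not LDMud's layout. */
struct toy_slab {
    uint32_t magic;            /* guard word at the very start of the slab */
    uint32_t size;             /* block size served by this slab */
    unsigned char blocks[32];  /* payload area */
};

/* Initialise a slab with a valid guard word. */
void toy_slab_init(struct toy_slab *s, uint32_t size)
{
    memset(s, 0, sizeof *s);
    s->magic = TOY_MAGIC;
    s->size = size;
}

/* Returns 1 if the guard word is intact, 0 if it was overwritten.
 * A real allocator would report a fatal error in the 0 case. */
int toy_slab_intact(const struct toy_slab *s)
{
    return s->magic == TOY_MAGIC;
}

/* Simulate a stray write that clobbers both header fields. */
void toy_stray_write(struct toy_slab *s)
{
    memset(s, 0xFF, sizeof s->magic + sizeof s->size);
}
```

After toy_slab_init() the check passes; after toy_stray_write() it fails and the size field is garbage as well, which is the same combination of symptoms the log line shows.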

OK, these are the last LPC instructions that were executed before the
crash, but unfortunately they don't have to be related, as nobody knows
so far when this memory block was corrupted. It seems that some lambda
from guilds/demoniser/rooms/ds2 was executed last; you may have a look
at that, but I don't really expect you to find something.

As one possible cause for the crash is some memory corruption, I
advise you to enable --enable-malloc-trace and
--enable-malloc-lpc-trace. That will consume some additional memory
but may give some more hints.
But I don't see a realistic chance of solving the issue without a core
dump, ideally one written by a driver built without any optimization.

> I'm starting to think my machine has some bad hardware.. maybe memory?
That would be a possibility as well, yes. Besides a genuine bug in the
driver. ;-)

I really think you should file a bug at the bug tracker and attach the
important part of your log files, your config.h and maybe core dumps,
executable, and additional information about your system
(architecture, OS) there. I guessed the memory allocator in use, but
that doesn't have to be correct.



Lars Duening wrote:
> I don't think this is a case of bad memory - the values are too meaningful.
[...]
> Putting this all together, I think we have a classic case of invalid
> memory access here: somehow the control field before the block got
> decremented by 2. When the allocator calculated the address of the slab,
> it didn't get the actual slab header, but instead an address 2 words
> into the slab header. Thus the slab->prev pointer (pointing to 0xe4f6e88)
> was mistaken as the magic word, and slab->next (being 0) was mistaken as
> the slab's size.
Tags: No tags attached.

Activities

zesstra

2008-09-17 18:09

administrator   ~0000786

Complete Mail from Lars just FTR:

> 2008.09.14 20:44:30 mem_free: block 0xe4f6be8 magic match failed for slab e4f5600: size 4294967280, expected 2360aab8, found e4f6e88

I don't think this is a case of bad memory - the values are too meaningful.

Taken literally, this message means that a block was freed in a slab not suited for its size (e.g. a 4-Byte block in a slab for 8-Byte blocks). But the found magic value is peculiar - it's a value which looks like a valid memory address (the slab is at 0xe4f5600, the block at 0xe4f6be8, the magic value is 0xe4f6e88).

The size (which is the block's size, but read from the slab header) is 4294967280 or 0xFFFFFFF0. This is not the exact value listed in the slab, but essentially (slab->size-4) * 4; with this formula the size listed in the slab must have been 0.
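
The arithmetic can be checked with a small sketch. The helper name reported_block_size is made up for illustration, and the 32-bit unsigned wrap-around is an assumption based on the values in the log; only the formula itself comes from the text above.

```c
#include <stdint.h>

/* Hypothetical helper: evaluates the formula quoted above,
 * (slab->size - 4) * 4, in 32-bit unsigned arithmetic.
 * Assumption: the reported size wraps around at 32 bits. */
uint32_t reported_block_size(uint32_t slab_size_field)
{
    /* The explicit cast truncates to 32 bits regardless of how
     * wide the platform's int happens to be. */
    return (uint32_t)((slab_size_field - 4u) * 4u);
}
```

With a size field of 0 the subtraction wraps to 0xFFFFFFFC, and multiplying by 4 and truncating to 32 bits gives 0xFFFFFFF0, i.e. the 4294967280 printed in the log.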

The allocator finds the slab for a given block by taking the control word before the block and extracting the offset to the slab start from it, with the offset given in word_t's.

Putting this all together, I think we have a classic case of invalid memory access here: somehow the control field before the block got decremented by 2. When the allocator calculated the address of the slab, it didn't get the actual slab header, but instead an address 2 words into the slab header. Thus the slab->prev pointer (pointing to 0xe4f6e88) was mistaken as the magic word, and slab->next (being 0) was mistaken as the slab's size.
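
The misread can be reproduced in a toy model. This is only a sketch under assumptions: the header order (magic, size, prev, next) is inferred from the description above rather than taken from the driver source, and word_t, slab_of and the other names are stand-ins invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;   /* stand-in for the allocator's word type */

/* Hypothetical header layout, inferred only from the mail's description:
 * prev and next sit two words after magic and size. */
enum { HDR_MAGIC, HDR_SIZE, HDR_PREV, HDR_NEXT, HDR_WORDS };

/* Assumption: the control word directly before a block holds the
 * distance, in words, from the block back to the start of its slab. */
const word_t *slab_of(const word_t *block)
{
    return block - block[-1];
}

/* What the allocator would read as the slab's magic word and size. */
word_t seen_magic(const word_t *block) { return slab_of(block)[HDR_MAGIC]; }
word_t seen_size(const word_t *block)  { return slab_of(block)[HDR_SIZE]; }

/* Fill a word array with a toy slab header (values taken from the log)
 * and a control word for the block starting at mem[block_index]. */
void toy_slab_setup(word_t *mem, size_t block_index)
{
    assert(block_index > (size_t)HDR_WORDS);  /* room for header + control word */
    mem[HDR_MAGIC] = 0x2360aab8u;
    mem[HDR_SIZE]  = 8;
    mem[HDR_PREV]  = 0xe4f6e88u;
    mem[HDR_NEXT]  = 0;
    mem[block_index - 1] = (word_t)block_index;  /* offset back to slab start */
}
```

With a 16-word array set up via toy_slab_setup(mem, 8), seen_magic(mem + 8) returns the real magic word. After mem[7] -= 2, the computed slab address lands two words into the header, so seen_magic() returns the prev pointer 0xe4f6e88 and seen_size() returns the next pointer 0, matching the "found" and size values in the log.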

However, to further debug this problem, you need at minimum a good coredump; and having MALLOC_TRACE and MALLOC_LPC_TRACE enabled wouldn't hurt either.

zesstra

2008-09-21 13:24

administrator   ~0000788

I am setting this to 'feedback' state until William gets a core dump or other additional information usable for tracking this down. Seems we have to wait until then.

Coogan

2009-10-28 19:38

reporter   ~0001569

I doubt that you'll receive a core dump here, after this long time...

wedsall

2009-10-28 19:58

reporter   ~0001570

Sorry for the long delay. I think we talked offline at some point..

I replaced the server with newer, better hardware and this resolved the problem.

I believe the issue was bad server memory; however, memtest did not report it as bad.

At least, something was wrong with the old server. Memory, motherboard, etc.

zesstra

2009-10-29 04:21

administrator   ~0001571

Like Lars I doubt that it was bad memory (alone?). As Lars said, the values were too meaningful.
But since we can't proceed without a core dump and this problem did not occur again for some reason, I close this as 'unable to reproduce'. If it surfaces again, please tell me to re-open.

Issue History

Date Modified     Username  Field                Change
2008-09-17 17:01  wedsall   New Issue
2008-09-17 18:09  zesstra   Note Added: 0000786
2008-09-21 13:24  zesstra   Note Added: 0000788
2008-09-21 13:24  zesstra   Status               new => feedback
2009-10-28 19:38  Coogan    Note Added: 0001569
2009-10-28 19:58  wedsall   Note Added: 0001570
2009-10-29 04:21  zesstra   Note Added: 0001571
2009-10-29 04:21  zesstra   Status               feedback => closed
2009-10-29 04:21  zesstra   Resolution           open => unable to reproduce