View Issue Details

IDProjectCategoryView StatusLast Update
0000677LDMud 3.2Runtimepublic2009-10-06 03:32
ReporterCoogan Assigned Tozesstra  
PrioritynormalSeveritycrashReproducibilityalways
Status resolvedResolutionfixed 
Product Version3.2.16 
Summary0000677: starting an SSL-secured connection crashes the driver
DescriptionRunning on Debian stable and compiled with SSL-support (openssl), starting an SSL-connection on the mud-SSL-port crashes the driver, dumping a core file.

coogan@mud:/home/tubmud/mudlib$ telnet-ssl -z ssl mud.tubmud.de 7682
Trying 85.214.44.4...
SSL_connect: Success
coogan@mud:/home/tubmud/mudlib$

LDMUD-3.2.14 works fine. The handshake code looks a bit different to 3.2.14, so I think this is a regression.
Additional Informationtubmud@mud:~/mudlib$ gdb ../mudbin/driver-3.2.16 core.3917
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...

warning: core file may not match specified executable file.
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.8...done.
Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.8
Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8...done.
Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `/home/tubmud/mudbin/driver -Mkernel/master -DTUBMUD=1 -D__START_TIME__="_S_T_20'.
Program terminated with signal 11, Segmentation fault.
[New process 3917]
#0 0x00a4782c in sk_value () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
(gdb) bt
#0 0x00a4782c in sk_value () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
0000001 0x00fa08b8 in ssl3_accept () from /usr/lib/i686/cmov/libssl.so.0.9.8
0000002 0x00fb8d7a in SSL_accept () from /usr/lib/i686/cmov/libssl.so.0.9.8
0000003 0x00faae70 in ssl23_get_client_hello ()
   from /usr/lib/i686/cmov/libssl.so.0.9.8
0000004 0x00fab768 in ssl23_accept () from /usr/lib/i686/cmov/libssl.so.0.9.8
0000005 0x00fb8713 in SSL_do_handshake () from /usr/lib/i686/cmov/libssl.so.0.9.8
0000006 0x080bf38d in tls_continue_handshake (ip=0xa8ad054) at pkg-tls.c:774
0000007 0x08063977 in get_message (buff=0xbfcd3f38 "") at comm.c:2896
0000008 0x08054f81 in backend () at backend.c:537
0000009 0x080a3805 in main (argc=14, argv=0xbfcd58c4) at main.c:517
(gdb)
TagsNo tags attached.

Relationships

duplicate of 0000583 resolvedGnomi LDMud 3.3 TLS: context->client_CA free()'d while still in use 
related to 0000687 new LDMud RfC: Change defaults concerning MALLOC_SBRK / SBRK_OK 

Activities

zesstra

2009-09-19 15:55

administrator   ~0001279

Just a spontaneous shot in the dark: something like this (segfaults in used libraries) may happen, if our memory allocator does not replace the system allocator reliably, openssl uses (partially) the system allocator and the two interfere. It seems unlikely in this particular case, but it is not much effort: You may just try --enable-malloc-sbrk=no and see if that changes anything.

zesstra

2009-10-03 19:23

administrator   ~0001444

I just saw that there are references in the net to CRYPTO_malloc_init(), CRYPTO_set_mem_functions(), CRYPTO_set_mem_ex_functions(), CRYPTO_set_locked_mem_functions(), CRYPTO_set_locked_mem_ex_functions(). So there is a way to tell openssl to use a custom memory allocator. Which we use but don't tell openssl. Unfortunately - and quite typical for openssl - documentation seems to be missing. Or better: use the source, luke... *argl*

zesstra

2009-10-04 05:22

administrator   ~0001445

I had a look at Coogans Coredump. Relevant stacktrace with data:
(gdb) bt full
#0 sk_value (st=0x84e5060, i=4) at stack.c:310
No locals.
0000001 0x00e54888 in ssl3_accept (s=0x84e5060) at s3_srvr.c:393
    buf = <value optimized out>
    l = <value optimized out>
    Time = 1254609144
    cb = (void (*)(const SSL *, int, int)) 0
    num1 = <value optimized out>
    ret = <value optimized out>
    state = 8544
    skip = <value optimized out>
0000002 0x00e6cd4a in SSL_accept (s=0x84e5060) at ssl_lib.c:850
No locals.
0000003 0x00e5ee40 in ssl23_get_client_hello (s=0x84e5060) at s23_srvr.c:568
    buf_space = "\200d\001\003\001\000K\000\000\000\020"
    p = (unsigned char *) 0x84e9b91 ""
    d = (unsigned char *) 0x84da0cd ""
    i = <value optimized out>
    csl = 75
    cl = <value optimized out>
    n = 100
    j = <value optimized out>
    type = 2
0000004 0x00e5f738 in ssl23_accept (s=0x84e5060) at s23_srvr.c:203
    buf = (BUF_MEM *) 0x84d7ae8
    Time = 1254609144
    cb = (void (*)(const SSL *, int, int)) 0
    ret = <value optimized out>
    state = 139350112
0000005 0x00e6c6e3 in SSL_do_handshake (s=0x84e5060) at ssl_lib.c:2085
    ret = 0
0000005 0x00e6c6e3 in SSL_do_handshake (s=0x84e5060) at ssl_lib.c:2085
    ret = 0
0000006 0x080dd7df in tls_continue_handshake (ip=0x84d9400) at pkg-tls.c:774
    n = 139350112
    ret = 1
0000007 0x080ddd38 in f_tls_init_connection (sp=0x8124e38, num_arg=3) at pkg-tls.c:1012
    session = (SSL *) 0x84e5060
    argp = (svalue_t *) 0x8124e38
    ret = 0
    obj = (object_t *) 0x84b3f84
    ip = (interactive_t *) 0x84d9400


char *sk_value(const STACK *st, int i)
{
  if(!st || (i < 0) || (i >= st->num)) return NULL;
  return st->data[i];
}
It crashes upon st->data[i].
st is:
$3 = {num = 769, data = 0x2000, sorted = 15213344, num_alloc = 139295540,
  comp = 0x84d7be8}
Clearly, 0x2000 is not mapped into the process memory, and the values for num_alloc and sorted look rather strange, num probably as well. The address 0x84d7be8 should be a function pointer to an openssl function. But as far as I can see, they are mapped somewhere around 0x00e.... and 0x84d7be8 is part of the heap.
Note that the given address for the st is 0x84e5060 which is the address of the SSL context/session. And that is:
(gdb) print *s
$1 = {version = 769, type = 8192, method = 0xe82320, rbio = 0x84d7b34, wbio = 0x84d7be8, bbio = 0x84d7be8, rwstate = 1, in_handshake = 2, ...

line 393 in s3_srvr.c is
  ret=ssl3_send_certificate_request(s);
That one actually iterates over a stack in the session and calls sk_value (although it is called differently, these openssl people love playing tricks with defines). But strange enough, I would expect ssl3_send_certificate_request(s) to be in the stacktrace...

The stack is extracted from the SSL context by SSL_get_client_CA_list(). That one should return s->ctx->client_CA which is:
(gdb) print s->ctx->client_CA
$4 = (STACK *) 0x832fc58
(gdb) print *(s->ctx->client_CA)
$5 = {num = 1, data = 0x0, sorted = 0, num_alloc = 1886220131, comp = 0x8007461}
That would make more sense, but it is obviously not the address given to sk_value().

So... I don't know what openssl exactly does here... BTW: I assume, you use 0.9.8g-15+lenny5?

zesstra

2009-10-04 13:01

administrator   ~0001448

I checked this afternoon with tubmuds settings and tubmuds public mudlib on MacOS 10.6 and Debian Lenny.
Debian Lenny: crash as Coogan described (openssl-0.9.8g-15+lenny5)
MacOS x86_64: driver not working at all
MacOS i386: no crash, TLS works fine. (openssl-0.9.8)

zesstra

2009-10-04 15:03

administrator   ~0001449

The stack used by the sk_value() which segfaults is tls_session->ctx->client_CA, which is als context->client_CA in pkg-tls.c, created during tls_global_init() by SSL_CTX_new().
After creation it is
(gdb) print *context->client_CA
$20 = {num = 0, data = 0x931abf8, sorted = 0, num_alloc = 4, comp = 0}

When we get a new ssl session in f_tls_init_connection(), that context contains already the wrong data:
   SSL * session = SSL_new(context); // line 978
(gdb) print *session->ctx->client_CA
$13 = {num = 30, data = 0x20003, sorted = 170242664, num_alloc = 2, comp = 0x28}

So somewhere between tls_global_init() and f_tls_init_connection this context gets somehow changed. I tried to find out more by using gdb's watchpoints, but that seems to slow it down far to much. :-(

zesstra

2009-10-04 16:21

administrator   ~0001450

Mhmm:

Hardware watchpoint 9: ((struct stack_st *) 154250276)->num
Old value = 150
New value = 978082211
0xb7db5032 in BN_lshift (r=0x9322a78, a=0x9322a78, n=320) at bn_shift.c:151
1: *(struct stack_st *) 154250276 = {num = 978082211, data = 0x35b5d4c7,
  sorted = -557861449, num_alloc = 1591931666, comp = 0}
(gdb) list
146 f=a->d;
147 t=r->d;
148 t[a->top+nw]=0;
149 if (lb == 0)
150 for (i=a->top-1; i>=0; i--)
151 t[nw+i]=f[i];
152 else
153 for (i=a->top-1; i>=0; i--)
154 {
155 l=f[i];
(gdb) bt
#0 0xb7db5032 in BN_lshift (r=0x9322a78, a=0x9322a78, n=320) at bn_shift.c:151
0000001 0xb7dd5dc8 in DSA_generate_parameters_ex (ret=0x931a1cc, bits=1024, seed_in=0x0,
    seed_len=<value optimized out>, counter_ret=0x0, h_ret=0x0, cb=0xbf9685f4)
    at dsa_gen.c:236
0000002 0xb7dd7b43 in DSA_generate_parameters (bits=1024,
    seed_in=0xbf968644 ";-) :-( :-) :-( $\223\021\b$\223\021\b9\223\021\b$\223\021\bN\223\021\b\003", seed_len=20, counter_ret=0x0, h_ret=0x0, callback=0, cb_arg=0x0)
    at dsa_depr.c:99
0000003 0x080df54b in set_dhe1024 () at pkg-tls.c:130
0000004 0x080dfcec in tls_global_init () at pkg-tls.c:500
0000005 0x080b6a62 in main (argc=9, argv=0xbf9699d4) at main.c:390

(gdb) print *a
$39 = {d = 0x931abf8, top = 5, dmax = 16, neg = 0, flags = 0}


And then another:
Hardware watchpoint 9: ((struct stack_st *) 154250276)->num
Old value = 978082211
New value = 0
0x0804f4fa in _allocate_array (n=6, file=0x8107020 "interpret.c::allocate_uninit_array",
    line=12370) at array.c:271
271 *svp++ = const0;
1: *(struct stack_st *) 154250276 = {num = 0, data = 0x35b5d4c7, sorted = -557861449,
  num_alloc = 1591931666, comp = 0x931abdc}
(gdb) bt
#0 0x0804f4fa in _allocate_array (n=6,
    file=0x8107020 "interpret.c::allocate_uninit_array", line=12370) at array.c:271
0000001 0x080948a0 in eval_instruction (first_instruction=0x94618cf "R\001\001\033",
    initial_sp=0x8129a50) at interpret.c:12370
0000002 0x080a8cbf in apply_low (fun=0x92aa750 "inaugurate_master", ob=0x932afec, num_arg=1,
    b_ign_prot=1) at interpret.c:21684
0000003 0x080a8f15 in sapply_int (fun=0x92aa750 "inaugurate_master", ob=0x932afec,
    num_arg=1, b_find_static=1) at interpret.c:21796
0000004 0x080a94bd in apply_master_ob (fun=0x92aa750 "inaugurate_master", num_arg=1,
    external=1) at interpret.c:22084
0000005 0x080b6e12 in main (argc=9, argv=0xbf9699d4) at main.c:483

And then it goes on and on and on with changing st->num, before finally transfer_svalue_no_free_spc() writes the 'final' data into it.

So somehow that memory range gets just used twice, because openssl gets it from a different allocator than our own. That is confirmed by undefining SBRK_OK in machine.h before make.

Sooooo.... Workaround for Coogan: disable SBRK_OK. Unfortunately, in 3.2 that does not work with --enable-malloc-sbrk=no which I mentioned earlier, because that does not exist anymore. Und obviously, our detection whether it is safe to use SBRK_OK and replace the system allocator is again not working.

I guess, the more libraries we optionally use, the more severe our problems with sbrk and replacing the system allocator get. I suggest to change our defaults in 3.3. and 3.5 (but that is a different issue).

2009-10-04 16:36

 

0001-Register-our-own-allocator-functions-with-openssl.patch (1,724 bytes)   
From c03f18dbcbd52da259f14c1f77879573b5f56cec Mon Sep 17 00:00:00 2001
From: zesstra <zesstra@zesstra.de>
Date: Sun, 4 Oct 2009 22:30:39 +0200
Subject: [PATCH] Register our own allocator functions with openssl.

Use CRYPTO_set_mem_functions() to register pointers to our own allocator
in openssl. Otherwise, openssl may use the system allocator and memory
get used twice. (#677)
(pkg-tls.c)
---
 src/pkg-tls.c |   31 +++++++++++++++++++++++++++++++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/src/pkg-tls.c b/src/pkg-tls.c
index 543152b..3c9b446 100644
--- a/src/pkg-tls.c
+++ b/src/pkg-tls.c
@@ -257,6 +257,34 @@ tls_verify_callback(int preverify_ok, X509_STORE_CTX *ctx)
 
 /*-------------------------------------------------------------------------*/
 #ifdef HAS_OPENSSL
+
+static void *
+tls_pkg_malloc (size_t size)
+/*
+ * Wrapper function for using our own allocator in openssl.
+ */
+{
+    return pxalloc(size);
+}
+
+static void
+tls_pkg_free (void * ptr)
+/*
+ * Wrapper function for using our own allocator in openssl.
+ */
+{
+    pfree(ptr);
+}
+
+static void *
+tls_pkg_realloc (void * ptr, size_t size)
+/*
+ * Wrapper function for using our own allocator in openssl.
+ */
+{
+    return prexalloc(ptr, size);
+}
+
 void
 tls_verify_init (void)
   
@@ -415,6 +443,9 @@ tls_global_init (void)
     debug_message("%s TLS: (OpenSSL) Keyfile '%s', Certfile '%s'\n"
                  , time_stamp(), keyfile, certfile);
 
+    // first register our own allocator function before calling any Openssl function.
+    CRYPTO_set_mem_functions(tls_pkg_malloc, tls_pkg_realloc, tls_pkg_free);
+
     SSL_load_error_strings();
     ERR_load_BIO_strings();
     if (!SSL_library_init())
-- 
1.6.4.3

zesstra

2009-10-04 16:41

administrator   ~0001451

Coogan: please apply the attached patch and try on your system. I believe it will solve the immediate problem. At least it does on my test system.


BTW: Even the OpenSSL developers are sceptical, that all memory allocation and deallocation in openssl takes place by using the functions supplied by the user with CRYPTO_set_mem_functions(). (The function is even undocumented.) So I believe, there may be more issues like this.
The only proper and robust fix would be
a) not to use SBRK_OK and replace the memory allocators or
b) use mmap() for allocating memory.

b) is only possibly in 3.3 and 3.5 and a) increases memory allocator overhead because our allocator then uses the system allocator for getting memory. It is also not that easy, because it can't be configured by configure in 3.2 (but in 3.3 and 3.5).

zesstra

2009-10-05 16:27

administrator   ~0001469

OK. For some reason, my idea yesterday was just plain wrong. Don't know why my test didn't crash, maybe I used the wrong binary. :-(

So, next gdb session today:
The client_CA stack is at 0x94594b4. I checked now all allocations and deallocations and finally found:

Breakpoint 18, sfree (ptr=0x94594b4) at smalloc.c:1147
1147 if (bsize > SMALL_BLOCK_MAX)
(gdb) print context->client_CA
$27 = (STACK *) 0x94594b4
(gdb) bt
#0 sfree (ptr=0x94594b4) at smalloc.c:1147
0000001 0x080f73d4 in xfree (ptr=0x94594b4) at smalloc.c:1245
0000002 0x080f88cd in pfree (p=0x94594b4) at smalloc.c:2891
0000003 0x080f89e2 in afree (p=0x94594b4) at smalloc.c:2987
0000004 0x080f988f in free (ptr=0x94594b4) at smalloc.c:3441
0000005 0xb7dc1cca in CRYPTO_free (str=0x94594b4) at mem.c:378
0000006 0xb7e2d925 in sk_free (st=0x94594b4) at stack.c:298
0000007 0xb7e2d979 in sk_pop_free (st=0x94594b4, func=0xb7e48880 <X509_NAME_free>)
    at stack.c:291
0000008 0xb7f01006 in SSL_CTX_set_client_CA_list (ctx=0x944febc, name_list=0x94594b4)
    at ssl_cert.c:549
0000009 0x080dd216 in tls_verify_init () at pkg-tls.c:395
0000010 0x080dd4c7 in tls_global_init () at pkg-tls.c:498
0000011 0x080b47ce in main (argc=9, argv=0xbf8a8914) at main.c:390

Ok... OpenSSL deliberately frees that stack.

static void set_client_CA_list(STACK_OF(X509_NAME) **ca_list,STACK_OF(X509_NAME) *name_list)
{
  if (*ca_list != NULL)
      sk_X509_NAME_pop_free(*ca_list,X509_NAME_free);
  *ca_list=name_list;
}
Yeah, it frees the old content before storing the new pointer.

Now the problem:
We do in tls_verify_init():
    stack = SSL_CTX_get_client_CA_list(context);
    if (trustdirectory != NULL) {
    SSL_add_dir_cert_subjects_to_stack(stack, trustdirectory);
    }
    if (stack != NULL) {
    SSL_CTX_set_client_CA_list(context, stack);
    }

So we set the old stack again. Now openssl frees the stack and stores the pointer to the stack it just free'd. No wonder that that stack is overwritten all the time. ;-)

zesstra

2009-10-05 17:01

administrator   ~0001470

I duplicated the fix for this issue in 3.3.x. This should now be fixed in r2755.

Coogan

2009-10-06 03:10

reporter   ~0001484

Please be aware, that this is a report against LDMud-3.2. A fix for 3.5 and/or 3.3 is nice but does not help me here.
As this seems to be a regression since 3.2.14, it should also get fixed in the 3.2 trunk.

Gnomi

2009-10-06 03:32

manager   ~0001485

The last note was a little ambiguous. Zesstra took a fix for 3.3 and applied it to 3.2. So this bug is now fixed in the 3.2 trunk.

Issue History

Date Modified Username Field Change
2009-09-19 15:06 Coogan New Issue
2009-09-19 15:55 zesstra Note Added: 0001279
2009-10-03 19:23 zesstra Note Added: 0001444
2009-10-04 05:22 zesstra Note Added: 0001445
2009-10-04 13:01 zesstra Note Added: 0001448
2009-10-04 13:01 zesstra Status new => confirmed
2009-10-04 15:03 zesstra Note Added: 0001449
2009-10-04 16:21 zesstra Note Added: 0001450
2009-10-04 16:36 zesstra File Added: 0001-Register-our-own-allocator-functions-with-openssl.patch
2009-10-04 16:41 zesstra Note Added: 0001451
2009-10-04 16:43 zesstra Status confirmed => feedback
2009-10-04 18:02 zesstra Relationship added related to 0000687
2009-10-05 16:27 zesstra Note Added: 0001469
2009-10-05 16:27 zesstra Assigned To => zesstra
2009-10-05 16:27 zesstra Status feedback => assigned
2009-10-05 16:36 zesstra Relationship added duplicate of 0000583
2009-10-05 17:01 zesstra Note Added: 0001470
2009-10-05 17:01 zesstra Status assigned => resolved
2009-10-05 17:01 zesstra Resolution open => fixed
2009-10-06 03:10 Coogan Note Added: 0001484
2009-10-06 03:10 Coogan Status resolved => feedback
2009-10-06 03:10 Coogan Resolution fixed => reopened
2009-10-06 03:32 Gnomi Note Added: 0001485
2009-10-06 03:32 Gnomi Status feedback => resolved
2009-10-06 03:32 Gnomi Resolution reopened => fixed