View Issue Details

ID: 0000607
Project: LDMud 3.3
Category: Runtime
View Status: public
Last Update: 2011-02-23 23:22
Reporter: peng
Assigned To: zesstra
Priority: normal
Severity: minor
Reproducibility: N/A
Status: resolved
Resolution: fixed
Product Version: 3.3.718
Target Version: 3.3.720
Fixed in Version: 3.3.720
Summary: 0000607: check_alarm triggers because alarm is off
Description: In FinalFrontier, the driver randomly has problems with the alarm call. About 3-5 times a day the alarm is off, and the check_alarm() function in backend.c has to restart it. This leaves the mud without heartbeats (HB) for 16 seconds, which disturbs the players while fighting and such.
Additional Information: In a driver version that replaced the alarm() calls with setitimer(), the problem no longer occurred.
Additional debug output in check_alarm() shows that the alarm is off, not just delayed.
Tags: No tags attached.

Activities

2009-02-13 11:08

setitimer-backend.c.patch (2,602 bytes)
Index: backend.c
===================================================================
--- backend.c   (revision 2486)
+++ backend.c   (working copy)
@@ -423,6 +423,12 @@
          * cleanup doesn't always remove enough destructed objects.
          */
 
+    struct itimerval timer_value;
+    timer_value.it_interval.tv_sec = alarm_time;
+    timer_value.it_interval.tv_usec = 0;
+    timer_value.it_value.tv_sec = alarm_time;
+    timer_value.it_value.tv_usec = 0;
+    
     /*
      * Set up.
      */
@@ -438,7 +444,11 @@
         current_time = get_current_time();
         comm_time_to_call_heart_beat = MY_FALSE;
         time_to_call_heart_beat = MY_FALSE;
-        alarm(alarm_time);
+//        alarm(alarm_time);
+        if(setitimer( ITIMER_REAL, &timer_value, NULL )) {
+          fatal("Could not initialize the timer, errno %d.\n",
+                errno);
+        }                      
     }
 
     printf("%s LDMud ready for users.\n", time_stamp());
@@ -709,7 +719,7 @@
             /* Start the next alarm */
             comm_time_to_call_heart_beat = MY_FALSE;
             time_to_call_heart_beat = MY_FALSE;
-            alarm(alarm_time);
+            //alarm(alarm_time);
 
             /* Do the timed events */
            if (!synch_heart_beats
@@ -793,7 +803,12 @@
 {
     static mp_int last_alarm_time = 0;
     mp_int curtime = get_current_time();
+    struct itimerval timer_value;
 
+    timer_value.it_interval.tv_sec = alarm_time;
+    timer_value.it_interval.tv_usec = 0;
+    timer_value.it_value.tv_usec = 0;
+
     if (t_flag)  /* Timing turned off? */
         return;
 
@@ -811,11 +826,18 @@
                       "- restarting it.\n",
                       time_stamp(), curtime - last_alarm_time);
 
-        alarm(0); /* stop alarm in case it is still alive, but just slow */
+        //alarm(0); /* stop alarm in case it is still alive, but just slow */
+        timer_value.it_value.tv_sec = 0;
+        setitimer( ITIMER_REAL, &timer_value, NULL );
         comm_time_to_call_heart_beat = MY_TRUE;
         time_to_call_heart_beat = MY_TRUE;
         (void)signal(SIGALRM, (RETSIGTYPE(*)(int))catch_alarm);
-        alarm(alarm_time);
+        //alarm(alarm_time);
+        timer_value.it_value.tv_sec = alarm_time;
+        if(setitimer( ITIMER_REAL, &timer_value, NULL )) {
+          fatal("Could not initialize the timer, errno %d.\n",
+                errno);
+        }                      
 
         last_alarm_time = curtime; /* Since we just restarted it */
     }

peng

2009-02-13 11:11

reporter   ~0000950

This is the aforementioned setitimer patch. It just replaces the alarm() call at startup with a setitimer() call that restarts automatically, so it is basically the same as restarting the alarm in catch_alarm instead of further down in the backend cycle. The changes in check_alarm are straightforward, but check_alarm never triggered with this version.
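
For readers unfamiliar with the auto-restart behaviour mentioned above, here is a minimal, standalone sketch of a self-re-arming ITIMER_REAL timer. It is not the driver's code: the names on_alarm, tick_count, start_periodic_timer, stop_timer and wait_for_ticks are made up for illustration, and the handler only counts ticks where the driver would set its heart-beat flags. The point is that a non-zero it_interval makes the kernel reload the timer after every expiry, so no code path has to call alarm() again.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose sigaction() and setitimer() in strict modes */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical tick counter; stands in for the driver's heart-beat flags. */
static volatile sig_atomic_t tick_count = 0;

/* Signal handler: only touches a sig_atomic_t, which is async-signal-safe. */
static void on_alarm(int sig)
{
    (void)sig;
    tick_count++;
}

/* Install the handler and arm ITIMER_REAL so that it fires every
 * interval_usec microseconds. Returns 0 on success, -1 on error (errno set). */
static int start_periodic_timer(long interval_usec)
{
    struct sigaction sa;
    struct itimerval tv;

    assert(interval_usec > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* it_interval is the reload value: non-zero means "re-arm automatically". */
    tv.it_interval.tv_sec  = interval_usec / 1000000L;
    tv.it_interval.tv_usec = interval_usec % 1000000L;
    tv.it_value = tv.it_interval;   /* first expiry after one interval */
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Disarm the timer: a zero it_value stops it, whatever it_interval holds. */
static int stop_timer(void)
{
    struct itimerval tv;

    memset(&tv, 0, sizeof tv);
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Sleep until at least n ticks have been counted; returns the tick count.
 * Because the timer is periodic, a tick that lands between the check and
 * pause() only delays the wake-up by one interval. */
static int wait_for_ticks(int n)
{
    while (tick_count < n)
        pause();
    return (int)tick_count;
}
```

Calling start_periodic_timer(20000) followed by wait_for_ticks(3) returns after roughly 60 ms without any re-arming in between, which is the property the patch relies on.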

zesstra

2009-02-13 17:17

administrator   ~0000952

Replacing alarm() with setitimer() seems like a reasonable idea to me, because it is simpler (at least in theory). Nevertheless, we should find out what is going wrong there.
So far nobody else has reported such problems, but you can reproduce it on two different machines (as far as I understand). So we could try to find similarities between them that are otherwise rare. If you can think of anything that your 2 systems have in common or that sets them apart from other systems, that would be great of course... ;-)
Could you give us information about the hardware and software of the 2 machines? E.g. platform, versions of the kernel, glibc and other libraries the driver uses, maybe 'uname -a', driver settings (config.h, machine.h). Is there a (basic) public variant of the mudlib you use which we may use? Or if not, can you test a different mudlib? ;-)

Other than that, it seems we would have to scatter a bunch of debug messages with the current status of alarm() all over the code in comm.c/backend.c, or can anyone think of a simpler approach?
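
One possible shape for such a debug message, as a hedged sketch rather than a proposal for the driver: a single helper that reports the three things that can silence SIGALRM (timer not armed, handler not installed, signal blocked). The struct alarm_status and the functions query_alarm_status and log_alarm_status are hypothetical names. It also assumes that getitimer(ITIMER_REAL) reflects a timer set by alarm(); that holds on Linux, but POSIX leaves the interaction between alarm() and setitimer() unspecified.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose sigaction() and getitimer() in strict modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical snapshot of everything that can keep SIGALRM from arriving. */
struct alarm_status {
    int  timer_armed;    /* non-zero if ITIMER_REAL still has time remaining */
    long remaining_sec;  /* whole seconds until the next expiry */
    int  handler_set;    /* non-zero if SIGALRM is neither SIG_DFL nor SIG_IGN */
    int  blocked;        /* non-zero if SIGALRM is in the signal mask */
};

/* Fill *st with the current state. Returns 0 on success, -1 on error. */
static int query_alarm_status(struct alarm_status *st)
{
    struct itimerval tv;
    struct sigaction sa;
    sigset_t mask;

    assert(st != NULL);

    if (getitimer(ITIMER_REAL, &tv) != 0)
        return -1;
    if (sigaction(SIGALRM, NULL, &sa) != 0)        /* query only, no change */
        return -1;
    if (sigprocmask(SIG_BLOCK, NULL, &mask) != 0)  /* query only, no change */
        return -1;

    st->timer_armed   = (tv.it_value.tv_sec != 0 || tv.it_value.tv_usec != 0);
    st->remaining_sec = (long)tv.it_value.tv_sec;
    st->handler_set   = (sa.sa_handler != SIG_DFL && sa.sa_handler != SIG_IGN);
    st->blocked       = (sigismember(&mask, SIGALRM) == 1);
    return 0;
}

/* Print one status line to stderr, tagged with the calling location. */
static void log_alarm_status(const char *where)
{
    struct alarm_status st;

    if (query_alarm_status(&st) != 0) {
        fprintf(stderr, "%s: could not query alarm status\n", where);
        return;
    }
    fprintf(stderr, "%s: armed=%d remaining=%lds handler=%d blocked=%d\n",
            where, st.timer_armed, st.remaining_sec,
            st.handler_set, st.blocked);
}
```

A call such as log_alarm_status("check_alarm") at the point where the restart message is printed would show which of the three conditions applies, instead of only that the alarm is late.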

zesstra

2009-05-26 05:48

administrator   ~0001160

Gnomi and I just discussed this, and neither of us likes changing code in response to a problem when we don't know its root cause. Because we have no information that might help us reproduce the issue, and we have not experienced anything like it ourselves, I am inclined to close this as 'unable to reproduce'. If you have any further information that might help, please provide it.

zesstra

2010-02-19 16:45

administrator   ~0001743

I reworked the signal handling in r2872-r2882 (3.5) and, among other things, switched the alarm timer to ITIMER_REAL.
However, this series of changes stays in 3.5 for the time being, because taken together it is a larger change and I would not apply it to 3.3.x without a longer test phase.

zesstra

2010-03-14 18:25

administrator   ~0001802

Just taking this for the time being, until I am confident enough to port it to 3.3.x.

zesstra

2010-07-13 18:04

administrator   ~0001876

I backported my patches for the migration to sigaction() + setitimer() and for the better synchronization between host time and mud-internal time from 3.5 in r2923-r2926.
It is a pity that we could not reproduce the issue here, but I hope this will solve it, and sigaction() and setitimer() are more convenient anyway. Please report back if it does not.

Issue History

Date Modified Username Field Change
2009-02-12 17:33 peng New Issue
2009-02-13 11:08 peng File Added: setitimer-backend.c.patch
2009-02-13 11:11 peng Note Added: 0000950
2009-02-13 17:17 zesstra Note Added: 0000952
2009-04-09 10:32 zesstra Status new => feedback
2009-05-26 05:48 zesstra Note Added: 0001160
2010-02-19 16:45 zesstra Note Added: 0001743
2010-03-14 18:25 zesstra Note Added: 0001802
2010-03-14 18:25 zesstra Assigned To => zesstra
2010-03-14 18:25 zesstra Status feedback => assigned
2010-07-13 18:04 zesstra Note Added: 0001876
2010-07-13 18:04 zesstra Status assigned => resolved
2010-07-13 18:04 zesstra Fixed in Version => 3.3.720
2010-07-13 18:04 zesstra Resolution open => fixed
2011-02-23 23:22 zesstra Target Version => 3.3.720