First off, I know I am being very general in my description.
I have a 1.4 box and we have been having random Asterisk restarts as far back as my logs go.
In digging through the forums I see a common thread in Eeman's posts about the evils of chan_local.
I found a thread that said don't use extension-based hunt groups; use device-based hunt groups instead.
I have used extension-based hunt groups liberally throughout my clients (close to 800 phones total). Could this cause random restarts of Asterisk under heavy call load?
eeman?
Same Issue
I'm running Asterisk 1.4.26.3 and I too am experiencing random restarts. Do your restarts consistently happen at a particular time of the day? I use extension-based hunt groups for all of my customers as well, and I'm wondering if that is the cause. I have only been getting restarts recently.
Do you have a bunch of core files in /tmp? If so, then you are experiencing 'crashes', not restarts :-)
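A quick way to answer that question from the shell is sketched below. It assumes the core files sit directly in /tmp and have names starting with "core"; both are assumptions about your setup, and the function names are made up for illustration.

```shell
# count_cores DIR: print how many files named core* sit directly in DIR.
# DIR defaults to /tmp (assumed dump location).
count_cores() {
    dir="${1:-/tmp}"
    find "$dir" -maxdepth 1 -type f -name 'core*' | wc -l | tr -d ' '
}

# list_cores DIR: list the same files with size and timestamp, oldest first,
# so you can see how often the crashes happen.
list_cores() {
    dir="${1:-/tmp}"
    find "$dir" -maxdepth 1 -type f -name 'core*' -exec ls -ltr {} +
}
```

Running `count_cores /tmp` and then `list_cores /tmp` should show whether the "restarts" line up with core dump timestamps.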
For what it's worth, I've seen this happen with multiple PBXs (crashes, not simply restarts; when Asterisk crashes it usually restarts itself, so it can be confusing as to what is actually happening). It has always turned out to be one of the same two culprits: either the Digium FFA module or the H323 module. If you don't need the module, just don't load it. I hope this helps.
FSD
It's really easy to see the cause by simply using the debugger 'gdb' on the core file and reading the backtrace. If you see the last application as a macro like tl-ringgroup-base, or destination channels that include the word Local/, then you'll know it was a chan_local bug (of which there are many) that caused the crash.
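For anyone who has not run gdb against a core file before, here is a minimal sketch of how the backtrace can be pulled out non-interactively. The function name is hypothetical, the binary path /usr/sbin/asterisk is taken from the core dump headers posted in this thread, and the core file name in the usage line is only a placeholder.

```shell
# core_backtrace CORE [BINARY]: print backtraces from a core file
# without opening an interactive gdb session.
core_backtrace() {
    core="$1"
    bin="${2:-/usr/sbin/asterisk}"   # assumed location of the Asterisk binary
    # -batch exits when the commands finish; each -ex runs one gdb command:
    #   bt                   short backtrace of the crashing thread
    #   bt full              same, with local variables
    #   thread apply all bt  backtrace of every thread
    gdb -batch \
        -ex 'bt' \
        -ex 'bt full' \
        -ex 'thread apply all bt' \
        "$bin" "$core"
}
```

Something like `core_backtrace /tmp/core.12345 > /tmp/backtrace.txt` would save the output so it can be posted here. The backtrace is only readable if Asterisk was built with debugging symbols.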
It is actually crashing...
cbbs70a is right: I am actually getting a crash, not a random restart. I have core files going back to when I started this server. It seems to be happening 1 to 2 times a day, and on bad days as many as five. Is there a way to debug a core file, or do I have to seek out the wizards of Digium?
Finally...
At long last, after a long weekend of upgrading our switch and installing all the proper stuff so we can evaluate a core dump, we found this...
Program terminated with signal 11, Segmentation fault.
#0 0x00e21649 in ast_masq_valetpark_call (chan=0xb4eab278, data=0xb627fed8)
at app_valetparking.c:312
312 app_valetparking.c: No such file or directory.
in app_valetparking.c
(gdb)
So it seems the cause of our switch crashing is app_valetparking.c. I suspect I configured something wrong when installing it.
Could somebody give me a hint as to what to look at?
WHAT DOES IT ALL MEAN?
My two scripts, if you need them...
Park Multi
exten => s,1,Macro(tl-set-myvariables)
exten => s,n,Set(TIMEOUT=${ARG1})
exten => s,n,GotoIf($["${TIMEOUT}" != ""]?park)
exten => s,n,Set(TIMEOUT=360)
exten => s,n(park),ValetParkCall(auto|${tenant}|${TIMEOUT}|${MYEXTENSION}|1|from-inside${TL_DASH}${tenant})
UnPark Multi
exten => s,1,ValetUnParkCall(${MACRO_EXTEN:${ARG1}}|${tenant})
It's a bug in the .c code. I don't remember what triggers it, but if it's the cause of all your crashes then that's not good, as there is no updated code that fixes it. The worst case I have seen with valetparking is one crash every few weeks. If all your core files debug to valetparking as the culprit then you might have to figure out how to recreate the problem; otherwise you'll have to abandon its use :-(
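To check whether every core file really points at the same function, something along these lines could tally the innermost frame across all of them. This is only a sketch: the function names are invented, and it assumes the cores live in /tmp with names starting with "core" and that the binary is /usr/sbin/asterisk.

```shell
# top_frame: read gdb output on stdin and print the function name
# from the first "#0" frame line, then stop.
top_frame() {
    awk '$1 == "#0" { if ($3 == "in") print $4; else print $2; exit }'
}

# tally_crashes [DIR] [BINARY]: count crashes per innermost function
# across every core* file in DIR, most frequent first.
tally_crashes() {
    dir="${1:-/tmp}"                 # assumed dump location
    bin="${2:-/usr/sbin/asterisk}"   # assumed binary path
    for core in "$dir"/core*; do
        [ -f "$core" ] || continue
        # 'bt 1' asks gdb for the innermost frame only
        gdb -batch -ex 'bt 1' "$bin" "$core" 2>/dev/null | top_frame
    done | sort | uniq -c | sort -rn
}
```

If `tally_crashes /tmp` prints a single line naming ast_masq_valetpark_call, the valet parking application is behind every crash; other function names in the list mean there is more than one problem.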
This is a brand new box with the latest version of Asterisk (1.4.40). I have two core dumps; this is the fault code for yesterday's and today's so far...
Yesterday's Core Dump
Core was generated by `/usr/sbin/asterisk -f -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0 0x00e21649 in ast_masq_valetpark_call (chan=0xb4eab278, data=0xb627fed8)
at app_valetparking.c:312
312 app_valetparking.c: No such file or directory.
in app_valetparking.c
(gdb)
Today's Core Dump
Core was generated by `/usr/sbin/asterisk -f -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0 0x00dc0649 in ast_masq_valetpark_call (chan=0xb6937448, data=0xb62f7ed8)
at app_valetparking.c:312
312 app_valetparking.c: No such file or directory.
in app_valetparking.c
(gdb)
If anyone has some ideas, I'd love to hear them. Or am I just screwed with app_valetparking.c?
I think you're just going to have to abandon the application. The guy who wrote it refuses to maintain it, refuses to even publish simple patches I made to it so that it would compile in > 1.4.26, and refuses to even answer my email when I try to contact him.
I couldn't tell you without more of the backtrace. One crash was when people were not picking up the parked calls. Maybe set the timeout to 1hr and screw 'em if they forget to pick up their parked calls ;-)
Still not sure...
Still not sure what part of that code was causing our crashes, but I removed the scripts so it could not be accessed and sent out an email telling people not to use the feature, and so far today, no crash.
-Knocking Heavily on Wood
Maybe also check your zaptel or dahdi timing
On earlier systems (1.2, 1.4) I have noticed that if there is no hardware card to obtain timing from (like a T1), and/or the dummy (zaptel) module was not compiled or selected properly on systems without a hardware card, the system was very prone to crashes and restarts.
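A rough way to see whether any timing module is loaded at all is sketched here. The function names are invented, and matching on module names beginning with zaptel, ztdummy, or dahdi is an assumption about how the drivers are named on your kernel.

```shell
# timing_modules: print loaded kernel modules whose names suggest a
# Zaptel/DAHDI timing source (name patterns are an assumption).
timing_modules() {
    lsmod | awk 'NR > 1 && $1 ~ /^(zaptel|ztdummy|dahdi)/ { print $1 }'
}

# check_timing: list those modules, or warn and return non-zero if none is loaded.
check_timing() {
    mods=$(timing_modules)
    if [ -z "$mods" ]; then
        echo "no zaptel/dahdi module loaded"
        return 1
    fi
    echo "$mods"
}
```

If `check_timing` does list a module, the `dahdi_test` utility from dahdi-tools (or `zttest` on Zaptel systems) can then report how accurate the timing actually is.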
Asterisk 1.4.24.1 built by root @ distro-el5.asterroid.com on a i686 running Linux
PBX Manager 6.1.1.7