Hey gang, I'm in a bit of a pickle and hoping someone could assist (free or paid). Started getting choppy calls and latency with extensions showing as LAGGED. Noticed Asterisk is currently between 200% and 450% on our MT server. Currently have another asterisk based deployment with less vCPUs on the same VMware host with the same number of active calls hanging out at around 9%. Haven't installed any new software on the TLMT server. I can temporarily relieve the call problems with a core restart, but asterisk shoots right back up to 200%+ and eventually calls will start dropping again after a few hours.
Here's my 'top'...
Tasks: 407 total, 1 running, 405 sleeping, 0 stopped, 1 zombie
Cpu(s): 2.8%us, 1.4%sy, 1.2%ni, 73.3%id, 0.0%wa, 0.0%hi, 21.3%si, 0.0%st
Mem: 33012812k total, 14638936k used, 18373876k free, 386404k buffers
Swap: 2097148k total, 0k used, 2097148k free, 8085912k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16737 asterisk 20 0 6853m 144m 15m S 283.8 0.4 24:01.48 asterisk
4676 root 20 0 18.8g 3.1g 24m S 90.7 9.9 131:20.22 java
23399 root 30 10 410m 97m 6436 S 42.4 0.3 41:47.78 plackup
68 root 20 0 0 0 0 S 19.5 0.0 20:49.06 events/1
69 root 20 0 0 0 0 S 16.6 0.0 7:38.44 events/2
23397 root 30 10 279m 51m 3768 S 12.6 0.2 19:03.77 perl
3760 root 20 0 104m 32m 2252 S 10.6 0.1 1162:32 perl
77 root 20 0 0 0 0 S 3.3 0.0 45:11.88 events/10
3 root RT 0 0 0 0 S 2.6 0.0 1:01.26 migration/0
3189 root 20 0 10.9g 254m 16m S 0.7 0.8 74:15.20 java
22711 root 20 0 15296 1516 948 R 0.7 0.0 0:00.29 top
17 root 20 0 0 0 0 S 0.3 0.0 2:03.29 ksoftirqd/3
41 root 20 0 0 0 0 S 0.3 0.0 3:04.97 ksoftirqd/9
2538 root 20 0 2047m 65m 1428 S 0.3 0.2 52:49.65 turnserver
13331 root 20 0 4968m 253m 13m S 0.3 0.8 38:22.88 java
16784 asterisk 20 0 15472 2080 1856 S 0.3 0.0 0:00.67 mpg123
22247 asterisk 20 0 15472 2352 2128 S 0.3 0.0 0:00.21 mpg123
1 root 20 0 19340 1532 1232 S 0.0 0.0 0:24.95 init
Here's the ps -LlFm output showing multiple threads using oddly high cpu..
1 S asterisk - - 16841 19 - 80 0 - - poll_s - 15 10:24 - 00:02:32 -
1 R asterisk - - 16846 24 - 80 0 - - - - 10 10:24 - 00:03:10 -
1 S asterisk - - 16889 0 - 80 0 - - poll_s - 9 10:24 - 00:00:00 -
1 S asterisk - - 16891 0 - 80 0 - - hrtime - 11 10:24 - 00:00:00 -
1 S asterisk - - 16892 0 - 80 0 - - hrtime - 11 10:24 - 00:00:00 -
1 S asterisk - - 16893 0 - 80 0 - - futex_ - 9 10:24 - 00:00:00 -
1 S asterisk - - 16894 0 - 80 0 - - futex_ - 6 10:24 - 00:00:00 -
1 S asterisk - - 16895 0 - 80 0 - - futex_ - 9 10:24 - 00:00:00 -
1 S asterisk - - 16896 0 - 80 0 - - poll_s - 11 10:24 - 00:00:00 -
1 S asterisk - - 16905 0 - 80 0 - - futex_ - 8 10:24 - 00:00:00 -
1 S asterisk - - 18385 0 - 80 0 - - inotif - 2 10:26 - 00:00:00 -
1 S asterisk - - 21158 5 - 80 0 - - poll_s - 9 10:31 - 00:00:21 -
1 S asterisk - - 21612 7 - 80 0 - - poll_s - 8 10:31 - 00:00:26 -
1 S asterisk - - 22103 9 - 80 0 - - poll_s - 0 10:32 - 00:00:29 -
1 S asterisk - - 22622 8 - 80 0 - - poll_s - 0 10:33 - 00:00:20 -
1 S asterisk - - 22712 8 - 80 0 - - poll_s - 0 10:34 - 00:00:18 -
1 S asterisk - - 23365 11 - 80 0 - - poll_s - 1 10:35 - 00:00:15 -
1 S asterisk - - 23388 3 - 80 0 - - poll_s - 9 10:35 - 00:00:04 -
1 S asterisk - - 23834 4 - 80 0 - - poll_s - 7 10:36 - 00:00:03 -
1 S asterisk - - 23930 7 - 80 0 - - poll_s - 0 10:36 - 00:00:04 -
1 S asterisk - - 24160 7 - 80 0 - - poll_s - 3 10:37 - 00:00:00 -
1 S asterisk - - 24210 3 - 80 0 - - futex_ - 9 10:37 - 00:00:00 -
and here's the pstack of one of those high threads..
Thread 1 (process 16846):
#0 0x0000003d366db7ed in write () from /lib64/libc.so.6
#1 0x000000000056e5df in tcptls_stream_write ()
#2 0x0000003d36666e49 in _IO_cookie_write () from /lib64/libc.so.6
#3 0x0000003d3667180a in _IO_new_file_xsputn () from /lib64/libc.so.6
#4 0x0000003d366678bd in fwrite () from /lib64/libc.so.6
#5 0x000000000057ec0f in ast_careful_fwrite ()
#6 0x00000000004fe954 in process_events ()
#7 0x0000000000511d9f in session_do ()
#8 0x000000000056f573 in handle_tcptls_connection ()
#9 0x000000000057c3f8 in dummy_start ()
#10 0x0000003d36a07aa1 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003d366e8bcd in clone () from /lib64/libc.so.6
Before you restarted iptables
Before you restarted iptables, did you try a reboot?
Did it act the same after a reboot?
I didn't try a reboot before
I didn't try a reboot before restarting iptables. But ever since I restarted iptables, the CPU% of Asterisk seems to be crawling back up around 5% every few hours. It got up to 40% so I restarted iptables again to bring it back down to normal (9% during 15 active calls). I plan on doing a reboot in the middle of the night tonight to see if it matters.
Hi Volodya, nothing out of
Hi Volodya, nothing out of the ordinary. Just the usual error spam from users using thirdlane connect on their phones.
"chan_sip.c:17208 register_verify: 'UDP' is not a valid transport for '100-TENANTNAME'. we only use 'WS'! ending call."
Update: After stopping and restarting iptables, asterisk calmed back down to 9%. Now to figure out why :/