ices <defunct> - 100+ zombie processes

Hello! :
 2 running, 175 sleeping,   0 stopped, 104 zombie

:~# ps -el | grep -e '^.\sZ'
0 Z  1001  6864  2477  0  80   0 -     0 -      ?        00:00:00 nextsong <defunct>
1 Z  1001  6865  2477  0  80   0 -     0 -      ?        00:00:00 ices <defunct>
0 Z  1001  7647  2477  0  80   0 -     0 -      ?        00:00:00 nextsong <defunct>
1 Z  1001  7648  2477  0  80   0 -     0 -      ?        00:00:00 ices <defunct>
0 Z  1001  8411  2477  0  80   0 -     0 -      ?        00:00:00 nextsong <defunct>
1 Z  1001  8412  2477  0  80   0 -     0 -      ?        00:00:00 ices <defunct>
0 Z  1001  9155  2477  0  80   0 -     0 -      ?        00:00:00 nextsong <defunct>


How can I fix this? (It happened in Centova Cast v2 too.)
Based on what you quoted this is an ices issue, not a Centova Cast issue.

In any case, we don't receive many (read: none as far back as I can remember) reports of ices freezing up, so most likely there is a problem of some sort on your server... various kernel issues can do this, as can an NFS share becoming unreachable, etc.
Hm... ;)

Steve, which Linux OS do you recommend for Centova Cast v2 and v3?
We use kernel: 2.6.32-5-686-bigmem
Same problem here on CentOS 6.3 - 64bit

psa|grep defunct
29428 pts/0    S+     0:00  |       \_ grep defunct
25124 ?        ZN     0:00  \_ [ices] <defunct>
25164 ?        ZN     0:00  \_ [php] <defunct>
25166 ?        ZN     0:00  \_ [ices] <defunct>
25222 ?        ZN     0:00  \_ [php] <defunct>
25224 ?        ZN     0:00  \_ [php] <defunct>
25226 ?        ZN     0:00  \_ [ices] <defunct>
25275 ?        ZN     0:00  \_ [php] <defunct>
25277 ?        ZN     0:00  \_ [php] <defunct>
25279 ?        ZN     0:00  \_ [ices] <defunct>
......

Any solution?
Same problem with CentOS 6.3 64-bit. Any solution, Centova Cast?

I'm thinking that the problem is in ices-cc, modified for Centova.

500      32732  0.0  0.0      0     0 ?        Z    09:33   0:00 [php] <defunct>
500      32734  0.0  0.0      0     0 ?        Z    09:33   0:00 [ices] <defunct>

http://icecast.imux.net/viewtopic.php?t=7951&sid=173faa06063214cbb8e9c97caee7c27f
Last Edit: November 16, 2012, 04:20:51 am by carlosjpr
top - 18:32:41 up 1 day, 11:01,  1 user,  load average: 4.60, 5.08, 9.48
Tasks: 307 total,   1 running, 178 sleeping,   0 stopped, 128 zombie

Same problem with CentOS 6.3 64-bit. Any solution, Centova Cast?
No, and FWIW I cannot reproduce this under CentOS 6.3.

What does the output of `ps -eo ppid,pid,state,cmd` look like while this is going on?  As Karl stated this should only happen if the parent process stops reaping its children, but the parent process should be init for any processes spawned by Centova Cast (incl. ices).

If you see a 1 in the PPID field for your ices zombie processes, then init is not reaping its children and there is something very wrong with your system which has nothing to do with ices *or* Centova Cast.  If you see something else there, then Centova Cast is for some reason unable to reparent the ices process to init (which would also likely indicate a problem with your server since it works everywhere else, and we're using a standard fork/setsid/fork to daemonize, but it'd at least be something we could further investigate).
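
For readers unfamiliar with the fork/setsid/fork pattern mentioned above, here is a minimal Python sketch of the general technique. It is not Centova Cast's actual code; the function name `spawn_detached` and the pipe used to report IDs back are assumptions made for illustration only.

```python
import os


def spawn_detached(task):
    """Run task() in a grandchild process via fork/setsid/fork.

    Hypothetical helper for illustration. Returns (session_id, grandchild_pid)
    as reported by the short-lived intermediate child.
    """
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # First child: start a new session, detaching from the caller's
        # session and controlling terminal.
        os.close(read_fd)
        os.setsid()
        grandchild = os.fork()
        if grandchild == 0:
            # Grandchild: not a session leader. Once the first child exits,
            # it is orphaned and gets reparented (normally to init, PID 1),
            # which then becomes responsible for reaping it.
            os.close(write_fd)
            try:
                task()
            finally:
                os._exit(0)
        # Report the new session ID and the grandchild's PID, then exit.
        os.write(write_fd, f"{os.getsid(0)} {grandchild}".encode())
        os._exit(0)

    # Original process: read the report, then reap the first child so that
    # it does not linger as a zombie itself.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        session_id, grandchild_pid = map(int, pipe.read().split())
    os.waitpid(child, 0)
    return session_id, grandchild_pid
```

Calling `spawn_detached(lambda: None)` from a shell session should report a session ID different from the caller's own. A process started this way would normally show a PPID of 1 in `ps` once the intermediate child has exited.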
Last Edit: November 20, 2012, 11:25:46 am by Centova - Steve B.
ps -eo ppid,pid,state,cmd


 3845  2446 Z [ices] <defunct>
 3888  2449 Z [nextsong] <defunct>
 3888  2450 Z [ices] <defunct>
 3767  2451 Z [nextsong] <defunct>
 3767  2452 Z [ices] <defunct>
 3819  2474 Z [ices] <defunct>
30217  2476 Z [ices] <defunct>

No PPID of 1...
Your filtering (grep or whatever) makes this somewhat useless... without the rest of the output we can't identify processes 3845, 3888, etc. to see which parent processes are failing to reap their children.
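
To show why the unfiltered output matters, here is a rough Python sketch that joins each zombie to its parent's command line. It assumes the full text of `ps -eo ppid,pid,state,cmd` is available as a string; the function name `zombies_by_parent` and the placeholder label for missing parents are made up for this example.

```python
from collections import defaultdict


def zombies_by_parent(ps_output):
    """Group zombie PIDs under the command line of their parent process.

    Expects the unfiltered text printed by `ps -eo ppid,pid,state,cmd`.
    """
    commands = {}  # pid -> command line
    zombies = []   # (ppid, pid) pairs for processes in state Z
    for line in ps_output.splitlines():
        fields = line.strip().split(None, 3)
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # skip the header row, blank lines, and malformed rows
        ppid, pid, state, cmd = int(fields[0]), int(fields[1]), fields[2], fields[3]
        commands[pid] = cmd
        if state.startswith("Z"):
            zombies.append((ppid, pid))

    grouped = defaultdict(list)
    for ppid, pid in zombies:
        # If the parent row was filtered out, the zombie cannot be attributed.
        parent = commands.get(ppid, "<parent not in output>")
        grouped[(ppid, parent)].append(pid)
    return dict(grouped)
```

With grep-filtered input, every zombie ends up under the "<parent not in output>" label, which is exactly the situation described above.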
    1   889 S /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/va
  889   978 S /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user
    1  1070 S /usr/libexec/postfix/master
 1070  1077 S qmgr -l -t fifo -u
    1  1080 S proftpd: (accepting connections)
    1  1088 S /usr/sbin/httpd
    1  1096 S crond
    1  1110 S /sbin/mingetty /dev/tty1
    1  1112 S /sbin/mingetty /dev/tty2
    1  1114 S /sbin/agetty /dev/hvc0 38400 vt100-nav
    1  1115 S /sbin/mingetty /dev/tty3
  259  1117 S /sbin/udevd -d
    1  1118 S /sbin/mingetty /dev/tty4
    1  1120 S /sbin/mingetty /dev/tty5
    1  1122 S /sbin/mingetty /dev/tty6
 1088  1123 S /usr/sbin/httpd
 1088  1124 S /usr/sbin/httpd
 1088  1125 S /usr/sbin/httpd
 1088  1126 S /usr/sbin/httpd
 1088  1127 S /usr/sbin/httpd
 1088  1128 S /usr/sbin/httpd
 1088  1129 S /usr/sbin/httpd
 1088  1130 S /usr/sbin/httpd
 1088  1141 S /usr/sbin/httpd
    1  1210 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/radio
    1  1284 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/orbit
    1  1294 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/radio
    1  1303 S /usr/local/ices/bin ices -v -c /home/centovacast/vhosts/radioebene
    1  1310 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/radio
    1  1319 S /usr/local/ices/bin ices -v -c /home/centovacast/vhosts/radiofasip
    1  1327 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/radio
    1  1334 S /usr/local/ices/bin ices -v -c /home/centovacast/vhosts/radiometod
    1  1343 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/radio
    1  1350 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/radio
    1  1357 S /usr/local/ices/bin ices -v -c /home/centovacast/vhosts/radiosomdo
    1  1365 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/skyle
    1  1372 S /usr/local/ices/bin ices -v -c /home/centovacast/vhosts/skyler/etc
    1  1380 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/tomso
    1  1388 S /home/centovacast/shoutcast/sc_serv /home/centovacast/vhosts/yahwe
    1  1395 S /usr/local/ices/bin ices -v -c /home/centovacast/vhosts/yahweh/etc
    1  1403 S /usr/local/bin/icecast -c /home/centovacast/vhosts/93fm/etc/server
    1  1410 S /usr/local/bin/icecast -c /home/centovacast/vhosts/agenciasindical
    1  1417 S /usr/local/bin/icecast -c /home/centovacast/vhosts/araranguaam/etc
    1  1424 S /usr/local/bin/icecast -c /home/centovacast/vhosts/culturafm/etc/s
    1  1432 S /usr/local/bin/icecast -c /home/centovacast/vhosts/energiafm/etc/s
    1  1439 S /usr/local/bin/icecast -c /home/centovacast/vhosts/fm97/etc/server
    1  1446 S /usr/local/bin/icecast -c /home/centovacast/vhosts/iradio/etc/serv
    1  1454 S /usr/local/bin/icecast -c /home/centovacast/vhosts/meridional/etc/
    1  1461 S /usr/local/bin/icecast -c /home/centovacast/vhosts/meridionalcolor
    1  1468 S /usr/local/bin/icecast -c /home/centovacast/vhosts/meridionaljaru/
    1  1475 S /usr/local/bin/icecast -c /home/centovacast/vhosts/meridionalvilhe
    1  1482 S /usr/local/bin/icecast -c /home/centovacast/vhosts/ondasulfmhd/etc
    1  1489 S /usr/local/bin/icecast -c /home/centovacast/vhosts/radioa4pma/etc/
    1  1497 S /usr/local/bin/icecast -c /home/centovacast/vhosts/radiocidade/etc
    1  1504 S /usr/local/bin/icecast -c /home/centovacast/vhosts/radioguaruja/et
    1  1511 S /usr/local/bin/icecast -c /home/centovacast/vhosts/radiomoc/etc/se
    1  1518 S /usr/local/bin/icecast -c /home/centovacast/vhosts/radiorecordrj/e
    1  1525 S /usr/local/bin/icecast -c /home/centovacast/vhosts/radiozoom/etc/s
    1  1532 S /usr/local/bin/icecast -c /home/centovacast/vhosts/rdireitodeviver
    1  1539 S /usr/local/bin/icecast -c /home/centovacast/vhosts/regionalfm/etc/
    1  1546 S /usr/local/bin/icecast -c /home/centovacast/vhosts/ssmhd/etc/serve
    1  1553 S /usr/local/bin/icecast -c /home/centovacast/vhosts/testehd/etc/ser
    1  1560 S /usr/local/bin/icecast -c /home/centovacast/vhosts/transamericafm/
 1088  5498 S /usr/sbin/httpd
 1070 15981 S pickup -l -t fifo -u
    1 16109 S /usr/local/bin/icecast -c /home/centovacast/vhosts/tocantinsfmarag
 1088 16126 S /usr/sbin/httpd
    1 16145 S /usr/local/bin/icecast -c /home/centovacast/vhosts/tocantinsfmguru
 1088 19483 S /usr/sbin/httpd
 1088 19509 S /usr/sbin/httpd
 1395 19914 Z [php] <defunct>
 1395 19916 Z [ices] <defunct>
 1357 19934 Z [ices] <defunct>
 1357 19935 Z [php] <defunct>
 1357 19937 Z [ices] <defunct>
 1372 19983 Z [php] <defunct>
 1372 19985 Z [ices] <defunct>
  754 20058 S sleep 60
  853 20069 S sshd: root@pts/0
20069 20076 S -bash
20076 20092 R ps -eo ppid,pid,state,cmd
Last Edit: November 26, 2012, 09:54:04 am by carlosjpr
Steve, I can provide you with shell access...
OK that confirms basically what I've received from another client as well.  This does in fact appear to be an ices issue.  As Karl noted in the thread carlosjpr linked to previously, a defunct process indicates a problem in the *parent* process.

Taking two of your processes as an example:

1395 19914 Z [php] <defunct>
1395 19916 Z [ices] <defunct>

...these zombie processes are owned by PID 1395... which is:

1  1395 S /usr/local/ices/bin ices -v -c /home/centovacast/vhosts/yahweh/etc

...ices.  So ices is the parent, and ices is failing to reap its child processes when they exit.
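
As a general illustration of what "reaping" means (this is not ices code; ices is a C program and would use the equivalent `waitpid()` call), the following Linux-only Python sketch forks a child that exits immediately and checks its state before and after the parent collects it. The helper names `process_state` and `zombie_demo` are invented for this example, and it assumes `/proc` is mounted.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as stat:
            # The state letter follows the parenthesised command name.
            return stat.read().rsplit(")", 1)[1].split()[0]
    except (FileNotFoundError, ProcessLookupError):
        return None


def zombie_demo():
    """Fork a child that exits at once; return its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Poll (for at most 5 seconds) until the exited child shows up as a zombie.
    deadline = time.monotonic() + 5
    while process_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = process_state(pid)

    # Reap: collect the exit status so the kernel can free the process entry.
    os.waitpid(pid, 0)
    after = process_state(pid)
    return before, after
```

Until the parent calls `waitpid()`, the exited child stays in state Z, which is what `ps` labels `<defunct>`. A parent that never makes that call accumulates zombies for as long as it keeps running.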

As for why, I don't have a clue... it's pretty certain to be a bug of some sort, and it appears to happen exclusively on CentOS 6.3 according to the reports we've received, so clearly something has changed (perhaps in the default kernel used by CentOS 6.3 or somesuch) to bring this bug to light.

FWIW, I've created several CentOS 6.3 VMs and ices has worked fine on all of them, which is somewhat puzzling.  We'll continue to investigate as best we can but just a reminder to everyone that ices is NOT our product and we are NOT the developers of ices, so our recourse is somewhat limited here. 
Steve,

Thank you for your attention to this bug. Would you like access to one of my CentOS 6.3 VMs so you can compare its configuration with your VMs?

I use XenServer
Last Edit: November 27, 2012, 05:46:05 am by carlosjpr
Thank you for your attention to this bug. Would you like access to one of my CentOS 6.3 VMs so you can compare its configuration with your VMs?
Thanks, but I've already obtained access to an affected server and I've found no perceptible difference between the affected machine and one of our own (which does not exhibit the problem), right down to the exact kernel version.

Clearly *something* has changed in CentOS 6.3 as this doesn't seem to affect users of ANY other Linux distribution, but it doesn't appear to be kernel-specific so I'm stumped.

I did make some changes in the latest version of the control daemon to change how the daemon forks new processes though, so if anyone wants to update and give that a try (being sure to kill off all ices processes after upgrading to ensure a "clean start") I'd be interested to hear if that makes a difference.  Technically the changes *shouldn't* make a difference, but they safeguard against a similar problem that affects certain oldschool UNIX environments so I figured it was worth a shot anyway.