
Hey All,

Ran into a strange issue here. I've had a ticket in for days now, but it's just back and forth with no solution yet.

Let me start by saying I have both Pingdom and UptimeRobot set up to alert me if my server goes down or if my control panel website goes down.

For the last 5 or 6 weeks, on either Monday or Tuesday, I have been getting a notification from both monitoring services that the HTTP login site has gone down.

Weird part: the Auto DJs that are running keep playing, and DJs can log in, go live, etc. But no one can reach the Centova panel.

The only way I have been able to remedy the situation is to reboot the entire server.

Tech support gave me some commands to try. They told me:

Please run this command in your SSH terminal:

/usr/local/centovacast/sbin/fixperms

Now, restart the Web service again:

/etc/init.d/centovacast stop-web
/etc/init.d/centovacast start-web

I tried this - no fix.
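
For anyone repeating the support steps above, here is a minimal sketch of a wrapper that runs the two init-script commands in order and prints a timestamp and result for each, so the restart can later be matched against the logs. The `restart_web` function name and the `CC_INIT` variable are made up for this example; only the init script path and the `stop-web` / `start-web` actions come from the support reply.

```shell
#!/bin/sh
# Path to the Centova Cast init script as quoted by support.
# CC_INIT is a hypothetical override so the function can be tried against a stub.
CC_INIT="${CC_INIT:-/etc/init.d/centovacast}"

# Stop and then start the web service, logging the outcome of each step.
# Returns non-zero as soon as one step fails.
restart_web() {
    for action in stop-web start-web; do
        "$CC_INIT" "$action"
        rc=$?
        stamp=$(date '+%Y-%m-%d %H:%M:%S')
        if [ "$rc" -eq 0 ]; then
            echo "$stamp $action ok"
        else
            echo "$stamp $action FAILED (exit $rc)"
            return 1
        fi
    done
}
```

Running `restart_web` as root prints one line per step and stops at the first step that fails.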

Has anyone had this happen? How do I fix it? Weirdest part: it's only occurring on either Mon or Tues. The same auto DJs are always running, with no other common DJs live or anything like that. Server stays up, web service down.

Any help is appreciated!

Todd

Not seeing anything about the shutdown in the logs. They were empty.

When you enter that fixperms command, shouldn't it take some time to run? For me it was immediate. Split second.

I have since rebooted the server because I needed the panel, without having the problem rectified, so the web service is back up. But it will most likely crash again on Mon or Tues. It's extremely strange.

Quote from: GroovinMind on December 26, 2013, 09:04:55 am

I had this happen to me a while back and the above remedy that you posted fixed my issue..

/etc/init.d/centovacast stop-web
/etc/init.d/centovacast start-web

Check your logs, it will tell you why it shut down.

I had the same issue but did what you said and fixed it. Thanks for posting.

My problem was that every 2 days it would show a 502 Bad Gateway error, and I had to reinstall the CPanel each time it happened.

Last Edit: January 26, 2014, 11:43:23 am by Todd73NJ

OK, my frustration is continuing to grow, as are my expenses.

I have been in contact with support many times now, and am struggling to get this problem rectified. My messages were even escalated to the developers.

My server was an OVH (cheap stuff), and this was the error log:

Dec 29 22:20:22 ks20321 kernel: PAX: size overflow detected in function atomic_add_return /var/home/fx/src/ovh-kernel/ovhkernel-xxxx-grs-ipv6-64/linux-3.10.9/arch/x86/include/asm/atomic.h:337 cicus
Dec 29 22:20:22 ks20321 kernel: CPU: 0 PID: 4433 Comm: sc_serv Not tainted 3.10.9-xxxx-grs-ipv6-64 #1
Dec 29 22:20:22 ks20321 kernel: Hardware name: /DH67BL, BIOS BLH6710H.86A.0156.2012.0615.1908 06/15/2012
Dec 29 22:20:22 ks20321 kernel: 0000000000000000 ffff8801fafc7dd8 ffffffff81da3b50 ffff8801fafc7de8
Dec 29 22:20:22 ks20321 kernel: ffffffff8119bb34 ffff8801fafc7df8 ffffffff811af257 ffff8801fafc7e18
Dec 29 22:20:22 ks20321 kernel: ffffffff81b7fc56 0000000000000000 ffff8801f7cc1e00 ffff8801fafc7f08
Dec 29 22:20:22 ks20321 kernel: Call Trace:
Dec 29 22:20:22 ks20321 kernel: [<ffffffff81da3b50>] dump_stack+0x19/0x21
Dec 29 22:20:22 ks20321 kernel: [<ffffffff8119bb34>] report_size_overflow+0x24/0x30
Dec 29 22:20:22 ks20321 kernel: [<ffffffff811af257>] get_next_ino+0x77/0x80
Dec 29 22:20:22 ks20321 kernel: [<ffffffff81b7fc56>] sock_alloc+0x26/0x80
Dec 29 22:20:22 ks20321 kernel: [<ffffffff81b82c00>] SYSC_accept4+0x70/0x290
Dec 29 22:20:22 ks20321 kernel: [<ffffffff811b3bd0>] ? mntput_no_expire+0x40/0x140
Dec 29 22:20:22 ks20321 kernel: [<ffffffff811b456f>] ? mntput+0x1f/0x40
Dec 29 22:20:22 ks20321 kernel: [<ffffffff81108c94>] ? ktime_get_ts+0x54/0xf0
Dec 29 22:20:22 ks20321 kernel: [<ffffffff811e2531>] ? poll_select_copy_remaining+0x91/0x230
Dec 29 22:20:22 ks20321 kernel: [<ffffffff81b842b9>] SyS_accept4+0x9/0x20
Dec 29 22:20:22 ks20321 kernel: [<ffffffff81bbddea>] compat_sys_socketcall+0x1da/0x310
Dec 29 22:20:22 ks20321 kernel: [<ffffffff81db26bc>] sysenter_dispatch+0x7/0x24

I was told by Centova there was a serious hardware problem. We proceeded to upgrade the kernel, then when that didn't work, downgraded to an older, stable version according to kernel.org. But no fix. The web service portion of the server continued to crash every 80 to 120 hours, pretty much like clockwork.

I contacted the data center. They confirmed the server was NOT down, and that the problem was software related.

So where does that leave me? Centova telling me it's hardware, the DC telling me it's software.

This past Wednesday, I purchased a new server, not an OVH, but from Online.Net. Triple the price, better specs. Had CentOS installed, and paid Centova $120 to perform the migration to this new server.

Problem solved? Nope.

At 1 pm today, 88 hours after the migration was completed - guess what? The webservice goes down.

I have two alerts set up, one pinging the IP address of the server, and one on IP:2199. Once again, the only one that goes down is IP:2199.

Hopefully I can put the new log here this evening when I am not mobile, and maybe someone can help with a solution. It can't be another hardware issue on a totally different server, different kernel version, different DC.

I have another support request in to Centova. But any help here is greatly appreciated.

Edit: One of the major problems here is that any time I bring the web service back up by rebooting, it takes me 80 to 120 hours or so to figure out if it worked, because that's the time frame in which it goes down.

So here are the basics of the entire situation, I do have a new ticket in with Centova - but maybe one of you guys will see something:

I had been running my Centova on a Kimsufi/OVH server. The server itself never actually went down in the entire time that I have had it in my possession and been monitoring it. However, every 80 to 120 hours the web service for Centova went down, and in order to bring it back up, I was forced to reboot the server. This solved the problem for another 80 to 120 hours or so.

Centova support escalated the issue to a dev.

He was kind enough to look at my logs, and found all sorts of Kernel errors, which he diagnosed as a serious hardware issue with the server. The listing of the errors is in the above response.

I contacted my DC, and they said there were no hardware issues with the server. We also upgraded the kernel to the current stable version, 3.12.9 (according to Kernel.org), rebooted the server, and the web service crashed some 88 hours later. We then rolled the kernel back to 2.6.34, rebooted, and again the web service crashed near the 100 hour mark.

At this point, I decided to chalk it up to being a bad server, contrary to what the DC was telling me.

On Wed, 1/22, I contracted Centova to migrate over from my OVH/Kimsufi Server to my Online.Net server. This job was completed approx 10pm EST on 1/22.

Everything appeared to be running great for a few days.

But today, 1/26, at 12:58 EST, the Webservice monitor reported that it was down. I checked Streams.CentralSocial3d.com:2199 and received:

502 Bad Gateway

cc-web/1.2.9

However, the monitor on just the server IP address showed the server was up and running fine. I was able to ping the server, but unable to reach the Centova panel. The problem again appears to be the web service only.
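
A 502 with a `cc-web` footer suggests the front-end web server is still answering and that whatever it proxies to is not (presumably the app server behind it, but that is an assumption, not something confirmed in this thread). Below is a rough sketch of a probe that tells those cases apart. `classify_status` and `probe_panel` are hypothetical helper names, and the URL passed in is whatever panel address you monitor.

```shell
#!/bin/sh
# Map an HTTP status code from a panel probe to a rough diagnosis.
classify_status() {
    case "$1" in
        000)     echo "no-connection" ;;             # curl prints 000 when it got no HTTP response
        2??|3??) echo "panel-up" ;;
        502|504) echo "frontend-up-backend-down" ;;  # gateway answered, upstream did not
        *)       echo "other-error" ;;
    esac
}

# Probe a URL once and print a timestamped line with the code and diagnosis.
# $1 is the panel URL, e.g. http://<server>:2199/ (placeholder, not a real host).
probe_panel() {
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 $code $(classify_status "$code")"
}
```

Appending the output of `probe_panel` to a file every minute would narrow the failure time down further than a 5-minute monitor can.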

Ironically, this downtime occurred 84 hours and 55 minutes after the migration to the new server, which happens to be in the same window of time in which the old server's web service would go down on a consistent basis.

I was forced to reboot several times to get the web service to come back up. On the 3rd reboot it did. And it is now running flawlessly again. However, judging by what I saw today, I would fully expect the web service to crash again between Wednesday night and Friday night.
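
To keep the uptime arithmetic honest across reboots, here is a small sketch that formats the gap between two Unix timestamps and checks it against the 80 to 120 hour window. The function names are invented for this example, and the window bounds are simply the figures quoted in this thread.

```shell
#!/bin/sh
# Print the time between two Unix timestamps (in seconds) as "<H>h <M>m".
# $1 = start (e.g. last reboot), $2 = end (e.g. time the monitor alerted).
elapsed_hm() {
    secs=$(( $2 - $1 ))
    echo "$(( secs / 3600 ))h $(( secs % 3600 / 60 ))m"
}

# Succeed when the elapsed time lies inside the 80 to 120 hour window.
in_crash_window() {
    secs=$(( $2 - $1 ))
    [ "$secs" -ge $(( 80 * 3600 )) ] && [ "$secs" -le $(( 120 * 3600 )) ]
}
```

On a system with GNU `date`, the timestamps can be produced with something like `date -d '2014-01-26 12:58' +%s`.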

I desperately need to find a solution to this problem.

Here are all the logs. (They are based on Paris time, so the time that the server went down would be between 18:53 and 18:58. I cannot get a specific minute because my monitor checks the web service and the server once every 5 minutes.)

CC-AppServer Logs:
http://brooketv.net/cc-appserver.log
http://brooketv.net/cc-appserver.log.1.log
http://brooketv.net/cc-appserver.log.2.log
http://brooketv.net/cc-appserver.log.3.log

CC-Web Logs:
http://brooketv.net/cc-web.log
http://brooketv.net/cc-web.log.1.log
http://brooketv.net/cc-web.log.2.log
http://brooketv.net/cc-web.log.3.log

Var/logs:
Jan 20 - 26: http://brooketv.net/messages-20140126.txt
Jan 26 - current: http://brooketv.net/messagesnew.txt

If any of you see anything jump out at you, I'd appreciate your input. I really need to get to the bottom of this problem.

Thanks!

OK, so it's been 3 days and 17 hours, and everything was fine till this morning.

Puts me right back in that 80 to 120 hour window.

All along server load has been .05 to .10. Auto DJs running, live events.. no issues. Memory used 1/8. 0 Swap.

This morning.. Load was at 5.. then 10.. a few hours later 20.. now 30+! Memory still shows 1/8. 0 Swap.

So I sense the server is about to crash.

I attempted to get an SSH prompt.. took me about 15 attempts... but here is the TOP report.

Can anyone make anything of this???

http://i1215.photobucket.com/albums/cc506/Todd73NJ/TOPreport_zps3c189bd1.jpg
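
Since SSH becomes nearly unusable once the load climbs, it may help to record the numbers continuously before that point. Here is a minimal sketch that reads the 1-minute load, free memory and used swap from the standard Linux `/proc` files. The `sample_stats` name is invented here, and the file arguments exist only so it can be tried on saved copies.

```shell
#!/bin/sh
# Print "<1-min load> <MemFree kB> <swap used kB>" on one line.
# $1 = loadavg file (default /proc/loadavg), $2 = meminfo file (default /proc/meminfo).
sample_stats() {
    loadfile="${1:-/proc/loadavg}"
    memfile="${2:-/proc/meminfo}"
    # First field of loadavg is the 1-minute load average.
    load1=$(cut -d' ' -f1 "$loadfile")
    awk -v load="$load1" '
        /^MemFree:/   { free = $2 }
        /^SwapTotal:/ { st = $2 }
        /^SwapFree:/  { sf = $2 }
        END { printf "%s %d %d\n", load, free, st - sf }
    ' "$memfile"
}
```

Run from cron every minute with a timestamp prepended and the output appended to a file, this would show when the load and swap usage start to rise, even if nobody can log in at the time.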

I'm starting to feel like I bought a software package that's in beta testing or something... 5 support replies from me, 7 days go by, and nothing from Centova.

I just don't get it. Centova has been the ONLY software running now on 2 totally different server packages, yet the blame continues to be put on the hardware, and on me to figure out the problem.

I had a crash about 80 hours ago as I detailed in prior posts where something caused the server to reach 90 load over the course of 12 hours, from basically no load.

Well, last night around 3am I stopped monitoring my server from my computer. Load was .01, .02, even with a few hundred listeners on live events and all the auto DJs running as they normally do. No issues whatsoever.

So I decided to check in this morning, and for the first time I noticed swap memory had been used: 412168k to be exact. What would make this happen? No DJs were live, just the auto DJs running as they had been for the past 80 hours.

The server load is now showing slightly higher. I'm seeing a lot of readings in the .20 to .40 range, something that I had not been seeing at all since the last reboot.

Ironically - we are now in that 80++ hour window where two different servers only running Centova have crashed - and the assessment from support is hardware issues.

The commands that were suggested are being used in cron jobs, not helping the issue.

Someone has to have some better insight for me... my frustration is building. Over $600 in license, install, migration and service fees.... and now my second server is still experiencing the same problems that the one I migrated away from was experiencing.

I'm no server guru, which is why I bought a professional package with support available. You sort of expect it to work.

Any help would be appreciated!

I was with OVH. (Kimsufi Brand)

I got the server there - put a monitor on it. Ran straight for a month - decided to use it for Centova. I have no clue how to do installs. Hired them to do it.

80 hours ran great - web service crashed - music continued to play for another day - then all crashed. But server was still up, pingable, showing a very light load.

Took some of their support methods, tried them - no matter what I tried - every 80 to 120 hours the same events occurred.

The Devs told me from looking at my logs I had a very serious hardware problem (due to the kernel errors). The DC said the hardware was fine and that it's the software. I tried upgrading and reverting back to other kernels. Same results. Runs great for 80 or so hours, then stops.

Kimsufi is the cheapest server on the market. (That being said, we have one with Wowza running on it for 6 months straight without a reboot or hitch, so they can't be all that bad.) But I decided to accept the Devs' assessment.

I now have a server with Online.Net. Runs great. Double the specs of the other server. No issues at all the first week I had it. So I had Centova migrated over.

Guess what? 80 to 120 hours.. same problem.

I am no techie, but just using some logic it seems like something is overflowing. Maybe based on use, hence it happening in the same time window. MySQL? Log files? I have no idea.
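
One way to test the "something is overflowing" guess is to watch whether a resource count grows steadily between reboots. The kernel trace posted earlier shows `sc_serv` inside `accept4` / `sock_alloc`, so open sockets and file descriptors are one candidate, though that is only a hunch and not a diagnosis. Below is a rough sketch that counts entries under `/proc/<pid>/fd`; the function names are invented for this example, and it needs to run as root to see other users' processes.

```shell
#!/bin/sh
# Count the entries in a directory such as /proc/4433/fd.
# Prints 0 when the directory is missing or unreadable.
count_fds() {
    ls -A "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Print the open file descriptor count for every process with the given name.
# $1 = exact process name, e.g. sc_serv
report_fds() {
    for pid in $(pgrep -x "$1"); do
        echo "$1 pid=$pid fds=$(count_fds "/proc/$pid/fd")"
    done
}
```

Logging `report_fds sc_serv` once an hour would show whether the count creeps upward over the 80 or so hours before a crash or stays flat.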

But this same issue can't be happening in two different DCs with different CPUs, specs, kernels, etc.

I thought about that - I have tried the settings for auto DJ on both re-encode and also have had that off.

And honestly, the max amount of auto DJ content on any single stream is only 3GB. The problem doesn't occur daily; it occurs every 80 to 120 hours. So if that were the case I think it would happen much more frequently.

I've tried both. And if that was the case, wouldn't it happen every time that file played? I just looked at all the auto DJs. The most songs anyone has is about 190, which would be a maximum of 15 hours of music. So I'd think it would happen in that time frame, not at the 80 to 120 hour mark every time.

Is there a Centova process that runs at 5am EST? The TOP report usually shows 1 running, sometimes 2. It just showed 5, with a load of 2.0, and within 2 mins it was back at a load of .04.

So this is the longest I have ever been up and running...

It has now been 11 days that Centova Support has ignored my support messages and screen shots...

1) We have set up a cron job resetting all the Centova processes nightly at 4:30am EST
2) A second cron job runs hourly to reset the cached memory. I have found that with Centova running, the server eats up the memory and forces swap usage within about 2 hours. But when Centova is not running, and all processes are up, memory behaves much more normally: it fills very slowly and is released as needed.
3) Added additional DDoS protection

All this has led to 7 days and 10 hours of uptime, with the server running at loads of basically 0.00.

BUT - I took a glance this morning and became very alarmed. I was able to get one screen shot. One of the sc_serv processes has exploded to the top of the usage list, at 54% CPU. It flashes a very high number, and then drops off.

In addition, MySQL, when it pops up every third or fourth time, is showing 1600%!

This cannot be normal. No one is live. All my auto DJs are playing like normal like they have been the past 7 days.

Suggestions? Comments? PLEASE!

Says attachment too large.. so here is a link to it:

http://i61.tinypic.com/rtq4cj.jpg

Update..

So we killed the process ... and it came right back - at the crazy usage percentages:

http://i60.tinypic.com/2utiyrp.jpg

Have to love the support response 11 days later..

After my first server had "serious" hardware issues prompting me to migrate (and fwiw it has run fine for two weeks without a hitch with other applications once Centova was off it)...

Now the Hard Drive on my current server is the diagnosis.

Can anyone look at those screen shots above and make a better guess? There has to be a reason why one process is out of control and can't be reset, even when killing the PID or restarting all streams from the panel.

Web Service goes down - But Music Continues to Play on Auto DJ - Advice please


December 24, 2013, 08:21:11 pm

December 26, 2013, 09:51:17 am, #1

January 02, 2014, 01:09:16 pm, #2

January 06, 2014, 12:54:55 am, #3

January 26, 2014, 11:38:34 am, #4

January 26, 2014, 11:44:50 pm, #5

February 02, 2014, 11:29:11 am, #6

February 06, 2014, 10:46:19 am, #7

February 06, 2014, 11:17:43 pm, #8

February 07, 2014, 10:07:10 am, #9

February 07, 2014, 01:31:25 pm, #10

February 08, 2014, 02:04:22 am, #11

February 10, 2014, 09:37:45 am, #12

February 10, 2014, 09:41:04 am, #13

February 10, 2014, 07:12:45 pm, #14