Forums

February 21, 2013, 12:40:43 am Non-Latin (Cyrillic) Encoding

Some notes before I ask my question:

We have an MP3 file with ID3v1 in UTF8 and ID3v2.4 in UTF8. All ID3 info is in Cyrillic, encoded in UTF8.

The file is loaded into the Media Library with no problem: the ID3 info is readable and every Cyrillic character displays correctly. As far as I understand, Centova uses UTF8 encoding for all web pages.

In the settings for this channel we have Character Encoding set to CP1251: Cyrillic (it could be set to ISO-8859-5: Latin/Cyrillic too).

If I am right, the system works this way: ID3 info in UTF8 is converted to CP1251. The Shoutcast DNAS status page shows information in UTF8 by default. In this case the Cyrillic info is unreadable (see CP1251_page_encoding screenshot). If we set the page encoding in our browser to Windows-1251, all the Cyrillic info is readable (see UTF8_page_encoding screenshot).

Most web developers in Western Europe and Russia use UTF8 for their sites, as it's the standard nowadays. To display Cyrillic info from the Shoutcast status page properly, they need to convert it from CP1251 back into UTF8. Some Centova users use the Marci service for showing song info; it works in UTF8 and therefore does not show Cyrillic info correctly.
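To illustrate the mismatch described above, here is a minimal Python sketch. The sample title and the two function names are made up for illustration; it only shows what happens when CP1251 bytes are read as UTF8, and how a site could convert them back.

```python
def as_seen_by_utf8_page(cp1251_bytes: bytes) -> str:
    """Decode CP1251 bytes the way a UTF8-only page or client would."""
    # Invalid UTF8 sequences become U+FFFD replacement characters.
    return cp1251_bytes.decode("utf-8", errors="replace")


def cp1251_to_utf8(cp1251_bytes: bytes) -> bytes:
    """Convert CP1251-encoded text back into UTF8 bytes."""
    return cp1251_bytes.decode("cp1251").encode("utf-8")


# Hypothetical song title, encoded as it would be if converted to CP1251.
title = "Привет, мир"
stream_bytes = title.encode("cp1251")

garbled = as_seen_by_utf8_page(stream_bytes)
fixed = cp1251_to_utf8(stream_bytes).decode("utf-8")

print(garbled)  # unreadable: the Cyrillic letters are lost
print(fixed)    # the original title
```

The conversion only works if the receiving side knows the bytes are CP1251; nothing in the bytes themselves says so.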

Now the question: is it possible to add one extra option for Character Encoding in Settings, like UTF8 or DO NOT CONVERT, so that ID3 info is not converted from UTF8 into other character encodings if the user would like to work with UTF8 only?

Thanks a lot for the great product.

February 21, 2013, 02:26:56 pm, #1 Re: Non-Latin (Cyrillic) Encoding

Quote from: CheRny on February 21, 2013, 12:40:43 am

We have an MP3 file with ID3v1 in UTF8 and ID3v2.4 in UTF8. All ID3 info is in Cyrillic, encoded in UTF8.

ID3v1 does not support Unicode so that part is not possible.

The rest sounds fine.

Quote from: CheRny on February 21, 2013, 12:40:43 am

As far as I understand, Centova uses UTF8 encoding for all web pages.

Yes, Centova Cast uses UTF8 for everything internally.

Quote from: CheRny on February 21, 2013, 12:40:43 am

In the settings for this channel we have Character Encoding set to CP1251: Cyrillic (it could be set to ISO-8859-5: Latin/Cyrillic too).

This is irrelevant if you have Unicode ID3v2 tags in your MP3s. Each ID3v2 text frame is flagged as either a Unicode encoding or "not Unicode" (nominally ISO-8859-1). Obviously that's a bit silly because if you're using "not Unicode" then, in practice, you have no way of knowing what encoding you ARE using, which is why the "Character encoding" setting exists.

But if you have properly encoded the MP3s with Unicode ID3v2 tags, then the text frames will be marked as such by their encoding byte. In that case, Centova Cast will completely ignore the character encoding setting and will process the tags directly as Unicode (choosing UTF8, UTF16, etc. appropriately depending on the encoding byte and, for UTF16, the BOM).
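As a rough illustration of that rule (not Centova Cast's actual code), here is a Python sketch that decodes the body of one ID3v2 text frame. The function name is hypothetical, and the `fallback` parameter stands in for the "Character encoding" setting; it is only consulted when the frame is flagged "not Unicode".

```python
# Encoding byte values for ID3v2 text frames (0x02 and 0x03 exist in ID3v2.4 only).
ID3V2_ENCODINGS = {
    0x00: None,         # "not Unicode": nominally ISO-8859-1, often a local code page
    0x01: "utf-16",     # UTF16 with BOM; the codec reads the BOM to pick byte order
    0x02: "utf-16-be",  # UTF16 big-endian without BOM
    0x03: "utf-8",
}


def decode_text_frame(body: bytes, fallback: str = "cp1251") -> str:
    """Decode a text frame body: one encoding byte followed by the text."""
    if not body:
        return ""
    marker, data = body[0], body[1:]
    if marker not in ID3V2_ENCODINGS:
        raise ValueError(f"unknown text encoding byte: {marker:#04x}")
    # Unicode frames ignore the fallback entirely; only 0x00 frames use it.
    encoding = ID3V2_ENCODINGS[marker] or fallback
    # Drop any trailing string terminator left in the frame.
    return data.decode(encoding).rstrip("\x00")
```

With this layout a UTF8 frame decodes the same way whatever `fallback` is set to, which matches the behavior described above.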

Quote from: CheRny on February 21, 2013, 12:40:43 am

If I am right, the system works this way: ID3 info in UTF8 is converted to CP1251.

No. Per above, if it's UTF8 (or otherwise Unicode) encoded it is used as-is.

Quote from: CheRny on February 21, 2013, 12:40:43 am

The Shoutcast DNAS status page shows information in UTF8 by default.

That depends on which SHOUTcast DNAS version you're talking about. If you mean DNAS1, then no... it has no concept of character encoding and just dumps the raw strings in its output. That leads to all kinds of character encoding problems.

DNAS2 uses UTF8 output by default, though.
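For anyone reading DNAS1 output from a script, one common workaround is to guess the encoding on the client side. The sketch below is an assumption about how such a client might do it, not something DNAS or Centova Cast provides; the function name and the default legacy encoding are hypothetical.

```python
def decode_raw_title(raw: bytes, legacy_encoding: str = "cp1251") -> str:
    """Decode a raw title string whose encoding is not declared."""
    try:
        # Strict UTF8 decoding fails on most legacy single-byte Cyrillic text.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to the code page the station is assumed to use.
        return raw.decode(legacy_encoding, errors="replace")
```

This is a heuristic: a short CP1251 string can occasionally happen to be valid UTF8, in which case the guess will be wrong.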

Quote from: CheRny on February 21, 2013, 12:40:43 am

In this case the Cyrillic info is unreadable (see CP1251_page_encoding screenshot).

It's hard to follow your example without seeing any context -- where are those screenshots from? DNAS? Or a Centova Cast widget? If they're from DNAS, that's a black background which implies DNAS1, in which case it's not unexpected to see character encoding problems due to DNAS1's lack of proper character encoding support.

Quote from: CheRny on February 21, 2013, 12:40:43 am

Now the question: is it possible to add one extra option for Character Encoding in Settings, like UTF8 or DO NOT CONVERT?

No, that's the default behavior already so it's unnecessary to add a separate option for it.

February 22, 2013, 02:03:58 am, #2 Re: Non-Latin (Cyrillic) Encoding

Steve, thanks a lot for the great explanation.

Sorry, I am a rookie and now I see my mistakes. You are right, I was using Shoutcast V1 for testing. As soon as I switched to Icecast V2, all issues with Cyrillic encoding were gone. Everything concerning encoding works properly. Unfortunately, for now I am not able to use Shoutcast V2 + sc_trans V2 because the server restarts randomly (the Centova log shows that the server is not running and that there was a successful attempt to restart it; /var/log/messages shows Executable '/usr/local/centovacast/shoutcast2/sc_serv' doesn't belong to any package).

Just a note: Shoutcast V1 + ices-cc has worked great for several accounts for almost two months. The only problem is with Shoutcast V2 + sc_trans V2.

I am wondering whether you have some kind of paid service for investigating my issue with shoutcast2 + sc_trans2 and determining whether it's a matter of system software setup, sc_trans, or a Centova issue. For a reasonable price, of course.

Once again thanks for the help.

February 22, 2013, 04:25:39 pm, #3 Re: Non-Latin (Cyrillic) Encoding

Quote from: CheRny on February 22, 2013, 02:03:58 am

You are right, I was using Shoutcast V1 for testing. As soon as I switched to Icecast V2, all issues with Cyrillic encoding were gone.

Great. It's unfortunate that DNAS1 doesn't have better character encoding support, but then again it *is* nearly 10 years old.

Quote from: CheRny on February 22, 2013, 02:03:58 am

Unfortunately, for now I am not able to use Shoutcast V2 + sc_trans V2 because the server restarts randomly

It's probably crashing rather than restarting (Centova Cast is probably just detecting it as down and starting it up after the crash.) It was noted in another thread that DNAS2 (or perhaps it was sctrans2, I don't recall offhand) was crashing for some users when multiple mountpoints were in use. Apparently if you just use one mount point, it stops crashing.

Quote from: CheRny on February 22, 2013, 02:03:58 am

I am wondering whether you have some kind of paid service for investigating my issue with shoutcast2 + sc_trans2 and determining whether it's a matter of system software setup, sc_trans, or a Centova issue.

Unfortunately as the problem is within DNAS2/sctrans2 it's not something we can really do anything about since we don't develop those products.

If someone could come up with a consistently reproducible way to trigger a crash, I would be more than happy to pass on a bug report to the lead developer for the SHOUTcast toolset... but to date, not only has nobody reported a consistent way to reproduce a crash (it's always just a random crash after an indeterminate period of time), I haven't even been able to get it to crash ONCE on our own machines, so I really don't have a lot to go on unfortunately.

Posts in this thread:

CheRny, February 21, 2013, 12:40:43 am: Non-Latin (Cyrillic) Encoding
Centova - Steve B. (Administrator), February 21, 2013, 02:26:56 pm, #1: Re: Non-Latin (Cyrillic) Encoding
CheRny, February 22, 2013, 02:03:58 am, #2: Re: Non-Latin (Cyrillic) Encoding
Centova - Steve B. (Administrator), February 22, 2013, 04:25:39 pm, #3: Re: Non-Latin (Cyrillic) Encoding