Warans
4th March 2002, 09:21
Hi...
Can someone explain to me how the Multi-byte (MB) system works in BaaN? I need all the details about it, for example if BaaN is implemented on Oracle8 on a Unix OS in a country like China.
Thanks in advance..
Francesco
5th March 2002, 22:43
Well, I can kick it off and let others add their two cents.
A normal string is defined as a single array, where each position holds a single character (byte).
|t|h|i|s| |i|s| |a| |s|t|r|i|n|g|
A multibyte string holds up to 4 bytes in each position and can therefore be seen as a two-dimensional array (with a depth of 4) or an "array of strings".
|this| is | a m|ulti|byte|stri|ng |
Being able to hold 4 bytes per position allows it to store coded characters such as Chinese and Japanese (TSS or local).
victor_cleto
6th March 2002, 12:31
Wouldn't it be more like this: if you use standard characters, the string would still be |t|h|i|s| |i|s| |a| |s|t|r|i|n|g| but the 3 extra bytes in each position would be empty?
This is somewhat more like how Unicode works...
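To make the Unicode comparison concrete, here is a minimal sketch using UTF-32, a fixed-width encoding with 4 bytes per position. This is only an analogy: UTF-32 is not Baan's TSS format, and nothing here reflects how Baan lays out a multibyte string internally.

```python
# Analogy only: UTF-32 is a fixed-width 4-byte encoding, not Baan's TSS format.
text = "this"
encoded = text.encode("utf-32-be")  # big-endian, no byte-order mark

# Split the byte string into 4-byte slots, one per character position.
slots = [encoded[i:i + 4] for i in range(0, len(encoded), 4)]

# For basic characters, three of the four bytes in each slot are zero.
for ch, slot in zip(text, slots):
    print(ch, slot.hex(" "))
```

Each basic character still occupies one position, with the other three bytes of that position left empty.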
Francesco
6th March 2002, 18:13
Correct, because you don't put regular characters in a multibyte string, but you use multibyte characters, such as Kanji.
Each sequence in a multibyte character set represents a single character.
A multibyte character can be a one-byte sequence that is a character from the basic (C) character set, or a sequence of two or more bytes that is implementation defined.
For example, in EUC (Extended Unix Code, which is one of 4 different ways to encode Japanese characters), the first byte determines the length of the multibyte character. If the first byte is in the 0x00 - 0x7F range, it is a member of the basic character set and is therefore only one byte long. Almost all other starting bytes indicate a character length of two bytes (the exception in EUC-JP is the 0x8F prefix, which introduces a three-byte character).
(somebody shut me up already)
In JIS (ISO-2022-JP, not to be confused with the ugly Shift-JIS encoding), multibyte characters have a state-dependent encoding. The interpretation of each byte depends not only on the byte itself, as in the previous system, but also on a shift state that is determined by bytes earlier in the sequence.
anyway...
Victor_cleto is correct when he says the string would still look the same, because the multibyte characters in the string would consist of single byte characters, being part of the basic character set.
Now if you wanted to use a more complex character set to say the same thing (and with Baan being multi-lingual this is part of the standard functionality), you would use up to four bytes per character (or position).
When querying Baan, people often treat multibyte strings as regular strings, or worse... they convert them into regular strings in their code. This is acceptable _only_ if you work in a language that uses the basic character set only, where you have a one-to-one translation between multibyte and single-byte characters.
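A minimal sketch of what goes wrong when multibyte data is handled as a plain byte string. It uses Python and the EUC-JP codec as a stand-in; it does not use Baan's own string functions, so treat it as an illustration of the general problem only.

```python
text = "日本語"
raw = text.encode("euc_jp")

print(len(text), len(raw))  # character count vs byte count

# Cutting at a byte offset that falls inside a character corrupts the string.
truncated = raw[:3]
try:
    truncated.decode("euc_jp")
    broken = False
except UnicodeDecodeError:
    broken = True
print(broken)

# Cutting at a character boundary is safe.
safe = text[:1].encode("euc_jp")
print(safe == raw[:2])
```

With basic characters only, byte offsets and character offsets coincide, which is why the shortcut appears to work until the first non-basic character shows up.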
I am not sure what the position is of German, Spanish or Portuguese, for example, since they have quite a few additional characters.
Warans
7th March 2002, 11:43
How about Chinese... will it have additional characters (I'm sure it will!) for which there is no one-to-one translation between a single character in a normal string and a character in a multi-byte string?
Francesco
7th March 2002, 17:47
Right, simplified Chinese and traditional Chinese use multi-byte characters only.
There are no single-byte characters that match any Chinese locale, and therefore there is never a one-to-one relationship.
shah_bs
7th March 2002, 20:02
It will need some digging around, but there is a lot of 'technical' description about this in the on-line Help. You need to look for the subject called TSS (TRITON Super Set) characters. Also, if you look up the help on table definitions, it gives a very good description of how multibyte strings are stored and sorted, and of related operations. You could also look up the 'mb.*' functions in the Tools Man Pages.
victor_cleto
8th March 2002, 11:12
I am not sure what the position is of German, Spanish or Portuguese for example, since they have quite some additional characters.
They only use 8-bit characters; the conversion/setting of the extra characters is made through the Locale settings.
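A short sketch of that idea, with Python codec names standing in for locale settings (an assumption made for illustration; Baan's actual locale mechanism is not shown here). The same 8-bit value maps to a different character depending on which single-byte code page is in effect.

```python
raw = bytes([0xFC])  # one 8-bit value

western = raw.decode("latin-1")   # ISO 8859-1, Western European
greek = raw.decode("iso8859_7")   # ISO 8859-7, Greek

print(western, greek)

# Accented characters from these languages still occupy a single byte.
# (a-umlaut, n-tilde, c-cedilla, written as escapes to keep them precomposed)
sizes = [len(ch.encode("latin-1")) for ch in "\u00e4\u00f1\u00e7"]
print(sizes)
```

So for German, Spanish or Portuguese no multibyte sequences are involved; what changes with the locale is the interpretation of byte values above 0x7F.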