Identify Multi-Byte String [Archive]

MilindV

7th June 2008, 18:27

Hi,

I am reading a csv file.

Using gets() I am storing each line in a string "buffer".
Then I am scanning that string using string.scan() as

ret_val = string.scan(buffer, "%s,%s,%s", s1, s2, s3)

So I want to check whether any of the strings s1, s2, s3 contains multi-byte characters, i.e. whether the strings are multi-byte strings.

I have defined s1, s2, s3 as strings.

Thanks

NPRao

7th June 2008, 22:29

Refer to the Tools function - mb.type() (http://www.baanboard.com/programmers_manual_baanerp_help_functions_multibyte_strings_mb_type)

MilindV

9th June 2008, 08:25

Thanks NP

I have used the mb.type() function,
but according to the user guide this function checks whether the string is multi-byte or not, not the content of the string.

So mb.type() is not giving me the desired result.

NPRao

9th June 2008, 20:26

Try these 2 functions - mb.char.info() (http://www.baanboard.com/programmers_manual_baanerp_help_functions_multibyte_strings_mb_char_info), mb.width() (http://www.baanboard.com/programmers_manual_baanerp_help_functions_multibyte_strings_mb_width)

I found some more information in my notes:

Native String
Note that Baan 3GL (BaanERP) has no way to manipulate the native multi-byte encoding correctly.
Sometimes you have to handle MB characters in the native multi-byte encoding: when you read a file from a third-party product or from the customer, and when you output files for customers or for a third-party product.
When you read a file from outside BaanERP, you have to convert its encoding to TSS encoding before manipulating it.
And when you output a file outside BaanERP, you have to convert the TSS encoding to the native encoding.
The functions mb.import$() and mb.import.raw$() convert native encoding to TSS encoding.
The functions mb.export$() and mb.export.raw$() convert TSS encoding to native encoding.
Take care with the size of string variables. TSS encoding needs double the size of native encoding. This means that when you read a line from a file, the buffer must be twice the maximum byte size of the line.
- mb.import$()/mb.export$() - mb.import.raw()/mb.export.raw()
These functions are used to convert character data between TSS encoding and the native encoding. However, be careful when you use them, because they treat some characters specially.
On export, control codes in the range 0x00 through 0x1f (and '^' and '\') are escaped using a caret or a backslash followed by a character whose ASCII code is 0x40 through 0x5f. For example, TSS code 0x01 is converted to "^A" on export. As a result, the length of a string may not be what you expect. A caret is also escaped by adding another caret, i.e. "^" is converted to "\^" on export. The opposite transformation is done on import, of course.
LDC and CF codes in the range 0x80 through 0x9f are escaped on export with a leading backslash followed by their actual code value in hexadecimal. For example, 0x80 is converted to "\0x80". Here 0x80 is a 1-byte code, while "\0x80" is 5 bytes. A backslash is also escaped by adding another backslash, i.e. "\" is converted to "\\" on export.
If you cannot accept this additional conversion, use mb.import.raw()/mb.export.raw().
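To make the escaping rules above concrete, here is a rough sketch in Python (not Baan 3GL, and not the actual mb.export$() implementation). The function name export_escape is hypothetical. It assumes control codes are written with a caret as in the "^A" example, that the caret itself becomes "\^" as in the example quoted above, and that hex digits are lower case. It only covers one-byte codes.

```python
def export_escape(data: bytes) -> str:
    """Illustrate the export escaping described in the notes (one-byte codes only)."""
    out = []
    for code in data:
        if code == 0x5C:
            # Backslash is doubled: "\" -> "\\"
            out.append("\\\\")
        elif code == 0x5E:
            # Caret: follows the "\^" example from the notes (assumption)
            out.append("\\^")
        elif code <= 0x1F:
            # Control code: caret plus the character at code + 0x40, e.g. 0x01 -> "^A"
            out.append("^" + chr(code + 0x40))
        elif 0x80 <= code <= 0x9F:
            # LDC/CF range: backslash plus the hex value, e.g. 0x80 -> "\0x80"
            out.append("\\0x%02x" % code)
        else:
            # Everything else passes through unchanged in this sketch
            out.append(chr(code))
    return "".join(out)


# One input byte can become two or five output characters,
# which is why the exported length may differ from the original.
sample = export_escape(b"A\x01B")
```

This shows why a buffer sized for the original string can be too small for the exported one: here three input bytes produce four output characters, and a single 0x80 would produce five.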
7.7.1 Take care of data type
When you convert a TSS-encoded string to a Native/UTF-8 string, the TSS-encoded string has to be stored in a 'STRING MB' variable, but the result (the Native/UTF-8 encoded string) has to be stored in a plain 'STRING' variable.
Otherwise, seq.puts() and similar functions will output corrupted data.

For example,
function STRING to.utf8( string tss.str(4000) mb )
{
string utf8str(4000*4) |* one mb char is converted to 3 or 4 bytes.
utf8.export( utf8str, tss.str )
return (utf8str)
}

string tss1( 200) mb
string utf1(200*4)

utf1 = to.utf8( tss1)

seq.puts( utf1, fp)
...
Besides, take care not to call string-manipulation functions on the data after converting it to UTF-8. In other words, convert to Native/UTF-8 just before output.
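The reason for converting last can be sketched in Python (again only an illustration, not Baan code; the sample value and the helper is_valid_utf8 are made up). Byte-oriented operations such as truncation can cut a multi-byte character in half once the data is already UTF-8:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "価格: 100"  # hypothetical field value; each kanji takes 3 bytes in UTF-8

# Safe: truncate while the data is still a character string, then encode.
safe = text[:2].encode("utf-8")

# Unsafe: encode first, then cut the byte string at a fixed length.
# Four bytes is one full character plus the first byte of the next one.
unsafe = text.encode("utf-8")[:4]

print(is_valid_utf8(safe), is_valid_utf8(unsafe))
```

The same reasoning applies to any function that works on byte positions, which is why the conversion belongs immediately before the output call.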