I asked a similar question here: "Delphi XE – should I use String or AnsiString?". After deciding that ANSI strings are the right choice for a (large) library of mine, I realized that I could actually use RawByteString instead of AnsiString. Because I mix Unicode strings with ANSI strings, my code now has quite a few places where it converts between them. It looks like using RawByteString would get rid of those conversions.
Please let me know your opinion about it.
Thanks.
Update:
This is disappointing: the compiler still performs a conversion from RawByteString to string.
procedure TForm1.FormCreate(Sender: TObject);
var
  x1, x2: RawByteString;
  s: string;
begin
  x1 := 'a';
  x2 := 'b';
  x1 := x1 + x2;
  s := x1; { <------- Implicit string cast from 'RawByteString' to 'string' }
end;
I suspect it still does some internal work (such as copying data), so my code will not be much faster, and I will still have to add lots of typecasts to silence the compiler.
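A minimal console sketch of what happens in the code above, assuming Delphi 2009 or later (such as XE); the program and variable names are illustrative. Assigning a UTF8String to a RawByteString keeps its code page without converting anything, and an explicit string(...) cast silences the implicit-cast warning, but the UTF-8 to UTF-16 conversion still runs:

```delphi
program RawByteStringCast;

{$APPTYPE CONSOLE}

var
  U8: UTF8String;
  Raw: RawByteString;
  S: string;
begin
  // Explicit cast: encodes the UTF-16 text 'caf' + U+00E9 as UTF-8 (code page 65001)
  U8 := UTF8String('caf' + Char($E9));

  // UTF8String -> RawByteString: the reference is shared and the source
  // code page travels with it, so no conversion happens here
  Raw := U8;
  Writeln('Code page: ', StringCodePage(Raw)); // 65001

  // The explicit cast only silences the compiler warning; the RTL still
  // decodes the 5 UTF-8 bytes into a new 4-character UTF-16 string
  S := string(Raw);
  Writeln(Length(Raw), ' bytes -> ', Length(S), ' chars');
end.
```

In other words, typecasts make the conversion visible in the source, but they do not remove its cost.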
Best Answer
RawByteString is an AnsiString with no code page set by default. When you assign another string to this RawByteString variable, you'll copy the code page of the source string, and any conversion to or from string (i.e. UnicodeString) will still take place. Sorry.
But there is another use of RawByteString, which is to store plain byte content (e.g. the content of a database BLOB field, just like an array of byte).
To summarize:
- RawByteString should be used as a "code page agnostic" parameter to a method or function;
- RawByteString can be used as a variable type to store some BLOB data.
If you want to reduce conversions, and would rather use 8-bit char strings in your application, you had better:
- avoid the generic AnsiString type, which depends on the current system code page, and by which you'll lose data;
- … UnicodeString;
That is exactly what we did for our framework. We wanted to use UTF-8 in its kernel because:
- …
- WideString was not an option, because it's dead slow and you've got the same problem of implicit conversions.
But, in order to achieve the best speed, we wrote some optimized functions to handle our custom string type.
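The framework's actual declarations are not reproduced here. As a rough sketch of the approach only, assuming Delphi 2009 or later, with a hypothetical type name TUtf8Text and hypothetical helpers (not the framework's API): a code-page-typed AnsiString holds the UTF-8 bytes, every UTF-16 round trip is an explicit cast, and simple operations such as counting characters or searching run directly on the bytes without building a temporary UnicodeString.

```delphi
program Utf8TextSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical custom type: an AnsiString whose payload is always UTF-8
  // (65001 is the UTF-8 code page, CP_UTF8)
  TUtf8Text = type AnsiString(65001);

// Explicit conversions: the only places where UTF-16 <-> UTF-8 work happens
function ToUtf8Text(const S: string): TUtf8Text;
begin
  Result := TUtf8Text(S);   // encodes UTF-16 into UTF-8
end;

function FromUtf8Text(const U: TUtf8Text): string;
begin
  Result := string(U);      // decodes UTF-8 into UTF-16
end;

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx)
function Utf8CharCount(const U: TUtf8Text): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(U) do
    if (Byte(U[I]) and $C0) <> $80 then
      Inc(Result);
end;

// Naive byte-wise substring search on UTF-8 data.
// Returns a 1-based byte index, or 0 when SubStr is not found.
function Utf8Pos(const SubStr, S: TUtf8Text): Integer;
var
  I, J: Integer;
begin
  Result := 0;
  if SubStr = '' then
    Exit;
  for I := 1 to Length(S) - Length(SubStr) + 1 do
  begin
    J := 1;
    while (J <= Length(SubStr)) and (S[I + J - 1] = SubStr[J]) do
      Inc(J);
    if J > Length(SubStr) then
      Exit(I);
  end;
end;

var
  U: TUtf8Text;
begin
  U := ToUtf8Text('na' + Char($EF) + 've text'); // 'naive text' with U+00EF
  Writeln(Length(U), ' bytes, ', Utf8CharCount(U), ' chars'); // 11 bytes, 10 chars
  Writeln(Utf8Pos(ToUtf8Text('text'), U));                    // 8
  Writeln(FromUtf8Text(U) = 'na' + Char($EF) + 've text');    // TRUE
end.
```

A byte-wise search is safe on UTF-8 because a lead byte can never be mistaken for a continuation byte, so a match of a valid UTF-8 needle always starts on a character boundary; note that the returned index is a byte offset, not a character index.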
And we reserved the RawByteString type for handling BLOB data.
Source code is available in our repository. In this unit, UTF-8 related functions were deeply optimized, with both Pascal and asm versions for better speed. We sometimes overloaded default functions (like Pos) to avoid conversion, or … More information about how we handled text in the framework is available here.
Last word:
If you are sure that you will only have 7-bit content in your application (no accented characters), you may use the default AnsiString type in your program. But in this case, you had better add the AnsiStrings unit to your uses clause, to get overloaded string functions that avoid most unwanted conversions.
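A small sketch of that last suggestion, assuming Delphi XE, where the unit is named AnsiStrings (System.AnsiStrings in XE2 and later). The calls are unit-qualified here so that the AnsiString versions are picked explicitly and the data never goes through UnicodeString:

```delphi
program AnsiStringsSketch;

{$APPTYPE CONSOLE}

uses
  AnsiStrings; // System.AnsiStrings in Delphi XE2 and later

var
  A, Upper, Lower: AnsiString;
begin
  A := 'Hello World'; // 7-bit content only
  // AnsiString in, AnsiString out: no implicit cast to UnicodeString
  Upper := AnsiStrings.UpperCase(A);
  Lower := AnsiStrings.LowerCase(A);
  Writeln(Upper, ' / ', Lower);
end.
```

With AnsiStrings in the uses clause, the unqualified names may also resolve to these overloads, but qualifying them removes any doubt about which version the compiler chose.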