How do the new string types work in Delphi 2009/2010?

I have to convert a large legacy application to Delphi 2009. It uses strings, AnsiStrings, WideStrings and UTF-8 data all over the place, and I have a hard time understanding how the new string types work and how they should be used.

The application fully supported Unicode using TntUnicodeControls and there are 3rd party DLLs which require strings in specific encodings, mostly UTF8 and UTF16, making the conversion task not as trivial as one would suspect.

I especially have problems with the C DLL calls and choosing the right type. I also get the impression that many implicit string conversions are happening, because one of the DLLs always seems to receive UTF-8 encoded strings, no matter how the Delphi string is encoded.

Can someone please provide a short overview of the new Delphi 2009 string types UnicodeString and RawByteString, perhaps with some usage hints and possible pitfalls when converting a pre-2009 application?


Watch my CodeRage 4 talk on "Using Unicode and Other Encodings in your Programs" this Friday, or wait until the replay of it is available online.

I'm going to cover some encodings and explain about the string format.

The slides will be available shortly (I'll try to get them online today) and contain a lot of references to stuff you should read on the internet (but I must admit I forgot the link to Joel on Unicode that eed3si9n posted).

Will edit this answer today with the uploads and the links.


Edit:

If you have a small sample that shows your C/C++ DLL receiving UTF-8 encoded strings when you expected a different encoding, please post it (or mail me; almost anything at the pluimers dot com gets to me, especially if you use my first name before the at sign).

Session materials can be downloaded now, including the "Using Unicode and Other Encodings in your Programs" session.

These are links from that session:

Read these:

  • Marco Cantu, Whitepaper “Delphi and Unicode”
  • Marco Cantu, Presentation “Delphi and Unicode”
  • Nick Hodges, Whitepaper “Delphi in a Unicode World”
  • Relevant on-line help topics:
      • What's New in Delphi and C++Builder 2009
      • String Types: Base: ShortString, AnsiString, WideString, UnicodeString
      • String Types: Unicode (including internal memory layouts of the string types)
      • String Types: Enabling for Unicode
      • String Types: RawByteString (AnsiString with CodePage $ffff)
      • String Types: UTF8String (AnsiString with CodePage 65001)
      • String <-> PChar conversions: PChar fundamentals
      • String <-> PChar conversions: Returning a PChar Local Variable
      • String <-> PChar conversions: Passing a Local Variable as a PChar

Hope this gets you going. If not, mail me and I'll try to extend the answer here.
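As a rough illustration of the string types listed in the help topics above, here is a minimal sketch, assuming Delphi 2009 or later. The program and variable names are made up for this example; it only shows where conversions happen and which code page each value carries.

```delphi
program StringTypesSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  U: UnicodeString;   // UTF-16; this is what plain "string" means in Delphi 2009+
  A: AnsiString;      // system ANSI code page
  U8: UTF8String;     // AnsiString tagged with code page 65001
  R: RawByteString;   // code page $FFFF: assignment does not convert the bytes
begin
  U := 'Cant'#$00F9;        // "Cantu" with u-grave, written as a character code
  U8 := UTF8String(U);      // explicit UTF-16 -> UTF-8 conversion
  A := AnsiString(U);       // UTF-16 -> ANSI; may lose characters
  R := U8;                  // keeps the UTF-8 bytes and their code page tag

  Writeln(StringCodePage(U8));           // 65001
  Writeln(StringCodePage(R));            // 65001
  Writeln(Length(U), ' ', Length(U8));   // 5 6 (UTF-16 code units vs. UTF-8 bytes)
end.
```

Without the explicit casts, the compiler still converts between these types on assignment (and emits a warning), which is one way a DLL can end up receiving a different encoding than expected. When calling into a C DLL, PAnsiChar(U8) hands over the UTF-8 bytes and PWideChar(U) hands over the UTF-16 data.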


    See "Delphi and Unicode", a white paper written by Marco Cantù, and, I guess, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)", written by Joel Spolsky.

    One pitfall is that the default Win32 API calls are now mapped to the W (wide string) versions instead of the A (ANSI) versions; for example, ShellExecute now resolves to ShellExecuteW rather than ShellExecuteA. If your code does tricky pointer work that assumes the internal layout of AnsiString, it will break. A fallback is to substitute PChar with PAnsiChar, Char with AnsiChar, and string with AnsiString, and to append A to the Win32 API calls in that portion of the code. Once the code compiles and runs normally, you can refactor it to use string (UnicodeString).
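The fallback described above might look roughly like this. It is a sketch, not a recommendation: OpenDocumentAnsi, OpenDocument and the file name are hypothetical, and only ShellExecute/ShellExecuteA come from the ShellAPI unit.

```delphi
program AnsiFallbackSketch;

{$APPTYPE CONSOLE}

uses
  Windows, ShellAPI;

// Legacy routine pinned to the narrow types: string -> AnsiString,
// PChar -> PAnsiChar, and the API call gets an explicit "A" suffix.
procedure OpenDocumentAnsi(const FileName: AnsiString);
begin
  ShellExecuteA(0, 'open', PAnsiChar(FileName), nil, nil, SW_SHOWNORMAL);
end;

// Refactored routine: plain string is UnicodeString, PChar is PWideChar,
// and ShellExecute resolves to the W entry point.
procedure OpenDocument(const FileName: string);
begin
  ShellExecute(0, 'open', PChar(FileName), nil, nil, SW_SHOWNORMAL);
end;

begin
  OpenDocumentAnsi('readme.txt');  // hypothetical file name
  OpenDocument('readme.txt');
end.
```

Pinning a routine to the A variant keeps its old byte-oriented behaviour, so it can be converted later, one routine at a time.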


    Note that this does not only hit real string code. It also hits code where PChar is used to trawl through buffers or to interface with APIs.

    E.g. the initialization code of headers that load a DLL dynamically (GetProcAddress/LoadLibrary).
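To make the buffer case concrete, here is a small sketch, assuming Delphi 2009 or later. CountByte and the sample data are invented for illustration; the dynamic-loading part uses kernel32.dll only because it is always present.

```delphi
program BufferWalkSketch;

{$APPTYPE CONSOLE}

uses
  Windows;

// Counts how often a byte value occurs in a raw buffer.
// Older code often walked such buffers with PChar; since PChar now means
// PWideChar, Inc(P) would step two bytes at a time, so this version
// uses PByte explicitly (PAnsiChar would also step one byte).
function CountByte(Buffer: Pointer; Size: Integer; Value: Byte): Integer;
var
  P: PByte;
  I: Integer;
begin
  Result := 0;
  P := Buffer;
  for I := 1 to Size do
  begin
    if P^ = Value then
      Inc(Result);
    Inc(P);  // advances exactly one byte
  end;
end;

const
  Data: array[0..4] of Byte = (65, 0, 66, 0, 0);
var
  Lib: HMODULE;
  Proc: Pointer;
begin
  Writeln(CountByte(@Data, SizeOf(Data), 0));  // 3

  // Exported names in a DLL are narrow strings, so the name passed to
  // GetProcAddress is cast to PAnsiChar explicitly.
  Lib := LoadLibrary('kernel32.dll');
  if Lib <> 0 then
  begin
    Proc := GetProcAddress(Lib, PAnsiChar('GetTickCount'));
    if Proc <> nil then
      Writeln('GetTickCount found');
    FreeLibrary(Lib);
  end;
end.
```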
