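    TCHAR, WCHAR, LPSTR, LPWSTR and LPCTSTR look alike and are easy to confuse. All of them are closely tied to characters, so to understand where they come from and what they are for, you first need to understand character encodings.

    0. Character encodings

    Windows generally talks about two kinds of encoding: "ANSI" and "Unicode". Both show up in Notepad's Save As dialog. On Windows, "ANSI" means the system's current code page (GB2312/GBK on a Simplified Chinese system, for example), and "Unicode" means UTF-16; UTF-8 is another encoding of the same Unicode character set. C/C++ supports both: ANSI characters use char (1 byte), and Unicode characters use wchar_t (2 bytes with Visual C++). A literal wrapped in "" is an ANSI string; prefix it with L, as in L"", and it becomes a Unicode string. For example:
    "ansi string";     // ANSI string
    L"unicode string"; // Unicode string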




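    With that background, the rest is straightforward.

    1. TCHAR

    TCHAR is not a distinct data type; it is only a type alias (a typedef). ANSI and Unicode characters differ in size (1 byte versus 2 bytes), and when writing code we may not know which encoding the program will be built for, so the Microsoft C runtime headers use the macro _UNICODE to select the character set. When _UNICODE is defined, the Unicode types are used; otherwise the ANSI types are. Here is the (simplified) definition of TCHAR: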

    #ifdef _UNICODE
    typedef wchar_t TCHAR;
    #else
    typedef char TCHAR;
    #endif



    That is all TCHAR is. In an ANSI build, TCHAR is char (one byte); in a Unicode build it is wchar_t (two bytes). So you can declare characters as TCHAR and let the build settings decide the width.
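    The selection mechanism can be sketched portably as follows. This is only an illustration: SKETCH_UNICODE and sketch_tchar are made-up names, used so the snippet compiles without the Windows headers; the real names are _UNICODE and TCHAR.

```cpp
#include <type_traits>

// Hypothetical stand-in for _UNICODE: define SKETCH_UNICODE to get the wide build.
#ifdef SKETCH_UNICODE
typedef wchar_t sketch_tchar;   // wide ("Unicode") build
#else
typedef char sketch_tchar;      // narrow ("ANSI") build
#endif

// True when the alias resolved to wchar_t.
constexpr bool kWideBuild = std::is_same<sketch_tchar, wchar_t>::value;

// The alias is always one of the two character types, never a third type.
static_assert(std::is_same<sketch_tchar, char>::value || kWideBuild,
              "sketch_tchar must be char or wchar_t");
```

    Compiling the same file with and without -DSKETCH_UNICODE flips kWideBuild without touching any code that uses sketch_tchar.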

    2. WCHAR

    WCHAR is simply wchar_t, and likewise CHAR is char; the uppercase names presumably exist for consistency with the other Windows type names.

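    3. LPSTR, LPWSTR and LPCTSTR

    These are all abbreviations, which can be read as follows:
    • L - Long
    • P - Pointer (here, a pointer to the first character of a string)
    • C - Constant
    • W - Wide (that is, WCHAR)
    • T - TCHAR (described above)
    • STR - String
    So:
    • LPSTR = long pointer to a string = char *
    • LPWSTR = long pointer to a Unicode string = wchar_t *
    • LPCTSTR = long pointer to a constant Unicode or ANSI string = const TCHAR *
    You may wonder why they are called long pointers when nothing about them looks long. It is a leftover from 16-bit Windows, which distinguished short (near) pointers from long (far) ones. 32-bit Windows has no short pointers, but the names were kept. Other similar names, such as LPCSTR and LPCWSTR, can be decoded the same way.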
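    The naming scheme can be written out as a set of typedefs. The declarations below are a sketch placed in a hypothetical namespace so they compile anywhere; the real typedefs come from the Windows headers, and TCHAR is assumed here to be the ANSI flavour.

```cpp
#include <type_traits>

namespace sketch {
// Hypothetical re-declarations for illustration only.
typedef char         CHAR;
typedef wchar_t      WCHAR;
typedef CHAR         TCHAR;     // assumes an ANSI build; a Unicode build would use WCHAR
typedef CHAR*        LPSTR;     // pointer to a string
typedef const CHAR*  LPCSTR;    // pointer to a constant string
typedef WCHAR*       LPWSTR;    // pointer to a wide string
typedef const WCHAR* LPCWSTR;   // pointer to a constant wide string
typedef TCHAR*       LPTSTR;    // pointer to a TCHAR string
typedef const TCHAR* LPCTSTR;   // pointer to a constant TCHAR string
}

static_assert(std::is_same<sketch::LPCWSTR, const wchar_t*>::value,
              "C + W + STR: const wchar_t*");
static_assert(std::is_same<sketch::LPCTSTR, const char*>::value,
              "in this ANSI-flavoured sketch, LPCTSTR is const char*");
```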

    Reposted from: https://my.oschina.net/superpdm/blog/363226

    What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?

    By Ajay Vijayvargiya, 19 Apr 2012

    Many C++ Windows programmers get confused over what bizarre identifiers like TCHAR and LPCTSTR are. In this article, I will do my best to clear the fog.

    In general, a character can be represented in 1 byte or 2 bytes. Let's say 1-byte character is ANSI character - all English characters are represented through this encoding. And let's say a 2-byte character is Unicode, which can represent ALL languages in the world.

    The Visual C++ compiler supports char and wchar_t as native data types for ANSI and Unicode characters, respectively. Unicode has a more precise definition, but for now assume it is the two-byte character that Windows uses for multi-language support.

    There is more to Unicode than the 2-byte character representation Windows uses. Microsoft Windows uses the UTF-16 character encoding, in which some characters occupy two 16-bit units.

    What if you want your C/C++ code to be independent of character encoding/mode used?

    Suggestion: Use generic data-types and names to represent characters and string.

    For example, instead of replacing:

    char cResponse; // 'Y' or 'N'
    char sUsername[64];
    // str* functions

    with

    wchar_t cResponse; // 'Y' or 'N'
    wchar_t sUsername[64];
    // wcs* functions

    In order to support multi-lingual text (i.e., Unicode) in your application, you can simply code it in a more generic manner:

    #include<TCHAR.H> // Implicit or explicit include
    TCHAR cResponse; // 'Y' or 'N'
    TCHAR sUsername[64];
    // _tcs* functions

    The following project setting in General page describes which Character Set is to be used for compilation: (General -> Character Set)

    This way, when your project is being compiled as Unicode, the TCHAR would translate to wchar_t. If it is being compiled as ANSI/MBCS, it would be translated to char. You are free to use char and wchar_t, and project settings will not affect any direct use of these keywords.

    TCHAR is defined as:

    #ifdef _UNICODE
    typedef wchar_t TCHAR;
    #else
    typedef char TCHAR;
    #endif

    The macro _UNICODE is defined when you set Character Set to "Use Unicode Character Set", and therefore TCHAR would mean wchar_t. When Character Set is set to "Use Multi-Byte Character Set", TCHAR would mean char.

    Likewise, to support multiple character sets (and possibly multiple languages) from a single code base, use the generic functions (macros). Instead of strcpy, strlen, strcat (including the secure versions suffixed with _s), or wcscpy, wcslen, wcscat (including the secure versions), you should use _tcscpy, _tcslen, _tcscat.

    As you know strlen is prototyped as:

    size_t strlen(const char*);

    And, wcslen is prototyped as:

    size_t wcslen(const wchar_t* );

    You may better use _tcslen, which is logically prototyped as:

    size_t _tcslen(const TCHAR* );

    WC is for Wide Character, so wcs stands for wide-character string. In the same way, _tcs means TCHAR string, where TCHAR is logically either char or wchar_t.

    But, in reality, _tcslen (and other _tcs functions) are actually not functions, but macros. They are defined simply as:

    #ifdef _UNICODE
    #define _tcslen wcslen 
    #else
    #define _tcslen strlen
    #endif

    You should refer to TCHAR.H to look up more macro definitions like this.

    You might ask why they are defined as macros, and not implemented as functions instead? The reason is simple: A library or DLL may export a single function, with same name and prototype (Ignore overloading concept of C++). For instance, when you export a function as:

    void _TPrintChar(char);

    How is the client supposed to call it as:

    void _TPrintChar(wchar_t);

    _TPrintChar cannot be magically converted into function taking 2-byte character. There has to be two separate functions:

    void PrintCharA(char); // A = ANSI 
    void PrintCharW(wchar_t); // W = Wide character

    And a simple macro, as defined below, would hide the difference:

    #ifdef _UNICODE
    #define _TPrintChar PrintCharW
    #else
    #define _TPrintChar PrintCharA
    #endif

    The client would simply call it as:

    TCHAR cChar;
    _TPrintChar(cChar);

    Note that both TCHAR and _TPrintChar would map to either Unicode or ANSI, and therefore cChar and the argument to function would be either char or wchar_t.
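    The A/W pair plus a selecting macro can be sketched without any Windows headers. Everything named here (DescribeCharA, DescribeCharW, DescribeChar, SKETCH_UNICODE, sketch_tchar) is hypothetical and exists only to show the pattern.

```cpp
#include <string>

// Hypothetical library functions: one per character width.
std::string DescribeCharA(char c) { return std::string("A:") + c; }
std::string DescribeCharW(wchar_t c) {
    return "W:" + std::to_string(static_cast<unsigned long>(c));
}

// SKETCH_UNICODE is a made-up stand-in for _UNICODE.
#ifdef SKETCH_UNICODE
typedef wchar_t sketch_tchar;
#define DescribeChar DescribeCharW
#else
typedef char sketch_tchar;
#define DescribeChar DescribeCharA
#endif

// Caller code names neither suffix; the macro picks the function at compile time.
std::string DescribeY() { return DescribeChar(static_cast<sketch_tchar>('Y')); }
```

    The library exports only the two suffixed functions; the generic name never exists as a symbol, which is why it has to be a macro.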

    Macros avoid these complications and let us use either the ANSI or the Unicode function for characters and strings. Most Windows functions that take a string or a character are implemented this way, so the programmer only has to remember one name (a macro!). SetWindowText is one example:

    // WinUser.H
    #ifdef UNICODE
    #define SetWindowText  SetWindowTextW
    #else
    #define SetWindowText  SetWindowTextA
    #endif // !UNICODE

    There are very few functions that do not have macros, and are available only with suffixed W or A. One example is ReadDirectoryChangesW, which doesn't have ANSI equivalent.


    You all know that we use double quotation marks to represent strings. The string represented in this manner is ANSI-string, having 1-byte each character. Example:

    "This is ANSI String. Each letter takes 1 byte."

    The string given above is not Unicode, and so does not qualify for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:

    L"This is Unicode string. Each letter would take 2 bytes, including spaces."

    Note the L at the beginning of string, which makes it a Unicode string. All characters (I repeat all characters) would take two bytes, including all English letters, spaces, digits, and the null character. Therefore, length of Unicode string would always be in multiple of 2-bytes. A Unicode string of length 7 characters would need 14 bytes, and so on. Unicode string taking 15 bytes, for example, would not be valid in any context.

    In general, string would be in multiple of sizeof(TCHAR) bytes!
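    The byte arithmetic can be checked at compile time. This sketch uses char16_t (the u"..." prefix) as a portable stand-in for the 2-byte wchar_t of Visual C++, because wchar_t is wider on some other platforms; literal_bytes is a helper invented for this example.

```cpp
#include <cstddef>

// Bytes occupied by a string literal, terminator included.
template <typename Ch, std::size_t N>
constexpr std::size_t literal_bytes(const Ch (&)[N]) { return N * sizeof(Ch); }

// Six letters plus the null terminator.
static_assert(literal_bytes("Saturn") == 7, "1 byte per character");
static_assert(literal_bytes(u"Saturn") == 14, "2 bytes per character");
```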

    When you need to express hard-coded string, you can use:

    "ANSI String"; // ANSI
    L"Unicode String"; // Unicode
    
    _T("Either string, depending on compilation"); // ANSI or Unicode
    // or use TEXT macro, if you need more readability

    The non-prefixed string is ANSI string, the L prefixed string is Unicode, and string specified in _T or TEXT would be either, depending on compilation. Again, _T and TEXT are nothing but macros, and are defined as:

    // SIMPLIFIED
    #ifdef _UNICODE 
     #define _T(c) L##c
     #define TEXT(c) L##c
    #else 
     #define _T(c) c
     #define TEXT(c) c
    #endif

    The ## symbol is token pasting operator, which would turn _T("Unicode") into L"Unicode", where the string passed is argument to macro - If _UNICODE is defined. If _UNICODE is not defined, _T("Unicode") would simply mean "Unicode". The token pasting operator did exist even in C language, and is not specific about VC++ or character encoding.

    Note that these macros can be used for strings as well as characters. _T('R') would turn into L'R' or simple 'R' - former is Unicode character, latter is ANSI character.
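    A portable sketch of the token-pasting trick, for both string and character literals. SKETCH_UNICODE, SKETCH_T and sketch_tchar are hypothetical stand-ins for _UNICODE, _T and TCHAR, chosen so the snippet builds without TCHAR.H.

```cpp
#include <cstddef>

#ifdef SKETCH_UNICODE
typedef wchar_t sketch_tchar;
#define SKETCH_T(x) L##x   // pastes L onto the literal token
#else
typedef char sketch_tchar;
#define SKETCH_T(x) x      // leaves the literal untouched
#endif

const sketch_tchar kName[] = SKETCH_T("Saturn");  // "Saturn" or L"Saturn"
const sketch_tchar kInitial = SKETCH_T('S');      // 'S' or L'S'

// Number of characters in the array, terminator included.
constexpr std::size_t kNameChars = sizeof(kName) / sizeof(kName[0]);
```

    The character count stays the same in both builds; only the byte size of kName changes.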

    No, you cannot use these macros to convert variables (string or character) into Unicode/non-Unicode text. Following is not valid:

    char c = 'C';
    char str[16] = "CodeProject";
    
    _T(c);
    _T(str);

    The last two lines compile successfully in an ANSI (Multi-Byte) build, since _T(x) is simply x, and therefore _T(c) and _T(str) come out as c and str, respectively. But when you build with the Unicode character set, compilation fails:

    error C2065: 'Lc' : undeclared identifier
    error C2065: 'Lstr' : undeclared identifier

    I would not like to insult your intelligence by describing why and what those errors are.

    There exist set of conversion routine to convert MBCS to Unicode and vice versa, which I would explain soon.

    It is important to note that almost all functions that take a string (or character), primarily in the Windows API, have a generalized prototype in MSDN and elsewhere. The function SetWindowTextA/W, for instance, is documented as:

    BOOL SetWindowText(HWND, const TCHAR*);

    But, as you know, SetWindowText is just a macro, and depending on your build settings, it would mean either of following:

    BOOL SetWindowTextA(HWND, const char*);
    BOOL SetWindowTextW(HWND, const wchar_t*);

    Therefore, don't be puzzled if following call fails to get address of this function!

    HMODULE hDLLHandle;
    FARPROC pFuncPtr;
    
    hDLLHandle = LoadLibrary(TEXT("user32.dll")); // TEXT keeps this valid in both ANSI and Unicode builds
    
    pFuncPtr = GetProcAddress(hDLLHandle, "SetWindowText");
    //pFuncPtr will be null, since there doesn't exist any function with name SetWindowText !

    From User32.DLL, the two functions SetWindowTextA and SetWindowTextW are exported, not the function with generalized name.

    Interestingly, .NET Framework is smart enough to locate function from DLL with generalized name:

    [DllImport("user32.dll")]
    extern public static int SetWindowText(IntPtr hWnd, string lpString);

    No rocket science, just bunch of ifs and else around GetProcAddress!

    All of the functions that have ANSI and Unicode versions, would have actual implementation only in Unicode version. That means, when you call SetWindowTextA from your code, passing an ANSI string - it would convert the ANSI string to Unicode text and then would call SetWindowTextW. The actual work (setting the window text/title/caption) will be performed by Unicode version only!

    Take another example, which would retrieve the window text, using GetWindowText. You call GetWindowTextA, passing ANSI buffer as target buffer. GetWindowTextA would first call GetWindowTextW, probably allocating a Unicode string (a wchar_t array) for it. Then it would convert that Unicode stuff, for you, into ANSI string.
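    The "narrow version converts, then calls the wide version" pattern can be sketched as below. SetTitleA, SetTitleW and g_title are hypothetical names, and the byte-by-byte widening assumes plain ASCII input; Windows itself converts through the active code page.

```cpp
#include <string>

// Hypothetical window title storage; stands in for state owned by the system.
static std::wstring g_title;

// The real work happens only in the wide version.
bool SetTitleW(const wchar_t* text) {
    if (!text) return false;
    g_title = text;
    return true;
}

// The narrow entry point only converts and forwards.
// Assumption: input is ASCII, so widening each byte is enough for this sketch.
bool SetTitleA(const char* text) {
    if (!text) return false;
    std::wstring wide;
    for (const char* p = text; *p; ++p)
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
    return SetTitleW(wide.c_str());
}
```

    Calling the wide function directly skips the temporary string and the conversion loop, which is the cost the article is pointing at.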

    This ANSI to Unicode and vice-versa conversion is not limited to GUI functions, but entire set of Windows API, which do take strings and have two variants. Few examples could be:

    • CreateProcess
    • GetUserName
    • OpenDesktop
    • DeleteFile
    • etc

    It is therefore very much recommended to call the Unicode version directly. In turn, it means you should always target for Unicode builds, and not ANSI builds - just because you are accustomed to using ANSI string for years. Yes, you may save and retrieve ANSI strings, for example in file, or send as chat message in your messenger application. The conversion routines do exist for such needs.

    Note: There exists another typedef: WCHAR, which is equivalent to wchar_t.


    TCHAR is a type for a single character. You can of course declare an array of TCHAR. What if you would like to express a character pointer, or a const character pointer? Which one of the following?

    // ANSI characters
    void foo_ansi(char*);
    void foo_ansi(const char*);
    /*const*/ char* pStringA;

    // Unicode/wide-string
    void foo_uni(WCHAR*);
    wchar_t* foo_uni(const WCHAR*);
    /*const*/ WCHAR* pStringW;

    // Independent
    void foo_char(TCHAR*);
    void foo_char(const TCHAR*);
    /*const*/ TCHAR* pStringT;

    After reading about TCHAR, you would select the last one. There are, however, better alternatives for representing strings, and for those you just need to include Windows.h. Note: Windows.h already defines the TCHAR type, so you need not include TCHAR.H for it; the _T and _tcs* macros still come from TCHAR.H.

    First, revisit old string functions for better understanding. You know strlen:

    size_t strlen(const char*);

    Which may be represented as:

    size_t strlen(LPCSTR);

    Where symbol LPCSTR is typedef'ed as:

    // Simplified
    typedef const char* LPCSTR;  

    The meaning goes like:

    • LP - Long Pointer
    • C - Constant
    • STR - String

    Essentially, LPCSTR would mean (Long) Pointer to a Constant String.

    Let's represent strcpy using new style type-names:

    LPSTR strcpy(LPSTR szTarget, LPCSTR szSource);

    The type of szTarget is LPSTR, without C in the type-name. It is defined as:

    typedef char* LPSTR;

    Note that the szSource is LPCSTR, since strcpy function will not modify the source buffer, hence the const attribute. The return type is non-constant-string: LPSTR.

    Alright, these str-functions are for ANSI string manipulation. But we want routines for 2-byte Unicode strings. For the same, the equivalent wide-character str-functions are provided. For example, to calculate length of wide-character (Unicode string), you would use wcslen:

    size_t nLength;
    nLength = wcslen(L"Unicode");

    The prototype of wcslen is:

    size_t wcslen(const wchar_t* szString); // Or WCHAR*

    And that can be represented as:

    size_t wcslen(LPCWSTR szString);

    Where the symbol LPCWSTR is defined as:

    typedef const WCHAR* LPCWSTR;
    // const wchar_t*

    Which can be broken down as:

    • LP - Long Pointer
    • C - Constant
    • WSTR - Wide character String

    Similarly, strcpy equivalent is wcscpy, for Unicode strings:

    wchar_t* wcscpy(wchar_t* szTarget, const wchar_t* szSource)

    Which can be represented as:

    LPWSTR wcscpy(LPWSTR szTarget, LPCWSTR szSource);

    Where the target is non-constant wide-string (LPWSTR), and source is constant-wide-string.

    There exist set of equivalent wcs-functions for str-functions. The str-functions would be used for plain ANSI strings, and wcs-functions would be used for Unicode strings.

    That said, I have already advised using the native Unicode functions instead of the ANSI-only or TCHAR-synthesized ones. The reason is simple: your application should be Unicode only, and you should not even care about code portability for ANSI builds. But for the sake of completeness, I am mentioning these generic mappings.

    To calculate length of string, you may use _tcslen function (a macro). In general, it is prototyped as:

    size_t _tcslen(const TCHAR* szString); 

    Or, as:

    size_t _tcslen(LPCTSTR szString);

    Where the type-name LPCTSTR can be classified as:

    • LP - Long Pointer
    • C - Constant
    • T - TCHAR
    • STR - String

    Depending on the project settings, LPCTSTR would be mapped to either LPCSTR (ANSI) or LPCWSTR (Unicode).

    Note: strlen, wcslen or _tcslen will return number of characters in string, not the number of bytes.
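    A short sketch of the difference between character count and byte count, using only the standard C library functions named above. The three helper function names are invented for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Character counts, terminator excluded.
std::size_t narrow_chars() { return std::strlen("Saturn"); }
std::size_t wide_chars()   { return std::wcslen(L"Saturn"); }

// Byte count of the wide string's characters, terminator excluded.
std::size_t wide_bytes()   { return std::wcslen(L"Saturn") * sizeof(wchar_t); }
```

    Both length functions report 6 for "Saturn"; to get bytes you have to multiply by the character size yourself.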

    The generalized string-copy routine _tcscpy is defined as:

    TCHAR* _tcscpy(TCHAR* pTarget, const TCHAR* pSource);

    Or, in more generalized form, as:

    LPTSTR _tcscpy(LPTSTR pTarget, LPCTSTR pSource);

    You can deduce the meaning of LPTSTR!

    Usage Examples

    First, a broken code:

    int main()
    {
        TCHAR name[] = "Saturn";
        int nLen; // Or size_t
    
        nLen = strlen(name);
    }

    On ANSI build, this code will successfully compile since TCHAR would be char, and hence name would be an array of char. Calling strlen against name variable would also work flawlessly.

    Alright. Let's compile the same with UNICODE/_UNICODE defined (i.e. "Use Unicode Character Set" in project settings). Now the compiler reports a set of errors:

    • error C2440: 'initializing' : cannot convert from 'const char [7]' to 'TCHAR []'
    • error C2664: 'strlen' : cannot convert parameter 1 from 'TCHAR []' to 'const char *'

    And the programmers would start committing mistakes by correcting it this way (first error):

    TCHAR name[] = (TCHAR*)"Saturn";

    Which will not pacify the compiler, since the conversion is not possible from TCHAR* to TCHAR[7]. The same error would also come when native ANSI string is passed to a Unicode function:

    nLen = wcslen("Saturn");
    // ERROR: cannot convert parameter 1 from 'const char [7]' to 'const wchar_t *'

    Unfortunately (or fortunately), this error can be incorrectly corrected by simple C-style typecast:

    nLen = wcslen((const wchar_t*)"Saturn");

    And you'd think you've attained one more experience level in pointers! You are wrong - the code would give incorrect result, and in most cases would simply cause Access Violation. Typecasting this way is like passing a float variable where a structure of 80 bytes is expected (logically).

    The string "Saturn" is sequence of 7 bytes:

    'S' (83)'a' (97)'t' (116)'u' (117)'r' (114)'n' (110)'\0' (0) 

    But when you pass the same bytes to wcslen, it treats each pair of bytes as a single character. On a little-endian machine, the first two bytes [83, 97] are therefore read as one character with value 24915 (97<<8 | 83), which is the CJK ideograph U+6153. The next character is formed from [116, 117], and so on.

    For sure, you didn't pass those set of Chinese characters, but improper typecasting has done it! Therefore it is very essential to know that type-casting will not work! So, for the first line of initialization, you must do:
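    The misreading can be reproduced safely by pairing the bytes by hand instead of casting the pointer. This is a sketch: as_utf16le_units is a helper invented here, and it assumes the little-endian byte order that Windows uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pairs up bytes the way a little-endian 16-bit reader would see them.
// A trailing odd byte is dropped because it cannot form a full unit.
std::vector<std::uint16_t> as_utf16le_units(const char* bytes, std::size_t n) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < n; i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}
```

    For the 7 bytes of "Saturn" this yields three units, 24915, 30068 and 28274, and none of them is zero. A real wcslen would therefore keep reading past the end of the literal looking for a terminator, which is where the access violation comes from.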

    TCHAR name[] = _T("Saturn");

    Which would translate to 7-bytes or 14-bytes, depending on compilation. The call to wcslen should be:

    wcslen(L"Saturn");

    In the sample program code given above, I used strlen, which causes an error when building in Unicode. The non-working solution is a C-style typecast:

    nLen = strlen((const char*)name);

    On Unicode build, name would be of 14-bytes (7 Unicode characters, including null). Since string "Saturn" contains only English letters, which can be represented using original ASCII, the Unicode letter 'S' would be represented as [83, 0]. Other ASCII characters would be represented with a zero next to them. Note that 'S' is now represented as 2-byte value 83. The end of string would be represented by two bytes having value 0.

    So, when you pass such string to strlen, the first character (i.e. first byte) would be correct ('S' in case of "Saturn"). But the second character/byte would indicate end of string. Therefore, strlen would return incorrect value 1 as the length of string.
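    The opposite mistake can be simulated the same way: lay the text out as UTF-16 bytes and hand them to strlen. The function name strlen_on_utf16le is made up for this sketch, and the little-endian layout is written out explicitly so the result does not depend on the host machine.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Lays the text out as little-endian UTF-16 bytes, then measures it with strlen.
std::size_t strlen_on_utf16le(const std::u16string& text) {
    std::vector<char> bytes;
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    }
    bytes.push_back('\0');  // two-byte terminator
    bytes.push_back('\0');
    return std::strlen(bytes.data());
}
```

    For u"Saturn" the second byte is already zero, so the reported length is 1.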

    Since a Unicode string may also contain non-English characters, the result of strlen is then even less predictable.

    In short, typecasting will not work. You either need to represent strings in correct form itself, or use ANSI to Unicode, and vice-versa, routines for conversions.

    (There is more to add from this location, stay tuned!)


    Now, I hope you understand the following signatures:

    BOOL SetCurrentDirectory( LPCTSTR lpPathName );
    DWORD GetCurrentDirectory(DWORD nBufferLength,LPTSTR lpBuffer);

    Continuing. You must have seen functions/methods that ask for a number of characters, or return one. GetCurrentDirectory is one of them: you pass the buffer size in characters, not in bytes. For example:

    TCHAR sCurrentDir[255];
     
    // Pass 255 and not 255*2 
    GetCurrentDirectory(255, sCurrentDir); // buffer length first, then the buffer
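    One way to avoid mixing up the two numbers is to derive the character count from the array itself. The helper below is a sketch written for this example (similar in spirit to MSVC's _countof), and char16_t stands in for a 2-byte TCHAR in a Unicode build.

```cpp
#include <cstddef>

// Number of elements in a fixed-size array.
template <typename T, std::size_t N>
constexpr std::size_t count_of(const T (&)[N]) { return N; }

// char16_t stands in for a 2-byte TCHAR in a Unicode build.
char16_t g_currentDir[255];

static_assert(count_of(g_currentDir) == 255, "characters: what the API wants");
static_assert(sizeof(g_currentDir) == 510, "bytes: what sizeof reports");
```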

    On the other hand, if you need to allocate room for a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new:

    LPTSTR pBuffer; // TCHAR* 
    
    pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.

    But if you use memory allocation functions like malloc, LocalAlloc, GlobalAlloc, etc; you must specify the number of bytes!

    pBuffer = (TCHAR*) malloc (128 * sizeof(TCHAR) );

    Typecasting the return value is required, as you know. The expression in malloc's argument ensures that it allocates desired number of bytes - and makes up room for desired number of characters.
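    The same rule can be wrapped in a small helper so the multiplication is never forgotten. This is a sketch under stated assumptions: bytes_for, FreeDeleter and alloc_chars are names invented here, and char16_t again stands in for a 2-byte TCHAR.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Bytes needed to hold `count` characters of type Ch.
template <typename Ch>
constexpr std::size_t bytes_for(std::size_t count) { return count * sizeof(Ch); }

// Releases a malloc'ed buffer with free().
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};

// Allocates room for `count` characters; the pointer is empty if malloc fails.
template <typename Ch>
std::unique_ptr<Ch, FreeDeleter> alloc_chars(std::size_t count) {
    return std::unique_ptr<Ch, FreeDeleter>(
        static_cast<Ch*>(std::malloc(bytes_for<Ch>(count))));
}
```

    A call such as alloc_chars<char16_t>(128) requests 256 bytes, and the buffer is freed automatically when the returned pointer goes out of scope.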

    Original article: http://www.codeproject.com/Articles/76252/What-are-TCHAR-WCHAR-LPSTR-LPWSTR-LPCTSTR-etc