
Programming Windows API

Data Types and Character Sets

Data Types in Win32 API

Windows does not make widespread use of the standard C/C++ data types but instead uses a collection of typedef'd data types declared through the windows.h header file. A selection of these is listed below.

BOOL – A Boolean value, either FALSE (0) or TRUE (1). Declared as typedef int BOOL.
BYTE – An 8-bit unsigned value, the same as unsigned char. Declared as typedef unsigned char BYTE.
DWORD – 32-bit unsigned integer.
INT – 32-bit signed integer. Declared as typedef int INT.
LONG – A 32-bit signed integer.
UINT – 32-bit unsigned integer. Declared as typedef unsigned int UINT.
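HANDLE – An opaque value used to identify a resource. Declared as typedef PVOID HANDLE, so it is pointer-sized: 32 bits on 32-bit Windows and 64 bits on 64-bit Windows.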
HBITMAP – Handle to a bitmap.
HBRUSH – Handle to a brush.
HCURSOR – Handle to a cursor.
HDC – A device context handle.
HFONT – Handle to a font.
HINSTANCE – Handle to the application instance.
HMENU – Handle to a menu.
HPEN – Handle to a pen.
HWND – Handle to a window.
LPCSTR – A pointer to a constant null-terminated string of 8-bit Windows (ANSI) characters.
LPCWSTR – A pointer to a constant null-terminated string of 16-bit Unicode characters.
LPCTSTR – An LPCWSTR if UNICODE is defined, an LPCSTR otherwise.
LPSTR – A pointer to a null-terminated string of 8-bit Windows (ANSI) characters.
LPWSTR – A pointer to a null-terminated string of 16-bit Unicode characters.
LPTSTR – An LPWSTR if UNICODE is defined, an LPSTR otherwise.
TCHAR – A WCHAR if UNICODE is defined, a CHAR otherwise.
LPARAM – A message parameter.
LRESULT – The signed result of message processing, returned by the window procedure. Declared as typedef LONG_PTR LRESULT, so it is pointer-sized.
WPARAM – A message parameter.

For a full list of Windows data types, see:
https://docs.microsoft.com/en-us/windows/win32/winprog/windows-data-types
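The sizes described in the list above can be checked at compile time. The following is a minimal sketch rather than a definitive test: on Windows it uses the real definitions from windows.h, while on other platforms it falls back to stand-in typedefs (an assumption made here so that the sketch builds anywhere). The helper handle_bits is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>   // real definitions on Windows
#else
// Stand-in typedefs (an assumption for non-Windows builds) that mirror
// the declarations documented for the Windows headers.
typedef int            BOOL;
typedef unsigned char  BYTE;
typedef std::uint32_t  DWORD;
typedef std::int32_t   LONG;
typedef unsigned int   UINT;
typedef void*          HANDLE;
#endif

// These checks fail the build if a type does not have the expected width.
static_assert(sizeof(BYTE) == 1, "BYTE is one byte");
static_assert(sizeof(DWORD) == 4, "DWORD is 32 bits");
static_assert(sizeof(LONG) == 4, "LONG is 32 bits");
static_assert(sizeof(HANDLE) == sizeof(void*), "HANDLE is pointer-sized");

// Width of a handle in bits on the current build target (hypothetical helper).
inline std::size_t handle_bits() {
    return sizeof(HANDLE) * 8;
}
```

On a 64-bit build handle_bits() reports 64, which is why a handle should not be stored in a 32-bit integer.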

Identifier Constants

Every Windows program features a large number of identifiers, which are constants used to represent numerical values. They are typically in uppercase and consist of a two- or three-letter prefix denoting the general category, followed by an underscore and the name of the constant. A selection of prefixes and example constants is listed below.

Prefix   Description           Examples
CS       Class style           CS_HREDRAW, CS_VREDRAW
CW       Create window         CW_USEDEFAULT
DT       Draw text             DT_CENTER, DT_LEFT, DT_RIGHT
IDI      Icon identifier       IDI_ASTERISK, IDI_ERROR, IDI_HAND
IDC      Cursor identifier     IDC_ARROW, IDC_HAND
MB       Message box options   MB_HELP, MB_OK, MB_OKCANCEL
SND      Sound option          SND_ASYNC, SND_NODEFAULT
WM       Window message        WM_NULL, WM_CREATE, WM_DESTROY
WS       Window style          WS_OVERLAPPED, WS_SYSMENU, WS_BORDER
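Constants that share a prefix are often bit flags that can be combined with the bitwise OR operator, as in CS_HREDRAW | CS_VREDRAW. The sketch below illustrates the idea under stated assumptions: the DEMO_ constants and the two helper functions are hypothetical stand-ins defined here so that the code builds without windows.h, with values chosen to mirror CS_VREDRAW (0x0001) and CS_HREDRAW (0x0002).

```cpp
#include <cstdint>

// Hypothetical stand-ins so the sketch builds without <windows.h>.
// The values mirror CS_VREDRAW and CS_HREDRAW from the Windows headers.
constexpr std::uint32_t DEMO_CS_VREDRAW = 0x0001;
constexpr std::uint32_t DEMO_CS_HREDRAW = 0x0002;

// Combine two style flags into a single value with bitwise OR.
constexpr std::uint32_t demo_class_style() {
    return DEMO_CS_HREDRAW | DEMO_CS_VREDRAW;
}

// Test whether a particular flag is present in a combined style value.
constexpr bool demo_has_flag(std::uint32_t style, std::uint32_t flag) {
    return (style & flag) == flag;
}
```

Because each flag occupies its own bit, the combined value keeps every option and any single option can be tested later with bitwise AND.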

Naming Conventions

Microsoft follows a set of naming conventions known as Hungarian notation. This convention uses short, lowercase prefixes to indicate the data type, followed by the variable name, which begins with a capital letter. Function names start with a capital letter and have no type prefix. For further reading on Microsoft coding style conventions, see:

https://docs.microsoft.com/en-us/windows/win32/stg/coding-style-conventions

Character sets

Text and numbers are encoded in a computer as patterns of binary digits known as character codes. For computers to be able to communicate, there must be an agreed standard that defines which character code is used for which character. A complete collection of characters is a character set. Two very common character sets are ASCII and Unicode.

ASCII

ASCII is a character encoding that can represent 128 characters. It uses 7 bits for each character; when a character is stored in an 8-bit byte, the most significant bit is always 0. The set contains 95 printable characters and 33 non-printable control characters.
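The figures above can be reproduced with a few lines of code. This is a minimal sketch; the function names are hypothetical, and it assumes the usual definition of printable ASCII as the range 0x20 (space) to 0x7E (tilde).

```cpp
#include <cstddef>

// A printable ASCII character lies in the range 0x20 (space) to 0x7E (~).
constexpr bool is_printable_ascii(unsigned char c) {
    return c >= 0x20 && c <= 0x7E;
}

// True when every byte in the buffer has its top bit clear (fits in 7 bits).
inline bool is_seven_bit_ascii(const char* text, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        if (static_cast<unsigned char>(text[i]) > 0x7F) return false;
    }
    return true;
}

// Count the printable codes among the 128 ASCII values.
inline int count_printable_ascii() {
    int count = 0;
    for (int code = 0; code < 128; ++code) {
        if (is_printable_ascii(static_cast<unsigned char>(code))) ++count;
    }
    return count;
}
```

count_printable_ascii() returns 95, which leaves 33 control codes out of 128.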

Extended ASCII

Although the 128 characters of standard ASCII are enough for standard English text, they cannot represent the special characters found in other languages. Extended ASCII uses eight bits per character instead of seven. Although this doubles the number of available characters to 256, it still falls far short of what is needed to support all languages, so other encodings such as Unicode are now commonly used.

UNICODE

The Unicode Standard is a universal character-encoding standard that can represent text in any combination of languages by assigning a unique number, known as a code point, to every character and symbol. A Unicode transformation format (UTF) is an algorithmic mapping of every Unicode code point to a unique byte sequence. The two most common encodings of the Unicode Standard are UTF-8 and UTF-16.

UTF-8 – A character in UTF-8 is 1 to 4 bytes long. The first 128 Unicode code points are encoded exactly as in ASCII, making UTF-8 backward compatible with it. This backward compatibility is particularly useful for older API functions. UTF-8 is the preferred encoding for e-mail and web pages.

UTF-16 – A variable-length encoding in which each character takes either two or four bytes (one or two 16-bit code units). UTF-16 is not backward compatible with ASCII. In Windows, strings are either ANSI or UTF-16LE.
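The variable-length nature of both encodings is easier to see in code. The sketch below encodes a single code point by hand; the function names are hypothetical, and it assumes the input is a valid code point (no greater than 0x10FFFF and not a surrogate), so it performs no error checking.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Assumes cp is valid: at most 0x10FFFF and not a surrogate.
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte, identical to ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 (one or two 16-bit code units).
inline std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {                    // fits in a single code unit
        out += static_cast<char16_t>(cp);
    } else {                               // needs a surrogate pair
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
    return out;
}
```

For example, the letter A (U+0041) takes one byte in UTF-8 and one code unit in UTF-16, whereas U+1F600 takes four bytes in UTF-8 and a surrogate pair in UTF-16.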

Unicode in the Windows API

Unicode has been standard in Windows since Windows NT. Windows API functions that take or return a string generally come in three forms: a version based on ANSI (suffixed “A”), a wide version (suffixed “W”) for Unicode, and a generic function name. The generic name is resolved at compile time to one of the two explicit functions by appending the single-character suffix to the root name. For instance, the root name CreateWindowEx resolves to CreateWindowExW (Unicode) when UNICODE is defined and to CreateWindowExA (ANSI) otherwise.
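The resolution of a generic name can be imitated with a small macro. The sketch below is only an illustration of the pattern: DemoGetGreetingA, DemoGetGreetingW and DemoGetGreeting are hypothetical names, not Windows API functions, and the real headers are considerably more elaborate.

```cpp
#include <string>

// Hypothetical "A" and "W" versions of the same function.
inline std::string  DemoGetGreetingA() { return "hello"; }
inline std::wstring DemoGetGreetingW() { return L"hello"; }

// The generic name maps to one explicit version, keyed on UNICODE,
// in the same way the Windows headers pick the A or W suffix.
#ifdef UNICODE
#define DemoGetGreeting DemoGetGreetingW
#else
#define DemoGetGreeting DemoGetGreetingA
#endif
```

Code that calls DemoGetGreeting() compiles against the wide version when UNICODE is defined and against the ANSI version otherwise, without any change to the calling source.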

Working with Strings

Since C++11, C++ has had four built-in character types: char, wchar_t, char16_t and char32_t (C++20 adds char8_t). The 2011 revisions of both C and C++ introduced char16_t and char32_t to hold UTF-16 and UTF-32 code units. Because the width of wchar_t is compiler-specific (16 bits with Windows compilers, typically 32 bits on Linux and macOS), any program that must be portable across compilers should avoid using wchar_t to store Unicode text.

A string literal takes the prefix L, u or U to mark it as a wchar_t, char16_t or char32_t string respectively:

const char *ascii_example = "This is an ASCII string.";
const wchar_t *Unicode_example = L"This is a wide char string.";
const char16_t *char16_example = u"This is a char16_t Unicode string.";
const char32_t *char32_example = U"This is a char32_t Unicode string.";
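The practical difference between these literal types is the size of one code unit, which affects how many units a given piece of text occupies. The following sketch counts code units in a null-terminated string of any character type; code_unit_count is a hypothetical helper built on std::char_traits.

```cpp
#include <cstddef>
#include <string>

// Count code units (not characters) in a null-terminated string
// of any character type: char, wchar_t, char16_t or char32_t.
template <typename CharT>
std::size_t code_unit_count(const CharT* text) {
    return std::char_traits<CharT>::length(text);
}
```

With this helper, code_unit_count(u"\U0001F600") yields 2 because the character needs a surrogate pair in UTF-16, while code_unit_count(U"\U0001F600") yields 1. The result for the equivalent L"..." literal depends on the compiler's wchar_t width.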

TCHAR and the TEXT MACRO

To make applications portable between Unicode and non-Unicode systems, Microsoft introduced the TCHAR type. When a developer needs to support both Unicode and earlier non-Unicode operating systems, TCHAR lets the same source compile in either environment: it maps to WCHAR when UNICODE is defined and to CHAR otherwise. To complement TCHAR, the TEXT() macro (or _T() from tchar.h) marks a string literal as Unicode or ANSI to match. For example:

const TCHAR *autostring = TEXT("This message can be either ASCII or UNICODE!");
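The mechanism behind TCHAR and TEXT() can be imitated in a few lines. The sketch below is an approximation under stated assumptions: DemoTChar, DEMO_TEXT, DemoTString and demo_message are hypothetical names defined here so that the code builds without windows.h, and they only mirror the general shape of the real definitions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins that mimic how TCHAR and TEXT() switch on UNICODE.
#ifdef UNICODE
typedef wchar_t DemoTChar;
#define DEMO_TEXT_IMPL(quote) L##quote   // paste the L prefix onto the literal
#else
typedef char DemoTChar;
#define DEMO_TEXT_IMPL(quote) quote      // leave the literal narrow
#endif
#define DEMO_TEXT(quote) DEMO_TEXT_IMPL(quote)

// A string type that follows the build setting.
typedef std::basic_string<DemoTChar> DemoTString;

// The same source line yields a narrow or a wide string.
inline DemoTString demo_message() {
    return DEMO_TEXT("Either narrow or wide");
}
```

Defining UNICODE before this code changes both the character type and the literal prefix together, which is what keeps the two consistent.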

For further reading on character encoding, see: https://docs.microsoft.com/en-us/windows/win32/learnwin32/working-with-strings

