The gen on function calling conventions.

In machine code, a function is called with a JSR or CALL instruction of some kind, and the function returns to its caller with an RTS or RET instruction of some kind. If one is writing assembly language, these mnemonics are what one writes. But when one writes a function call with a (compiled) high-level language such as C, C++, Pascal, Modula, Fortran, and so forth, this is insufficient. In high-level languages, functions have parameters and return values. Even in assembly language, programmers need to know, for example, what processor registers a function needs, and is liable to have modified upon return to the caller. (Macro assemblers, such as Borland's Turbo Assembler, provide high-level-language-like directives such as ARG for dealing with function parameters much like a high-level language.)

A function calling convention is a convention that specifies how the parameters are passed to the function when it is called, how the return value is returned to the caller, and what the processor register state is expected to be upon entry to and exit from the function. It is an interface contract between function callers and the called function.

The interface contract encompasses several aspects:

The distance to the function. This is what sort of return address the caller pushes onto the call stack, for the called function to pop off via its return instruction: a "far" address that is a full, long form, machine address, or a "near" address that is a partial, short form, machine address.
How parameters are passed. Parameters to a function are variously passed as additional words pushed onto the call stack or as data in CPU registers. Calling conventions dictate which parameters are passed where, and in what order.
How return values are retrieved. Return values from functions are variously retrieved by a caller from CPU registers; from pre-allocated areas of memory (usually on the call stack) whose addresses the caller specified as additional, hidden, parameters to the call; or from statically allocated areas of memory belonging to the called function.
Any additional stack framing. Additional stack framing can comprise things such as extra registers pushed onto the call stack that the called function is expected to pop off with its return instruction.
What processor state may have been modified. The interface contract includes a specification of which processor registers a caller can expect to have been modified by the called function, and which the called function guarantees to preserve across any call.

Although they are, strictly speaking, not parts of such interface contracts, other things are often specified via the same mechanisms (i.e. extension keywords, linkage specifiers, and so forth) that are used in high-level languages to specify calling conventions:

The naming convention for the function. This is how the name of the function in the high-level language is translated to a symbol name in object code, for calling code to be linked to by the linker. Together, the naming convention and the calling convention comprise the nebulous C and C++ language concept of "linkage".
The function prologue and epilogue. These are internal matters for a called function. But the mechanisms used by a compiler for a high-level language to control function prologues and epilogues are usually grouped with the mechanisms for controlling calling conventions.

As stated, calling conventions encompass what CPU registers the caller can expect to change and to remain the same across a function call. This is the notion of register volatility:

Volatile registers are freely modifiable by the called function. If a caller has a value that it wishes to preserve in a volatile register before a call, it must execute code to save and restore the contents of that register before and after the call to the function.
Non-volatile registers are guaranteed to be preserved by the called function. If the called function wishes to use a non-volatile register for some purpose, it must comprise code to save the value of that register as it was upon function entry, in order to restore it in any of its code paths for function exit.
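
As a rough illustration of these two obligations, the following C sketch models a register file as a plain structure. The names `regs_t`, `callee`, and `caller`, and the choice of EAX as the volatile register and EBX as the non-volatile one, are assumptions made purely for the example. Real code does this with push and pop instructions, not with C assignments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a register file: one volatile and one
   non-volatile register are enough to show both obligations. */
typedef struct {
    uint32_t eax;   /* treated as volatile in this sketch */
    uint32_t ebx;   /* treated as non-volatile in this sketch */
} regs_t;

/* The called function: free to overwrite EAX, but it must hand EBX
   back with the value that it had upon entry. */
void callee(regs_t *r)
{
    assert(r != 0);
    uint32_t saved_ebx = r->ebx;   /* stands in for "push ebx" in the prologue */
    r->ebx = 0xDEADBEEFu;          /* EBX used as scratch */
    r->eax = r->ebx + 1u;          /* EAX overwritten, which is permitted */
    r->ebx = saved_ebx;            /* stands in for "pop ebx" in the epilogue */
}

/* The caller: anything that it wants to keep in EAX it must save itself. */
uint32_t caller(regs_t *r, uint32_t precious)
{
    assert(r != 0);
    r->eax = precious;
    uint32_t saved_eax = r->eax;   /* caller-side save before the call */
    callee(r);
    r->eax = saved_eax;            /* caller-side restore after the call */
    return r->eax;
}
```

Calling `caller()` with a value in hand shows that value surviving only because the caller saved it, whereas EBX survives because the called function did.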

Calling conventions can be grouped into three main categories:

Calling conventions dictated by the platform. These are the calling conventions required by the platform, such as the calling convention for functions in the operating system's own API. Most compiled languages targeting a platform will, of simple necessity, support the platform's system API calling convention.
Calling conventions dictated by the architecture. These are calling conventions tailored for a particular use on a particular instruction set architecture, irrespective of operating system.
Compiler-specific calling conventions. These are calling conventions that are provided by one, or more, compilers, either as "go faster" features or as simple alternatives. Some are the calling conventions employed by functions in the compiler's own runtime libraries, where those conventions differ from any platform or architecture calling convention.

Platform-defined calling conventions

Platform-defined calling conventions usually comprise just system API calling conventions. Platforms generally don't dictate what the calling conventions are for things that are not the system API on those platforms.

System API calling conventions are, as stated, the calling conventions required by the API for the operating system itself. In any compiled high-level language that supports program code calling operating system API functions directly, the compiler either has to use the system API calling convention as its default calling convention, or provide some mechanism for functions to be declared as having the system API calling convention.

Using something other than the system API calling convention is one of the mistakes to avoid when designing DLLs. It's also a mistake to avoid when designing statically-linked libraries that are designed to be callable by code written in arbitrary languages and compiled with arbitrary compilers.

OS/2 system API calling conventions

The OS/2 system API calling convention is, technically, "APIENTRY". That is the macro, defined by the system API C and C++ language header <os2.h>, that expands to whatever the compiler's particular keywords for specifying the requisite calling convention actually are.

"APIENTRY" is, however, two distinct calling conventions, one for 16-bit OS/2 and one for 32-bit OS/2. The APIENTRY macro expands to two different sets of things, depending on whether one is using the 16-bit OS/2 Developers' Toolkit for OS/2 version 1.x or the 32-bit OS/2 Developers' Toolkit for OS/2 version 2.x.

This is further compounded by the fact that 32-bit OS/2 version 2.x code can still call the 16-bit OS/2 system API if it wants to. The 32-bit OS/2 Developers' Toolkit's <os2.h> header provides the APIENTRY16 macro, for specifying the 16-bit OS/2 system API calling convention for function declarations in 32-bit code.

32-bit OS/2 system API calling convention

The 32-bit APIENTRY calling convention is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit and 16-bit integer types are promoted to 32-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The following processor registers are volatile: EAX ECX EDX ST(0)–ST(7) GS
The following processor registers are non-volatile: EBX ESI EDI EBP ESP CS DS ES FS
The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple (near) RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, 32-bit integer, and 0:32 pointer return values will be stored by the called function in the EAX register.
64-bit integer and 16:32 pointer return values will be stored by the called function in the EAX/EDX register pair.
Floating point return values will be stored by the called function in the ST(0) FPU register.
For return values of structure or class type, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
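
The caller's side of these rules can be sketched in C by modelling the call stack as an array of 32-bit words. This is only a model, under the assumption of a hypothetical function taking an 8-bit, a 16-bit, and a 32-bit signed integer. The names `stack32_t`, `push32`, `push_args_right_to_left`, and `caller_cleanup` are invented for the illustration, and the return address is left out.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a downward-growing stack of 32-bit words. */
typedef struct {
    uint32_t words[STACK_WORDS];
    int sp;                        /* index of the current top of stack */
} stack32_t;

void stack_init(stack32_t *s)
{
    assert(s != 0);
    s->sp = STACK_WORDS;           /* empty: top of stack is one past the end */
}

void push32(stack32_t *s, uint32_t v)
{
    assert(s->sp > 0);
    s->words[--s->sp] = v;
}

/* Caller side of a call f(a, b, c): returns the stack index from before
   the pushes so that the caller can clean up afterwards. */
int push_args_right_to_left(stack32_t *s, int8_t a, int16_t b, int32_t c)
{
    int sp_before = s->sp;
    push32(s, (uint32_t)c);            /* rightmost argument goes first */
    push32(s, (uint32_t)(int32_t)b);   /* 16-bit value promoted to 32 bits */
    push32(s, (uint32_t)(int32_t)a);   /* 8-bit value promoted to 32 bits */
    return sp_before;
}

/* The called function returns with a plain RET, so discarding the
   arguments is the caller's job. */
void caller_cleanup(stack32_t *s, int sp_before)
{
    s->sp = sp_before;
}
```

After the pushes, the leftmost argument sits at the top of the stack, which is what lets a variable-argument function find its first parameter at a fixed place.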

16-bit OS/2 system API calling convention

The 32-bit APIENTRY16 macro corresponds to the following compiler-specific calling convention specifiers:

Borland C/C++ for OS/2: __far16 __pascal keywords
Watcom C/C++ (32-bit compiler): __far16 __pascal keywords (a.k.a. _far16 _pascal and _Far16 _Pascal)
IBM VisualAge for C/C++ for OS/2: _Far16 _Pascal keywords

The 16-bit APIENTRY macro corresponds to the following compiler-specific calling convention specifiers:

Watcom C/C++ (16-bit compiler): __far __pascal keywords (a.k.a. _far _pascal and far pascal)
IBM CSet C/C++ for OS/2: far pascal keywords
Microsoft C/C++ version 6 for OS/2: far pascal keywords

The 16-bit APIENTRY calling convention is as follows:

Arguments are pushed onto the call stack by the caller in left-to-right lexical order.
Arguments of 8-bit integer types are promoted to 16-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The following processor registers are volatile: AX BX CX DX ES ST(0)–ST(7)
The following processor registers are non-volatile: SI DI BP SP (see below) CS DS FS GS
The direction (DF) flag in the FLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a RETF n instruction, popping the far return address and n additional bytes off the stack as it returns. So whilst non-volatile register SP won't, strictly speaking, have its value exactly preserved across a call, it will be changed by a fixed amount relative to the value that it had upon entry.
8-bit integer, 16-bit integer, and 0:16 pointer return values will be stored by the called function in the AX register.
32-bit integer and 16:16 pointer return values will be stored by the called function in the AX/DX register pair.
64-bit integers, 0:32 pointers, and 16:32 pointers cannot be returned.
For return values of structure or class type and of all floating point types, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the same way as any other pointer return value: in the AX register for a 0:16 address, or in the AX/DX register pair for a 16:16 address.
The FPU registers are not used to return values.
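
The n in RETF n follows directly from the argument list. The following C sketch computes it, assuming that every argument other than an 8-bit one already occupies a whole number of 16-bit words. The function names `stack_bytes16` and `retf_operand` are hypothetical, and any hidden return-value pointer would simply be one more entry in the list of sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that one argument occupies on the 16-bit call stack.
   8-bit values are promoted to 16 bits; rounding up to a multiple
   of 2 leaves every other size in this sketch unchanged. */
unsigned stack_bytes16(unsigned arg_size)
{
    assert(arg_size > 0u);
    return (arg_size + 1u) & ~1u;
}

/* The n in "RETF n": the total number of argument bytes that the
   called function pops, over and above the far return address. */
unsigned retf_operand(const unsigned *arg_sizes, size_t count)
{
    unsigned n = 0u;
    for (size_t i = 0; i < count; ++i)
        n += stack_bytes16(arg_sizes[i]);
    return n;
}
```

For a function taking an 8-bit integer, a 16-bit integer, a 32-bit integer, and a 16:16 pointer, the sizes are 1, 2, 4, and 4 bytes, so the function would return with RETF 12.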

Windows system API calling conventions

The Windows system API calling convention is, technically, "WINAPI". Similarly, the Windows application callback calling convention is, technically, either "CALLBACK" or "CALLBACK EXPORT". Those are the macros, defined by the system API C and C++ language header <windows.h>, that expand to whatever the compiler's particular keywords for specifying the requisite calling convention actually are. (CALLBACK was first introduced with the DOS-Windows version 3.1 Developers' Toolkit. Before then, the macros were "FAR PASCAL", and you might see a fair amount of code here and there that failed to catch up with the DOS-Windows 3.1 toolkit in this regard. It has been twenty years since then, however. It's been CALLBACK for Win16 for that long, and it has always been CALLBACK for Win32.)

"WINAPI" is, however, two distinct calling conventions, one for the Win16 API and one for the Win32 API. Similarly, "CALLBACK (EXPORT)" is two distinct calling conventions, one for Win16 applications and one for Win32 applications. The WINAPI and CALLBACK macros expand to two different sets of things, depending on whether one is using the 16-bit Windows Developers' Toolkit for Win16 or the 32-bit Windows Developers' Toolkit for Win32.

This is further compounded by the fact that "CALLBACK" is more properly "CALLBACK EXPORT" or "CALLBACK loadds" on Win16. It is in fact three distinct calling conventions in its own right, depending on whether the function is in a DLL or an EXE, and whether MakeProcInstance() needs to be called or the function is a "smart callback".

A final complication is that while Win16 only exists on x86 architectures, Win32 exists on x86, x86-64, IA64, Alpha, PowerPC, and MIPS architectures, each with their individual calling conventions. The latter three are mainly of academic interest nowadays, given that Windows NT has been discontinued for those processor architectures. This leaves just x86, x86-64, and IA64 whose Win32 calling conventions are of practical interest.

32-bit x86 Windows system API and application callback calling conventions

The 32-bit WINAPI and CALLBACK macros correspond to the following compiler-specific calling convention specifiers:

Borland C/C++ for OS/2: __stdcall keyword
Borland C/C++ for Windows (32-bit compiler): __stdcall keyword
MetaWare High C/C++ for Windows: Nothing.
Watcom C/C++ (32-bit compiler): __stdcall keyword (a.k.a. _stdcall and stdcall)
Microsoft Visual C/C++ for Windows: __stdcall keyword

This is the default calling convention for:

Borland C/C++ for OS/2
Borland C/C++ for Windows (32-bit compiler)
MetaWare High C/C++ for Windows
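
A library can hide the keyword behind a macro of its own, in the same spirit as WINAPI. The following is a minimal sketch of that idea, not the contents of <windows.h>. The macro name `MYAPI` and the function names are hypothetical, the GCC-style attribute spelling is an addition that is not in the list above, and on targets where neither spelling applies the macro expands to nothing so that the compiler's default convention is used.

```c
#include <assert.h>

/* MYAPI is a hypothetical macro for this sketch. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define MYAPI __stdcall
#elif (defined(__GNUC__) || defined(__clang__)) && defined(__i386__)
#  define MYAPI __attribute__((stdcall))
#else
#  define MYAPI   /* other targets: the compiler's default convention */
#endif

/* The declaration and the definition both carry the macro, so that
   callers and the called function agree on the interface contract. */
int MYAPI add_three(int a, int b, int c);

int MYAPI add_three(int a, int b, int c)
{
    return a + b + c;
}

/* A pointer to such a function has to name the same convention. */
typedef int (MYAPI *add_three_fn)(int, int, int);

int call_through_pointer(add_three_fn fn)
{
    assert(fn != 0);
    return fn(1, 2, 3);
}
```

Because the convention is part of the function's type, a mismatch between the declaration that a caller sees and the definition that was compiled is usually diagnosed at compile time rather than showing up as a corrupted stack at run time.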

The 32-bit WINAPI and 32-bit CALLBACK calling conventions for x86 processors are identical, and are as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit and 16-bit integer types are promoted to 32-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The following processor registers are volatile: EAX ECX EDX ST(0)–ST(7)
The following processor registers are non-volatile: EBX ESI EDI EBP ESP (see below) CS DS ES FS GS
The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a (near) RET n instruction, popping the near return address and n additional bytes off the stack as it returns. So whilst non-volatile register ESP won't, strictly speaking, have its value exactly preserved across a call, it will be changed by a fixed amount relative to the value that it had upon entry.
8-bit integer, 16-bit integer, 32-bit integer, and 0:32 pointer return values will be stored by the called function in the EAX register.
64-bit integer and 16:32 pointer return values will be stored by the called function in the EAX/EDX register pair.
Floating point return values will be stored by the called function in the ST(0) FPU register.
For return values of structure or class type that are "plain old data structures" 32 bits or smaller in size, the called function stores the structure/class value in the EAX register.

For return values of structure or class type that are "plain old data structures" 33 to 64 bits in size, the called function stores the structure/class value in the EAX/EDX register pair.

Note: Several compilers get this part wrong, including Watcom C/C++/Fortran and Borland C/C++ for OS/2. They will erroneously expect 33 to 64 bit values to be returned in caller-allocated memory, not in registers. This is bug #490 in the OpenWatcom bug list.

For return values of structure or class type that are not "plain old data structures" or that are larger than 64 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
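
The three cases for structure and class return values reduce to a size test and a "plain old data" test. A small C sketch of that decision follows. The type and function names are invented for the illustration, and whether a given type counts as a "plain old data structure" is taken as an input rather than worked out.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the caller finds a structure or class return value. */
typedef enum {
    RET_EAX,              /* value itself in EAX */
    RET_EAX_EDX,          /* value itself in the EAX/EDX pair */
    RET_HIDDEN_POINTER    /* caller-allocated memory; address in EAX */
} ret_where_t;

/* Hypothetical helper applying the rules stated above. */
ret_where_t struct_return_location(unsigned size_bits, bool is_pod)
{
    assert(size_bits > 0u);
    if (is_pod && size_bits <= 32u)
        return RET_EAX;
    if (is_pod && size_bits <= 64u)
        return RET_EAX_EDX;
    return RET_HIDDEN_POINTER;
}
```

A 64-bit plain structure such as a pair of 32-bit integers therefore comes back in EAX/EDX, which is exactly the case that the note above says some compilers handle incorrectly.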

64-bit x86-64 Windows system API and application callback calling conventions

The 64-bit WINAPI and 64-bit CALLBACK calling conventions for x86-64 processors are identical. They are documented by Microsoft in the Visual C/C++ Programming Guide, and are as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments smaller than 64 bits are widened to 64 bits by padding at the MSB end. Note that this is widening, not promotion. The padding bits are not guaranteed by the caller to be zeroes.

Any arguments larger than 64 bits are stored in memory, required to be aligned to a 16-byte boundary, and their addresses, instead of their values, passed as parameters.

1-byte to 8-byte arguments of structure or class type are passed as if they were values of (8-bit to 64-bit) integer type. Values of larger structure or class types, or of 128-bit SIMD vector type, are treated as larger than 64-bit arguments and passed by reference.

Values of 32-bit or 64-bit floating point type are passed as 64-bit values. If the function is a variable-argument function and no type for a floating point argument has been declared, or if it is an unprototyped function, a value of 32-bit floating point type is promoted to a 64-bit floating point type. Otherwise the 32-bit value is simply padded.
The called function is responsible for keeping RSP aligned to a 16-byte boundary. If it knows that the parameters plus return address on its call stack are not an integer multiple of 16 bytes, it is responsible for making up the difference as it constructs its stack frame in its function prologue.
The first four arguments — either the arguments themselves or the addresses thereof — are stored in registers. The caller also pushes dummy values onto the call stack for those arguments, to allow the called function to easily spill the argument values from registers to memory should it need to use the registers for other purposes.

To account for variable-argument and unprototyped functions, the caller is required to always act as if at least four arguments are passed to every function, reserving space on the call stack for four arguments even if the declared parameter list for the function is shorter.

The registers for arguments 1 to 4 are, respectively, XMM(0)–XMM(3) for floating point values and RCX, RDX, R8, and R9 for all other types. If the function is a variable-argument function and no type for the argument has been declared, or if it is an unprototyped function, the caller must place the value in both registers for the argument, since it cannot know which register the called function will be expecting the value in.
The following processor registers are volatile: RAX RCX RDX R8 R9 R10 R11 XMM(0)–XMM(5) ST(0)–ST(7) x87 status word MXCSR status word
The following processor registers are non-volatile: RBX RSI RDI RBP RSP R12 R13 R14 R15 XMM(6)–XMM(15) GS x87 control word MXCSR control word
The direction (DF) flag in the RFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a RET instruction, popping the near return address off the stack as it returns.
8-bit integer, 16-bit integer, 32-bit integer, 64-bit integer, 1-byte to 8-byte structure, and pointer return values will be stored by the called function in the RAX register.
32-bit floating point, 64-bit floating point, and 128-bit SIMD vector return values will be stored by the called function in the XMM(0) register.
For return values of structure or class type larger than 8 bytes, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, as if declared before all other parameters and treated exactly as if it were another ordinary parameter. The called function writes the return value to this address, and returns the address in the RAX register.
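
The register and stack rules above can be condensed into three small C helpers. This is a sketch under the rules as stated here, with invented names throughout. Note that it is the position of an argument, not a running count per register class, that selects the register.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_FLOAT, ARG_OTHER } arg_kind_t;

typedef enum {
    W64_STACK,
    W64_RCX, W64_RDX, W64_R8, W64_R9,
    W64_XMM0, W64_XMM1, W64_XMM2, W64_XMM3
} w64_loc_t;

/* Where the argument at zero-based position pos is passed.
   Positions 0 to 3 select a register pair; kind picks which of the two. */
w64_loc_t win64_arg_location(size_t pos, arg_kind_t kind)
{
    if (pos >= 4)
        return W64_STACK;
    int base = (kind == ARG_FLOAT) ? W64_XMM0 : W64_RCX;
    return (w64_loc_t)(base + (int)pos);
}

/* Bytes of argument space the caller reserves on the call stack:
   one 8-byte slot per argument, and never fewer than four slots. */
size_t win64_stack_arg_bytes(size_t arg_count)
{
    size_t slots = (arg_count < 4) ? 4 : arg_count;
    return slots * 8;
}

/* Non-zero if a value of this size is passed by address instead of
   by value, i.e. if it is larger than 64 bits. */
int win64_passed_by_reference(size_t size_bytes)
{
    assert(size_bytes > 0);
    return size_bytes > 8;
}
```

For a hypothetical function taking an integer, a 64-bit floating point value, and another integer, the three arguments land in RCX, XMM(1), and R8 respectively, with XMM(0) and RDX left unused.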

16-bit Windows system API calling convention

The 16-bit WINAPI calling convention is as follows:

Arguments are pushed onto the call stack by the caller in left-to-right lexical order.
Arguments of 8-bit integer types are promoted to 16-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The following processor registers are volatile: AX BX CX DX ES ST(0)–ST(7)
The following processor registers are non-volatile: SI DI BP SP (see below) CS DS FS GS
The direction (DF) flag in the FLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a RETF n instruction, popping the far return address and n additional bytes off the stack as it returns. So whilst non-volatile register SP won't, strictly speaking, have its value exactly preserved across a call, it will be changed by a fixed amount relative to the value that it had upon entry.
8-bit integer, 16-bit integer, and 0:16 pointer return values will be stored by the called function in the AX register.
32-bit integer and 16:16 pointer return values will be stored by the called function in the AX/DX register pair.
64-bit integers, 0:32 pointers, and 16:32 pointers cannot be returned.
For return values of structure or class type and of floating point type, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the same way as any other pointer return value: in the AX register for a 0:16 address, or in the AX/DX register pair for a 16:16 address.
The FPU registers are not used to return values.

16-bit Windows application callback calling convention

The 16-bit CALLBACK EXPORT calling convention is as follows:

Arguments are pushed onto the call stack by the caller in left-to-right lexical order.
Arguments of 8-bit integer types are promoted to 16-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The following processor registers are volatile: AX BX CX DX ES ST(0)–ST(7)
The following processor registers are non-volatile: SI DI BP SP (see below) CS DS FS GS
The direction (DF) flag in the FLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a RETF n instruction, popping the far return address and n additional bytes off the stack as it returns. So whilst non-volatile register SP won't, strictly speaking, have its value exactly preserved across a call, it will be changed by a fixed amount relative to the value that it had upon entry.
8-bit integer, 16-bit integer, and 0:16 pointer return values will be stored by the called function in the AX register.
32-bit integer and 16:16 pointer return values will be stored by the called function in the AX/DX register pair.
64-bit integers, 0:32 pointers, and 16:32 pointers cannot be returned.
For return values of structure or class type and of floating point type, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the same way as any other pointer return value: in the AX register for a 0:16 address, or in the AX/DX register pair for a 16:16 address.
The FPU registers are not used to return values.
The function prologue must be preceded by the following 3-byte sequence, which the Windows image loader expects to be able to overwrite with 3 nop instructions:
```
push ds
pop ax
nop
```
And the prologue and epilogue themselves must comprise code to set the (non-volatile) DS register to whatever the AX register is on entry to the prologue:
```
push ds
mov ds,ax
…
pop ds
```
The value of AX on entry to a function is modified by an "instance thunk" created by MakeProcInstance(). (Hence AX is a volatile register, but not one that could be used for potentially storing argument values in a "go faster" modification to the calling convention.)

C/C++ compiler options controlling what the calling conventions are for CALLBACK EXPORT functions:

Generate the original "load DS from DS" prefix code, requiring MakeProcInstance() in order to be callable:
- Borland C/C++ for Windows (16-bit compiler): Not available.
- Watcom C/C++/Fortran (16-bit compiler): -zW (with -zw, all CALLBACK functions are treated as if they were CALLBACK EXPORT functions)
- Microsoft C/C++ version 7 for Windows: /GEa
- Digital Mars C/C++ for Windows (16-bit compiler): -Wa
Generate "load DS from DGROUP" prefix code, suitable for DLLs only:
- Borland C/C++ for Windows (16-bit compiler): -WE or -WDE (with -W and -WD, all CALLBACK functions are treated as if they were CALLBACK EXPORT functions)
- Watcom C/C++/Fortran (16-bit compiler): Not available.
- Microsoft C/C++ version 7 for Windows: /GEd
- Digital Mars C/C++ for Windows (16-bit compiler): -Wd
Generate "load DS from SS" prefix code, suitable for EXEs only:
- Borland C/C++ for Windows (16-bit compiler): -WSE (with -WS, all CALLBACK functions are treated as if they were CALLBACK EXPORT functions)
- Watcom C/C++/Fortran (16-bit compiler): -zWs
- Microsoft C/C++ version 7 for Windows: /GEs
- Digital Mars C/C++ for Windows (16-bit compiler): -Ws

As Raymond Chen reports, Michael Geary discovered a trick in 1989 that did away with the necessity for MakeProcInstance() entirely. This trick was later incorporated into Borland's, Watcom's, Microsoft's, and the Digital Mars compilers. The upshot of the trick was this:

Callback functions in EXEs still have to be declared CALLBACK EXPORT. However, the compilers can be told to recognize such functions and turn them into "smart callbacks" that have the following replacement 3-byte prefix to the prologue:
```
mov ax,ss
nop
```
Callback functions in DLLs can be declared CALLBACK EXPORT. However, the compilers can be told to recognize such functions and turn them into "smart callbacks" that have the following replacement 3-byte prefix to the prologue:
```
mov ax,DGROUP
nop
```
Callback functions in DLLs can instead be declared CALLBACK loadds, which again causes the following replacement 3-byte prefix to the prologue to be used:
```
mov ax,DGROUP
nop
```

The Windows image loader will not overwrite any of these latter 3-byte prefixes, since they are not forms that it recognizes.
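
The loader's behaviour as described here can be modelled in a few lines of C. This is a model only, not the actual Windows loader code. The function name `patch_export_prefix` is hypothetical, and the byte values are the standard 8086 encodings of the instructions shown above (1E for push ds, 58 for pop ax, 90 for nop, and 8C D0 for mov ax,ss), supplied as an addition to the text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { PREFIX_LEN = 3 };

/* push ds / pop ax / nop : the original prefix that the loader recognizes. */
static const uint8_t classic_prefix[PREFIX_LEN] = { 0x1E, 0x58, 0x90 };

/* Model of the patching step: overwrite the original prefix with three
   NOPs, and leave any other byte sequence alone.
   Returns 1 if the code was patched, 0 otherwise. */
int patch_export_prefix(uint8_t *code)
{
    assert(code != 0);
    if (memcmp(code, classic_prefix, PREFIX_LEN) != 0)
        return 0;                     /* not a form that is recognized */
    memset(code, 0x90, PREFIX_LEN);   /* nop, nop, nop */
    return 1;
}
```

Fed the "smart callback" prefix for an EXE (8C D0 90), the function leaves the bytes untouched, so the value that the prologue loads into DS still comes from SS rather than from whatever AX happened to hold.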

Unfortunately, however, there is no longer, with this modified "smarter" perilogue, a uniform approach that can be used in both DLLs and EXEs. Code for a CALLBACK EXPORT function compiled for a DLL cannot be linked into and properly used in an EXE, and vice versa. EXEs will have different DGROUPs for different instances, and in DLLs SS is not equal to the DLL's data segment. Hence, any statically-linked library providing CALLBACK EXPORT functions has to be provided in two forms, one for linking into EXEs and one for linking into DLLs.

On the gripping hand, generally only window procedures will be CALLBACK EXPORT, and it is more usually the case for statically-linked libraries to provide functions that use the WINAPI or some other calling convention, to which all of these considerations for CALLBACK EXPORT do not apply.

Unix system API calling conventions

Given the plethora of processor architectures that Unices and Linux are available for, there isn't a single system API calling convention for Unices and Linux, as there is for OS/2, Win32, and Win16. Rather, Unices and Linux adhere to what is known as an ABI, an Application Binary Interface. Most ABIs are descended from the AT&T System V Unix ABI definitions. They are defined, for Unix, for a given processor architecture, and any Unix or Linux for that processor architecture will usually adhere to the architecture's Unix ABI in its system call library.

This hasn't stopped some Unix vendors, such as Apple, defining their own individual ABIs for various processor architectures, that are specific to their operating systems.

To an extent there's a "Unix-tinted glasses" effect that Unix programmers suffer from when talking about Unix ABIs. They can tend to paint them as architecture-specific rather than as platform-specific. However, as can be seen from the differences between them and the OS/2, Win16, and Win32 calling conventions for the same architectures, they really are platform-specific, pertaining to Unix in particular and not to an entire processor architecture in general.

This is often given away by the formal full names of the ABI specifications, which explicitly state their platform-specific natures. For example: The Apple ABI is, formally, the MacOS X ABI (i.e. specific to the MacOS 10 platform), which is in turn based upon the System V Application Binary Interface (i.e. the ABI specifically for AT&T Unix System V).

MacOS version 10 x86 ABI calling convention

The x86 ABI for MacOS version 10 (documented by Apple in the MacOS 10 Reference Library) includes the following calling convention, as used by the system API library:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments pushed onto the call stack are padded up to the next multiple of 4 bytes (for integers and pointers), and the argument block as a whole is padded (at the right-hand end) to keep the stack pointer that the called function receives, on entry, aligned to a multiple of 16 bytes.
Special treatment is given to 128-bit SIMD vector values, passing them in the processor's SIMD registers (XMM(0)–XMM(3)) if possible.
The following processor registers are volatile: EAX ECX EDX ST(0)–ST(7)
The following processor registers are non-volatile: EBX ESI EDI EBP ESP CS DS ES FS GS
The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple (near) RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, 32-bit integer, and 0:32 pointer return values will be stored by the called function in the EAX register.
64-bit integer and 16:32 pointer return values will be stored by the called function in the EAX/EDX register pair.
Floating point return values will be stored by the called function in the ST(0) FPU register.
For return values of structure or class type that are "plain old data structures" 32 bits or smaller in size, the called function stores the structure/class value in the EAX register.

For return values of structure or class type that are "plain old data structures" 33 to 64 bits in size, the called function stores the structure/class value in the EAX/EDX register pair.

For return values of structure or class type that are not "plain old data structures" or that are larger than 64 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
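
The padding rule for the argument block is simple arithmetic, sketched below in C. The sketch assumes that the caller's stack pointer was already aligned to a multiple of 16 bytes before the block was built, and it ignores both the return address and the special treatment of 128-bit SIMD vector values. The names `round_up_to` and `padded_arg_block_bytes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of the given (non-zero) amount. */
size_t round_up_to(size_t n, size_t multiple)
{
    assert(multiple != 0);
    return (n + multiple - 1) / multiple * multiple;
}

/* Total size of the argument block: each argument is padded to a
   multiple of 4 bytes, then the block as a whole to a multiple of 16. */
size_t padded_arg_block_bytes(const size_t *arg_sizes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += round_up_to(arg_sizes[i], 4);
    return round_up_to(total, 16);
}
```

Three arguments of 1, 2, and 4 bytes thus occupy 12 bytes once individually padded, and the block as a whole is padded to 16 bytes.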

64-bit x86-64 System V Unix ABI calling convention

The 64-bit System V Unix ABI calling convention for x86-64 processors (documented by AMD) is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments smaller than 64 bits are promoted to 64 bits by padding at the MSB end.
The argument passing conventions are insanely overcomplex, making the Win32 x86-64 system API calling convention seem simple by comparison. Arguments are classified according to a byzantine set of rules, laid out over four pages in AMD's ABI, that end up with their eventually being classified as INTEGER, MEMORY, X87, COMPLEX, 8-byte SSE, or 16-byte AVX:
- MEMORY arguments are pushed onto the call stack, passing them by value.
- X87 and COMPLEX arguments are stored in memory, with their addresses pushed onto the call stack, passing them by address.
- INTEGER arguments are passed in the first available register of RDI, RSI, RDX, RCX, R8, and R9, used in that order (with no gaps); or on the call stack if all registers have been used.
- 8-byte SSE arguments are passed in the first available register of XMM(0)–XMM(7), used in that order (with no gaps); or on the call stack if all registers have been used.
- 16-byte AVX arguments are passed in the first available register of YMM(0)–YMM(7), used in that order (with no gaps); or on the call stack if all registers have been used. Use of an XMM register by an 8-byte SSE argument precludes use of the overlapping YMM register by a 16-byte AVX argument.
Note that the rules are not straightforward and obvious. Whether an argument of structure or class type ends up as MEMORY or INTEGER depends on what sort of C++ constructor it has, for example. Structures are classified by their data members, so potentially a structure type could end up classified as X87, SSE, AVX, or COMPLEX. Some floating point types are SSE, and some X87, and this is not a simple cut-off at one particular bit-size. And there are non-trivial rules on how 128-bit integer and 80-bit floating point types are decomposed across registers and in words on the call stack.
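
Once the classification has been done, the register assignment itself is mechanical. The following is a minimal sketch in C of just the INTEGER and 8-byte SSE cases, with everything else lumped together as going onto the call stack. The function and type names are invented for illustration, and the classification step (the hard part) is assumed to have happened already.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified classes: the two register-passed ones, plus everything else. */
enum arg_class { ARG_INTEGER, ARG_SSE, ARG_MEMORY };

static const char *const integer_regs[] = {
    "RDI", "RSI", "RDX", "RCX", "R8", "R9"
};
static const char *const sse_regs[] = {
    "XMM(0)", "XMM(1)", "XMM(2)", "XMM(3)",
    "XMM(4)", "XMM(5)", "XMM(6)", "XMM(7)"
};
static const char on_stack[] = "stack";

/*
 * For each argument, in left-to-right order, records in where[] the name
 * of the register that carries it, or on_stack if its register set has
 * been used up (or it is not register-passed at all).
 */
void assign_locations(const enum arg_class *classes, size_t count,
                      const char **where)
{
    const size_t integer_count = sizeof integer_regs / sizeof integer_regs[0];
    const size_t sse_count = sizeof sse_regs / sizeof sse_regs[0];
    size_t next_integer = 0, next_sse = 0;

    assert(classes != NULL && where != NULL);
    for (size_t i = 0; i < count; ++i) {
        switch (classes[i]) {
        case ARG_INTEGER:
            where[i] = next_integer < integer_count
                     ? integer_regs[next_integer++] : on_stack;
            break;
        case ARG_SSE:
            where[i] = next_sse < sse_count
                     ? sse_regs[next_sse++] : on_stack;
            break;
        default:
            where[i] = on_stack;
            break;
        }
    }
}
```

Note that the two register sets are counted independently: an SSE argument between two INTEGER arguments does not leave a gap in the INTEGER register sequence.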

For variable-argument and unprototyped functions, the caller is required to place in the AL register an upper bound (in the range 0 to 8) on the number of vector registers used to pass arguments, so that the called function can know how many vector registers it needs to save to memory. Variable-argument functions thus have to keep count of the general-purpose and vector register arguments that they have already consumed, in order to work out whether the next argument is to be found amongst the saved registers or on the call stack. (Contrast this with the Win32 x86-64 system API calling convention, which requires that the caller populate all of the registers that the called function might look in, and always reserve stack locations corresponding to register parameters, allowing a called function to hardwire the register or stack location that it accesses for any given parameter, without it having to worry about how many registers were actually used by the caller.)
Arguments pushed onto the call stack are padded up to the next multiple of 8 bytes, and the argument block as a whole is padded (at the right-hand end) to keep the stack pointer that the called function receives, on entry, aligned to a multiple of 16 bytes.
The following processor registers are volatile: RAX RCX RDX RSI RDI R8 R9 R10 R11 XMM(0)–XMM(15) ST(0)–ST(7) x87 status word MXCSR status word
The following processor registers are non-volatile: RBX RBP RSP R12 R13 R14 R15 x87 control word MXCSR control word
The processor must be in x87 mode on entry to and on exit from the function.
The direction (DF) flag in the RFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a RET instruction, popping the near return address off the stack as it returns.
The return value passing conventions are, again, insanely overcomplex, making the Win32 x86-64 system API calling convention seem simple by comparison. Return values are classified according to the same byzantine set of rules as arguments are, again ending up with their eventually being classified as INTEGER, MEMORY, X87, COMPLEX, 8-byte SSE, or 16-byte AVX:
- For MEMORY return values the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, treated as an extra parameter coming before all others (and so affecting the classifications of arguments passed to the function). The called function writes the return value to this address, and returns the address in the RAX register.
- INTEGER return values are stored by the called function in the first available register of RAX and RDX, in that order.
- X87 return values are stored by the called function in the ST(0) register.
- COMPLEX return values are stored by the called function in the ST(0) and ST(1) registers.
- 8-byte SSE return values are stored by the called function in the first available register of XMM(0) and XMM(1), in that order.
- 16-byte AVX return values are stored by the called function in the first available register of YMM(0) and YMM(1), in that order. Use of an XMM register by an 8-byte SSE return value, or an ST register by an X87 return value, precludes use of the overlapping YMM register by a 16-byte AVX return value.
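
The padding of stack-passed arguments mentioned in the list above amounts to two rounding-up steps. Here is a minimal sketch in C, with the per-argument slot size and the block alignment left as parameters rather than hard-wired; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds n up to the next multiple of unit.  unit must be non-zero. */
size_t round_up(size_t n, size_t unit)
{
    assert(unit != 0);
    return (n + unit - 1) / unit * unit;
}

/*
 * Total number of bytes occupied by the stack-passed arguments: each
 * argument is padded to a multiple of slot bytes, and then the block as
 * a whole is padded to a multiple of block_align bytes.
 */
size_t argument_block_size(const size_t *sizes, size_t count,
                           size_t slot, size_t block_align)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += round_up(sizes[i], slot);
    return round_up(total, block_align);
}
```

For example, three stack arguments of 1, 4, and 10 bytes with 8-byte slots occupy 8 + 8 + 16 = 32 bytes, which is already a multiple of 16 and so needs no further padding.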

IA64 System V Unix ABI calling convention

The System V Unix ABI calling convention for IA64 processors is documented by Intel.

Architecture-defined calling conventions

Architecture-defined calling conventions are, as stated, calling conventions defined by the instruction set architecture itself, independently of the platform.

x86 interrupt calling conventions

The "interrupt" calling convention for the x86 family of processors is the calling convention required by the instruction set architecture for functions that are to be used directly as processor interrupt handling functions.

The interrupt calling convention is as follows:

There are no arguments passed by the caller.
No processor registers are volatile.
The following processor registers are non-volatile: EAX EBX ECX EDX ESI EDI EBP ESP CS DS ES FS GS
The called function will exit with an IRET or IRETD instruction.
There are no return values returned to the caller.

Although the caller is unable to pass arguments to an interrupt function, in high-level languages interrupt functions can (if the compiler supports doing so) operate as if a caller had passed, as arguments, the processor registers as they were on entry to the function, allowing direct access to those register values from within the function code as if they were ordinary function parameters. On Watcom C/C++, for example, an interrupt function can be declared with a union INTPACK argument. The compiler automatically generates code in the called function to save all of the register values upon entry to the function, in such a way that they can be accessed as if they were normal function parameters.

This calling convention, as provided by C and C++ compilers, also introduces an otherwise entirely foreign concept into the C and C++ languages: pass-by-value-return parameters. C and C++ function parameters are either pass-by-value or pass-by-reference. But the register block parameters to an interrupt function are pass-by-value-return. The values in the parameters are copied back into the processor registers upon function exit. Thus, and oddly for the C and C++ languages, modifications to what appear, syntactically, to be parameters passed by value actually propagate out to the original values outside of the function being called.
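
The difference between ordinary pass-by-value and pass-by-value-return can be modelled in portable C. This is a simulation only, under the assumption of a made-up four-register block; `struct reg_block`, `cpu`, and the function names are all invented for illustration, and a real compiler generates the copy-in and copy-out code itself rather than exposing it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block; a real compiler defines its own layout. */
struct reg_block { uint32_t eax, ebx, ecx, edx; };

/* Stand-in for the processor's actual registers. */
struct reg_block cpu;

/* Ordinary C pass-by-value: the change is made to a private copy and lost. */
void ordinary_by_value(struct reg_block regs)
{
    regs.eax = 0x1234;
}

/* The body of the interrupt function, operating on the saved registers. */
static void handler_body(struct reg_block *saved)
{
    assert(saved != 0);
    saved->eax = 0x1234;
}

/*
 * Sketch of what the compiler-generated entry and exit code amounts to:
 * copy the registers in, run the body, copy the registers back out.
 */
void interrupt_entry(void)
{
    struct reg_block saved = cpu;   /* saved on entry */
    handler_body(&saved);
    cpu = saved;                    /* restored, with modifications, on exit */
}
```

After `ordinary_by_value(cpu)` the value of `cpu.eax` is unchanged, whereas after `interrupt_entry()` it holds the value that the handler body stored.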

Compiler-defined calling conventions

Compiler-defined calling conventions are, as stated, calling conventions defined by a particular compiler. Compilers often define their own calling conventions, in addition to the platform-defined ones. Mainly this is in cases where the compiler-defined calling conventions are in some respects superior to the platform-defined ones, since they can take advantage of the fact that they are usually language-specific, whereas the platform-defined calling conventions have to be language-neutral.

There are two main groupings of compiler-defined calling conventions:

compiler-defined calling conventions that attempt to provide a common "Microsoft C-like" calling convention
compiler-defined calling conventions that take platform-defined stack-based calling conventions and add "go faster" register-based improvements to them

In most cases, a compiler-defined calling convention is in fact the default calling convention used by the compiler, and platform-defined or architecture-defined calling conventions have to be explicitly specified either by compiler command-line options or via explicit calling convention specifiers in the program source code.

Except for the "Microsoft C-like" calling convention, most compiler-defined calling conventions are also compiler specific, in that there's no way to specify exactly the same calling convention with another compiler. Sometimes the keywords look the same, but the conventions differ. Sometimes there simply isn't a keyword at all.

Even Watcom C/C++, whose #pragma aux facility allows programmers to define new keywords for user-specified calling conventions, is not flexible enough to express all compiler-defined calling conventions. (A programmer cannot define IBM's Optlink convention with Watcom C/C++'s #pragma aux, for example, since it has no mechanism for specifying either placeholders on the call stack or arguments passed in the CPU registers. Watcom C/C++ contains an internal bodge for working around this that is not expressible in source code form.)

"common" `cdecl` calling conventions

The cdecl calling convention is a last vestige of the days of PC/MS/DR-DOS. Back before the existence of Windows NT, OS/2, or even DOS-Windows, the only (major) operating system API in the world of the IBM PC/AT compatible was the PC/MS/DR-DOS system API. The DOS system API, unlike the system APIs of its aforementioned successors, was not directly callable from high-level languages. Therefore the language bindings to the PC/MS/DR-DOS API comprised wrapper functions, private to the runtime library of each language.

The DOS system API's calling conventions were thus language-defined rather than platform-defined. The DOS API bindings for Microsoft C had one calling convention. The DOS API bindings for Microsoft PASCAL had another. Microsoft FORTRAN and COBOL had specific calling conventions for their DOS API wrappers, too. Thus came about calling conventions known as cdecl, pascal, fortran, and so forth. Each was whatever calling convention the Microsoft compiler for that language used for its DOS API wrapper functions.

The notion of language-defined calling conventions disappeared over twenty years ago. Operating systems with system APIs that were directly callable from high-level languages appeared, and by the time of the DOS-Windows 3.1 Developers' Toolkit, which finally replaced FAR PASCAL with CALLBACK, the notion of the platform-defined calling convention had almost entirely replaced the notion of the language-defined calling convention.

The one remaining vestige of language-defined calling conventions is the common cdecl convention still supported by most C/C++ compilers for x86, both 16-bit and 32-bit. The calling convention remains "whatever Microsoft C used to do" for its 16-bit DOS API wrappers.

Although commonly thought of as the default calling convention for C and C++ compilers, the cdecl calling convention is actually not the default in all but a very few C/C++ compilers. Everyone else defaults to their own compiler-defined calling conventions, defaults to whatever the target platform's platform-defined system API calling convention is, or (in one oddball case) defaults to the Win32 system API calling convention irrespective of target platform. The reality is that the cdecl calling convention is, everywhere outside of Microsoft's own compilers and 16-bit Borland C/C++, an alternative non-default compatibility option, for doing "whatever Microsoft C used to do" if that is desired.

16-bit `cdecl` calling convention

The 16-bit cdecl calling convention is, as stated, "whatever Microsoft C used to do" for its 16-bit DOS API wrappers, and is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit integer type are promoted to 16-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The following processor registers are volatile: AX BX CX DX ST(0)–ST(7) ES
The following processor registers are non-volatile: SI DI BP SP CS DS FS GS
The direction (DF) flag in the FLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, and near pointer return values will be stored by the called function in the AX register.
64-bit integer, 32-bit integer, and far pointer return values will be stored by the called function in the AX/DX register pair.
The FPU registers are not used to return values.
For return values of structure or class type, or of floating-point type, the called function is expected to allocate space for a value of that type, writing the return value to this address, and returning the address in the AX register.

Because the called function cannot use the call stack for this, since the call stack pointer is restored by the caller after function exit, and because it cannot use the heap either, it must use a fixed-address portion of static non-constant data storage for such structure values. This makes the 16-bit cdecl calling convention inherently not thread-safe when it comes to returning values of structure, class, or floating-point type.
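
The hazard is easy to reproduce in ordinary C by writing the static storage out explicitly. The following is a sketch of the effect only, not of what any particular 16-bit compiler emits; `struct point`, `make_point`, and `demonstrate_overwrite` are invented names.

```c
#include <assert.h>

struct point { int x, y; };

/*
 * Returns the address of a single statically allocated result, mimicking
 * a called function that has nowhere else to put a structure value.
 */
struct point *make_point(int x, int y)
{
    static struct point result;   /* one fixed location shared by every call */
    result.x = x;
    result.y = y;
    return &result;
}

/* Shows that a second call silently overwrites the first call's result. */
void demonstrate_overwrite(void)
{
    struct point *first = make_point(1, 2);
    struct point *second = make_point(3, 4);
    assert(first == second);      /* both point at the same storage */
    assert(first->x == 3);        /* the first result has gone */
}
```

Two threads calling such a function concurrently race on the same storage in exactly the same way, which is why a caller must copy the value out before anything else can call the function again.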

32-bit `cdecl` calling convention

The 32-bit cdecl calling convention is provided by 32-bit compilers that have 16-bit predecessors, and is the logical extension of the 16-bit cdecl calling convention to 32 bits, albeit that there was, of course, no 32-bit Microsoft C for DOS and thus nothing to mimic. It differs from the 16-bit calling convention in several major respects:

The EBX register is non-volatile.
Whether the FS register is non-volatile depends both on which platform the compiler is targeting and on which compiler is being employed. (The Win32 and OS/2 platforms dictate the volatility of FS. But on extended DOS, the choice is left to compilers, and different compilers have different choices.)
The return of floating-point values is thread-safe.
The return of structure or class type values is incompatible across different compilers.

The full calling convention, including the variations across compilers and target platforms, is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit and 16-bit integer type are promoted to 32-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The following processor registers are volatile: EAX ECX EDX ST(0)–ST(7) ES GS

With the extended-DOS-targeting DJGPP C/C++ compilers, so too are: FS
The following processor registers are non-volatile: EBX ESI EDI EBP ESP CS DS

With the extended-DOS-targeting Watcom C/C++/Fortran compilers, so too are: FS

With all OS/2-targeting and Win32-targeting compilers, so too are: FS
The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, 32-bit integer, and near pointer return values will be stored by the called function in the EAX register.
64-bit integer and far pointer return values will be stored by the called function in the EAX/EDX register pair.
Floating point return values will be stored by the called function in the ST(0) FPU register.
For return values of structure or class type, there is wide incompatibility amongst compilers. Some make the return thread-safe, by breaking compatibility with the 16-bit cdecl calling convention. Some retain compatibility, at the expense of their 32-bit cdecl calling convention not being thread-safe. The ones that break compatibility don't all agree with one another on how to do so.

Watcom C/C++/Fortran (32-bit compiler) simply does the same as for the 16-bit cdecl calling convention:
- For return values of structure or class type, the called function is expected to allocate space for a value of that type, writing the return value to this address, and returning the address in the EAX register.
Borland C/C++ for OS/2 and Borland C/C++ for DOS/Windows (32-bit compiler) both do the same (erroneous) thing that they do for the 32-bit x86 WINAPI calling convention (i.e. their cdecl calling conventions match their default calling conventions in this particular regard):
- For return values of structure or class type that are "plain old data structures" 32 bits or smaller in size, the called function stores the structure/class value in the EAX register.
- For return values of structure or class type that are not "plain old data structures" or that are larger than 32 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
Microsoft Visual C/C++ does the same as what it does for the 32-bit x86 WINAPI calling convention and its own Fastcall calling convention (i.e. it does the same across all three conventions):
- For return values of structure or class type that are "plain old data structures" 32 bits or smaller in size, the called function stores the structure/class value in the EAX register.
  
  For return values of structure or class type that are "plain old data structures" 33 to 64 bits in size, the called function stores the structure/class value in the EAX/EDX register pair.
  
  For return values of structure or class type that are not "plain old data structures" or that are larger than 64 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
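
The three behaviours listed above can be summarized as a small decision function. This is merely a restatement of the rules as given, in C; the enumerations and the function name are invented for illustration and correspond to no real compiler interface.

```c
#include <assert.h>
#include <stdbool.h>

enum compiler { WATCOM_32, BORLAND_32, MSVC_32 };

enum location {
    IN_EAX,                 /* value itself in EAX */
    IN_EAX_EDX,             /* value itself in the EAX/EDX pair */
    CALLER_HIDDEN_POINTER,  /* caller allocates, passes hidden pointer */
    CALLEE_ALLOCATED        /* called function allocates, returns address */
};

/* Where a structure/class return value of the given size ends up. */
enum location cdecl32_struct_return(enum compiler c, unsigned bits, bool is_pod)
{
    switch (c) {
    case WATCOM_32:
        return CALLEE_ALLOCATED;
    case BORLAND_32:
        return (is_pod && bits <= 32) ? IN_EAX : CALLER_HIDDEN_POINTER;
    case MSVC_32:
        if (is_pod && bits <= 32)
            return IN_EAX;
        if (is_pod && bits <= 64)
            return IN_EAX_EDX;
        return CALLER_HIDDEN_POINTER;
    }
    assert(!"unknown compiler");
    return CALLER_HIDDEN_POINTER;
}
```

A 64-bit plain old data structure is the case that shows the incompatibility most plainly: each of the three compilers puts it somewhere different.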

IBM's `Optlink` calling convention

The Optlink calling convention is the default calling convention used by several of IBM's x86 compilers — including VisualAge for C/C++ for OS/2, VisualAge for C/C++ for Windows, and COBOL for Windows. It is a "go faster" variant of the 32-bit OS/2 system API (APIENTRY) calling convention, that places up to three integer and (near) pointer and up to four floating point arguments in CPU registers (but with the same space left for them on the stack as would be in APIENTRY).

The full convention (documented by IBM for its COBOL for Windows compiler) is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit and 16-bit integer types are promoted to 32-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.

The first three (32-bit) integer arguments are stored in, respectively, the EAX, EDX, and ECX registers. The first four floating point arguments are stored in, respectively, the ST(0)–ST(3) registers. The caller also pushes dummy values onto the call stack for each of those arguments, to allow the called function to easily spill the argument values from registers to memory should it need to use the registers for other purposes. (Generally the calling code simply subtracts the sizes of the register arguments from ESP and doesn't bother touching the stack locations corresponding to those arguments.)
The following processor registers are volatile: EAX ECX EDX ST(0)–ST(7)
The following processor registers are non-volatile: EBX ESI EDI EBP ESP CS DS ES FS GS
The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple (near) RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, 32-bit integer, and 0:32 pointer return values will be stored by the called function in the EAX register.
64-bit integer and 16:32 pointer return values will be stored by the called function in the EAX/EDX register pair.
Floating point return values will be stored by the called function in the ST(0) FPU register.
For return values of structure or class type, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters (and not treated as a normal parameter, that can be passed in a register). The called function writes the return value to this address, and returns the address in the EAX register.
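
The promotion of 32-bit floating point arguments in variable-argument functions, which recurs in the lists above, is visible from standard C itself: a float passed through the ellipsis arrives as a double, and the called function must fetch it as such. A minimal sketch follows, with `sum_doubles` being an invented name.

```c
#include <assert.h>
#include <stdarg.h>

/*
 * Adds up count floating point arguments.  Because the parameter types
 * are not specified, any float that the caller passes has already been
 * promoted to double, so double is what must be fetched here.
 */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```

Calling `sum_doubles(2, a, b)` with `a` and `b` declared as `float` works because of the promotion; fetching them with `va_arg(ap, float)` instead would be undefined behaviour.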

Watcom's `Watcall` calling conventions

The Watcall calling convention is the default calling convention used by Watcom's x86 C, C++, and Fortran compilers. It is, in fact, two different calling conventions. The default calling convention used by the compiler, and the meaning of the __watcall keyword, vary according to whether the -3r/-4r/-5r or the -3s/-4s/-5s compiler options are used. The r variants specify a register-based Watcall calling convention, and the s variants specify a stack-based Watcall calling convention. (The default, if no option is specified, is register-based.)

The inability to specify the "other" Watcall calling convention with a keyword means that the C/C++ library headers declaring C/C++ library functions declare them with one of two calling conventions according to what command-line option was specified. Watcom C, C++, and Fortran thus ship with two sets of runtime libraries, one compiled with the register-based Watcall calling convention and one compiled with the stack-based Watcall calling convention.

The register-based Watcall calling convention is essentially a "go faster" version of the stack-based Watcall calling convention, that places up to four integer and (near) pointer arguments and (POD) structure/class type return values in CPU registers instead of on the stack or in memory.

Both the register-based and the stack-based conventions come in 16-bit and 32-bit flavours, analogous to each other but differing in how floating-point values are returned in non-FPU registers, whether (E)DX is used for return values, and whether (some) floating point arguments can be passed in registers. Both also come in near and far forms, that differ solely in the distance attribute of the function.

The following descriptions describe the actual calling convention, as employed by code compiled with the Watcom C/C++/Fortran compilers and by the code in their C/C++/Fortran run-time libraries. Note that there are significant differences between these descriptions and what the OpenWatcom documentation states. The OpenWatcom documentation does not correctly describe the actual output generated by the compilers. The following descriptions are based upon actual observations of generated code and inspection of the source code for the OpenWatcom 1.x compilers.

Given that the descriptions are of what the run-time library code calling conventions are, the following descriptions do not take the -zdp, -zdf, -zfp, -zff, -zgp, and -zgf compiler options into account. Although these options directly modify the effect of the __watcall keyword, and the compiler's default calling convention, they do not change what libraries are linked to, or the code in those libraries. In fact, they will cause linkage to the C, C++, and Fortran run-time libraries to break, because they will alter how the header files effectively declare the run-time library functions (which are declared using the __watcall keyword), and thus how application code will generate calls to run-time library functions.

Conversely, the -r compiler option is taken into account in the following, because it relates to a modification to the Watcall calling convention that exists in the run-time libraries. This option causes the Watcom C/C++/Fortran version 9, 10, and 11 compilers, and the OpenWatcom 1.x compilers, to revert to the same Watcall calling convention as used by the Watcom C/C++/Fortran version 8 compiler, whose run-time libraries are compiled to employ that particular calling convention.

Watcom's 16-bit stack-based `Watcall` calling convention

The 16-bit stack-based Watcall calling convention is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit integer type are promoted to 16-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.

The following processor registers are volatile: AX CX DX ST(0)–ST(7)

In addition, and if the -r option is not in effect, the following processor registers are volatile, according to memory model, target platform, language, and (even!) host platform:

Memory model	32-bit C compiler (with `__far16`), targeting 32-bit extended Win16	32-bit C compiler (with `__far16`), all other targets	16-bit C compiler, targeting Win16	16-bit C compiler, all other targets	32-bit C++ compiler (with `__far16`), targeting 32-bit extended Win16	32-bit C++ compiler (with `__far16`), all other targets	16-bit C++ compiler, targeting Win16	16-bit C++ compiler, all other targets
Flat/Tiny	`GS` `FS`	`GS` `FS`	None	None	`FS` `GS`	`GS`	`ES`	`ES`
Small	`ES` `FS` `GS`	`ES` `FS` `GS`	`ES`	`ES`	`FS` `GS`	`GS`	`ES`	`ES`
Compact	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`ES`	`DS` `ES`	`DS` `FS` `GS`	`DS` `GS`	`ES`	`DS` `ES`
Medium	`ES` `FS` `GS`	`ES` `FS` `GS`	`ES`	`ES`	`FS` `GS`	`GS`	`ES`	`ES`
Large	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`ES`	`DS` `ES`	`DS` `FS` `GS`	`DS` `GS`	`ES`	`DS` `ES`
Huge	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`ES`	`DS` `ES`	`DS` `FS` `GS`	`DS` `GS`	`ES`	`DS` `ES`

The following processor registers are non-volatile: BX SI DI BP SP CS

The following processor registers are also conditionally non-volatile:

If the -r option is in effect: DS ES FS GS

Otherwise, according to memory model, target platform, language, and (even!) host platform:

Memory model	32-bit C compiler (with `__far16`), targeting 32-bit extended Win16	32-bit C compiler (with `__far16`), all other targets	16-bit C compiler, targeting Win16	16-bit C compiler, all other targets	32-bit C++ compiler (with `__far16`), targeting 32-bit extended Win16	32-bit C++ compiler (with `__far16`), all other targets	16-bit C++ compiler, targeting Win16	16-bit C++ compiler, all other targets
Flat/Tiny	`DS` `ES`	`DS` `ES`	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`DS` `ES`	`FS` `DS` `ES`	`DS` `FS` `GS`	`DS` `FS` `GS`
Small	`DS`	`DS`	`DS` `FS` `GS`	`DS` `FS` `GS`	`DS` `ES`	`FS` `DS` `ES`	`DS` `FS` `GS`	`DS` `FS` `GS`
Compact	None	None	`DS` `FS` `GS`	`FS` `GS`	`ES`	`FS` `ES`	`DS` `FS` `GS`	`FS` `GS`
Medium	`DS`	`DS`	`DS` `FS` `GS`	`DS` `FS` `GS`	`DS` `ES`	`FS` `DS` `ES`	`DS` `FS` `GS`	`DS` `FS` `GS`
Large	None	None	`DS` `FS` `GS`	`FS` `GS`	`ES`	`FS` `ES`	`DS` `FS` `GS`	`FS` `GS`
Huge	None	None	`DS` `FS` `GS`	`FS` `GS`	`ES`	`FS` `ES`	`DS` `FS` `GS`	`FS` `GS`

The direction (DF) flag in the FLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple (near or far) RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, and 0:16 pointer return values will be stored by the called function in the AX register.
64-bit integer, 32-bit integer, and 16:16 pointer return values will be stored by the called function in the AX/DX register pair.
If the -fpi or -fpi87 compiler option is in effect:
- Floating point return values will be stored by the called function in the ST(0) FPU register.
If the -fpc compiler option is in effect:
- 32-bit floating point return values are stored by the called function in the AX/DX register pair. 64-bit floating point return values are stored by the called function in the AX/BX/CX/DX register quadruplet. It is not possible to return 80-bit floating point values.
- The FPU registers are not used to return values.
For return values of structure or class type, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the AX register.
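
Returning a 32-bit floating point value in an integer register pair, as happens with -fpc, simply means carving its bit pattern into two 16-bit halves. The sketch below, in portable C, assumes an IEEE 754 single-precision `float` and assumes, by analogy with 32-bit integers, that AX carries the low half and DX the high half; the text above does not state which half goes where, so treat that ordering as an assumption. The names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the AX/DX register pair. */
struct reg_pair { uint16_t ax, dx; };

/* Splits the bit pattern of a 32-bit float across the register pair. */
struct reg_pair split_float(float value)
{
    uint32_t bits;
    struct reg_pair pair;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    pair.ax = (uint16_t)(bits & 0xFFFFu);   /* assumed: low half in AX */
    pair.dx = (uint16_t)(bits >> 16);       /* assumed: high half in DX */
    return pair;
}

/* Reassembles the float on the caller's side of the return. */
float join_float(struct reg_pair pair)
{
    uint32_t bits = ((uint32_t)pair.dx << 16) | pair.ax;
    float value;

    memcpy(&value, &bits, sizeof value);
    return value;
}
```

No FPU instruction is involved at any point, which is the purpose of the option.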

Watcom's 16-bit register-based `Watcall` calling convention

The 16-bit register-based Watcall calling convention is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit integer type are promoted to 16-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The first four (16-bit) integer or 0:16 pointer arguments are stored in, respectively, the AX, BX, CX, and DX registers. Register allocation proceeds in left-to-right lexical order and stops at the first argument encountered that is not a (16-bit) integer or 0:16 pointer.

The caller does not push dummy values onto the call stack for any of those arguments. If the called function needs to spill the argument values from these registers to memory, it has to supply space itself.
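
The "stops at the first argument that does not qualify" rule is the part that is easiest to get wrong, so here is a minimal sketch of it in C. The function reports only the position of each argument within the convention's register sequence, not register names; `watcall16_allocate` and `ON_STACK` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ON_STACK (-1)

/*
 * is_int16[i] is true when argument i is a 16-bit integer or 0:16 pointer.
 * slot[i] receives the argument's position in the register sequence
 * (0 for the first register, and so on), or ON_STACK.
 */
void watcall16_allocate(const bool *is_int16, size_t count,
                        int reg_count, int *slot)
{
    bool allocating = true;
    int next = 0;

    assert(is_int16 != NULL && slot != NULL && reg_count >= 0);
    for (size_t i = 0; i < count; ++i) {    /* left-to-right lexical order */
        if (allocating && is_int16[i] && next < reg_count) {
            slot[i] = next++;
        } else {
            allocating = false;             /* allocation stops for good */
            slot[i] = ON_STACK;
        }
    }
}
```

With four registers and the arguments (integer, integer, floating point, integer), the first two arguments get register positions 0 and 1, and both of the remaining arguments go onto the call stack, even though two registers are left unused.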

If the -r option is in effect, no processor registers are volatile, not even the ones used by the caller to store parameter values. Otherwise, the following processor registers are volatile, according to memory model, target platform, language, and (even!) host platform:

Memory model	32-bit C compiler (with `__far16`), targeting 32-bit extended Win16	32-bit C compiler (with `__far16`), all other targets	16-bit C compiler, targeting Win16	16-bit C compiler, all other targets	32-bit C++ compiler (with `__far16`), targeting 32-bit extended Win16	32-bit C++ compiler (with `__far16`), all other targets	16-bit C++ compiler, targeting Win16	16-bit C++ compiler, all other targets
Flat/Tiny	`GS` `FS`	`GS` `FS`	None	None	`FS` `GS`	`GS`	`ES`	`ES`
Small	`ES` `FS` `GS`	`ES` `FS` `GS`	`ES`	`ES`	`FS` `GS`	`GS`	`ES`	`ES`
Compact	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`ES`	`DS` `ES`	`DS` `FS` `GS`	`DS` `GS`	`ES`	`DS` `ES`
Medium	`ES` `FS` `GS`	`ES` `FS` `GS`	`ES`	`ES`	`FS` `GS`	`GS`	`ES`	`ES`
Large	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`ES`	`DS` `ES`	`DS` `FS` `GS`	`DS` `GS`	`ES`	`DS` `ES`
Huge	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`ES`	`DS` `ES`	`DS` `FS` `GS`	`DS` `GS`	`ES`	`DS` `ES`

The following processor registers are non-volatile: AX BX CX DX SI DI BP SP CS ST(0)–ST(7)

The following processor registers are also conditionally non-volatile:

If the -r option is in effect: DS ES FS GS

Otherwise, according to memory model, target platform, language, and (even!) host platform:

Memory model	32-bit C compiler (with `__far16`), targeting 32-bit extended Win16	32-bit C compiler (with `__far16`), all other targets	16-bit C compiler, targeting Win16	16-bit C compiler, all other targets	32-bit C++ compiler (with `__far16`), targeting 32-bit extended Win16	32-bit C++ compiler (with `__far16`), all other targets	16-bit C++ compiler, targeting Win16	16-bit C++ compiler, all other targets
Flat/Tiny	`DS` `ES`	`DS` `ES`	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`DS` `ES`	`FS` `DS` `ES`	`DS` `FS` `GS`	`DS` `FS` `GS`
Small	`DS`	`DS`	`DS` `FS` `GS`	`DS` `FS` `GS`	`DS` `ES`	`FS` `DS` `ES`	`DS` `FS` `GS`	`DS` `FS` `GS`
Compact	None	None	`DS` `FS` `GS`	`FS` `GS`	`ES`	`FS` `ES`	`DS` `FS` `GS`	`FS` `GS`
Medium	`DS`	`DS`	`DS` `FS` `GS`	`DS` `FS` `GS`	`DS` `ES`	`FS` `DS` `ES`	`DS` `FS` `GS`	`DS` `FS` `GS`
Large	None	None	`DS` `FS` `GS`	`FS` `GS`	`ES`	`FS` `ES`	`DS` `FS` `GS`	`FS` `GS`
Huge	None	None	`DS` `FS` `GS`	`FS` `GS`	`ES`	`FS` `ES`	`DS` `FS` `GS`	`FS` `GS`

The direction (DF) flag in the FLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple (near or far) RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, and 0:16 pointer return values will be stored by the called function in the AX register.
64-bit integer, 32-bit integer, and 16:16 pointer return values will be stored by the called function in the AX/DX register pair.
If the -fpi or -fpi87 compiler option is in effect:
- Floating point return values will be stored by the called function in the ST(0) FPU register.
If the -fpc compiler option is in effect:
- 32-bit floating point return values are stored by the called function in the AX/DX register pair. 64-bit floating point return values are stored by the called function in the AX/BX/CX/DX register quadruplet. It is not possible to return 80-bit floating point values.
- The FPU registers are not used to return values.
For return values of structure or class type that are "plain old data structures" 16 bits or smaller in size, the called function stores the structure/class value in the AX register.

For return values of structure or class type that are "plain old data structures" 17 to 32 bits in size, the called function stores the structure/class value in the AX/DX register pair.

For return values of structure or class type that are not "plain old data structures" or that are larger than 32 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the AX register.
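
The three structure-return rules above amount to a small decision procedure. The following is a minimal sketch of it in C, not anything taken from Watcom's compiler; the enumeration and the function `watcall16_struct_return` are hypothetical names introduced purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Where a structure or class return value ends up, per the rules above. */
enum StructReturn {
    RETURN_IN_AX,          /* POD, 16 bits or smaller */
    RETURN_IN_AX_DX,       /* POD, 17 to 32 bits */
    RETURN_VIA_HIDDEN_PTR  /* non-POD, or larger than 32 bits */
};

/* Hypothetical helper: classifies a return type by its size in bits and
 * by whether it is a "plain old data structure". */
enum StructReturn watcall16_struct_return(size_t size_bits, int is_pod)
{
    assert(size_bits > 0);
    if (!is_pod)
        return RETURN_VIA_HIDDEN_PTR;
    if (size_bits <= 16)
        return RETURN_IN_AX;
    if (size_bits <= 32)
        return RETURN_IN_AX_DX;
    return RETURN_VIA_HIDDEN_PTR;
}
```

For instance, a POD structure holding two 16-bit integers classifies as `RETURN_IN_AX_DX`, whilst the same structure with a third member falls through to the hidden-pointer case.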

Watcom's 32-bit stack-based `Watcall` calling convention

The 32-bit stack-based Watcall calling convention is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit and 16-bit integer type are promoted to 32-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
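
The promotion rule matters most to the code that reads variable arguments back. The following sketch in standard C shows the consequence: a value that the caller wrote as a `float`, `char`, or `short` must be fetched as a `double` or an `int`. The function names are hypothetical, and the comments assume a target where `int` is 32 bits wide, as on the 32-bit compilers discussed here.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` variable arguments that the caller wrote as float values.
 * Their parameter types are unspecified, so each one was promoted to
 * double before being passed and must be read back as double. */
double sum_promoted_floats(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, double);  /* not float: the value was promoted */
    va_end(ap);
    return total;
}

/* Sums `count` variable arguments that the caller wrote as char or short.
 * Small integers are promoted to int, so they are read back as int. */
int sum_promoted_small_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);     /* not char or short */
    va_end(ap);
    return total;
}
```

A call such as `sum_promoted_floats(2, 1.5f, 2.25f)` therefore pushes two 64-bit values, not two 32-bit ones.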

The following processor registers are volatile: EAX ECX EDX ST(0)–ST(7)

In addition, and if the -r option is not in effect, the following processor registers are volatile, according to memory model, target platform, and (even!) host platform:

Memory model	C compiler (32-bit only)		C++ compiler (32-bit only)
Memory model	targetting 32-bit extended Win16	all other targets	targetting 32-bit extended Win16	all other targets
Flat/Tiny	`GS` `FS`	`GS` `FS`	`FS` `GS`	`GS`
Small	`ES` `FS` `GS`	`ES` `FS` `GS`	`FS` `GS`	`GS`
Compact	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`DS` `FS` `GS`	`DS` `GS`
Medium	`ES` `FS` `GS`	`ES` `FS` `GS`	`FS` `GS`	`GS`
Large	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`DS` `FS` `GS`	`DS` `GS`
Huge	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`DS` `FS` `GS`	`DS` `GS`

The following processor registers are non-volatile: EBX ESI EDI EBP ESP CS

The following processor registers are also conditionally non-volatile:

If the -r option is in effect: DS ES FS GS

Otherwise, according to memory model, target platform, and (even!) host platform:

Memory model	C compiler (32-bit only)		C++ compiler (32-bit only)
Memory model	targetting 32-bit extended Win16	all other targets	targetting 32-bit extended Win16	all other targets
Flat/Tiny	`DS` `ES`	`DS` `ES`	`DS` `ES`	`FS` `DS` `ES`
Small	`DS`	`DS`	`DS` `ES`	`FS` `DS` `ES`
Compact	None	None	`ES`	`FS` `ES`
Medium	`DS`	`DS`	`DS` `ES`	`FS` `DS` `ES`
Large	None	None	`ES`	`FS` `ES`
Huge	None	None	`ES`	`FS` `ES`

The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple (near or far) RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, 32-bit integer, and 0:32 pointer return values will be stored by the called function in the EAX register.
64-bit integer and 16:32 pointer return values will be stored by the called function in the EAX/EDX register pair.
If the -fpi or -fpi87 compiler option is in effect:
- Floating point return values will be stored by the called function in the ST(0) FPU register.
If the -fpc compiler option is in effect:
- 32-bit floating point return values are stored by the called function in the EAX register. 64-bit floating point return values are stored by the called function in the EAX/EDX register pair. It is not possible to return 80-bit floating point values.
- The FPU registers are not used to return values.
For return values of structure or class type, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
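
The hidden-parameter mechanism for structure return values can be written out by hand in C. The following is a sketch only: `struct Rect`, `make_rect`, and `make_rect_lowered` are hypothetical names, and the second function is a model of what a compiler arranges, not actual compiler output. Because arguments are pushed right to left, a pointer that is pushed after all of the others ends up where a leftmost parameter would be, which is why the model declares it first.

```c
#include <assert.h>
#include <stddef.h>

struct Rect { int left, top, right, bottom; };

/* What the programmer writes: a function returning a structure by value. */
struct Rect make_rect(int left, int top, int width, int height)
{
    struct Rect r = { left, top, left + width, top + height };
    return r;
}

/* A hand-written model of the same call under the convention: the caller
 * allocates the result and passes its address as an extra (normally
 * hidden) parameter. The function fills the result in and returns the
 * same address, as the real function would do in EAX. */
struct Rect *make_rect_lowered(struct Rect *hidden_result,
                               int left, int top, int width, int height)
{
    assert(hidden_result != NULL);
    hidden_result->left   = left;
    hidden_result->top    = top;
    hidden_result->right  = left + width;
    hidden_result->bottom = top + height;
    return hidden_result;
}
```

A caller of the lowered form declares a `struct Rect` of its own, passes its address, and reads the fields from that object after the call.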

Watcom's 32-bit register-based `Watcall` calling convention

The 32-bit register-based Watcall calling convention is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit and 16-bit integer type are promoted to 32-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The first four (32-bit) integer, 32-bit floating point, or 0:32 pointer arguments are stored in, respectively, the EAX, EBX, ECX, and EDX registers. Register assignment proceeds in left-to-right lexical order and stops at the first argument encountered that is not a (32-bit) integer, 32-bit floating point value, or 0:32 pointer.

The caller does not push dummy values onto the call stack for any of those arguments. If the called function needs to spill the argument values from these registers to memory, it has to supply space itself.
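
The register assignment rule can be modelled in a few lines of C. This is a sketch of the rule exactly as worded above, not of Watcom's code generator: the enumeration, the register table, and the function `watcall32_register_for` are hypothetical, and the register order is simply the one that the text gives.

```c
#include <assert.h>
#include <stddef.h>

enum ArgKind { ARG_INT32, ARG_FLOAT32, ARG_NEAR_PTR, ARG_OTHER };

/* Register names in the assignment order given in the text above. */
static const char *const watcall32_regs[4] = { "EAX", "EBX", "ECX", "EDX" };

/* Returns the register holding argument number `index` (0 = leftmost),
 * or NULL if that argument is passed on the call stack. */
const char *watcall32_register_for(const enum ArgKind *args,
                                   size_t count, size_t index)
{
    size_t used = 0;

    assert(args != NULL && index < count);
    for (size_t i = 0; i <= index; ++i) {
        /* Assignment stops for good at the first ineligible argument,
         * or once all four registers are taken. */
        if (args[i] == ARG_OTHER || used == 4)
            return NULL;
        if (i == index)
            return watcall32_regs[used];
        ++used;
    }
    return NULL;  /* not reached: the loop always returns */
}
```

Given the argument kinds integer, near pointer, other, integer, the model places the first two in EAX and EBX and leaves both of the remaining arguments on the stack, including the final integer, because assignment has already stopped.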

If the -r option is in effect, no processor registers are volatile, not even the ones used by the caller to store parameter values.

Otherwise, the following processor registers are volatile, according to memory model, target platform, and (even!) host platform:

Memory model	C compiler (32-bit only)		C++ compiler (32-bit only)
Memory model	targetting 32-bit extended Win16	all other targets	targetting 32-bit extended Win16	all other targets
Flat/Tiny	`GS` `FS`	`GS` `FS`	`FS` `GS`	`GS`
Small	`ES` `FS` `GS`	`ES` `FS` `GS`	`FS` `GS`	`GS`
Compact	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`DS` `FS` `GS`	`DS` `GS`
Medium	`ES` `FS` `GS`	`ES` `FS` `GS`	`FS` `GS`	`GS`
Large	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`DS` `FS` `GS`	`DS` `GS`
Huge	`DS` `ES` `FS` `GS`	`DS` `ES` `FS` `GS`	`DS` `FS` `GS`	`DS` `GS`

The following processor registers are non-volatile: EAX EBX ECX EDX ESI EDI EBP ESP CS ST(0)–ST(7)

The following processor registers are also conditionally non-volatile:

If the -r option is in effect: DS ES FS GS

Otherwise, according to memory model, target platform, and (even!) host platform:

Memory model	C compiler (32-bit only)		C++ compiler (32-bit only)
Memory model	targetting 32-bit extended Win16	all other targets	targetting 32-bit extended Win16	all other targets
Flat/Tiny	`DS` `ES`	`DS` `ES`	`DS` `ES`	`FS` `DS` `ES`
Small	`DS`	`DS`	`DS` `ES`	`FS` `DS` `ES`
Compact	None	None	`ES`	`FS` `ES`
Medium	`DS`	`DS`	`DS` `ES`	`FS` `DS` `ES`
Large	None	None	`ES`	`FS` `ES`
Huge	None	None	`ES`	`FS` `ES`

The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple (near or far) RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, 32-bit integer, and 0:32 pointer return values will be stored by the called function in the EAX register.
64-bit integer and 16:32 pointer return values will be stored by the called function in the EAX/EDX register pair.
If the -fpi or -fpi87 compiler option is in effect:
- Floating point return values will be stored by the called function in the ST(0) FPU register.
If the -fpc compiler option is in effect:
- 32-bit floating point return values are stored by the called function in the EAX register. 64-bit floating point return values are stored by the called function in the EAX/EDX register pair. It is not possible to return 80-bit floating point values.
- The FPU registers are not used to return values.
For return values of structure or class type that are "plain old data structures" 32 bits or smaller in size, the called function stores the structure/class value in the EAX register.

For return values of structure or class type that are not "plain old data structures" or that are larger than 32 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
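
Under the -fpc rule above, a floating point return value travels as a raw bit pattern in integer registers. The following C sketch shows what that looks like. It assumes IEEE 754 `float` and `double` representations, and it assumes that EAX carries the low-order half of a 64-bit value and EDX the high-order half, which the text above does not state. `struct RegPair` and both functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the two 32-bit registers used for -fpc return values. */
struct RegPair { uint32_t eax, edx; };

/* A 32-bit floating point value is returned as its bit pattern in EAX. */
struct RegPair return_float_in_eax(float value)
{
    struct RegPair regs = { 0, 0 };

    assert(sizeof value == sizeof regs.eax);
    memcpy(&regs.eax, &value, sizeof value);
    return regs;
}

/* A 64-bit floating point value is split across the EAX/EDX pair. */
struct RegPair return_double_in_eax_edx(double value)
{
    struct RegPair regs;
    uint64_t bits;

    assert(sizeof value == sizeof bits);
    memcpy(&bits, &value, sizeof bits);
    regs.eax = (uint32_t)(bits & 0xFFFFFFFFu);  /* assumption: low half */
    regs.edx = (uint32_t)(bits >> 32);          /* assumption: high half */
    return regs;
}
```

With IEEE 754 representations, the value 1.0 comes back as 0x3F800000 in EAX when it is a `float`, and as 0x3FF00000 in EDX with zero in EAX when it is a `double`.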

Microsoft's `Fastcall` calling convention

Microsoft's x86 fastcall calling convention is a "go faster" variant of the 32-bit x86 Win32 system API calling convention, that places up to two integer and (near) pointer arguments in CPU registers instead of on the stack or in memory. (The 32-bit x86 WINAPI calling convention already returns small POD structures in registers.)

It is documented by Microsoft in the Visual C/C++ Programming Guide, and is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit and 16-bit integer types are promoted to 32-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The first two (32-bit) integer or 0:32 pointer arguments are stored in, respectively, the ECX and EDX registers. Register allocation proceeds in left-to-right lexical order and stops at the first argument encountered that is not a (32-bit) integer or 0:32 pointer.

The caller does not push dummy values onto the call stack for any of those arguments. If the called function needs to spill the argument values from these registers to memory, it has to supply space itself.
The following processor registers are volatile: EAX ECX EDX ST(0)–ST(7)
The following processor registers are non-volatile: EBX ESI EDI EBP ESP (see below) CS DS ES FS GS
The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a (near) RET n instruction, popping the near return address and n additional bytes off the stack as it returns. So whilst non-volatile register ESP won't, strictly speaking, have its value exactly preserved across a call, it will be changed by a fixed amount relative to the value that it had upon entry.
8-bit integer, 16-bit integer, 32-bit integer, and 0:32 pointer return values will be stored by the called function in the EAX register.
64-bit integer and 16:32 pointer return values will be stored by the called function in the EAX/EDX register pair.
Floating point return values will be stored by the called function in the ST(0) FPU register.
For return values of structure or class type that are "plain old data structures" 32 bits or smaller in size, the called function stores the structure/class value in the EAX register.

For return values of structure or class type that are "plain old data structures" 33 to 64 bits in size, the called function stores the structure/class value in the EAX/EDX register pair.

For return values of structure or class type that are not "plain old data structures" or that are larger than 64 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
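
Since the called function pops its own arguments, the n in its RET n instruction depends on which arguments went into registers. The following C sketch computes n from the rules as worded above. It assumes that every stack argument occupies a whole number of 32-bit slots, and it ignores any hidden structure-return pointer; `struct FastArg` and `fastcall_ret_n` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct FastArg {
    int    int_or_near_ptr;  /* non-zero for (32-bit) integers and 0:32 pointers */
    size_t size_bytes;       /* size of the argument before promotion */
};

/* Returns the number of argument bytes that the called function pops. */
size_t fastcall_ret_n(const struct FastArg *args, size_t count)
{
    size_t in_regs = 0, stack_bytes = 0;
    int stopped = 0;

    assert(args != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (!stopped && args[i].int_or_near_ptr && in_regs < 2) {
            ++in_regs;  /* ECX first, then EDX: nothing is pushed */
        } else {
            stopped = 1;
            /* Assumption: stack slots are rounded up to 4 bytes. */
            stack_bytes += (args[i].size_bytes + 3u) / 4u * 4u;
        }
    }
    return stack_bytes;
}
```

For a function taking three integers and a 64-bit floating point value, the model puts the first two integers in ECX and EDX and yields n = 12: four bytes for the third integer and eight for the floating point value.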

Borland's Register calling conventions

Borland's x86 Register calling conventions are "go faster" variants of its own cdecl calling conventions, that place up to three integer and (near) pointer arguments in CPU registers instead of on the stack or in memory.

What combinations of registers are used and what lexical order the arguments are assigned to registers are undocumented and not guaranteed, not even from version to version of Borland's own compilers. Therefore:

It is not possible to safely mix functions using Borland's Register calling convention with code written in any other language, since it is not possible to specify exactly what Borland's Register calling convention is.
It is not possible to safely mix functions using Borland's Register calling convention with code compiled with other compilers, because even with compilers such as Watcom C/C++ that are capable of employing user-defined calling conventions, it is not possible to specify exactly what Borland's Register calling convention is.
It is not possible to safely mix functions using Borland's Register calling convention with code compiled even with different versions of Borland's own compilers.

Borland's 16-bit Register calling convention

Borland's 16-bit Register calling convention is documented (to an extent) by Borland in its Borland C/C++ version 3.1 for DOS/Windows User Guide, in Appendix A. (The following fills in some of the missing parts from Borland's 16-bit cdecl calling convention.) That documentation was dropped from its Borland C/C++ version 4.0 for DOS/Windows User Guide.

It is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit integer type are promoted to 16-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
Any 8-bit integer, 16-bit integer, 32-bit integer, or 0:16 pointer arguments are stored, if possible, in some combination of the AX, BX, and DX registers:
- 8-bit integers, 16-bit integers, and 0:16 pointers are stored in any of AX, BX, and DX, in an unspecified lexical order and for an unspecified number of arguments.
- 32-bit integers are stored in the AX/DX register pair. This obviously precludes the use of both of these registers for 8-bit integers, 16-bit integers, and 0:16 pointers.
The caller does not push dummy values onto the call stack for any of those arguments. If the called function needs to spill the argument values from these registers to memory, it has to supply space itself.
The following processor registers are volatile: AX BX CX DX ST(0)–ST(7) ES
The following processor registers are non-volatile: SI DI BP SP CS DS FS GS
The direction (DF) flag in the FLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, and near pointer return values will be stored by the called function in the AX register.
64-bit integer, 32-bit integer, and far pointer return values will be stored by the called function in the AX/DX register pair.
The FPU registers are not used to return values.
For return values of structure or class type, or of floating-point type, the called function is expected to allocate space for a value of that type, writing the return value to this address, and returning the address in the AX register.

Because the called function cannot use the call stack for this, since the call stack pointer is restored by the caller after function exit, and because it cannot use the heap either, it must use a fixed-address portion of static non-constant data storage for such structure values. This makes the 16-bit Register calling convention inherently not thread-safe when it comes to returning values of structure, class, or floating-point type.
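
The hazard of returning through static storage shows up even without threads. The following C sketch models a called function that returns the address of a fixed static object, as described above; `struct Point`, `make_point_static`, and `first_result_survives` are hypothetical names used only for this demonstration.

```c
#include <assert.h>
#include <stddef.h>

struct Point { int x, y; };

/* Model of a callee-allocated return value: the result lives in fixed
 * static storage and the function returns its address (which the real
 * convention would return in AX). */
struct Point *make_point_static(int x, int y)
{
    static struct Point result;

    result.x = x;
    result.y = y;
    return &result;
}

/* Returns non-zero if the first result is still intact after a second
 * call. With a single static buffer it is not, so this returns 0. */
int first_result_survives(void)
{
    struct Point *first  = make_point_static(1, 2);
    struct Point *second = make_point_static(3, 4);

    assert(first != NULL && second != NULL);
    return first->x == 1 && first->y == 2;
}
```

Both calls hand back the same address, so the second call overwrites the first result. Two threads calling such a function concurrently race over that one buffer in the same way.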

Borland's 32-bit Register calling convention

Borland's 32-bit Register calling convention is hardly documented at all in its Borland C/C++ version 4.0 for DOS/Windows User Guide. It is, however, more fully, albeit still incompletely, documented in its Borland C/C++ version 2.0 for OS/2 User Guide, in Appendix A. (The following fills in some of the missing parts from Borland's 32-bit cdecl calling convention.)

It is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit and 16-bit integer type are promoted to 32-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
Any 8-bit integer, 16-bit integer, 32-bit integer, or 0:32 pointer arguments are stored, if possible, in some combination of the EAX, EBX, and EDX registers, in an unspecified lexical order and for an unspecified number of arguments.

The caller does not push dummy values onto the call stack for any of those arguments. If the called function needs to spill the argument values from these registers to memory, it has to supply space itself.
The following processor registers are volatile: EAX EBX ECX EDX ST(0)–ST(7) ES
The following processor registers are non-volatile: ESI EDI EBP ESP CS DS FS GS
The direction (DF) flag in the EFLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, 32-bit integer, and near pointer return values will be stored by the called function in the EAX register.
64-bit integer and far pointer return values will be stored by the called function in the EAX/EDX register pair.
Floating point return values will be stored by the called function in the ST(0) FPU register.
For return values of structure or class type that are "plain old data structures" 32 bits or smaller in size, the called function stores the structure/class value in the EAX register.

For return values of structure or class type that are not "plain old data structures" or that are larger than 32 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.

Borland's `FastThis` calling conventions

Borland's x86 FastThis calling convention (called the "Object Data" calling convention in version 3.1 of its DOS/Windows compiler) is a variant of its own cdecl calling convention, that places the implicit this pointer, passed as a hidden parameter in calls to non-static function members of classes, in a register instead of on the stack.

Obviously enough, it only applies to the C++ compiler. It is as follows:

Arguments are pushed onto the call stack by the caller in right-to-left lexical order.
Arguments of 8-bit integer type are promoted to 16-bit integers. In the cases of variable-argument functions where the type of the parameter is not specified, and of unprototyped functions, arguments of 32-bit floating point type are promoted to 64-bit floating point.
The hidden this argument to non-static function members is passed in the (E)SI register (in near data models) or in the DS:(E)SI register pair (in far data models).
The following processor registers are volatile: (E)AX BX (16-bit compiler only) (E)CX (E)DX ST(0)–ST(7) ES
The following processor registers are non-volatile: EBX (32-bit compiler only) (E)SI (E)DI (E)BP (E)SP CS DS FS GS
The direction (DF) flag in the (E)FLAGS register must be set to zero on entry to and on exit from the function.
The called function will exit with a simple RET instruction. It is the caller's responsibility to pop the function arguments back off the call stack.
8-bit integer, 16-bit integer, (in the 32-bit compiler) 32-bit integer, and near pointer return values will be stored by the called function in the (E)AX register.
64-bit integer, (in the 16-bit compiler) 32-bit integer, and far pointer return values will be stored by the called function in the (E)AX/(E)DX register pair.
The FPU registers are not used to return values.
Return value mechanisms differ as Borland's cdecl calling convention differs between its 16-bit and 32-bit compilers:
- Borland C++ for DOS/Windows (16-bit compiler):
  
  For return values of structure or class type, or of floating-point type, the called function is expected to allocate space for a value of that type, writing the return value to this address, and returning the address in the AX register.
- Borland C++ for DOS/Windows (32-bit compiler):
  
  For return values of structure or class type that are "plain old data structures" 32 bits or smaller in size, the called function stores the structure/class value in the EAX register.
  
  For return values of structure or class type that are not "plain old data structures" or that are larger than 32 bits, the caller is expected to allocate space for a value of that type, passing a pointer to it as a hidden parameter, pushed onto the call stack after all other parameters. The called function writes the return value to this address, and returns the address in the EAX register.
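
The hidden this argument that FastThis moves into a register can be made visible by writing a member function out as a plain C function. The following is a sketch only: `struct Counter` and `counter_add` are hypothetical, and standard C offers no way to request the (E)SI register, so the pointer appears here as an ordinary first parameter.

```c
#include <assert.h>
#include <stddef.h>

struct Counter { int value; };

/* C model of a C++ member function `int Counter::add(int n)`. The `self`
 * parameter stands in for the hidden this pointer, which FastThis passes
 * in (E)SI (or DS:(E)SI) instead of pushing it onto the call stack. */
int counter_add(struct Counter *self, int n)
{
    assert(self != NULL);
    self->value += n;
    return self->value;
}
```

Under FastThis only `n` would be pushed onto the call stack; under plain cdecl the this pointer would be pushed as well.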

© Copyright 2010 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.
