You've come to this page because you've said something similar to the following:
Character 26, CTRL-Z, is the End-Of-File (EOF) character in MS-DOS/PC-DOS/DR-DOS.
This is the Frequently Given Answer to such statements.
MS-DOS didn't have an End-Of-File character of any sort. The MS-DOS system API from version 2.0 onwards treated files as simple octet streams, with no particular octet values having any special meanings. The end-of-file position of a file was recorded in file metadata. (It's the length field in the file's directory entry.) No special meaning was ascribed to character 26 (or indeed to any other character) within file data.
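As a rough illustration of what "simple octet stream" means in practice, here is a minimal sketch in standard C. The helper name binary_round_trip is hypothetical and comes from no DOS library. It writes some octets to a temporary file in binary mode and counts how many come back. It assumes only that the C library performs no translation on binary-mode streams, so the end of the data is found from the recorded file length and not from any marker octet.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only: writes `len` octets to a
   temporary file, reads them back, and returns how many octets the read
   delivered, or -1 on error.  tmpfile() opens its file in "wb+" mode,
   i.e. binary mode, so the library applies no text-mode translation. */
long binary_round_trip(const unsigned char *data, size_t len)
{
    unsigned char buf[256];
    size_t total = 0, n;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    /* Read until the library reports end of file, which it derives from
       the file's recorded length, not from any octet value in the data. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += n;
    fclose(fp);
    return (long)total;
}
```

Given the five octets 'a', 'b', 0x1a, 'c', 'd', the helper should report all five: character 26 in the middle is just another octet of data.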
All of the so-called "text file" semantics that are conventionally, but erroneously, ascribed to MS-DOS were in fact artifacts of the C libraries for C compilers targeting DOS.
The conversion of CR+LF sequences into just LF was done by the C libraries.
The handling of character 26 was done by the C libraries.
None of this was actually behaviour inherent in DOS itself.
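To make the layering concrete, here is a minimal sketch of the kind of filter a C library might apply, above the operating system, to octets it has already read. It is not taken from any actual library: the names text_mode_filter and CTRL_Z are hypothetical, and dropping a CR only when an LF immediately follows is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z 0x1a   /* character 26; hypothetical name for this sketch */

/* Hypothetical library-level "text mode" filter.  Copies the raw octets
   in raw[0..raw_len) to out (which must have room for raw_len octets),
   converting CR+LF pairs to LF and stopping at the first character 26.
   Returns the number of octets stored in out. */
size_t text_mode_filter(const unsigned char *raw, size_t raw_len,
                        unsigned char *out)
{
    size_t i, n = 0;

    assert(raw != NULL && out != NULL);
    for (i = 0; i < raw_len; ++i) {
        if (raw[i] == CTRL_Z)
            break;      /* the library, not the OS, decides this is "EOF" */
        if (raw[i] == '\r' && i + 1 < raw_len && raw[i + 1] == '\n')
            continue;   /* drop the CR of a CR+LF pair; the LF is kept */
        out[n++] = raw[i];
    }
    return n;
}
```

The operating system hands over every octet, including character 26 and whatever follows it. It is this extra layer that discards them.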
One can even see this for oneself:
In the source code for the COPY command in FreeDOS, explicit application-level code to perform all of the special handling for character 26 and for transforming CR+LF is clearly present.
Similar application-level code can be found in many other utilities, such as the FreeDOS TYPE command.
In OpenWatcom C/C++'s fgetc() library function there is the following code:

    if( c == DOS_EOF_CHAR ) {
        fp->_flag |= _EOF;
        c = EOF;
    }
There's identical code in OpenWatcom C/C++'s fread() library function.
In OpenWatcom C/C++'s read() library function one finds this code, which ensures that character 26 (which as you can see it erroneously calls "EOF") terminates all reads, resetting the position of the next read to re-read the character and resulting in a zero-byte read if character 26 is the first character read:

    if( buffer[ reduce_idx ] == 0x1a ) {    /* EOF */
        __lseek( handle,
                 ((long)reduce_idx - (long)amount_read)+1L,
                 SEEK_CUR );
        total_len += finish_idx;
        _ReleaseFileH( handle );
        return( total_len );
    }
Similar code can be found in the OpenWatcom C++ streams functions, and in the run-time libraries of Borland C/C++ for DOS and of DJGPP (the latter in its _filbuf() and read() functions).
DOS makes no distinction between "text" files and "binary" files in its system API.
Files are, to DOS, simple octet streams, with no such division.
The DOS API function for reading from a file is INT 0x21 with AH=0x3f, which, as can be seen, does not treat any characters in a file specially, nor perform any translation of the characters in a file.
DOS itself is actually a lot more like Unix in this regard than many people think.
Ironically, this greater similarity to Unix was hidden by language libraries, even though several of those language implementations attempted to give Unix-like semantics to DOS as much as they could. This is particularly ironic for DJGPP, for example.
The treatment of character 26 and the handling of "text" files was a shared delusion, common to the C libraries and the code of many programs that ran on top of DOS, from the aforementioned COPY command to text editors.
It was wholly layered above DOS itself.