DOC File Format (Microsoft Word Binary Document)
DOC is the legacy binary document format that Microsoft Word used as its default from Word 97 through Word 2003. Unlike the modern XML-based DOCX, a DOC file is a single binary container built on Microsoft's Compound File Binary Format (also called OLE2 structured storage), which organizes text, formatting, fonts, images, and metadata into streams within one file. DOC was the dominant word processing format throughout the late 1990s and 2000s, and billions of documents were saved in it before Office 2007 made DOCX the default. The format supports rich text, styles, tables, headers and footers, embedded images and OLE objects, footnotes, and track changes. Because the structure is proprietary and binary, DOC files are harder to parse than DOCX files; they are also more prone to corruption and usually larger. Microsoft published the .doc binary specification in 2008, which improved third-party support. Today DOC is considered a legacy format: Word, Google Docs, LibreOffice, and Apple Pages still open it for backward compatibility, but converting older DOC files to DOCX or PDF is the recommended path for editing, sharing, and long-term storage.
Quick Facts
- Extension: .doc
- MIME Type: application/msword
- Category: document
Advantages
- Backward compatibility with Word 97-2003 and older systems
- Opens in Word, Google Docs, LibreOffice, and Apple Pages
- Preserves rich formatting, styles, tables, and images
- Supports track changes and comments for editing workflows
- Self-contained single file with embedded images and objects
Disadvantages
- Proprietary binary format that is harder to parse than DOCX
- More prone to corruption with no easy partial recovery
- Larger file sizes than the equivalent DOCX
- Can carry hidden macros and metadata, posing security and privacy risks
- Legacy format that has not been Word's default since Office 2007
Common Use Cases
- Opening and editing legacy business documents
- Archived contracts, letters, and reports from the 1990s-2000s
- Sharing files with users on old versions of Microsoft Word
- Templates and forms created in older Office versions
- Migrating older document libraries to modern formats
Technical Details
DOC uses the Compound File Binary Format (OLE2 structured storage), which divides the file into fixed-size sectors (512 bytes in the version 3 layout that Word typically writes) organized like a small FAT file system, with named streams and storages. The main text lives in the WordDocument stream, which begins with the File Information Block (FIB). The FIB holds offsets to the document's piece table, character and paragraph property tables, and section descriptors, most of which are stored in a companion table stream (0Table or 1Table). Formatting is stored as runs of properties (CHPs for characters, PAPs for paragraphs) referenced through formatted disk pages (FKPs). Document metadata is held in separate SummaryInformation and DocumentSummaryInformation streams, and embedded objects use the ObjectPool storage. Word 97-2003 caps document text at roughly 32 MB. Microsoft released the binary format specification ([MS-DOC]) in 2008.
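The container layout described above can be checked from the first bytes of a file. The following is a minimal sketch, assuming the header field offsets given in Microsoft's published compound file specification ([MS-CFB]). The names `parse_cfb_header` and `read_header` and the dictionary keys are illustrative only and do not come from any library.

```python
import struct

# Signature that opens every Compound File Binary (OLE2) container.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(data: bytes) -> dict:
    """Decode a few fixed-offset fields from a compound file header.

    Hypothetical helper for illustration; offsets are taken from [MS-CFB].
    """
    if len(data) < 512:
        raise ValueError("a compound file header is 512 bytes long")
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file (signature mismatch)")

    # Offsets 24..31: minor version, major version, byte order, sector shift.
    minor, major, byte_order, sector_shift = struct.unpack_from("<HHHH", data, 24)
    # Offset 44: number of FAT sectors; offset 48: first directory sector.
    num_fat_sectors, first_dir_sector = struct.unpack_from("<II", data, 44)

    return {
        "major_version": major,
        "minor_version": minor,
        "little_endian": byte_order == 0xFFFE,
        "sector_size": 1 << sector_shift,  # a shift of 9 gives 512 bytes
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
    }


def read_header(path: str) -> dict:
    """Read the first 512 bytes of a file and parse them as a CFB header."""
    with open(path, "rb") as f:
        return parse_cfb_header(f.read(512))
```

This only reads the header. Locating the WordDocument stream and its FIB would additionally require walking the FAT and the directory entries, which the sketch does not attempt.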
Frequently Asked Questions about DOC
What is the difference between DOC and DOCX?
DOC is the legacy binary format used by Word 97-2003, built on Microsoft's OLE2 compound file structure. DOCX is the modern XML-based format that has been Word's default since Word 2007. It is a ZIP archive of XML parts based on the open Office Open XML standard, so DOCX files are typically smaller, more reliable, and easier to recover when damaged.
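Because the two formats use different containers, they can usually be told apart by their leading bytes regardless of the file extension. The sketch below assumes the standard OLE2 compound file signature and the ZIP local file header signature. The function names are hypothetical, and the result identifies only the container, not the application that wrote it.

```python
# Leading bytes of the two container types.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (DOC)
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header (DOCX)


def sniff_container(head: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if head.startswith(CFB_MAGIC):
        return "ole2"  # binary DOC, but also XLS, PPT, MSI, ...
    if head.startswith(ZIP_MAGIC):
        return "zip"   # DOCX, but also XLSX, PPTX, or any other ZIP
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as f:
        return sniff_container(f.read(8))
```

A result of "ole2" for a file named .docx, or "zip" for a file named .doc, suggests the extension does not match the actual format.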
How do I open a DOC file without Microsoft Word?
Google Docs, LibreOffice Writer, Apple Pages, and WPS Office all open DOC files. You can also convert a DOC to PDF or DOCX with FileChange to view it in any browser or modern word processor.
How do I convert DOC to DOCX?
Open the DOC in Word and use Save As, or use FileChange to convert DOC to DOCX in your browser. Converting to DOCX produces a smaller, more reliable file and unlocks modern Word features.
How do I convert DOC to PDF?
FileChange converts DOC to PDF directly in your browser. Converting to PDF locks the layout, fonts, and images so the document looks the same on every device, which is ideal for sharing and printing.
Are DOC files safe to open?
DOC files can contain macros (VBA code) that may carry malware, and they often retain hidden metadata such as author names and revision history. Only open DOC files from trusted sources, keep macros disabled by default, and consider converting to PDF or DOCX to strip embedded code; a plain .docx file cannot store macros.
Why is my DOC file larger than a DOCX version of the same document?
DOC stores content in an uncompressed binary container, while DOCX is a ZIP archive of compressed XML. For the same text and images, the DOCX is usually noticeably smaller, which is one reason Microsoft switched to it as the default.