Reading binary files in Modern C++
For a general-purpose programming language used to write desktop applications as well as to program embedded systems, C++ makes it surprisingly difficult to simply read a bunch of bytes from a binary file. Compared with other high-level programming languages, it's complicated.
Maybe it's a bit unfair to compare C++ to Java or Python in this respect, but then again, maybe it isn't. Or maybe reading data from a binary file is not such a common use case after all, but somehow I don't buy that. So, let's have a quick look at how it is done in some other languages, and then proceed to find out what that requires in C++.
In this context I'm only interested in reading the complete contents of the file, which will work just fine for small(ish) files. My practical purpose is to read MIDI System Exclusive files, which tend to be just a few kilobytes, or at most a few megabytes in size. In a modern desktop or cloud service context this is peanuts, but if you need to read files that are hundreds of megabytes in size, you will need to resort to streaming, to keep the memory use of your program in check.
Reading a binary file in Python
Python has the convenient `bytes` object. I made a helper function to read in a file and return a `bytes` object, complete with type hints for Python 3.5 or later (the f-string in the error message needs Python 3.6 or later).
```python
import sys


def read_file_data(filename: str) -> bytes:
    try:
        with open(filename, 'rb') as f:
            return f.read()
    except FileNotFoundError:
        print(f'File not found: {filename}')
        sys.exit(-1)
```
Using the `with` statement ensures that the file is closed.
For more information about reading binary files, take a look at the articles from Python Morsels, starting with How to read a binary file in Python.
Reading a binary file in Scheme
Here is a curveball for you: reading a binary file in Scheme, or more accurately, using Chez Scheme, which is one of the more established Scheme dialects along with GNU Guile.
The Scheme `bytevector` is roughly the equivalent of the Python `bytes` object. You can use a file input port to access the contents of a file, and get the full contents of the file using the `get-bytevector-all` function.
As with Python, I made a small helper function to read the contents of a file:
```scheme
(import (chezscheme))

(define (read-file-data filename)
  (get-bytevector-all (open-file-input-port filename)))
```
For more information on Scheme, refer to The Scheme Programming Language, Fourth Edition by R. Kent Dybvig, the principal developer of Chez Scheme.
Reading a binary file in Rust
Rust is an up-and-coming systems programming language, which has gained mindshare in recent years among programmers who like a more predictable language than C++, with fewer obvious discontinuities (the technical term for "WTF"). For many of the benefits of Rust (with some of the negatives), see Why Rust?.
My little helper function to read a binary file in Rust looks like this (with the required `use` declarations):
```rust
use std::io::prelude::*;
use std::fs;

fn read_file_data(name: &String) -> Vec<u8> {
    let mut f = fs::File::open(&name).expect("no file found");
    let mut buffer = Vec::new();
    f.read_to_end(&mut buffer).expect("unable to read file");
    buffer
}
```
It returns a `Vec<u8>`, where `Vec` is a Rust collection type with a generic type parameter `u8`. Note that Rust releases resources when variables go out of scope, which also causes the `std::fs::File` object to close automatically.
For an occasionally updated series on doing stuff with Rust, see the Flecks of Rust newsletter on this site.
Reading a binary file in Modern C++
The solutions for reading a binary file in Python, Scheme and Rust were straightforward enough to use. When I started to figure out how to achieve the same in C++, I soon realised that it would be a little different.
Modern C++ does have the `std::vector` collection type. It is closest to the `Vec` type of Rust, also being a template type. Since I want to use the C++ `std::byte` type for the items in the vector, I know I will be needing a `std::vector<std::byte>` instance.
For accessing the file, you can use the `ifstream` class. I haven't found a way to read all the file data with one method call, so the next best thing is to find out the size of the file, and then read exactly that number of bytes.
With the help of the information found in Modern C++ Programming Cookbook, 2nd Ed by Marius Bancila, I came up with the following helper function:
```cpp
#include <cstddef>  // std::byte
#include <fstream>
#include <string>
#include <vector>

std::vector<std::byte> readFileData(const std::string& name) {
    std::ifstream inputFile(name, std::ios_base::binary);

    // Determine the length of the file by seeking
    // to the end of the file, reading the value of the
    // position indicator, and then seeking back to the beginning.
    inputFile.seekg(0, std::ios_base::end);
    auto length = inputFile.tellg();
    inputFile.seekg(0, std::ios_base::beg);

    // Make a buffer of the exact size of the file and read the data into it.
    std::vector<std::byte> buffer(length);
    inputFile.read(reinterpret_cast<char*>(buffer.data()), length);

    inputFile.close();
    return buffer;
}
```
Note that this function does not perform error checking when the file is opened, or try to find out if the read succeeded.
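If you want those checks, one possible shape is sketched below. This is only an illustration under my own assumptions: the name `readFileDataChecked` is made up, and throwing `std::runtime_error` is just one way to report failures (returning an empty optional would be another).

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical variant of readFileData that reports failures by throwing.
std::vector<std::byte> readFileDataChecked(const std::string& name) {
    std::ifstream inputFile(name, std::ios_base::binary);
    if (!inputFile) {
        throw std::runtime_error("cannot open file: " + name);
    }

    // tellg() reports failure as position -1, so look at the signed offset.
    inputFile.seekg(0, std::ios_base::end);
    const std::streamoff length = inputFile.tellg();
    if (length < 0) {
        throw std::runtime_error("cannot determine size of file: " + name);
    }
    inputFile.seekg(0, std::ios_base::beg);

    std::vector<std::byte> buffer(static_cast<std::size_t>(length));
    inputFile.read(reinterpret_cast<char*>(buffer.data()), length);

    // gcount() tells how many bytes the last read actually delivered.
    if (inputFile.gcount() != length) {
        throw std::runtime_error("short read from file: " + name);
    }
    return buffer;
}
```

The caller can then wrap the call in a `try` block and decide what a missing or unreadable file should mean for the program, much like the Python version does with `FileNotFoundError`.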
What I find weird is that there is no `read` function for `std::byte`, which is conceptually wrong because `std::byte` would be exactly the right type here. Instead, you need to use `reinterpret_cast`. Of course, it's all bits anyway, but I would like them to be the most obvious and correct bits.
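You specify the size of the vector when you initialize it, but you need to be careful with the syntax. With an integer element type such as `int`, the uniform initialization syntax of Modern C++ (curly brackets around the value), as in `std::vector<int> buffer{n};` for some integer `n`, selects the initializer-list constructor, and you end up with a one-element vector that has the current value of `n` as its sole element. `std::byte` has no implicit conversion from integers, so `std::vector<std::byte> buffer{length};` should not fall into that particular trap, but parentheses, like `std::vector<std::byte> buffer(length);`, always ask for a vector of the given size. Another day, another C++ footgun.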
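The difference between the two syntaxes is easiest to see with an integer element type. Here is a minimal sketch using `std::vector<int>`; the two helper functions are hypothetical and exist only to show the resulting sizes.

```cpp
#include <cstddef>
#include <vector>

// Parentheses call the size constructor: five value-initialized ints.
std::size_t sizeWithParentheses() {
    std::vector<int> v(5);
    return v.size();
}

// Braces prefer the initializer-list constructor: one int with the value 5.
std::size_t sizeWithBraces() {
    std::vector<int> v{5};
    return v.size();
}
```

Calling `sizeWithParentheses()` gives 5, while `sizeWithBraces()` gives 1, which is why the parentheses are the safer habit when the argument is meant to be a size.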
The type of value returned by the `tellg` method of `std::ifstream` is a `std::fpos`, while the size of the vector is a `size_type`, which is usually a `typedef` for `std::size_t`, which is... oh, never mind. We seem to have descended into another pit of madness in the C++ type system. Somehow it all seems to work, where the definition of "work" is "compiles with clang++ and runs on macOS 12".
Truth be told, the biggest differences in C++ were the need to find out the size of the file, and to make a buffer to hold that exact number of bytes. You can write a helper function to paper over these differences, but shouldn't that be a standard library function?
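One way to avoid the seeking, at least, is to ask the filesystem for the size. The sketch below assumes C++17 and uses `std::filesystem::file_size`; the function name `readFileDataFs` is hypothetical, and this is an alternative I am suggesting, not something from the cookbook mentioned earlier.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <vector>

// Hypothetical alternative that gets the size from the filesystem.
std::vector<std::byte> readFileDataFs(const std::filesystem::path& path) {
    // file_size throws std::filesystem::filesystem_error if the file is missing.
    const auto size = std::filesystem::file_size(path);

    std::ifstream inputFile(path, std::ios_base::binary);
    std::vector<std::byte> buffer(size);
    inputFile.read(reinterpret_cast<char*>(buffer.data()),
                   static_cast<std::streamsize>(size));

    // The stream enters a failed state if it could not be opened
    // or if fewer bytes than requested were available.
    if (!inputFile) {
        throw std::runtime_error("could not read " + path.string());
    }
    return buffer;
}
```

It is still a helper you have to write yourself, and the `reinterpret_cast` is still there, so it only papers over part of the problem.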
So there you have it: reading a binary file in Modern C++. It's not exactly the kind of straightforward solution you would expect for what must be a common task, so if I'm missing something obvious, then please let me know!