Using the C++ 17 std::byte data type

Platform-dependent data types are always a little bit disturbing. A programming language should have data types that don't change in size from one platform to another. Countless hours have been wasted on defining macros and other contraptions that try to ensure that a data type has a constant size across platforms. While they have succeeded to some extent, the task has become ever more tedious with new microprocessors and wider data types, from 8-bit to 16-bit to 32-bit to 64-bit integers and beyond.

Even the lowly 8-bit byte, or octet, has not been immune to this. The fundamental problem with C and C++ (and with many other languages as well) has always been their original insistence on treating bytes and characters as somehow equivalent. This worked when everyone used ASCII, but that was a long time ago. First came the need for "wide" characters, and ultimately the move to Unicode, whose code points need up to 21 bits and are often stored as 32-bit values. It became clear that the idea of one byte equalling one character was no good anymore (it never was, of course, but we didn't know it yet).

So we have ended up in a situation where handling binary data and handling text have completely different requirements. Beyond the practically uniform definition of a byte as an octet (in modern computer architectures), there has been confusion in C and C++ about whether bytes, in the guise of char, are signed or unsigned. It's tempting to think of a signed char as seven bits of value plus a sign bit, and of an unsigned char as a full eight bits of value. However, whether the plain char type is signed or unsigned is up to the platform and the compiler, so the madness still continues. (It's even more complicated, but I don't want to go there, since there is a better way, so read on.)

Use std::byte for binary data

Just as it's better to forget about strings as arrays of characters, and use the C++ std::string type instead, it's better to adopt the std::byte data type for dealing with binary data.

The std::byte data type was introduced in C++ 17. It is deliberately limited and somewhat peculiar. It describes only a collection of bits and the operations you can perform on them (bitwise logic and shifts), and it does not do double duty as a character type or an arithmetic type.

When you initialize a variable of type std::byte with a value from 0 to 255 (inclusive), you end up with a bit pattern describing that value. If you want to use it for anything other than manipulating those bits, you need to convert it to a numeric value. One way is the std::to_integer function template, which takes the target type as an explicit template argument, as in std::to_integer<int>(b).

Restricting std::byte to a collection of bits stops you from attaching any semantics to the value. That task belongs to whatever class or function actually knows how those bits map to integer (or even floating-point) values. For more information on the rationale for std::byte, see Marc Gregoire's blog.

NOTE: If you think you could just as well use std::string for binary data, that's not such a great idea, mostly because of the wrong semantics (and that really does matter). See the Simplify C++ blog entry "std::string is not a Container for Raw Data" for details.

C++ 17 example of std::byte

Here is a quick C++ 17 example of using the std::byte data type for some lightweight operations on bytes. The bytes in question come from the world of MIDI System Exclusive messages. These are just vectors of bytes, anywhere from a few bytes to hundreds of kilobytes long, that are passed around over a MIDI interface (old-school serial with 5-pin DIN connectors, modern USB, or even Bluetooth).

The full example program below produces an Identity Request message that you can send to a MIDI synthesizer. If the synthesizer supports the Identity Request function, it replies with a similar message that can be interpreted as an Identity Reply. These are known as Universal System Exclusive messages.

If you have a MIDI-capable synthesizer connected to your computer, you could try sending the bytes to it using Geert Bevin's excellent SendMIDI utility. Be sure to leave off the initial F0 and the terminating F7 bytes, because SendMIDI adds them when you use its syx command. A suitable command would be sendmidi dev "Your MIDI Port Name" hex syx 7e 7f 06 01, where 7f is the device ID meaning "all devices". See the SendMIDI documentation for details.

    // Using the C++17 std::byte type to make a MIDI SysEx message.
    // Compile using clang on macOS with "clang++ -std=c++17 bytes.cpp -o bytes"
    // For more details, see:
    // - C++ Reference: https://en.cppreference.com/w/cpp/types/byte
    // - Marc Gregoire's blog: http://www.nuonsoft.com/blog/2018/06/03/c17-stdbyte/
    // - MMA reference: https://www.midi.org/specifications-old/item/table-4-universal-system-exclusive-messages
    // - Geert Bevin's SendMIDI: https://github.com/gbevin/SendMIDI

    #include <cstddef>   // std::byte, std::to_integer
    #include <iomanip>   // std::setw, std::setfill
    #include <iostream>
    #include <vector>

    int main() {
        // Define the bytes that make up the MIDI System Exclusive
        // message for an Identity Request to send to a synthesizer.
        // It's convenient to use a vector of bytes instead of
        // individual variables of type std::byte,
        // but the initialization is kind of tedious.
        auto identityRequest = std::vector<std::byte> {
            std::byte { 0xf0 },  // System Exclusive initiator
            std::byte { 0x7e },  // Universal Non-Real-time message
            std::byte { 0x7f },  // Device ID (0x7f = all devices)
            std::byte { 0x06 },  // General Information command
            std::byte { 0x01 },  // Identity Request
            std::byte { 0xf7 }   // System Exclusive terminator
        };

        // Print the contents of the vector as two-digit
        // hex numbers. We need to convert each byte to an
        // integer with std::to_integer<int>, because std::byte is
        // just a collection of bits and cannot be printed directly.
        std::cout << std::hex << std::setfill('0');
        for (auto b : identityRequest) {
            std::cout << std::setw(2) << std::to_integer<int>(b) << " ";
        }
        std::cout << std::endl;

        // If you send this MIDI message to your synthesizer,
        // for example using Geert Bevin's SendMIDI, it may
        // respond with an Identity Reply message.
    }

If you compile and run this program, you should see this output:

f0 7e 7f 06 01 f7

Hopefully this was useful information if you need to deal with binary data in C++. As of this writing in 2022, most mainstream compilers seem to support nearly all C++ 17 features.

For a concise take on the most useful new features of "Modern C++" (especially if you have used C++ before, but haven't kept up with it) see the overview Welcome back to C++ - Modern C++ by Microsoft.