Flecks of Rust #11: asc, a Rust command-line utility for ASCII code lookup

When working with old file formats from the 1980s and 1990s, I found I needed a quick way to look up a byte and see which ASCII character it might represent. So I wrote a command-line utility in Rust to provide that information.

The scope of the program is purposefully limited to ASCII, even though Unicode has finally become accepted as the standard way of encoding text (and UTF-8 in particular has become the norm on the web). The file formats I'm dealing with were designed before Unicode reached that position, and possibly even before Unicode was published.

On Unix systems you can usually get information about the ASCII character set with the command man ascii, but it gets a little awkward to read the manual page and scan the separate octal, hexadecimal and decimal tables for the right code.

There is also the ascii utility by Eric S. Raymond (easily installed on macOS using Homebrew), but that gave me a little too much information for my taste, and I wanted to do something in Rust, so asc was born as a more minimalist Rust alternative.

In this post I'm going to give a quick walkthrough of the design of asc, paying special attention to how some things are done in (hopefully) idiomatic Rust.

You can find the source code for asc on GitHub. The program is published under the MIT License.

Command-line arguments

Without any command-line arguments, asc just prints the ASCII table:

% asc
    0	00	00000000	000	NUL
    1	01	00000001	001	SOH
    2	02	00000010	002	STX
    .
    .
    .
  125	7D	01111101	175	}
  126	7E	01111110	176	~
  127	7F	01111111	177	DEL

The table has five columns: the first four give you the ASCII code of a character in decimal, hexadecimal, binary and octal (respectively), and the last column contains the name of the character. The control characters have their standard names, while the printable characters are shown using their usual representation.
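Each row of the table is a single number printed in four bases. As a minimal sketch of how Rust's format specifiers produce those columns, here is a hypothetical helper, format_columns, that takes the code as a u8 (asc itself works with an i64). Its format string mirrors the one used in print_row further down.

```rust
/// Hypothetical helper (not part of asc): formats one ASCII code as the four
/// numeric columns of the table, separated by tabs.
fn format_columns(code: u8) -> String {
    // The same value is passed four times; each specifier picks a base and width:
    // {:3}   decimal, right-aligned in 3 characters
    // {:02X} uppercase hexadecimal, zero-padded to 2 digits
    // {:08b} binary, zero-padded to 8 digits
    // {:03o} octal, zero-padded to 3 digits
    format!("{:3}\t{:02X}\t{:08b}\t{:03o}", code, code, code, code)
}
```

For example, format_columns(65) gives " 65\t41\t01000001\t101", which matches the numeric part of the asc 65 output shown below.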

If you give asc one command-line argument, the program tries to interpret it as the ASCII code of a character, written in decimal, hexadecimal, binary, or octal. Decimal codes have no prefix, while the others use 0x for hexadecimal, 0b for binary, and 0o for octal.

The character code must be in the range 0...127 inclusive. Some examples of lookups:

% asc 65
    65	41	01000001	101	A
% asc 10
    10	0A	00001010	012	LF

(The program also has an Easter egg related to the command-line parameter, but sadly nobody seems to get the joke.)

The main program

The main program of asc first collects the command-line arguments. It then calls the run_app function with every argument except the first (the program name), and exits to the shell with either exitcode::OK or exitcode::USAGE, depending on whether run_app returns Ok or Err. These values are defined in the exitcode crate.

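use std::env;

fn main() {
    // Collect the arguments; args[0] is the name of the executable
    let args: Vec<String> = env::args().collect();

    // Run the app on the remaining arguments and exit with a sysexits-style
    // code from the exitcode crate (OK = 0, USAGE = 64)
    std::process::exit(match run_app(&args[1..]) {
        Ok(_) => exitcode::OK,
        Err(err) => {
            // Display formatting prints the message without surrounding quotes
            eprintln!("error: {}", err);
            exitcode::USAGE
        }
    });
}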
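Since Rust 1.61, main can also return std::process::ExitCode, which avoids both the exitcode crate and the call to std::process::exit. The sketch below is an alternative, not what asc does: the helper names exit_status and to_exit_code are hypothetical, and the values 0 and 64 are the BSD sysexits codes EX_OK and EX_USAGE, which exitcode::OK and exitcode::USAGE also stand for.

```rust
use std::process::ExitCode;

/// Hypothetical helper: maps the result of run_app to a numeric exit status.
/// 0 is EX_OK and 64 is EX_USAGE from BSD sysexits.h.
fn exit_status(result: &Result<(), &'static str>) -> u8 {
    match result {
        Ok(()) => 0,
        Err(_) => 64,
    }
}

/// Hypothetical helper: reports an error on stderr, then converts the
/// status into an ExitCode that main can return.
fn to_exit_code(result: Result<(), &'static str>) -> ExitCode {
    if let Err(err) = &result {
        eprintln!("error: {}", err);
    }
    ExitCode::from(exit_status(&result))
}
```

main would then be declared as fn main() -> ExitCode and end with to_exit_code(run_app(&args[1..])). One practical difference: std::process::exit does not run destructors, while returning from main does.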

The run_app function

The actual work happens in the run_app function, which returns a Result value: either Ok, or Err with a message in a static string. It takes a slice of the actual command-line arguments, not including the very first one, which is the name of the executable program (like "asc"). If the slice is empty, the function just prints the full ASCII table using the print_table helper function.

Only the first command-line argument, if any, is considered. The run_app function chooses the base for parsing from the argument's prefix (hexadecimal, binary, or octal). If there is no prefix, the argument is treated as a decimal number.

There is a helper function for parsing the command-line argument as a number, appropriately called parse_number. It takes the part of the argument after the prefix, or the whole argument if there is no known prefix, together with the number base of the conversion (16, 2, 8, or 10). It returns a Result: either an i64 value wrapped in Ok, or an Err with a message in a static string.

fn run_app(args: &[String]) -> Result<(), &'static str> {
    if args.is_empty() {
        // No arguments: print the full ASCII table
        print_table();
        Ok(())
    }
    else {
        // Only the first argument is considered; borrow it instead of cloning it
        let arg = &args[0];
        if arg.starts_with("me") {
            println!("Anything?");
            Ok(())
        }
        else {
            // Pick the number base from the prefix, if any
            let number = if arg.starts_with("0x") {
                parse_number(&arg[2..], 16)
            }
            else if arg.starts_with("0b") {
                parse_number(&arg[2..], 2)
            }
            else if arg.starts_with("0o") {
                parse_number(&arg[2..], 8)
            }
            else {
                // assume the argument is a decimal number with no prefix
                parse_number(arg, 10)
            };
            match number {
                Ok(n) => {
                    print_row(n);
                    Ok(())
                },
                // Pass on parse_number's own message ("wrong format" or
                // "value out of range") instead of always reporting a range error
                Err(e) => Err(e)
            }
        }
    }
}
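The chain of starts_with checks followed by slicing with [2..] can also be written with str::strip_prefix, which returns the rest of the string only when the prefix matches. Here is a minimal sketch with a hypothetical helper, split_radix, that maps an argument to its digits and base using the same prefixes as run_app:

```rust
/// Hypothetical helper (not part of asc): splits an argument into the digits
/// to parse and the base to parse them in.
fn split_radix(arg: &str) -> (&str, u32) {
    // strip_prefix returns Some(remainder) only if arg starts with the prefix
    if let Some(rest) = arg.strip_prefix("0x") {
        (rest, 16)
    } else if let Some(rest) = arg.strip_prefix("0b") {
        (rest, 2)
    } else if let Some(rest) = arg.strip_prefix("0o") {
        (rest, 8)
    } else {
        // No known prefix: treat the whole argument as a decimal number
        (arg, 10)
    }
}
```

With a helper like this, run_app could call parse_number(digits, radix) once and propagate its error with the ? operator instead of matching on it.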

If the parse was successful, the print_row function is used to output the information about the particular ASCII code:

fn print_row(i: i64) {
    println!("{:3}\t{:02X}\t{:08b}\t{:03o}\t{}", i, i, i, i, CHAR_NAMES[i as usize]);
}

This function uses the static array of constant strings called CHAR_NAMES, defined at the start of the program, to look up the name of the character by its ASCII code. The code has already been verified to be in the range 0...127, so it is safe to use it as an index into the table, which has exactly 128 elements. Since the ASCII code was parsed into an i64, it has to be converted to a usize for indexing.
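Keeping all 128 names in one table is simple and fast. As a sketch of a different trade-off, the code below lists only the 32 control-character names and derives the printable characters with char::from. The names CONTROL_NAMES and char_name are hypothetical, and labeling the space (code 32) as SP is an assumption, since the post does not show how asc prints it. Returning an Option also turns an out-of-range code into a value instead of a panic.

```rust
/// Standard names of the control characters 0..=31.
const CONTROL_NAMES: [&str; 32] = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
];

/// Hypothetical alternative to a 128-entry table: returns the name of an
/// ASCII code, or None if the code is above 127.
fn char_name(code: u8) -> Option<String> {
    match code {
        0..=31 => Some(CONTROL_NAMES[code as usize].to_string()),
        // Assumption: the space is labeled "SP"; asc's actual label is not shown
        32 => Some("SP".to_string()),
        // Printable characters are shown as themselves
        33..=126 => Some(char::from(code).to_string()),
        127 => Some("DEL".to_string()),
        _ => None,
    }
}
```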

Parsing the number

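To parse the argument string as a number in the desired base, the parse_number function is used. It first picks a width for the string representation of the number in each base: eight digits for binary, three for octal, three for decimal, and two for hexadecimal. Strictly speaking, the width and fill are not needed for parsing: i64::from_str_radix accepts leading zeros but does not require them, so "41" and "041" yield the same value. The padding does have two side effects: an empty digit string (for example, the bare prefix 0x) becomes all zeros and parses as 0, and a short negative input such as -5 becomes 0-5, which fails to parse.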

First, the format! macro pads the string on the left with zeros to the calculated width. Then the i64::from_str_radix function parses the padded string into a number of type i64.
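A small experiment shows what the padding step does and does not change. The function pad_and_parse below is hypothetical: it repeats the padding and parsing steps of parse_number, without the range check, so its results can be compared with calling from_str_radix directly.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for the first two steps of parse_number:
/// left-pad with zeros to `width`, then parse in the given radix.
fn pad_and_parse(s: &str, radix: u32, width: usize) -> Result<i64, ParseIntError> {
    // "{:0>width$}" means: fill with '0', align right, pad to `width` characters
    let digits = format!("{:0>width$}", s, width = width);
    i64::from_str_radix(&digits, radix)
}
```

For example, pad_and_parse("41", 16, 2) and i64::from_str_radix("41", 16) both give Ok(65). pad_and_parse("", 16, 2) gives Ok(0), where plain from_str_radix reports an error, and pad_and_parse("-5", 10, 3) fails because the padded string is "0-5".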

The result of the parsing is checked to ensure that the number represents a valid ASCII code from 0 to 127. Numbers greater than 127 are rejected by returning Err, while accidental negative numbers are turned into positive ones by taking their absolute value.

Finally, the result is returned wrapped in Ok, or an error message is returned with an Err.

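fn parse_number(s: &str, radix: u32) -> Result<i64, &'static str> {
    // Digits per base: 8 for binary, 3 for octal and decimal, 2 for hexadecimal
    let width = match radix {
        2 => 8,
        8 => 3,
        10 => 3,
        16 => 2,
        _ => 3
    };

    // Left-pad with zeros to the width, e.g. "A" becomes "0A" in base 16
    let digits = format!("{:0>width$}", s, width=width);
    let number = i64::from_str_radix(&digits, radix);
    match number {
        Ok(n) => {
            // Take the absolute value *before* the range check, so that an input
            // like -200 is rejected instead of becoming 200 and indexing past the
            // end of CHAR_NAMES. unsigned_abs cannot overflow, unlike abs on i64::MIN.
            let n = n.unsigned_abs();
            if n > 0x7f {
                Err("value out of range")
            }
            else {
                Ok(n as i64)
            }
        },
        Err(_) => Err("wrong format"),
    }
}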
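Another option would be to have the parser return a u8, which holds every ASCII code and would let print_row index with usize::from(code) instead of an as cast. The sketch below, under the hypothetical name parse_code, keeps the absolute-value behavior but uses i64::unsigned_abs and u8::try_from for the range check. It also drops the zero padding, so an empty digit string is reported as a wrong format instead of being read as 0.

```rust
use std::convert::TryFrom;

/// Hypothetical variant of parse_number that returns the ASCII code as a u8.
fn parse_code(s: &str, radix: u32) -> Result<u8, &'static str> {
    // No padding: from_str_radix accepts leading zeros anyway
    let n = i64::from_str_radix(s, radix).map_err(|_| "wrong format")?;
    // unsigned_abs turns accidental negatives positive without overflowing;
    // try_from fails for anything above 255, the guard rejects 128..=255
    match u8::try_from(n.unsigned_abs()) {
        Ok(code) if code <= 0x7f => Ok(code),
        _ => Err("value out of range"),
    }
}
```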

Using the utility

My preferred way of using the utility is to first build a release version using Cargo:

% cargo build --release
Compiling exitcode v1.1.2
Compiling asc v0.1.1 (/Users/me/Projects/Rust/asc)
 Finished release [optimized] target(s) in 21.82s

Since I have created a bin subdirectory in my home directory and added it to my PATH, I just need to copy the resulting binary over, and I'm ready to go:

% cp target/release/asc ~/bin

Just to confirm I'm running what I think I'm running, I can check:

% which asc
/Users/me/bin/asc
% asc 0x48
 72	48	01001000	110	H

Seems to work!

Conclusion

This concludes the walkthrough of the asc utility, written in Rust. I hope you find it useful too. I use it almost daily, because I still can't remember all the ASCII codes!