asc, a Rust command-line utility for ASCII code lookup

When working with old file formats from the 1980s and 1990s I found the need to have a quick utility to look up a byte to find out which ASCII character it possibly represents. So I wrote a command-line utility in Rust to provide that information.
The scope of the program is purposefully limited to ASCII, even though Unicode has finally become accepted as the standard way of encoding text (and UTF-8 in particular has become the norm on the web). The file formats I'm dealing with were designed before Unicode reached that position, and possibly even before Unicode was published.
On Unix systems you can usually get information about the ASCII character set with the command man ascii, but it gets a little awkward to read the manual page and scan for the right code in the different tables for octal, hexadecimal or decimal values.
There is also the ascii utility by Eric S. Raymond
(easily installed on macOS using Homebrew), but that gave me a little too much information for
my taste, and I wanted to do something in Rust, so asc
was born as a more minimalist
Rust alternative.
In this post I'm going to walk quickly through the design of asc, paying special attention to how some things are done in (hopefully) idiomatic Rust.
You can find the source code for asc
on GitHub.
The program is published under the MIT License.
Without any command-line arguments, asc
just prints the ASCII table:
% asc
  0  00  00000000  000  NUL
  1  01  00000001  001  SOH
  2  02  00000010  002  STX
  .
  .
  .
125  7D  01111101  175  }
126  7E  01111110  176  ~
127  7F  01111111  177  DEL
The table has five columns: the first four give you the ASCII code of a character in decimal, hexadecimal, binary and octal (respectively), and the last column contains the name of the character. The control characters have their standard names, while the printable characters are shown using their usual representation.
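As a minimal sketch of how one such row could be produced with Rust's format specifiers: the helper name format_row is hypothetical (it is not part of asc), and it returns a String instead of printing so that the result is easy to check. The width and fill flags pad each column: {:3} right-aligns the decimal code in three columns, {:02X} gives two uppercase hexadecimal digits, {:08b} eight binary digits, and {:03o} three octal digits.

```rust
/// Formats one table row: decimal, hexadecimal, binary, octal, and the name.
/// `format_row` is a hypothetical helper used only for this illustration.
fn format_row(code: u8, name: &str) -> String {
    format!("{:3}\t{:02X}\t{:08b}\t{:03o}\t{}", code, code, code, code, name)
}

fn main() {
    // Print a few sample rows, separated by tabs.
    println!("{}", format_row(0, "NUL"));
    println!("{}", format_row(10, "LF"));
    println!("{}", format_row(65, "A"));
}
```

The columns are separated by tab characters, so the exact alignment on screen depends on the terminal's tab stops.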
If you give asc one command-line argument, the program tries to interpret it as the ASCII code of a character (decimal, hexadecimal, binary, or octal). Decimal codes have no prefix, while the prefixes for the other bases are:

0x = hexadecimal
0b = binary
0o = octal

The character code must be in the range 0...127 inclusive. Some examples of lookups:
% asc 65
 65  41  01000001  101  A
% asc 10
 10  0A  00001010  012  LF
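The prefix rules above can be sketched with str::strip_prefix from the standard library. The helper name split_radix is an assumption made for this illustration; asc itself handles the prefixes differently, as shown later in the post.

```rust
/// Splits an argument into its digits and the number base implied by its prefix.
/// `split_radix` is a hypothetical helper, not a function from asc.
fn split_radix(arg: &str) -> (&str, u32) {
    if let Some(rest) = arg.strip_prefix("0x") {
        (rest, 16)
    } else if let Some(rest) = arg.strip_prefix("0b") {
        (rest, 2)
    } else if let Some(rest) = arg.strip_prefix("0o") {
        (rest, 8)
    } else {
        // No known prefix: treat the whole argument as decimal.
        (arg, 10)
    }
}

fn main() {
    for arg in ["65", "0x41", "0b1000001", "0o101"] {
        let (digits, radix) = split_radix(arg);
        println!("{} -> digits {:?}, base {}", arg, digits, radix);
    }
}
```

Because strip_prefix returns the remainder of the string only when the prefix matches, there is no need to slice the argument by hand.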
(The program also has an Easter egg related to the command-line parameter, but sadly nobody seems to get the joke.)
The main program of asc first collects the command-line arguments. It then immediately starts to prepare for exiting to the shell, returning either exitcode::OK or exitcode::USAGE, depending on the return value of the run_app function. These values are defined in the exitcode crate.
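use std::env;

fn main() {
    // collect the arguments, including the program name at index 0
    let args: Vec<String> = env::args().collect();
    // pass everything after the program name to run_app
    std::process::exit(match run_app(&args[1..]) {
        Ok(_) => exitcode::OK,
        Err(err) => {
            eprintln!("error: {:?}", err);
            exitcode::USAGE
        }
    });
}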
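For comparison, here is a sketch of the same exit logic using only the standard library's std::process::ExitCode instead of the exitcode crate. This is an alternative, not what asc does. The constants mirror exitcode::OK (0) and exitcode::USAGE (64, from the BSD sysexits convention), and the run_app below is a hypothetical stand-in so that the example compiles on its own.

```rust
use std::env;
use std::process::ExitCode;

const OK: u8 = 0;
const USAGE: u8 = 64;

/// Hypothetical stand-in for asc's run_app: it only rejects an empty argument.
fn run_app(args: &[String]) -> Result<(), &'static str> {
    match args.first() {
        Some(arg) if arg.is_empty() => Err("empty argument"),
        _ => Ok(()),
    }
}

/// Maps the outcome of run_app to a numeric exit status.
fn status_code(result: &Result<(), &'static str>) -> u8 {
    match result {
        Ok(()) => OK,
        Err(_) => USAGE,
    }
}

fn main() -> ExitCode {
    // skip(1) drops the program name, so no slicing is needed later.
    let args: Vec<String> = env::args().skip(1).collect();
    let result = run_app(&args);
    if let Err(err) = &result {
        eprintln!("error: {}", err);
    }
    ExitCode::from(status_code(&result))
}
```

Returning an ExitCode from main lets the program finish normally instead of calling std::process::exit, at the cost of requiring a reasonably recent Rust toolchain.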
run_app function

The actual work happens in the run_app function, which returns a Result value: either Ok or Err with a message in a static string. It takes a slice of the actual command-line arguments, not including the very first one, which is the name of the executable program (like "asc").
If the slice is empty, this function just prints the full ASCII table using the print_table helper function.
Only the first command-line argument, if any, is considered. The run_app function tries to parse the argument first as a hexadecimal, binary, or octal number, based on the prefix. If there is no prefix, the argument is treated as a decimal number.
There is a helper function for parsing the command-line argument as a number, appropriately called parse_number. It takes either the rest of the argument if there is a known prefix, or the whole argument. The number base of the conversion (16, 2, 8, or 10) is also passed. It returns a Result: either an i64 value wrapped in Ok, or an Err with a message in a static string.
fn run_app(args: &[String]) -> Result<(), &'static str> {
    if args.is_empty() {
        print_table();
        Ok(())
    } else {
        let arg = &args[0];
        if arg.starts_with("me") {
            println!("Anything?");
            Ok(())
        } else {
            // pick the number base from the prefix, if there is one
            let number = if arg.starts_with("0x") {
                parse_number(&arg[2..], 16)
            } else if arg.starts_with("0b") {
                parse_number(&arg[2..], 2)
            } else if arg.starts_with("0o") {
                parse_number(&arg[2..], 8)
            } else {
                // assume the argument is a decimal number with no prefix
                parse_number(arg, 10)
            };
            match number {
                Ok(n) => {
                    print_row(n);
                    Ok(())
                }
                // pass the message from parse_number on to the caller
                Err(message) => Err(message),
            }
        }
    }
}
If the parse was successful, the print_row function is used to output the information about the particular ASCII code:
fn print_row(i: i64) {
    println!("{:3}\t{:02X}\t{:08b}\t{:03o}\t{}", i, i, i, i, CHAR_NAMES[i as usize]);
}
This function uses the static array of constant strings called CHAR_NAMES, defined at the start of the program. It looks up the name of the character based on the ASCII code. The code is already verified to be in the range 0...127, so it is safe to use it for indexing the table, which has exactly 128 elements. Note that since the ASCII code was parsed into an i64, we needed to convert it to a usize for indexing.
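The CHAR_NAMES table itself is not shown in this post. As a rough sketch of an alternative to a 128-element table, the name can also be computed from the code: only the control characters need a lookup, and the printable characters are their own names. The names char_name and CONTROL_NAMES are hypothetical, and showing the space as "SP" is an assumption; asc may display it differently.

```rust
/// Standard abbreviations of the control characters 0 through 31.
const CONTROL_NAMES: [&str; 32] = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
];

/// Returns the display name for an ASCII code, or None if the code is above 127.
/// `char_name` is a hypothetical helper, not a function from asc.
fn char_name(code: u8) -> Option<String> {
    match code {
        0..=31 => Some(CONTROL_NAMES[code as usize].to_string()),
        32 => Some("SP".to_string()), // assumption: the real table may differ here
        33..=126 => Some((code as char).to_string()),
        127 => Some("DEL".to_string()),
        _ => None,
    }
}

fn main() {
    for code in [0u8, 10, 65, 126, 127, 200] {
        match char_name(code) {
            Some(name) => println!("{:3}\t{}", code, name),
            None => println!("{:3}\tnot ASCII", code),
        }
    }
}
```

Returning an Option means that a code outside the ASCII range produces None instead of a panic from indexing past the end of a table.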
To parse the argument string as a number, given the desired base, the parse_number function is used. It first calculates the width of the string representation of the number for each base: binary numbers are eight digits, octal numbers three, decimal numbers three, and hexadecimal numbers two. (Note: why are the width and fill needed?)
First, the format! macro in Rust is used to pad the string with zeros on the left to the calculated width.
Then the i64::from_str_radix function is used to parse the string into an actual number of type i64.
The result of the parsing is checked to ensure that the number represents a valid ASCII code from 0 to 127. Numbers greater than 127 are rejected by returning Err, while accidental negative numbers are turned into positive ones by taking their absolute value.
Finally, the result is returned wrapped in Ok, or an error message is returned with an Err.
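fn parse_number(s: &str, radix: u32) -> Result<i64, &'static str> {
    let width = match radix {
        2 => 8,
        8 => 3,
        10 => 3,
        16 => 2,
        _ => 3,
    };
    // pad the digits with zeros on the left to the width for this base
    let digits = format!("{:0>width$}", s, width = width);
    let number = i64::from_str_radix(&digits, radix);
    match number {
        Ok(n) => {
            // take the magnitude first, so that e.g. -200 cannot slip past the range check
            let code = n.unsigned_abs();
            if code > 0x7f {
                Err("value out of range")
            } else {
                Ok(code as i64)
            }
        }
        Err(_) => Err("wrong format"),
    }
}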
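On the question of whether the width and fill are needed: from_str_radix accepts digit strings of any length, so the padding looks optional. Here is a sketch under that assumption, which parses straight into a u8 and tells a value that is too large apart from a malformed string. The name parse_ascii_code is hypothetical, and unlike parse_number this variant rejects negative input instead of taking its absolute value.

```rust
use std::num::IntErrorKind;

/// Parses `s` in the given radix and accepts it only if it is a valid ASCII code.
/// `parse_ascii_code` is a hypothetical alternative, not asc's parse_number.
/// Note: from_str_radix panics if `radix` is outside the range 2..=36.
fn parse_ascii_code(s: &str, radix: u32) -> Result<u8, &'static str> {
    match u8::from_str_radix(s, radix) {
        // 0..=127 is exactly the range for which u8::is_ascii returns true.
        Ok(n) if n.is_ascii() => Ok(n),
        // 128..=255 fits in a u8 but is not ASCII.
        Ok(_) => Err("value out of range"),
        // Anything above 255 overflows the u8 during parsing.
        Err(e) if matches!(e.kind(), IntErrorKind::PosOverflow) => Err("value out of range"),
        // Empty strings, invalid digits, and a leading minus sign end up here.
        Err(_) => Err("wrong format"),
    }
}

fn main() {
    for (digits, radix) in [("41", 16), ("A", 16), ("1000001", 2), ("300", 10), ("-5", 10)] {
        println!("{:>8} (base {:2}) -> {:?}", digits, radix, parse_ascii_code(digits, radix));
    }
}
```

Since the result is a u8 that is known to be at most 127, it converts to a usize for indexing without any risk of a negative or oversized value.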
My preferred way of using the utility is to first build a release version using Cargo:
% cargo build --release
   Compiling exitcode v1.1.2
   Compiling asc v0.1.1 (/Users/me/Projects/Rust/asc)
    Finished release [optimized] target(s) in 21.82s
Since I have created a bin subdirectory in my home directory and added it to my PATH, I just need to copy the resulting binary over, and I'm ready to go:
% cp target/release/asc ~/bin
Just to confirm I'm running what I think I'm running, I can check:
% which asc
/Users/me/bin/asc
% asc 0x48
 72  48  01001000  110  H
Seems to work!
This concludes the walkthrough of the asc utility, written in Rust. I hope it is useful to you too. I use it almost daily, since I still can't remember all the ASCII codes!