ASCII lookup utility in Ada
When working with old digital synthesizers from the 1980s and 1990s (and some newer ones too), I often need to quickly look up the identity of a byte that looks like an ASCII character code. For example, MIDI System Exclusive format files often contain the names of sound patches or "voices". When writing programs to process these formats it is handy to do a quick check, because even (or especially?) after nearly 40 years at this thing I still haven't memorized the ASCII codes.
Why ASCII and not Unicode? For the simple reason that these old formats were created at a time when Unicode didn't exist yet, or was in its infancy. A full-scale Unicode lookup utility would be great, but I haven't found one, or even looked very hard for one.
I wrote a utility like this earlier in Rust, and you can find asc-rs on GitHub. However, I wanted to learn more Ada, and also show how this kind of utility could be produced using GNAT.
You don't need to know much of Ada to be able to follow this walkthrough, but I do assume that you know how to program in some other language like Java, C++, Python or Rust, just to name a few.
Getting your Ada tools ready
For this walkthrough I'll be working in a Unix-like environment (either Linux or macOS) in a terminal, so if you're using Microsoft Windows, you will need to adjust for Command Prompt or PowerShell. The utility itself will be a command-line tool, with no graphical user interface, so it should be highly portable across these mainstream operating systems.
To install Ada development tools, I refer you to Ada on Windows and Linux: an installation guide by Adel Noureddine. You will need an Ada compiler (GNAT) and a build system (GPRBuild). For this utility I will not be using the Alire package manager, because I'm trying to get by without using any external libraries, only standard Ada. This is such a small program that it is better to work with Ada only and not introduce the complications of a package manager (not that working with Alire is complicated, but it is an extra step that can be saved until the time you are working with something more substantial).
Once you have installed the tools, make a subdirectory for the project called asc on your development machine, so that we can begin.
Basic skeleton of the utility
Ultimately I want the program to work like this: when I invoke it with no arguments, it prints the full ASCII table. That will result in 128 rows (codes 0 through 127), each with the character code (in decimal, hexadecimal, binary, and octal) followed by the name of the character or its visible form (control characters are shown by name only, because printing them as such would mess up the display).
When you invoke the program with one argument, that will be interpreted as a numeric character code value. If the argument starts with 0x, it will be taken as a character code in hexadecimal format. Similarly, starting with 0b means binary, and 0o means octal. If there is no such prefix, then decimal is assumed.
Here are some examples of the program in use, once it is finalised:
Gets information about the hexadecimal ASCII character code 41H:
% asc 0x41
65 41 1000001 101 A
Gets information about the binary ASCII character code 1010B:
% asc 0b1010
10 0A 0001010 012 LF
From the output it can be seen that the binary character codes have seven bits, because ASCII is a 7-bit code.
Note that the program does not accept an actual character as the argument. This is by design; some characters could be interpreted by the shell, and any character can be queried by combining the command with grep. For example, to find out about the character N:
% asc | grep "N"
0 00 0000000 000 NUL
5 05 0000101 005 ENQ
21 15 0010101 025 NAK
22 16 0010110 026 SYN
24 18 0011000 030 CAN
78 4E 1001110 116 N
Well, maybe that was a little too much information. Try grep with a regular expression that matches a space and an N at the end of a line:
% ./asc | grep " N$"
78 4E 1001110 116 N
Much better!
First rudimentary Ada version
The program text is in the file asc.adb. The extension stands for "Ada body". There is one library unit, namely a procedure called Asc. It also contains a nested procedure called Print_Table, which is called if there are no command-line arguments. The first version (GitHub) looks like this:
with Ada.Text_IO;
with Ada.Command_Line;

procedure Asc is

   -- Print the full ASCII table.
   procedure Print_Table is
   begin
      Ada.Text_IO.Put_Line ("(table goes here)");
   end Print_Table;

begin
   -- If there are no command line arguments,
   -- just print the whole table and exit.
   if Ada.Command_Line.Argument_Count < 1 then
      Print_Table;
      return;
   end if;

   -- Show the first command line argument
   Ada.Text_IO.Put ("First argument = '" & Ada.Command_Line.Argument (1) & "'");
   Ada.Text_IO.New_Line;
end Asc;
The command line argument handling will be implemented later.
To build the program you can use gnatmake:
% gnatmake asc
The result should be an executable file in the current directory. If you run it, you should see just the text "(table goes here)".
Printing the ASCII table
Since handling the command line parameters is the more difficult task, let's start with printing out the full ASCII table.
The Ada Character data type covers the ISO Latin-1 character set, but we only need ASCII. The package Ada.Characters.Handling defines a subtype of Character called ISO_646, which is exactly what we want. It is defined like this in the standard library:
subtype ISO_646 is Character range Character'Val(0) .. Character'Val(127);
We could rename this type for convenience, but let's just use the original name.
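If you did want a different name, a subtype declaration is one way to get it. The following is only a sketch: ASCII_Char and Rename_Demo are names made up for this illustration, and the rest of the walkthrough keeps using ISO_646.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Rename_Demo is
   --  ASCII_Char is a hypothetical alias for the standard subtype.
   subtype ASCII_Char is Ada.Characters.Handling.ISO_646;
begin
   --  The alias has the same bounds as ISO_646.
   Ada.Text_IO.Put_Line
     ("First =" & Integer'Image (ASCII_Char'Pos (ASCII_Char'First)));
   --  prints: First = 0
   Ada.Text_IO.Put_Line
     ("Last =" & Integer'Image (ASCII_Char'Pos (ASCII_Char'Last)));
   --  prints: Last = 127
end Rename_Demo;
```

Because a subtype without a constraint introduces no new type, values of ASCII_Char and ISO_646 can be mixed freely.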
The second version of the program (GitHub) introduces another nested procedure, Print_Row, which handles the printing of the information row for each ASCII character. Now the Print_Table procedure has been augmented to loop through all the characters in the ISO_646 type.
with Ada.Text_IO;
with Ada.Command_Line;
with Ada.Characters.Handling; use Ada.Characters.Handling;

procedure Asc is

   -- Print a row for an ASCII character.
   procedure Print_Row (Char : ISO_646) is
   begin
      Ada.Text_IO.Put_Line ("Row for character " & Char'Image);
   end Print_Row;

   -- Print the full ASCII table.
   procedure Print_Table is
   begin
      for Char in ISO_646'Range loop
         Print_Row (Char);
      end loop;
   end Print_Table;

begin
   -- If there are no command line arguments,
   -- just print the whole table and exit.
   if Ada.Command_Line.Argument_Count < 1 then
      Print_Table;
      return;
   end if;

   -- Show the first command line argument
   Ada.Text_IO.Put ("First argument = '" & Ada.Command_Line.Argument (1) & "'");
   Ada.Text_IO.New_Line;
end Asc;
If you now compile and run this program, it will print out 128 rows like this (most are omitted here):
Row for character NUL
Row for character SOH
Row for character STX
.
.
.
Row for character '~'
Row for character DEL
With that settled, it's time to move on to constructing the actual rows.
Row output, Ada style
Every individual row should have the character code in four bases, followed by the character name. We can print these out using the facilities found in the Ada.Integer_Text_IO package.
This time, only the revised Print_Row procedure is shown (see GitHub for full program text):
-- Print a row for an ASCII character.
procedure Print_Row (Char : ISO_646) is
   use Ada.Text_IO;
   use Ada.Integer_Text_IO;

   -- The ordinal value of the character
   Value : constant Integer := ISO_646'Pos (Char);
begin
   Put (Item => Value, Width => 3, Base => 10);
   Put (" ");
   Put (Item => Value, Width => 2, Base => 16);
   Put (" ");
   Put (Item => Value, Width => 7, Base => 2);
   Put (" ");
   Put (Item => Value, Width => 3, Base => 8);
   Put (" ");
   Put (Item => Char'Image);
   New_Line;
end Print_Row;
The default printout leaves something to be desired. By default, the Put procedure prints out first the base and then the value of the character code, sandwiched between hash characters (for all but base 10):
65 16#41# 2#1000001# 8#101# 'A'
We would like to get rid of the base and the hash characters, and also the single quotes around the printable characters (names of control characters like LF are already shown).
Custom number printing
The solution here is to make a procedure to print out the character code values in various bases with a constant width and zero-padded from the left as desired.
We are dealing strictly with 7-bit ASCII code values, so let's make a subtype:
subtype ASCII_Code is Integer range 0 .. 127;
The procedures in the Ada.Integer_Text_IO package take the number base as a value of the Ada.Text_IO.Number_Base subtype, which covers the range 2 to 16. We can further restrict the allowed bases to 2, 8, 10, and 16 using a subtype predicate:
subtype Our_Base is Ada.Text_IO.Number_Base
   with Static_Predicate => Our_Base in 2 | 8 | 10 | 16;
Now let's make a new procedure Print_Value that takes as its arguments the character code, the desired width, and the number base to use. It will first output the value into a string, and then extract the part that should be printed. (Full program text on GitHub.)
-- Print a value in the given base. Base 10 is padded with blanks,
-- the other bases are padded with zeros.
-- Based on ideas found here: https://stackoverflow.com/a/30423877
procedure Print_Value (Value : ASCII_Code; Width : Positive; Base : Our_Base) is
   -- Make a temporary string with the maximum length (of 2#1111111#)
   Temp_String : String (1 .. 10);

   First_Hash_Position : Natural := 0;
   Second_Hash_Position : Natural := 0;
begin
   -- Get base 10 out of the way first. Just put it out,
   -- honoring the requested width.
   if Base = 10 then
      Ada.Integer_Text_IO.Put (Item => Value, Width => Width);
      return;
   end if;

   -- Put the ASCII code value in the specified base
   -- into the temporary string. Since we are not putting
   -- a base 10 value, we know there will be hash characters.
   Ada.Integer_Text_IO.Put (To => Temp_String, Item => Value, Base => Base);

   -- Get the first hash position, starting from the front
   First_Hash_Position := Index (Source => Temp_String, Pattern => "#",
                                 From => Temp_String'First, Going => Forward);

   -- Get the second hash position, starting from the back
   Second_Hash_Position := Index (Source => Temp_String, Pattern => "#",
                                  From => Temp_String'Last, Going => Backward);

   -- Put the part between the hash positions, zero-padded from the left
   Ada.Text_IO.Put (
      Tail (
         Source => Temp_String (First_Hash_Position + 1 .. Second_Hash_Position - 1),
         Count => Width,
         Pad => '0'));
end Print_Value;
We handle the decimal character code value first. It doesn't need any special treatment, so we just print it out in a field of three digits, and we automatically get left-padding with spaces.
For the other bases, we take advantage of the fact that the longest value we will ever produce has 10 characters. That would be any 7-bit binary number with the base and the hash characters. So we need a temporary string of up to that length. We can use the Ada.Integer_Text_IO.Put (To, Item, Base) overload to output the character code into the temporary string.
We can find the positions of the first and second hash character in the temporary string using the Ada.Strings.Fixed.Index function, searching forward and backward respectively.
Finally, we can extract the relevant part (between the hash characters) using the Ada.Strings.Fixed.Tail function and display it.
Note that the string indexing starts at 1, not 0!
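To make the intermediate steps concrete, here is a small standalone sketch that runs the same Put, Index, and Tail calls on the line feed code, 10, in base 16. Trim_Demo and its variable names are invented for this illustration and are not part of the asc program.

```ada
with Ada.Text_IO;
with Ada.Integer_Text_IO;
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;

procedure Trim_Demo is
   Temp        : String (1 .. 10);
   First_Hash  : Natural;
   Second_Hash : Natural;
begin
   --  Put right-justifies the image in the string and fills the
   --  unused positions at the front with blanks.
   Ada.Integer_Text_IO.Put (To => Temp, Item => 10, Base => 16);
   Ada.Text_IO.Put_Line ("[" & Temp & "]");
   --  prints: [     16#A#]

   First_Hash := Index (Source => Temp, Pattern => "#",
                        From => Temp'First, Going => Forward);
   Second_Hash := Index (Source => Temp, Pattern => "#",
                         From => Temp'Last, Going => Backward);

   --  The digits sit strictly between the two hash characters.
   --  Tail pads on the left when the slice is shorter than Count.
   Ada.Text_IO.Put_Line
     (Tail (Source => Temp (First_Hash + 1 .. Second_Hash - 1),
            Count  => 2,
            Pad    => '0'));
   --  prints: 0A
end Trim_Demo;
```

The single digit A becomes 0A, which is the zero-padded form seen in the program output.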
Note also that we have used the procedures and functions from the Ada.Strings.Fixed package without their prefixes, because that can get quite tedious. In a program of this size I think we can get away with adding use clauses.
This is the complete list of with and use clauses at the top of the program text:
with Ada.Text_IO;
with Ada.Command_Line;
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Strings; use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.Integer_Text_IO;
We still need to update the Print_Row procedure to use the new Print_Value procedure. We also print the character name if it is a control character, but print the actual character otherwise:
-- Print a full row for the character: decimal, hexadecimal, binary,
-- octal, and the character name or literal.
procedure Print_Row (Char : ISO_646) is
   use Ada.Text_IO;

   -- The ordinal value of the character
   Value : constant ASCII_Code := ISO_646'Pos (Char);

   -- The separator between the fields
   Blanks : constant String := 2 * Space;
begin
   Print_Value (Value, Width => 3, Base => 10);
   Put (Blanks);
   Print_Value (Value, Width => 2, Base => 16);
   Put (Blanks);
   Print_Value (Value, Width => 7, Base => 2);
   Put (Blanks);
   Print_Value (Value, Width => 3, Base => 8);
   Put (Blanks);

   if Is_Control (Char) then
      Put (Item => Char'Image);
   else
      Put (Char);
   end if;

   New_Line;
end Print_Row;
If you try to use Print_Value to print a value in some other base than those declared in the static predicate of the Our_Base subtype, then the Ada compiler will first warn you:
asc.adb:57:47: warning: static expression fails static predicate check on "Our_Base" [enabled by default]
asc.adb:57:47: warning: expression is no longer considered static [enabled by default]
but only if you specify pragma Assertion_Policy (Check) at the beginning of the program text. You will also get an Assertion_Error at runtime:
raised ADA.ASSERTIONS.ASSERTION_ERROR : Static_Predicate failed at asc.adb:57
You will get even more warnings from the compiler if you mistakenly specify a zero or negative value for the Width parameter, since it is declared with the type Positive.
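The same subtype can also be put to work at run time. The sketch below uses a membership test, which takes both the range of Number_Base and the static predicate into account, to decide whether an arbitrary integer would be a valid Our_Base value. Base_Check_Demo and Candidates are hypothetical names that do not appear in the asc program.

```ada
with Ada.Text_IO;

procedure Base_Check_Demo is
   subtype Our_Base is Ada.Text_IO.Number_Base
     with Static_Predicate => Our_Base in 2 | 8 | 10 | 16;

   --  Candidates is a made-up list of values to probe.
   Candidates : constant array (1 .. 4) of Integer := (2, 7, 16, 20);
begin
   for N of Candidates loop
      --  "in" checks the range 2 .. 16 and the predicate together,
      --  so no exception is raised for unsuitable values.
      if N in Our_Base then
         Ada.Text_IO.Put_Line (Integer'Image (N) & " is accepted");
      else
         Ada.Text_IO.Put_Line (Integer'Image (N) & " is rejected");
      end if;
   end loop;
end Base_Check_Demo;
```

With these candidates, 2 and 16 are accepted, while 7 (fails the predicate) and 20 (outside the range of Number_Base) are rejected.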
A hidden gem of Ada is found in the Ada.Strings.Fixed package: the multiplication operator is defined for a natural number on the left and a character or a string on the right, and it returns the right-hand operand repeated that many times. In the Print_Row procedure we define a constant:
-- The separator between the fields
Blanks : constant String := 2 * Space;
Then we use it to separate the character code values between calls to Print_Value with Put (Blanks). This makes the program more readable, and also gives us a handy way of changing the number of blanks by changing the number 2 to something else, like 4, in just one place, instead of hunting for Put (" ") lines.
Here Space is actually Ada.Strings.Space, a Character constant that holds the blank. It can be used without a prefix because of the use Ada.Strings clause.
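As a quick illustration of both forms of the operator, here is a minimal sketch. Repeat_Demo, Rule, and Pairs are names invented for this example.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;

procedure Repeat_Demo is
   --  Natural * Character gives a String
   Rule : constant String := 10 * '-';

   --  Natural * String gives a String
   Pairs : constant String := 3 * "ab";
begin
   Ada.Text_IO.Put_Line (Rule);
   --  prints ten hyphens
   Ada.Text_IO.Put_Line (Pairs);
   --  prints: ababab
end Repeat_Demo;
```

The use clause is what makes the operator directly visible; without it, the call would have to be written in prefix form as Ada.Strings.Fixed."*" (10, '-').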
Handling the command-line argument
Now we have a utility that can print the ASCII table, with the character codes in four different number bases. Here is an excerpt from the output:
65 41 1000001 101 A
66 42 1000010 102 B
67 43 1000011 103 C
There is one more thing left to do: to handle the command-line argument, if there is one. That will narrow down the printout to information about just the one character code specified in any of the known number bases.
We'll start with some helpers. The Starts_With function returns true if the given string starts with the given prefix, and false otherwise. It uses the Ada.Strings.Fixed.Index function and checks that the match is at the first position of the string; testing only for a nonzero result would also accept a prefix found in the middle of the argument, as in 10x5. The Print_Error procedure just prints a message to the standard error device.
-- Helper function to find out if a string starts with a prefix.
-- Index returns the position of the first match (or zero), so the
-- match must be at the very first position of the string.
function Starts_With (S : String; Prefix : String) return Boolean is
begin
   return Ada.Strings.Fixed.Index (Source => S, Pattern => Prefix) = S'First;
end Starts_With;

-- Print an error message to the standard error device.
procedure Print_Error (Message : String) is
begin
   Ada.Text_IO.Put_Line (Ada.Text_IO.Standard_Error, Message);
end Print_Error;
We need a few variables for the argument processing, so we'll use a declare block to make it clear that these are only used if we actually have an argument to process.
Since we know at this point that we do have at least one command-line argument, we can save it to the Arg variable and check if it starts with one of the prefixes we support. We will set the Start_Position variable accordingly, so that we know where the actual number part starts.
To interpret the actual number after any possible prefix, we use Ada's own mechanism that we just circumvented in the Print_Value procedure earlier: construct a string with the base and the hash characters, and read from it using the Ada.Integer_Text_IO.Get procedure.
There are basically two things that can go wrong here: either the character code argument is complete garbage (not a number in any given base), or it is out of the ASCII code range.
If the argument is bad, the Ada.Integer_Text_IO.Get procedure will raise the Ada.Text_IO.Data_Error exception. We handle this in the exception handler of the declare block. We just print a message to standard error.
If we get a value, we check that it is in the range of the ASCII_Code type. On success we print the row for that character code. On failure, we print an error message.
Here is the declare block in its entirety (full program text on GitHub):
declare
   Arg : constant String := Ada.Command_Line.Argument (1);

   -- The start position of the number part, after any prefix.
   -- The most common case is 3 (after a prefix like "0x").
   Start_Position : Positive := 3;

   -- The position of the last character that the
   -- Get procedure read (required but ignored here)
   Last_Position_Ignored : Positive;

   -- The actual number we get out of the argument
   Value : Integer;

   -- The base for the argument, defaults to decimal
   Base : Our_Base := 10;
begin
   if Starts_With (Arg, "0x") then
      Base := 16;
   elsif Starts_With (Arg, "0b") then
      Base := 2;
   elsif Starts_With (Arg, "0o") then
      Base := 8;
   else
      -- no prefix, most likely a decimal number
      Start_Position := 1;
   end if;

   -- Construct an image like "10#65#" or "16#7E#" and parse it.
   Ada.Integer_Text_IO.Get (
      From => Base'Image & "#" & Arg (Start_Position .. Arg'Last) & "#",
      Item => Value,
      Last => Last_Position_Ignored);

   if Value in ASCII_Code then
      Print_Row (Character'Val (Value));
   else
      Print_Error ("Character code out of range: " & Arg);
   end if;
exception
   when Ada.Text_IO.Data_Error =>
      Print_Error ("Error in argument");
end;
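The asc program ignores the position that Get reports in its Last parameter. As a more defensive variation, here is a minimal sketch that accepts a parsed value only when Get has consumed the whole string. Try_Parse and Parse_Demo are hypothetical names that are not part of the asc program, and I am assuming (without having tested every compiler) that an implementation may stop reading at the first character it cannot use instead of raising Data_Error.

```ada
with Ada.Text_IO;
with Ada.Integer_Text_IO;

procedure Parse_Demo is

   --  Try_Parse is a hypothetical helper. It accepts the image
   --  only if Get consumed every character of it.
   procedure Try_Parse (Image : String) is
      Value : Integer;
      Last  : Positive;
   begin
      Ada.Integer_Text_IO.Get (From => Image, Item => Value, Last => Last);

      if Last = Image'Last then
         Ada.Text_IO.Put_Line (Image & " =>" & Integer'Image (Value));
      else
         Ada.Text_IO.Put_Line (Image & " => rejected (trailing characters)");
      end if;
   exception
      when Ada.Text_IO.Data_Error =>
         Ada.Text_IO.Put_Line (Image & " => rejected (Data_Error)");
   end Try_Parse;

begin
   Try_Parse ("16#7E#");   --  prints: 16#7E# => 126
   Try_Parse ("2#1010#");  --  prints: 2#1010# => 10
   Try_Parse ("10#zz#");   --  rejected, by one path or the other
end Parse_Demo;
```

Whichever way the implementation reacts to the malformed input, the caller sees a rejection, so the check costs one comparison and removes the dependence on that behavior.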
With that, we're done! Some Ada-specific points to note:
- Constants were used whenever possible.
- Use clauses were used liberally. In a larger program, consider applying use clauses locally in procedures to lighten the cognitive load.
- The Ada text I/O has some quirks, so sometimes you need to roll your own output routine, like we did here with the number printout.
- Use the Ada type system to your advantage to better catch problems related to invalid data values.
Hopefully this is a useful utility for you if you need an ASCII code lookup, and instructional in learning Ada programming.
The final version of the program is found on GitHub.