ASCII lookup utility in Ada

When working with old digital synthesizers from the 1980s and 1990s (and some newer ones too), I often need to quickly identify a byte that looks like an ASCII character code. For example, MIDI System Exclusive files often contain the names of sound patches or "voices". When writing programs to process these formats it is handy to do a quick check, because even (or especially?) after nearly 40 years at this thing I still haven't memorized the ASCII codes.

Why ASCII and not Unicode? For the simple reason that these old formats were created at a time when Unicode didn't exist yet, or was in its infancy. A full-scale Unicode lookup utility would be great, but I haven't found one, or even looked very hard for one.

I wrote a similar utility in Rust earlier, and you can find asc-rs on GitHub. However, I wanted to learn more Ada, and also show how this kind of utility could be produced using GNAT.

You don't need to know much Ada to be able to follow this walkthrough, but I do assume that you know how to program in some other language like Java, C++, Python or Rust, just to name a few.

Getting your Ada tools ready

For this walkthrough I'll be working in a Unix-like environment (either Linux or macOS) in a terminal, so if you're using Microsoft Windows, you will need to adjust for Command Prompt or PowerShell. The utility itself will be a command-line tool, with no graphical user interface, so it should be highly portable across these mainstream operating systems.

To install Ada development tools, I refer you to Ada on Windows and Linux: an installation guide by Adel Noureddine. You will need an Ada compiler (GNAT) and a build system (GPRBuild). For this utility I will not be using the Alire package manager, because I'm trying to get by without using any external libraries, only standard Ada. This is such a small program that it is better to work with Ada only and not introduce the complications of a package manager (not that working with Alire is complicated, but it is an extra step that can be saved until the time you are working with something more substantial).

Once you have installed the tools, make a subdirectory for the project called asc on your development machine, so that we can begin.

Basic skeleton of the utility

Ultimately I want the program to work like this: when I invoke it with no arguments, it prints the full ASCII table. That will result in 128 rows (codes 0 through 127), each with the character code (in decimal, hexadecimal, binary, and octal), followed by the name of the character or its visible form (control characters are shown by name, because printing them as such would mess up the display).

When you invoke the program with one argument, that will be interpreted as a numeric character code value. If the argument starts with 0x, it will be taken as a character code in hexadecimal format. Similarly, starting with 0b means binary, and 0o means octal. If there is no such prefix, then decimal is assumed.

Here are some examples of the program in use, once it is finalised:

Gets information about the hexadecimal ASCII character code 41H:

% asc 0x41
 65  41  1000001  101  A

Gets information about the binary ASCII character code 1010B:

% asc 0b1010
 10  0A  0001010  012  LF

From the output it can be seen that the binary character codes have seven bits, because ASCII is a 7-bit code.

Note that the program does not accept an actual character as the argument. This is by design; some characters could be interpreted by the shell, and any character can be queried by combining the command with grep. For example, to find out about the character N:

% asc | grep "N"
  0  00  0000000  000  NUL
  5  05  0000101  005  ENQ
 21  15  0010101  025  NAK
 22  16  0010110  026  SYN
 24  18  0011000  030  CAN
 78  4E  1001110  116  N

Well, maybe that was a little too much information - try grep with a regular expression that matches a space and an N at the end of a line:

% ./asc | grep " N$"
 78  4E  1001110  116  N

Much better!

First rudimentary Ada version

The program text is in the file asc.adb. The extension stands for "Ada body". There is one library unit, namely a procedure called Asc. It also contains a nested procedure called Print_Table, which is called if there are no command-line arguments. The first version (GitHub) looks like this:

    with Ada.Text_IO;
    with Ada.Command_Line;
        
    procedure Asc is    
       --  Print the full ASCII table.
       procedure Print_Table is
       begin
          Ada.Text_IO.Put_Line ("(table goes here)");
       end Print_Table;
        
    begin
       --  If there are no command line arguments, 
       --  just print the whole table and exit.
       if Ada.Command_Line.Argument_Count < 1 then
          Print_Table;
          return;
       end if;
    
       --  Show the first command line argument
       Ada.Text_IO.Put ("First argument = '" &
          Ada.Command_Line.Argument (1) & "'");
       Ada.Text_IO.New_Line;
    end Asc;
    

The command line argument handling will be implemented later.

To build the program you can use gnatmake:

% gnatmake asc

The result should be an executable file in the current directory. If you run it, you should see just the text "(table goes here)".

Printing the ASCII table

Since handling the command-line argument is the more difficult task, let's start with printing out the full ASCII table.

The Ada Character data type covers the ISO Latin-1 character set, but we only need ASCII. The package Ada.Characters.Handling defines a subtype of Character called ISO_646, which is exactly what we want. It is defined like this in the standard library:

subtype ISO_646 is
    Character range Character'Val(0) .. Character'Val(127);

We could rename this type for convenience, but let's just use the original name.
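
If you did want a shorter name, a subtype declaration without a constraint works as a rename. The following is only a minimal sketch and not part of the asc program; the names Subtype_Demo and ASCII_Char are made up for illustration. It also shows how many values the subtype covers.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Subtype_Demo is
   --  Hypothetical shorter name for ISO_646 (not used in the walkthrough).
   subtype ASCII_Char is Ada.Characters.Handling.ISO_646;

   --  Number of values: position of the last one minus the first, plus one.
   Count : constant Integer :=
     ASCII_Char'Pos (ASCII_Char'Last) - ASCII_Char'Pos (ASCII_Char'First) + 1;
begin
   Ada.Text_IO.Put_Line ("Values in ASCII_Char:" & Integer'Image (Count));
end Subtype_Demo;
```

Saved as subtype_demo.adb and built with gnatmake, this should report 128 values, since the codes run from 0 through 127.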

The second version of the program (GitHub) introduces another nested procedure, Print_Row, which handles the printing of the information row for each ASCII character. The Print_Table procedure now loops through all the characters in the ISO_646 subtype.

    with Ada.Text_IO;
    with Ada.Command_Line;
    with Ada.Characters.Handling; use Ada.Characters.Handling;
    
    procedure Asc is
    
       --  Print a row for an ASCII character.
       procedure Print_Row (Char : ISO_646) is
       begin
          Ada.Text_IO.Put_Line ("Row for character " & Char'Image);
       end Print_Row;
    
       --  Print the full ASCII table.
       procedure Print_Table is
       begin
          for Char in ISO_646'Range loop
             Print_Row (Char);
          end loop;
       end Print_Table;
    
    begin
       --  If there are no command line arguments, 
       --  just print the whole table and exit.
       if Ada.Command_Line.Argument_Count < 1 then
          Print_Table;
          return;
       end if;
    
       --  Show the first command line argument
       Ada.Text_IO.Put ("First argument = '" &
          Ada.Command_Line.Argument (1) & "'");
       Ada.Text_IO.New_Line;
    end Asc;
    

If you now compile and run this program, it will print out 128 rows like this (most are omitted here):

Row for character NUL
Row for character SOH
Row for character STX
.
.
.
Row for character '~'
Row for character DEL

With that settled, it's time to move on to constructing the actual rows.

Row output, Ada style

Every individual row should have the character code in four bases, followed by the character name. We can print these out using the facilities found in the Ada.Integer_Text_IO package.

This time, only the revised Print_Row procedure is shown (see GitHub for full program text):

       --  Print a row for an ASCII character.
       procedure Print_Row (Char : ISO_646) is
          use Ada.Text_IO;
          use Ada.Integer_Text_IO;
        
          --  The ordinal value of the character
          Value : constant Integer := ISO_646'Pos (Char);
       begin
          Put (Item => Value, Width => 3, Base => 10);
          Put ("  ");
          Put (Item => Value, Width => 2, Base => 16);
          Put ("  ");
          Put (Item => Value, Width => 7, Base => 2);
          Put ("  ");      
          Put (Item => Value, Width => 3, Base => 8);
          Put ("  ");      
          Put (Item => Char'Image);
          New_Line;
       end Print_Row;        
    

The default printout leaves something to be desired. By default, the Put procedure prints out first the base and then the value of the character code, sandwiched between hash characters (for all but base 10):

 65  16#41#  2#1000001#  8#101#  'A'

We would like to get rid of the base and the hash characters, and also the single quotes around the printable characters (the names of control characters such as LF are already shown without quotes).

Custom number printing

The solution here is to make a procedure to print out the character code values in various bases with a constant width and zero-padded from the left as desired.

We are dealing strictly with 7-bit ASCII code values, so let's make a subtype:

subtype ASCII_Code is Integer range 0 .. 127;

The procedures in the Ada.Integer_Text_IO package take the number base as a value of the Ada.Text_IO.Number_Base subtype, which covers 2 through 16. We can further restrict the allowed bases to 2, 8, 10, and 16 using a subtype predicate:

subtype Our_Base is Ada.Text_IO.Number_Base with
   Static_Predicate => Our_Base in 2 | 8 | 10 | 16;
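
To get a feel for what the predicate does, here is a small stand-alone sketch (not part of the asc program; Base_Demo and Check are hypothetical names). A membership test against a subtype with a predicate takes the predicate into account, so it can be used to ask whether a number is one of the allowed bases.

```ada
with Ada.Text_IO;

procedure Base_Demo is
   --  Same declaration as in the walkthrough.
   subtype Our_Base is Ada.Text_IO.Number_Base with
      Static_Predicate => Our_Base in 2 | 8 | 10 | 16;

   --  Hypothetical helper: report whether Candidate is an allowed base.
   procedure Check (Candidate : Integer) is
   begin
      Ada.Text_IO.Put_Line
        (Integer'Image (Candidate) & " allowed: " &
         Boolean'Image (Candidate in Our_Base));
   end Check;
begin
   Check (16);  --  listed in the predicate
   Check (7);   --  a valid Number_Base, but not listed
   Check (40);  --  outside the range of Number_Base altogether
end Base_Demo;
```

This should report TRUE for 16 and FALSE for both 7 and 40.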

Now let's make a new procedure Print_Value that takes as its arguments the character code, the desired width, and the number base to use. It will first output the value into a string, and then extract the part that should be printed. (Full program text on GitHub.)

       --  Print a value in the given base, without the base prefix
       --  and the hash characters.
       --  Based on ideas found here: https://stackoverflow.com/a/30423877
       procedure Print_Value (Value : ASCII_Code; Width : Positive; Base : Our_Base) is
          --  Make a temporary string with the maximum length (of 2#1111111#)
          Temp_String : String (1 .. 10);

          First_Hash_Position : Natural := 0;
          Second_Hash_Position : Natural := 0;
       begin
          --  Get base 10 out of the way first. Just put it out,
          --  left-padded with blanks to the requested width.
          if Base = 10 then
             Ada.Integer_Text_IO.Put (Item => Value, Width => Width);
             return;
          end if;

          --  Put the ASCII code value in the specified base
          --  into the temporary string. Since we are not putting
          --  a base 10 value, we know there will be hash characters.
          Ada.Integer_Text_IO.Put (To => Temp_String, Item => Value, Base => Base);

          --  Get the first hash position, starting from the front
          First_Hash_Position := Index (Source => Temp_String,
             Pattern => "#", From => Temp_String'First, Going => Forward);

          --  Get the second hash position, starting from the back
          Second_Hash_Position := Index (Source => Temp_String,
             Pattern => "#", From => Temp_String'Last, Going => Backward);

          --  Put the part between the hash positions, zero-padded from the left
          Ada.Text_IO.Put (
             Tail (
                Source => Temp_String (First_Hash_Position + 1 .. Second_Hash_Position - 1),
                Count   => Width,
                Pad     => '0'));
       end Print_Value;
    

We handle the decimal character code value first. It doesn't need any special treatment, so we just print it out in a field of three digits, and we automatically get left-padding with spaces.

For the other bases, we take advantage of the fact that the longest image we will ever produce has 10 characters: a 7-bit binary number with its base and the two hash characters, such as 2#1111111#. So the temporary string needs exactly that length. We can use the Ada.Integer_Text_IO.Put (To, Item, Base) overload to write the character code into the temporary string; shorter images are right-justified in it and padded with leading blanks.

We can find the positions of the first and second hash character in the temporary string using the Ada.Strings.Fixed.Index function, searching forward and backward respectively.

Finally, we slice out the relevant part (between the hash characters) and pass it to the Ada.Strings.Fixed.Tail function, which left-pads it with zeros to the requested width, and display the result. Note that the string indexing here starts at 1, not 0!
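
The padding behaviour of Tail is easy to try out on its own. This is a minimal sketch, separate from the asc program, and the procedure name Tail_Demo is made up for illustration:

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Tail_Demo is
   use Ada.Strings.Fixed;
begin
   --  Source shorter than Count: padded on the left with the Pad character.
   Ada.Text_IO.Put_Line (Tail (Source => "A", Count => 2, Pad => '0'));
   Ada.Text_IO.Put_Line (Tail (Source => "1010", Count => 7, Pad => '0'));

   --  Source already Count characters long: returned as it is.
   Ada.Text_IO.Put_Line (Tail (Source => "101", Count => 3, Pad => '0'));
end Tail_Demo;
```

The three lines printed should be 0A, 0001010 and 101, which is the kind of fixed-width field we want in the table.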

Note also that we have used the procedures and functions from the Ada.Strings.Fixed package without their prefixes, because that can get quite tedious. In a program of this size I think we can get away with adding use clauses. This is the complete list of with and use clauses at the top of the program text:

with Ada.Text_IO;
with Ada.Command_Line;
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Strings; use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.Integer_Text_IO;

We still need to update the Print_Row procedure to use the new Print_Value procedure. We also print the character name if it is a control character, but print the actual character otherwise:

       --  Print a full row for the character: decimal, hexadecimal, binary,
       --  octal, and the character name or literal.
       procedure Print_Row (Char : ISO_646) is
          use Ada.Text_IO;
    
          --  The ordinal value of the character
          Value : constant ASCII_Code := ISO_646'Pos (Char);
    
          --  The separator between the fields
          Blanks : constant String := 2 * Space;
       begin
          Print_Value (Value, Width => 3, Base => 10);
          Put (Blanks);
          Print_Value (Value, Width => 2, Base => 16);
          Put (Blanks);
          Print_Value (Value, Width => 7, Base => 2);
          Put (Blanks);      
          Print_Value (Value, Width => 3, Base => 8);
          Put (Blanks);
    
          if Is_Control (Char) then    
             Put (Item => Char'Image);
          else
             Put (Char);
          end if;
    
          New_Line;
       end Print_Row;    

If you try to call Print_Value with a base other than those allowed by the static predicate of the Our_Base subtype, the Ada compiler will warn you, but only if you specify pragma Assertion_Policy (Check) at the beginning of the program text:

asc.adb:57:47: warning: static expression fails static predicate check on "Our_Base" [enabled by default]
asc.adb:57:47: warning: expression is no longer considered static [enabled by default]

You will also get an Assertion_Error at runtime:

raised ADA.ASSERTIONS.ASSERTION_ERROR : Static_Predicate failed at asc.adb:57

You will get even more warnings from the compiler if you mistakenly specify a zero or negative value for the Width parameter, since it is declared with the subtype Positive.

A hidden gem of Ada is found in the Ada.Strings.Fixed package: the multiplication operator is defined for a count and a character (or a string), and returns that character or string repeated the given number of times. In the Print_Row procedure we define a constant:

--  The separator between the fields
Blanks : constant String := 2 * Space;

Then we use it to separate the character code values between calls to Print_Value with Put (Blanks). This makes the program more readable, and also gives us a handy way of changing the number of blanks by changing the number 2 to something else, like 4, in just one place, instead of hunting for Put ("  ") lines. Here Space is the Character constant Ada.Strings.Space, which can be used without a prefix because of the use clause for Ada.Strings.
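
Here is a tiny stand-alone sketch of the repetition operator, for both a character and a string. It is not part of the asc program, and the name Repeat_Demo is made up for illustration:

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Repeat_Demo is
   --  The use clause makes the "*" operators directly visible.
   use Ada.Strings.Fixed;
begin
   --  A count times a character gives a string of that many characters.
   Ada.Text_IO.Put_Line ("[" & 4 * ' ' & "]");

   --  A count times a string repeats the whole string.
   Ada.Text_IO.Put_Line (3 * "ab");
end Repeat_Demo;
```

The first line printed should be a pair of brackets with four blanks between them, the second ababab.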

Handling the command-line argument

Now we have a utility that can print the ASCII table, with the character codes in four different number bases. Here is an excerpt from the output:

 65  41  1000001  101  A
 66  42  1000010  102  B
 67  43  1000011  103  C

There is one more thing left to do: to handle the command-line argument, if there is one. That will narrow down the printout to information about just the one character code specified in any of the known number bases.

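We'll start with some helpers. The Starts_With function returns true if the given string starts with the given prefix, and false otherwise. It uses the Ada.Strings.Fixed.Index function and checks that the first occurrence of the prefix is at the very beginning of the string, so that a prefix in the middle of the string (as in "10x5") does not count. The Print_Error procedure just prints a message to the standard error device.

       --  Helper function to find out if a string starts with a prefix.
       function Starts_With (S : String; Prefix : String) return Boolean is
       begin
          --  Index returns the position of the first occurrence of Prefix
          --  (or 0 if there is none), so compare it with the first index.
          return Ada.Strings.Fixed.Index (Source => S, Pattern => Prefix) = S'First;
       end Starts_With;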
    
       --  Print an error message to the standard error device.
       procedure Print_Error (Message : String) is
       begin
          Ada.Text_IO.Put_Line (Ada.Text_IO.Standard_Error, Message);
       end Print_Error;    
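
As an aside, a prefix test can also be written without Index, by comparing a slice of the string with the prefix. The following is only a sketch of that alternative, not the version used in the asc program; Prefix_Demo and Has_Prefix are hypothetical names.

```ada
with Ada.Text_IO;

procedure Prefix_Demo is
   --  Hypothetical slice-based prefix test. The length check comes first,
   --  so the slice is never taken from a string that is too short.
   function Has_Prefix (S : String; Prefix : String) return Boolean is
     (S'Length >= Prefix'Length
      and then S (S'First .. S'First + Prefix'Length - 1) = Prefix);
begin
   Ada.Text_IO.Put_Line (Boolean'Image (Has_Prefix ("0x41", "0x")));
   Ada.Text_IO.Put_Line (Boolean'Image (Has_Prefix ("10x5", "0x")));
   Ada.Text_IO.Put_Line (Boolean'Image (Has_Prefix ("0", "0x")));
end Prefix_Demo;
```

This should print TRUE, FALSE and FALSE: only the first string actually begins with 0x.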

We need a few variables for the argument processing, so we'll use a declare block to make it clear that these are only used if we actually have an argument to process.

Since we know at this point that we do have at least one command-line argument, we can save it to the Arg variable and check if it starts with one of the prefixes we support. We will set the Start_Position variable accordingly, so that we know where the actual number part starts.

To interpret the actual number after any possible prefix, we use Ada's own mechanism that we just circumvented in the Print_Value procedure earlier: construct a string with the base and the hash characters, and read from it using the Ada.Integer_Text_IO.Get procedure.

There are basically two things that can go wrong here: either the character code argument is complete garbage (not a number in any given base), or it is out of the ASCII code range.

If the argument is bad, the Ada.Integer_Text_IO.Get procedure will raise an exception of type Ada.Text_IO.Data_Error. We handle this in the exception handler of the declare block. We just print a message to standard error.
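
To see this mechanism in isolation, here is a small sketch that is separate from the asc program; the names Get_Demo and Try are made up for illustration. It parses a few hand-written images and reports what happens.

```ada
with Ada.Text_IO;
with Ada.Integer_Text_IO;

procedure Get_Demo is
   --  Hypothetical helper: try to parse an image such as "16#7E#".
   procedure Try (Image : String) is
      Value : Integer;
      Last  : Positive;
   begin
      Ada.Integer_Text_IO.Get (From => Image, Item => Value, Last => Last);
      Ada.Text_IO.Put_Line (Image & " =>" & Integer'Image (Value));
   exception
      when Ada.Text_IO.Data_Error =>
         Ada.Text_IO.Put_Line (Image & " => not a valid number");
   end Try;
begin
   Try ("16#7E#");    --  hexadecimal 7E
   Try (" 2#1010#");  --  leading blank, like the one 'Image produces
   Try ("8#9#");      --  9 is not a legal digit in base 8
end Get_Demo;
```

The first two calls should report 126 and 10, and the third should end up in the Data_Error handler.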

If we get a value, we check that it is in the range of the ASCII_Code type. On success we print the row for that character code. On failure, we print an error message.

Here is the declare block in its entirety (full program text on GitHub):

       declare
          Arg : constant String := Ada.Command_Line.Argument (1);

          --  The start position of the number part, after any prefix.
          --  The most common case is 3 (after a prefix like "0x").
          Start_Position : Positive := 3;

          --  The position of the last character that the
          --  Get procedure read
          Last_Position : Positive;

          --  The actual number we get out of the argument
          Value : Integer;

          --  The base for the argument, defaults to decimal
          Base : Our_Base := 10;
       begin
          if Starts_With (Arg, "0x") then
             Base := 16;
          elsif Starts_With (Arg, "0b") then
             Base := 2;
          elsif Starts_With (Arg, "0o") then
             Base := 8;
          else  --  no prefix, most likely a decimal number
             Start_Position := 1;
          end if;

          --  Construct an image like "10#65#" or "16#7E#" and parse it.
          declare
             Image : constant String :=
                Base'Image & "#" & Arg (Start_Position .. Arg'Last) & "#";
          begin
             Ada.Integer_Text_IO.Get (
                From => Image,
                Item => Value,
                Last => Last_Position);

             --  Get may stop at a character that cannot belong to the
             --  number, so treat leftover characters as bad input too.
             if Last_Position /= Image'Last then
                raise Ada.Text_IO.Data_Error;
             end if;
          end;

          if Value in ASCII_Code then
             Print_Row (Character'Val (Value));
          else
             Print_Error ("Character code out of range: " & Arg);
          end if;
       exception
          when Ada.Text_IO.Data_Error =>
             Print_Error ("Error in argument");
       end;
    

With that, we're done!

Hopefully this utility is useful to you if you need an ASCII code lookup, and instructive if you are learning Ada.

The final version of the program is found on GitHub.