A comp.lang.apl posting by Bob Bernecky about the internal
structure of APL.
From: Robert Bernecky 97/10/27 13:23
Subject: Re: What is APL data storage?
To: Larisa Cherkanskaya
Newsgroups: comp.lang.apl
I am a designer and implementor of APL interpreters and compilers,
so I might be able to provide a few suggestions and comments:
a. The idea that APL is heavy on arrays of real numbers
is a common myth. Boolean, integer, and character
arrays dominate the data types; 0- and 1-element arrays represent more
than half of the array sizes encountered in real applications.
Measurements of real applications, taken over periods
of weeks on large systems, bear out these statistics. I can provide
3 or 4 independent citations of studies if you wish.
However, that is not truly relevant to your problem, which
is fundamentally the issue of array heap management.
b. The issue of arrays being real numbers only is a red herring. Bits is bits,
as they say. The only potential design advantage given by that limitation
is that you can elide type information, and perhaps gain some performance
advantage by ensuring that data is aligned in memory in a manner
appropriate to your target machine.
c. I am familiar with several forms of storage management in APL
systems. Simply put, they are as follows:
1. Use the operating system calls for storage allocation and
deallocation (e.g., malloc/free). This generally presumes that
data is not going to have to move while allocated, and may
cause storage fragmentation to the point of system failure
if array sizes are highly variable and the system executes for
long periods of time. Operating system calls may also be
more expensive than either of the following techniques.
2. Allocate your own heap at startup time and manage it yourself.
This is generally faster than operating system calls, but means
that you get to write and debug your own storage manager,
and you also tend to be limited in the heap size you can allocate.
(You have to guess how much dynamic memory will be needed
during the entire execution; this is the APL "workspace full"
problem.) Methods of heap management of this sort are
many and varied, and I recommend you check a standard
algorithms text for methods; a minimal sketch appears after
this list. An advantage of this scheme, besides performance,
is that you can move arrays during execution to defragment
memory.
3. Use a fancy storage management system that is a hybrid of the
two above, perhaps combined with a buddy system, to reduce
fragmentation and the need to garbage collect/defragment
storage.
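To make (2) concrete, here is a rough C sketch of a first-fit
allocator over a heap grabbed once at startup. The names (heap_init,
wsalloc, wsfree) and the splitting threshold are mine, for
illustration only; a real workspace manager also worries about
alignment, coalescing adjacent free blocks, and compaction.

    #include <stddef.h>
    #include <stdlib.h>

    /* One header per free block; the free list threads the heap. */
    typedef struct Block {
        size_t size;          /* payload bytes available in this block */
        struct Block *next;   /* next free block, or NULL */
    } Block;

    static unsigned char *heap;   /* the workspace, obtained once */
    static Block *freelist;       /* head of the first-fit free list */

    void heap_init(size_t heap_size) {
        heap = malloc(heap_size);             /* one OS call, ever */
        freelist = (Block *)heap;
        freelist->size = heap_size - sizeof(Block);
        freelist->next = NULL;
    }

    /* First fit: take the first block big enough, splitting it if
       the leftover is worth keeping as a free block of its own. */
    void *wsalloc(size_t n) {
        Block **prev = &freelist;
        for (Block *b = freelist; b != NULL; prev = &b->next, b = b->next) {
            if (b->size < n)
                continue;
            if (b->size >= n + sizeof(Block) + 16) {
                Block *rest = (Block *)((unsigned char *)(b + 1) + n);
                rest->size = b->size - n - sizeof(Block);
                rest->next = b->next;
                b->size = n;
                *prev = rest;
            } else {
                *prev = b->next;              /* take the block whole */
            }
            return b + 1;                     /* payload follows header */
        }
        return NULL;    /* the APL "workspace full" condition */
    }

    /* Push the block back; a real manager would coalesce neighbors. */
    void wsfree(void *p) {
        Block *b = (Block *)p - 1;
        b->next = freelist;
        freelist = b;
    }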
d. Arrays tend to be stored in APL as something like this (call this an
m-entry, for memory entry): [By the way, schemes of this sort date back to
the late 1960s, and the development of APL\360 at IBM.]
backpointer
length of the m-entry in bytes
element count
reference count
array type
array rank (# dimensions)
array shape
array elements (variable length)
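In C, such a descriptor might look roughly like this; the field
widths, names, and the flexible-array trick are my illustrative
choices, not any particular interpreter's layout:

    #include <stddef.h>

    enum MType { MT_BOOL, MT_INT, MT_CHAR, MT_REAL, MT_COMPLEX };

    typedef struct MEntry {
        struct MEntry **backptr;  /* address of the one pointer to us */
        size_t mlen;              /* length of the m-entry in bytes */
        size_t nelem;             /* element count */
        size_t refcount;          /* how many things point at us */
        enum MType type;          /* boolean, integer, character, ... */
        size_t rank;              /* number of axes */
        size_t shape[];           /* rank shape entries; elements follow */
    } MEntry;

    /* The raw array elements start right after the shape vector. */
    static inline void *mentry_data(MEntry *m) {
        return (void *)(m->shape + m->rank);
    }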
The meanings of these items are:
backpointer: an index or address of the thing that points to this
array. The backpointer gives you the information
you need to move the array somewhere else in
memory, perhaps for heap compaction. When you
move the m-entry, you use the backpointer to find
the forward pointer to the m-entry, and then
adjust that forward pointer to point to the new
location of the m-entry after it has been moved.
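That relocation step, as a sketch over the MEntry above (move_mentry
is an illustrative name, and I assume a single forward pointer per
array):

    #include <string.h>

    /* Relocate an m-entry during compaction: copy the bytes, then
       chase the backpointer to the forward pointer and repoint it. */
    void move_mentry(MEntry *m, void *newhome) {
        memcpy(newhome, m, m->mlen);
        *(m->backptr) = (MEntry *)newhome;
    }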
m-entry length: This is used by the storage compacter when the array
is moved and by the storage deallocator when the array is
deallocated.
element count: Many APL primitives need to know the number of elements
in the array. For example, the "ravel" primitive (,) that turns any
array into a vector (list) has to compute the result shape. The
element count, formed as the times reduction of the array shape, is
redundant, but is sometimes used often enough during execution that
it makes sense to keep it around, rather than doing the multiplies
on each primitive call.
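That times reduction is just a product over the shape vector; with
the MEntry sketch above:

    /* Element count: the product of the shape (1 for a scalar). */
    size_t elem_count(const MEntry *m) {
        size_t n = 1;
        for (size_t i = 0; i < m->rank; i++)
            n *= m->shape[i];
        return n;
    }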
reference count: The number of "things" that point to this m-entry.
New references to the m-entry increase the reference count;
deleted references decrease it. When the reference count goes to
zero, the m-entry is deallocated. This speeds up certain
operations and makes them take less memory. For example,
if HERM is a big matrix, and you call a function foo as: foo HERM
then you can either copy all of HERM to a new array for use by foo,
OR you can increase the reference count for HERM and pass it
directly into foo. Anything that might want to alter foo's argument
checks the reference count -- if it's 1, then it can safely make the
changes in place; otherwise it must make a copy first.
Reference counts are generally swell. In compiled APL, such
as my APEX compiler, we can perform static analysis to remove
most of the reference counting. This is difficult to do efficiently
(or at all, for that matter) in an interpreter.
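Here is roughly what that bookkeeping looks like, again over the
hypothetical MEntry above; mentry_clone stands in for a deep copy I
won't write out:

    MEntry *mentry_clone(const MEntry *m);  /* hypothetical deep copy */

    /* Pass an array by reference: just bump the count. */
    MEntry *incref(MEntry *m) { m->refcount++; return m; }

    void decref(MEntry *m) {
        if (--m->refcount == 0)
            wsfree(m);           /* last reference gone: reclaim it */
    }

    /* Before an in-place update (e.g., indexed assignment), make sure
       we hold the only reference; otherwise copy first. */
    MEntry *make_writable(MEntry *m) {
        if (m->refcount == 1)
            return m;            /* safe to change in place */
        MEntry *copy = mentry_clone(m);
        decref(m);
        return copy;
    }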
array type: Since APL supports many data types on overloaded
primitives, and as APL does not use declarations, the type is used
to identify the array as, e.g., boolean, integer, character, real,
complex. You don't need this, as your arrays are all real.
array rank: The number of dimensions (axes) in the array. That is,
the number of elements in the shape vector.
array shape: The shape vector is the number of elements along each
axis of the array. A chessboard has shape 8 8. If you ravel it
to turn it into a vector, the result has shape ,64.
array elements: Finally, we have the array elements themselves,
stored in row-major order. This ordering makes certain primitives
faster, allows us to easily exploit identities and use cache to
advantage, and matches the storage layouts of most other languages
(except FORTRAN, which is column-major).
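Row-major means the rightmost axis varies fastest, so an element's
linear position falls out of a Horner-style running sum over the
shape. With the MEntry sketch above:

    /* Row-major offset of the element at index[0..rank-1]. */
    size_t rm_offset(const MEntry *m, const size_t *index) {
        size_t off = 0;
        for (size_t i = 0; i < m->rank; i++)
            off = off * m->shape[i] + index[i];
        return off;
    }

For the chessboard, the element at [r;c] lands at offset r*8 + c.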
A good thing to do, now that I've told you this, is to decouple the array
data from the descriptor, keeping the array elements in one heap (or portion
thereof) and the descriptors in another. This permits you to share the same
data among arrays of different rank or shape. For example, that would let you
store the chessboard and its ravelled list as the same array.
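With that split, a descriptor shrinks to rank, shape, and a pointer
to a separately refcounted data block; the board and its ravel become
two small descriptors over one block. A sketch (all names and the
fixed shape cap are mine):

    #include <stddef.h>

    /* Shared, refcounted element storage. */
    typedef struct DataBlock {
        size_t refcount;
        double data[];         /* real numbers, per your assignment */
    } DataBlock;

    /* The descriptor proper: small, cheap to copy. */
    typedef struct Desc {
        size_t rank;
        size_t shape[8];       /* fixed cap keeps the sketch simple */
        DataBlock *block;
    } Desc;

    /* Ravel: a new rank-1 descriptor over the very same block. */
    Desc ravel(const Desc *a) {
        size_t n = 1;
        for (size_t i = 0; i < a->rank; i++)
            n *= a->shape[i];
        Desc r = { 1, { n }, a->block };
        r.block->refcount++;
        return r;
    }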
Another good thing to do is to maintain an "array coordinate map" [See
Guibas & Wyatt POPL78] that extends this scheme so that common array
operations, such as transpose, reversal, drop, and row indexing, can be done
in fixed time, with no data movement regardless of the size of the arrays
involved.
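The idea in miniature: keep an offset and one stride per axis next to
the shape, and index through them. Then transpose is a swap of two
shape/stride pairs and reversal is a sign flip plus an offset shift,
with no element touched. (This is my condensation of the idea, not
Guibas & Wyatt's notation.)

    #include <stddef.h>

    /* A view: offset + per-axis strides over shared data. */
    typedef struct View {
        size_t rank;
        size_t shape[8];
        ptrdiff_t stride[8];   /* may be negative (reversal) */
        ptrdiff_t offset;      /* position of element [0;0;...] */
        double *data;
    } View;

    double view_get(const View *v, const size_t *index) {
        ptrdiff_t p = v->offset;
        for (size_t i = 0; i < v->rank; i++)
            p += (ptrdiff_t)index[i] * v->stride[i];
        return v->data[p];
    }

    /* Matrix transpose: swap the two axes' shape and stride. O(1). */
    void transpose2(View *v) {
        size_t s = v->shape[0]; v->shape[0] = v->shape[1]; v->shape[1] = s;
        ptrdiff_t t = v->stride[0]; v->stride[0] = v->stride[1];
        v->stride[1] = t;
    }

    /* Reverse along axis k: start at the far end, walk backwards. */
    void reverse_axis(View *v, size_t k) {
        v->offset += (ptrdiff_t)(v->shape[k] - 1) * v->stride[k];
        v->stride[k] = -v->stride[k];
    }

There's more, but that should give you a start...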
Bob
Larisa Cherkanskaya wrote:
> I'm a computer science student. Part of my assignment is to design a
> storage management system for a heap of variable-size blocks under the
> assumption that the heap is used only to store arrays of real numbers.
> This assumption is basically the one behind APL data storage.
> Can anybody please tell me where I can find the APL data description?
>
> My e-mail: akrats1@triton.towson.edu
>
> Thank you! Alex.