ONE OF THE FIRST STEPS in creating a function library for calculating with large
numbers is to determine how large numbers are to represented in the computer’s
main memory. It is necessary to plan carefully, since decisions made at this point
will be difficult to revise at a later time. Changes to the internal structure of a
software library are always possible, but the user interface should be kept as
stable as possible in the sense of “upward compatibility.”
It is necessary to determine the order of magnitude of the numbers to be
processed and the data type to be used for coding these numerical values.
The basic function of all routines in the FLINT/C library is the processing
of natural numbers of several hundred digits, which far exceeds the capacity of
standard data types. We thus require a logical ordering of a computer’s memory
units by means of which large numbers can be expressed and operated on. In
this regard one might imagine structures that automatically create sufficient
space for the values to be represented, but no more than is actually needed. One
would like to maintain such economically organized housekeeping with respect
to main memory by means of dynamic memory management for large numbers
that allocates or releases memory according to need in the course of arithmetic
operations. Although such can certainly be realized (see, for example, [Skal]),
memory management has a price in computation time, for which reason the
representation of integers in the FLINT/C package gives preference to the simpler
definition of static length.
For representing large natural numbers one might use vectors whose
elements are a standard data type. For reasons of efficiency an unsigned data
type is to be preferred, which allows the results of arithmetic operations to be
stored in this type without loss as unsigned long (defined in flint.h as ULONG),
which is the largest arithmetic standard C data type (see [Harb], Section 5.1.1). A
ULONG variable can usually be represented exactly with a complete register word of
the CPU.
Our goal is that operations on large numbers be reducible by the compiler as
directly as possible to the register arithmetic of the CPU, for those are the parts
that the computer calculates “in its head,” so to speak. For the FLINT/C package
the representation of large integers is therefore by means of the type unsigned
short int (in the sequel USHORT). We assume that the type USHORT is represented
by 16 bits and that the type ULONG can fully accept results of arithmetic operations
with USHORT types, which is to say that the informally formulated size relationship
USHORT × USHORT ≤ ULONG holds.
Whether these assumptions hold for a particular compiler can be deduced
from the ISO header file limits.h (cf. [Harb], Sections 2.7.1 and 5.1). For example,
in the file limits.h for the GNU C/C++ compiler (cf. [Stlm]) the following appears
#define UCHAR_MAX 0xffU
#define USHRT_MAX 0xffffU
#define UINT_MAX 0xffffffffU
#define ULONG_MAX 0xffffffffUL
One should note that with respect to the number of binary places there
are actually only three sizes that are distinguished. The type USHRT (respectively
USHORT in our notation) can be represented in a 16-bit register; the type ULONG fills
the word length of a CPU with 32-bit registers. The type ULONG_MAX determines
the value of the largest unsigned whole numbers representable by scalar types
(cf. [Harb], page 110).1 The value of the product of two numbers of type USHRT
is at most 0xffff * 0xffff = 0xfffe0001 and is thus representable by a ULONG
type, where the least-significant 16 bits, in our example the value 0x0001, can be
isolated by a cast operation into the type USHRT. The implementation of the basic
arithmetic functions of the FLINT/C package is based on the above-discussed
size relationship between the types USHORT and ULONG.
An analogous approach, one that used data types with 32-bit and 64-bit
lengths in the role of USHORT and ULONG in the present implementation, would
reduce the calculation time for multiplication, division, and exponentiation
1
Without taking into account such practical nonstandard types as unsigned long long in GNU
C/C++ and certain other C compilers.
14
Number Formats: The Representation of Large Numbers in C
by about 25 percent. Such possibilities are realizable with functions written
in assembler with direct access to 64-bit results of machine instructions for
multiplication and division or with processors with 64-bit registers that would
also allow to C implementations the lossless storage of such results in a ULONG
type. The FLINT/C package contains some examples whose gain in speed results
from the use of arithmetic assembler functions (see Chapter 19).
The next question is that of the ordering of the USHORT digits within a vector.
We can imagine two possibilities: from left to right, with a descending evaluation
of digits from lower to higher memory addresses, or the other way round, with
an ascending evaluation of digits from lower to higher memory addresses. The
latter arrangement, which is the reverse of our usual notation, has the advantage
that changes in the size of numbers at constant addresses can take place with the
simple allocation of additional digits, without the numbers having to be relocated
in memory. Thus the choice is clear: The evaluation of digits of our numerical
representation increases with increasing memory address or vector index.
As a further element of the representation the number of digits will be
appended and stored in the first element of the vector. The representation of long
numbers in memory thus has the format
n = (ln1 n2 . . . nl )B ,
0 ≤ l ≤ CLINTMAXDIGIT,
0 ≤ ni < B, i = 1, . . . , l,
where B denotes the base of the numerical representation; for the FLINT/C
package we have B := 216 = 65536. The value of B will be our companion from
here on and will appear continually in what follows. The constant CLINTMAXDIGIT
represents the maximal number of digits of a CLINT object.
Zero is represented by the length l = 0. The value n of a number that is
represented by a FLINT/C variable n_l is calculated as
⎧
⎪n_l[0]
⎪i−1
⎨
n_l[i]B, if n_l[0] > 0,
n = i=1
⎪
⎪
⎩0,otherwise.
If n > 0, then the least-significant digit of n to the base B is given by n_l[1],
and the most-significant digit by n_l[n_l[0]]. The number of digits of n_l[0]
will be read in what follows by the macro DIGITS_L (n_l) and set to l by the
macro SETDIGITS_L (n_l, l). Likewise, access to the least-significant and most-
significant digits of n_l will be conveyed by LSDPTR_L(n_l) and MSDPTR_L(n_l),
each of which returns a pointer to the digit in question. The use of the macros
defined in flint.h yields independence from the actual representation of the
number.
Since we have no need of a sign for natural numbers, we now have all
the required elements for the representation of such numbers. We define the
corresponding data type by
typedef unsigned short clint;
typedef clint CLINT[CLINTMAXDIGIT + 1];
In accordance with this, a large number will be declared by
CLINT n_l;
The declaration of function parameters of type CLINT can follow from the
instruction CLINT n_l in the function header.2 The definition of a pointer myptr_l
to a CLINT object occurs via CLINTPTR myptr_l or clint *myptr_l.
FLINT/C functions can, depending on the setting of the constant
CLINTMAXDIGIT in flint.h, process numbers up to 4096 bits in length, which
corresponds to 1233 decimal digits or 256 digits to the base 216 . By changing
CLINTMAXDIGIT the maximal length can be adjusted as required. The definitions of
other constants depend on this parameter; for example, the number of USHORTs in
a CLINT object is specified by
#define CLINTMAXSHORT CLINTMAXDIGIT + 1
and the maximal number of processable binary digits is defined by
#define CLINTMAXBIT CLINTMAXDIGIT << 4
Since the constants CLINTMAXDIGIT and CLINTMAXBIT are used frequently, yet
are rather unwieldy from a typographical point of view, we shall denote these
constants by abbreviations MAXB and MAX2 (with the exception of program
code, where the constants will appear in their normal form).
With this definition it follows that CLINT objects can assume whole-number
values in the interval 0, B MAXB − 1 , respectively 0, 2MAX2 − 1 . We denote
the value B MAXB − 1 = 2MAX2 − 1, the largest natural number that can be
represented by a CLINT object, by Nmax .
2
In this regard compare Chapters 4 and 9 of the extremely readable book [Lind], where there
is an extensive explanation of when vectors and pointers in C are equivalent, and above all,
when this is not the case and what types of errors can arise from a misunderstanding of these
issues.
Number Formats: The Representation of Large Numbers in C
For some functions it is necessary to process numbers that have more digits
than can be accommodated by a CLINT object. For these cases the variants of the
CLINT type are defined by
typedef unsigned short CLINTD[1+(CLINTMAXDIGIT<<1)];
and
typedef unsigned short CLINTQ[1+(CLINTMAXDIGIT<<2)];
which can hold double, respectively four times, the number of digits.
As support personnel to aid in programming, the module flint.c defines
the constants nul_l, one_l, and two_l, which represent the numbers 0, 1, and 2
in CLINT format; and in flint.h there are corresponding macros SETZERO_L(),
SETONE_L(), and SETTWO_L(), which set CLINT objects to the corresponding values.
Categories:
Subscribe to:
Post Comments (Atom)
0 Response for the "Number Formats: The Representation of Large Numbers in C"
Post a Comment