ONE OF THE FIRST STEPS in creating a function library for calculating with large

numbers is to determine how large numbers are to represented in the computer’s

main memory. It is necessary to plan carefully, since decisions made at this point

will be difﬁcult to revise at a later time. Changes to the internal structure of a

software library are always possible, but the user interface should be kept as

stable as possible in the sense of “upward compatibility.”

It is necessary to determine the order of magnitude of the numbers to be

processed and the data type to be used for coding these numerical values.

The basic function of all routines in the FLINT/C library is the processing

of natural numbers of several hundred digits, which far exceeds the capacity of

standard data types. We thus require a logical ordering of a computer’s memory

units by means of which large numbers can be expressed and operated on. In

this regard one might imagine structures that automatically create sufﬁcient

space for the values to be represented, but no more than is actually needed. One

would like to maintain such economically organized housekeeping with respect

to main memory by means of dynamic memory management for large numbers

that allocates or releases memory according to need in the course of arithmetic

operations. Although such can certainly be realized (see, for example, [Skal]),

memory management has a price in computation time, for which reason the

representation of integers in the FLINT/C package gives preference to the simpler

deﬁnition of static length.

For representing large natural numbers one might use vectors whose

elements are a standard data type. For reasons of efﬁciency an unsigned data

type is to be preferred, which allows the results of arithmetic operations to be

stored in this type without loss as unsigned long (deﬁned in flint.h as ULONG),

which is the largest arithmetic standard C data type (see [Harb], Section 5.1.1). A

ULONG variable can usually be represented exactly with a complete register word of

the CPU.

Our goal is that operations on large numbers be reducible by the compiler as

directly as possible to the register arithmetic of the CPU, for those are the parts

that the computer calculates “in its head,” so to speak. For the FLINT/C package

the representation of large integers is therefore by means of the type unsigned

short int (in the sequel USHORT). We assume that the type USHORT is represented

by 16 bits and that the type ULONG can fully accept results of arithmetic operations

with USHORT types, which is to say that the informally formulated size relationship

USHORT × USHORT ≤ ULONG holds.

Whether these assumptions hold for a particular compiler can be deduced

from the ISO header ﬁle limits.h (cf. [Harb], Sections 2.7.1 and 5.1). For example,

in the ﬁle limits.h for the GNU C/C++ compiler (cf. [Stlm]) the following appears

#define UCHAR_MAX 0xffU

#define USHRT_MAX 0xffffU

#define UINT_MAX 0xffffffffU

#define ULONG_MAX 0xffffffffUL

One should note that with respect to the number of binary places there

are actually only three sizes that are distinguished. The type USHRT (respectively

USHORT in our notation) can be represented in a 16-bit register; the type ULONG ﬁlls

the word length of a CPU with 32-bit registers. The type ULONG_MAX determines

the value of the largest unsigned whole numbers representable by scalar types

(cf. [Harb], page 110).1 The value of the product of two numbers of type USHRT

is at most 0xffff * 0xffff = 0xfffe0001 and is thus representable by a ULONG

type, where the least-signiﬁcant 16 bits, in our example the value 0x0001, can be

isolated by a cast operation into the type USHRT. The implementation of the basic

arithmetic functions of the FLINT/C package is based on the above-discussed

size relationship between the types USHORT and ULONG.

An analogous approach, one that used data types with 32-bit and 64-bit

lengths in the role of USHORT and ULONG in the present implementation, would

reduce the calculation time for multiplication, division, and exponentiation

1

Without taking into account such practical nonstandard types as unsigned long long in GNU

C/C++ and certain other C compilers.

14

Number Formats: The Representation of Large Numbers in C

by about 25 percent. Such possibilities are realizable with functions written

in assembler with direct access to 64-bit results of machine instructions for

multiplication and division or with processors with 64-bit registers that would

also allow to C implementations the lossless storage of such results in a ULONG

type. The FLINT/C package contains some examples whose gain in speed results

from the use of arithmetic assembler functions (see Chapter 19).

The next question is that of the ordering of the USHORT digits within a vector.

We can imagine two possibilities: from left to right, with a descending evaluation

of digits from lower to higher memory addresses, or the other way round, with

an ascending evaluation of digits from lower to higher memory addresses. The

latter arrangement, which is the reverse of our usual notation, has the advantage

that changes in the size of numbers at constant addresses can take place with the

simple allocation of additional digits, without the numbers having to be relocated

in memory. Thus the choice is clear: The evaluation of digits of our numerical

representation increases with increasing memory address or vector index.

As a further element of the representation the number of digits will be

appended and stored in the ﬁrst element of the vector. The representation of long

numbers in memory thus has the format

n = (ln1 n2 . . . nl )B ,

0 ≤ l ≤ CLINTMAXDIGIT,

0 ≤ ni < B, i = 1, . . . , l,
where B denotes the base of the numerical representation; for the FLINT/C
package we have B := 216 = 65536. The value of B will be our companion from
here on and will appear continually in what follows. The constant CLINTMAXDIGIT
represents the maximal number of digits of a CLINT object.
Zero is represented by the length l = 0. The value n of a number that is
represented by a FLINT/C variable n_l is calculated as
⎧
⎪n_l[0]
⎪i−1
⎨
n_l[i]B, if n_l[0] > 0,

n = i=1

⎪

⎪

⎩0,otherwise.

If n > 0, then the least-signiﬁcant digit of n to the base B is given by n_l[1],

and the most-signiﬁcant digit by n_l[n_l[0]]. The number of digits of n_l[0]

will be read in what follows by the macro DIGITS_L (n_l) and set to l by the

macro SETDIGITS_L (n_l, l). Likewise, access to the least-signiﬁcant and most-

signiﬁcant digits of n_l will be conveyed by LSDPTR_L(n_l) and MSDPTR_L(n_l),

each of which returns a pointer to the digit in question. The use of the macros

deﬁned in flint.h yields independence from the actual representation of the

number.

Since we have no need of a sign for natural numbers, we now have all

the required elements for the representation of such numbers. We deﬁne the

corresponding data type by

typedef unsigned short clint;

typedef clint CLINT[CLINTMAXDIGIT + 1];

In accordance with this, a large number will be declared by

CLINT n_l;

The declaration of function parameters of type CLINT can follow from the

instruction CLINT n_l in the function header.2 The deﬁnition of a pointer myptr_l

to a CLINT object occurs via CLINTPTR myptr_l or clint *myptr_l.

FLINT/C functions can, depending on the setting of the constant

CLINTMAXDIGIT in flint.h, process numbers up to 4096 bits in length, which

corresponds to 1233 decimal digits or 256 digits to the base 216 . By changing

CLINTMAXDIGIT the maximal length can be adjusted as required. The deﬁnitions of

other constants depend on this parameter; for example, the number of USHORTs in

a CLINT object is speciﬁed by

#define CLINTMAXSHORT CLINTMAXDIGIT + 1

and the maximal number of processable binary digits is deﬁned by

#define CLINTMAXBIT CLINTMAXDIGIT << 4

Since the constants CLINTMAXDIGIT and CLINTMAXBIT are used frequently, yet

are rather unwieldy from a typographical point of view, we shall denote these

constants by abbreviations MAXB and MAX2 (with the exception of program

code, where the constants will appear in their normal form).

With this deﬁnition it follows that CLINT objects can assume whole-number

values in the interval 0, B MAXB − 1 , respectively 0, 2MAX2 − 1 . We denote

the value B MAXB − 1 = 2MAX2 − 1, the largest natural number that can be

represented by a CLINT object, by Nmax .

2

In this regard compare Chapters 4 and 9 of the extremely readable book [Lind], where there

is an extensive explanation of when vectors and pointers in C are equivalent, and above all,

when this is not the case and what types of errors can arise from a misunderstanding of these

issues.

Number Formats: The Representation of Large Numbers in C

For some functions it is necessary to process numbers that have more digits

than can be accommodated by a CLINT object. For these cases the variants of the

CLINT type are deﬁned by

typedef unsigned short CLINTD[1+(CLINTMAXDIGIT<<1)];

and

typedef unsigned short CLINTQ[1+(CLINTMAXDIGIT<<2)];

which can hold double, respectively four times, the number of digits.

As support personnel to aid in programming, the module flint.c deﬁnes

the constants nul_l, one_l, and two_l, which represent the numbers 0, 1, and 2

in CLINT format; and in flint.h there are corresponding macros SETZERO_L(),

SETONE_L(), and SETTWO_L(), which set CLINT objects to the corresponding values.

Categories:

Reaksi: |

Subscribe to:
Post Comments (Atom)

## 0 Response for the "Number Formats: The Representation of Large Numbers in C"

## Post a Comment