## Small (till 2^64) prime numbers in C++

This is a small package for working with small (till 264, not enough for cryptography) prime numbers. They are organized as a static members of the class z64 (there are no dynamic members in this class).

The package consists of two files:
z64.cpp - realization.

Main methods:

typedef unsigned long long UINT64;
typedef          long long  INT64;

class z64 {
...
public:
static bool    isFullSq (UINT64 n);                    // n is full square?
static bool    isDiv357_11_13(UINT64 n);               // n is div by 3,5,7,11,13?
static UINT64  gcd      (UINT64 a, UINT64 b);          // GCD(a,b)
static UINT64  mul_mod  (UINT64 a, UINT64 b, UINT64 ń);// (a*b)%c
static UINT64  pow_mod  (UINT64 a, UINT64 b, UINT64 c);// (a^b)%c
static bool  MillerRabin(UINT64 a, UINT64 n);          // MR primality test for n with base a
static void    euclid   (UINT64 a, UINT64 b, INT64 &x, INT64 &y, UINT64 &d); // find x,y,d: a*x+b*y=d=gcd(a,b)
static UINT64  inverse  (UINT64 a, UINT64 n);          // 1/a mod n
static bool    chineese (UINT64 a1, UINT64 b1, UINT64 a2, UINT64 b2, UINT64 &a3, UINT64 &b3); // cheneese theorem

// sieve of Eratosthenes
static void   Efill(UINT64 mN);      // fill sieve for primes till mN
static void   Efree();               // free memory
static int    Esave(char *fname);    // return error code
static int    Eload(char *fname);    // return error code
//
static UINT64 maxN();                // return maximal N in sieve
static UINT64 nPrimes();             // total number of primes in the sieve (including 2)
static UINT64 maxP();                // return maximal prime in the sieve

// Primes
static bool   isPrime( UINT64 n);    // is prime(n)?
static UINT64 p( UINT64 i);          // i-th prime, Maple: ithprime(i), p(0)=0, p(1)=2, p(2)=3,...
static UINT64 Pi( UINT64 n);         // the number of primes <= n, Pi(0)=Pi(1)=0, Pi(2)=1, Pi(3)=Pi(4)=2,...
static UINT64 NextPrime( UINT64 n);  // prime p>n, NextPrime(3)=5, NextPrieme(4)=5, NextPrime(0)=NextPrime(1)=2, NextPrime(2)=3
static UINT64 PrevPrime( UINT64 n);  // prime p<n, PrevPrime(0)=PrevPrime(1)=PrevPrime(2)=0, PrevPrime(4)=3
//
static UINT64 primeFactor( UINT64 n);// prime divisor or 1, if not found
static UINT64 order( UINT64 a, UINT64 n);// order of a by modulo prime n
static int    jacobi(UINT64 a, UINT64 n);// Jacobi symbol
static UINT64 msqrt( UINT64 a, UINT64 p);// sqrt(a) mod p, p must be prime!
static char * factorString(UINT64 n);    // string with factorisation: 180200 -> "2^3*5^2*17*53"

static void    toString(char *s, UINT64 n, int w=0); // s <- to String of width w
static char   *toString(UINT64 n, int w=0);          // return to String of width w
};

Configure the package to different processors and compilers is reduced to an adjustment line in the header file:
typedef unsigned long long UINT64;
typedef          long long  INT64;

It should indicate the names of the type 64-bit integers.

Most of the functions are obvious and needs no comments. So let's just to comment a few.

isDiv357_11_13:

  static bool isDiv357_11_13(UINT64 n); // n is div by 3,5,7,11,13?

- test of divisibility by 3,5,7,11 and 13 by one division. Find n%(3*5*7*11*13=15015) and, using the table of 1877 bytes, check whether the n is coprime with 15015.

Only odd numbers are stored in a sieve, one bit for each. Therefore, N bytes is sufficient for prime numbers up to 16*N+1. To store all primes up to 1 million requires 63K bytes. 1G bytes enough for primes till 16 billions.

Usign sieve (till maxP()) is easily implemented funtions isPrime, NextPrime, PrevPrime. But we need some additional date for functions p() (ithprime) and Pi(). Simultaneously with the construction of the sieves are remembered in a separate array values of Pi(n) (the number of primes, not greater, than n) for all multipiers of 1024. So, for sieve itself till N we need N/16 bytes and N/256 bytes for this additional table. Using this table, we have fast functions

  static UINT64 p( UINT64 i);  // i-th prime, Maple: ithprime(i), p(0)=0, p(1)=2, p(2)=3,...
static UINT64 Pi( UINT64 n); // the number of primes <= n, Pi(0)=Pi(1)=0, Pi(2)=1, Pi(3)=Pi(4)=2,...

This functions a co-invers: z64::Pi(z64::p(i)) == i.

For very small prime (till ~8000) we use special tables.

isPrime:

  static bool isPrime( UINT64 n); // is prime(n)?

For numbers till z64::maxN() we use seive. If n<232 and not in the sieve, we use Miller–Rabin primality test by bases 2 and 3, and after that check all 103 composite number (unsigned BCompos) less then 232 the method wrong. So, for numbers less 232 this functiuon is exact. For larger numbres we use 12 Miller-Rabin tests by bases 2,3,..,37.

mul_mod
This is a key feature of the package. Its assembler implementation accelerates work more than an order of magnitude. An assembler implementation is verified only for gcc-64/Linux.

UINT64  z64::mul_mod(UINT64 a, UINT64 b, UINT64 c)        // (a*b)%c
{
#if defined(_MSC_VER) || defined( _WIN32 )
UINT64 res=0;
if(a>c) a %= c;
while(b){
if(b&1) {res+=a; if(res>c) res-=c; b--; }
else    { a<<=1; if(a  >c) a  -=c; b>>=1;}
}
return res;
#else
UINT64 d; 				    // to hold the result of a*b mod c
// calculates a*b mod c, stores result in d
asm ("mov %1, %%rax;"         // put a into rax
"mul %2;"                // mul a*b -> rdx:rax
"div %3;"                // (a*b)/c -> quot in rax remainder in rdx
"mov %%rdx, %0;"         // store result in d
:"=r"(d)                 // output
:"r"(a), "r"(b), "r"(c)  // input
:"%rax", "%rdx"          // clobbered registers
);
return d;
#endif
} // z64::mul_mod


primeFactor:

  static bool  primeFactor( UINT64 n); // // prime divisor or 1, if not found

After some research and a large number of experiments in our range (till 264) among the elementary methods the most effective is Pollard's-\rho algorithm, all the rest are not worth to implement. If all the prime factors of a number are great enough (till some billions), the method may take up to 2-3 hundred thousand steps. But other methods - still even more slowly. But, for example, the prime decomposition of p-1 for simple p is easily.

In addition, the method doesn't work on squares of primes, so consider them separately:

  UINT64 sn = (UINT64) sqrt ((double)n);
if( sn*sn==n ) return sn;

This test for n<264 is exact!.

factorString:

  static char *factorString(UINT64 n); // string with factorisation: 180200 -> "2^3*5^2*17*53"

This function uses always the same static memory to form string. That is, this funtions always return one and the same pointer. Memory does not need to allocate or not released. Just remember that when you call this function, it again uses the same memory. If you need to convert to a string many numbers, you can use the function
  static void toString(char *s, UINT64 n, int w=0); // s <- to String of width w

But to allocate sufficient memory, then it will have to take care yourself.

Usage::

#include "z64.h"
...
z64::Efill(1<<24);              // fill the sieve till 2^24
z64::Esave("16M.sieve");        // save the sieve
UINT64 p = z64::p(1000000);     // p = 15485863
UINT64 i = z64::Pi(p);          // i =  1000000
UINT64 p2= z64::p( 700000);     // p = 10570841
UINT64 n = p*p2;                // n = 163 698 595 520 783
bool   r = z64::isPrime(n);     // r = false
UINT64 f = z64::primeFactor(n); // f = 10570841


Basically, the module is designed for educational purposes, and speed was not the main goal. However, for numbers in 64 bit, this functions are faster than the library GMP. For example, the computation time for function pow_mod by processor AMD Athlon II X2 240, 2.8 GHz for 63-bit numbers:
bit depth lang. time, mks
32 C++ 75
64 C++ 31
64 asm 2.7
64 GMP 7 