Fast Exact Bigint Factorial

I have a solution:

$$(4N)! = \left((2N)!\right)^2 \cdot \prod_{p \le 4N,\ p\ \text{prime}} p^{\,\sum_{j \ge 1,\ p^j \le 4N} \left( \lfloor 4N / p^j \rfloor \bmod 2 \right)}$$

The second factor (called T2 in the code below) is a product of prime^exponent terms, and each exponent can be computed with small-integer arithmetic like this:

for (e=0,j=N4;j;e+=j&1,j/=p);

where e is the exponent, p is the prime and N4 is 4*N.
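The exponent loop above can be checked against Legendre's formula, and the identity itself can be verified in plain 64-bit arithmetic for small arguments. The following is only a sketch: the helper names (`t2_exponent`, `legendre`, `fact_small`, `fact_via_identity`) are made up for this illustration and are not part of the original code, and primes are found by trial division rather than from a table.

```cpp
#include <cstdint>

// Exponent of prime p in the T2 term for N4 = 4*N (same loop as above).
// The j = N4 term contributes nothing because N4 is even.
unsigned t2_exponent(unsigned N4, unsigned p)
{
    unsigned e = 0;
    for (unsigned j = N4; j; j /= p) e += j & 1;
    return e;
}

// Legendre's formula: exponent of prime p in n!.
unsigned legendre(unsigned n, unsigned p)
{
    unsigned e = 0;
    for (n /= p; n; n /= p) e += n;
    return e;
}

// Plain 64-bit factorial, exact only for n <= 20.
std::uint64_t fact_small(unsigned n)
{
    std::uint64_t f = 1;
    for (unsigned i = 2; i <= n; i++) f *= i;
    return f;
}

// Rebuild (4N)! as ((2N)!)^2 * T2, finding primes by trial division.
// Exact in 64 bits only while N4 <= 20; N4 must be a multiple of 4.
std::uint64_t fact_via_identity(unsigned N4)
{
    std::uint64_t c = fact_small(N4 / 2);
    c *= c;
    for (unsigned p = 2; p <= N4; p++)
    {
        bool prime = true;
        for (unsigned d = 2; d * d <= p; d++)
            if (p % d == 0) { prime = false; break; }
        if (!prime) continue;
        for (unsigned e = t2_exponent(N4, p); e; e--) c *= p;
    }
    return c;
}
```

For any N4 that is a multiple of 4, `t2_exponent(N4, p)` should equal `legendre(N4, p) - 2 * legendre(N4 / 2, p)`, which is exactly the exponent of p left over after dividing (4N)! by ((2N)!)^2.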

Code for the new equation:

// Sorted list of all primes up to the maximum n (added in an edit; I forgot to copy it at first).
// The end of the table is marked with 0.
// Primes are stored as DWORDs, so they take only 4 bytes per number;
// the table is therefore very small compared with a lookup table of factorials for the same max n,
// and the primes are also needed by many other bignum routines.
// With this table you can compute n! for n <= the largest prime in the table.
DWORD _arithmetics_primes[]={2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,0};

longnum fact(const DWORD &x)
{
    if (x<=4)
    {
        if (x==4) return 24;
        if (x==3) return 6;
        if (x==2) return 2;
        if (x==1) return 1;
        if (x==0) return 1;
    }
    int N4,N2,p,i,j,e; longnum c,pp;
    N4=(x>>2)<<2;
    N2=N4>>1;
    c=fact(N2); c*=c; // c=((2N)!)^2;
    for (i=0;;i++) // c*= T2
    {
        p=_arithmetics_primes[i];
        if (!p) break;
        if (p>N4) break;
        for (e=0,j=N4;j;e+=j&1,j/=p);
        if (e) // c*=p^e
        {
            if (p==2) c<<=e;
            else for (pp=p;;)
            {
                if (int(e&1)) c*=pp;
                e>>=1; if (!e) break;
                pp*=pp;
            }
        }
    }
    for (i=N4+1;i<=x;i++) { c*=i; } c.round();
    return c;
}

Here are rough time measurements for the first 128 factorials so you can estimate real complexity.

Fixed-point arithmetic: 768.128 bits ... 231.36 decimal digits.

[ 0.001 ms ] 1! = 1
[ 0.000 ms ] 2! = 2
[ 0.000 ms ] 3! = 6
[ 0.000 ms ] 4! = 24
[ 0.006 ms ] 5! = 120
[ 0.006 ms ] 6! = 720
[ 0.007 ms ] 7! = 5040
[ 0.005 ms ] 8! = 40320
[ 0.006 ms ] 9! = 362880
[ 0.007 ms ] 10! = 3628800
[ 0.008 ms ] 11! = 39916800
[ 0.012 ms ] 12! = 479001600
[ 0.013 ms ] 13! = 6227020800
[ 0.014 ms ] 14! = 87178291200
[ 0.016 ms ] 15! = 1307674368000
[ 0.014 ms ] 16! = 20922789888000
[ 0.015 ms ] 17! = 355687428096000
[ 0.017 ms ] 18! = 6402373705728000
[ 0.019 ms ] 19! = 121645100408832000
[ 0.016 ms ] 20! = 2432902008176640000
[ 0.017 ms ] 21! = 51090942171709440000
[ 0.019 ms ] 22! = 1124000727777607680000
[ 0.021 ms ] 23! = 25852016738884976640000
[ 0.023 ms ] 24! = 620448401733239439360000
[ 0.025 ms ] 25! = 15511210043330985984000000
[ 0.027 ms ] 26! = 403291461126605635584000000
[ 0.029 ms ] 27! = 10888869450418352160768000000
[ 0.032 ms ] 28! = 304888344611713860501504000000
[ 0.034 ms ] 29! = 8841761993739701954543616000000
[ 0.037 ms ] 30! = 265252859812191058636308480000000
[ 0.039 ms ] 31! = 8222838654177922817725562880000000
[ 0.034 ms ] 32! = 263130836933693530167218012160000000
[ 0.037 ms ] 33! = 8683317618811886495518194401280000000
[ 0.039 ms ] 34! = 295232799039604140847618609643520000000
[ 0.041 ms ] 35! = 10333147966386144929666651337523200000000
[ 0.039 ms ] 36! = 371993326789901217467999448150835200000000
[ 0.041 ms ] 37! = 13763753091226345046315979581580902400000000
[ 0.044 ms ] 38! = 523022617466601111760007224100074291200000000
[ 0.046 ms ] 39! = 20397882081197443358640281739902897356800000000
[ 0.041 ms ] 40! = 815915283247897734345611269596115894272000000000
[ 0.044 ms ] 41! = 33452526613163807108170062053440751665152000000000
[ 0.046 ms ] 42! = 1405006117752879898543142606244511569936384000000000
[ 0.049 ms ] 43! = 60415263063373835637355132068513997507264512000000000
[ 0.048 ms ] 44! = 2658271574788448768043625811014615890319638528000000000
[ 0.050 ms ] 45! = 119622220865480194561963161495657715064383733760000000000
[ 0.054 ms ] 46! = 5502622159812088949850305428800254892961651752960000000000
[ 0.056 ms ] 47! = 258623241511168180642964355153611979969197632389120000000000
[ 0.056 ms ] 48! = 12413915592536072670862289047373375038521486354677760000000000
[ 0.060 ms ] 49! = 608281864034267560872252163321295376887552831379210240000000000
[ 0.063 ms ] 50! = 30414093201713378043612608166064768844377641568960512000000000000
[ 0.066 ms ] 51! = 1551118753287382280224243016469303211063259720016986112000000000000
[ 0.065 ms ] 52! = 80658175170943878571660636856403766975289505440883277824000000000000
[ 0.069 ms ] 53! = 4274883284060025564298013753389399649690343788366813724672000000000000
[ 0.072 ms ] 54! = 230843697339241380472092742683027581083278564571807941132288000000000000
[ 0.076 ms ] 55! = 12696403353658275925965100847566516959580321051449436762275840000000000000
[ 0.077 ms ] 56! = 710998587804863451854045647463724949736497978881168458687447040000000000000
[ 0.162 ms ] 57! = 40526919504877216755680601905432322134980384796226602145184481280000000000000
[ 0.095 ms ] 58! = 2350561331282878571829474910515074683828862318181142924420699914240000000000000
[ 0.093 ms ] 59! = 138683118545689835737939019720389406345902876772687432540821294940160000000000000
[ 0.089 ms ] 60! = 8320987112741390144276341183223364380754172606361245952449277696409600000000000000
[ 0.093 ms ] 61! = 507580213877224798800856812176625227226004528988036003099405939480985600000000000000
[ 0.098 ms ] 62! = 31469973260387937525653122354950764088012280797258232192163168247821107200000000000000
[ 0.096 ms ] 63! = 1982608315404440064116146708361898137544773690227268628106279599612729753600000000000000
[ 0.090 ms ] 64! = 126886932185884164103433389335161480802865516174545192198801894375214704230400000000000000
[ 0.100 ms ] 65! = 8247650592082470666723170306785496252186258551345437492922123134388955774976000000000000000
[ 0.104 ms ] 66! = 544344939077443064003729240247842752644293064388798874532860126869671081148416000000000000000
[ 0.111 ms ] 67! = 36471110918188685288249859096605464427167635314049524593701628500267962436943872000000000000000
[ 0.100 ms ] 68! = 2480035542436830599600990418569171581047399201355367672371710738018221445712183296000000000000000
[ 0.121 ms ] 69! = 171122452428141311372468338881272839092270544893520369393648040923257279754140647424000000000000000
[ 0.109 ms ] 70! = 11978571669969891796072783721689098736458938142546425857555362864628009582789845319680000000000000000
[ 0.119 ms ] 71! = 850478588567862317521167644239926010288584608120796235886430763388588680378079017697280000000000000000
[ 0.104 ms ] 72! = 61234458376886086861524070385274672740778091784697328983823014963978384987221689274204160000000000000000
[ 0.124 ms ] 73! = 4470115461512684340891257138125051110076800700282905015819080092370422104067183317016903680000000000000000
[ 0.113 ms ] 74! = 330788544151938641225953028221253782145683251820934971170611926835411235700971565459250872320000000000000000
[ 0.118 ms ] 75! = 24809140811395398091946477116594033660926243886570122837795894512655842677572867409443815424000000000000000000
[ 0.118 ms ] 76! = 1885494701666050254987932260861146558230394535379329335672487982961844043495537923117729972224000000000000000000
[ 0.123 ms ] 77! = 145183092028285869634070784086308284983740379224208358846781574688061991349156420080065207861248000000000000000000
[ 0.129 ms ] 78! = 11324281178206297831457521158732046228731749579488251990048962825668835325234200766245086213177344000000000000000000
[ 0.133 ms ] 79! = 894618213078297528685144171539831652069808216779571907213868063227837990693501860533361810841010176000000000000000000
[ 0.121 ms ] 80! = 71569457046263802294811533723186532165584657342365752577109445058227039255480148842668944867280814080000000000000000000
[ 0.119 ms ] 81! = 5797126020747367985879734231578109105412357244731625958745865049716390179693892056256184534249745940480000000000000000000
[ 0.131 ms ] 82! = 475364333701284174842138206989404946643813294067993328617160934076743994734899148613007131808479167119360000000000000000000
[ 0.150 ms ] 83! = 39455239697206586511897471180120610571436503407643446275224357528369751562996629334879591940103770870906880000000000000000000
[ 0.141 ms ] 84! = 3314240134565353266999387579130131288000666286242049487118846032383059131291716864129885722968716753156177920000000000000000000
[ 0.148 ms ] 85! = 281710411438055027694947944226061159480056634330574206405101912752560026159795933451040286452340924018275123200000000000000000000
[ 0.154 ms ] 86! = 24227095383672732381765523203441259715284870552429381750838764496720162249742450276789464634901319465571660595200000000000000000000
[ 0.163 ms ] 87! = 2107757298379527717213600518699389595229783738061356212322972511214654115727593174080683423236414793504734471782400000000000000000000
[ 0.211 ms ] 88! = 185482642257398439114796845645546284380220968949399346684421580986889562184028199319100141244804501828416633516851200000000000000000000
[ 0.151 ms ] 89! = 16507955160908461081216919262453619309839666236496541854913520707833171034378509739399912570787600662729080382999756800000000000000000000
[ 0.157 ms ] 90! = 1485715964481761497309522733620825737885569961284688766942216863704985393094065876545992131370884059645617234469978112000000000000000000000
[ 0.166 ms ] 91! = 135200152767840296255166568759495142147586866476906677791741734597153670771559994765685283954750449427751168336768008192000000000000000000000
[ 0.161 ms ] 92! = 12438414054641307255475324325873553077577991715875414356840239582938137710983519518443046123837041347353107486982656753664000000000000000000000
[ 0.169 ms ] 93! = 1156772507081641574759205162306240436214753229576413535186142281213246807121467315215203289516844845303838996289387078090752000000000000000000000
[ 0.173 ms ] 94! = 108736615665674308027365285256786601004186803580182872307497374434045199869417927630229109214583415458560865651202385340530688000000000000000000000
[ 0.188 ms ] 95! = 10329978488239059262599702099394727095397746340117372869212250571234293987594703124871765375385424468563282236864226607350415360000000000000000000000
[ 0.181 ms ] 96! = 991677934870949689209571401541893801158183648651267795444376054838492222809091499987689476037000748982075094738965754305639874560000000000000000000000
[ 0.187 ms ] 97! = 96192759682482119853328425949563698712343813919172976158104477319333745612481875498805879175589072651261284189679678167647067832320000000000000000000000
[ 0.194 ms ] 98! = 9426890448883247745626185743057242473809693764078951663494238777294707070023223798882976159207729119823605850588608460429412647567360000000000000000000000
[ 0.201 ms ] 99! = 933262154439441526816992388562667004907159682643816214685929638952175999932299156089414639761565182862536979208272237582511852109168640000000000000000000000
[ 0.185 ms ] 100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
[ 0.191 ms ] 101! = 9425947759838359420851623124482936749562312794702543768327889353416977599316221476503087861591808346911623490003549599583369706302603264000000000000000000000000
[ 0.202 ms ] 102! = 961446671503512660926865558697259548455355905059659464369444714048531715130254590603314961882364451384985595980362059157503710042865532928000000000000000000000000
[ 0.207 ms ] 103! = 99029007164861804075467152545817733490901658221144924830052805546998766658416222832141441073883538492653516385977292093222882134415149891584000000000000000000000000
[ 0.242 ms ] 104! = 10299016745145627623848583864765044283053772454999072182325491776887871732475287174542709871683888003235965704141638377695179741979175588724736000000000000000000000000
[ 0.210 ms ] 105! = 1081396758240290900504101305800329649720646107774902579144176636573226531909905153326984536526808240339776398934872029657993872907813436816097280000000000000000000000000
[ 0.215 ms ] 106! = 114628056373470835453434738414834942870388487424139673389282723476762012382449946252660360871841673476016298287096435143747350528228224302506311680000000000000000000000000
[ 0.221 ms ] 107! = 12265202031961379393517517010387338887131568154382945052653251412013535324922144249034658613287059061933743916719318560380966506520420000368175349760000000000000000000000000
[ 0.217 ms ] 108! = 1324641819451828974499891837121832599810209360673358065686551152497461815091591578895743130235002378688844343005686404521144382704205360039762937774080000000000000000000000000
[ 0.226 ms ] 109! = 144385958320249358220488210246279753379312820313396029159834075622223337844983482099636001195615259277084033387619818092804737714758384244334160217374720000000000000000000000000
[ 0.232 ms ] 110! = 15882455415227429404253703127090772871724410234473563207581748318444567162948183030959960131517678520479243672638179990208521148623422266876757623911219200000000000000000000000000
[ 0.240 ms ] 111! = 1762952551090244663872161047107075788761409536026565516041574063347346955087248316436555574598462315773196047662837978913145847497199871623320096254145331200000000000000000000000000
[ 0.213 ms ] 112! = 197450685722107402353682037275992488341277868034975337796656295094902858969771811440894224355027779366597957338237853638272334919686385621811850780464277094400000000000000000000000000
[ 0.231 ms ] 113! = 22311927486598136465966070212187151182564399087952213171022161345724023063584214692821047352118139068425569179220877461124773845924561575264739138192463311667200000000000000000000000000
[ 0.240 ms ] 114! = 2543559733472187557120132004189335234812341496026552301496526393412538629248600474981599398141467853800514886431180030568224218435400019580180261753940817530060800000000000000000000000000
[ 0.252 ms ] 115! = 292509369349301569068815180481773552003419272043053514672100535242441942363589054622883930786268803187059211939585703515345785120071002251720730101703194015956992000000000000000000000000000
[ 0.248 ms ] 116! = 33931086844518982011982560935885732032396635556994207701963662088123265314176330336254535971207181169698868584991941607780111073928236261199604691797570505851011072000000000000000000000000000
[ 0.598 ms ] 117! = 3969937160808720895401959629498630647790406360168322301129748464310422041758630649341780708631240196854767624444057168110272995649603642560353748940315749184568295424000000000000000000000000000
[ 0.259 ms ] 118! = 468452584975429065657431236280838416439267950499862031533310318788629800927518416622330123618486343228862579684398745837012213486653229822121742374957258403779058860032000000000000000000000000000
[ 0.261 ms ] 119! = 55745857612076058813234317117419771556272886109483581752463927935846946310374691578057284710599874844234646982443450754604453404911734348832487342619913750049708004343808000000000000000000000000000
[ 0.254 ms ] 120! = 6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000
[ 0.263 ms ] 121! = 809429852527344373968162284544935082997082306309701607045776233628497660426640521713391773997910182738287074185078904956856663439318382745047716214841147650721760223072092160000000000000000000000000000
[ 0.270 ms ] 122! = 98750442008336013624115798714482080125644041369783596059584700502676714572050143649033796427745042294071023050579626404736512939596842694895821378210620013388054747214795243520000000000000000000000000000
[ 0.281 ms ] 123! = 12146304367025329675766243241881295855454217088483382315328918161829235892362167668831156960612640202170735835221294047782591091570411651472186029519906261646730733907419814952960000000000000000000000000000
[ 0.290 ms ] 124! = 1506141741511140879795014161993280686076322918971939407100785852066825250652908790935063463115967385069171243567440461925041295354731044782551067660468376444194611004520057054167040000000000000000000000000000
[ 0.322 ms ] 125! = 188267717688892609974376770249160085759540364871492425887598231508353156331613598866882932889495923133646405445930057740630161919341380597818883457558547055524326375565007131770880000000000000000000000000000000
[ 0.303 ms ] 126! = 23721732428800468856771473051394170805702085973808045661837377170052497697783313457227249544076486314839447086187187275319400401837013955325179315652376928996065123321190898603130880000000000000000000000000000000
[ 0.313 ms ] 127! = 3012660018457659544809977077527059692324164918673621799053346900596667207618480809067860692097713761984609779945772783965563851033300772326297773087851869982500270661791244122597621760000000000000000000000000000000
[ 0.307 ms ] 128! = 385620482362580421735677065923463640617493109590223590278828403276373402575165543560686168588507361534030051833058916347592172932262498857766114955245039357760034644709279247692495585280000000000000000000000000000000
reference 128! = 385620482362580421735677065923463640617493109590223590278828403276373402575165543560686168588507361534030051833058916347592172932262498857766114955245039357760034644709279247692495585280000000000000000000000000000000

My measurements show that computing N! uses:

  • at most 2.2N fast low-level long operations (+,-,<<,>>)
  • slightly fewer than N/2 long multiplications, but most of them have convenient operand sizes, which speeds up the multiplication, so the measured times do not match the obvious O(N/2*O(multiplication)).
  • Fitting the measured times, the observed complexity best matches O(N^1.4). This is with Karatsuba multiplication, because the operands are still far below the size where NTT-based multiplication pays off; beyond that size the complexity should be even better.

I have also tried computing the factorial as a non-recursive multiplication of prime powers only (similar to the T2 term), but the results were much slower.

P.S.: The code posted in the question is also 100% working, but slower than the new one (even though it uses fewer multiplications), because it needs more memory for the recursion and the order of the multiplicands is not optimized.

Find factorial n modulo m faster than O(n)

You can easily achieve O(m) time complexity in the worst case (when m is prime), which seems good enough since m <= 1e6 (while n can be up to 1e18). Note that when n >= m

 n! = 1 * 2 * ... * m * ... * n
                    ^
                    factorial is divisible by m

and that's why

 n! % m == 0       # whenever n >= m

Another implementation detail is that you don't have to compute n! % m as 1 * 2 * ... * n % m but you can do it as ((..(1 % m) * 2 % m) ... * n % m) in order not to deal with huge numbers.

C# code example

private static long Compute(long n, long m) {
    if (n >= m)
        return 0;

    // 1 % m instead of 1 so that m == 1 yields 0
    long result = 1 % m;

    // result != 0: the product can reach 0 when m is not prime, and then we can stop looping
    for (long d = 2; d <= n && result != 0; ++d)
        result = (result * d) % m;

    return result;
}

Fast algorithms for computing the factorial

Check out this paper (PDF link) by Richard Fateman. The code samples are in Lisp, but in any event, much of the secret boils down to minimizing the number of bignum (arbitrary-precision integer) calculations you have to do.

Naturally, if you don't need/have bignums, it's trivial; either a lookup table or a simple loop will be fine.

EDIT: If you can use an approximate answer, you can either compute the logarithm of the factorial directly by summing log(k) for k = 2 ... n, or by using the venerable Stirling approximation. You want to work with the logarithm wherever possible to avoid overflow; in particular, a naive application of Stirling's approximation will overflow in a lot of places where it doesn't have to.
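As a rough illustration of working in the log domain, the sketch below computes ln(n!) both ways. The function names are invented for this example; the Stirling version uses the first two correction terms of the asymptotic series and is meant for n >= 1.

```cpp
#include <cmath>

// ln(n!) by direct summation of ln(k); O(n) time, never overflows.
double log_factorial_sum(unsigned n)
{
    double s = 0.0;
    for (unsigned k = 2; k <= n; k++) s += std::log(static_cast<double>(k));
    return s;
}

// ln(n!) via Stirling's series, evaluated entirely in the log domain:
// ln(n!) ~ n*ln(n) - n + 0.5*ln(2*pi*n) + 1/(12n) - 1/(360n^3)
double log_factorial_stirling(double n)
{
    const double two_pi = 6.283185307179586476925286766559;
    return n * std::log(n) - n + 0.5 * std::log(two_pi * n)
         + 1.0 / (12.0 * n) - 1.0 / (360.0 * n * n * n);
}
```

Dividing either result by ln(10) gives the decimal exponent of n!, so the magnitude of the factorial is available even where n! itself would overflow a double. The standard library's `std::lgamma(n + 1.0)` returns the same quantity and can serve as a cross-check.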

Factorial of integer mod m fast calculation

There are a few fast algorithms for factorial out there

  • so the answer is: yes, you can compute a factorial without looping through all values
  • all the ones I have seen use prime decomposition (including my algorithm)
  • so from there it is just a matter of using modular multiplication instead of normal multiplication
  • look here: Fast exact bigint factorial is my fast algorithm
  • and the other answer there also contains a link to the swinging primes algorithm ...

[Notes]

  • for N! you will need a list of primes up to N
  • but the rest of the code can work with arithmetic capable of holding N and m
  • so there is no need for huge numbers ...

[edit1] my 32-bit C++ implementation

//---------------------------------------------------------------------------
DWORD modmul(DWORD a,DWORD b,DWORD n) // (a*b) mod n; at least one of a,b must be < n so the quotient fits in 32 bits
{
    DWORD _a,_b,_n;
    _a=a;
    _b=b;
    _n=n;
    asm {
        mov eax,_a
        mov ebx,_b
        mul ebx // H(edx),L(eax) = eax * ebx
        mov ebx,_n
        div ebx // eax = H(edx),L(eax) / ebx
        mov _a,edx // edx = H(edx),L(eax) % ebx
    }
    return _a;
}
//---------------------------------------------------------------------------
DWORD modfact0(DWORD n,DWORD m) // (n!) mod m (naive approach)
{
    DWORD i,f;
    for (f=1,i=2;i<=n;i++) f=modmul(f,i,m);
    return f;
}
//---------------------------------------------------------------------------
DWORD modfact1(DWORD n,DWORD m) // (n!) mod m (my fast approach)
{
    if (n<=4) // small cases, reduced mod m so they are also correct for m<=24
    {
        if (n==4) return 24%m;
        if (n==3) return 6%m;
        if (n==2) return 2%m;
        if (n==1) return 1%m;
        if (n==0) return 1%m;
    }
    int N4,N2,p,i,j,e; DWORD c,pp;
    N4=(n>>2)<<2;
    N2=N4>>1;
    c=modfact1(N2,m); c=modmul(c,c,m); // c=((2N)!)^2;
    for (i=0;;i++) // c*= T2
    {
        p=primes_i32.dat[i];
        if (!p) break;
        if (p>N4) break;
        for (e=0,j=N4;j;e+=j&1,j/=p);
        if (e) // c*=p^e
        {
            // no c<<=e shortcut for p==2 here: a plain shift is not reduced mod m and can overflow 32 bits
            for (pp=p;;)
            {
                if (int(e&1)) c=modmul(c,pp,m);
                e>>=1; if (!e) break;
                pp=modmul(pp,pp,m);
            }
        }
    }
    for (i=N4+1;i<=n;i++) c=modmul(c,i,m);
    return c;
}
//---------------------------------------------------------------------------

primes:

  • DWORD primes_i32.dat[] is a precomputed, sorted (ascending) list of all primes up to n; the loop in modfact1 stops at a 0 entry, so the table must end with 0
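For compilers without that inline-assembly syntax, the same idea can be sketched portably. This is an illustrative variant, not the author's code: `modmul` uses a 64-bit intermediate, the prime table is built with a sieve of Eratosthenes, and the names `make_primes`, `modfact_naive` and `modfact_fast` are assumptions of this example. It assumes m >= 1 and a table covering every prime <= n.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint32_t DWORD;

// (a*b) mod n using a 64-bit intermediate instead of inline assembly.
DWORD modmul(DWORD a, DWORD b, DWORD n)
{
    return static_cast<DWORD>((static_cast<std::uint64_t>(a) * b) % n);
}

// Sieve of Eratosthenes: ascending primes <= limit, terminated by 0.
std::vector<DWORD> make_primes(DWORD limit)
{
    std::vector<bool> composite(static_cast<std::size_t>(limit) + 1, false);
    std::vector<DWORD> primes;
    for (std::uint64_t p = 2; p <= limit; p++)
    {
        if (composite[p]) continue;
        primes.push_back(static_cast<DWORD>(p));
        for (std::uint64_t q = p * p; q <= limit; q += p) composite[q] = true;
    }
    primes.push_back(0); // end-of-table marker
    return primes;
}

// n! mod m by plain iteration, used here only as a reference.
DWORD modfact_naive(DWORD n, DWORD m)
{
    DWORD f = 1 % m;
    for (DWORD i = 2; i <= n; i++) f = modmul(f, i, m);
    return f;
}

// n! mod m via (4N)! = ((2N)!)^2 * T2.
// If the table ends before the primes reach N4, the result is silently wrong.
DWORD modfact_fast(DWORD n, DWORD m, const std::vector<DWORD> &primes)
{
    if (n <= 4)
    {
        static const DWORD base[5] = { 1, 1, 2, 6, 24 };
        return base[n] % m;
    }
    DWORD N4 = (n >> 2) << 2;
    DWORD c = modfact_fast(N4 >> 1, m, primes);
    c = modmul(c, c, m);                       // c = ((2N)!)^2 mod m
    for (std::size_t i = 0; primes[i] && primes[i] <= N4; i++)
    {
        DWORD p = primes[i], e = 0;
        for (DWORD j = N4; j; j /= p) e += j & 1;
        for (DWORD pp = p % m; e; e >>= 1)     // c *= p^e by squaring
        {
            if (e & 1) c = modmul(c, pp, m);
            pp = modmul(pp, pp, m);
        }
    }
    for (DWORD i = N4 + 1; i <= n; i++) c = modmul(c, i, m);
    return c;
}
```

A call such as `modfact_fast(1000, 1299721, make_primes(1000))` should agree with `modfact_naive(1000, 1299721)`; comparing the two over a range of n is a cheap way to validate a port.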

Here are the results:

[ 18.529 ms] slow modfact0(1000000,1299721) = 195641
[  2.995 ms] fast modfact1(1000000,1299721) = 195641
[ 96.242 ms] slow modfact0(5000000,9999991) = 2812527
[ 13.305 ms] fast modfact1(5000000,9999991) = 2812527
  • 1299721 is the first prime near 1000000 that I found
  • if m is not prime and the intermediate result hits zero, you can skip the rest of the multiplications for a massive speed-up...

I hope the results are correct; I have nothing else to compare them with ...

Finding fast every (2k) factorial

This is not an answer, but rather an extended comment.

If your goal is to compute factorials, you cannot get seriously faster. However, I assume that this question is asked in the context of your previous question, regarding summation of a certain series,

∑(2k+1)/(2k)! , k =0,... ,∞

If I am correct, the answer is to not compute factorials at all. Use a Horner scheme instead. Your sum can be expressed as

1 + 1/(1*2) * (3 + 1/(3*4) * (5 + 1/(5*6) * (7 + 1/(7*8) * (9 + ...))))

See how factorials disappear. Now fix a number of terms you want to add, and work this expression inside out.

Finding the number of terms needed to achieve a desired precision is a totally different topic. As I mentioned once in a comment, Taylor's theorem is very helpful.

What is the fastest factorial function in JavaScript?

You can search for (1...100)! on Wolfram|Alpha to pre-calculate the factorial sequence.

The first 100 numbers are:


