Factor a Large Number Efficiently with Gmp

Factoring large numbers

Have you tried the quadratic sieve or the general number field sieve? (QS simpler and easier to implement, GNFS faster for very large numbers, but your numbers may not be large enough for GNFS to be much faster than QS)

edit: According to a paper by Eric Landquist, the crossover point between QS and GNFS is about 110 digits = approx 365 bits.

Most efficient way to find all factors with GMPY2 (or GMP)?

I just made some quick changes to your code to eliminate redundant name lookups. The algorithm is still the same but it is about twice as fast on my computer.

import gmpy2
from gmpy2 import mpz

def factors(n):
    result = set()
    n = mpz(n)
    for i in range(1, gmpy2.isqrt(n) + 1):
        div, mod = divmod(n, i)
        if not mod:
            result |= {mpz(i), div}
    return result

print(factors(12345678901234567))

Other suggestions will need more information about the size of the numbers, etc. For example, if you need all the possible factors, it may be faster to construct those from all the prime factors. That approach will let you decrease the limit of the range statement as you proceed and also will let you increment by 2 (after removing all the factors of 2).

Update 1

I've made some additional changes to your code. I don't think your all_multiplies() function is correct. Your range() statement isn't optimal since 2 is check again but my first fix made it worse.

The new code delays computing the co-factor until it knows the remainder is 0. I also tried to use the built-in functions as much as possible. For example, mpz % integer is faster than gmpy2.f_mod(mpz, integer) or gmpy2.f_mod(integer, mpz) where integer is a normal Python integer.

import gmpy2
from gmpy2 import mpz, isqrt

def factors(n):
    n = mpz(n)

    result = set()
    result |= {mpz(1), n}

    def all_multiples(result, n, factor):
        z = n
        f = mpz(factor)
        while z % f == 0:
            result |= {f, z // f}
            f += factor
        return result

    result = all_multiples(result, n, 2)
    result = all_multiples(result, n, 3)

    for i in range(1, isqrt(n) + 1, 6):
        i1 = i + 1
        i2 = i + 5
        if not n % i1:
            result |= {mpz(i1), n // i1}
        if not n % i2:
            result |= {mpz(i2), n // i2}
    return result

print(factors(12345678901234567))

I would change your program to just find all the prime factors less than the square root of n and then construct all the co-factors later. Then you decrease n each time you find a factor, check if n is prime, and only look for more factors if n isn't prime.

Update 2

The pyecm module should be able to factor the size numbers you are trying to factor. The following example completes in about a second.

>>> import pyecm
>>> list(pyecm.factors(12345678901234567890123456789012345678901, False, True, 10, 1))
[mpz(29), mpz(43), mpz(43), mpz(55202177), mpz(2928109491677), mpz(1424415039563189)]

Is there a smart way to use large numbers in C++

GMP is a very high performance library for arbitrary precision integer math in C, and it can also be used in C++. There is a string conversion function which will allow you to accept input from the terminal or other string sources.

https://gmplib.org/

Questions on using GMP for prime factorization already exist on Stack Overflow too :)

Prime factorization of a large number

The prime factors are 2, 3, 5, 97 and 211.

This can be found quickly by trial division of basic primes.

Factorization of really big numbers in C

Factorization of really big numbers

I would like the user to be able to type gigantic numbers, for example: 7777777777777777777777777772

That is a 93 bit number, not that gigantic, so one could simplistically brute force it.

Something like the below if you have access to a unsigned __int128. C does specify 64-bit types, yet beyond that, you are on your own.

This modest factorization I'd estimate could take some minutes.

https://www.dcode.fr/prime-factors-decomposition reports the answer in seconds.

Of course many improvement can be had.

unsigned __int128 factor(unsigned __int128 x) {
  if (x <= 3) {
    return x;
  }
  if (x %2 == 0) return 2;
  for (unsigned __int128 i = 3; i <= x/i; i += 2) {
    static unsigned long n = 0;
    if (++n >= 100000000) {
      n = 0;
      printf(" %llu approx %.0f\n", (unsigned long long) i, (double)(x/i));
      fflush(stdout);
    }
    if (x%i == 0) {
      return i;
    }
  }
  return x;
}

void factors(unsigned __int128 x) {
  do {
    unsigned __int128 f = factor(x);
    printf("%llu approx %.0f\n", (unsigned long long) f, (double)x);
    fflush(stdout);
    x /= f;
  } while (x > 1);
}

void factors(unsigned __int128 x) {
  do {
    unsigned __int128 f = factor(x);
    printf("approx %0.f approx %.0f\n", (double) f, (double)x);
    fflush(stdout);
    x /= f;
  } while (x > 1);
}

Output

approx 2 approx 7777777777777778308713283584
approx 2 approx 3888888888888889154356641792
approx 487 approx 1944444444444444577178320896
approx 2687 approx 3992699064567647864619008
 99996829 approx 14859790387308
 199996829 approx 7429777390798
 299996829 approx 4953158749339
 399996829 approx 3714859245385
 499996829 approx 2971882684351
 ...
 38399996829 approx 38696146902
 38499996829 approx 38595637421
approx 1485931918335559335936 approx 1485931918335559335936

The right answer though is to use more efficient algorithms and then consider the types needed.