Values for and will automatically get reduced mod . Values for will automatically get reduced mod .

Prime | |
---|---|

The discrete logarithm problem is the problem of solving for . Here, is known, but is not. For example, we might want to solve . One algorithm is to iterate through every value of until we find the right answer, but this is obviously unsatisfactorily slow. In fact, in some cases, there may not even be a solution.

The discrete logarithm problem is the basis of many aspects of public-key cryptography. It is primarily the basis of the Diffie-Hellman Key Exchange, which is not actually a cryptosystem. We frequently use public-key cryptography to generate a key to use with AES. RSA does this by encrypting a symmetric AES key, but Diffie-Hellman is not actually encryption, at least not in the same sense as RSA. There are also cryptosystems (namely ElGamal) and digital signatures (e.g., DSA) that make use of discrete logarithms. We will focus instead on key exchanges. The discrete logarithm problem is also very general and works with other groups such as elliptic curves.

A primitive root (also known as a generator) mod is an integer that generates the nonzero numbers mod . That is, for any , has a solution. For example, 3 is a primitive root mod 7 since the powers of 3 are 3, 2, 6, 4, 5, 1, which is a permutation of 1, 2, 3, 4, 5, 6.

Note that may not be solvable. It is only guaranteed to be solvable when is a primitive root. For example, has no solution. (A perfect square is never a primitive root.)

It is not easy to find a primitive root in general, but once you do find one, it is easy to find the others. Given a primitive root , is a primitive root whenever . It turns out that is commonly a primitive root, so it's a good guess to start with. To verify a number is a primitive root, one technique is to write out the factorization of . The order of (that is, the smallest number such that ) must divide . For example, to verify 2 is a primitive root mod 11, we can write . So the order of 2 must be either 2, 5, or 10. Since and , the order must be 10, so 2 is a primitive root. Of course, factoring is difficult, so this is not an ideal algorithm.

The Diffie-Hellman Key Exchange gives a way for two parties, Aaron and Blumenfeld, to agree on a private key that they know about but anyone intercepting their communication will not be able to obtain the key (without the ability to solve discrete logarithms). Of course, this page features a discrete logarithm calculator, but in real life, primes that are over 1000 bits long should be used.

Diffie-Hellman proceeds as follows: Aaron and Blumenfeld agree on a prime and a primitive root . This is public knowledge. Aaron then chooses a secret integer and sends to Blumenfeld. Blumenfeld chooses a secret integer and sends to Aaron. The secret key is then . Aaron can compute this as , and Blumenfeld can compute this as . But anyone who intercepts either or cannot compute unless they can solve the discrete logarithm problem (unless there's another way to compute , which seems unlikely, but which remains an open problem).

There are issues of man-in-the-middle attacks, however. Eve could intercept and pretend to be Blumenfeld, sending for her own choice of to Aaron, and similarly send to Blumenfeld after receiving . We will not discuss this issue further here.

One common algorithm is the Pohlig-Hellman Algorithm, which notes that the value of the logarithm is unique mod (at least when is a primitive root). The idea is to factor and compute the logarithm mod for each , then find the solution mod using the Chinese Remainder Theorem. This attacks works best when consists completely of small factors. For example, when , the largest prime factor of is . This is a good reason to ensure that has at least one large prime factor when choosing .

Another algorithm is called Baby Step, Giant Step. The idea is to choose an integer and form two tables. The first table is pairs for . The second table is pairs for . We then look for a match between the two lists. Once a match is found, we write , which means . Then is the desired solution. This match must exist because it is a number mod and can thus be written uniquely in base as for .

It may be surprising that with just 23 people in a room, there's a greater than 50% chance that at least two people in the room share a birthday. The exact probability can be computed as $$1 - (1 - 1/365)(1 - 2/365)\ldots(1 - 22/365) > 0.5.$$

In general, if we have objects and people, each choosing one of the objects (with replacement, so there can be collisions), the exact probability can be computed in this way, but as grows, the probability is actually estimated quite well by . This can be proven using basic calculus. Choosing requires to result in a 50% chance of two people choosing the same object.

An algorithm based on the Birthday Paradox is Pollard's Rho Algorithm. This algorithm also has variants for factoring integers among other computational tasks. The idea is essentially that of finding a loop in a linked list, but we don't actually store the list.

We break up the integers mod into three sets: . For example, we could place them according to their residue mod 3. Fix and (we want to solve for ). Define for a randomly chosen value of , and for , $$a_{i+1} = \begin{cases}ya_i, &a_i \in S_0, \\ a_i^2, &a_i \in S_1, \\ ga_i, &a_i \in S_2.\end{cases}$$

We then look at pairs of the form (i.e., , etc.). Since we can compute from using the definition of the sequence and from by iterating the sequence definition twice, we need not store any large tables. The pigeonhole principle immediately implies that there will be a collision at some point: with . In fact, there must be a collision with the form .

Once we have , it follows that for all . So set and solve for . This has the solution . Then , or , as desired.

But this requires . This assumption is not stifling, however. Indeed, if , but , then we know we have a cycle of length (or some divisor of ). So there exists an integer with . In this case, we know , which must also be equal to . Since and , we see that , so it is safe to assume that in the first place. What we are doing here is traversing the cycle multiple times.

To give a concrete example, suppose that . But then . But there must be a cycle of length 3 (or possibly 1), so . Since , we can let , and we observe that , or .

Here is a visualization of what the sequence looks like. There is a tail which leads into a cycle. This looks like the Greek letter , hence the name for the algorithm. In this case, we find , and .

When we look for collisions, we also keep track of the exponents. Each term of the sequence takes the form for some values of and . Once we find an equality, we have . Since (we are trying to solve for ), we can write , or $$i_1 - i_2 \equiv x(j_2 - j_1) \pmod{p-1},$$ We can solve for provided that . But even if , we can invert mod and lift the solution to all possible solutions mod to see which one is correct.

Here's an example. Let's compute (10 is a primitive root mod 59). This means we want an with . Let's start with , and partition the integers mod 37 into according their residue mod 3. We have the following computations:

We then discover that , which means , or . This tells us that , which reduces to . Since , we cannot divide by 2, but we can view this mod 29 and get (if for some integer , then , or . Then there are two solutions to try : and . We check that , but . Therefore, $$\log_{10}(37) \equiv 41 \pmod{58}.$$

This algorithm takes time. In fact, it can be shown that a match should be found, on average, within iterations, assuming the sequence is "random" (which may or may not be the case, and which we won't further study here). The Birthday Paradox tells us that we should expect a probability over 50% of a match (although not necessarily a match with indices ) within 10 iterations in this example (which was achieved), again assuming randomness of the sequence.

The most effective algorithm against discrete logarithms is called the Index Calculus. The first step is a pre-computation step. We choose an upper bound and form a factor base consisting of all primes less than . We compute for several values of and try to write it as a product of primes in . We may not always be able to do this, in which case we try more values of , or a larger value for .

Each congruence we obtain will look like , which means . With enough such relations, we can solve for for each . Next, for several values of , we compute and also try to write it as a product of primes in . This means , so we can easily solve for given knowledge of for each .

Here's an example of the Index Calculus in action. Let's try to solve . Note that 5 is a primitive root mod 47. Let , so that . First the pre-computation:

Recall that logs live mod , not . The second equation implies that , which means that . The third equation then implies that , or . Clearly . The first equation gives , or . Therefore, our pre-computation yields:

Now for the actual computation, let's find . Compute . Therefore, $$\log_5(23) \equiv \log_5(3) + \log_5(7) - \log_5(5) \equiv 20 + 32 - 1 \equiv 5 \pmod{46}.$$

Note that the pre-computation step can be reused for other finding other logarithms with the same base. There is certainly a tradeoff with how to choose . For larger values of , the pre-computation step takes longer, but the main computation becomes easier (since it may be hard to write as products of primes in a smaller factor base).