Monday, January 18, 2010

Solving Project Euler problems

Late last year I got into Project Euler in my spare time for a couple of months. I solved over 150 of the problems, then I lost interest and have only sporadically solved any since. However I know I could do many more if I put some more time into it. I also contributed P240 which a number of people liked.

That's not a great track record, but it is not a bad one either.

This post is my advice for people who can solve basic Project Euler problems but want to get better at solving them. I happened to use Perl to solve problems, but the advice should work in a variety of languages. The techniques I used use closures a lot, but that's because that's the kind of guy I am.

First if you are unable to solve the first 5 then I would suggest that you're missing some basic background. I would suggest learning a programming language first, then some math. Only after that should you try Project Euler.

If you're going to tackle Project Euler problems, there are some things that come up over and over again. In particular I would recommend learning how to use dynamic programming to solve problems. That comes up a lot. Secondly I would recommend learning how to handle infinite streams of data. Thirdly do not be afraid of attempting brute force techniques with clever bucketing and pruning. If your program runs in a minute, you've found what they were looking for.

With those basic techniques out of the way, as you solve problems you'll develop a library of functions you use over and over again. Here are the functions in my library, in no particular order, with no particular attempt at good design (after all PE is just one throw away program after another...):

  • gcd Calculates the GCD using the Euclidean Algorithm

  • prime_iterator Returns a function which, every time it is called, will return the next prime. Internally I use the Sieve of Eratosthenes to generate this list. Note that in this code an "iterator" is always a function which iterates through some (possibly infinite) set as you call it.

  • is_prime Takes a number, makes sure I've generated all of the primes up to that number, then checks whether it is prime.

  • is_prime2 Takes a number, generates all of the primes up to the square root of that number, then uses trial division to check whether it is prime. This routine is slower than the previous if you're going to check a lot of primes, but uses much less memory. P118 is a good example of where you would use this version.

  • ith_prime Access the list of primes as if it were an array. (I had ith_prime(0) return 2, ith_prime(1) return 3, and so on. Remember what I said about not being carefully designed?)

  • factor Takes a number, returns a hash of prime factors and multiplicities. Trial division by primes up to the square root was efficient enough for all of the PE problems that I tackled.

  • factors Returns the prime factors of a number as an array with duplication to represent multiplicity.

  • n_pow_m_mod_k What the name says. I used it for P133 and stuck it in my library because I wanted to reuse it for P66 but it didn't work for the second one. (Search for Pell's Equation for how to handle that problem.)

  • minimal_divisible_repunit See P129 and P130 for what this is about. Internally this reuses the n_pow_m_mod_k function in its calculations.

  • pythagorean_triple_iterator Iterates through all Pythagorean triples ordered by the lengths of their sides. I'll have more to say on the algorithm in a bit.

  • generate_stream A promise is a function which, every time it is called, will return some set of future values you haven't reached yet, returning nothing when it runs out. This function takes a promise and lets you treat it like an infinite array, ask for the 1st, 5th or 500th element and it will generate the promise until it has enough stuff then return the value.

  • generate_memoryless_stream Like the previous but instead of accessing it like an array you access it like an iterator. This is useful for cases where it is convenient to calculate a block of values at a time. A perfect and pertinent example is the Sieve of Eratosthenes, which calculates blocks of primes at a time.


Now on to the Pythagorean triple iterator. How do I iterate through them? Well first I use Euler's formula to generate all primitive triples in the order that I want, and then I merge together all of the multiples of that stream to get all triples. To understand how that merging works, it would be good to go back to the infinite streams article and actually read it.

This sounds simple in theory. But in practice you have a problem. I'm merging a lot of data. I do that by maintaining a partially sorted pool of things I want to merge in, and then always taking the least next element. There is a data structure for that. It is called a heap. A heap is a tree, which has the heap property that each node is less than or equal to its children. This makes the top node at the root of the tree the smallest element. (Why do CS people always draw trees upside down so that the root is at the top? Really it would make more sense to call it a "root system" or else to draw it the other way around. Oh well, that's the convention.)

Luckily CPAN has multiple heap implementations. Unfortunately the one named Heap sucks, and I couldn't be bothered to look through the rest to find one that met my needs. So I took pointers from an implementation by Abigail, and implemented my own. The clever (and standard) trick in that implementation is that arrays can be used to represent a tree. The two nodes below the n'th element in the tree are at 2n+1 and 2n+2. So you can take an array of data, turn it into a heap, and then manipulate it as if it was a tree.

Writing your own heap implementation is a good exercise, so I'll just give the methods in my heap implementation for anyone who wants to duplicate it.

  • new Construct a new Heap. There is one optional argument, which is the comparison function. (Which is passed in as a closure, as you might expect.)

  • append Push a new element on the end, then let it "bubble up" to its natural position. Here is what I mean by "bubbling up". If it is currently at position $n > 0 then its parent is at position int(($n-1)/2). If it is greater than or equal to its parent you're done. Otherwise you need to swap with its parent and continue until it either has a smaller parent or reaches the root.

  • top Peek at the root element.

  • swap_top Take the root element out, replace it with something else, and let the something else "fall down" to its natural level. Returns the previous top.

  • extract First pop the last element off of the array, then call swap_top. This trick gives you the least element in the heap and makes everything else be represented in an array that is 1 smaller than the previous.


Hopefully this helps others who are trying to tackle Project Euler problems and haven't made much headway yet.

6 comments:

Anonymous said...

Also needed: routines for dealing with very large integers. Perl provides that already, but I chose to write my own. IIRC, what's needed builds up gradually (adding a small integer into a large one, then adding two large integers, multiplying a small integer into a large one, etc.)

Ben Tilly said...

Yes, you need that. But I've solved that particular problem before, so I was willing to use the standard Perl solution this time around.

Crag said...

Memoization is another important tool for Project Euler.

pqnelson said...

"Now on to the Pythagorean triple iterator. How do I iterate through them? Well first I use Euler's formula..." I think you mean Euclid's formula.

At any rate, project Euler is great for mathematicians to practice their ability to do mental calculations.

OTOH, I've been working on the first 5 problems for quite some time now...but it's all good fun in mathematics!

viraj123 said...

Check out my blog:
totallygamed.wordpress.com for C++ solutions

Akshika Wijesundara said...

i solved 35 problems,and i did that using c language,nothing else,i find it bit difficult to go any further, what is you advice for me too improve on that??