## Monday, February 8, 2010

### Math Mystery

It is amazing how easily you can get things wrong when you forget the basics. Here is a case and point from today, I was recalling the Taylor series of ex, and something didn't add up. First lets look at the intergral of ex:

$\int e^x dx = e^x$

Noting the Taylor Series of ex:

$\int e^x dx = \int \sum_{n=0}^{\infty}\frac{x^n}{n!} dx = \sum_{n=0}^{\infty} \int \frac{x^n}{n!} dx$
$=\sum_{n=0}^{\infty}\frac{x^{n+1}}{(n+1)*n!}=\sum_{n=0}^{\infty}\frac{x^{n+1}}{(n+1)!}=\sum_{n=1}^{\infty}\frac{x^{n}}{(n)!}$
$= e^x - 1$

Clearly these two methods yield a different result. Spot the error that I made, it took me a second to figure it out....

## Wednesday, August 19, 2009

### Why there can never been an end all be all API

I have heard people rail on that open source reduces coding time because it prevents you from having to reinvent the wheel. This argument is only true in part, because sometimes it is necessary to optimize the code heavly, which means making certain assumptions about your domain. Sometimes, the open source layer, is on the other extreme and made to many assumptions for your application. Please note that closed source API's are no different in this regard. So in the end keep in mind that sometimes, you might be stuck rewriting, or at least thoroughly revamping something as ordinary as a MP3 decoder.

Open source code is generally very poorly documented, when you buy source code from a vendor it is typically documented better and you will get support for the code, even if you need to change it. They will first see if its possible to do what you want to do without modification of the base API, this is a good thing. However if you find that this is not the case they should help you muck their code base into another hemisphere.

## Wednesday, July 22, 2009

### Know thy Hash

I work a lot with various kinds of hashes in my profession, and I am finding more and more that peoples understanding of hashes is haphazard at best. Hashes are defined to meet the following goals:

• It is easy to compute a hash for a given length.
• It is infeasible to find a message given a hash.
• It is infeasible to change the message without changing the hash.
• It is infeasible to find to messages with the same hash.

Oddly enough the middle two are more for cryptographic purposes, then they are for "general" hashes. Using a hash to store data into a Hash Table for example, there is no need to use a complex hash.

So this is the first observation, know what your hash was designed for. Hashes are typically designed for one of two purposes. The first is Cryptography examples include MD5, SHA-1, etc. These are generally the slowest to compute, have a large number of output bits. These hashes should be reserved for what they are designed for. Their strength and bit complexity does make they useful at uniquely identifying files.

The second is to discriminate some data from others quickly, generally in a hash table. The first lesson is that uniqueness is never guaranteed. The second is use a hash that is both fast and is very low from a collision point of view. You also want a hash whose result will be in the same number of bits as the native register. 32 or 64 bits right now. Here are some examples of fast hashes.

Make sure to deal with collisions, they are simply far more likely then you think. Also remember that this analysis assumes that the hash is strongly collision free, if that is not the case the results are worse.

Finally, there is a trend to use things that where never designed to be hashes as hashes. For example the CRC-32 algorithm. Using this algorithm as a hash is generally a bad idea because there are hashes that are much faster to compute with similar properties for being collision free.

## Wednesday, June 24, 2009

### Mathematics of a Perfect Hash

This is in response to a problem known as the "Kember Identity". I wanted to know if this search was in vane, or had some chance of success. The third requirement of a Hash is that "It is computationally infeasible to find messages m1 and m2 with h(m1) = h(m2) (in this case, the function h is said to be strongly collision-free)" (Trappe p219). Given that statement it is reasonable to state that the bits of the hash should be randomly flipped.

The idea of the Kember Identity is simple, find a hash of a message (in this case MD5) such that: h(m) = m. In order for this to occur the bits of at least one 128-bit message need to hash into the same 128 bit digest. Assuming a hash that has the property of treating all bit combinations as equal even its own, there is a 1 in 2128 chance that any single value will achieve this. This can be restated as there is a 2128-1 in 2128 chance that it will be anything else. The question is what are the odds that all 2128 will be anything else, the answer is:

$\left (\frac{2^{128}-1}{2^{128}} \right )\left (\frac{2^{128}-1}{2^{128}} \right )\left ( ... \right )=\left (\frac{2^{128}-1}{2^{128}} \right )^{2^{128}}$

Using my favorite package to ask big math questions on (PARI), I came up with an answer of approximately 37%. That means there is a 37% chance that in a perfect hash there will be no combination of a 128-bit message that will produce the same output. To put it in the reverse which I am sure my friend who is searching for the "Kember Identity" is interested in, that means that there is a 63% percent chance that there is at least one 128-bit message whose digest is the same.

Kember came out here much better then I predicted. I should have remembered the lessons of the Birthday Paradox. Needless to say if MD5 performs near the assumptions above there is a better then 50/50 chance that such a hash exists. I wish Kember the best of luck in his search.

UPDATE: Since 2128 is a very large number lets see what happens when the above equation goes to infinity:

$\lim_{n\to \infty } \left (\frac{n-1}{n} \right )^{n} = e^{-1}\approx 0.37$

Which gives an answer near the long and more correct way to calculate this.

## Thursday, June 18, 2009

### Overpayments on Loans

Someone came up to me today and asked about what are known as Bi-weekly loans. The statement he made at first was slightly counter intuitive, "You can make half your monthly payment and pay off your loan in six to eight years earlier." I was doubtful, and the math did not yield the result, if anything more compounding cycles means you would pay more over 30 years not less.

The "trick" or lie (if your the bank) is that with the Bi-weekly schedule you actually make what is equivalent to an extra month payment every year. My contention was that if you overpaid the mortgage by the same amount each month that would sum up to what you where effectively doing with a Bi-weekly schedule you would wind up in the same position or near the same position, so lets take a look at that.

Resolving our relationship X from the previous article for nt yields:

$nt=-\frac{\ln(1-(r/t)(1/(X+OP))(P-D))}{\ln(1+r/t)}$

So if you overpay OP by X/12 each schedule you will find that you will pay the loan off by about the same amount of time. Example r = 4%, t = 12, X = 1432.25, P = 300,000. Note that a payment of 1432.25 is what it takes pay of the loan in 30 years. Applying this the equation above we find that overpaying by the specified amount would yield 310.78 terms or you would be able to pay it off in 25 years and 11 months, the last month you would make a reduced payment of 1211. So if you don't want to refinance your loan, just make the overpayment and make sure to write on the stub that you want the additional amount to be applied to the principal. ## Wednesday, June 17, 2009 ### Here a geometric series there a geometric series Over the past two weeks I have been running into geometric series everywhere. Geometric series are simple little equations that have the form of: $\sum_{i=0}^{n} ak^i$ There is wiki that goes into the basics of geometric sums and their basic identies on Wikipedia: http://en.wikipedia.org/wiki/Geometric_series. For now I only present to you one of the places I have run into this little bad boy, and that is in the calculation of amortization. Every time I go through a application that does this I wondered where the math came from, I finally decided to work it out for my self. Lets start with the knowns, r is the yearly APR, t is the number of times the APR is compounded in a year, n is the number of years, P is the principle on the loan, and D is your down payment. The only variable we are looking for X is your monthly payments. At the end of the loan term or n*t payments we would like there to be no balance in the account. First the amount that in the loan has to start with: $y(0) = P - D$ Each subsequent payment takes interest from the previous period, and deducts your payment: $y(1) = (1+r/t)y(0) - X$ $y(2) = (1+r/t)y(1) - X$ $y(nt) = (1+r/t)y(nt-1) - X$ Expanding the recursive relationship manually reveals a equation that can have Horner's Rule applied to reduce it to a more manageable form: \begin{align*} y(nt) &= -X + (1+r/t)(-X + (1+r/t)(...\ +\\ &+ (1+r/t)(P-D)) \end{align*} Transposing by Horner's Rule and collapsing the subsequent series into a Geometric Series: $y(nt) = (P-D)(1+r/t)^{nt}-\sum_{i=0}^{nt-1} X (1+r/t)^i$ Apply the identity from the Wikipedia page above: $y(nt) = (P-D)(1+r/t)^{nt}-X\frac{(1+r/t)^{nt}-1}{r/t}$ Recalling that at the end of the term we desire no net balance, solving for X and simplifying: $y(nt) = 0$ $(P-D)(1+r/t)^{nt}=X\frac{(1+r/t)^{nt}-1}{r/t}$ $X=\frac{r/t(P-D)(1+r/t)^{nt}}{(1+r/t)^{nt}-1}$ Now I have the magic amortization formula in my back pocket in case I ever need it again. You can also quickly calculate your total payments Xnt or the amount of interest you paid Xnt - (P - D). The two other areas I have had the Geometric Series show up was calculating the probability of winning at a game of dice, and a 401k what if Excel sheet I made. The Dice game is really interesting because it deals with an unbounded Geometric Series. UPDATE: A simplification of the formula above is possible: $X=\frac{r/t(P-D)}{1-(1+r/t)^{-nt}}$ If you know how much you can pay each month X and are looking for how much of a mortgage you can afford you can resolve the equation above for P-D $P-D=\frac{X(1-(1+r/t)^{-nt})}{r/t}$ Also note that there is a linear relationship between the amount you can pay and the amount of a mortgage you can afford. $\frac{\partial}{\partial X}(P-D)=\frac{1-(1+r/t)^{-nt}}{r/t}$ UPDATE: Homer's Rule replaced with Horner's Rule ## Tuesday, June 16, 2009 ### Why Traditional 401k wins, assuming same tax rate One of my friends asked me to explain why the Traditional 401k wins over the Roth 401k in more detail assuming the tax rates are the same as when they retire. Lets assume an individual makes50,000 today, this means he is taxed at an average tax rate of 11.9% (Federal Income only) but is in the 25% tax bracket. Lets say the same individual puts away $2500 dollars. This means that his average tax rate dropped to 11.21%, also his take away income only dropped$1875. Now lets say we assume he retires as the financial experts say with 80% of todays income, or 40,000. When he re walks the system his tax rate has dropped even again to 9.45% of his income coming out of the 401k, and he has even dropped to the 15% bracket.

In the case of the Roth 401k you are always at a tax rate of 11.9%. This means when you assume that taxes will go up in the future (comparing the Roth to the Traditional) you must also assume the tax rates will go up enough in the future to cancel out this effect. Since I don't see the marginal tax system in the country going any where I am going to assume that odds are you are going to find that it will probably stay the same or even get better unless you are in the top 2%.

Source for Tax Calculations: http://www.dinkytown.net/java/TaxMargin.html