Oct 18, 2011

[Test] Horrible horrible bug (and fever)

Home sick from work, I forced my feverish self to go through a few of the horrendous test tournaments I've run since Sunday (the doomsday of my new version, where everything I had gained was suddenly nullified).

Firstly, I noticed both v0.34 and my new version are quite bad at evaluating pawns. Sure I run the tournaments at very fast time controls (10sec+0.1) and I can't expect them to calculate all the way to queen all the time. But I do think that's where I have the main evaluation weakness. I'll see what I can do about that.

But there were games where my new version's evaluation suddenly dropped from winning to severely losing. On some occasions it was just a bad position that would inevitably lead to doom, but then I found tens of games where it just dropped the queen for nothing, returning a mate score.

Hopeful and confused I started digging. The "mates" were all singular response moves, like checking with the queen next to the king with the only move available being capturing the queen.

The big problem was I couldn't replicate any of the positions. Inserting them and searching would always give a reasonable move.

Suspecting my new check evasion algorithm, I put down traces where the quiescence search returned mate and ran hundreds of test positions, but no luck. So I added traces to every conceivable place where mate could be returned, and finally I found a position where the main alpha-beta loop would return a mate score even though there existed a legal move that avoided it.

So after some stepping through the code I found the culprit: at some point, for some idiotic reason, I had removed the scoring of the hashmove. Since the score isn't really used (hashmoves are always searched first) I must have deemed it unnecessary.

But with my new way of storing moves (by ply) it's imperative that every move is given a score, even if it isn't used. The reason is that later searches can stumble upon old scores and act on them. Like setting the hashmove score to -10000 to avoid re-searching it, and then skipping the next hashmove as well, since the -10000 score is still lingering around.

Which is exactly what happened.
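
To make the failure mode concrete, here is a small Python sketch of how a stale score in a per-ply move buffer can hide a move. It is a simplified model and not the engine's actual code: the class `MoveStack`, the function `searched_moves`, the `score_hash_move` switch and the move picker that marks searched moves with -10000 are all assumptions made up for the illustration.

```python
SEARCHED = -10000      # sentinel score meaning "already searched, skip it"
HASH_SCORE = 1000000   # hypothetical score that puts the hashmove first
MAX_PLY = 64
MAX_MOVES = 256


class MoveStack:
    """Preallocated move and score slots, one row per ply, reused between nodes."""

    def __init__(self):
        self.moves = [[None] * MAX_MOVES for _ in range(MAX_PLY)]
        self.scores = [[0] * MAX_MOVES for _ in range(MAX_PLY)]
        self.count = [0] * MAX_PLY

    def load(self, ply, hash_move, other_moves, score_hash_move):
        """Write the moves of a new node into the row for this ply."""
        self.moves[ply][0] = hash_move
        if score_hash_move:
            self.scores[ply][0] = HASH_SCORE
        # Otherwise slot 0 keeps whatever score the previous node left there.
        for i, (move, score) in enumerate(other_moves, start=1):
            self.moves[ply][i] = move
            self.scores[ply][i] = score
        self.count[ply] = 1 + len(other_moves)

    def pick_next(self, ply):
        """Return the best unsearched move at this ply, or None if none is left."""
        best = -1
        for i in range(self.count[ply]):
            score = self.scores[ply][i]
            if score == SEARCHED:
                continue
            if best == -1 or score > self.scores[ply][best]:
                best = i
        if best == -1:
            return None
        self.scores[ply][best] = SEARCHED  # mark so it is not searched twice
        return self.moves[ply][best]


def searched_moves(stack, ply, hash_move, other_moves, score_hash_move):
    """List the moves a node at this ply would actually search, in order."""
    stack.load(ply, hash_move, other_moves, score_hash_move)
    result = []
    move = stack.pick_next(ply)
    while move is not None:
        result.append(move)
        move = stack.pick_next(ply)
    return result


if __name__ == "__main__":
    for fixed in (False, True):
        stack = MoveStack()
        # First node at ply 3: every slot ends up marked as SEARCHED.
        searched_moves(stack, 3, "Qh5+", [("Nf3", 50), ("a3", 10)], fixed)
        # Second node at the same ply: the only legal move is the hashmove.
        print("fixed" if fixed else "buggy",
              searched_moves(stack, 3, "Kxh5", [], fixed))
```

In this model the unscored variant finds nothing to search in the second node, because slot 0 still carries the -10000 left over from the first node. A search that sees no moves while in check would conclude it is mated, which matches the symptom of a mate score being returned despite a legal reply.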

I seem to remember mentioning this not so long ago, and still I managed to fall into the trap again. :)

After having fixed the bug I ran a quick 128 game test to make sure it solved the problem. And being burnt from Sunday's fiasco I ran another one as well. :) The result:

   Program    Elo    +    -   Games   Score    Av.Op.   Draws
1  M1-1    :  2441   37   37    256   61.7 %    2359    28.1 %
2  M34     :  2359   37   37    256   38.3 %    2441    28.1 %

Back in business again! Now I just have to get rid of this stupid cold. :)
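
As a quick sanity check on the table, the 82-point gap can be compared with what the usual logistic Elo formula gives for a 61.7 % score. This is only a sketch: the function name `elo_diff` is made up here, and the rating tool that produced the table may do its calculation differently.

```python
import math


def elo_diff(score_fraction):
    """Elo difference implied by a score fraction, using the logistic Elo model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


if __name__ == "__main__":
    # 61.7 % is the score of M1-1 against M34 in the table above.
    print(round(elo_diff(0.617), 1))
```

That comes out at about 83 points, in line with 2441 - 2359 = 82 from the table.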
