I dunno. Clearly people are making these things work, but I get the impression that mostly it's throwing a bucket of tacks at the problem and hoping something will stick.
Update: some perusing of the forum led me to the realization that I wasn't testing trigrams correctly. Looking at only the first trigram in a sentence gave me "[s] One two", so that any spelling error in "One" would be lost. Once that was fixed, trigram double-backoff worked as well as bigram backoff. A little twiddling with the backoff coefficients got me slightly better performance than my original bigram backoff with a coefficient of 0.4.
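For the curious, here is a rough sketch of what "look at every trigram, and back off twice" might look like. It is not my actual code: the function names, the `[s]` padding scheme, the toy training data, and the two separate coefficients `alpha2` and `alpha1` are all assumptions made up for illustration, and the scores are unnormalized relative frequencies rather than true probabilities.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every n-gram in tokens, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_trigrams(sentence):
    """Slide over the whole sentence, not just its first trigram."""
    tokens = ["[s]"] + sentence.split()
    return ngrams(tokens, 3)


def train(sentences):
    """Count unigrams, bigrams and trigrams; also return the unigram total."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["[s]"] + sentence.split()
        for n in (1, 2, 3):
            counts.update(ngrams(tokens, n))
    total = sum(c for gram, c in counts.items() if len(gram) == 1)
    return counts, total


def backoff_score(counts, total, w1, w2, w3, alpha2=0.4, alpha1=0.4):
    """Score w3 given (w1, w2), backing off trigram -> bigram -> unigram.

    alpha2 and alpha1 are the (hypothetical) backoff coefficients;
    each backoff step multiplies the score by one of them.
    """
    tri = counts[(w1, w2, w3)]
    if tri > 0:
        return tri / counts[(w1, w2)]
    bi = counts[(w2, w3)]
    if bi > 0:
        return alpha2 * bi / counts[(w2,)]
    return alpha2 * alpha1 * counts[(w3,)] / total


# Toy usage: score every trigram in a test sentence.
counts, total = train(["One two three", "One two four"])
for w1, w2, w3 in sentence_trigrams("One two three four"):
    print(w1, w2, w3, backoff_score(counts, total, w1, w2, w3))
```

The point of `sentence_trigrams` is the bug described above: every position in the sentence gets scored, so an error anywhere has a chance of showing up. The two `alpha` defaults are just the 0.4 starting point; they are the knobs to twiddle.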
Moral of the story: the choice of backoff coefficient makes a difference. Which is why I hate statistical approaches.