Wikipedia has a thorough language recognition chart, and it would be nice to put that into a Perl module. The Wikipedia page also lists a couple of additional leads:
- Translated online guesser - uses a vector space model
- Huh. The other two links are dead. That's a shame, but they may be worth following up on at a later date.
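
To get a feel for what a chart-based module would do, here is a minimal sketch of the idea: look for distinctive letters and let each one vote for the languages that use it. It's in Python just to keep it short; a Perl version would hold the same table in a hash. The table and the `guess_languages` name are my own illustration, and the entries are a handful I picked by hand, not a transcription of the Wikipedia chart.

```python
from collections import Counter

# Tiny illustrative table: a few distinctive letters and some languages
# that use them. Hand-picked for this example and far from complete;
# NOT copied from the Wikipedia chart.
DISTINCTIVE_CHARS = {
    "ß": ["German"],
    "ñ": ["Spanish"],
    "ã": ["Portuguese"],
    "å": ["Swedish", "Danish", "Norwegian"],
    "ø": ["Danish", "Norwegian"],
    "ł": ["Polish"],
    "ő": ["Hungarian"],
    "ğ": ["Turkish"],
}


def guess_languages(text):
    """Return candidate languages, most distinctive-letter matches first."""
    votes = Counter()
    for ch in text.lower():
        # Each distinctive letter votes once for every language listed for it.
        for lang in DISTINCTIVE_CHARS.get(ch, []):
            votes[lang] += 1
    # most_common() sorts by vote count, highest first.
    return [lang for lang, _count in votes.most_common()]
```

With this table, `guess_languages("Straße")` returns `["German"]`, and text with no distinctive letters returns an empty list. A real module would need the full chart plus some way to break ties, since many letters are shared across several languages.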
Perl already has Lingua::Identify (0.30 in 2011), but I don't know how accurate it is or how many languages it covers. It's definitely worth looking at, though. There's also a statistical approach in Lingua::Ident (1.7 in 2010).