A quick experiment that took a while to run because kNN scales poorly to large datasets: the kNN + gzip method does slightly better than kNN with cosine similarity on count vectors. On the IMDb movie review dataset:
– 70% test accuracy for gzip
– 65% test accuracy for cosine distance on count vectors
My (re)implementation code:
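The author's code itself is not included in this post. What follows is a separate, hypothetical sketch of the two kNN variants being compared, not the author's implementation. It assumes the standard normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the gzip-compressed length and the two texts are joined with a space. It also assumes lowercase whitespace tokenization for the count vectors, with cosine distance taken as 1 minus cosine similarity. All function names and the toy data are illustrative.

```python
import gzip
import math
from collections import Counter


def compressed_len(text: str) -> int:
    """C(x): byte length of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str, cx: int, cy: int) -> float:
    """Normalized compression distance, reusing precomputed C(x) and C(y)."""
    cxy = compressed_len(x + " " + y)  # assumption: join the two texts with a space
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_predict(distances, train_labels, k: int) -> str:
    """Majority vote over the labels of the k smallest distances.

    Counter.most_common breaks count ties in first-seen order, so a tied
    vote goes to the label of the nearer neighbor.
    """
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def gzip_knn(train_texts, train_labels, test_texts, k: int = 3):
    """kNN classifier using gzip-based NCD as the distance."""
    train_c = [compressed_len(t) for t in train_texts]  # compress each train text once
    preds = []
    for x in test_texts:
        cx = compressed_len(x)
        dists = [ncd(x, t, cx, ct) for t, ct in zip(train_texts, train_c)]
        preds.append(knn_predict(dists, train_labels, k))
    return preds


def count_vector(text: str) -> Counter:
    """Bag-of-words counts from naive lowercase whitespace tokenization."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def cosine_knn(train_texts, train_labels, test_texts, k: int = 3):
    """kNN baseline using cosine distance on count vectors."""
    train_vecs = [count_vector(t) for t in train_texts]
    preds = []
    for x in test_texts:
        v = count_vector(x)
        dists = [cosine_distance(v, tv) for tv in train_vecs]
        preds.append(knn_predict(dists, train_labels, k))
    return preds


if __name__ == "__main__":
    # Tiny made-up toy data, only to show the call pattern (not IMDb).
    train = ["wonderful film great acting", "great story wonderful cast",
             "boring film terrible acting", "terrible story boring cast"]
    labels = ["pos", "pos", "neg", "neg"]
    test = ["great acting wonderful story", "boring terrible film"]
    print("gzip  :", gzip_knn(train, labels, test, k=1))
    print("cosine:", cosine_knn(train, labels, test, k=1))
```

Both classifiers compare every test review against every training review, so the work grows with n_test × n_train. For the gzip variant, each of those comparisons is a full compression of the concatenated pair, which is why kNN gets slow on a dataset the size of IMDb.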
kNN Gzip Method Outperforms Cosine Similarity on IMDb Reviews