I’ve been playing with clustering my email, just a sample set of 300 or so messages. It’s been a while since I’ve done any “NLP” work and it’s really quite fun.
Some learnings:
As the dimensionality of the space increases, everything starts to sit at the origin:
initially you might have 120 unique words in an email message, some [...]
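A minimal sketch of one way to see this, assuming each message is a binary bag-of-words vector with one coordinate per vocabulary word. The function name `bag_of_words_density` and the vocabulary sizes are hypothetical, chosen only for illustration. Only the figure of 120 unique words comes from the text above. The sketch shows the share of non-zero coordinates in a single message shrinking as the vocabulary grows, so the vector looks more and more like the zero vector.

```python
import random


def bag_of_words_density(words_per_message, vocab_size, seed=0):
    """Return the fraction of non-zero coordinates in one binary bag-of-words vector."""
    rng = random.Random(seed)
    # Hypothetical message: a random set of distinct word ids drawn from the vocabulary.
    message = set(rng.sample(range(vocab_size), words_per_message))
    # One coordinate per vocabulary word: 1 if the word occurs in the message, else 0.
    vector = [1 if word_id in message else 0 for word_id in range(vocab_size)]
    return sum(vector) / vocab_size


# Illustrative vocabulary sizes (assumptions, not measurements from real email).
for vocab_size in (120, 1_200, 12_000):
    print(vocab_size, bag_of_words_density(120, vocab_size))
```

With a 120-word vocabulary every coordinate is set. At 12,000 words only 1% of them are, and the rest of the vector is zeros.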



