Iterative programming … and clustering

The good part about writing everything in a simple language is that it makes iterative programming easy… Time-to-fix and other such fun issues would be a pain in the butt if I were doing all of my Clustering 101 development work in C++, though it might cut the run time down.

That said, I finally got a full run in last night. It took 5 hours (ah, the joys of sleep), and I discovered that I’d accomplished … something! Oh, but wait: everything was high-frequency terms, which is no real surprise. Now that I’ve got a working system, it’s time to really focus in on making meaningful token vectors as the inputs.
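
One common way to keep high-frequency terms from dominating is to weight each token by TF-IDF and drop anything that shows up in most documents. Here is a minimal sketch of that idea in Python, not the code from this project: the function name `tfidf_vectors`, the `max_df` cutoff, and the list-of-token-lists input are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs, max_df=0.5):
    """Build one {token: weight} dict per document.

    docs   -- list of token lists (hypothetical input format)
    max_df -- drop tokens found in more than this fraction of documents
    """
    n_docs = len(docs)

    # Document frequency: how many documents contain each token.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vec = {}
        for tok, count in tf.items():
            # Tokens like "we" or "are" appear almost everywhere; skip them.
            if df[tok] / n_docs > max_df:
                continue
            # Term frequency times inverse document frequency.
            vec[tok] = count * math.log(n_docs / df[tok])
        vectors.append(vec)
    return vectors


docs = [
    ["we", "have", "google", "video"],
    ["we", "are", "game", "video"],
    ["we", "our", "company"],
    ["we", "new", "company"],
]
vecs = tfidf_vectors(docs)
```

With these toy documents, "we" appears in every one and is dropped, while a rarer token like "google" ends up with a higher weight than "video". The 0.5 cutoff is arbitrary and would need tuning against real data.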

For your enjoyment, here are some of the cluster leaders:

  • we, have, about
  • as, was, but
  • use, how, your
  • their, are, can
  • are, google, your
  • video, game, can
  • we, our, as
  • as, an, about
  • are, as, will
  • new, has, company
  • nbsp, are, have  [hmm… my de-HTMLing has a bug]
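
A stray "nbsp" token usually means the `&nbsp;` entity was never decoded, so the tokenizer split it into "nbsp" once the punctuation was stripped. I haven't shown how my de-HTMLing works, so the following is only a guess at a fix, sketched in Python with the standard library. The helper name `strip_html` is hypothetical, and the regex tag stripping is deliberately crude (a real HTML parser would be safer on messy pages).

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude match for anything that looks like a tag
WS_RE = re.compile(r"\s+")


def strip_html(raw):
    """Return plain text with tags removed and entities decoded."""
    # Remove tags first, so escaped brackets such as &lt;b&gt; in the text
    # are not mistaken for markup after decoding.
    text = TAG_RE.sub(" ", raw)
    # Decode entities: &nbsp; becomes U+00A0, &amp; becomes "&", and so on.
    text = html.unescape(text)
    # Turn non-breaking spaces into ordinary spaces before tokenizing.
    text = text.replace("\u00a0", " ")
    # Collapse runs of whitespace left behind by the removed tags.
    return WS_RE.sub(" ", text).strip()


cleaned = strip_html("<p>are&nbsp;have &amp; more</p>")
```

Running the cleaned text through the tokenizer should then produce "are" and "have" as separate tokens, with no "nbsp" in sight.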