I'm crawling data from the internet, and it comes without any classification.
Is there a library you can recommend for grouping it?
EDIT
I'm crawling job postings from other websites, and I need to group them into different industries.
cluster-analysis
Best Solution
To sort unlabelled data into groups, you want clustering, not classification. One of the most complete machine learning libraries is the Java-based Weka. You'll probably want to start by extracting the text from the web pages (remove script and style elements completely, then strip the remaining tags), and then run that text through Weka's StringToWordVector filter to turn each page into a word vector before performing clustering.
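To make those steps concrete, here is a rough sketch in plain Python rather than Weka, so treat it as an illustration of the idea and not as Weka's API. The names `extract_text`, `word_vector` and `cluster` are made up for this example. Raw word counts stand in for what StringToWordVector would produce, and a small k-means-style loop using cosine similarity stands in for a real clusterer. The sample pages are invented.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, dropping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Strip tags (and script/style bodies) and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def word_vector(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, k, iterations=10):
    """Assign each vector to one of k groups (k-means-style, cosine similarity)."""
    # Deterministic seeding: start with the first vector, then repeatedly
    # add the vector that is least similar to the seeds chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda i: cosine(v, centroids[i])) for v in vectors
        ]
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:
                # Summing is enough: cosine similarity ignores vector length.
                centroids[i] = sum(members, Counter())
    return labels


# Invented sample pages standing in for crawled job postings.
pages = [
    "<html><head><style>p{color:red}</style></head><body>"
    "<p>Java software developer</p><script>var x = 1;</script></body></html>",
    "<p>Registered nurse for hospital ward</p>",
    "<p>Senior software developer, Python</p>",
    "<p>Hospital nurse, night shift</p>",
]
labels = cluster([word_vector(extract_text(p)) for p in pages], k=2)
print(labels)
```

On real job postings you would likely need stop-word removal, TF-IDF weighting and a sensible choice for the number of clusters, which is where a full library such as Weka earns its keep.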