Java – Increase accuracy of WEKA Multilayer Perceptron model

artificial-intelligence, java, machine-learning, neural-network, weka

I am currently learning the ropes of WEKA modelling with the free UCI breast cancer .arff file, and with the help of various posts here I was able to tweak its accuracy, which now ranges from 63% to 73%. I use WEKA 3.7.10 on a Windows 7 Starter machine.

  • I used attribute selection (InfoGainAttributeEval with Ranker) to reduce the number of variables, and kept the top five based on the following result:

    Evaluator:    weka.attributeSelection.InfoGainAttributeEval 
    Search:       weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
    Relation:     breast-cancer
    Instances:    286
    Attributes:   10
                 age
                 menopause
                 tumor-size
                 inv-nodes
                 node-caps
                 deg-malig
                 breast
                 breast-quad
                 irradiat
                 Class
    Evaluation mode:    10-fold cross-validation
    
    
    
    === Attribute selection 10 fold cross-validation (stratified), seed: 1 ===
    
    average merit      average rank  attribute
    0.078 +- 0.011     1.3 +- 0.64    6 deg-malig
    0.071 +- 0.01      1.9 +- 0.3     4 inv-nodes
    0.061 +- 0.008     3   +- 0.77    3 tumor-size
    0.051 +- 0.007     3.8 +- 0.4     5 node-caps
    0.026 +- 0.006     5   +- 0       9 irradiat
    0.012 +- 0.003     6.4 +- 0.49    1 age
    0.01  +- 0.003     6.6 +- 0.49    8 breast-quad
    0.003 +- 0.001     8.5 +- 0.5     7 breast
    0.003 +- 0.002     8.5 +- 0.5     2 menopause
    
  • After removing the low-ranked variables, I proceeded to create my model. I chose Multilayer Perceptron because it is the algorithm required by the journal article I am basing my study on.
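
As a rough illustration of the selection step above, the sketch below takes the average merits from the ranking output, keeps the top five, and lists the 1-based attribute indices that are left over. The class and method names are hypothetical, and this is plain Java rather than the WEKA API; it only shows how the ranking relates to the `Remove-R1-2,7-8` filter that appears in the relation name further down.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopRankedAttributes {

    // Returns the 1-based indices of the attributes that fall outside the top k by merit.
    static List<Integer> indicesToRemove(int[] indices, double[] merits, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < indices.length; i++) {
            order.add(i);
        }
        // Highest merit first; the sort is stable, so ties keep their input order.
        order.sort(Comparator.comparingDouble((Integer i) -> -merits[i]));

        List<Integer> removed = new ArrayList<>();
        for (int pos = k; pos < order.size(); pos++) {
            removed.add(indices[order.get(pos)]);
        }
        removed.sort(Comparator.naturalOrder());
        return removed;
    }

    public static void main(String[] args) {
        // Attribute indices and average merits copied from the ranking output above.
        int[] indices = {6, 4, 3, 5, 9, 1, 8, 7, 2};
        double[] merits = {0.078, 0.071, 0.061, 0.051, 0.026, 0.012, 0.01, 0.003, 0.003};
        System.out.println(indicesToRemove(indices, merits, 5)); // prints [1, 2, 7, 8]
    }
}
```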

I followed the suggestion of Bernhard Pfahringer to use 0.1 for both the learning rate and the momentum, and powers of two (1, 2, 4, 8, and so on) for the number of hidden nodes and the number of epochs.

After a few tries with this method, I noticed a pattern: using 2 hidden nodes (-H 2, which in WEKA means one hidden layer with 2 nodes) together with an epoch count that is a power of two (512, 1024, 2048, …) gives increasing accuracy. For example, 2 hidden nodes with 1024 epochs, and so on.
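
The search described above can be written down as a small grid. The sketch below is only an assumption about how one might enumerate the runs: it builds one option string per combination, reusing the flags that appear in the Scheme line of the output below (-L, -M, -N, -V, -S, -E, -H). The class name and the particular value ranges are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MlpOptionGrid {

    // Builds one option string per (hidden nodes, epochs) pair,
    // keeping learning rate and momentum fixed at 0.1.
    static List<String> buildGrid(int[] hiddenNodes, int[] epochs) {
        List<String> grid = new ArrayList<>();
        for (int h : hiddenNodes) {
            for (int n : epochs) {
                grid.add(String.format("-L 0.1 -M 0.1 -N %d -V 0 -S 0 -E 20 -H %d", n, h));
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        int[] hiddenNodes = {1, 2, 4, 8};      // powers of two for the hidden nodes
        int[] epochs = {512, 1024, 2048};      // powers of two for the epochs
        for (String options : buildGrid(hiddenNodes, epochs)) {
            System.out.println("weka.classifiers.functions.MultilayerPerceptron " + options);
        }
    }
}
```

Each printed line can then be tried as a classifier configuration, which makes it easier to compare runs side by side instead of changing one value at a time by hand.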

I have a varied series of results, but the highest one I got so far was with the following (using 2 hidden nodes and 16384 epochs):

    Scheme:       weka.classifiers.functions.MultilayerPerceptron -L 0.1 -M 0.1 -N 16384 -V 0 -S 0 -E 20 -H 2
    Relation:     breast-cancer-weka.filters.unsupervised.attribute.Remove-R1-2,7-8
    Instances:    286
    Attributes:   6
                  tumor-size
                  inv-nodes
                  node-caps
                  deg-malig
                  irradiat
                  Class
    Test mode:    10-fold cross-validation

    === Classifier model (full training set) ===

    Sigmoid Node 0
        Inputs    Weights
        Threshold    -2.4467109489840375
        Node 2    2.960926490700117
        Node 3    1.5276384018358489
    Sigmoid Node 1
        Inputs    Weights
        Threshold    2.446710948984037
        Node 2    -2.9609264907001167
        Node 3    -1.5276384018358493
    Sigmoid Node 2
        Inputs    Weights
        Threshold    0.8594931368555995
        Attrib tumor-size=0-4    -0.6809394102558067
        Attrib tumor-size=5-9    -0.7999278705976403
        Attrib tumor-size=10-14    -0.5139914771540879
        Attrib tumor-size=15-19    2.3071396030112834
        Attrib tumor-size=20-24    -6.316868254289899
        Attrib tumor-size=25-29    5.535754474315768
        Attrib tumor-size=30-34    -12.31495416708197
        Attrib tumor-size=35-39    2.165860489861981
        Attrib tumor-size=40-44    10.740913335424047
        Attrib tumor-size=45-49    9.102261927484186
        Attrib tumor-size=50-54    -17.072392893550735
        Attrib tumor-size=55-59    0.043056333044031
        Attrib inv-nodes=0-2    9.578867366884618
        Attrib inv-nodes=3-5    1.3248317047328586
        Attrib inv-nodes=6-8    -5.081199984305494
        Attrib inv-nodes=9-11    -8.604844224457239
        Attrib inv-nodes=12-14    2.2330604430275907
        Attrib inv-nodes=15-17    -2.8692154868988355
        Attrib inv-nodes=18-20    0.04225234708199947
        Attrib inv-nodes=21-23    0.017664071511846485
        Attrib inv-nodes=24-26    -0.9992481277256989
        Attrib inv-nodes=27-29    -0.02737484354173595
        Attrib inv-nodes=30-32    -0.04607516719307534
        Attrib inv-nodes=33-35    -0.038969156415242706
        Attrib inv-nodes=36-39    0.03338452826774849
        Attrib node-caps    6.764954936579671
        Attrib deg-malig=1    -5.037151186065571
        Attrib deg-malig=2    12.469858109768378
        Attrib deg-malig=3    -8.382625277311769
        Attrib irradiat    8.302010702287868
    Sigmoid Node 3
        Inputs    Weights
        Threshold    -0.7428771456532647
        Attrib tumor-size=0-4    3.5709673152321555
        Attrib tumor-size=5-9    3.563713261511895
        Attrib tumor-size=10-14    7.86118954430952
        Attrib tumor-size=15-19    2.8762105204084167
        Attrib tumor-size=20-24    4.60168522637948
        Attrib tumor-size=25-29    -5.849391383398816
        Attrib tumor-size=30-34    -1.6805815971562046
        Attrib tumor-size=35-39    -12.022394228003419
        Attrib tumor-size=40-44    11.922229608392747
        Attrib tumor-size=45-49    -1.9939414047194557
        Attrib tumor-size=50-54    -5.9801974214306215
        Attrib tumor-size=55-59    -0.04909236196295539
        Attrib inv-nodes=0-2    5.569516359775502
        Attrib inv-nodes=3-5    -7.871275549119543
        Attrib inv-nodes=6-8    3.405277467966008
        Attrib inv-nodes=9-11    -0.3253699778307026
        Attrib inv-nodes=12-14    1.244234346055825
        Attrib inv-nodes=15-17    1.179311225120216
        Attrib inv-nodes=18-20    0.03495291263409073
        Attrib inv-nodes=21-23    0.0043299366591334695
        Attrib inv-nodes=24-26    0.6595250300030937
        Attrib inv-nodes=27-29    -0.02503529326219822
        Attrib inv-nodes=30-32    0.041787638417097844
        Attrib inv-nodes=33-35    0.008416652090130837
        Attrib inv-nodes=36-39    -0.014551878794926747
        Attrib node-caps    4.7997880904143955
        Attrib deg-malig=1    1.6752746955482163
        Attrib deg-malig=2    6.130488722916935
        Attrib deg-malig=3    -6.989852429736567
        Attrib irradiat    8.716254786514295
    Class no-recurrence-events
        Input
        Node 0
    Class recurrence-events
        Input
        Node 1


    Time taken to build model: 27.05 seconds

    === Stratified cross-validation ===
    === Summary ===

    Correctly Classified Instances         210               73.4266 %
    Incorrectly Classified Instances        76               26.5734 %
    Kappa statistic                          0.2864
    Mean absolute error                      0.3312
    Root mean squared error                  0.4494
    Relative absolute error                 79.1456 %
    Root relative squared error             98.3197 %
    Coverage of cases (0.95 level)          98.951  %
    Mean rel. region size (0.95 level)      97.7273 %
    Total Number of Instances              286     

    === Detailed Accuracy By Class ===

                     TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                     0.891    0.635    0.768      0.891    0.825      0.300    0.633     0.748     no-recurrence-events
                     0.365    0.109    0.585      0.365    0.449      0.300    0.633     0.510     recurrence-events
    Weighted Avg.    0.734    0.479    0.714      0.734    0.713      0.300    0.633     0.677     

    === Confusion Matrix ===

       a   b   <-- classified as
     179  22 |   a = no-recurrence-events
      54  31 |   b = recurrence-events
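
To make the summary figures easier to read, here is a minimal sketch that recomputes accuracy and the kappa statistic from the confusion matrix above, along with the accuracy obtained by always predicting the larger class. The class and method names are hypothetical; the counts are copied from the matrix, and the formulas are the standard definitions rather than anything WEKA-specific.

```java
public class ConfusionMatrixStats {

    // Sum of all cells in the matrix.
    static double total(int[][] m) {
        double sum = 0;
        for (int[] row : m) {
            for (int cell : row) {
                sum += cell;
            }
        }
        return sum;
    }

    // Fraction of instances on the diagonal (correctly classified).
    static double accuracy(int[][] m) {
        double correct = 0;
        for (int i = 0; i < m.length; i++) {
            correct += m[i][i];
        }
        return correct / total(m);
    }

    // Cohen's kappa: agreement beyond what the row and column totals give by chance.
    static double kappa(int[][] m) {
        double n = total(m);
        double expected = 0;
        for (int i = 0; i < m.length; i++) {
            double rowSum = 0;
            double colSum = 0;
            for (int j = 0; j < m.length; j++) {
                rowSum += m[i][j];
                colSum += m[j][i];
            }
            expected += (rowSum / n) * (colSum / n);
        }
        double observed = accuracy(m);
        return (observed - expected) / (1 - expected);
    }

    // Accuracy of always predicting the most frequent actual class.
    static double majorityBaseline(int[][] m) {
        double largest = 0;
        for (int[] row : m) {
            double rowSum = 0;
            for (int cell : row) {
                rowSum += cell;
            }
            largest = Math.max(largest, rowSum);
        }
        return largest / total(m);
    }

    public static void main(String[] args) {
        int[][] matrix = {{179, 22}, {54, 31}}; // rows = actual class, columns = predicted
        System.out.println("accuracy          = " + accuracy(matrix));         // about 0.7343
        System.out.println("kappa             = " + kappa(matrix));            // about 0.2864
        System.out.println("majority baseline = " + majorityBaseline(matrix)); // about 0.7028
    }
}
```

Read this way, the 73.4% result is only about three percentage points above always answering no-recurrence-events (201 of 286 instances), which is also what the low kappa value reflects.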

My question is: how can I raise the accuracy on this data to at least the 90% mark?
Do I have to apply some filtering, or use another pattern of MLP input parameters?

I plan to have another set of data that I will use after I've learned how to do this (it has around 50 variables and 100,000 instances).

Best Solution

There is obviously no single good answer to a question like this, but here are some fairly general hints for using an MLP:

  • First, why do you remove features when working with such a small dataset? Feature selection matters for high-dimensional problems and/or computationally expensive models. Neither applies to breast-cancer with an MLP.
  • Iteration count is the worst stopping criterion for an MLP. You should stop training when the validation error starts to rise, not after a fixed number of iterations.
  • I do not know which cost function you use, but the most important part is regularization, as MLPs are prone to overfitting. Some Tikhonov (L2) regularization is the minimum.
  • Using more than one hidden layer for such a problem is completely redundant, especially since training more than one hidden layer in an MLP is often very hard because of the vanishing gradient phenomenon.
  • To avoid having to tune the learning algorithm's parameters, I would also suggest abandoning the naive algorithm (plain backpropagation) and using at least resilient propagation (Rprop), which works very well in many applications.
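
The stopping rule from the second hint can be sketched in a few lines. This is a simplified illustration, not WEKA's implementation: it assumes you already have one validation error per epoch, and it stops once the error has failed to improve for a given number of consecutive epochs. The class name, the `patience` parameter, and the error curve are all made up for the example.

```java
public class EarlyStopping {

    // Returns the 0-based epoch with the lowest validation error seen before
    // the error failed to improve for `patience` consecutive epochs.
    static int bestEpoch(double[] validationError, int patience) {
        int best = 0;
        int sinceImprovement = 0;
        for (int epoch = 1; epoch < validationError.length; epoch++) {
            if (validationError[epoch] < validationError[best]) {
                best = epoch;            // new minimum: remember it and reset the counter
                sinceImprovement = 0;
            } else if (++sinceImprovement >= patience) {
                break;                   // validation error has stopped improving
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up validation curve: improves, then rises as the network overfits.
        double[] curve = {0.50, 0.42, 0.37, 0.35, 0.36, 0.38, 0.41, 0.33};
        System.out.println(bestEpoch(curve, 3)); // prints 3; training stops at epoch 6
    }
}
```

In WEKA's MultilayerPerceptron, the -V option sets the percentage of the training data held out as a validation set and -E sets how many consecutive validation worsenings are tolerated. The run in the question used -V 0, so no validation-based stopping took place; a non-zero value such as -V 20 would be one way to try this hint.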