Computational and Mathematical Methods in Medicine
Volume 2012 (2012), Article ID 712542, 12 pages
http://dx.doi.org/10.1155/2012/712542
Research Article

Recursive Feature Selection with Significant Variables of Support Vectors

1 Department of Agronomy, National Taiwan University, Taipei 106, Taiwan
2 Department of Statistics, Columbia University, New York, NY 10027, USA
3 Division of Personalized Nutrition and Medicine, National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR 72079, USA
4 Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Taipei 115, Taiwan

Received 29 November 2011; Revised 9 May 2012; Accepted 17 May 2012

Academic Editor: Seiya Imoto

Copyright © 2012 Chen-An Tsai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The development of DNA microarrays enables researchers to screen thousands of genes simultaneously and helps identify genes with high and low expression levels in normal and diseased tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and choose an arbitrary threshold to select genes. However, such a parameter setting may not be compatible with the selected classification algorithm. In this paper, we propose a new gene selection method (SVM-t) based on t-statistics embedded in a support vector machine. We compared its performance with that of two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared in extensive simulation experiments and in analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and is capable of attaining good classification performance when the variations of informative and noninformative genes differ. In the analysis of the two microarray datasets, the proposed method performed better than SVMRFE and RSVM, identifying fewer genes while maintaining good prediction accuracy.