# FSROC (Feature Selection via ROC curve related measures)

Summary: FSROC is feature selection software based on the AUC and the partial AUC. It has two parts, implementing the PTIFS and SpAUC algorithms proposed in Wang et al. (2007) and Wang and Chang (2009a), respectively. Both aim to find the linear combination of features that maximizes the (partial) area under the ROC curve. The algorithm is based on an idea similar to LARS and is available in the R and Fortran programming languages. The current version (version 2) is for R under MS Windows only (XP and Vista). (Example of FSROC: PTIFS and SpAUC; normalization of variables may be required for some applications.) updated 2010-05-15
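To make the objective concrete, here is a minimal Python sketch of the quantity being maximized: the empirical AUC and partial AUC of a linear score. It is not the PTIFS or SpAUC code and does no feature selection; the function names, the toy data, and the `max_fpr` parameter are all assumptions made for illustration.

```python
def linear_score(weights, features):
    """Linear risk score: weighted sum of one sample's feature values."""
    return sum(w * x for w, x in zip(weights, features))


def pair_credit(pos, neg):
    """1 if the case outranks the control, 0.5 on a tie, 0 otherwise."""
    if pos > neg:
        return 1.0
    return 0.5 if pos == neg else 0.0


def empirical_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the AUC over all case/control pairs."""
    total = sum(pair_credit(p, n) for p in pos_scores for n in neg_scores)
    return total / (len(pos_scores) * len(neg_scores))


def empirical_partial_auc(pos_scores, neg_scores, max_fpr):
    """Area under the empirical ROC curve for false positive rates in
    [0, k / n_neg], where k = floor(max_fpr * n_neg).

    Only the k highest-scoring controls contribute, since they are the
    ones that determine the low-FPR end of the curve.
    """
    k = int(max_fpr * len(neg_scores))
    top_negatives = sorted(neg_scores, reverse=True)[:k]
    total = sum(pair_credit(p, n) for p in pos_scores for n in top_negatives)
    return total / (len(pos_scores) * len(neg_scores))


# Toy data (made up): two features per sample.
cases = [(2.0, 1.0), (1.0, 3.0), (3.0, 2.0)]
controls = [(1.5, 0.0), (0.5, 1.0), (2.5, 0.25)]


def evaluate(weights, max_fpr=0.34):
    """Return (AUC, partial AUC) of the linear score with these weights."""
    pos = [linear_score(weights, x) for x in cases]
    neg = [linear_score(weights, x) for x in controls]
    return empirical_auc(pos, neg), empirical_partial_auc(pos, neg, max_fpr)


auc_single, pauc_single = evaluate((1.0, 0.0))      # first feature only
auc_combined, pauc_combined = evaluate((1.0, 1.0))  # both features
print(auc_single, auc_combined)
print(pauc_single, pauc_combined)
```

On this toy data the first feature alone ranks 6 of the 9 case/control pairs correctly, while the combination of both features separates the two groups completely. A selection method searches over the weights, ideally with few nonzero entries, to make this value as large as possible.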

# GroupAUC (Group feature selection via AUC)

Summary: GroupAUC is AUC-based group feature subset selection software, developed for gene set selection in Wang et al. (2009). The program does not include any clustering algorithm; that is, we assume the clustering information is available beforehand. However, as mentioned in the following report, if no cluster information is available, one can first apply an existing clustering algorithm to form clusters. For clustering algorithms, please refer to Dr. C-h. Chen's web page. (Example of GroupAUC) updated 2010-05-15

# GoldAUC (AUC-type measure without Binary Gold Standard)

Summary: GoldAUC calculates AUC-type measures without a binary gold standard as a reference, based on the technical report of Wang and Chang (2010b), which introduces a new AUC-type summary index based on a continuous gold reference. In addition, the linear combination of variables that maximizes this index can be obtained with a TGDM-based algorithm. Under a normality assumption, the optimal combination of variables can be calculated in a way similar to solving a linear regression model, so the LARS algorithm can be used when the number of variables is large. (For the program for the ordinal gold standard, please write to ycchang@stat.sinica.edu.tw)
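As a rough illustration of what an AUC-type measure can look like when the reference is continuous, the Python sketch below computes a pairwise concordance: the fraction of sample pairs whose marker values are ordered the same way as their gold reference values. This is a generic concordance statistic chosen for illustration and is not necessarily the index defined in Wang and Chang (2010b); the function name and the data are assumptions.

```python
from itertools import combinations


def concordance_index(gold, marker):
    """Fraction of pairs ordered the same way by the marker and by the
    gold reference. Pairs tied on the gold reference are skipped; pairs
    tied on the marker count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    for (g1, m1), (g2, m2) in combinations(zip(gold, marker), 2):
        if g1 == g2:
            continue  # the reference gives no ordering for this pair
        comparable += 1
        if m1 == m2:
            concordant += 0.5
        elif (m1 > m2) == (g1 > g2):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("gold reference has no pairs with distinct values")
    return concordant / comparable


# Continuous gold reference (made-up values): one of six pairs is discordant.
continuous_result = concordance_index([1.0, 2.0, 3.0, 4.0],
                                      [0.1, 0.4, 0.3, 0.9])

# With a 0/1 gold standard, only case/control pairs remain comparable,
# so the same function reduces to the usual empirical AUC.
binary_result = concordance_index([1, 1, 1, 0, 0, 0],
                                  [2.0, 1.0, 3.0, 1.5, 0.5, 2.5])
print(continuous_result, binary_result)
```

The second call shows why such a statistic is called AUC-type: restricted to a binary reference it counts exactly the correctly ranked case/control pairs.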

FSROC now works with R >= 2.11.1. GroupAUC and GoldAUC work only with R <= 2.11.0.

Data Sets:

- Prostate Cancer (Please contact the original owner for permission to use this data set: Adam, Bao-Ling, Department of Microbiology and Molecular Cell Biology, Eastern Virginia Medical School, Norfolk, VA 23507, USA)
- Liver Cancer

References:


Zhanfeng Wang, et al. (2007). A parsimonious threshold-independent protein feature selection method through the area under receiver operating characteristic curve, Bioinformatics 23(20):2788-2794.

    @article{wang-et-al07,
      Author = {Wang, Z. and Chang, Y-c. I. and Ying, Z. and Zhu, L. and Yang, Y.},
      Journal = {Bioinformatics},
      Number = {20},
      Pages = {2788--2794},
      Title = {A parsimonious threshold-independent protein feature selection method through the area under receiver operating characteristic curve},
      Volume = {23},
      Year = {2007}}

Zhanfeng Wang and Y-c. I. Chang* (2010a). Markers selection via maximizing the partial area under ROC curve of linear risk scores, Biostatistics (accepted).

    @article{wang-chang10,
      Author = {Wang, Z. and Chang, Y-c. I.},
      Journal = {Biostatistics},
      Number = {},
      Pages = {},
      Title = {Markers selection via maximizing the partial area under ROC curve of linear risk scores},
      Year = {2010}}

Zhanfeng Wang, Chen-An Tsai and Yuan-chin I. Chang* (2009). Identifying Differential Gene Sets Through the Linear Combination of Gene Sets that Maximizes the Area Under Receiver Operating Characteristic Curve, Technical Report 2009-07, Institute of Statistical Science, Academia Sinica.

    @techreport{wang-tsai-chang09,
      Author = {Wang, Z. and Tsai, C. and Chang, Y-c. I.},
      Institution = {Institute of Statistical Science, Academia Sinica},
      Number = {2009-07},
      Title = {Identifying Differential Gene Sets Through the Linear Combination of Gene Sets that Maximizes the Area Under Receiver Operating Characteristic Curve},
      Year = {2009}}

Zhanfeng Wang and Yuan-chin Ivan Chang (2010b). Study of Receiver-operating characteristic curve type measures when the gold standard is continuous, Technical Report 2010-02, Academia Sinica.

    @techreport{wang-chang10-gold,
      Author = {Wang, Z. and Chang, Y-c. I.},
      Institution = {Institute of Statistical Science, Academia Sinica},
      Number = {2010-02},
      Title = {Study of Receiver-operating characteristic curve type measures when the gold standard is continuous},
      Year = {2010}}