Worst Result Ever: No Separation

Wednesday, January 23, 2008

Hoping for separation between the blue data set and the red data set...

Jean-Claude Bradley said...: Nice idea! If possible, try to explain a bit more about what the plots represent.; January 24, 2008 at 6:27 AM
shwu said...: This plot is actually mine, so I'll explain here.

We were attempting to build a model for predicting phosphorylation sites in protein structures. The red distribution represents the negative training set and the blue represents the positive training set. We scored both training sets with the model and plotted the distribution of scores (scores on the x-axis, number of examples on the y-axis; the red distribution is flipped below the axis so the two distributions are easier to compare).
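For anyone who wants to try this kind of plot, here is a rough sketch of the idea in Python. It is not the code used for the figure: the function names `mirrored_histogram` and `auc`, the bin count, and the randomly generated scores are all made up for illustration. It bins both score sets on a shared axis, stores the negative-set counts as negative numbers so they would draw below the x-axis, and computes a simple separation number (the chance that a random positive example outscores a random negative one).

```python
import random


def mirrored_histogram(pos_scores, neg_scores, n_bins=10):
    """Bin both score sets on a shared axis.

    Negative-set counts are returned as negative numbers so that they
    plot below the x-axis (the "flipped" distribution).
    """
    lo = min(min(pos_scores), min(neg_scores))
    hi = max(max(pos_scores), max(neg_scores))
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all scores are equal
    pos_counts = [0] * n_bins
    neg_counts = [0] * n_bins
    for scores, counts, sign in ((pos_scores, pos_counts, 1),
                                 (neg_scores, neg_counts, -1)):
        for s in scores:
            i = min(int((s - lo) / width), n_bins - 1)  # top edge goes in last bin
            counts[i] += sign
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, pos_counts, neg_counts


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half. 0.5 means no separation, 1.0 means perfect separation.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: two heavily overlapping distributions, standing in
# for a model that barely distinguishes the two training sets.
rng = random.Random(0)
pos = [rng.gauss(0.1, 1.0) for _ in range(500)]
neg = [rng.gauss(0.0, 1.0) for _ in range(500)]

edges, pos_counts, neg_counts = mirrored_histogram(pos, neg)
separation = auc(pos, neg)
print(separation)
```

A value near 0.5 from `auc` corresponds to the two histograms sitting almost on top of each other; a value near 1.0 would mean the blue scores sit almost entirely to the right of the red ones.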

We hoped there would be good separation between the positive and negative scores because that would indicate a good model. Unfortunately, the model could barely tell the difference between real sites and negative sites.; January 24, 2008 at 2:16 PM
Jean-Claude Bradley said...: Hopefully you'll build a better model :); January 24, 2008 at 2:30 PM
Sami said...: i remember this project!!! ;); January 26, 2008 at 10:28 AM