Wednesday, January 23, 2008

No Separation


Hoping for separation between the blue data set and the red data set...

4 comments:

Jean-Claude Bradley said...

Nice idea! If possible try to explain a bit more about what the plots represent.

shwu said...

This plot is actually mine, so I'll explain here.

We were attempting to build a model for predicting phosphorylation sites in protein structures. The red distribution represents the negative training set and the blue represents the positive training set. We scored both training sets with the model and plotted the distribution of scores for each (scores on the x-axis, number of examples on the y-axis; the red distribution is flipped below the axis so the two distributions are easier to compare).

We hoped there would be good separation between the positive and negative scores because that would indicate a good model. Unfortunately, the model could barely tell the difference between real sites and negative sites.
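
For anyone curious, here is a rough Python sketch of how a mirrored score plot like this, plus a simple separation measure, could be computed. It is not the code or the model from the project: the scores are synthetic, the function names `mirrored_histogram` and `auc` are made up for illustration, and using AUC to quantify separation is my own addition here.

```python
import random


def mirrored_histogram(pos, neg, n_bins=10):
    """Bin both score sets on shared edges.

    Counts for the negative set are negated so that, when drawn as bars,
    they hang below the x-axis (the "flipped" red distribution).
    """
    lo = min(min(pos), min(neg))
    hi = max(max(pos), max(neg))
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all scores are equal

    def counts(scores):
        c = [0] * n_bins
        for s in scores:
            # Clamp so the maximum score lands in the last bin.
            i = min(int((s - lo) / width), n_bins - 1)
            c[i] += 1
        return c

    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts(pos), [-k for k in counts(neg)]


def auc(pos, neg):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half. 1.0 means perfect separation; 0.5 means none.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic scores (assumption): two almost identical distributions,
# mimicking a model that barely separates the two sets.
random.seed(0)
pos_scores = [random.gauss(0.1, 1.0) for _ in range(200)]
neg_scores = [random.gauss(0.0, 1.0) for _ in range(200)]

edges, pos_counts, neg_counts = mirrored_histogram(pos_scores, neg_scores)
for left, up, down in zip(edges, pos_counts, neg_counts):
    print(f"{left:6.2f}  blue {up:4d}  red {down:4d}")
print("AUC:", auc(pos_scores, neg_scores))
```

With heavily overlapping distributions like these, the blue and red bars pile up in the same bins and the AUC stays close to 0.5, which is the "no separation" situation in the plot.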

Jean-Claude Bradley said...

Hopefully you'll build a better model :)

Sami said...

I remember this project!!! ;)