We were attempting to build a model for predicting phosphorylation sites in protein structures. In the plot, the red distribution represents the negative training set and the blue represents the positive training set. We scored both training sets with the model and plotted the distribution of scores (scores on the x-axis, number of examples on the y-axis; the red distribution is flipped so the two distributions are easier to compare).
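For anyone who wants to build a similar plot, here is a minimal sketch of the binning step. It is not the code we used: the scores are made up, and the function names (`histogram`, `mirrored_histograms`), the score range of 0 to 1, and the bin count are all assumptions for illustration. The idea is simply to count scores per bin for each set and negate the negative set's counts so it hangs below the x-axis.

```python
# Hypothetical scores; the real training-set scores are not part of this post.
pos_scores = [0.62, 0.55, 0.71, 0.48, 0.66, 0.58]
neg_scores = [0.51, 0.44, 0.63, 0.57, 0.39, 0.53]


def histogram(scores, lo, hi, n_bins):
    """Count how many scores fall into each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    for s in scores:
        # Map the score to a bin index, clamping out-of-range values
        # (and s == hi) into the first or last bin.
        idx = int((s - lo) / (hi - lo) * n_bins)
        idx = max(0, min(idx, n_bins - 1))
        counts[idx] += 1
    return counts


def mirrored_histograms(pos, neg, lo=0.0, hi=1.0, n_bins=10):
    """Return per-bin counts with the negative set flipped below zero."""
    pos_counts = histogram(pos, lo, hi, n_bins)
    neg_counts = [-c for c in histogram(neg, lo, hi, n_bins)]
    return pos_counts, neg_counts


pos_counts, neg_counts = mirrored_histograms(pos_scores, neg_scores)
```

The two lists can then be handed to any bar-plotting tool, drawing `pos_counts` in blue and `neg_counts` in red against the same bin positions.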

We hoped for good separation between the positive and negative scores, because that would indicate a good model. Unfortunately, the model could barely tell the difference between real sites and negative examples.
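One way to put a number on "separation" is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (the area under the ROC curve). This is not something we computed for the plot; it is a sketch of a common measure, and the function name `separation_auc` is made up for illustration. A value near 1.0 means clean separation, while a value near 0.5 means the model can barely tell the two sets apart.

```python
def separation_auc(pos, neg):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. This simple pairwise version is quadratic in
    the number of examples, which is fine for a quick check on small sets.
    """
    if not pos or not neg:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical, perfectly separated scores for illustration.
print(separation_auc([0.9, 0.8], [0.1, 0.2]))  # prints 1.0
```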

## 4 comments:

Nice idea! If possible try to explain a bit more about what the plots represent.

This plot is actually mine, so I'll explain here.

We were attempting to build a model for predicting phosphorylation sites in protein structures. In the plot, the red distribution represents the negative training set and the blue represents the positive training set. We scored both training sets with the model and plotted the distribution of scores (scores on the x-axis, number of examples on the y-axis; the red distribution is flipped so the two distributions are easier to compare).

We hoped for good separation between the positive and negative scores, because that would indicate a good model. Unfortunately, the model could barely tell the difference between real sites and negative examples.

Hopefully you'll build a better model :)

i remember this project!!! ;)
