If I can trust my slides, which only explained the topic somewhat, the backpropagation algorithm also works with the threshold function, but instead of the derivative of the scaling (activation) function, 1 is used (as if the derivative term were simply omitted).

If I could find a good, well-documented sample implementation, I would at least be certain that I'm implementing the right thing and just have a bug somewhere, but right now I'm not even sure that I'm using the correct formulas :/

Edit: Well, I'm finally getting some promising results. I made a quick character recognition demo, reverted to the sigmoid function, fixed the formula (* instead of + >.>), made sure the weights are initialized randomly in [-1, 1] rather than [0, 1], made some optimizations here and there, and it seems to be working.
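
For anyone hitting the same problems, here is a minimal sketch of the points from the edit: sigmoid activation, weights drawn from [-1, 1], and a delta that multiplies the error by the derivative rather than adding it. This is not the code from the post, which was never shown. The language (Python), every function name, the 2-3-1 network size, the learning rate, and the OR training data are assumptions chosen for illustration. For the threshold variant described in the slides, `sigmoid_deriv` would presumably be replaced by a constant 1.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(y):
    # Derivative of the sigmoid, written in terms of its output y = sigmoid(x).
    return y * (1.0 - y)

def init_weights(n_out, n_in, rng):
    # One row per neuron; the extra last column is the bias weight.
    # Weights are drawn from [-1, 1], not [0, 1].
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(w_hidden, w_out, x):
    xb = list(x) + [1.0]  # append constant bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return hidden, out

def train_step(w_hidden, w_out, x, target, lr=0.5):
    hidden, out = forward(w_hidden, w_out, x)
    # Output delta: error TIMES the derivative (not plus).
    d_out = [(t - o) * sigmoid_deriv(o) for t, o in zip(target, out)]
    # Hidden delta: error propagated back through the output weights,
    # again times the derivative. Computed before any weights change.
    d_hidden = [
        sum(d_out[k] * w_out[k][j] for k in range(len(w_out))) * sigmoid_deriv(hidden[j])
        for j in range(len(hidden))
    ]
    hb = hidden + [1.0]
    xb = list(x) + [1.0]
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += lr * d_out[k] * hb[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += lr * d_hidden[j] * xb[i]

def total_error(w_hidden, w_out, data):
    # Sum of squared errors over the whole data set.
    return sum(
        0.5 * (t - o) ** 2
        for x, target in data
        for t, o in zip(target, forward(w_hidden, w_out, x)[1])
    )

def demo(seed=0, epochs=2000):
    # Hypothetical toy problem (logical OR), standing in for real training data.
    rng = random.Random(seed)
    data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (1,))]
    w_hidden = init_weights(3, 2, rng)
    w_out = init_weights(1, 3, rng)
    before = total_error(w_hidden, w_out, data)
    for _ in range(epochs):
        for x, target in data:
            train_step(w_hidden, w_out, x, target)
    return before, total_error(w_hidden, w_out, data)

if __name__ == "__main__":
    before, after = demo()
    print("error before:", before, "error after:", after)
```

If the error does not drop over training, the first things to check are the ones listed in the edit: the sign and operator in the delta formula, and the range of the initial weights.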