Acoustic event classification in surveillance applications typically relies on deep, end-to-end learning methods. In real environments, however, their performance degrades significantly due to noise. Although various approaches have been proposed to overcome the noise problem, most rely on supervised learning of feature representations. A supervised learning system, however, requires pairs of noise-free and noisy audio streams, and acquiring such ground-truth and noisy acoustic event data demands significant effort to adequately cover the variety of noise types needed for training. This paper proposes a novel self-supervised learning method for noise-robust acoustic event classification in an end-to-end framework, named the Self Subtraction Network (SSN). SSN extracts noise features from an input audio spectrogram and removes them from the input using LSTMs and an auto-encoder. Applied to the UrbanSound8K dataset with eight noise types at four different levels, our method demonstrates improved performance over state-of-the-art methods.
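
The noise-estimate-and-subtract data flow described above might be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the layer sizes, the single-LSTM noise estimator, and the linear auto-encoder are all assumptions made for clarity.

```python
# Hypothetical sketch of the SSN data flow: an LSTM estimates a noise
# spectrogram from the noisy input, the estimate is subtracted from the
# input (the "self subtraction" step), and an auto-encoder refines the
# result. All sizes and wiring here are illustrative assumptions.
import torch
import torch.nn as nn

class SelfSubtractionNet(nn.Module):
    def __init__(self, n_mels=64, hidden=128, bottleneck=32):
        super().__init__()
        # LSTM over time frames estimates the noise component per frame
        self.noise_lstm = nn.LSTM(n_mels, hidden, batch_first=True)
        self.noise_proj = nn.Linear(hidden, n_mels)
        # Auto-encoder refines the noise-subtracted spectrogram
        self.encoder = nn.Sequential(nn.Linear(n_mels, bottleneck), nn.ReLU())
        self.decoder = nn.Linear(bottleneck, n_mels)

    def forward(self, spec):                  # spec: (batch, frames, n_mels)
        h, _ = self.noise_lstm(spec)
        noise_est = self.noise_proj(h)        # estimated noise per frame
        denoised = spec - noise_est           # subtract noise from the input
        return self.decoder(self.encoder(denoised))

spec = torch.randn(2, 100, 64)                # toy batch: 2 clips, 100 frames
out = SelfSubtractionNet()(spec)
print(out.shape)                              # torch.Size([2, 100, 64])
```

The denoised spectrogram produced this way would then feed a downstream classifier; training the noise estimator from the input stream itself is what removes the need for paired clean/noisy recordings.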