Dying ReLUs vs. Sparsity

  • Dying ReLU: a ReLU unit that never activates, no matter what input the network receives.
  • Sparsity in activations: for any given input only a few units are active, but which units are active changes from input to input.
    • In other words, a sparse unit is more like a ‘sleeping’ ReLU (or whatever the units are): it can still wake up for other inputs. This should be distinguished from a dying ReLU – sparsity is desired. The sketch below makes the difference concrete.
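
To make the distinction concrete, here is a minimal NumPy sketch. Everything in it is synthetic and purely for illustration: the “layer”, its pre-activation values, and the shift applied to the first five units are all made up. A dead unit outputs zero for every input in the batch, while a sparse layer simply has a low fraction of active units per input, with the active set changing across inputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "first hidden layer" pre-activations for a batch of inputs
# (1000 examples, 64 units). The values are synthetic.
pre_activations = rng.normal(size=(1000, 64))

# Simulate the dying-ReLU pathology: push a few units so far negative
# that they never fire for any input in the batch.
pre_activations[:, :5] -= 10.0

activations = np.maximum(pre_activations, 0.0)  # ReLU

# A *dead* unit outputs zero for every input (and so gets zero gradient).
dead_units = (activations == 0).all(axis=0)
print(f"dead units: {dead_units.sum()} / {activations.shape[1]}")

# *Sparsity* is a per-input property: only some units fire for a given
# input, but which ones fire changes from input to input.
active_per_input = (activations > 0).mean(axis=1)
print(f"mean fraction of active units per input: {active_per_input.mean():.2f}")
```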

 


6 thoughts on “Dying ReLUs vs. Sparsity”

    1. Yes, I’m currently using LReLU or ELU. I’m not sure whether I should avoid ReLU entirely, though. Do you think so? I think LReLU with a small alpha would at least always be better than ReLU, but my experience is limited.

      1. My experience is also very limited. I think ELU and parametric ReLU might have some advantage at higher learning rates. Not that I have run into many dead ReLUs myself, but I may have seen some when I checked what the first hidden layer of a neural network had learned. It would be interesting to research that if I had some time.

      2. Yeah, people mention it, but I don’t have much real data about it. ELU (or another advanced ReLU variant) outperforming plain ReLU would be one of the cues, though.

      3. I still think there’s a use case for ReLU (or its leaky variant). In the ELU paper (https://arxiv.org/abs/1511.07289), the authors mention the following about classifying ImageNet (which needs a pretty deep ConvNet, 15 layers in the paper):

        “Currently ELU nets are 5% slower on ImageNet than ReLU nets. The difference is small because activation functions generally have only minor influence on the overall training time (Jia, 2014). In terms of wall clock time, ELUs require 12.15h vs. ReLUs with 11.48h for 10k iterations.”

        Though to be fair, they add some hope that it could be remedied with faster implementations of the exponential function.

        Anyway, my take is that ReLU can still be good for keeping wall-clock time down for very deep ConvNets, with pretty much the same classification accuracy (and possibly better if more data is available, since more data can be trained on in the same number of hours).

        With that said, I think I’ll default to ELU for most networks.
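
        For anyone who wants to run that comparison themselves, swapping the activation is usually a one-line change. Here is a minimal PyTorch sketch; the make_mlp helper and the layer sizes are invented for illustration, and the alpha / negative_slope values are simply the library defaults, not recommendations from this thread:

        ```python
        import torch
        import torch.nn as nn

        def make_mlp(activation: nn.Module) -> nn.Sequential:
            """A small MLP where the activation function is a plug-in choice."""
            return nn.Sequential(
                nn.Linear(784, 256),
                activation,
                nn.Linear(256, 10),
            )

        # The candidates discussed above; alpha / negative_slope are the
        # library defaults, not values recommended here.
        nets = {
            "relu": make_mlp(nn.ReLU()),
            "leaky_relu": make_mlp(nn.LeakyReLU(negative_slope=0.01)),
            "elu": make_mlp(nn.ELU(alpha=1.0)),
        }

        x = torch.randn(32, 784)  # dummy batch, just to show the forward pass runs
        for name, net in nets.items():
            print(name, tuple(net(x).shape))
        ```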
