More on Adversary Resistant Neural Networks
Trying to make a likely-to-succeed experiment
I previously babbled about possible antidotes to the insidious adversarial attacks on neural networks. In the interests of making something which is likely to work, I've been having shower thoughts about how to detune my construction. Instead of increasing the number of bits an attacker needs to control from 2 to some factor of N, the goal is to increase it to 10 or so. That's still a meaningful improvement and more likely to succeed. It may also result in big practical gains: although it isn't all that much more resistant to true adversaries, it may 'hallucinate' a lot less against accidentally adversarial data which occurs just by chance.
Trivially, we can make progress towards the goal by taking all the inputs, putting them into 3 different buckets, segregating everything into 3 smaller neural networks which lead to 3 different outputs at the end, and then making the one true output be the sum of those 3. This is straightforward to implement: designate a third of each layer to each of the buckets and zero out all connections between buckets, then on the final layer set the weight of the connection from one node in each bucket to 1 and the rest to zero.
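As a minimal sketch of that construction in NumPy (the layer width of 12, the two hidden layers, and the ReLU nonlinearity are all arbitrary illustrative choices, not anything from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

n, n_buckets = 12, 3   # layer width and number of buckets (illustrative sizes)
b = n // n_buckets     # nodes per bucket in each layer

# Block-diagonal mask: zero out every connection between buckets.
mask = np.kron(np.eye(n_buckets), np.ones((b, b)))

W1 = rng.normal(size=(n, n)) * mask   # hidden layer 1
W2 = rng.normal(size=(n, n)) * mask   # hidden layer 2

# Final layer: weight 1 from exactly one node per bucket, 0 elsewhere,
# so the one true output is the sum of the three sub-networks.
w_out = np.zeros(n)
w_out[[0, b, 2 * b]] = 1.0

def forward(x):
    h1 = np.maximum(0, W1 @ x)   # ReLU
    h2 = np.maximum(0, W2 @ h1)
    return w_out @ h2

# Perturbing the inputs of one bucket cannot touch the other sub-networks.
x = rng.normal(size=n)
x2 = x.copy()
x2[:b] += 1.0                    # perturb only bucket 0's inputs
h2a = np.maximum(0, W2 @ np.maximum(0, W1 @ x))
h2b = np.maximum(0, W2 @ np.maximum(0, W1 @ x2))
assert np.allclose(h2a[b:], h2b[b:])   # buckets 1 and 2 unaffected
```

The masking trick means the three sub-networks can still be trained as one ordinary network; the mask just has to be re-applied to the weights (or their gradients) at each step.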
Obviously this increases adversary resistance, but it does so at substantial cost to accuracy. What we'd like is to get both accuracy and adversary resistance by making it so that every input influences the next-to-last layer's outputs, but via such wildly different paths that manipulating the original inputs doesn't cause those outputs all to change in tandem. Hopefully that results in a neural network which can be trained as normal and automatically has adversary resistance without a big hit in accuracy. I will now give a minimalist construction which has that structure.
Group the input values into six buckets. After that comes phase 1, which is split into 15 different mini neural networks corresponding to the 15 different ways of picking exactly two of the input buckets to process. Next is phase 2, also subdivided into 15 different mini neural networks. These correspond to 15 ways of taking three different groups from the first phase such that each of the original buckets gets included exactly once, and those three groups' outputs are the inputs at the phase transition.
Concretely, let's say that the input buckets are numbered 1 through 6. The first phase groups are A: 12, B: 13, C: 14, D: 15, E: 16, F: 23, G: 24, H: 25, I: 26, J: 34, K: 35, L: 36, M: 45, N: 46, O: 56. The second phase groupings are: AJO, AKN, ALM, BGO, BHN, BIM, CFO, CHL, CIK, DFN, DGL, DIJ, EFM, EGK, EHJ. This set of groupings has many beautiful mathematical properties, including that each of the original buckets goes into each of the second phase groups exactly once, each bucket is paired with each other bucket in the first phase exactly once, and each first phase group is paired in the second phase exactly once with every other first phase group it could possibly be paired with (that is, every one it shares no bucket with).
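Those properties can be checked mechanically. A short Python script, using the bucket numbers and group letters from the listing above:

```python
from collections import Counter
from itertools import combinations

# First phase groups: each letter names an unordered pair of input buckets.
phase1 = {
    "A": {1, 2}, "B": {1, 3}, "C": {1, 4}, "D": {1, 5}, "E": {1, 6},
    "F": {2, 3}, "G": {2, 4}, "H": {2, 5}, "I": {2, 6},
    "J": {3, 4}, "K": {3, 5}, "L": {3, 6},
    "M": {4, 5}, "N": {4, 6}, "O": {5, 6},
}

# Second phase groups: each takes three first phase groups as input.
phase2 = ["AJO", "AKN", "ALM", "BGO", "BHN", "BIM", "CFO", "CHL",
          "CIK", "DFN", "DGL", "DIJ", "EFM", "EGK", "EHJ"]

# Every pair of buckets appears in exactly one first phase group.
assert {frozenset(p) for p in phase1.values()} == \
    {frozenset(c) for c in combinations(range(1, 7), 2)}

# Each second phase group covers each of the six buckets exactly once.
for g in phase2:
    assert sorted(n for name in g for n in phase1[name]) == [1, 2, 3, 4, 5, 6]

# Each first phase group feeds exactly three second phase groups.
counts = Counter(name for g in phase2 for name in g)
assert all(c == 3 for c in counts.values())

# Two first phase groups co-occur in a second phase group exactly once
# if they share no bucket, and never if they do.
for a, c in combinations(phase1, 2):
    together = sum(1 for g in phase2 if a in g and c in g)
    assert together == (0 if phase1[a] & phase1[c] else 1)
```

All four assertions pass, which also makes the script a handy sanity check if you want to swap in a different set of groupings.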
Finally, the last layer should take an input with weight one from exactly one node in each of the phase 2 groups, with all its other input connections set to weight zero.
In order to keep the network from simply repeating all the inputs at the end of the first phase and then doing all the computation in the second phase, which would forfeit the adversary resistance advantage, it's probably necessary to pinch things a bit: either zero out a significant fraction of the values in the layer before the phase transition, or limit the depth of the second phase, or both.
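The zeroing-out option might look something like the following sketch. The layer width and the fraction zeroed are arbitrary illustrative choices, and a fixed random mask stands in for whatever pinching scheme actually gets used:

```python
import numpy as np

rng = np.random.default_rng(1)

width = 30            # width of the layer feeding the phase transition (illustrative)
pinch_fraction = 0.5  # fraction of values zeroed out (an arbitrary choice)

# A fixed binary mask over the last phase 1 layer, limiting how much
# information each first phase group can pass forward into phase 2.
pinch_mask = (rng.random(width) >= pinch_fraction).astype(float)

def phase_transition(h):
    """Pinch the phase 1 outputs before they feed phase 2."""
    return h * pinch_mask

h = rng.normal(size=width)
pinched = phase_transition(h)

# At most the unmasked positions survive the transition.
assert np.count_nonzero(pinched) <= np.count_nonzero(pinch_mask)
```

Because the mask is fixed rather than learned, training can't route around the pinch by reallocating which nodes carry the raw inputs forward.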