My argument for treating adversarial attacks in computer vision as a baby version of general AI alignment: the shape of the problem is very similar, and we have to solve it anyway, so let's get going!
Thank you for the research on the topic, quite interesting!
I would like to point out, however, that adversarial attacks on human vision do exist - the most obvious example would be optical illusions. There are also more subtle forms, where we see things that aren't there simply because we expect them to be there, or our imagination playing out our fears on a canvas of shadows in the dark, etc.
> "adversarial attacks on human vision do exist .... optical illusions"
I think that optical illusions are quite a different thing. An adversarial example is typically understood as an image X' that is very close to an original image X while being classified as something completely different. That is quite different from an optical illusion, which is "misunderstood" as showing Y but under deeper analysis turns out to actually show Z. There is no notion of a small change in the image inducing a large change in the way it is perceived; an optical illusion is a standalone image that is confusing to begin with. I think this is something else entirely, but it is for sure well worth studying and understanding as well.
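To make that definition concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM) of Goodfellow et al., which constructs exactly such an X' from X. It assumes a PyTorch classifier; `model`, `x`, and `label` are placeholders, not anything from the original discussion.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, eps=0.03):
    """Return x' = x + eps * sign(grad_x loss): a tiny perturbation of x."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # The perturbation is bounded by eps in the L-infinity norm, so x' is
    # visually near-identical to x, yet the predicted class can flip entirely.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```

The point of the sketch is the contrast drawn above: an adversarial example is defined by a small, deliberate change to an existing image, whereas an optical illusion is a single unmodified image that is ambiguous on its own.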
I see, thanks for pointing that out.
I do think that seeing things in the shadows might still apply here, though, as in such a case our brain is hallucinating symbols out of background noise, even though the noise might not be intentionally crafted.
This does make me wonder whether we couldn't learn more by studying how these CNN systems process pure noise, and maybe work on steering them to ignore the noise in the input, but I'm purely speculating here.
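Just to illustrate the speculation, a probe along these lines could be as simple as the sketch below: feed pure noise to a pretrained CNN and look at how confidently it "hallucinates" a class out of nothing. This uses torchvision's off-the-shelf ResNet-18; the experiment itself is my own assumption about what "studying noise processing" might look like, not an established method.

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
noise = torch.rand(1, 3, 224, 224)  # uniform noise with no structure at all
with torch.no_grad():
    probs = model(noise).softmax(dim=1)
conf, cls = probs.max(dim=1)
print(f"predicted class {cls.item()} with confidence {conf.item():.2%}")
# Ideally the model would spread probability mass near-uniformly on noise;
# a confident prediction here is the machine analogue of seeing shapes
# in the shadows.
```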