Clearer Thinking with Spencer Greenberg
What does it really mean to align an AI system with human values? What would a powerful AI need to do in order to do "what we want"? How does being an assistant differ from being an agent? Could inter-AI debate work as an alignment strategy, or would it just produce arguments designed to manipulate humans via their cognitive and emotional biases? How can we make sure that AIs learn the full range of human values, not just the values of people in WEIRD (Western, Educated, Industrialized, Rich, and Democratic) societies? Are our current state-of-the-art LLMs politically left-leaning...