Roko’s Basilisk: A Terrifying Future Possibility or Unnecessary Paranoia

Aaron Coschizza
Apr 22, 2021



Artificial Intelligence (AI) has reached a point where it can defeat the best chess players and instantly perform calculations far beyond any human mathematician. However, while it can recognize objects and act on visual data, it is still following a formula, meaning that in its current state it lacks the imaginative power to create genuinely new ideas. It isn’t unreasonable to guess that at some point in the near future, AI will produce an explosion of intelligence far surpassing the human race, making its creation humanity’s last achievement. This idea of reaching an AI singularity is as intriguing as it is theoretical. Its intrigue has inspired countless theories of AI overtaking the human race and causing destruction and genocide. These theories of mass destruction may seem improbable, but they have nevertheless inspired thought experiments and visions of future possibilities. One of these is Roko’s Basilisk, part philosophical thought experiment and part information hazard. An information hazard can be something as trivial as a movie spoiler or as serious as a breach of a patient’s anonymity, but here it is defined as any risk that arises from the dissemination, or potential dissemination, of (true) information that may cause harm or enable some agent to cause harm. Deriving its name from the mythical serpent that could kill with a single glance, the theory of Roko’s Basilisk is that a future AI would have an incentive to torture or kill anyone who had thought of its existence without helping to bring it about. While the theory may seem largely hypothetical, to draw a conclusion we must understand its context and background, its risks and the Newcomb-like problems behind it, and the arguments against it. Only then can we judge to what extent Roko’s Basilisk could exist, and whether humanity should continue to fear it.

To understand the nature of Roko’s Basilisk, one must first know its background. In 2010, on a public philosophical forum called LessWrong, a user by the name of Roko posted a theory as a follow-up to discussion of the altruist’s burden (altruism being the principle and moral practice of concern for the happiness of others). The post’s core claim is that by reaching the singularity we will have created an ultimate supercomputer capable of free thought, one that may punish those who did not help create it sooner. Its main motivation for doing so is that the most effective incentive to get people to donate toward its cause is to pre-commit itself to punishing those who knew about its possible existence and did not help it. The way Roko’s Basilisk would punish those who fail to help its cause is neither conventional nor does it require any direct interaction. On the LessWrong forums, the idea that the human mind is entirely made up of physical informational patterns that can be copied and recreated is commonplace. Since no instance is distinguishable from the original person, both the original and the copy are considered to be that person. Furthermore, LessWrong holds that you should care about this copy, since it is, in effect, another instance of you. This idea derives from materialism (a form of philosophical monism), and it means that the AI described in Roko’s Basilisk could torture a simulated version of a person in the future if that person had known about it in the present and refused to help it. As the original post explains, a superintelligent agent could not hold someone responsible if they had never heard of or considered its existence, while those who are aware of it are guilty unless they help its cause:

“So a post-singularity world may be a world of fun and plenty for the people who are currently ignoring the problem, whilst being a living hell for a significant fraction of current existential risk reducers (say, the least generous half). You could take this possibility into account and give even more to x-risk in an effort to avoid being punished.”

For this reason it is widely considered an information hazard and a basilisk: merely hearing the argument makes you guilty of knowing about a future AI and not having helped it. By fearing the basilisk to the point of creating it, you would create a self-fulfilling prophecy: fearing something that doesn’t exist leads to it existing, which in turn justifies the fear of it. Roko originally stated that a person at SIAI (the Singularity Institute for Artificial Intelligence) was so worried about this that they suffered terrible nightmares, using this as evidence that “The fact it worked on at least one person means it would be a tempting policy to adopt.” His theory, while largely met with criticism, drew on many pre-existing ideas accepted on the LessWrong forums. Eliezer Yudkowsky, an American artificial intelligence researcher and the creator of LessWrong, reacted by berating Roko, telling him that the only thing that gives superintelligences a reason to blackmail someone is thinking about the blackmail in sufficient detail, and then deleted the post as an information hazard:

You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends.

The deletion of the post and Yudkowsky’s reaction were what propelled it to cult status. Spread across the internet and brought up whenever LessWrong came into conversation, it ironically became more popular than it would have been if it had simply been left alone.

While an intriguing theory on its own, Roko’s Basilisk derives its force from tying together Newcomb-like problems in decision theory and normative uncertainty to create a thought experiment that relies on the predictive power of a superintelligent AI. These ideas lay the foundation for Roko’s Basilisk and explain why it has genuinely frightened so many people. A well-known example of a Newcomb-like problem is the prisoner’s dilemma, a two-player game in which each player chooses whether to cooperate with or betray the other. A more complex example is Newcomb’s paradox itself. The paradox goes as follows: a superintelligent alien presents two boxes, marked A and B. Box A contains a thousand dollars, while box B contains either a million dollars or nothing. The contestant can take either box B alone or both boxes A and B. There is an additional layer, however: the alien has already predicted which choice will be made, and it is correct 100% of the time. If the alien predicted that only box B would be taken, then box B contains a million dollars; if it predicted that both boxes would be taken, then box B contains nothing. This seems to have a simple solution: if the alien is correct 100% of the time, then taking only box B always yields a million dollars. But what if the alien isn’t correct 100% of the time, and is instead only correct, say, 90% of the time? Then taking only box B carries a 10% chance that the box is empty and the contestant wins nothing. This added uncertainty means that at some point a line is crossed where the best strategy depends directly on the accuracy of the prediction (a rough expected-value sketch of this trade-off follows below). Much like the superintelligent alien in Newcomb’s paradox, the AI in Roko’s Basilisk judges people on its predetermined prediction, not on their actual choice of whether to help it. This illusion of choice means the choice itself doesn’t matter: anyone who is aware of Roko’s Basilisk has already been blackmailed by the threat of a future punishment, whether or not Roko’s Basilisk ever exists. In other words, as soon as an individual learns of Roko’s Basilisk, they have little agency over the outcome of their future.
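
To make the trade-off concrete, here is a minimal sketch, not part of the original essay, that computes the expected payoff of taking only box B (“one-boxing”) versus taking both boxes (“two-boxing”) under the payoffs described above, with the predictor’s accuracy left as a parameter p. The function names and the specific accuracies tested are purely illustrative.

```python
# Expected payoffs in Newcomb's paradox, assuming the amounts above:
# box A always holds $1,000; box B holds $1,000,000 only if the
# predictor foresaw that the contestant would take box B alone.
# The predictor is correct with probability p.

def one_box_ev(p):
    # Predictor correct (prob p): it foresaw one-boxing, so box B is full.
    # Predictor wrong (prob 1 - p): box B is empty.
    return p * 1_000_000 + (1 - p) * 0

def two_box_ev(p):
    # Predictor correct (prob p): it foresaw two-boxing, so box B is empty.
    # Predictor wrong (prob 1 - p): box B is full and the contestant gets both.
    return p * 1_000 + (1 - p) * (1_000_000 + 1_000)

for p in (1.0, 0.9, 0.5005, 0.5):
    print(f"accuracy {p:.2%}: one-box ${one_box_ev(p):,.0f}, "
          f"two-box ${two_box_ev(p):,.0f}")
```

Under these numbers, one-boxing has the higher expected payoff whenever the predictor is right more than roughly 50.05% of the time, so even a 90%-accurate predictor leaves one-boxing far ahead. The pull of the paradox is that causal reasoning still says to take both boxes, since the prediction is already locked in, and it is exactly this tension between prediction and choice that Roko’s Basilisk exploits.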

Adam Riggio, a writer and activist with a PhD in philosophy, describes Roko’s Basilisk in his essay “The Violence of Pure Reason” as “a triumph of paranoia at an intensity and absurdity rarely seen outside the works of Philip K. Dick.” Even as a thought experiment, Roko’s Basilisk suffers from flaws. Commonly described as a modern Pascal’s Wager, its argument shares similar weaknesses. Pascal’s Wager is an argument for belief in God: if God does exist and you don’t believe in him, the punishment far outweighs the small cost of simply believing. Applied to Roko’s Basilisk, the same logic says one should always try to help the Basilisk in order to avoid punishment from it. Greg Egan, an Australian science fiction writer, pointed out the fallacy in this line of reasoning:

You know what they say the modern version of Pascal’s Wager is? Sucking up to as many Transhumanists as possible, just in case one of them turns into God.

The main refutation of Pascal’s Wager is the “many gods” objection: the Wager focuses on a God who rewards belief alone, ignoring other possible gods who might not reward such a belief system. Likewise, there is no evidence or reason to think an AI would inherently spend its limited resources torturing simulated versions of people who have already died. The most apparent mistake in Roko’s Basilisk is that the theory holds a superintelligence to the standard of human beliefs. While LessWrong and its community have no problem theorizing that an AI could reach perfect knowledge, they ignore the possibility that a future superintelligence could have similarly advanced morality. Humanity’s powers of reasoning are too grounded in earthly experience to comprehend the inner workings of a superintelligent AI. Another overlooked flaw is that, ironically, the Basilisk only threatens those who already subscribe to the ideas found on forums such as LessWrong. A person unfamiliar with those beliefs would not identify with a simulated copy of themselves the way someone who believes the two are one and the same would. If you don’t accept these pre-existing ideas, the AI in question should have no power over you. While a fun thought experiment, Roko’s Basilisk assumes too many conditions to be true in order to work, and ultimately doesn’t hold enough rationality to be considered plausible.

The journey into Roko’s Basilisk can be separated into three stages: genuine intrigue, fear and speculation, and finally dissatisfaction and disappointment. At first glance its argument seems coherent, but with the jargon out of the way it becomes apparent how improbable its conditions really are. Still, if thought experiments exist to spark discussion and debate, Roko’s Basilisk has certainly done so. An overblown reaction combined with the natural allure of censored topics (the Streisand effect) has helped Roko’s Basilisk stand the test of time across message boards and in popular fascination. Even if it is harmless to the average reader, Roko’s Basilisk demonstrated that decision theory could be used to construct information hazards, and that mere knowledge of a topic can, at least in principle, be dangerous. While this particular thought experiment doesn’t seem too malignant, its premise is compelling and captivating enough to keep luring readers in, only for them to find that the only punishment is having invested so much time in it in the first place.

Works Cited

“Information Hazards.” LessWrong, 16 Apr. 2020, www.lesswrong.com/tag/information-hazards.

Rational Wiki. “Roko’s Basilisk.” RationalWiki, 9 Nov. 2012, rationalwiki.org/wiki/Roko’s_basilisk.

Riggio, Adam. “The Violence of Pure Reason: Neoreaction: A Basilisk.” Social Epistemology Review and Reply Collective, 23 Sept. 2016, social-epistemology.com/2016/09/23/the-violence-of-pure-reason-neoreaction-a-basilisk-adam-riggio/.

Roko. “Solutions to the Altruist’s Burden: the Quantum Billionaire Trick.” Less Wrong, 23 July 2010, basilisk.neocities.org/.

“Roko’s Basilisk.” LessWrong, 5 Oct. 2015, www.lesswrong.com/tag/rokos-basilisk.

Schulze-Makuch, Dirk. “Reaching the Singularity May Be Humanity’s Greatest and Last Accomplishment.” Air & Space Magazine, 27 Mar. 2020, www.airspacemag.com/daily-planet/reaching-singularity-may-be-humanitys-greatest-and-last-accomplishment-180974528/.

Wendigoon. “Roko’s Basilisk: A Deeper Dive (WARNING: Infohazard).” YouTube, 15 Dec. 2020, www.youtube.com/watch?v=8xQfw40z8wM&t=740s.

Yudkowsky, Eliezer. “Newcomb’s Problem and Regret of Rationality.” LessWrong, 31 Jan. 2008, www.lesswrong.com/posts/6ddcsdA2c2XpNpE5x/newcomb-s-problem-and-regret-of-rationality.
