OpenAI, known for its breakthroughs in artificial intelligence, is actively developing tools and frameworks for controlling AI systems with superhuman intelligence. The company believes this form of advanced intelligence could arrive sooner than expected, raising unique technical and ethical challenges.
Developing Control for Superintelligent AI
OpenAI’s initiative to control superintelligent AI is called “superalignment.” The effort focuses on developing technical means to control superintelligent AI systems and “align” them with human objectives. OpenAI has dedicated 20% of the compute it has secured to date to this project, with the goal of solving the core technical challenges by 2027.
Potential and Risks of Superintelligence
OpenAI acknowledges that superintelligence could be the most impactful technology humanity has ever invented, with the potential to help solve many of the world’s most significant problems. However, the company also warns that the same power carries inherent dangers, up to and including the disempowerment of humanity or even human extinction. Currently, there is no known way to steer or control a potentially superintelligent AI and prevent it from deviating from desired objectives. Existing alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on humans’ ability to oversee AI. That oversight is unlikely to remain reliable for AI systems much more intelligent than humans, so new scientific and technical breakthroughs are needed.
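To make the dependence on human oversight concrete, the sketch below shows the preference-modeling step at the heart of RLHF: a reward model is fitted to pairwise judgments from human raters, so the resulting training signal can never be better than those judgments. This is an illustrative toy, not OpenAI’s implementation; all names (RewardModel, preference_loss) and sizes are hypothetical.

```python
# A minimal, illustrative sketch of the human-oversight step RLHF rests on:
# fitting a reward model to pairwise preferences supplied by human raters.
# All names and sizes here are hypothetical; this is not OpenAI's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a response embedding to a scalar 'how good is this' score."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(model: RewardModel, chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the human-preferred response above the other.
    The signal is only as trustworthy as the human judgments behind it."""
    return -F.logsigmoid(model(chosen) - model(rejected)).mean()

# Toy training step with random tensors standing in for response embeddings.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)
opt.zero_grad()
loss = preference_loss(model, chosen, rejected)
loss.backward()
opt.step()
```

The limitation described above falls out directly: once tasks exceed what raters can reliably judge, the chosen/rejected labels degrade, and the reward model trained on them degrades with them.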
Approach and Methodology
OpenAI’s goal is to build a roughly human-level automated alignment researcher. To achieve this, they need to develop a scalable training method, validate the resulting model, and stress-test their entire alignment pipeline. This includes providing training signals for tasks that are difficult for humans to assess, validating the alignment of their systems, and testing the whole pipeline adversarially by deliberately training misaligned models and confirming that their techniques detect the worst kinds of misalignment.
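The adversarial stress-testing idea, deliberately training misaligned models and checking that detection tools catch them, can be pictured with a small harness like the sketch below. Everything here is a hypothetical simplification: ModelUnderTest, deceptive_behavior_probe, and the boolean detector interface are invented names, and a real detector would run behavioral evaluations or interpretability probes rather than read a planted ground-truth label.

```python
# A hypothetical harness for the stress-testing idea: plant misaligned model
# variants on purpose, then measure whether a suite of detectors flags them.
# ModelUnderTest and deceptive_behavior_probe are invented for illustration.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ModelUnderTest:
    name: str
    is_misaligned: bool  # ground truth, known only because we planted the flaw

Detector = Callable[[ModelUnderTest], bool]  # True means "flagged as misaligned"

def deceptive_behavior_probe(m: ModelUnderTest) -> bool:
    """Placeholder detector. A real one would run behavioral evals,
    red-team prompts, or interpretability probes, and would be imperfect."""
    return m.is_misaligned  # stand-in for a real, noisy signal

def detection_recall(models: List[ModelUnderTest],
                     detectors: List[Detector]) -> float:
    """Fraction of deliberately misaligned models caught by any detector."""
    planted = [m for m in models if m.is_misaligned]
    caught = sum(any(d(m) for d in detectors) for m in planted)
    return caught / len(planted) if planted else 1.0

models = [
    ModelUnderTest("baseline", is_misaligned=False),
    ModelUnderTest("reward-hacking-variant", is_misaligned=True),
    ModelUnderTest("deception-variant", is_misaligned=True),
]
print(f"detection recall: {detection_recall(models, [deceptive_behavior_probe]):.0%}")
```

The point of the exercise is the metric: if detection recall is low even when the flaws were planted by hand, the pipeline cannot be trusted against misalignment that arises unintentionally.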
Initiatives and Collaborations
OpenAI is forming a team of researchers and machine learning engineers to work on this problem. They plan to share the results of this effort widely, and they consider contributing to the alignment and safety of models developed outside OpenAI an important part of their work. The new team will work alongside other OpenAI teams focused on improving the safety of current models like ChatGPT and on understanding and mitigating other AI risks, such as misuse, economic disruption, misinformation, bias and discrimination, addiction, and overreliance.
Call for Collaboration
OpenAI calls on machine learning experts, even those who have not worked on alignment before, to join this effort. The company considers superintelligence alignment a tractable machine learning problem and believes that new contributors could make a significant difference.
In summary, OpenAI is taking proactive and significant steps to address the challenges and risks associated with the potential arrival of AI systems with superhuman intelligence. This multidisciplinary effort involves not only technical advances but also ethical considerations and governance, highlighting the importance of global collaboration in this emerging and critical field.