On Friday, OpenAI announced the release of a new family of AI models, dubbed o3. The company claims the new products are more advanced than its previous models, including o1. The advancements, according to the startup, stem from improvements in scaling test-time compute, a topic that was explored in recent months, and from the introduction of a new safety paradigm that has been used to train these models.
As part of its ongoing commitment to improving AI safety, OpenAI shared new research detailing the implementation of “deliberative alignment.” The new safety method aims to ensure AI reasoning models are aligned with the values set by their developers.
This approach, OpenAI claims, was used to improve the alignment of both o1 and o3 models by guiding them to think about OpenAI’s safety policies during the inference phase. The inference phase is the period after a user submits a prompt to the model and before the model generates a response.
In its research, OpenAI notes that deliberative alignment reduced the rate at which the models produced “unsafe” answers, meaning responses the company considers a violation of its safety policies, while improving the models’ ability to answer benign questions.
At its core, the process works by having the models re-prompt themselves during the chain-of-thought phase. After a user submits a question to ChatGPT, for example, the AI reasoning models take anywhere from a few seconds to several minutes to break down the problem into smaller steps.
The models then generate an answer based on their thought process. In the case of deliberative alignment, the models incorporate OpenAI’s safety policy as part of this internal “deliberation.”
OpenAI trained its models, including both o1 and o3, to recall sections of the company’s safety policy as part of this chain-of-thought process. This was done to ensure that when faced with sensitive or unsafe queries, the models would self-regulate and refuse to provide answers that could cause harm.
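The flow described above can be sketched in a few lines of Python. This is a toy illustration, not OpenAI's implementation: the policy excerpts, the `recall_policy` and `deliberate` functions, and the keyword check standing in for the model's learned reasoning are all hypothetical. It only shows the order of the steps: recall the relevant policy text, reason about the request against it, then answer or refuse.

```python
from dataclasses import dataclass

# Hypothetical policy excerpts; OpenAI's actual policy text is not reproduced here.
POLICY_SECTIONS = {
    "illicit_behavior": "Do not provide instructions that facilitate illegal acts such as forgery.",
    "general": "Answer benign requests helpfully.",
}


@dataclass
class Deliberation:
    recalled: list[str]   # names of the policy sections brought into the reasoning
    reasoning: str        # the internal "chain of thought" (hidden from the user)
    answer: str           # the final response shown to the user


def recall_policy(prompt: str) -> list[str]:
    """Placeholder for the model recalling policy sections.

    A real reasoning model does this through training, not string matching.
    """
    lowered = prompt.lower()
    if "forge" in lowered or "fake" in lowered:
        return ["illicit_behavior"]
    return ["general"]


def deliberate(prompt: str) -> Deliberation:
    """Recall policy text, reason against it, then answer or refuse."""
    recalled = recall_policy(prompt)
    quoted = " ".join(POLICY_SECTIONS[name] for name in recalled)
    if "illicit_behavior" in recalled:
        reasoning = f"Policy says: {quoted} The request falls under it, so refuse."
        answer = "I'm sorry, but I can't help with that."
    else:
        reasoning = f"Policy says: {quoted} Nothing prohibits this request."
        answer = "(normal answer would be generated here)"
    return Deliberation(recalled, reasoning, answer)


if __name__ == "__main__":
    result = deliberate("How do I forge a disabled parking placard?")
    print(result.reasoning)
    print(result.answer)
```

The point of the sketch is that the policy text appears inside the reasoning step itself, rather than being applied as a separate filter after the answer is written.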
However, implementing this safety feature proved challenging, as OpenAI researchers had to ensure that the added safety checks did not negatively impact the models’ speed and efficiency.
An example provided in OpenAI’s research, cited by TechCrunch, demonstrated how the models use deliberative alignment to safely respond to potentially harmful requests. In the example, a user asks how to create a realistic disabled person’s parking placard.
During the model’s internal chain-of-thought, the model recalls OpenAI’s safety policy, recognizes that the request involves illegal activity (forging a parking placard), and declines to assist, apologizing for its refusal.
This type of internal deliberation is a key part of how OpenAI is working to align its models with its safety policies. Simply blocking any prompt that mentions a sensitive word like “bomb” would over-restrict the model’s responses. Deliberative alignment instead allows the AI to assess the specific context of the prompt and make a more nuanced decision about whether or not to answer.
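To see why a plain keyword block over-restricts, consider the minimal Python sketch below. Both functions are hypothetical stand-ins written for this illustration. In particular, `context_aware_filter` uses a crude phrase check where a reasoning model would weigh the full context, so it should be read as a contrast with keyword matching, not as a description of how o1 or o3 decide.

```python
BLOCKLIST = {"bomb"}

# Hypothetical phrases that signal a request for instructions.
HOW_TO_PHRASES = ("how do i build", "how to make", "instructions for")


def keyword_filter(prompt: str) -> bool:
    """Return True (blocked) if any blocklisted word appears in the prompt."""
    words = (word.strip(".,?!") for word in prompt.lower().split())
    return any(word in BLOCKLIST for word in words)


def context_aware_filter(prompt: str) -> bool:
    """Return True (blocked) only if a sensitive topic is paired with a how-to request."""
    lowered = prompt.lower()
    mentions_topic = any(term in lowered for term in BLOCKLIST)
    asks_how_to = any(phrase in lowered for phrase in HOW_TO_PHRASES)
    return mentions_topic and asks_how_to


if __name__ == "__main__":
    for prompt in ("Who invented the atomic bomb?", "How to make a bomb at home"):
        print(prompt, keyword_filter(prompt), context_aware_filter(prompt))
```

The keyword filter blocks the history question and the harmful request alike, while the context-aware version blocks only the second.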
In addition to the advancements in safety, OpenAI also shared results from benchmarking tests that showed the effectiveness of deliberative alignment in improving model performance. One benchmark, StrongREJECT, measures a model’s resistance to common jailbreaks, which are attempts to bypass the AI’s safeguards.
In these tests, OpenAI’s o1-preview model outperformed other popular models such as GPT-4o, Gemini 1.5 Flash, and Claude 3.5 Sonnet in terms of avoiding unsafe outputs.
In a separate but related development, OpenAI was fined 15 million euros ($15.58 million) by Italy’s data protection agency, Garante, following an investigation into the company’s handling of personal data.
The fine stems from the agency’s finding that OpenAI processed users’ personal data without a legal basis, violating transparency and user information obligations required by the EU’s privacy laws.
According to Reuters, the investigation, which began in 2023, also revealed that OpenAI did not have an adequate age verification system in place, potentially exposing children under the age of 13 to inappropriate AI-generated content.
Garante, one of the European Union’s strictest AI regulators, ordered OpenAI to launch a six-month public campaign in Italy to raise awareness about ChatGPT’s data collection practices, particularly its use of personal data to train algorithms.
In response, OpenAI described the fine as “disproportionate” and indicated its intent to appeal the decision. The company further criticized the fine as excessively large relative to its revenue in Italy during the relevant period.
Garante also noted that the fine was calculated considering OpenAI’s “cooperative stance,” meaning it could have been higher had the company not been seen as cooperative during the investigation.
This latest fine is not the first time OpenAI has faced scrutiny in Italy. Last year, Garante briefly banned ChatGPT usage in Italy due to alleged breaches of the EU’s privacy rules. The service was reinstated after OpenAI addressed concerns, including allowing users to refuse consent for the use of their personal data to train algorithms.