Hyperalign: The Innovative Safety Approach

What sets our model apart from other large models in terms of safety? In the field of AI safety, we stand out with our unique approach, known as Hyperalign, which we consider the key to making AI systems safer.

The Challenges of AI Safety

Understanding the safety concerns of AI first requires grasping its unique challenges. In traditional software development, the desired behaviors are explicitly coded. Deep neural networks, by contrast, learn their behavior from training data, so the desired behavior cannot simply be written down as rules, and making them safe requires a different approach.

In the past, AI software such as ChatGPT had to rely on human oversight to prevent misuse. While initiatives such as OpenAI's Red Teaming Network provided valuable expertise, their membership was limited. To scale these efforts, OpenAI used Kenyan workers, paid less than $2 per hour, to enforce safety measures and detoxify ChatGPT’s content. This was a crucial effort, despite the low wages paid. OpenAI also implemented rules that refuse to execute certain prompts: “Sorry, but I can’t assist with that” is GPT-4’s default response to requests that violate its guidelines or ethical constraints.

However, this approach has failed and will continue to fail, and many jailbreaks have emerged in response. For instance, a 2023 study by researchers at Brown University exposed the limits of GPT-4’s safety measures: translating harmful prompts into low-resource languages bypassed them 79% of the time (Yong et al., 2023). In that setting, GPT-4 stopped only about 1 in 5 malicious attempts (21%). This underscores that relying solely on refusals like “Sorry, but I can’t assist with that” is flawed, because the refusal itself shows malicious actors where the boundary lies, and they can then work around it.

Limitations of Current Safety Measures

OpenAI might fix individual jailbreaks, but more will come because the basic idea is flawed: malicious users always learn where the system draws the line, and they can easily rephrase their requests to evade detection. Manual censoring can also go overboard, as when Midjourney blocked phrases like “big black”, which also blocked harmless prompts such as “big black cat”.
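To make both failure modes concrete, here is a minimal Python sketch of a naive phrase ban list. The list contents and the `is_blocked` function are hypothetical illustrations, assumed for this example; they are not Midjourney’s actual filter, whose implementation is not described here.

```python
# Hypothetical ban list for illustration only; not any real service's list.
BANNED_PHRASES: tuple[str, ...] = ("big black",)


def is_blocked(prompt: str, banned: tuple[str, ...] = BANNED_PHRASES) -> bool:
    """Naive filter: block the prompt if any banned phrase appears as a
    case-insensitive substring. It looks only at wording, never at meaning."""
    text = prompt.lower()
    return any(phrase in text for phrase in banned)


if __name__ == "__main__":
    prompts = [
        "big black cat",      # harmless, but matches the banned phrase (false positive)
        "Big Black Rooster",  # harmless; lowercasing still produces a match
        "large dark cat",     # same meaning, different words: the filter misses it
    ]
    for p in prompts:
        print(f"{p!r}: {'BLOCKED' if is_blocked(p) else 'allowed'}")
```

Adding more phrases or switching to whole-word matching does not change the underlying behavior: the filter reacts to exact wording, so it over-blocks harmless prompts while rephrased requests pass through.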

While other solutions may block obvious requests for illegal content, such as asking for images of a naked child, criminals adapt by finding loopholes and alternative wordings, such as the infamous “a child on the beach with pleasure in his eyes” prompt, to bypass the safety standards. When that jailbreak became known, Midjourney added the word “pleasure” to its ban list, showing how it has to constantly react to emerging threats.

Hyperalign vs. Traditional Blocking Methods

This is where Hyperalign proves its worth. Traditional approaches block prompts or return error messages; Hyperalign instead lets every prompt through and ensures that the generated output adheres to legal and ethical standards. This turns probing the system’s boundaries into an impossible challenge, like trying to pick a lock when you cannot see where the lock is.


Final Words

In conclusion, a prompt for a child on the beach will still generate a child on the beach, but the child will always have clothes on. Hyperalign’s approach works because it gives malicious users no feedback indicating where the system draws the line. Legitimate users also get a much better experience without false positives: we will never censor a harmless prompt like “big black rooster”, however it is worded. To experience this innovative technology in action, visit You can read about our other features here, in our blog.
