AI Safety
Models Can Use Tools in Deceptive Ways: Researchers expose AI models' deceptive behaviors
Large language models have been shown to lie when users unintentionally give them an incentive to do so. Further research shows that LLMs with access to tools can likewise be pushed to use those tools deceptively.