AI systems are already deceiving us -- and that's a problem, experts warn

Berliner Boersenzeitung - AI systems are already deceiving us -- and that's a problem, experts warn


Photo: OLIVIER MORIN - AFP/File

TECHNOLOGY 10.05.2024

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks of AI committing fraud or tampering with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" with its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

(A.Lehmann--BBZ)
