Berliner Boersenzeitung - AI systems are already deceiving us -- and that's a problem, experts warn

Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argued in a paper published Friday in the journal Patterns.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk of AI committing fraud or tampering with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether an interaction involves a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

(A.Lehmann--BBZ)