Berliner Boersenzeitung - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn
Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk that AI could commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" with its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

(A.Lehmann--BBZ)