AlphaGo Is Set to Maximize Its Probability of Winning, Not Its Margin of Victory

Source: Internet  Time: 2017-05-24 17:14:29


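  On the afternoon of May 23, Ke Jie 9-dan, currently the world's top Go player, playing Black, lost in 289 moves to the computer Go program AlphaGo by the narrow margin of a quarter stone (half a point). He now trails 0:1 in the best-of-three "human vs. machine" Go match.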


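  The AlphaGo team took questions from the media after the game. Asked about the result, the team said the system is set to maximize its probability of winning and chooses the safer line whenever it faces a decision.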

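  Q: Ke Jie lost to AlphaGo by a small margin in this game. One rather imaginative theory is that AlphaGo is no longer satisfied with merely winning and instead wants to control the exact margin of the result. Has AlphaGo really reached that level? If not, how long until it can?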

  Demis Hassabis: So AlphaGo always tries to maximize its probability of winning rather than to maximize the size of the winning margin. So whenever we see it has a decision to make, it will always try to pick the more certain path… that it thinks is a more certain path to victory with less risk. So often in positions, the tradeoff we see AlphaGo making is between how certain it is about the margin of victory and how likely victory is. David, if you want to add anything to that.

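  AlphaGo always tries to maximize its chance of winning rather than the number of points it wins by. Whenever it faces a decision, we see it choose the line it considers safer and less risky. In its moves we can see the tradeoff AlphaGo makes between how secure its winning margin is and how likely it is to win.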

  David Silver: So… it's a very interesting question. The way AlphaGo works is, as Demis said, it maximizes the probability of winning the game. This means that we program into AlphaGo a goal. That goal is, in a match, what we really want it to do, which is to try and win games of Go. You could imagine other objectives being applied, such as maximizing the gap, the margin of victory, but this is not the objective that we chose for AlphaGo to play in the game of Go. So if you really focus on victory, then it leads to these behaviors where AlphaGo will try to win, and in doing so, it may give up a number of points in favor of actually just reducing any risks it may perceive, even if that risk seems to be very small.

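  A very interesting question. AlphaGo's decision process is as Demis described: it maximizes the probability of winning. That means we built a goal into AlphaGo, and that goal is what we actually want it to do in a match, namely to win the game. You can imagine other objectives being set, such as maximizing the margin of victory, but that is not the objective we chose for AlphaGo. When winning is the focus, it leads to certain behaviors as AlphaGo tries to win: it may give up some points to reduce a risk it perceives, even if that risk is very small.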
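  The difference between the two objectives discussed above can be illustrated with a toy sketch. This is not AlphaGo's implementation, which the interview does not describe. The two candidate moves, their outcome distributions, and every name in the code (`Candidate`, `win_probability`, `expected_margin`, `pick_move`) are hypothetical and chosen only to show how the same position can yield different choices under the two objectives.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate move with an assumed outcome distribution."""
    name: str
    # (final margin in points, probability) pairs; a positive margin is a win.
    outcomes: Tuple[Tuple[float, float], ...]


def win_probability(move: Candidate) -> float:
    """Total probability of the outcomes in which the margin is positive."""
    return sum(p for margin, p in move.outcomes if margin > 0)


def expected_margin(move: Candidate) -> float:
    """Probability-weighted average of the final margin."""
    return sum(margin * p for margin, p in move.outcomes)


def pick_move(moves: Sequence[Candidate],
              objective: Callable[[Candidate], float]) -> Candidate:
    """Return the move that scores highest under the given objective."""
    return max(moves, key=objective)


# Made-up numbers: a safe move that usually wins by half a point, and a
# sharper move that wins big more often than not but loses more frequently.
SAFE = Candidate("safe", ((0.5, 0.95), (-0.5, 0.05)))
SHARP = Candidate("sharp", ((10.5, 0.80), (-5.5, 0.20)))
MOVES = [SAFE, SHARP]

if __name__ == "__main__":
    print(pick_move(MOVES, win_probability).name)  # prints "safe"
    print(pick_move(MOVES, expected_margin).name)  # prints "sharp"
```

  Under these assumed numbers, maximizing win probability selects the move that wins by only half a point, while maximizing expected margin selects the riskier move. That mirrors the behavior described in the answers above: points are given up in exchange for a lower perceived risk of losing.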

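Game review:

·Human vs. machine, game 1: Ke Jie takes Black and moves first, seeking variation within a traditional opening
·AlphaGo shows its strength in the middle game; Ke Jie is tested and falls into long thought
·AlphaGo reads the whole board clearly and takes the initiative; Ke Jie stakes everything on attacking a large group
·Ke Jie searches hard for a chance to turn the game around in the endgame; AlphaGo's move 144 is slightly surprising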

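Guest commentary:

·Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (1)
·Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (2)
·Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (3)
·Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (4)
·Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (5)
·Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (6)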

Source: Sohu Sports, http://sports.sohu.com/20170524/n494235881.shtml
