AlphaGo 2.0 Now Learns by Itself; the Goal Is Applications in Science and Medicine
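May 23: Ke Jie 9-dan, currently the world's number one Go player, lost here this afternoon to the computer Go program AlphaGo. Playing Black, he lost after 289 moves by the narrow margin of a quarter of a stone, and now trails 0:1 in the best-of-three "human vs. machine" match.
After the game, the AlphaGo team took questions from the media and explained the new version of AlphaGo. The new version is stronger than before and now learns mostly from its own play.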
Q: Is this AlphaGo a "pure" version of AlphaGo? In other words, does it learn entirely by itself, without relying on the game records of human masters?
Demis Hassabis: I'm not sure if I understand the question correctly, but AlphaGo initially learns from human games, and then most of its learning now is from its own play against itself. But of course, to truly test what it knows, we have to play against human experts, because playing against itself is not going to expose its weaknesses, since it will obviously fix those during self-play. So we really have to test it against the world's best players.
David Silver: Perhaps I could just add to that. One of the innovations of AlphaGo-Master is that it relies much more on learning from itself. In this version, AlphaGo has actually become its own teacher, learning from moves taken from examples of its own searches, so it relies much less on human data than previous versions. One of our goals in doing so is to make it more and more general, so that its principles can be applied to other domains beyond Go.
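As a rough illustration of what "becoming its own teacher" can mean, here is a minimal sketch on a toy game. It is not AlphaGo's algorithm: the game (a single pile of stones, take 1 to 3 per turn, whoever takes the last stone wins), the lookup tables, and every function name are assumptions made for this example, and a plain two-ply lookahead stands in for AlphaGo's search and neural networks. The point is only the loop Silver describes: the program plays itself, a search picks each move, and the program then learns from the search's choice.

```python
MAX_TAKE = 3  # toy rule (assumed): remove 1-3 stones per turn; taking the last stone wins


def legal_moves(pile):
    return range(1, min(MAX_TAKE, pile) + 1)


def negamax(pile, depth, value):
    """Score for the player to move: +1 is a win, -1 a loss."""
    if pile == 0:
        return -1.0  # the opponent just took the last stone
    if depth == 0:
        return value[pile]  # at the search horizon, trust what was learned so far
    return max(-negamax(pile - m, depth - 1, value) for m in legal_moves(pile))


def search(pile, value, depth=2):
    """The 'teacher': a shallow lookahead returning (best_move, score)."""
    scored = [(-negamax(pile - m, depth - 1, value), m) for m in legal_moves(pile)]
    score, move = max(scored)
    return move, score


def self_play_training(max_pile=20, rounds=3):
    value = [0.0] * (max_pile + 1)  # learned position estimates (the 'student')
    policy = [0] * (max_pile + 1)   # learned move for each pile size
    for _ in range(rounds):
        for start in range(1, max_pile + 1):
            pile = start
            while pile > 0:  # one game of the program against itself
                move, score = search(pile, value)
                policy[pile] = move  # imitate the move the search chose
                value[pile] = score  # remember the search's evaluation
                pile -= move
    return policy, value


policy, value = self_play_training()
print([policy[n] for n in range(1, 9)])
```

No human games are used anywhere: the tables start empty and end up holding the known winning strategy for this toy game (leave the opponent a multiple of four stones), learned purely from the program's own searched games.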
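Q: I understand that the Master version is V25. Is the AlphaGo now playing Ke Jie a newer version? I would also like to know whether this is the last time we will see AlphaGo. Will AlphaGo become a tool that helps professional players keep improving, or will it say goodbye to us after this?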
David Silver: Maybe I can answer the first part of that question, regarding the technology inside AlphaGo. AlphaGo-Master is a new version of AlphaGo, and we worked very hard to improve the fundamental algorithm that is used in AlphaGo. In fact, it turns out that the algorithm often matters more than the amount of data or the amount of compute that goes into it. If you get the algorithms right and make them general and powerful enough, then they can progress very rapidly. In fact, AlphaGo-Master uses 10 times less computation and was trained in a matter of weeks rather than months, compared to the version that played against Lee Sedol last year. So it is a different version, and at least in self-play performance it is considerably stronger. We are here to find out whether it is indeed as strong as it seems in self-play, or whether it has weaknesses that can be exposed.
Demis Hassabis: As for the second part of the question, I'll answer that. Later on in the event we will be announcing the next steps for AlphaGo, so I don't want to say anything in advance of that, but we will be talking about it later in the week. One thing I do want to say is that, with the last version of AlphaGo, we published all the technical details and results of the AlphaGo program in the scientific journal Nature. That allowed other companies, such as Tencent and Japanese companies, to make their own versions of AlphaGo, and some of them are very strong now as well, as I'm sure you all know, playing online at probably 9-dan level. We plan to publish more details of the new version of AlphaGo in the next few months. So we will review those technical details, and then again other teams and academic labs will be able to implement their own versions of this AlphaGo-Master architecture.
Q: As more and more top players become unwilling to play against AlphaGo, would you consider having AlphaGo play against AlphaGo?
Demis Hassabis: We want to use AlphaGo, as I said, as a tool for the Go community to improve their knowledge about the game. We hope to release some details about the architecture we are using, and maybe also some of the games that AlphaGo plays against itself. So we may make some announcement about this later in the week. But don't forget that the reason we are ultimately developing these technologies is also to use them more widely in areas of science and medicine, and to try to help human experts in those areas. So we have a lot of work ahead of us in the coming years.
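Game recap:
· Human vs. machine, game one: Ke Jie takes Black and moves first, seeking variation within a traditional opening · AlphaGo shows its strength in the middle game; Ke Jie is tested and falls into long thought · AlphaGo reads the whole board clearly and takes the initiative; Ke Jie stakes everything on an attack on a large group · Ke Jie searches hard for a chance to reverse the game in the endgame; AlphaGo's move 144 is slightly surprising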
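Guest commentary:
· Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (1) · Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (2) · Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (3) · Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (4) · Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (5) · Dang Yifei and Fan Weijing analyze the human vs. machine match: Ke Jie vs. AlphaGo (6)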