The company that created AlphaGo now wants to unlock biology's biggest secret

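Jeremy Kahn, December 22, 2020
A new approach pioneered by DeepMind has already borne fruit in the fight against the coronavirus. This is the story of how a company best known for games set out to unlock biology's biggest secret.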

A computer-generated image of ORF8, a protein associated with the coronavirus. The image was rendered with the help of an A.I. system developed by DeepMind. Image: Courtesy of DeepMind

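Late on the night of March 13, 2016, in bitter cold, two men in wool hats and heavy coats walked side by side through the crowded streets of downtown Seoul. Deep in animated conversation, they seemed oblivious to the neon lure of the dumpling shops and barbecue joints around them. They had come to South Korea on a mission, the culmination of years of effort. Best of all, they had just succeeded.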

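The walk was a celebration. What they had achieved would cement their place in the history of computing. They had built A.I. software that mastered the ancient strategy game Go and handily defeated Lee Sedol, one of the world's top players. Now the two men began discussing their next goal, and a documentary film crew trailing them captured the conversation.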

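"I'm telling you, we can solve protein folding," Demis Hassabis said to his companion, David Silver. "That would be the really big achievement. I'm convinced we can do it now. Before, I only thought about it; now I'm sure it can be done." Hassabis is the cofounder and chief executive of DeepMind, the London-based A.I. company that built AlphaGo. Silver is the DeepMind computer scientist who led the AlphaGo team.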

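Four years later, DeepMind has done what Hassabis envisioned on that walk in Seoul. The company has built an A.I. system that can predict the complex shape of a protein from its genetic sequence, to within the width of a single atom. With that achievement, DeepMind has completed a scientific quest nearly 50 years in the making. In 1972, the chemist Christian Anfinsen proposed in his Nobel Prize lecture that DNA alone should fully determine a protein's final structure. It was a striking conjecture: At the time, not a single genome had been sequenced. Anfinsen's theory launched a branch of computational biology whose goal is to model protein structures with sophisticated mathematics rather than through experiment.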

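DeepMind's achievement in Go was significant, but it had little concrete impact outside the relatively rarefied worlds of Go and computer science. Solving protein folding is entirely different: It could be transformative for most people. Proteins are the basic building blocks of life and the machinery behind most biological processes. Being able to predict their structures could revolutionize our understanding of disease and lead to new, more targeted drugs for conditions ranging from cancer to Alzheimer's disease. New drugs could reach the market faster, cutting years off drug development and saving hundreds of millions of dollars in costs, and many lives might be saved.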

Demis Hassabis, cofounder and chief executive of DeepMind. An early obsession with chess and video game design later turned into an interest in building A.I. systems. Image: Courtesy of DeepMind

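The new approach pioneered by DeepMind has already borne fruit in the fight against SARS-CoV-2, the coronavirus. What follows is the story of how a company best known for games set out to unlock biology's biggest secret.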

Building blocks of unpredictable shape

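"Proteins are the main machines of the cell," says Ian Holmes, a professor of bioengineering at the University of California, Berkeley. A protein's structure and shape are critical to how it works: The small "pockets" in its molecular lattice are where chemical reactions take place. If a chemical can be found that binds to one of those pockets, it can serve as a drug that blocks or speeds up a biological process. Bioengineers can also create entirely new proteins, never seen in nature, with unique therapeutic properties. "If we could harness the power of proteins and rationally engineer them for a purpose, we could build remarkable self-assembling machines that do useful things," Holmes says.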

But to make sure a protein does what you want, you need to know its shape.

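Proteins consist of chains of amino acids, often likened to beads on a string. The order in which the beads are strung is encoded in DNA. But it is very hard to predict from those simple genetic instructions what complex physical shape the complete chain will take. The chain folds into a structure according to the electrochemical rules of attraction and repulsion between molecules. The resulting shapes often resemble abstract sculptures wound from rope and ribbon: pleated bands, Möbius-like strips, curled loops and spirals. In the 1960s, the physicist and molecular biologist Cyrus Levinthal observed that a single protein could take on an enormous number of possible shapes. Finding the correct structure by randomly trying combinations would take longer than the known age of the universe. And yet a protein folds in milliseconds. This observation became known as Levinthal's paradox.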
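The scale of Levinthal's paradox is easier to feel with rough arithmetic. The sketch below is only a back-of-the-envelope illustration: the chain length, the number of conformations per amino acid, and the sampling rate are commonly used textbook-style assumptions, not figures from this article.

```python
# Back-of-the-envelope Levinthal estimate. Every input here is an
# illustrative assumption, not a measured value.
RESIDUES = 100                  # assumed length of the amino acid chain
STATES_PER_RESIDUE = 3          # assumed conformations available to each residue
SAMPLES_PER_SECOND = 1e13       # assumed rate at which shapes could be tried
SECONDS_PER_YEAR = 3.156e7
AGE_OF_UNIVERSE_YEARS = 1.38e10


def brute_force_years(residues=RESIDUES, states=STATES_PER_RESIDUE,
                      rate=SAMPLES_PER_SECOND):
    """Years needed to try every conformation once at the given rate."""
    conformations = states ** residues      # grows exponentially with length
    return conformations / rate / SECONDS_PER_YEAR


years = brute_force_years()
print(f"Exhaustive search: about {years:.1e} years")
print(f"That is about {years / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Under these assumptions the exhaustive search comes out on the order of 10^27 years, while a real protein folds in milliseconds, which is the paradox in a nutshell.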

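Until now, the only way to know a protein's structure with near certainty has been a method called X-ray crystallography. As the name implies, it starts by turning a solution containing millions of proteins into a crystal, itself a tricky chemical process. X-rays are then fired at the crystal, and scientists work backward from the resulting diffraction pattern to build up a picture of the protein. And not just any X-rays will do: For many proteins, the X-rays must come from a circular, stadium-size synchrotron.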

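The process is expensive and slow. Researchers at the University of Toronto estimate that determining a single protein's structure by X-ray crystallography takes about 12 months and costs about $120,000. More than 200 million proteins are known, and roughly 30 million more are discovered each year, yet fewer than 200,000 have had their structures mapped by X-ray crystallography or other experimental methods. "Our ignorance is growing rapidly," says John Jumper, a computational physicist who is now a senior researcher at DeepMind and leads its protein-folding team.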

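For the past 50 years, ever since Christian Anfinsen's famous lecture, scientists have tried to speed up the analysis of protein structures using sophisticated mathematical models running on high-performance computers. "Basically, you try to build a model of the protein inside the computer and then manipulate it," says John Moult, a professor of cell biology and molecular genetics at the University of Maryland and a pioneer in using mathematical algorithms to predict protein structures from DNA sequences. The trouble is that the predicted folds were often wrong and did not match the structures scientists found through X-ray crystallography. In fact, about 10 years ago, few models could predict the shape of large proteins with better than one-third accuracy.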

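Protein-folding simulations consume enormous computing power. In 2000, researchers created a "citizen science" project called Folding@home, which lets people donate the idle processing capacity of their personal computers and game consoles to run folding simulations. Linked over the internet, the devices form one of the most powerful virtual supercomputers in the world. The hope was to help researchers get past Levinthal's paradox and pin down protein structures accurately through random experimentation and trial and error. The project is still running and has supplied data for more than 225 papers on proteins associated with a range of diseases.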

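Despite all that processing power, Folding@home remains trapped by Levinthal's paradox, because its algorithms try to find a protein's structure by searching through all the possible arrangements. The key to cracking protein folding is to skip that laborious search and instead discover the hidden patterns that link a protein's DNA sequence to its structure, giving the computer a shortcut that leads straight from genetics to an accurately drawn shape.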

Serious games

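Demis Hassabis's interest in protein folding began, as so many things do for him, with a game. Hassabis was a chess prodigy, a master by age 13 and at one point ranked second in the world for his age group. His love of chess later gave way to two other interests: game design and the inner workings of his own mind. He started working for a video game company while still in high school, and after studying computer science at the University of Cambridge, he founded the computer game startup Elixir Studios in 1998.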

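Although it produced two award-winning games, Elixir eventually sold its intellectual property and shut down, and Hassabis went on to earn a Ph.D. in cognitive neuroscience from University College London. By then he had already embarked on the long journey that led him to cofound DeepMind in 2010. He set out to build general-purpose A.I. software that could learn to perform many tasks, some of them better than humans. Hassabis has said DeepMind's grand goal is to "solve intelligence, and then solve everything else." He has also indicated that protein folding might be among the first of those "everything else" problems.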

In 2009, while Hassabis was doing postdoctoral research at the Massachusetts Institute of Technology, he heard about an online game called Foldit. Designed by researchers at the University of Washington, Foldit, like Folding@home, is a protein-folding "citizen science" project. But instead of pooling idle microchips, Foldit harnesses idle brains.

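Foldit is a puzzle-like game in which human players with no background in biology compete to fold proteins, earning points when they arrive at plausible shapes. Researchers then analyze the highest-scoring designs to see whether they help solve a protein's structure. The game has attracted tens of thousands of players, and in several documented cases they have produced structures more accurate than those of protein-folding algorithms. "From that angle, I found the game fascinating. I wondered whether we could use the addictiveness and fun of games not just to entertain people but to do something useful for science," Hassabis says.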

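Foldit captured Hassabis's imagination for another reason, too. Games are a natural setting for reinforcement learning, which makes them especially well suited to training A.I. Software learns from experience through trial and error and gets better at a task. In a game, software can experiment endlessly, playing over and over and improving step by step, building skill to superhuman levels without causing any harm in the real world. Games also come with a ready-made way to judge whether a particular move or sequence of moves works: points and wins. Those metrics provide a very clear yardstick for performance, which many real-world problems lack. In the real world, the most effective approach may be fuzzy, and the very notion of "winning" may not apply.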

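DeepMind was founded largely on the idea of combining reinforcement learning with a kind of A.I. called deep learning. Deep learning is based on neural networks, software loosely modeled on how the human brain works. There is no actual network of nerve cells; instead, a collection of virtual neurons is arranged in layers. An initial input layer takes in data, applies weights, and passes the result to a middle layer, which does the same in turn, until the signal reaches an output layer that sums the weighted values and produces a result. The network adjusts its weights until it produces the desired outcome, such as correctly identifying photos of cats or winning at chess. It is called "deep" learning not because its results are necessarily profound, although they can be, but because the network consists of many layers and so can be said to have depth.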
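The layered flow described above can be sketched in a few lines. This is a toy illustration only: the layer sizes, the hand-picked weights, and the sigmoid activation are assumptions made for the example, and the training step that adjusts the weights is left out.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs, then squashes it.

    Each row of `weights` holds the input weights of one neuron.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, network):
    """Pass the data through every layer in order, input to output."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations


# A hypothetical network: 2 inputs -> 3 middle neurons -> 1 output.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]

output = forward([1.0, 2.0], network)
print(output)
```

Training would consist of nudging the numbers in `network` until `output` matches the desired answers; "deep" networks simply stack many more layers than the two shown here.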

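DeepMind's first success was using "deep reinforcement learning" to create software that taught itself to play classic Atari games such as Pong, Breakout, and Space Invaders at superhuman levels. That achievement brought DeepMind to the attention of tech giants including Google, which reportedly paid £400 million (more than $600 million at the time) to acquire the company in 2014. DeepMind then took on Go, building the AlphaGo system that beat Lee Sedol in 2016. It went on to develop a more general version called AlphaZero, which can learn almost any two-player, turn-based game in which the players have perfect information (nothing is hidden, such as face-down cards or concealed positions). Last year, a DeepMind system also beat top human professional e-sports players at the highly complex real-time strategy game StarCraft II.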

March 15, 2016: Professional Go player Lee Sedol (left) shakes hands with Demis Hassabis after the final game of the Google DeepMind Challenge Match, in which Lee faced the computer program AlphaGo. Image: Jeon Heon-Kyun—Pool/Getty Images

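But Hassabis says he always saw the company's work on games as a way to perfect A.I. systems that could later be applied to real-world challenges, especially in science. "Games are the training ground, but training for what, exactly? Ultimately, it's for creating new knowledge," he says.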

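DeepMind is not a conventional business with products and customers; it is essentially a research lab pushing the frontier of A.I. Many of the methods it develops are published openly for anyone to use or build on. Still, some of its advances have proved useful to its sister company Google.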

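DeepMind has a team of engineers and scientists that helps Google build cutting-edge A.I. into its products. DeepMind's technology has found its way into everything from Google Maps to the company's digital assistant to the system that helps manage battery power on Android phones. Google pays DeepMind for this work, and parent company Alphabet continues to absorb DeepMind's remaining losses. Those losses are not small: In 2018, the company lost £470 million (about $510 million at the time), according to the most recent annual records available from Companies House, the U.K. business registry.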

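But DeepMind, which now employs more than 1,000 people, also has an entire unit devoted solely to scientific applications of A.I. It is headed by Pushmeet Kohli, a 39-year-old native of India who worked on A.I. research at Microsoft before joining DeepMind. He says DeepMind's aim is to solve "root node" problems, a data scientist's term for foundational problems whose solutions unlock many avenues of scientific inquiry. Protein folding is one of those root nodes, Kohli says.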

"The Olympics of protein folding"

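In 1994, when many scientists were just beginning to use sophisticated computer algorithms to predict how proteins fold, Moult, the University of Maryland biologist, decided to set up a competition to assess impartially which algorithms were best. He called it the Critical Assessment of protein Structure Prediction, or CASP, and it has been held every two years since.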

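Here is how it works: The Protein Structure Prediction Center, which is funded by the U.S. National Institutes of Health, runs CASP and persuades researchers working in X-ray crystallography and other empirical methods to provide protein structures that have not yet been published, asking them to hold off on releasing those structures until the competition is over. CASP then sends the proteins' DNA sequences to contestants, who use their algorithms to predict the structures. CASP judges how close each prediction comes to the actual structure found by the crystallographers and experimentalists, and then ranks the algorithms by their average score across all the proteins. "I call it the Olympics of protein folding," Hassabis says. Not long after AlphaGo beat Lee Sedol in 2016, DeepMind set out to win gold.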

DeepMind assembled a small, tight team of six machine-learning researchers and engineers. "Letting generalists take the first crack at a problem is our philosophy," Hassabis says. The company had no shortage of talent. "Ex-physicists, ex-biologists, we have them just hanging around," Hassabis says with a wry laugh. "They never know when their earlier expertise might suddenly come in handy." The team later grew larger.

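Still, DeepMind felt the team needed at least one true protein-folding expert, and it found one in John Jumper. Boyish and rail-thin at 35, with a mop of side-swept brown hair, Jumper looks a bit like the bass player in a late-1990s high school garage band. He earned a master's degree in theoretical condensed-matter physics at the University of Cambridge and then worked at D.E. Shaw Research, the independent research lab in New York founded by hedge fund billionaire David Shaw. The lab specializes in computational biology, including protein simulation. Jumper later earned a Ph.D. in computational biophysics at the University of Chicago under Karl Freed and Tobin Sosnick, two scientists known for advancing protein-folding models. "I had heard that DeepMind was interested in solving protein structure," he says. He applied and got the job.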

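The first instinct of Hassabis and the DeepMind team was that protein folding could be solved exactly the way Go had been: with deep reinforcement learning. That turned out to be problematic. For one thing, a folded protein has even more possible configurations than Go has moves. More important, AlphaGo had been able to master Go by playing against itself. "So it wasn't really comparable, because protein folding isn't a two-player game," Hassabis says. "You're sort of up against nature."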

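Computational physicist John Jumper now leads DeepMind's protein-folding team. The challenge, Jumper says, was not just to come out ahead in the competition: "We wanted to build a system that matters to biologists." Image: Courtesy of DeepMind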

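DeepMind soon found it could make progress more easily with a kind of A.I. training called supervised deep learning. This is the type of A.I. used in most commercial applications: Given a set of established inputs and their corresponding outputs, a neural network learns to map a given input to a given output. For protein structures, DeepMind had about 170,000 known structures to use as training data. They are publicly available from the Protein Data Bank (PDB), a public repository of known three-dimensional protein shapes and their genetic sequences.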

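Some biologists had already used supervised deep learning to predict how proteins fold. But the best of those A.I. systems were right only about 50% of the time, which is of little help to biologists or medical researchers, especially for proteins of unknown structure, because there is no way to tell whether any particular prediction is correct.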

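One promising technique builds on the idea of grouping proteins into families according to their evolutionary history. Within a family, it is often possible to find pairs of amino acids that lie far apart in the DNA sequence but seem to mutate at the same time. This phenomenon, called "coevolution," is useful because coevolving amino acids are likely to be in contact in the folded protein. Jinbo Xu, a scientist at the Toyota Technological Institute at Chicago, pioneered the use of deep learning on coevolution data to predict amino acid contacts. The approach is a bit like finding the dots in a connect-the-dots puzzle. Scientists still have to use other software to draw the lines between the dots, a step that often goes wrong. Sometimes even the dots are off.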

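For the 2018 CASP competition, DeepMind adopted the basic idea of using coevolution to predict contacts but added two important twists. First, rather than trying to determine whether two amino acids were in contact, a binary output (a pair is either in contact or it isn't), the team had its algorithm predict the distance between every pair of amino acids in the protein.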
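The difference between a binary contact and a distance is easy to see on toy data. The sketch below is only an illustration: the four coordinates are made up, and the 8-angstrom contact cutoff is a common convention assumed here, not a figure given in this article or a description of DeepMind's code.

```python
import math

CONTACT_CUTOFF = 8.0  # angstroms; an assumed, conventional threshold


def distance_matrix(coords):
    """Distance between every pair of residues, in angstroms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(distances, cutoff=CONTACT_CUTOFF):
    """Binary version: True if a pair is closer than the cutoff."""
    return [[d <= cutoff for d in row] for row in distances]


# Made-up positions for four amino acids (x, y, z in angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 9.0, 0.0), (20.0, 9.0, 0.0)]

distances = distance_matrix(coords)
contacts = contact_map(distances)

# Pairs (1, 2) and (2, 3) are both "not in contact", yet one pair sits just
# past the cutoff and the other is far away. Only the distances show that.
print(contacts[1][2], contacts[2][3])
print(round(distances[1][2], 1), round(distances[2][3], 1))
```

A network trained on the binary map sees the two pairs as identical, while a network trained on distances gets a graded signal, which is the richer information the next paragraph refers to.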

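To most molecular biologists, that approach seems counterintuitive, although, to his credit, Xu independently proposed something similar. After all, it is the contacts that matter most. But to DeepMind's deep-learning experts, it was obvious that distance is a better metric for a neural network to work with, Kohli says. "This is just a fundamental part of deep learning: If there is uncertainty associated with a decision, it is better to let the neural network incorporate that uncertainty and decide what to do about it," he says. Unlike contacts, distances carry rich information that a neural network can adjust to and use.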

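DeepMind's other surprise was a second neural network that predicted the angles between pairs of amino acids. With distances and angles in hand, DeepMind's algorithm could work out a rough outline of the protein's structure. The system then used a separate, non-A.I. algorithm to refine it. DeepMind assembled these components into a system called AlphaFold, which swept the 2018 CASP (also known as CASP13, because it was the 13th edition of the biennial competition). Of the 43 hardest proteins in the contest, AlphaFold scored highest on 25. The runner-up scored highest on only three. The results stunned the field. If anyone had still doubted whether deep learning was the most promising route to solving protein folding, AlphaFold put those doubts to rest.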

Back to the whiteboard

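Even so, DeepMind was still far from Hassabis's goal of solving protein folding outright. AlphaFold was accurate only about half the time, and of the 104 proteins in CASP13, it achieved accuracy on par with X-ray crystallography for only three. "We didn't just want to win CASP; we wanted to actually solve the problem. We wanted to build a system that matters to biologists," Jumper says.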

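Soon after the 2018 CASP results were announced, DeepMind redoubled its efforts. Jumper was put in charge of an expanded team. Rather than simply refining AlphaFold, the team went back to square one and brainstormed completely different ideas that it hoped could bring the software's accuracy closer to that of X-ray crystallography.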

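What followed, Jumper says, was one of the scariest and most frustrating periods of the entire project, because nothing worked. "We spent three months and couldn't even match our CASP13 results, and we started to really panic," he says. But then some of the things the researchers were trying began to yield improvements, and within six months the system was noticeably better than the original AlphaFold. That pattern persisted over the next two years, Jumper says: three months of nothing, followed by three months of rapid progress, and then another plateau.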

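Hassabis says a similar pattern had shown up in earlier DeepMind projects, including the Go effort and the work on the complex real-time strategy game StarCraft II. The company's management strategy for getting past it, he says, is to alternate between two different ways of working. The first, which Hassabis calls "strike mode," means pushing the team as hard as possible to wring every last bit of performance out of the current system. Then, when the gains from that all-out effort appear to be exhausted, he shifts into what he calls "creative mode." During this phase, Hassabis stops pressing the team, and tolerates and even expects temporary setbacks, to give researchers and engineers room to tinker with new ideas and try new approaches. "You want to encourage as many crazy ideas as possible, to brainstorm," he says. This mode usually produces a new leap in performance, allowing the team to switch back into strike mode.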

A birthday present

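November 21, 2019, was the 30th birthday of Kathryn Tunyasuvunakool, a researcher on DeepMind's protein-folding team. The day would turn out to be memorable for another reason, too. Tunyasuvunakool, who holds a Ph.D. in computational biology from the University of Oxford, was responsible for building new test sets for AlphaFold 2, the protein-folding A.I. that DeepMind was developing for the 2020 CASP competition. When she switched on her work computer that morning, she found an evaluation of the system's predictions for a batch of about 50 protein sequences, all of them only recently added to the Protein Data Bank. She did a double take, and then she was stunned. AlphaFold 2 had been improving steadily, but its predictions for this batch were astonishingly accurate. For several of the proteins, the predicted structure was within 1.5 angstroms of the real one. An angstrom is a unit of distance equal to one-tenth of a nanometer, or roughly the width of an atom.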

DeepMind scientist Kathryn Tunyasuvunakool helped the company make progress in its protein-folding research. Image: Courtesy of DeepMind

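Tunyasuvunakool, the self-described "team pessimist," says her first reaction was not joy but mild nausea. "I was scared," she says. The results were so good that she assumed she had made a mistake: that in preparing the test set she had inadvertently included several proteins the A.I. had already seen in its training data. That would have let AlphaFold 2 essentially cheat and easily predict the correct structures. She recalls sitting in DeepMind's cafeteria, which overlooks London's St. Pancras Station, drinking cup after cup of tea to calm down. Then she and other team members spent the rest of the day, late into the night, and the next several days at their workstations combing through AlphaFold 2's training data in search of the error.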

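There was no error. The new system really had made a giant leap in predictive performance. AlphaFold 2 is entirely different from its predecessor. It is no longer an assembly of separate components, one predicting the distances between amino acids, another the angles, with a third piece of software tying them together. Instead, a single neural network reasons directly from the DNA sequence. The system still takes in evolutionary information, to determine whether the protein under study shares a common ancestor with proteins it has seen before, and it carefully examines the alignment between the target protein's DNA sequence and other known sequences. But it no longer needs explicit data about which pairs of amino acids coevolve. "We didn't give it more information; we gave it less," Jumper says. The system is free to draw its own insights about when ancestry is likely to determine part of a protein's shape and when the shape may diverge entirely. In other words, the system develops intuition from experience, much as a seasoned human scientist does.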

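At the heart of the new system is a mechanism called "attention." As the name suggests, attention lets a deep-learning system focus on particular inputs and weight them more heavily. In a system for recognizing cats, for example, the network might learn to pay attention to the shape of the ears and to look for whiskers near the nose. Jumper compares what AlphaFold 2 does to assembling a jigsaw puzzle, in which "you can put certain pieces together and be very sure about them, ending up with different local solutions, and then figure out how to join those up." The middle layers of the neural network, Jumper says, have learned to reason about geometry and spatial arrangement, and about how pairs of amino acids connect, from their analysis of the DNA sequence.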
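For readers who want to see what "weighting some inputs more heavily" means in practice, here is a minimal sketch of generic scaled dot-product attention, the textbook form of the idea. It is not AlphaFold 2's actual architecture, which the article does not detail; the query, keys, and values are made-up numbers.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Generic scaled dot-product attention for a single query.

    Inputs whose key resembles the query receive larger weights, and the
    output is the weighted average of the corresponding values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Three hypothetical inputs; the query "looks for" the first feature.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([2.0, 0.0], keys, values)
print(weights)
print(output)
```

The first and third inputs share the feature the query asks about, so they receive equal, larger weights than the second; the weights themselves are learned in a real network rather than fixed by hand.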

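DeepMind trained AlphaFold 2 on 128 "tensor processing cores," number-crunching brains housed on 16 computer chips that Google designs specifically for deep learning and uses in its data centers. The company says training ran continuously for several weeks. (The 128 specialized A.I. cores are roughly equivalent to 100 to 200 of the powerful graphics processing chips that render dazzling animation on an Xbox or PlayStation.) Once trained, the company says, the system can take a DNA sequence and produce a complete structure prediction in "a matter of days."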

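AlphaFold 2 has another advantage over its predecessor: It reports how much to trust it, attaching a confidence score to its prediction for each amino acid in the structure. That metric is crucial if AlphaFold 2 is to be of real use to biologists and medical researchers, who need to know when they can reasonably rely on the model and when they should be more cautious.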

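Despite the impressive test results, DeepMind still couldn't be sure how good AlphaFold 2's predictions were. It got an important clue when the coronavirus struck. In March of this year, AlphaFold 2 predicted the structures of six little-studied proteins associated with SARS-CoV-2, the virus behind the pandemic, and scientists later confirmed one of them using an empirical method called cryo-electron microscopy. It was a clear demonstration of the real-world impact AlphaFold 2 could have.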

Astounding results

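The CASP competition ran from May through August. The Protein Structure Prediction Center released batches of target proteins, and contestants then submitted their structure predictions for assessment. This year's rankings were announced on Nov. 30.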

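Each prediction is scored on a metric called the "global distance test total score," or GDT for short, which in effect measures how close, in angstroms, the prediction comes to the structure obtained by empirical methods such as X-ray crystallography or electron microscopy. A perfect score is 100, and a score of 90 or above is considered on par with empirical methods, says Moult, the CASP chair. Proteins are also sorted into groups according to how difficult the CASP organizers judge their structures to be.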
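A simplified version of the scoring idea can be written down directly. The sketch below uses the usual GDT_TS definition, the average over cutoffs of 1, 2, 4, and 8 angstroms of the percentage of amino acids that land within that distance of their experimental position. It assumes the two structures are already superimposed; the real assessment also searches for the best superposition, which is omitted here. The coordinates are made up.

```python
import math

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # angstroms


def gdt_ts(predicted, experimental, cutoffs=GDT_TS_CUTOFFS):
    """Simplified GDT_TS for two structures that are already superimposed.

    For each cutoff, count the fraction of residues whose predicted position
    is within that distance of the experimental one, then average the
    fractions and scale to 0-100.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up positions for four residues: deviations of 0.5, 1.5, 3.0 and 10.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 10.0, 0.0)]

score = gdt_ts(predicted, experimental)
print(score)  # 56.25
```

In this toy case one residue is badly misplaced and drags the score down to 56.25; a prediction identical to the experimental structure scores 100.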

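When Moult saw AlphaFold 2's results, he could hardly believe them. Like Tunyasuvunakool several months earlier, his first thought was that something had gone wrong. Had some of the protein sequences in the contest been published before? Or had DeepMind somehow gotten hold of a cache of unpublished data?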

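A computer-generated image of T1042, part of a protein from a virus that infects bacteria. In the 2020 CASP competition, DeepMind's AlphaFold 2 accurately predicted the protein's structure, a major breakthrough for the application of A.I. in biology and medical research. Image: Courtesy of DeepMind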

A computer-generated image of T1037, part of a protein from a virus that infects bacteria. In the 2020 CASP competition, DeepMind's AlphaFold 2 successfully predicted the structure of T1037. Image: Courtesy of DeepMind

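To check, he asked Andrei Lupas, director of the department of protein evolution at the Max Planck Institute for Developmental Biology in Tübingen, Germany, to help verify the results. Lupas had AlphaFold 2 predict a structure he was certain it had never seen, because Lupas himself had never managed to resolve a key part of the protein using X-ray crystallography. For nearly a decade, Lupas had puzzled over that missing piece without being able to pin down its exact shape. With AlphaFold's prediction in hand, Lupas says, he went back to his X-ray data. "In less than half an hour, we had the correct result," he says. "It was astonishing!"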

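Since DeepMind's success at CASP in 2018, many academic researchers have flocked to deep learning, and performance across the rest of the field has improved as a result. On targets of medium difficulty, the other competitors' best predictions averaged a GDT score of 75, up 10 points from two years earlier. But that was nowhere near AlphaFold 2, whose predictions averaged a GDT of 92 and still averaged 87 on the hardest proteins. Moult says AlphaFold 2's predictions are "on par with empirical methods" such as X-ray crystallography. Having reached that conclusion, CASP made a momentous announcement on Monday, Nov. 30: The 50-year-old protein-folding problem had been solved.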

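Venki Ramakrishnan, a Nobel laureate and president of the Royal Society, Britain's most prestigious scientific body, says AlphaFold 2 represents "a stunning advance" on protein folding. With AlphaFold 2's help, expensive and time-consuming empirical methods such as X-ray crystallography and electron microscopy may become things of the past.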

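Janet Thornton, a protein-structure expert and former director of the European Molecular Biology Laboratory's European Bioinformatics Institute, says DeepMind's breakthrough could help scientists map the entire human "proteome," the full set of proteins in the human body. Only about a quarter of human proteins are currently used as drug targets, so knowing the structures of the rest would create enormous opportunities for new therapies. She adds that the A.I. software could also advance protein engineering in the service of sustainability, helping scientists create new crop varieties that yield more nutritional value per acre, and perhaps enzymes that can digest plastic.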

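For now, though, the question is how DeepMind will deploy AlphaFold 2. Hassabis says the company will work to ensure the software "has the maximum positive societal impact," but he concedes that it has not yet decided how, saying only that an announcement will come sometime next year. Hassabis also tells Fortune that DeepMind is considering how it might build commercial products or partnerships around the system. "This is hugely useful for drug discovery, and for big pharma," he says. But he says the form any commercial offering would take has not been decided either.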

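For DeepMind, a commercial venture would be new territory; since its sale to Alphabet, the company has never had to worry about revenue. It did briefly have a division called DeepMind Health, which worked with the U.K.'s National Health Service on an app to identify hospital patients at risk of acute kidney injury. But the partnership became mired in controversy after news reports revealed that DeepMind's hospital partner had violated U.K. data protection law in giving it access to millions of patients' medical records. In 2019, DeepMind Health was formally folded into the new Google Health division. DeepMind said at the time that spinning off the health business would let it concentrate on its research roots rather than being distracted by building a commercial unit with capabilities, such as data security and customer support, at which Google already excels.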

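Of course, if DeepMind does launch commercial products, it will not be the first A.I. research company to do so. San Francisco-based OpenAI, perhaps DeepMind's closest rival, has become increasingly commercial. OpenAI recently released its first commercial product, an interface that lets businesses use an A.I. that can turn short human-written prompts into long, coherent passages of text. That A.I., known as GPT, has yet to prove its commercial value, whereas DeepMind's AlphaFold 2 could have a fundamental impact on pharmaceutical companies and biotech startups. With antitrust regulators investigating Alphabet, having a commercially viable product could be a good insurance policy in case a future breakup of the Googleplex leaves DeepMind without the unconditional backing of a deep-pocketed parent.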

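One thing is certain: DeepMind is not finished with protein folding. The CASP competition revolves solely around predicting the structure of single proteins. In biology and medicine, what researchers usually care about is how proteins interact. How does one protein bind to another, or to a particular small molecule? How does an enzyme break down a protein? Predicting such interactions and binding is likely to become the main focus of future CASP competitions, Moult says. And Jumper says DeepMind plans to take on those challenges next.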

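Beyond protein folding, AlphaFold 2's success is sure to have an impact as well, encouraging others to apply deep learning to big scientific questions: discovering new subatomic particles, probing the mysteries of dark matter, mastering nuclear fusion, or creating room-temperature superconductors. Kohli says DeepMind is already making a contribution in astrophysics. A.I. researchers at Facebook have just launched a deep-learning project to search for new chemical catalysts. Protein folding is the first fundamental mystery of science to be solved by A.I., but it certainly won't be the last. (Fortune China)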

Translator: 冯丰

Reviewer: 夏林

2016年3月13日深夜九州天下登陆,气温相当寒冷九州天下登陆,两名男子头戴羊毛帽九州天下登陆九州天下登陆,身穿厚厚的外套九州天下登陆,并肩走过韩国首尔市中心拥挤的街道九州天下登陆。二人热烈地交谈九州天下登陆,似乎完全忽视了周围饺子馆和烧烤店霓虹灯的诱惑。他们此行韩国肩负重任九州天下登陆,多年的努力终于能够看到结果九州天下登陆九州天下登陆。最棒的是九州天下登陆,他们刚刚成功了九州天下登陆。

这次散步是为了庆祝九州天下登陆。他们取得的成就将进一步巩固他们在计算机史上的地位九州天下登陆九州天下登陆。在古老的战略游戏围棋领域里九州天下登陆,他们开发的人工智能软件已经充分掌握了个中奥秘九州天下登陆九州天下登陆九州天下登陆,而且轻松击败了全球顶尖选手李世石九州天下登陆。如今九州天下登陆,两人开始讨论下一个目标九州天下登陆九州天下登陆,身后跟踪的纪录片摄制组捕捉到了当时的谈话。

“告诉你九州天下登陆,我们可以解决蛋白质折叠问题九州天下登陆?九州天下登陆九州天下登陆!钡旅姿?哈萨比斯对同伴大卫?西尔弗说九州天下登陆【胖萏煜碌锹?九州天下登陆!澳遣攀谴蟪删途胖萏煜碌锹健N蚁嘈畔衷谀芄蝗プ隽?九州天下登陆。以前我只是想过九州天下登陆九州天下登陆九州天下登陆,现在肯定可以做成九州天下登陆?九州天下登陆!惫人故亲懿课挥诼锥氐娜斯ぶ悄芄綝eepMind的联合创始人及首席执行官,正是该公司开发出了AlphaGo(阿尔法狗)九州天下登陆九州天下登陆。西尔弗则是DeepMind的计算机科学家九州天下登陆,负责领导AlphaGo团队九州天下登陆。

四年后,DeepMind实现了当年哈萨比斯在首尔散步时的设想九州天下登陆。公司开发出了人工智能系统,能够根据基因序列来预测蛋白质的复杂形状,精确到单个原子宽度??孔耪庀畛删?,DeepMind完成了需要近50年才能完成的科学探索九州天下登陆九州天下登陆九州天下登陆。1972年九州天下登陆,化学家克里斯蒂安?安芬森在诺贝尔奖获奖演说中提出九州天下登陆九州天下登陆,只有DNA才可以完全决定蛋白质的最终结构。这是惊人的猜想九州天下登陆九州天下登陆。当时连一个基因组都未完成测序九州天下登陆。安芬森的理论开创了计算生物学的分支,目标是用复杂的数学模拟蛋白质结构九州天下登陆,而不是实验。

DeepMind在围棋方面取得的成就确实很重要,但在围棋和计算机科学这两个相对偏僻的领域之外九州天下登陆九州天下登陆九州天下登陆,几乎没有产生什么具体影响九州天下登陆九州天下登陆。解决蛋白质折叠问题则完全不同九州天下登陆九州天下登陆,对大多数人来说都有变革意义九州天下登陆九州天下登陆。蛋白质是生命的基本组成部分,也是大多数生物过程背后的运行机制九州天下登陆。如果能够预测蛋白质的结构九州天下登陆,将彻底改变人们对疾病的理解九州天下登陆九州天下登陆,还可以为癌症到老年痴呆症等各种疾病开发全新也更具针对性的药物。新药上市时间有望加快九州天下登陆九州天下登陆,药物研发成本减少数年时间,成本也节约数亿美元九州天下登陆九州天下登陆,还可能会拯救很多生命。

DeepMind首创的新方法在抗击SARS-CoV-2(也就是新冠病毒)的斗争中已经取得成果九州天下登陆九州天下登陆。以下是以游戏知名的公司如何揭开生物学最大秘密的故事九州天下登陆九州天下登陆九州天下登陆。

形状莫测的积木

“蛋白质是细胞的主要机器九州天下登陆?!奔又荽笱Р死中5纳锕こ探淌谝炼?霍姆斯表示九州天下登陆。蛋白质的结构和形状对其工作方式至关重要九州天下登陆,构成蛋白质分子晶格的小“口袋”是发生各种化学反应的地方九州天下登陆。如果能够找到某种化学物质与其中一个口袋结合,这种物质就可以作为药物阻止或加速生物过程九州天下登陆。生物工程师还能够创造出自然界中从未出现的全新蛋白质九州天下登陆,而且具有独特的疗效【胖萏煜碌锹?!叭绻颐强梢岳玫鞍字实牧α?九州天下登陆,合理地设计用途,就能够制造出神奇的自我组装机器九州天下登陆,发挥一些作用【胖萏煜碌锹剑”霍姆斯说九州天下登陆九州天下登陆。

但为了确保蛋白质达到想要的效果九州天下登陆,把握其形状很重要。

蛋白质由氨基酸链组成九州天下登陆,常被比作细绳上的珠子九州天下登陆。至于珠子按照什么顺序穿起来,信息都存储在DNA里九州天下登陆。但是九州天下登陆九州天下登陆,根据简单的基因指令很难预测完整的链条会形成多复杂的物理形状。氨基酸链根据分子间吸引和排斥的电化学规则折叠成某种结构九州天下登陆。形状常常类似绳索和丝带缠绕而成的抽象雕塑:褶皱的带状物加上莫比乌斯带九州天下登陆九州天下登陆,就像卷曲环状的螺旋。20世纪60年代九州天下登陆九州天下登陆,物理学家和分子生物学家塞勒斯?列文塔尔发现九州天下登陆九州天下登陆,一种蛋白质的形状有太多可能性九州天下登陆。如果想通过随机尝试组合找出蛋白质的准确结构,花的时间比已知宇宙的年龄还长九州天下登陆。而且九州天下登陆,几毫秒内蛋白质就会完成折叠九州天下登陆九州天下登陆九州天下登陆。该观察被称为列文塔尔悖论九州天下登陆。

到目前为止九州天下登陆九州天下登陆,只有通过所谓X射线晶体衍射才可以接近准确了解蛋白质的结构。顾名思义九州天下登陆,首先需要将含有数百万蛋白质的溶液转化为晶体,本身就是很复杂的化学过程。然后九州天下登陆九州天下登陆,X射线发射到晶体上,科学家从获得的衍射图逆向工作九州天下登陆,从而建立蛋白质图像九州天下登陆九州天下登陆。而且九州天下登陆九州天下登陆九州天下登陆,还不是随便什么X射线都可以九州天下登陆。要想获得很多蛋白质的结构,要由圆形的,大小堪比体育场的同步加速器发射X射线。

过程既昂贵又耗时九州天下登陆。根据多伦多大学(University of Toronto)的研究人员估计九州天下登陆,用X射线晶体衍射法测定单个蛋白质的结构需要约12个月九州天下登陆,花费约12万美元九州天下登陆九州天下登陆。已知的蛋白质超过2亿种九州天下登陆九州天下登陆,每年大约能够发现3000万种九州天下登陆,但其中只有不到20万种蛋白质通过X射线晶体衍射或其他实验方法绘制出了结构图九州天下登陆?九州天下登陆!叭死嗟奈拗潭日谘杆僭龀ぞ胖萏煜碌锹?九州天下登陆?九州天下登陆九州天下登陆!奔扑阄锢硌Ъ以己?乔普说九州天下登陆,现在他担任DeepMind的高级研究员九州天下登陆,负责领导蛋白质折叠团队九州天下登陆。

过去50年里九州天下登陆,自从克里斯蒂安?安芬森发表著名演讲以来,科学家们一直努力使用高性能计算机上运行的复杂数学模型加速分析蛋白质结构?九州天下登陆九州天下登陆!盎旧暇褪浅⑹栽诩扑慊锎唇ǖ鞍字实氖炙?,然后尝试操作?!甭砝锢即笱У南赴镅Ш头肿右糯Ы淌谠己?穆尔特说,他也是用数学算法通过DNA序列预测蛋白质结构的先驱九州天下登陆。问题是,预测出的折叠模式经常有误,与科学家通过X射线晶体衍射发现的结构并不一致九州天下登陆。事实上大约10年前,很少有模型预测大蛋白质形状时准确率可以超过三分之一。

蛋白质折叠模拟要占用庞大的算力九州天下登陆。2000年九州天下登陆九州天下登陆,研究人员创建了名叫Fold@home的“公民科学”项目九州天下登陆九州天下登陆,人们能够捐出个人电脑和游戏机的闲置处理能力运行蛋白质折叠模拟。所有设备通过互联网连接在一起九州天下登陆九州天下登陆,从而打造全世界最强大的虚拟超级计算机之一九州天下登陆。大家都希望帮研究人员摆脱列文塔尔悖论九州天下登陆,通过随机实验和试错准确判断蛋白质的结构九州天下登陆。目前该项目仍然在进行中,已经为超过225篇论文提供了数据九州天下登陆九州天下登陆,研究内容是与多种疾病相关的蛋白质九州天下登陆。

尽管拥有强大的处理能力九州天下登陆,Fold@home仍然深陷列文塔尔悖论,因为算法试图搜索所有可能的排列,从而找到蛋白质结构九州天下登陆。破解蛋白质折叠的关键在于跳过艰苦搜索的过程九州天下登陆,发现蛋白质DNA序列与结构联系的神秘模式,从而让计算机踏上全新捷径九州天下登陆九州天下登陆九州天下登陆,直接从遗传学领域转到准确绘制形状九州天下登陆。

严肃的游戏

德米斯?哈萨比斯对蛋白质折叠的兴趣始于一场游戏九州天下登陆九州天下登陆九州天下登陆九州天下登陆,他对很多事都是这样九州天下登陆。哈萨比斯曾经是国际象棋天才九州天下登陆,13岁时已经成为大师,一度在同年龄里排名世界第二九州天下登陆。他对象棋的热爱后来转向对两件事感兴趣:一是游戏设计九州天下登陆,二是研究自身意识的内在机制九州天下登陆。他高中时开始为电子游戏公司工作九州天下登陆,在剑桥大学(University of Cambridge)学习计算机科学后,1998年创立了电脑游戏初创公司Elixir Studios九州天下登陆九州天下登陆。

尽管曾经研发出两款获奖游戏九州天下登陆,最终Elixir还是卖掉知识产权并关闭公司,哈萨比斯从伦敦大学学院(University College London)获得了认知神经科学博士学位九州天下登陆。彼时他已经开始踏上漫漫征途九州天下登陆九州天下登陆九州天下登陆,后来2010年联合创立了DeepMind九州天下登陆。他开始研发通用人工智能软件九州天下登陆,不仅可以学习执行很多任务,有些甚至比人类完成得更好九州天下登陆。哈萨比斯曾经说过,DeepMind的远大目标是“解决智能问题九州天下登陆九州天下登陆,然后解决所有其他问题【胖萏煜碌锹?九州天下登陆九州天下登陆!惫人挂苍凳揪胖萏煜碌锹?,蛋白质折叠可能就是“其他问题”里的第一批九州天下登陆九州天下登陆。

2009年九州天下登陆九州天下登陆,哈萨比斯在麻省理工学院(Massachusetts Institute of Technology)攻读博士后时九州天下登陆,听说了一款名为Foldit的在线游戏九州天下登陆九州天下登陆。Foldit是由华盛顿大学(University of Washington)的研究人员设计九州天下登陆九州天下登陆,跟Fold@home类似九州天下登陆,也是有关蛋白质折叠的“公民科学”项目九州天下登陆。但Foldit并不是整合闲置的微芯片九州天下登陆,而是利用闲置的大脑。

Foldit是类似益智游戏的游戏九州天下登陆,并不掌握生物学领域知识的人类玩家比赛折叠蛋白质,如果能够得到合理的形状就可以获得积分九州天下登陆九州天下登陆。然后,研究人员分析得分最高的设计,看是否有助于破解蛋白质结构问题。游戏已经吸引成千上万玩家九州天下登陆九州天下登陆,并且一些记录案例中得到的蛋白质结构比研究蛋白质折叠的计算机算法更准确?!按诱飧鼋嵌壤纯?九州天下登陆,我觉得游戏很有趣,想着能不能利用游戏的上瘾性和游戏的乐趣,不仅让人们玩得开心九州天下登陆九州天下登陆九州天下登陆,也做一些对科学有用的事情九州天下登陆【胖萏煜碌锹剑”哈萨比斯说九州天下登陆。

Foldit能够抓住哈萨比斯的想象力还有另一个原因九州天下登陆。其实游戏是一种强化学习行为九州天下登陆九州天下登陆九州天下登陆,特别适合训练人工智能。软件可以通过试验和试错从经验中学习,从而更好地完成任务九州天下登陆九州天下登陆。在游戏里软件能够无休止地试验九州天下登陆,反复地玩,逐步改进,不对现实世界造成伤害的情况下提升技能水平九州天下登陆,直到超过人类九州天下登陆九州天下登陆。游戏也有现成的方法判断某个特定的动作或某组动作是否有效九州天下登陆,即积分和胜利。种种指标可以提供非常明确的标准衡量表现,在现实世界很多问题里则无法如此处理。现实世界遇到问题时九州天下登陆九州天下登陆九州天下登陆,最有效的方法可能比较模糊九州天下登陆,“获胜”的概念也可能不适用。

DeepMind的基础主要是将强化学习与称为深度学习的人工智能相结合九州天下登陆。深度学习是基于神经网络的人工智能九州天下登陆,所谓神经网络是大致基于人脑工作原理的软件九州天下登陆。这种情况下九州天下登陆,软件没有实际的神经细胞网络,而是一堆虚拟神经元分层排列九州天下登陆,初始输入层接收数据,按照权重分配后传递到中间层,中间层依次执行相同操作九州天下登陆,最终传递到输出层,输出层汇总各项加权值并算出结果九州天下登陆。网络能够调整各项权重,直到产生理想的结果九州天下登陆九州天下登陆,例如准确识别猫的照片或国际象棋获胜九州天下登陆九州天下登陆。之所以被称为“深度学习”九州天下登陆九州天下登陆,并不是因为产生的结果一定深刻九州天下登陆,当然也有可能深刻九州天下登陆,但主要原因是网络由许多层构成九州天下登陆,所以可以说具有深度九州天下登陆九州天下登陆。

DeepMind最初成功是用“深度强化学习”创建软件,自学玩经典的雅达利电脑游戏九州天下登陆九州天下登陆,如《乒乓球》(Pong)九州天下登陆、《突围》(Breakout)和《太空入侵者》(Space Invaders)等九州天下登陆九州天下登陆九州天下登陆,而且水平超过人类九州天下登陆。正是这一成就让DeepMind受到谷歌(Google)等科技巨头的关注九州天下登陆,据报道九州天下登陆,2014年谷歌以4亿英镑(当时超过6亿美元)收购了DeepMind九州天下登陆九州天下登陆九州天下登陆。之后公司主攻围棋并开发了AlphaGo系统九州天下登陆九州天下登陆,2016年击败了李世石九州天下登陆。DeepMind接着开发了名叫AlphaZero的更通用系统版本,几乎能够学会所有两玩家回合制游戏九州天下登陆,在这种游戏中,玩家都可以获得充分信息(没有机会隐藏信息,例如牌面朝下放置或隐藏位置)九州天下登陆。去年九州天下登陆,公司开发的系统还在高度复杂的即时战略游戏《星际争霸2》(Starcraft 2)中击败了顶尖的人类职业电竞玩家。

但哈萨比斯表示九州天下登陆,一直认为公司在游戏方面的探索是完善人工智能系统的方式九州天下登陆,之后能够应用于现实世界挑战,尤其是科学领域九州天下登陆?九州天下登陆!氨热皇茄盗烦?,但训练到底为了什么九州天下登陆?最终是为了创造新知识九州天下登陆【胖萏煜碌锹?!彼?九州天下登陆。

DeepMind并非具有产品和客户的传统业务,本质上是推动人工智能前沿的研究实验室九州天下登陆九州天下登陆。公司的很多开发方法都已经公开九州天下登陆九州天下登陆,供所有人使用或借鉴九州天下登陆九州天下登陆。不过某些方面的进步对姊妹公司谷歌也颇有帮助。

DeepMind团队由工程师和科学家组成九州天下登陆,帮助谷歌将尖端的人工智能技术融入产品九州天下登陆。DeepMind的技术已经渗透各处九州天下登陆,从谷歌地图(Google Maps)到数字助理九州天下登陆,再到协助管理安卓手机电池电量的系统九州天下登陆。谷歌为此向DeepMind支付费用,母公司Alphabet继续承担DeepMind带来的额外亏损九州天下登陆?九州天下登陆?魉鸸婺2⒉恍【胖萏煜碌锹?,2018年九州天下登陆,公司亏损4.7亿英镑(当时约合5.1亿美元),这也是通过英国的商业注册机构公司登记局(Companies House)可以查到的最新一年公开记录。

不过如今员工超过1000人的DeepMind,还有一整个部门只负责人工智能的科学应用九州天下登陆。该部门的负责人为39岁的印度人普什米?科里九州天下登陆,他加入DeepMind之前曾经在微软从事人工智能研究九州天下登陆。他表示九州天下登陆,DeepMind的目标是解决“根节点”问题,这是数据科学家的惯用语九州天下登陆,意思是希望解决能够解锁很多科学路径的基础问题九州天下登陆。蛋白质折叠就是根节点之一九州天下登陆九州天下登陆九州天下登陆,科里说九州天下登陆九州天下登陆。

“蛋白质折叠的奥运会”

1994年九州天下登陆,当很多科学家刚开始使用复杂的计算机算法预测蛋白质折叠方式时,马里兰大学的生物学家墨尔特决定开办竞赛九州天下登陆九州天下登陆,用公正的方法评估哪种算法最好。他把比赛称为蛋白质结构预测关键评估(简称为CASP)九州天下登陆九州天下登陆,之后每两年举办一次九州天下登陆九州天下登陆。

赛事具体如下九州天下登陆,美国国立卫生研究院资助的蛋白质结构预测中心主办CASP,并说服从事X射线晶体衍射和其他实证研究的研究人员提供尚未公布的蛋白质结构九州天下登陆,要求在CASP竞赛结束之前不公开相关结构。然后CASP将蛋白质DNA序列发给参赛者,参赛者用算法预测蛋白质结构九州天下登陆九州天下登陆。CASP判断预测与X射线晶体学家和实验学家发现的实际结构接近程度九州天下登陆,然后根据算法对各种蛋白质预测的平均得分排名九州天下登陆九州天下登陆【胖萏煜碌锹骄胖萏煜碌锹?!拔页浦鞍字收鄣绲陌略嘶?九州天下登陆九州天下登陆?九州天下登陆!惫人顾?九州天下登陆九州天下登陆。2016年AlphaGo击败李世石后不久九州天下登陆九州天下登陆,DeepMind就打算赢得金牌。

DeepMind组建了小规模精干的团队,由六名机器学习研究人员和工程师组成九州天下登陆?九州天下登陆!叭谩ú拧胧质俏颐堑睦砟罹胖萏煜碌锹?九州天下登陆【胖萏煜碌锹?!惫人顾?。公司里并不缺乏人才九州天下登陆?九州天下登陆!扒拔锢硌Ъ?、前生物学家九州天下登陆,大家都四处闲逛九州天下登陆九州天下登陆?九州天下登陆九州天下登陆!惫人褂械闾湫苑蔷胖萏煜碌锹?【胖萏煜碌锹剑“他们永远不知道之前的专业知识什么时候可以突然发挥作用九州天下登陆?九州天下登陆九州天下登陆!弊詈笸哦映稍痹黾拥?0人左右九州天下登陆。

不过九州天下登陆,DeepMind还是认为团队里至少要有一位真正的蛋白质折叠专家九州天下登陆,后来选中了约翰?乔普九州天下登陆。35岁的乔普像个大男孩九州天下登陆,瘦得皮包骨九州天下登陆,一头蓬乱斜梳的棕色头发九州天下登陆,有点像20世纪90年代末高中车库乐队的低音吉他手九州天下登陆。他在剑桥大学获得理论凝聚态物理硕士学位,之后在纽约由对冲基金亿万富翁大卫?肖创立的独立研究实验室D.E.Shaw Research工作九州天下登陆九州天下登陆。实验室专门研究计算生物学九州天下登陆,包括蛋白质模拟九州天下登陆。后来乔普在芝加哥大学获得了计算生物物理学博士学位九州天下登陆,导师为卡尔?弗里德和托宾?索斯尼克九州天下登陆,两位科学家皆因推动蛋白质折叠模型进步出名九州天下登陆?九州天下登陆九州天下登陆!拔以礑eepMind对解决蛋白质结构有兴趣九州天下登陆【胖萏煜碌锹?!彼?九州天下登陆。于是他申请并顺利加入九州天下登陆。

哈萨比斯和DeepMind团队的第一直觉是,蛋白质折叠能够用与围棋完全相同的方式解决九州天下登陆九州天下登陆,即深度强化学习。事实证明存在问题。首先九州天下登陆九州天下登陆九州天下登陆,蛋白质折叠结构的可能性比围棋的步数还要多九州天下登陆。更重要的是九州天下登陆九州天下登陆九州天下登陆,DeepMind让工智能系统AlphaGo与自己对弈就可以掌握围棋的玩法?九州天下登陆九州天下登陆!八钥杀刃圆⒉桓?九州天下登陆,因为蛋白质折叠不是双人游戏九州天下登陆九州天下登陆【胖萏煜碌锹骄胖萏煜碌锹?!惫人顾?九州天下登陆九州天下登陆,“有点违背自然九州天下登陆九州天下登陆?九州天下登陆!?/p>

DeepMind很快发现九州天下登陆,如果使用所谓监督式深度学习的人工智能培训方法,就能够更简便地取得进步九州天下登陆九州天下登陆。这是大多数商业应用里使用的人工智能九州天下登陆,神经网络通过一组既定数据输入和相应输出九州天下登陆九州天下登陆,可以学习如何将给定的输入与给定输出相匹配九州天下登陆。具体到蛋白质结构九州天下登陆,DeepMind已经掌握约170000个蛋白质结构九州天下登陆,能够作为训练数据九州天下登陆。蛋白质数据库(PDB)是已知三维蛋白质形状及遗传序列的公共存储库九州天下登陆,可以公开查询相关结构九州天下登陆。

一些生物学家已经使用监督式深度学习预测蛋白质如何折叠九州天下登陆九州天下登陆。但此类人工智能系统表现最佳的正确率也只有50%九州天下登陆,对生物学家或医学研究人员没有什么帮助九州天下登陆九州天下登陆,尤其是对结构未知的蛋白质九州天下登陆九州天下登陆,因为无法确定某次特定预测是否正确。

有种技术很有希望九州天下登陆,其理念是基于蛋白质的进化史划分为不同的家族。各种家族里可能在一个DNA序列中找到相距遥远但似乎会同时突变的氨基酸对九州天下登陆。此类所谓“共同进化”的现象很有帮助九州天下登陆,因为共同进化的蛋白质很可能在蛋白质折叠结构中有联系九州天下登陆九州天下登陆。位于芝加哥的丰田技术研究所(Toyota Technological Institute)的科学家徐金波(音译)率先利用深入学习共同进化数据预测氨基酸联系九州天下登陆九州天下登陆。这种方法有点像是在连接点游戏里寻找点九州天下登陆?九州天下登陆?蒲Ъ胰匀灰闷渌砑页龅阒涞南呔胖萏煜碌锹骄胖萏煜碌锹?九州天下登陆,过程中经常出错九州天下登陆。有时候连点都找不准九州天下登陆。

在2018年的CASP竞赛中九州天下登陆九州天下登陆九州天下登陆九州天下登陆,DeepMind应用了共同进化和预测联系的基本思想九州天下登陆九州天下登陆,但增加了两个重要的转折点九州天下登陆九州天下登陆。首先九州天下登陆九州天下登陆,系统没有试图确定两个氨基酸是否有联系九州天下登陆九州天下登陆,也就是二进制输出(即两个氨基酸可能有联系九州天下登陆,也可能没有联系)九州天下登陆,而是决定让算法预测蛋白质里所有氨基酸对之间的距离九州天下登陆九州天下登陆。

在多数分子生物学家看来九州天下登陆,这种方法似乎违反直觉,不过值得称赞的是,徐金波也独立提出了类似方法。毕竟九州天下登陆,联系才是最重要的九州天下登陆。对于DeepMind的深度学习专家来说,很明显距离是让神经网络发挥作用更好的指标九州天下登陆,科里表示【胖萏煜碌锹?!罢庵皇巧疃妊暗幕〔糠志胖萏煜碌锹?九州天下登陆九州天下登陆,如果与决策相关存在不确定性九州天下登陆,最好是让神经网络整合不确定性九州天下登陆九州天下登陆,并决定如何应对【胖萏煜碌锹?九州天下登陆!彼?九州天下登陆九州天下登陆。与联系不一样九州天下登陆,距离包含了神经网络可调整和使用的丰富信息九州天下登陆。

DeepMind另一项让人意外之处是引入第二个神经网络九州天下登陆,用于预测氨基酸对之间的角度九州天下登陆。有了距离和角度两个因素,DeepMind的算法就能够算出蛋白质结构的大致轮廓。然后九州天下登陆九州天下登陆,系统使用另一种非人工智能算法改进结构九州天下登陆九州天下登陆九州天下登陆。DeepMind将相关组件整合到名为AlphaFold的系统中九州天下登陆九州天下登陆九州天下登陆,横扫了2018年CASP(又称为第13届CASP,因为是两年一度比赛举办第13次九州天下登陆。)比赛里结构最复杂的43种蛋白质中,AlphaFold在25种蛋白质中得分最高九州天下登陆。第二名仅在三种蛋白质里得到高分九州天下登陆。研究结果震惊了全行业九州天下登陆。如果说之前还有人怀疑深度学习究竟是不是解决蛋白质折叠问题最有希望的方法九州天下登陆九州天下登陆,AlphaFold让所有人再无疑问九州天下登陆。

回到白板

尽管如此九州天下登陆,DeepMind还远没有达到哈萨比斯的目标,即完全解决蛋白质折叠问题九州天下登陆。AlphaFold准确率只有一半九州天下登陆九州天下登陆,第13届CASP的104个蛋白质中九州天下登陆,准确度可以达到X射线晶体衍射水平的只有三个九州天下登陆?!拔颐遣恢幌朐贑ASP竞赛中夺魁九州天下登陆九州天下登陆,而是想真正解决问题。我们想打造对生物学家很重要的系统九州天下登陆【胖萏煜碌锹?九州天下登陆!鼻瞧账稻胖萏煜碌锹?。

2018年CASP的结果公布后不久九州天下登陆,DeepMind就开始加倍努力九州天下登陆九州天下登陆。乔普负责扩大的团队九州天下登陆九州天下登陆。团队并未简单地在AlphaFold基础上改进九州天下登陆九州天下登陆,而是返回原点九州天下登陆,集思广益寻找完全不同的想法,他们希望新创意能够帮软件将精确度提升到更接近X射线晶体衍射级别。

乔普表示,接下来是整个项目中最可怕也最令人沮丧的时期之一九州天下登陆九州天下登陆,因为什么办法都没有?九州天下登陆!拔颐腔巳鲈?九州天下登陆,结果都达不到第13届CASP的水平九州天下登陆,开始真正感觉到恐慌九州天下登陆?!彼?九州天下登陆。不过当时研究人员的尝试出现了一些改进,没到6个月系统已经比最初的AlphaFold有了明显改进九州天下登陆。之后两年里一直延续该模式九州天下登陆九州天下登陆九州天下登陆,乔普说。先是三个月一无所获九州天下登陆,接下来三个月快速发展九州天下登陆,接着又是平台期。

哈萨比斯说九州天下登陆九州天下登陆,DeepMind以前的项目也出现过类似模式,包括围棋项目九州天下登陆,还有复杂的即时战略游戏《星际争霸2》项目九州天下登陆九州天下登陆。他说九州天下登陆,公司克服问题的管理策略就是交替采取两种不同的工作方式九州天下登陆。第一种哈萨比斯称之为“攻击模式”九州天下登陆,尽可能推动团队九州天下登陆九州天下登陆,追求当前系统可以达到的极致表现九州天下登陆九州天下登陆九州天下登陆。然后九州天下登陆九州天下登陆,全力以赴努力的效果似乎耗尽时九州天下登陆九州天下登陆,他就开始转向所谓的“创新模式”。期间哈萨比斯不再对团队施加压力九州天下登陆九州天下登陆,容忍甚至期待出现暂时性的后退九州天下登陆九州天下登陆,从而为研究人员和工程师提供修补新想法和尝试新手段的空间九州天下登陆九州天下登陆。他说:“要鼓励人们提出尽可能多的疯狂想法九州天下登陆九州天下登陆,还要头脑风暴?九州天下登陆!备媚J酵ǔD芄煌贫阅艹鱿中路稍?,让团队切换回攻击模式九州天下登陆九州天下登陆。

生日大礼

2019年11月21日,DeepMind蛋白质折叠团队的研究员凯萨伦?图雅苏那科年满30岁。这一天也会因为另一个原因值得纪念。图雅苏那科拥有牛津大学(University of Oxford)计算生物学博士学位九州天下登陆九州天下登陆九州天下登陆九州天下登陆,在团队里负责为蛋白质折叠人工智能开发新测试集九州天下登陆,新款人工智能叫AlphaFold 2九州天下登陆九州天下登陆,是DeepMind为2020年的CASP竞赛新开发的系统九州天下登陆。那天早上她打开办公电脑时九州天下登陆,收到系统对一批大约50个蛋白质序列预测的评估九州天下登陆九州天下登陆,所有序列均为最近才添加到蛋白质数据库中九州天下登陆。她愣了一下九州天下登陆九州天下登陆九州天下登陆,然后大吃一惊九州天下登陆。AlphaFold 2确实一直在改进,但对该组蛋白质的预测结果惊人地准确九州天下登陆九州天下登陆九州天下登陆。系统对好几个蛋白质结构结构预测误差在1.5埃以内九州天下登陆,埃的距离单位相当于十分之一纳米九州天下登陆,或大约一个原子的宽度九州天下登陆。

自称“团队悲观主义者”的图雅苏那科说九州天下登陆,第一反应并不是高兴而是有点想吐九州天下登陆九州天下登陆?九州天下登陆!拔业笔焙芎ε戮胖萏煜碌锹健九州天下登陆!彼?九州天下登陆。结果实在太好,她以为是自己犯了错九州天下登陆九州天下登陆,可能准备测试集时无意中把人工智能在训练数据里见过的几个蛋白质加了进来。如此一来AlphaFold 2基本上就可以作弊九州天下登陆,轻易预测出准确的结构九州天下登陆九州天下登陆。图雅苏那科回忆说九州天下登陆,当时坐在DeepMind自助餐厅俯瞰伦敦的圣潘克拉斯车站(St. Pancras Station)九州天下登陆,一杯接一杯地喝茶努力平复心情九州天下登陆。随后九州天下登陆,她和其他团队成员花了一整天,直到深夜才下班九州天下登陆九州天下登陆九州天下登陆,之后几天也是如此九州天下登陆,他们坐在工作站旁埋头梳理AlphaFold 2的训练数据,希望找出错误所在九州天下登陆。

然而一个错误也没有九州天下登陆。事实是九州天下登陆,新系统在预测表现方面实现了巨大飞跃。AlphaFold 2与之前版本完全不同九州天下登陆九州天下登陆九州天下登陆。人工智能不再只是各成分组合九州天下登陆,一个用来预测氨基酸之间的距离,另一个预测角度九州天下登陆,然后用第三个软件联系起来九州天下登陆九州天下登陆。现在的人工智能用单一的神经网络直接从DNA序列进行推理九州天下登陆。虽然系统仍然接受进化信息九州天下登陆九州天下登陆,从而确定研究的蛋白质是否与以前见过的蛋白质有共同的祖先九州天下登陆,并仔细检查目标蛋白质的DNA序列与其他已知序列之间的一致性九州天下登陆,但不再需要哪些氨基酸对共同进化的明确数据?九州天下登陆九州天下登陆!拔颐遣⑽刺峁└嘈畔?九州天下登陆,反而减少了信息九州天下登陆【胖萏煜碌锹?九州天下登陆九州天下登陆!鼻瞧账稻胖萏煜碌锹?。系统可以自由地得出见解九州天下登陆,即祖先何时可能决定蛋白质的部分形状九州天下登陆九州天下登陆九州天下登陆,以及何时可能彻底偏离九州天下登陆【胖萏煜碌锹?;痪浠八?九州天下登陆,系统根据经验培养出直觉九州天下登陆九州天下登陆,就像老练的人类科学家一样九州天下登陆九州天下登陆。

新系统的核心是“注意力”机制九州天下登陆,顾名思义,注意力是让深度学习系统专注于某组输入九州天下登陆,并对相关输入加大权重九州天下登陆九州天下登陆。举例来说,在识别猫的系统里九州天下登陆,系统可能学会注意耳朵的形状九州天下登陆,也会学习在鼻子附近寻找胡须九州天下登陆。乔普比较了AlphaFold 2的功能与玩拼图游戏九州天下登陆九州天下登陆,过程中“能够将某些部分拼凑在一起而且非常确定九州天下登陆九州天下登陆,得到不同的本地解决方案九州天下登陆九州天下登陆九州天下登陆,然后想办法将相关问题连接起来九州天下登陆九州天下登陆【胖萏煜碌锹骄胖萏煜碌锹?!鼻瞧账?九州天下登陆九州天下登陆,神经网络的中层已经学会根据对DNA序列的分析推理几何和空间排列,以及氨基酸对如何连接。

DeepMind曾经在128个“张量处理核心”上训练AlphaFold 2九州天下登陆,张量处理核心是在16块专门用于深度学习的计算机芯片上创建的数字运算大脑,芯片由谷歌设计并在数据中心使用九州天下登陆,公司称连续运行了数周。(128个专用的人工智能核心大约相当于100到200块强大的图形处理芯片九州天下登陆九州天下登陆,可以在Xbox或PlayStation上呈现极其炫目的动画效果九州天下登陆。)公司表示九州天下登陆,经过训练的系统提取DNA序列后“几天内”就能够完成整个结构预测九州天下登陆九州天下登陆。

AlphaFold 2与前一代相比有个优势九州天下登陆九州天下登陆,就是提供可信程度九州天下登陆,即系统对结构里每种氨基酸的预测都有信心分数九州天下登陆。如果说AlphaFold 2可以切实帮到生物学家和医学研究人员,这项指标至关重要,因为研究者需要清楚何时能够合理依赖模型,以及何时需要更加谨慎。

尽管测试结果惊人九州天下登陆九州天下登陆,DeepMind仍然不能确定AlphaFold 2的预测效果九州天下登陆。新冠病毒来袭时九州天下登陆,公司才得到重要的线索九州天下登陆。今年3月,AlphaFold 2可以预测出六种与SARS-CoV-2(引发疫情的病毒)相关但未被研究的蛋白质结构,后来科学家使用所谓低温电子显微镜的经验方法证实了其中一种。由此能够充分看出AlphaFold 2对现实世界的影响力九州天下登陆九州天下登陆。

惊人的结果

CASP比赛在5月到8月之间举行。蛋白质结构预测中心发布多批目标蛋白质九州天下登陆,之后参赛方提交结构预测进行评估九州天下登陆。今年比赛排名于11月30日公布九州天下登陆九州天下登陆。

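Each prediction receives a score on a metric called the “global distance test total score,” or GDT for short, which in effect measures how close, in angstroms, the prediction is to a structure obtained by empirical methods such as X-ray crystallography or electron microscopy. Moult, the CASP chair, says a perfect score is 100 and that a score of 90 or above is considered on par with empirical methods. The proteins are also divided into groups according to how difficult the CASP organizers judge each structure to be.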

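Moult could hardly believe AlphaFold 2’s results when he saw them. Like Tunyasuvunakool months earlier, his first thought was that something had gone wrong. Perhaps some of the protein sequences in the competition had been published before? Or perhaps DeepMind had somehow obtained a cache of unpublished data?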

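To check, he asked Andrei Lupas, director of the Department of Protein Evolution at the Max Planck Institute for Developmental Biology in Tübingen, Germany, to help verify the results. Lupas had AlphaFold 2 predict a structure he was certain it had never seen, because Lupas himself had never managed to resolve a key part of the protein using X-ray crystallography. For nearly a decade the missing piece had vexed him, but he had been unable to work out its exact shape. With AlphaFold’s prediction in hand, Lupas says, he went back to his X-ray data. “The correct result came out in less than half an hour,” he says. “It’s astonishing!”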

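Since DeepMind’s success at CASP in 2018, many academic researchers have flocked to deep learning techniques. As a result, the rest of the field has improved too. On targets of medium difficulty, the other competitors’ best predictions averaged a GDT score of 75, up 10 points from two years earlier. But that was still nowhere near AlphaFold 2, which averaged a score of 92 across its protein structure predictions and 87 even on the most difficult proteins. Moult says AlphaFold 2’s predictions are “on a par with” empirical methods such as X-ray crystallography. Having reached that conclusion, CASP made a momentous announcement on Monday, Nov. 30: The 50-year-old protein-folding problem had been solved.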

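Venki Ramakrishnan, a Nobel laureate and the current president of the Royal Society, Britain’s most prestigious scientific body, says AlphaFold 2 has made “stunning progress” on protein folding. With AlphaFold 2’s help, expensive and time-consuming empirical methods such as X-ray crystallography and electron microscopy may become things of the past.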

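Janet Thornton, an expert in protein structure and a former director of the European Molecular Biology Laboratory’s European Bioinformatics Institute, says DeepMind’s breakthrough could help scientists map the entire human “proteome,” the full set of proteins in the human body. Only about a quarter of human proteins are currently used as drug targets, and knowing the structures of the rest would create enormous opportunities for developing new therapies. She adds that the A.I. software could also advance protein engineering and, with it, sustainability: helping scientists create new crop varieties that yield more nutritional value per acre of farmland, and perhaps enzymes that can digest plastic.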

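For now, though, the question remains how DeepMind will put AlphaFold 2 to use. Hassabis says the company is committed to ensuring the software has “the maximum positive societal impact,” but he concedes that it has not yet decided how to do so, saying only that an announcement will come sometime next year. Hassabis also told Fortune that DeepMind is considering how it might build a commercial product or partnerships around the system. “The system could be very useful both for drug discovery and for the pharmaceutical giants,” he says. But he says the exact form a commercial product would take has also yet to be decided.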

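For DeepMind, attempting to commercialize would mean setting out in a new direction. Since its sale to Alphabet, the company has not had to worry about revenue. It briefly had a division called DeepMind Health, which worked with the U.K.’s National Health Service to develop an app that could identify hospital patients at risk of acute kidney injury. But the collaboration became mired in controversy after news reports that DeepMind’s hospital partner had violated U.K. data protection law by handing over the medical records of millions of patients. In 2019, DeepMind Health was formally folded into the new Google Health division. DeepMind said at the time that spinning off the health business would let it focus on its research roots, rather than being distracted by building a commercial unit in areas where Google was already strong, such as data security and customer support.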

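Of course, if DeepMind does launch a commercial product, it will not be the first A.I. research company to try commercialization. San Francisco-based OpenAI, perhaps DeepMind’s closest rival, has become increasingly commercial. This year OpenAI released its first commercial product, an A.I. interface that lets businesses turn a short, human-written prompt into long, coherent passages of text. The commercial value of that A.I., called GPT, remains unproven, whereas DeepMind’s AlphaFold 2 could have a fundamental impact on pharmaceutical companies and biotech startups. With antitrust regulators investigating Alphabet, a commercially viable product could be good insurance in case a future breakup of the Googleplex leaves DeepMind without the unconditional backing of its deep-pocketed parent.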

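One thing is certain: DeepMind’s work on protein folding is not finished. The CASP competition is concerned only with predicting the structure of individual proteins. In biology and medicine, what researchers usually care about is how proteins interact. How does one protein bind to another, or to a particular small molecule? How does an enzyme break down a protein? Moult says predicting interactions and binding is likely to become a major focus of future CASP competitions. Jumper says DeepMind plans to take on those challenges next.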

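Beyond protein folding, AlphaFold 2’s success is sure to have an influence as well, encouraging others to apply deep learning to major scientific problems, such as discovering new subatomic particles, probing the mysteries of dark matter, mastering nuclear fusion, or creating room-temperature superconductors. Kohli says DeepMind is already playing an active role in astrophysics. A.I. researchers at Facebook have just launched a deep learning project to search for new chemical catalysts. Protein folding is the first mystery in basic science to be solved by A.I., but it will certainly not be the last. (Fortune China)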

Translator: Feng Feng (冯丰)

Reviewer: Xia Lin (夏林)

It is March 13, 2016. Two men, dressed in winter coats and woolen hats to defend against the frigid night air, walk side by side through the crowded streets of downtown Seoul. Locked in animated conversation, they seem oblivious to the pulsating neon enticements of the surrounding dumpling houses and barbecue joints. They are visitors, having come to South Korea on a mission, the culmination of years of effort—and they have just succeeded.

This is a celebratory stroll. What they have achieved will cement their places in the annals of computer science: They have built a piece of artificial intelligence software able to play the ancient strategy game Go so expertly that it handily defeated the world’s top player, Lee Sedol. Now the two men are discussing their next goal, their conversation captured by a documentary film crew shadowing them.

“I’m telling you, we can solve protein folding,” Demis Hassabis says to his walking companion, David Silver. “That’s like, I mean, it’s just huge. I am sure we can do that now. I thought we could do that before, but now we definitely can do it.” Hassabis is the cofounder and chief executive officer of DeepMind, the London-based A.I. company that built AlphaGo. Silver is the DeepMind computer scientist who led the AlphaGo team.

Four years later, DeepMind has just accomplished what Hassabis broached in that nocturnal amble: It has created an A.I. system that can predict the complex shapes of proteins down to an atom’s-width accuracy from the genetic sequences that encode them. With this achievement, DeepMind has completed an almost 50-year-old scientific quest. In 1972, in his Nobel Prize acceptance speech, chemist Christian Anfinsen postulated that DNA alone should fully determine the final structure a protein takes. It was a remarkable conjecture. At the time, not a single genome had been sequenced yet. But Anfinsen’s theory launched an entire subfield of computational biology with the goal of using complex mathematics, instead of empirical experiments, to model proteins.

DeepMind’s achievement with Go was important—but it had little concrete impact outside the relatively cliquish worlds of Go and computer science. Solving protein folding is different: It could prove transformative for much of humanity. Proteins are the basic building blocks of life and the mechanism behind most biological processes. Being able to predict their structure could revolutionize our understanding of disease and lead to new, more targeted pharmaceuticals for disorders from cancer to Alzheimer’s disease. It will likely accelerate the time it takes to bring new medicines to market, potentially shaving years and hundreds of millions of dollars in costs from drug development, and potentially saving lives as a result.

The new method pioneered by DeepMind is already yielding results in the fight against SARS-CoV-2, the virus that causes COVID-19. What follows is the story of how a company best known for playing games came to unlock one of biology’s greatest secrets.

Building blocks with elusive shapes

“Proteins are the main machines of the cell,” Ian Holmes, a professor of bioengineering at the University of California at Berkeley, says. “And the structure and shape of them is crucial to how they operate.” Small “pockets” within the lattice of molecules that make up the protein are where various chemical reactions take place. If you can find a chemical that will bind to one of these pockets, then that substance can be used as a drug—to either disable or accelerate a biological process. Bioengineers can also create entirely new proteins never before seen in nature with unique therapeutic properties. “If we could tap into the power of proteins and rationally engineer them to any purpose, then we could build these remarkable self-assembling machines that could do things for us,” Holmes says.

But to be sure the protein will do what you want, it’s important to know its shape.

Proteins consist of chains of amino acids, often compared to beads on a string. The recipe for which beads to string in what order is encoded in DNA. But the complex physical shape the completed chain will take is extremely difficult to predict from those simple genetic instructions. Amino acid chains collapse, or fold, into a structure based on electrochemical rules of attraction and repulsion between molecules. The resulting shapes frequently resemble abstract sculptures formed from tangles of cord and ribbon: pleated banderoles joined to Möbius strip–like curlicues and looping helixes. In the 1960s, Cyrus Levinthal, a physicist and molecular biologist, determined that there were so many plausible shapes a protein might assume that it would take longer than the known age of the universe to arrive at the correct structure by randomly trying combinations. And yet the protein folds itself in milliseconds. This observation has become known as Levinthal’s Paradox.
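Levinthal’s argument is easy to reproduce as a back-of-the-envelope calculation. The sketch below is illustrative only: the chain length, the number of shapes each amino acid can adopt, and the sampling rate are assumed round numbers, not figures from Levinthal or from this article.

```python
# Back-of-the-envelope version of Levinthal's argument.
# All three inputs below are illustrative assumptions.
RESIDUES = 100               # amino acids in a modest-size protein
STATES_PER_RESIDUE = 3       # shapes each amino acid is assumed to adopt
SAMPLES_PER_SECOND = 1e13    # shapes tried per second (very generous)

AGE_OF_UNIVERSE_YEARS = 1.38e10
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(residues=RESIDUES, states=STATES_PER_RESIDUE,
                      rate=SAMPLES_PER_SECOND):
    """Years needed to try every possible shape once at the given rate."""
    conformations = states ** residues
    return conformations / rate / SECONDS_PER_YEAR


years = brute_force_years()
print(f"{years:.1e} years of searching, "
      f"{years / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Even with these generous assumptions, exhaustive search is hopeless, which is why the field has looked for shortcuts rather than faster enumeration.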

Until now, the only way to know a protein’s structure with near certainty was through a method known as X-ray crystallography. As the name implies, this involves turning solutions of millions of proteins into crystals, a chemical process that is itself tricky. X-rays are then fired at these crystals, allowing a scientist to work backward from the diffraction patterns they make to build up a picture of the protein itself. Oh, and not just any X-rays: For many proteins, the X-rays need to be produced by a massive, stadium-size circular particle accelerator called a synchrotron.

The process is expensive and time-consuming: It takes about 12 months and approximately $120,000 to determine a single protein’s structure with X-ray crystallography, according to one estimate from researchers at the University of Toronto. There are over 200 million known proteins, with about 30 million more being discovered every year, and yet fewer than 200,000 of these have had their structures mapped with X-ray crystallography or other experimental methods. “Our level of ignorance is growing rapidly,” says John Jumper, a computational physicist who is now a senior researcher at DeepMind and leads its protein-folding team.

Over the past 50 years, ever since Christian Anfinsen’s famous speech, scientists have tried to speed up the analysis of protein structure by using complex mathematical models run on high-powered computers. “What you do is essentially try to create a digital twin of the protein in your computer, and then try to manipulate it,” says John Moult, a professor of cell biology and molecular genetics at the University of Maryland and a pioneer in using mathematical algorithms to predict protein structures from their DNA sequences. The problem is, these predicted folding patterns were frequently wrong, failing to match the structures scientists found through X-ray crystallography. In fact, until about 10 years ago, few models were able to accurately predict more than about a third of a large protein’s shape.

Some protein-folding simulations also take up tremendous amounts of computing power. In the year 2000, researchers created a “citizen science” project called Folding@home in which people could donate the idle processing capacity of their personal computers and game consoles to run a protein-folding simulation. All those devices, chained together through the Internet, created one of the world’s most powerful virtual supercomputers. The hope was that this would allow researchers to escape Levinthal’s Paradox by cutting the time it would take to hit upon the accurate protein structures through random trial and error. The project, which is still running, has provided data for more than 225 scientific papers on proteins implicated in a number of diseases.

But despite having access to so much processing power, Folding@home is still mired in Levinthal’s Paradox: It is trying to find a protein structure by searching through all possible permutations. The holy grail of protein folding is to skip this laborious search and instead discover the elusive patterns that link a protein’s DNA sequence to its structure, allowing a computer to take a radical shortcut and leap directly from genetics to the correct shape.

Games with a serious purpose

Demis Hassabis’s interest in protein folding began, as many of Hassabis’s passions do, with a game. Hassabis is a former chess prodigy, a master by the time he was 13 and at one time ranked second in the world for his age. His love of chess fed a fascination with two things: game design and the inner mechanisms of his own mind. He began working for a video games company while still in high school and, after studying computer science at the University of Cambridge, founded his own computer games startup, Elixir Studios, in 1998.

Despite producing two award-winning games, Elixir eventually sold off its intellectual property and shut down, and Hassabis went on to get a Ph.D. in cognitive neuroscience from University College London. By then, he had already embarked on the crusade that would lead him to cofound DeepMind in 2010: the creation of artificial general intelligence, meaning software capable of learning to perform many disparate tasks as well as or better than people. DeepMind’s lofty goal, Hassabis once said, was “to solve intelligence, and then use it to solve everything else.” Hassabis already had an inkling that protein folding just might be one of those first “everything elses.”

Hassabis was doing a postdoc at the Massachusetts Institute of Technology in 2009 when he heard about an online game called Foldit. Foldit was designed by researchers at the University of Washington and, like Folding@home, it was a “citizen science” project for protein folding. But instead of yoking together idle microchips, Foldit was designed to harness idle brains.

Foldit is a puzzle-like game in which human players, without any knowledge of biology, compete to fold proteins, earning points for creating shapes that are plausible. Researchers then analyze the highest-scoring designs to see if they can help complete unsolved protein structures. The game has attracted tens of thousands of players and, in a number of documented cases, produced better protein structures than protein-folding computer algorithms. “I thought it was fascinating from the standpoint of, can we use the addictiveness of games and the joy of them, and in the background not only are they having fun, but they are doing something useful for science,” Hassabis says.

But there was another reason Foldit would continue to capture Hassabis’s imagination. Games are a particularly good arena for a kind of A.I. training called reinforcement learning. This is where software learns from experience, essentially by trial and error, to get better at a task. In a computer game, software can experiment endlessly, playing over and over again, improving gradually until it reaches superhuman skill, without causing any real-world harm. Games also have ready-made and unambiguous ways to tell if a particular action or set of actions is effective: points and wins. Those metrics provide a very clear way to benchmark performance—something that doesn’t exist for many real-world problems, where the most effective move may be far more ambiguous and the entire concept of “winning” may not apply.

DeepMind was founded largely on the promise of combining reinforcement learning with a kind of A.I. called deep learning. Deep learning is A.I. based on neural networks—a kind of software loosely based on how the human brain works. In this case, instead of networks of actual nerve cells, the software has a bunch of virtual neurons, arranged in a hierarchy where an initial input layer takes in some data, applies a weighting to it, and passes it along to the middle layers, which do the same in turn, until it is eventually passed to an output layer that sums up all the weighted signals and uses that to produce a result. The network adjusts these weights until it can produce a desired outcome—such as accurately identifying photos of cats or winning a game of chess. It’s called “deep learning” not because the insights it produces are necessarily profound—although they can be—but because the network consists of many layers and so can be said to have depth.
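The layered flow of signals described above can be sketched in a few lines. This is a toy illustration rather than anything DeepMind built: the layer sizes, the weights, and the choice of a sigmoid squashing function are all assumptions, and the step in which the network adjusts its weights during training is left out.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each virtual neuron sums its weighted inputs, adds a bias, and squashes the total."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, network):
    """Pass the signal through every layer in turn, from input to output."""
    signal = inputs
    for weights, biases in network:
        signal = layer(signal, weights, biases)
    return signal


# Hypothetical weights: 3 inputs -> 2 middle neurons -> 1 output neuron.
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
score = forward([1.0, 0.5, -1.0], network)[0]
print(f"output signal: {score:.3f}")
```

Training amounts to nudging the numbers in `network` until `score` lands where it should for known examples, such as a high value for photos that really do contain a cat.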

DeepMind’s initial success came in using this “deep reinforcement learning” to create software that taught itself to play classic Atari computer games, such as Pong, Breakout, and Space Invaders, at superhuman levels. It was this achievement that helped get DeepMind noticed by big technology firms, including Google, which bought it for a reported £400 million (more than $600 million at the time) in 2014. It then turned its attention to Go, eventually creating the system AlphaGo, which defeated Sedol in 2016. DeepMind went on to create a more general version of that system, called AlphaZero, that could learn to play almost any two-player, turn-based game in which players have perfect information (so there is no element of chance or hidden information, such as face-down cards or hidden positions) at superhuman levels. Last year, it also built a system that could beat top human professional e-sports players at the highly complex real-time strategy game Starcraft 2.

But Hassabis says he always saw the company’s work with games as a way to perfect A.I. methods so they could be applied to real-world challenges—especially in science. “Games are just a training ground, but a training ground for what exactly? For creating new knowledge,” he says.

DeepMind is not a traditional business, with products and customers. Instead, it is essentially a research lab that tries to advance the frontiers of artificial intelligence. Many of the methods it develops, it publishes openly for anyone to use or build upon. But some of its advances are useful for its sister company, Google.

DeepMind has a whole team of engineers and scientists that help Google incorporate cutting-edge A.I. into its products. DeepMind’s technology has found its way into everything from Google Maps to the company’s digital assistant to the system that helps manage battery power on Android phones. Google pays DeepMind for this help, and Alphabet, its parent company, continues to absorb the additional losses that DeepMind generates. Those are not insignificant: The company lost £470 million in 2018 (about $510 million at the time), the last year for which its annual financial statements are publicly available through the U.K. business registry Companies House.

But DeepMind, which now employs more than 1,000 people, also has a whole other division that works only on scientific applications of A.I. It is headed by Pushmeet Kohli, a 39-year-old native of India, who worked on A.I. research for Microsoft before joining DeepMind. He says that DeepMind’s aim is to try to solve “root node” problems—data science-speak for saying it wants to take on issues that are fundamental to unlocking many different scientific avenues. Protein folding is one of these root nodes, Kohli says.

“The Olympics of protein folding”

In 1994, at a time when many scientists were first starting to use sophisticated computer algorithms to try to predict how proteins would fold, Moult, the University of Maryland biologist, decided to create a competition that could provide an unbiased way of assessing which of these algorithms was best. He called this competition the Critical Assessment of Protein Structure Prediction (CASP, for short), and it has been held biennially ever since.

It works like this: The Protein Structure Prediction Center, the organization that runs CASP and which is funded through the U.S. National Institute of General Medical Sciences, persuades researchers who do X-ray crystallography and other empirical studies to provide it with protein structures that have not yet been published anywhere, asking them to refrain from making the structures public until after the CASP competition. CASP then gives the DNA sequences of these proteins to the contestants, who use their algorithms to predict the protein’s structure. CASP then judges how close the predictions are to the actual structure the X-ray crystallographers and experimentalists found. The algorithms are then ranked by their average performance across all the proteins. “I call it the Olympics of protein folding,” Hassabis says. And, in 2016, shortly after AlphaGo beat Sedol, DeepMind set out to win the gold medal.

DeepMind established a small, crack team of a half-dozen machine learning researchers and engineers to work on the problem. “It’s part of our philosophy that we start with generalists,” Hassabis says. The company does not suffer from a lack of brain power. “Ex-physicists, ex-biologists, we just have them lying around generally,” Hassabis says with a wry smile. “They never know when their previous expertise suddenly is going to become useful.” Eventually the team grew to about 20 people.

Still, DeepMind decided it would be helpful to have at least one true protein-folding expert onboard. It found one in John Jumper. Skinny, with a mop of asymmetrically styled brown hair, Jumper is a boyish 35 and looks a bit like the bass guitarist in a late-1990s high school garage band. He earned a master’s degree in theoretical condensed matter physics from Cambridge before going on to work at D.E. Shaw Research in New York City, an independent research lab founded by hedge fund billionaire David Shaw. The lab specializes in computational biology, including the simulation of proteins. Jumper later got his Ph.D. in computational biophysics from the University of Chicago, studying under Karl Freed and Tobin Sosnick, two scientists known for advances in protein-fold modeling. “I had heard this rumor that DeepMind was interested in protein problems,” he says. He applied and got the job.

Hassabis’s and the DeepMind team’s first instinct was that protein folding could be solved in exactly the same way as Go—with deep reinforcement learning. But this proved problematic: For one thing, there were even more possible fold configurations than there are moves in Go. More importantly, DeepMind had mastered Go in large part by getting its A.I. system, AlphaGo, to play games against itself. “There isn’t quite the right analogy for that because protein folding is not a two-player game,” Hassabis says. “You’re sort of playing against Nature.”

DeepMind soon established that there was a simpler way of making progress using a kind of A.I. training known as supervised deep learning. This is the sort of A.I. used in most business applications: From an established set of data inputs and corresponding outputs, a neural network learns how to match a given input to a given output. In this case, DeepMind had the protein structures—currently about 170,000 of them—that are publicly available in the Protein Data Bank (PDB), a public repository of all known three-dimensional protein shapes and their genetic sequences, to use as training data.

Some biologists had already used supervised deep learning to predict how proteins would fold. But the best of these A.I. systems were right only about 50% of the time, which wasn’t particularly helpful to biologists or medical researchers—especially since, for a protein whose structure was unknown, they had no way of determining whether a particular prediction was correct.

One promising technique rested on the idea that proteins can be grouped into families based on their evolutionary history. And within these families, it is possible to find pairs of amino acids that are distant from one another in a DNA sequence, yet seem to mutate at the same time. This phenomenon, which is called “coevolution,” is helpful because coevolved amino acids are likely to be in contact within the protein’s folded structure. Jinbo Xu, a scientist at the Toyota Technological Institute in Chicago, pioneered using deep learning on this coevolutionary data to predict amino acid contacts. The approach is a bit like finding just the dots in a connect-the-dots game. Scientists still had to use other software to try to figure out the lines between those dots, and often they got this wrong. Sometimes they didn’t even get the dots right.
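One simple way to spot positions that mutate together is to measure the mutual information between columns of aligned sequences from the same protein family. The sketch below uses a made-up six-sequence alignment and is only meant to illustrate the signal. It is not the method Xu or DeepMind used, and real pipelines apply more sophisticated statistics to far larger alignments.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)
    return sum(
        (c / n) * math.log((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )


# Hypothetical family: positions 1 and 3 always change together (K with E, R with D),
# while positions 0 and 2 never change.
alignment = ["AKLE", "AKLE", "ARLD", "ARLD", "AKLE", "ARLD"]

coupled = mutual_information(alignment, 1, 3)
uncoupled = mutual_information(alignment, 0, 1)
print(f"columns 1 and 3: {coupled:.3f}, columns 0 and 1: {uncoupled:.3f}")
```

A high score for a pair of columns is the kind of “dot” the connect-the-dots analogy refers to: a hint that the two amino acids sit close together in the folded protein.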

For the 2018 CASP competition, DeepMind took these basic ideas about coevolution and contact prediction but added two important twists. First, rather than trying to determine if two amino acids were in contact, a binary output (either the pair is in contact or isn’t), it decided to ask the algorithm to predict the distance between all the amino acid pairs in the protein.

To most molecular biologists, such an approach seemed counterintuitive—although Xu, to his credit, had independently proposed a similar method. After all, it was contact that mattered most. But to DeepMind’s deep learning experts it was immediately obvious that distance was a much better metric for a neural network to work on, Kohli says. “It is just a fundamental part of deep learning that if you have some uncertainty associated with a decision, it is much better to have the neural network incorporate that uncertainty and decide what to do about it,” he says. Distance, unlike contact, was a richer piece of information the network could adjust and play with.
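A small example shows why distance carries more information than contact. The coordinates below are invented, and the 8-angstrom contact cutoff is a common convention assumed here rather than a figure from the article.

```python
import math

# Hypothetical 3D positions (in angstroms) for four amino acids.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 2.0, 0.0), (20.0, 5.0, 1.0)]
CONTACT_CUTOFF = 8.0  # assumed cutoff for calling a pair "in contact"


def distance_matrix(points):
    """Distance between every pair of amino acids."""
    return [[math.dist(p, q) for q in points] for p in points]


def contact_map(distances, cutoff=CONTACT_CUTOFF):
    """Collapse each distance into a yes/no answer: closer than the cutoff or not."""
    return [[d < cutoff for d in row] for row in distances]


distances = distance_matrix(coords)
contacts = contact_map(distances)

# Pairs (0, 1) and (0, 2) are both "in contact," yet one is nearly twice as far apart.
print(f"{distances[0][1]:.1f} A -> {contacts[0][1]}, {distances[0][2]:.1f} A -> {contacts[0][2]}")
```

The contact map throws away the difference between a tight pair and a borderline one. A network asked to predict distances keeps that nuance and can express how uncertain it is about each pair.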

The other twist DeepMind came up with was a second neural network that predicted the angles between amino acid pairs. With these two factors, distance and angles, DeepMind’s algorithm was able to work out a rough outline of a protein’s likely structure. It then used a different, non-A.I. algorithm to refine this structure. Putting these components together into a system it called AlphaFold, DeepMind crushed the competition in the 2018 CASP (called CASP13 because it was the 13th of the biennial contests). On the hardest set of 43 proteins in the competition, AlphaFold got the highest score on 25 of them. The next closest team scored highest on just three. The results shook the entire field: If there had been any doubt about whether deep learning methods were the most promising way to crack the protein-folding problem, AlphaFold ended it.

Going back to the whiteboard

Still, DeepMind was nowhere close to Hassabis’s goal: solving the protein-folding problem. AlphaFold was fairly inaccurate almost half the time. And, of the 104 protein targets in CASP13, it achieved results that were as good as X-ray crystallography in only about three cases. “We didn’t just want to be the best at this according to CASP, we wanted to be good at this. We actually want a system that matters to biologists,” Jumper says.

No sooner had the CASP 2018 results been announced than DeepMind redoubled its efforts: Jumper was put in charge of the expanded team. Rather than simply trying to build on AlphaFold, making incremental improvements, the team went back to the whiteboard and started to brainstorm radically different ideas that they hoped would be able to bring the software closer to the kind of accuracy X-ray crystallography yielded.

What followed, Jumper says, was one of the scariest and most depressing periods of the entire project: Nothing worked. “We spent three months not getting any better than our CASP13 results and starting to really panic,” he says. But then, a few of the things the researchers were trying produced a slight improvement, and within six months the system was notably better than the original AlphaFold. This pattern would continue throughout the next two years, Jumper says: three months of nothing, followed by three months of rapid progress, followed by yet another plateau.

Hassabis says a similar pattern had occurred with previous DeepMind projects, including its work on Go and the complex, real-time strategy video game Starcraft 2. The company’s management strategy for overcoming this, he says, is to alternate between two different ways of working. The first, which Hassabis calls “strike mode,” involves pushing the team as hard as possible to wring every ounce of performance out of an existing system. Then, when the gains from the all-out effort seem to be exhausted, he shifts gears into what he calls “creative mode.” During this period, Hassabis no longer presses the team on performance—in fact, he tolerates and even expects some temporary declines—in order to give the researchers and engineers the space to tinker with new ideas and try novel approaches. “You want to encourage as many crazy ideas as possible, brainstorming,” he says. This often leads to another leap forward in performance, allowing the team to switch back into strike mode.

A big birthday present

On Nov. 21 of 2019, Kathryn Tunyasuvunakool, a researcher at DeepMind who works on the protein folding team, turned 30. The day would prove to be memorable for another reason too. Tunyasuvunakool, who has a Ph.D. in computational biology from the University of Oxford, was the person on the team in charge of developing new test sets for the protein-folding A.I., now dubbed AlphaFold 2, that DeepMind was developing for the 2020 CASP competition. That morning, when she turned on her office computer, she received an assessment of the system’s predictions on a batch of about 50 protein sequences—all of them only recently added to the Protein Data Bank. She did a double take. AlphaFold 2 had been improving, but on this set of proteins the results were startlingly good—predicting the structure in many cases to within 1.5 angstroms, a distance equivalent to a tenth of a nanometer, or about the width of an atom.

Tunyasuvunakool, who calls herself “the team’s pessimist,” says her first response was not elation, but nausea. “I was feeling quite scared,” she says. The results were so good she was certain she had made a mistake—that when she was preparing the test set, she must have inadvertently allowed several proteins that the A.I. had already seen in the training data to slip in. That would have allowed AlphaFold 2 to essentially cheat, easily predicting the exact structure. Tunyasuvunakool recalls sitting in DeepMind’s cafeteria overlooking London’s St. Pancras Station and drinking cup after cup of herbal tea in an effort to calm herself. She and other team members then spent the rest of that day and late into the evening, and several days more, sitting at their workstations, painstakingly combing through AlphaFold 2’s training data to try to find the mistake.
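The kind of leak the team was hunting for can be illustrated with a crude check that flags test sequences that are identical, or nearly identical, to something in the training set. Everything here is hypothetical: the sequences, the 90% threshold, and the position-by-position comparison, which stands in for the proper sequence alignment a real check would use.

```python
def identity(a, b):
    """Fraction of matching positions; a crude stand-in for a real alignment."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def leaked(test_set, training_set, threshold=0.9):
    """Test sequences that closely match something the model was trained on."""
    return [t for t in test_set
            if any(identity(t, s) >= threshold for s in training_set)]


# Made-up sequences for illustration only.
training_set = ["MKTAYIAKQR", "GSHMLEDPVA"]
test_set = ["MKTAYIAKQR", "MKTAYIAKQK", "PLVWQERTYH"]

suspects = leaked(test_set, training_set)
print(suspects)
```

An empty list of suspects is what the team was hoping, and fearing, to find: it would mean the startling scores were real.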

There wasn’t one. In fact, the new system had made a giant leap forward in performance. AlphaFold 2 was completely different from its predecessor. Rather than an assemblage of components—one to predict the distance between amino acids and another to forecast the angles, with a third piece of software to tie them together —the A.I. now used a single neural network to reason directly from the DNA sequence. While the system still took in evolutionary information—figuring out if the protein in question had a likely common ancestor to others it had seen before, and scrutinizing the alignment between the target protein’s DNA sequence and other known sequences—it no longer needed explicit data about which amino acid pairs evolved together. “Instead of providing more information, we actually provided less,” Jumper says. The system was free to draw its own insights about when ancestry might determine a portion of the protein’s shape and when it might depart more radically from that heritage. In other words, it developed a kind of intuition based on its experience, in much the same way a veteran human scientist might.

At the heart of the new system was a mechanism called "attention." Attention, as the name implies, is a way to get a deep learning system to focus on a certain set of inputs and weigh those more heavily. For a cat identification system, for instance, the system might learn to pay attention to the shape of the ears and also learn to look for evidence of whiskers near the nose. Jumper compares what AlphaFold 2 does to the process of solving a jigsaw puzzle where “you can snap together certain pieces and be pretty sure of it, and then what you end up with are different local islands of solution, and then you figure out how to join these up.” The middle of the network, Jumper says, has learned to reason about geometry and space and how to join up those amino acid pairs it thinks are close together based on its analysis of the DNA sequences.
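In its most common form, attention scores every input against a query, turns the scores into weights that sum to one, and blends the inputs accordingly. The sketch below shows that generic recipe, known as scaled dot-product attention, on made-up numbers. It is an assumption that this textbook form conveys the idea; it does not describe AlphaFold 2’s actual architecture, which the article does not detail.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weigh each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Hypothetical inputs: the first key points the same way as the query,
# the second is unrelated, and the third points the opposite way.
query = [2.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, output = attend(query, keys, values)
print([round(w, 3) for w in weights], [round(o, 3) for o in output])
```

The input whose key best matches the query receives the largest weight, which is the sense in which the system “pays attention” to it.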

DeepMind trained AlphaFold 2 on 128 “tensor processing cores,” the number-crunching brains found on 16 special computer chips engineered for deep learning that Google designed and uses in its data centers, running continuously for what the company says was a few weeks. (These 128 specialized A.I. cores are about equivalent to 100 to 200 of the powerful graphics processing chips that deliver eye-popping animation on an Xbox or PlayStation.) Once trained, the system can take a DNA sequence and spit out a complete structure prediction “in a matter of days,” the company says.

Among AlphaFold 2’s advantages over its predecessor is a confidence gauge: The system produces a score for how sure it is of its own prediction for each amino acid in a structure. This metric is crucial if AlphaFold 2 is going to be useful to biologists and medical researchers, who will need to know when they can reasonably rely on the model and when to be more cautious.

Despite the stunning test results, DeepMind was still not certain how good AlphaFold 2 was. But the company got an important clue when the coronavirus pandemic struck. In March of this year, AlphaFold 2 was able to predict the structures of six understudied proteins associated with SARS-CoV-2, the virus that causes COVID-19, one of which scientists have since confirmed using an empirical method called cryogenic electron microscopy. It was a powerful glimpse of the kind of real-world impact DeepMind hopes AlphaFold 2 will soon have.

An astonishing result

The CASP competition takes place between May and August. The Protein Structure Prediction Center releases batches of target proteins, and contestants then submit their structure predictions for evaluation. The rankings for this year’s competition were announced on Nov. 30.

Each prediction is scored using a metric called “global distance test total score,” or GDT for short, that in effect looks at how close, in angstroms, it is to a structure obtained by empirical methods such as X-ray crystallography or electron microscopy. A score of 100 is perfect, but anything at 90 or above is considered equivalent to the empirical methods, Moult, the CASP director, says. The proteins are also classed into groups based on how difficult the CASP organizers think it is to get the structure.
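As a rough illustration of how such a score can be computed, the sketch below uses the commonly cited definition of the GDT total score: the average, over distance cutoffs of 1, 2, 4, and 8 angstroms, of the share of amino acids whose predicted position falls within that cutoff of the experimental one. It assumes the two structures have already been superimposed and uses one point per amino acid. The official scoring searches over many superpositions, so this simplified version is an assumption-laden sketch rather than CASP’s actual procedure. The coordinates are invented for the example.

```python
import numpy as np


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT total score on a 0-100 scale.

    Assumes `predicted` and `reference` are (N, 3) coordinate arrays in
    angstroms, one point per amino acid, already superimposed.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Distance between each predicted position and its experimental one.
    distances = np.linalg.norm(predicted - reference, axis=1)
    # Share of amino acids within each cutoff, then averaged.
    fractions = [(distances <= cutoff).mean() for cutoff in cutoffs]
    return 100.0 * float(np.mean(fractions))


# Invented toy data: four amino acids, off by 0.5, 1.5, 3.0 and 9.0 angstroms.
reference = np.zeros((4, 3))
predicted = np.array([
    [0.5, 0.0, 0.0],
    [1.5, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [9.0, 0.0, 0.0],
])
score = gdt_ts(predicted, reference)  # 56.25 for this toy example
```

In the toy case, one, two, three, and three of the four amino acids fall within the 1, 2, 4, and 8 angstrom cutoffs, giving (25 + 50 + 75 + 75) / 4 = 56.25. A prediction identical to the reference scores 100.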

When Moult saw AlphaFold 2’s results he was incredulous. Like Tunyasuvunakool months earlier, his initial thought was that there might be a mistake. Maybe some of the protein sequences in the competition had been published before? Or maybe DeepMind had somehow managed to get hold of a cache of unpublished data?

As a test, he asked Andrei Lupas, director of the department of protein evolution at the Max Planck Institute for Developmental Biology in Tuebingen, Germany, to conduct an experiment. Lupas would ask AlphaFold 2 to predict a structure that he knew for certain had never been seen before, because he himself had never been able to work out from X-ray crystallography what a key piece of the protein looked like. For almost a decade, Lupas had puzzled over this missing link, but the correct shape had eluded him. Now, with AlphaFold 2’s prediction as a guide, Lupas says, he went back to the X-ray data. “The correct structure just fell out within half an hour,” he says. “It was astonishing.”

Since DeepMind’s success in 2018’s CASP, many academic researchers have flocked to deep learning techniques. As a result, the rest of the field’s performance has improved: On a median difficulty target, the other competitors now have an average best prediction GDT of 75, up 10 points from two years ago. But there was no comparison to AlphaFold 2: It scored a median 92 GDT across all proteins, and even on the most difficult proteins it achieved a median score of 87 GDT. Moult says AlphaFold 2’s predictions are “on par with empirical methods,” such as X-ray crystallography. That conclusion led CASP to make a momentous declaration on Monday, Nov. 30: The 50-year-old protein-folding problem had been solved.

Venki Ramakrishnan, a Nobel Prize–winning structural biologist who is also the current president of The Royal Society, Britain’s most prestigious scientific body, says AlphaFold 2 “represents a stunning advance” in protein folding. With AlphaFold 2, expensive and time-consuming empirical analysis with methods like X-ray crystallography and electron microscopes may become a thing of the past.

Janet Thornton, an expert in protein structure and former director of the European Molecular Biology Laboratory’s European Bioinformatics Institute, says that DeepMind’s breakthrough will allow scientists to map the entire human “proteome”—all the proteins found within the human body. Currently only a quarter of these proteins have been used as targets for drugs, but having the structure for the rest would create vast opportunities for the development of new therapies. She also says the A.I. software could enable protein engineering that might aid in sustainability efforts, allowing scientists to potentially create new crop strains that provide more nutritional value per acre of land planted, and also possibly allowing for the advent of enzymes that could digest plastic.

For now, though, the question remains about how exactly DeepMind will make AlphaFold 2 available. Hassabis says the company is committed to ensuring the software can “make the maximal positive societal impact.” But he says it has not yet determined how to do that, saying only that it will make an announcement sometime next year. Hassabis also tells Fortune that DeepMind is considering how it might be able to build a commercial product or partnership around the system. “This should be hugely useful for the drug discovery process and therefore Big Pharma,” he says. But exactly what form this commercial offering will take, he says, has not yet been decided either.

A commercial venture would be a marked departure for DeepMind, which, since its sale to Alphabet, has not had to worry about generating revenue. The company briefly set up a division called DeepMind Health that was working with the U.K.’s National Health Service on an app that could identify hospital patients who were at risk of developing acute kidney injury. But the effort became embroiled in controversy after news reports revealed DeepMind’s hospital partner had violated U.K. data protection laws by giving the company access to millions of patients’ medical records. In 2019, DeepMind Health was formally absorbed into a new Google health division. At the time, DeepMind said cleaving off its health effort would allow it to remain true to its research roots without the distraction of having to build a commercial unit that might replicate areas, such as data security and customer support, where Google already had expertise.

Of course, if DeepMind were to launch a commercial product, it would not be the first A.I. research company to do so: OpenAI, the San Francisco–based research company that is perhaps DeepMind’s closest rival, has become increasingly business-oriented. This year, OpenAI launched its first commercial product, an interface that lets companies use an A.I. that composes long passages of coherent text from a short, human-written prompt. The business value of that A.I., called GPT-3, remains unproven, while DeepMind’s AlphaFold 2 could have an immediate bottom-line impact for a pharmaceutical company or biotechnology startup. At a time when antitrust regulators are probing Alphabet, having a viable commercial product could be a good insurance policy for DeepMind in the event it ever loses the unconditional support of its deep-pocketed parent in some future breakup of the Googleplex.

One thing is certain: DeepMind isn’t done with protein folding. The CASP competition was set up around predicting the structure of single proteins. But in biology and medicine, it is usually protein interactions that researchers really care about. How does one protein bind with another or with a particular small molecule? Exactly how does an enzyme break a protein apart? The problem of predicting these interactions and bindings will likely become the primary focus of future CASP competitions, Moult says. And Jumper says DeepMind plans to work on those challenges next.

Reverberations from AlphaFold 2’s success are certain to be felt in areas far removed from protein folding, too, encouraging others to apply deep learning to big scientific questions: finding new subatomic particles, probing the secrets of dark matter, mastering nuclear fusion, or creating room-temperature superconductors. DeepMind has an active effort already underway on astrophysics, Kohli says. Facebook’s A.I. researchers just launched a deep learning project aimed at finding new chemical catalysts. Protein folding is the first foundational scientific mystery to fall to the power of artificial intelligence. It won’t be the last.
