• Masters of Machine Learning (repost)
        
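          With nothing much to do, I thought I would write a little about the machine learning masters I know of. My knowledge is shallow and my experience limited to a few areas; I know next to nothing about some of the scholars active in NLP and in the recently very hot field of bioinformatics, so please just laugh off whatever I get wrong.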
      
      Masters of Machine Learning (1): M. I. Jordan
      
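      In my eyes, M. Jordan is without doubt the grand master of the martial world. He came from MIT and now holds court at Berkeley. Across the two famous schools in the area (counting Stanford), arguably no one surpasses him; Stanford's Daphne Koller is also renowned, but she is still some distance behind Jordan.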
      
      Jordan is a professor in both the statistics and computer science departments, and in him you can see the fusion of statistics and ML.
      
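      Jordan first concentrated on mixtures of experts and quickly established his standing; Lei Xu, an alumnus of our Harbin Institute of Technology, also benefited a great deal in this direction while doing a postdoc with him. Jordan and his students have produced pioneering results in many areas, such as spectral clustering, graphical models, and nonparametric Bayesian methods. The latter two are now extremely hot directions in ML, and it is fair to say they were driven to a large extent by Jordan's lab.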
      
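      Even more admirable, Jordan is not only highly skilled himself but also good at raising money and at teaching. He has many disciples, a good number of whom have become major figures, quietly forming one of the great sects of the martial world. More than ten of his students hold professorships. Personally, I think the most outstanding of them is Stanford's Andrew Ng; for reasons of seniority he is still an assistant professor, but a senior professorship is only a matter of time. Tommi Jaakkola and David Blei are also formidable: Tommi Jaakkola teaches at MIT, and David Blei is a postdoc at CMU. One of Jordan's postdocs, Ben Taskar, has won the NIPS best paper award several times and is famous for combining the SVM maximum-margin method with the structure of Markov networks. Another postdoc, Yee Whye Teh from Toronto, is very good; I have had the good fortune to deal with him a few times, and he is a very nice person. Yet another postdoc is, surprisingly, working on bioinformatics, so it seems Jordan has pulled in money there too. In this area he has a Chinese student, Eric P. Xing (a Tsinghua University alumnus), who is now an assistant professor at CMU.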
      
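      Overall, I think Jordan's main work now is still graphical models and Bayesian learning. Last year he wrote a book on graphical models, to be published by MIT Press this year, which should be a milestone in the field. In March someone promised to give me a printed copy to read, because Jordan does not allow the electronic version to be circulated, but he seems to have forgotten about it (so Americans do not always keep their word either). Since I do not know him well, I was too embarrassed to ask, which is a real pity. Another interesting thing I noticed is that Jordan has a special fondness for anything hierarchical; a considerable share of his papers are about hierarchical models. So if you can make something hierarchical, do it quickly, or he will get there first.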
      
      As a friend of mine puts it, to see how great Jordan is, just look at the Past students and postdocs section at the bottom of his homepage.
      
      Masters of Machine Learning (2): D. Koller
      
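      D. Koller received the 1999 Presidential Early Career Award for Scientists and Engineers (PECASE), the IJCAI 2001 Computers and Thought Award (the highest honor in the international AI community for researchers under 35), and a 2004 World Technology Award.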
      
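      I first heard of D. Koller because she won a major prize, the 2001 IJCAI Computers and Thought Award. For her important contributions to the theory and practice of probabilistic reasoning, machine learning, and computational game theory, Koller became the 18th recipient, following Terry Winograd, David Marr, Tom Mitchell, Rodney Brooks, and others. The award itself is interesting. The IJCAI Award for Research Excellence, a lifetime achievement award, is the highest honor in the international AI community; the IJCAI Computers and Thought Award is the highest honor for researchers under 35. Early AI research placed reasoning above everything else. Then in 1991 the formidable Rodney Brooks won the Computers and Thought Award after rejecting reasoning wholesale and arguing that machines can only learn on their own. Koller, in turn, won the same award for proposing Probabilistic Relational Models, demonstrating that machines can reason and understand. Nothing is absolute, and science moves in cycles.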
      
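      Koller's Probabilistic Relational Models were prominent at NIPS, ICML, and other top conferences for quite a long time, and they proved their value for information search at least in the lab, which led many of her students to join Google. Joining Google may not carry the prestige of a faculty position at a top school, but bear in mind that many Google employees are now millionaires, buying houses and cars all over the United States.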
      
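      Koller's research is concentrated mainly on probabilistic graphical models, such as Bayesian networks, which I have never worked with. I have only read a few of their Markov network papers, and reading them left me with no ideas of my own. These waters run deep, too deep for someone like me without formal training in the field to wade through, and the work seems hard to apply in my current area.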
      
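      Koller has been teaching for only 10 years, so not many stars have emerged from among her students yet; this is where she cannot compare with Jordan. And because she is at Stanford, many of her students have gone straight to Silicon Valley to make big money, so she lacks the influence of a great academic sect. At Stanford that may simply be too hard to achieve, because the lure of money is so strong. There is one student of Koller's whom I greatly admire, though: Ben Taskar, the Jordan postdoc I mentioned in (1), who has won best paper awards at several top conferences. He combined the SVM maximum-margin method with Markov networks, producing what can be called a standard tool for structured data, and he also set off a new wave of interest in max-margin methods; in recent years many top conferences have held workshops on the topic. When I first visited Ben Taskar's personal page at Stanford, he had just graduated, and at the top was this line: the rumor has come true, I have finally graduated! It suggests Koller is quite harsh, keeping her students cooped up until they are this frustrated; I suspect this is a common trait of most female faculty, and I would guess she is very pushy too!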
      
      Masters of Machine Learning (3): J. D. Lafferty
      
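      Everyone knows that NIPS and ICML have always been carved up among mountain strongholds large and small, and John Lafferty is without doubt one of the taller peaks, as the number of NIPS and ICML papers in his publication list clearly shows. Rumor has it that CMU, a computer science stronghold, is now in decline, but that has not stopped Lafferty's influence from growing. Open the Journal of Machine Learning Research, ranked first in the AI weapons ranking, and in many papers you will find Lafferty's name among the authors or editors.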
      
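      The strongest impression Lafferty has left is probably his 2001 work on conditional random fields. The paper went on to be cited heavily and applied widely in language and image processing, and many variants followed, such as Kumar's discriminative random fields. Everyone knew that discriminative learning was good, but for a long time no good discriminative method existed for data with such rich contextual information, until Lafferty came along.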
      
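      What Lafferty works on now seems quite varied: semi-supervised learning, kernel learning, graphical models, and even manifold learning. Perhaps it is like the martial arts novels: once you have mastered the Nine Yang Divine Skill, you can grasp the essence of any other technique at a glance. My favorite among these is semi-supervised learning. As the amount of data to be processed keeps growing, labeling all of it becomes too hard, while fully unsupervised methods do not inspire much confidence; in that situation semi-supervised learning becomes the best option. The area is not yet clearly understood, which gives newcomers an opening to make their names. So far, I think CMU does the best work on semi-supervised learning: Kamal Nigam did the pioneering work earlier, and now Lafferty and his students have contributed a great deal of synthesis and innovation.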
      
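      Lafferty does not seem to have many students, and they do not seem to be very well known. But one Chinese student graduated this year: Xiaojin Zhu (a Shanghai Jiao Tong University alumnus), the one who works on semi-supervised learning, now an assistant professor at Wisconsin-Madison. He wrote the most comprehensive semi-supervised learning literature survey to date, which you can find on his homepage. He looks like a very good-natured person, probably a good one to reach out to. Also, D. Blei, the star student of Jordan's I mentioned in (1), has joined Lafferty as a postdoc this year, which shows just how strong Lafferty is.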
      
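      Lafferty is also very good at NLP; there is the well-known Link Grammar Parser, among many other applications. One of these is the use of language models in IR, the area of another of his Chinese students, ChengXiang Zhai (a Nanjing University alumnus and 2004 PECASE recipient), now an assistant professor at UIUC.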
      
      Masters of Machine Learning (4): Peter L. Bartlett
      
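      In my humble opinion, Jordan is still a level below Peter Bartlett, who is also at Berkeley. Bartlett's main achievements are in learning theory, that is, the most essential part of ML: his several pioneering papers of theoretical analysis, and of course his book Neural Network Learning: Theoretical Foundations.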
      
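      UC Berkeley's statistics department has always ranked in the top three among the many strong departments at North American universities, which is enough to show that it is full of stars, and Peter L. Bartlett is one of the brighter ones. As for his research, I think the answer lies in the title of one of his books: Neural Network Learning: Theoretical Foundations. In other words, what he mainly works on is theoretical foundations. Foundational theory is less eye-catching than algorithms that can be applied directly, but it actually does more for the progress of science. If Vapnik had not labored for so many years on the theory of VC dimension, how could SVM ever have appeared? Still, refined music may be elegant, but most people can only follow popular tunes, so most of Bartlett's papers have influence only within the theory community and are not widely cited by everyone else.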
      
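      Over the past two years Bartlett has done a great deal of work on large margin classifiers, for example on their convergence rates and generalization bounds. Much of it is joint work with Jordan, which shows how much their work has in common. However, I noticed that Bartlett is first author on most of his papers; my guess is that there is a problem on the teaching side, since he has not produced any especially strong students.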
      
      The talks section of Bartlett's homepage has many slides worth a look, such as Large Margin Classifiers: Convexity and Classification and Large Margin Methods for Structured Classification: Exponentiated Gradient Algorithms. Download them if you are interested.
      
      Masters of Machine Learning (5): Michael Collins
      
      Michael Collins (http://people.csail.mit.edu/mcollins/)
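      The number one master of the natural language processing (NLP) world. He came out of UPenn and first made his mark with a technique called the Collins Parser. Talent aside, his background helped a great deal. Early on, a master named Mitchell P. Marcus passed on to him a secret manual (his Sunflower Manual), the Penn Treebank. From then on Collins immersed himself in it day after day and eventually perfected a peerless skill.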
      
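      With his training complete, Collins took leave of his master and set out into the world, joining a clan called AT&T Labs Research, where he had the good fortune to meet Robert Schapire, Yoram Singer, and many other masters. Do not underestimate this clan called AT&T Labs Research: anyone who has not heard of it must at least know its half-brother, Bell Labs.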
      
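      Back to the story. Collins spent three happy years there, during which he secured his position as the boss of the NLP world and mastered several signature techniques, including Discriminative Reranking, Convolution Kernels, and Discriminative Training Methods for Hidden Markov Models. But the world is unpredictable. The clan was poorly run, and this band of masters was no good at fighting for its business, so in the end they were kicked out and scattered. Schapire went to Princeton, and Singer returned home to Israel. Collins came to MIT, becoming a six-bag elder of the martial world's largest sect, and taught a skill called Machine Learning Approaches for NLP (http://www.ai.mit.edu/courses/6.891-nlp/). Although the rank was far below his ability, it did not dampen his enthusiasm. Through hard work he earned the title of Sloan Research Fellow, and in July of this year he was gloriously promoted to seven-bag Associate Professor.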
      
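      In the short 7 years since he came down the mountain, Collins has won four world-class tournament championships (EMNLP 2002 and 2004, UAI 2004 and 2005). I believe that, young as he is, he will one day unify the Beggars' Sect, and perhaps the whole martial world.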
      
      I have read a paper Collins co-authored that uses conditional random fields for object recognition. And he is still so young; I admire him to death!

        Masters of Machine Learning (6): Dan Roth

        Dan Roth (http://l2r.cs.uiuc.edu/~danr/)
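        After surveying the many scholars in statistical NLP, I reached a startling conclusion: a remarkable number of the stars are named Daniel. They range from Prof. Dan Melamed, long established in MT, to Dan Klein, who just graduated from Stanford, with a heavyweight like Dan Jurafsky in between, and also Michael Collins's fellow student Dan Bikel (IBM Research), ISI's Dan Marcu, Prof. Dan Moldovan (UT Dallas), winner of countless TREC QA evaluations, and Dan Gildea (U Rochester), a UC Berkeley graduate. Among all these Dans, though, the one I admire most is Dan Roth, an Associate Professor at UIUC and head of its Cognitive Computation Group.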

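        He too is remarkably young. Exactly ten years after finishing his Harvard PhD, he and his team hold up a bright patch of sky for machine learning and NLP at UIUC. SNoW, the tool whose development he led, is a truly peerless sword. It comes close to the ideal of wanting the horse to run without feeding it: learning and prediction are exceptionally fast with no loss of classification accuracy. What? You have not heard of SNoW? What does it have to do with Snow White? It seems I should follow the example of the Super Girl fans and do some basic education. SNoW stands for Sparse Network of Winnows. It implements the Winnow algorithm, but remember that the Sparse Network is the key point; it is this piece of dark iron that makes the sword of SNoW so sharp.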
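
        For readers who have not met Winnow, the following is a minimal sketch of the classic multiplicative-update rule (the Winnow2 variant) on sparse examples, where each example lists only its active features. It is not SNoW's code or architecture; the function names, the threshold of n/2, the factor of 2, and the toy task (learning the disjunction "feature 0 OR feature 3") are all assumptions made for illustration.

```python
from itertools import product


def winnow_predict(weights, theta, active):
    """Predict 1 if the summed weight of the active features reaches theta."""
    return 1 if sum(weights[i] for i in active) >= theta else 0


def winnow_train(examples, n_features, alpha=2.0, max_epochs=100):
    """Train Winnow2 on (active_feature_indices, label) pairs.

    Weights start at 1. On a mistake, only the weights of the active
    features change: multiplied by alpha for a missed positive, divided
    by alpha for a false positive. Inactive features are never touched,
    which is what keeps updates cheap on sparse data.
    """
    weights = [1.0] * n_features
    theta = n_features / 2  # assumed threshold; other choices are possible
    for _ in range(max_epochs):
        mistakes = 0
        for active, label in examples:
            if winnow_predict(weights, theta, active) == label:
                continue
            mistakes += 1
            factor = alpha if label == 1 else 1.0 / alpha
            for i in active:
                weights[i] *= factor
        if mistakes == 0:  # a full pass with no mistakes: stop
            break
    return weights, theta


def make_disjunction_data(n_features, relevant):
    """Enumerate all binary vectors, labeled 1 if any relevant feature is on."""
    data = []
    for bits in product([0, 1], repeat=n_features):
        active = [i for i, b in enumerate(bits) if b]
        label = 1 if any(i in relevant for i in active) else 0
        data.append((active, label))
    return data


if __name__ == "__main__":
    data = make_disjunction_data(8, {0, 3})
    weights, theta = winnow_train(data, 8)
    print("threshold:", theta)
    print("weights:", weights)
```

        The point of the sketch is only that the relevant features end up with large weights while irrelevant ones are pushed down, and that each update costs time proportional to the number of active features rather than the total number of features.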

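       In recent years Roth has also followed the fashion and moved into learning on structured data. Unlike others, who try to build structural information in at learning time (CRFs being the typical example), Roth advocates adding constraints and performing inference at the final prediction stage. This can greatly improve learning efficiency, and on some applications it has also produced better results. Then there is kernel learning and the like; I suppose he has too many students to keep busy and so has had to open up new territory.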

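        Coming from Harvard, Roth also has a very deep theoretical foundation; much of his work on statistical learning theory is not the sort of thing an engineering person like me follows.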

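       A personal addition: a strong machine learning researcher at Nanjing University also goes by the online name Daniel. I wonder whether that has anything to do with what is said above, haha.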

  • The importance of stupidity in scientific research (reposted from the Weiming BBS)  2009-04-04 21:44

    From: emptyhb (bio02|Flying with a dream), Board: AdvancedEdu
    Subject: Why and how Ph.D.
    Site: PKU Weiming BBS (Saturday, April 4, 2009, 03:36:41), forwarded
       

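    Entering the third year of my Ph.D., I have gained a lot of insight but not many results yet @@
    Still, so far I am enjoying it and want to keep doing this for life, so I consider the first three years well spent.
    Today a friend sent me an essay in which a professor shares some reflections on the Ph.D. I strongly agree with it, especially: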
    "I don't think students are made to understand how hard it is to do research.
    And how very,very hard it is to do important research"

     First published online May 20, 2008
    doi: 10.1242/jcs.033340
    Journal of Cell Science 121, 1771 (2008)
    Published by The Company of Biologists 2008

        Essay
        The importance of stupidity in scientific research
        Martin A. Schwartz
      
        Department of Microbiology, UVA Health System, University of
        Virginia, Charlottesville, VA 22908, USA
      
        e-mail: maschwartz@virginia.edu
      
        Accepted 9 April 2008
      
        I recently saw an old friend for the first time in many years. We
        had been Ph.D. students at the same time, both studying science,
        although in different areas. She later dropped out of graduate
        school, went to Harvard Law School and is now a senior lawyer for
        a major environmental organization. At some point, the
        conversation turned to why she had left graduate school. To my
        utter astonishment, she said it was because it made her feel
        stupid. After a couple of years of feeling stupid every day, she
        was ready to do something else.
      
        I had thought of her as one of the brightest people I knew and her
         subsequent career supports that view. What she said bothered me.
        I kept thinking about it; sometime the next day, it hit me.
        Science makes me feel stupid too. It's just that I've gotten used
        to it. So used to it, in fact, that I actively seek out new
        opportunities to feel stupid. I wouldn't know what to do without
        that feeling. I even think it's supposed to be this way. Let me
        explain.
      
        For almost all of us, one of the reasons that we liked science in
        high school and college is that we were good at it. That can't be
        the only reason – fascination with understanding the physical
        world and an emotional need to discover new things has to enter
        into it too. But high-school and college science means taking
        courses, and doing well in courses means getting the right answers
         on tests. If you know those answers, you do well and get to feel
        smart.
      
        A Ph.D., in which you have to do a research project, is a whole
        different thing. For me, it was a daunting task. How could I
        possibly frame the questions that would lead to significant
        discoveries; design and interpret an experiment so that the
        conclusions were absolutely convincing; foresee difficulties and
        see ways around them, or, failing that, solve them when they
        occurred? My Ph.D. project was somewhat interdisciplinary and, for
         a while, whenever I ran into a problem, I pestered the faculty in
         my department who were experts in the various disciplines that I
        needed. I remember the day when Henry Taube (who won the Nobel
        Prize two years later) told me he didn't know how to solve the
        problem I was having in his area. I was a third-year graduate
        student and I figured that Taube knew about 1000 times more than I
         did (conservative estimate). If he didn't have the answer, nobody
         did.
      
        That's when it hit me: nobody did. That's why it was a research
        problem. And being my research problem, it was up to me to solve.
        Once I faced that fact, I solved the problem in a couple of days.
        (It wasn't really very hard; I just had to try a few things.) The
        crucial lesson was that the scope of things I didn't know wasn't
        merely vast; it was, for all practical purposes, infinite. That
        realization, instead of being discouraging, was liberating. If our
         ignorance is infinite, the only possible course of action is to
        muddle through as best we can.
      
        I'd like to suggest that our Ph.D. programs often do students a
        disservice in two ways. First, I don't think students are made to
        understand how hard it is to do research. And how very, very hard
        it is to do important research. It's a lot harder than taking even
         very demanding courses. What makes it difficult is that research
        is immersion in the unknown. We just don't know what we're doing.
        We can't be sure whether we're asking the right question or doing
        the right experiment until we get the answer or the result.
        Admittedly, science is made harder by competition for grants and
        space in top journals. But apart from all of that, doing
        significant research is intrinsically hard and changing
        departmental, institutional or national policies will not succeed
        in lessening its intrinsic difficulty.
      
        Second, we don't do a good enough job of teaching our students how
         to be productively stupid – that is, if we don't feel stupid it
        means we're not really trying. I'm not talking about `relative
        stupidity', in which the other students in the class actually read
         the material, think about it and ace the exam, whereas you don't.
         I'm also not talking about bright people who might be working in
        areas that don't match their talents. Science involves confronting
         our `absolute stupidity'. That kind of stupidity is an
        existential fact, inherent in our efforts to push our way into the
         unknown. Preliminary and thesis exams have the right idea when
        the faculty committee pushes until the student starts getting the
        answers wrong or gives up and says, `I don't know'. The point of
        the exam isn't to see if the student gets all the answers right.
        If they do, it's the faculty who failed the exam. The point is to
        identify the student's weaknesses, partly to see where they need
        to invest some effort and partly to see whether the student's
        knowledge fails at a sufficiently high level that they are ready
        to take on a research project.
      
        Productive stupidity means being ignorant by choice. Focusing on
        important questions puts us in the awkward position of being
        ignorant. One of the beautiful things about science is that it
        allows us to bumble along, getting it wrong time after time, and
        feel perfectly fine as long as we learn something each time. No
        doubt, this can be difficult for students who are accustomed to
        getting the answers right. No doubt, reasonable levels of
        confidence and emotional resilience help, but I think scientific
        education might do more to ease what is a very big transition:
        from learning what other people once discovered to making your own
         discoveries. The more comfortable we become with being stupid,
        the deeper we will wade into the unknown and the more likely we
        are to make big discoveries.

  • Reposting an article from an electronic engineering site; quite interesting.

    *Subject: Survivorship bias, a very important concept in logic*

     In 1941, the Second World War was raging.

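     One day Professor Abraham Wald, a well-known statistician at Columbia University in the United States, received an unexpected visitor: an operations commander from the British Royal Air Force. He said: "Professor Wald, every time our pilots set out on a bombing mission, the report we most dread hearing is: 'Calling headquarters, I've been hit!' Please help us with this problem, on which our pilots' lives depend."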

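     Wald took on this urgent study. He was commissioned to analyze data on German ground fire hitting Allied bombers and, drawing on his statistical expertise, to recommend how the airframe armor should be reinforced to reduce the chance of being shot down. With the aviation technology of the day, however, armor could be strengthened only in places; otherwise the aircraft would be too heavy, making takeoff difficult and handling sluggish. Wald plotted the bullet-hole data from the Allied bombers as two comparison charts. His study found that the wings were the part most often hit, while the cockpit and the tail were hit least. The Royal Air Force was very satisfied with Wald's thorough analysis.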

     But at the meeting where the findings were reported, a heated debate broke out.

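     The operations commander in charge of the project said: "Professor Wald's study clearly shows that the wings of Allied bombers are riddled with bullet holes and are the part most easily hit. We should therefore reinforce the wing armor."
     Wald replied politely but firmly: "General, I respect your expertise in flying, but I see it completely differently. I recommend reinforcing the armor around the cockpit and the tail engine section, because that is where the fewest bullet holes were found."
     As the whole room looked on in astonishment and doubt, Wald explained: "The sample I analyzed contains only bombers that made it back to base. From a statistical point of view, I believe that bombers hit in the wings many times can apparently still return safely, whereas the parts where bullet holes are rarely found are not parts that never get hit; rather, once they are hit, the plane has no chance of returning at all."
     The commander objected: "I admire Professor Wald for daring to draw such a bold inference without any flying experience. Speaking for myself, I have taken severe wing damage on missions many times, and had it not been for my seasoned flying skills and some good luck, I would have crashed and died long ago. So I still strongly maintain that the wing armor should be reinforced." With the two views deadlocked, the Air Minister was left deep in thought.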

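     Should he trust this combat-hardened flying general, or a statistician standing alone against the consensus?
     With the war situation urgent and no time for further study, the minister decided to accept Wald's recommendation and immediately reinforced the protective armor of the cockpit and the tail engines. Before long, the proportion of Allied bombers shot down did fall markedly. To confirm that the decision was right, some time later the British military had agents behind enemy lines collect wreckage from some of the Allied aircraft that had crashed in Germany. Just as Wald had predicted, their hits were concentrated mainly in the cockpit and the engines.

     The bullet holes you cannot see are the deadliest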

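     At first glance, the commander's decision to reinforce the wing armor looks entirely reasonable, but he overlooked one fact: the distribution of bullet holes is seriously biased data. The most critical data actually lie on the planes that were shot down, yet those planes cannot be observed. The bullet-riddled wings are therefore the sturdiest part of the aircraft. By paying too much attention to the bullet holes he could see, the commander nearly made the wrong decision.
     This case holds two lessons that deserve particular attention.

     The dead and the captured cannot speak

     First, collecting more data will not improve the quality of the decision. Because the source of the bullet-hole data is itself seriously biased, working hard to collect more of it will probably only deepen the original misunderstanding.
     Second, calling in more combat-experienced pilots for expert opinions will not improve the decision either, because those pilots are themselves part of the process that produces the biased data. They are all pilots who returned safely; they may have had their wings hit, but none of them is a "martyr" who was hit in the cockpit or the engine.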
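
     The selection effect in this story can be reproduced with a small simulation. The sketch below is only an illustration under made-up assumptions: each plane takes exactly one hit, hits land uniformly across four sections, and the per-section loss probabilities in `LOSS_PROBABILITY` are invented numbers, not historical data.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit in each section brings the plane down.
LOSS_PROBABILITY = {"wing": 0.1, "fuselage": 0.1, "cockpit": 0.8, "engine": 0.8}


def simulate_raids(n_planes, seed=0):
    """Return (hits seen on returned planes, hits on lost planes) per section."""
    rng = random.Random(seed)
    sections = list(LOSS_PROBABILITY)
    returned, lost = Counter(), Counter()
    for _ in range(n_planes):
        section = rng.choice(sections)  # every section is equally likely to be hit
        if rng.random() < LOSS_PROBABILITY[section]:
            lost[section] += 1          # never observed back at base
        else:
            returned[section] += 1      # the only data the analyst gets to see
    return returned, lost


if __name__ == "__main__":
    returned, lost = simulate_raids(10_000)
    for section in LOSS_PROBABILITY:
        print(f"{section:9s} seen on returned planes: {returned[section]:5d}   "
              f"on lost planes: {lost[section]:5d}")
```

     Although every section is hit equally often in this toy model, the returned planes show many wing hits and few engine hits, simply because planes hit in the engine rarely come back to be counted.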

     Put simply, the harder they stare at the bullet holes they can see, the further they drift from the truth.

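     In computing there is the saying "Garbage In, Garbage Out": if the premise (or assumption) is wrong, no statistical formula or method, however elegant, and no amount of data can make the inferences that follow correct.
     In management practice and in everyday life, much of the critical data, as in the bomber case above, goes unobserved because of "failure".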

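     In his book 《决胜》, Professor Liu Shunren (刘顺仁) of National Taiwan University illustrates survivorship bias with examples that are the most vivid, apt, and clear of any book I have read. If a 70-year-old man says on television that he owes his long life to smoking a pack of cigarettes and chewing a pack of betel nuts every day, remember that the dead cannot go on television.
     By the same logic, the fact that long-lived elderly people in some place eat or drink a certain thing does not make that thing a miracle health food.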

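     Here is another example, a swindle (which has by now evolved into an e-mail version).
     On January 2 you receive an anonymous letter telling you the market will rise this month. The market does rise, but you think nothing of it, since everyone knows about the January effect (historically, stocks have risen more often than fallen in January).
     On February 1 you receive another letter telling you the market will fall, and once again the letter turns out to be right.
     On March 1 you receive another, and the same thing happens. By July you are very interested in the anonymous writer's foresight, and he invites you to invest in an overseas fund.
     So you put in all your savings, and two months later the money is gone for good, like a meat bun thrown at a dog.
     You sob on your neighbor's shoulder, and he tells you that he also received two of these mysterious letters, but they stopped after the second.
     The first letter's prediction was right, he says, but the second was wrong.
     What is going on?
     The swindlers' trick is this: they pick ten thousand names out of the phone book and send a letter predicting a rise to half of them and a letter predicting a fall to the other half.
     A month later, five thousand people have received a correct prediction, and the swindlers repeat the procedure on those five thousand.
     After another month, two thousand five hundred people have received two correct predictions, and so on until about five hundred people are left on the list, of whom two hundred will fall for the scam.
     So for a few thousand dollars in postage, the swindlers can take in millions of dollars.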
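
     The arithmetic of the mailing list is just repeated halving. A minimal sketch, assuming the list is cut exactly in half each month (the function name `scam_rounds` is made up for this illustration):

```python
def scam_rounds(recipients, rounds):
    """Number of people who have seen only correct predictions after each round."""
    remaining = recipients
    history = []
    for _ in range(rounds):
        remaining //= 2  # half of the list received the prediction that came true
        history.append(remaining)
    return history


if __name__ == "__main__":
    for month, count in enumerate(scam_rounds(10_000, 4), start=1):
        print(f"after letter {month}: {count} people have seen only correct calls")
```

     Starting from ten thousand names, four letters leave 625 people who have seen a perfect track record, which is roughly the five hundred mentioned in the story; none of them can see the 9,375 people who were dropped along the way.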
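     Now vary the method a little.
     A swindler poses as an investment adviser recruiting members and tells you that you can first join as a regular member, then upgrade to VIP once you find him accurate. What makes this variation cleverer is that the swindler makes money from the start, and on top of that the VIP members help build his reputation by vouching for how accurate he is: survivorship bias.

     It works as long as information does not circulate, so that other people do not know how (in)accurate this fake adviser is.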
    --

  • From: Thermophile (Jay), Board: Faculty
    Subject: On the rules of faculty hiring at US universities
    Site: MITBBS (未名空间) (Wed Jan 14 20:56:55 2009)

    On the rules of faculty hiring at US universities

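    Despite the poor economy, our department's search for an assistant professor (AP, in the bio area) was not affected this year. The department received more than 150 applications. Of the final 6 interviewees, two are already professors (one did not get tenure at a top-10 school; the other wants to move to our university from a school ranked 60 to 70). The on-site interviews have begun. I have many thoughts (perhaps not all correct), which I am writing down here to share.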


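    The search committee's hiring process has three steps. 1. Decide the short list: the committee picks roughly 30 to 40 candidates and writes to their references for recommendation letters. 2. Decide the on-site interview list (5 to 6 people) and invite them to give two talks in the department: a public seminar (the candidate presents their research results) and a chalk talk attended only by the department's faculty (discussing future research and teaching plans). 3. Combine the faculty's opinions and decide the final offer and the waiting list.

    On the surface, success in a faculty application depends on three factors. 1. The applicant's academic ability (reflected mainly in the number and quality of papers; applicants generally need 3 to 5 papers in the top journals of their field, preferably recent). 2. The applicant's research direction; for example, bioenergy has been hot for the past two years, and many of my friends in synthetic biology have landed teaching positions at very good universities. 3. The applicant's recommendation letters (3 to 5), which assess the applicant's research and teaching ability and character.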

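    Some Chinese scholars do excellent work and have good recommendation letters, yet find the road to a faculty position very hard, because faculty hiring has many unwritten rules that work against us. The first is discrimination by pedigree. Looking for a faculty job is like looking for a spouse, and academia's "phoenix men" (outstanding male scholars who did not come from elite schools) get the same cold reception. For example, if the heavyweight professors on the department's search committee came out of MIT and Harvard, they will look out for junior alumni of their own schools, and they may even be the former advisors, students, or classmates of those applicants' references. The second is gender. Take our department: the job advertisement says equal opportunity for men and women, and only when the on-site interview list came out did I understand what "gender equality" meant. For one position there were over a hundred applications, fewer than 15% from women; the on-site interview list had three women and three men. There are relatively few women in science and engineering, and the president and the dean keep encouraging departments to hire more female faculty, so with the same background on the CV, Monica may get an offer while Bill may not even make the short list (so female compatriots might consider adding an English first name). If the applicant is a Black woman, she is an absolute treasure for the department, and even if her scholarship is only so-so her future is bright. Chinese scholars face one more big disadvantage when applying: there are simply too many applicants from China (nearly half of those applying to our department, for example, are Chinese). So, "all else being equal", the search committee is more willing to give Americans and Europeans a chance.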

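    How do you deal with these rules if you want to become a professor? First, references matter a great deal. Ideally they are academic heavyweights (academy members, for example) or people with power and money (the department chair, or senior figures at NIH or NSF). Most applicants have letters from a few full professors, and if the references have no particularly notable influence, their letters count for only so much. Of course, applicants from big-name labs are limited in number, and sometimes they turn down our offer to go to a top-10 school, so in recent years the department has also hired new professors who graduated from ordinary schools. Second, without a big name behind you, you have to rely on your own papers. I did my PhD at a western university ranked 30 to 50. A Chinese classmate in my year joined a newly arrived Chinese AP and was worked very hard by his advisor, but he was the first in the class to land a faculty job (at a school near the Great Lakes). He did both biological experiments and mathematical modeling, and within a few years he had more than a dozen first-author papers; his resume stunned people at once. So even if your background is weaker, if you can publish a dozen or so JBC papers you can still find a good faculty position. Of course, it is hard to churn out that many papers in a journal like JBC with traditional biological experiments alone. So be sure to learn practical techniques such as statistics, computational modeling, mass spectrometry, and imaging. The more research methods and techniques you master, the easier it is to produce data and publish, and you will also have an edge if you later move to industry or return to China.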

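    Third, there are things to watch in the application process itself. 1. Have at least three Americans proofread your application materials; there must be no grammatical errors, and you must not leave the search committee with the impression that Chinese applicants have weak English and would struggle to teach. 2. Making contact is also effective. When applying, you can ask professors in the department what kind of person they are looking for; you can approach them at a conference or write to them directly. When I did this, I met an extremely kind Chinese professor who helped me a great deal. 3. In the chalk talk, besides presenting your future work well, it is best to say how you plan to collaborate with other professors in the department on teaching and research. 4. Do not oversell yourself; it is best to mention the shortcomings of your work and how you would address them, so as not to irritate your peers. 5. At the interview, mind your dress and table manners as best you can, do not come across as uncouth, and do not pry into other people's private affairs.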

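    If you really do not love doing biological research, you should change fields as early as possible. America's baby boom generation is about to retire, and before long the US will be short of large numbers of engineers and technical staff (American kids are weak at math). If you get the chance, try to move toward mathematics, accounting, engineering, and similar fields. Students with a biology background might also consider interdisciplinary fields such as Bioinformatics, Systems Biology, and Biostatistics, where faculty jobs are also relatively easier to find.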

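    Several of my friends are APs, so let me also say something about what being an AP is like. As far as I know, a biology AP's starting annual salary is roughly 100,000 to 120,000 USD, and the startup package is usually 400,000 to 600,000 (of which a full set of lab instruments and equipment takes about 200,000 to 300,000). The first two years as an AP are very hard. APs have trouble recruiting students (probably because the reputation of APs has been ruined on the BBS). Getting funding is harder still. A lab (counting two students and two postdocs) costs 250,000 a year. NIH and NSF funding is fiercely competitive (and also has to accommodate women and minorities), while money from DOE and foundations requires inside connections, so you have to write many proposals every year to keep a lab alive. If the AP's English is weak and he or she lacks teaching experience, the teaching pressure will also be heavy. Moreover, personal relations at a university are far more complicated than at a company. Besides staying on good terms with every professor in the department, mentoring their own students, and taking part in all kinds of departmental service, an AP has to deal with the university's safety inspections, the bigwigs at funding agencies, journal editors, reviewers, and the big and small names in the same field. A problem at any one of these points can affect tenure later. So every day the work comes crashing down like an avalanche, with little time to do experiments in person, let alone enjoy life. (Sometimes I quite miss my postdoc days, when I could play tennis and Shengji, a card game, on weekends.)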

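    America is just like China in having many unfair sides. Sometimes you feel like the fisherman in The Old Man and the Sea: however hard you work, the final outcome is hard to control. So in life, besides holding on to your ideals, keep a calm and even mind. There is no need to envy the "successful", because a person's achievements do not depend on ability and diligence alone. Think of a giant like James Bailey, and you realize that everything in life is fleeting and only the glory of the Heavenly Father is eternal. Mark 10:31: But many who are first will be last, and the last, first.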
  • http://matt-welsh.blogspot.com/2009/02/plight-of-poor-application-paper.html
    Volatile and Decentralized

    Saturday, February 28, 2009

    The plight of the poor application paper

    My research is unapologetically applications-driven: we've deployed sensor networks for monitoring volcanoes, disaster response, and for measuring limb movements in patients with Parkinson's Disease. One of the joys of working on sensor networks is that a lot of exciting research derives from close collaborations with domain experts, shedding light on challenges that we wouldn't otherwise be exposed to. It also keeps us in check and ensures we're working on real problems, rather than artificial ones.

    At the same time, it's a sad truth that "deployment" or "application" papers often face an uphill battle when it comes to getting published in major conferences. I've seen plenty of (good!) application-focused papers get dinged in program committees for, well, simply not being novel enough. Now, we could have a healthy argument about the inherent novelty of building a real system, getting it to work, deploying it in a challenging field setting, and reporting on the results. But it's true that these papers are pretty different than those about a new protocol, algorithm, or language. I've thought a bit about what makes it harder for apps papers to get into these venues and have come up with the following observations.

    1) Getting something to work in the real world often involves simplifying it to the point where most of the "sexy" ideas are watered down.

    It is very rare for a successful sensor network deployment to involve brand-new, never-before-published techniques; doing so would involve a tremendous amount of risk. Generally it's necessary to use fairly robust code that embodies well-worn ideas, at least for the underpinnings of the system design (MAC, routing, time sync, and so forth). As a result, the components of the system design might end up not being very novel. Also, many application papers involve a combination of several "well known" techniques, but combined together in interesting ways. Still, when a reviewer picks apart a paper piece by piece, it's hard to identify the individual contributions. The hope is that the whole is greater than the sum of the parts; but this is often difficult to convey.

    There is a way to avoid this problem, and that is to write the paper about something other than the "mundane" aspects of the system design itself. For our OSDI paper on the volcano sensor network, we decided to focus on the validation of the network's operation during the deployment, not the individual pieces that made up the system. Although it took a lot of work to take the "well-tested" implementations of major components (such as MultihopLQI) and get them to work robustly in the field, we didn't think the paper could rest on that refinement of previously-published ideas. The Berkeley paper on monitoring redwoods took a similar approach by focusing on the data analysis.

    2) Academic research tends to reward those who come up with an idea first, not those who get the idea to work.

    There are lots of great ideas in the literature that have only been studied in simulation or small-scale experiments. Almost no credit goes to those who manage to get an idea actually deployed and working under less certain conditions. So even though it might take an incredible amount of sweat to take, say, a routing protocol and get it working on real hardware in a large-scale field deployment, unless you ended up making substantial changes to the protocol, or learned something new about its operation, you're unlikely to get much credit for doing so.

    We learned this the hard way with our paper on adapting the ADMR multicast protocol to work on motes, which we needed for the CodeBlue medical monitoring platform. It turns out that taking an existing protocol (which had only been studied using ns-2 with a simplistic radio model, and without consideration for memory or bandwidth limitations of mote-class devices), and implementing it on real hardware, didn't blow away the program committees the way we hoped it would. Eventually, we did publish this work (in the aptly-named REALMAN workshop). But the initial reviews contained things like "everybody knows that MANET protocols won't work on motes!" That was frustrating.

    3) Deployments carry a substantial risk that the system won't actually work, making it harder to convince a reviewer that the paper is worth accepting.

    Maybe there should be a built-in handicap for real deployment papers. Whereas in the lab, you can just keep tweaking and rerunning experiments until you get the results you want, this isn't possible in the field. On the other hand, it's not clear that we can really hold deployment papers to a different standard; after all, what constitutes a "real" deployment? Is an installation of nodes around an academic office building good enough? (We've seen plenty of those. If the world ever wants to know the average temperature or light level of the offices in a CS department, we are ready!) Or does it have to be in some gritty, untethered locale, like a forest, or a glacier? Does use of machetes and/or pack animals to reach the deployment site count for anything?

    Of course, it is possible to get a great paper out of a deployment that goes sideways. The best way is to write the paper as a kind of retrospective, explaining what went wrong, and why. These papers are often entertaining to read, and provide valuable lessons for those attempting future work along the same lines. Also, failures can often take your research into entirely new directions, which I've blogged about before. As an example, we ended up developing Lance specifically to address the data quality challenges that arose in our deployment at Reventador. We would have never stumbled across that problem had our original system worked as planned.

    One thing I don't think we should do is sequester deployment and application papers in their own venues, for example, by having a workshop on sensor networks applications. I understand the desire to get like-minded people together to share war stories, but I think it's essential that these kinds of papers be given equal billing with papers on more "fundamental" topics. In the best case, they can enrich an otherwise dry technical program, as well as inspire and inform future research. Besides, the folks who would go to such a workshop don't need to be convinced of the merits of application papers.

    Personally, I'd like to see a bunch of real deployment papers submitted to Sensys 2009. Jie and I are thinking of ways of getting the program committee to think outside the box when reviewing these papers, and any suggestions as to how we should encourage a more open-minded perspective are most welcome.
    Posted by Matt Welsh at 1:18 PM
    9 comments:

    Ramakrishna Gummadi said...

        There's a fourth concern I have heard with building and deploying artifacts, which is their relative transience vis-a-vis ideas.

        One way to counter this handicap is for the community to promote application-driven research that either validates or points out significant drawbacks in making ideas work in practice. For example, in fields such as experimental Physics, it's even possible to obtain a Ph.D. for repeating prior ideas or claims carefully.
        March 1, 2009 8:10 PM
    Barath said...

        "Maybe there should be a built-in handicap for real deployment papers."

        Another, perhaps less popular approach might be to require that all papers (in OSDI, for example) release the source code / test scripts used in the experiments described in the paper. This would shine a light on papers that were based upon unrealistic simulations or that don't deal with hard implementation details, and might tip the balance back toward application papers.
        March 1, 2009 10:03 PM
    Ramakrishna Gummadi said...

        "Another, perhaps less popular approach might be to require that all papers (in

    OSDI, for example) release the source code / test scripts used in the experiments

    described in the paper."

        I think SIGMOD made this stuff a mandatory requirement last year. I'm not sure if the results changed significantly.
        March 2, 2009 11:26 AM
    Matt Welsh said...

        I'm not sure that releasing source code is going to help matters much. PC members are overwhelmed as it is and I doubt that anyone would have time to take a serious look at the code/scripts/etc. when evaluating a paper. I guess it's a good idea in the sense that you could be "audited" at any time by a reviewer, but, it also opens up the potential for abuse where a PC member shoots down a paper due to lack of understanding of the code (or not liking the coding style, or some other trivial issue).
        March 2, 2009 5:01 PM
    Barath said...

        True, releasing code wouldn't help during the review process. However, it might help on a longer timescale - it would enable repeatability of experiments (something that the community talks about but doesn't do that much in practice), and might keep folks from submitting papers they can't back up with code by the time the camera ready is due. That in turn would (indirectly) help application papers.

        That said, it might prove too unpopular...
        March 3, 2009 1:17 PM
    ephermata said...

        How about having people explicitly mark as part of the submission process that their paper should be considered as an application paper, then allocating reviewers accordingly? For SenSys in particular, there should be plenty of PC members and subreviewers who have the background and experience to judge these papers. There are plenty of issues in figuring out what is a good application paper, several of which you have raised. Most of these seem to come down to needing reviewers that have the appropriate "taste" in judging the paper.

        Having authors mark "this is an application paper, judge it by those criteria" at submission time would save time and match the paper to the right people. Of course you would not necessarily have every single PC member on the paper be an "applications person," just to keep things from becoming too inbred. Still, it would be a way to fairly evaluate such papers and give them a fighting chance without splitting off into a separate conference.

        I agree with the concern about splitting into a separate conference, by the way. I have seen cases where creating a new conference or workshop pays off with great new research (e.g. Privacy Enhancing Technologies, Usenix Electronic Voting Technologies), but in general I worry about fields becoming balkanized. Hard enough as it is to keep up with all the work coming out in the main focus of an area.
        March 3, 2009 9:12 PM
    Michael Mitzenmacher said...

        Hi Matt. I sympathize. I admit I kind of like the idea of "marking" a paper as an applications paper in some way, although one would hope that most people in the area would be able to read and judge such a paper appropriately.

        I've actually just put up a post I've been tinkering with for a few weeks on the plight of the poor theory paper for networking/systems conferences. Good timing. :)
        March 5, 2009 6:57 AM
    Ramakrishna Gummadi said...

        "I admit I kind of like the idea of "marking" a paper as an applications paper

    in some way"

        I think Mike's suggestion is good. For what it's worth, SIGCOMM 2009 has "focus" tags such as "system implementation" in addition to the traditional topics classifiers.
        March 5, 2009 11:17 PM
    HfHoeP5ovPKs1QkXrzDOvVuASQELNZnVYco- said...

        Matt,

        Welcome to my world. I think that in today's system, research credit through real deployments comes not so much through traditional measures as through the respect of one's peers, as measured through your tenure letters. As we know, this measurement depends greatly on who is asked.

        Over the years I have squeezed out quite a few papers because it is hard to build something serious without learning *something* new, which you can then publish. It is a harder sell to solve an old problem in a new way, but sometimes that can be done, too.

        Norman
        March 17, 2009 3:59 PM
