◇◇新语丝(www.xys.org)(xys4.dxiong.com)(www.xysforum.org)(xys2.dropin.org)◇◇

Where Is the Objectivity in "Evidence of Plagiarism by Zhou Zhihua, Changjiang Scholar Distinguished Professor at Nanjing University" (《南京大学长江学者特聘教授周志华剽窃的证据》)?

Author: Hui Huang

I was very disappointed to see that, in the article "更多" ("More"), the author writing under the name "一个知耻的人" ("someone who knows shame") once again offers subjective impressions as evidence. The "更多" article even contains a formula derivation purporting to show that Zhou Zhihua's paper can be obtained from Perrone & Cooper's paper. By that standard, this author might as well go and accuse Einstein's general relativity of plagiarizing Riemannian geometry. I also notice that this author brings up "10 years" yet again:

"If it is not plagiarism, did the whole thing, from the idea to the main details to the main results, somehow find its way, complete, into your heads about 10 years later?"

I do not know why he insists on saying that Zhou Zhihua's work plagiarized a paper "from 10 years earlier", when Zhou Zhihua's own paper is clearly already 8 years old.

To determine whether there was plagiarism, what we need to hear are the views of peers in the field.

Why peer views? As research on the same subject progresses, the questions studied inevitably become more and more fine-grained, so an outsider may not be able to tell apart papers that address the same problem. In that situation, peer judgment is the most accurate standard for deciding whether a paper is plagiarized. I therefore surveyed what peers have said; my findings are below.

Given the habitual distrust of mainland scholars here, I have excerpted three papers from abroad. All three cite both Perrone & Cooper's paper and Zhou Zhihua's paper, and they do more than merely cite them: they comment on both. From these comments we can see how genuine peers view the two papers.

[Reference 1]: http://www.dcs.shef.ac.uk/intranet/teaching/projects/archive/ug2004/pdf/u1sn.pdf

This is a degree thesis from the UK; the author's institution is The University of Sheffield.

Its comment on Perrone & Cooper's paper ([57] is that paper):

Selection has been mentioned in many papers. A heuristic selection method was used by Perrone and Cooper, [57] where they train a population of nets and order them in terms of increasing mean squared error. Nets with the lowest mean squared error are combined in an ensemble.

Its first comment on Zhou Zhihua's paper ([39] is that paper). Note in particular that the idea of "selecting only some of the nets" is attributed to three references, namely [39, 45, 51], and Perrone & Cooper's paper is not among them:

Most ensemble approaches use all the nets available for combination, however if only some of the nets are combined it may be possible to achieve a higher generalisation. [39, 45, 51]

Its second comment on Zhou Zhihua's paper ([39] is that paper):

More recently, Zhou et al. [39] claim most approaches ensemble all the available networks rather than selecting some of the component nets and ensembling them which they found to be a lot better. [39] They take a genetic algorithm approach to selection to show that many nets when ensembled can be better than all the nets. [39] They present an approach called GASEN, (Genetic Algorithm based Selective Ensemble) which firstly trains a number of neural nets. Random weights are then assigned to the neural nets and the genetic algorithm used to evolve the weights so that the fitness of the neural nets in the constituting ensemble can be characterised. To create the ensemble selection is made according to the evolved weights

Its third comment on Zhou Zhihua's paper ([39] is that paper):

A possible method proposed by Zhou, Wu and Tang [39] could be beneficial to evolve effective ensembles by selection. They present that the combination of "many could be better than all" [39] when ensembling neural networks. They argue that most approaches use all the available neural nets to create ensembles, even though the integrity of such an approach has not been formally proven. [39] Therefore, a selection of nets could be compared with the combination of all the nets to see what method returns the most effective ensemble.

It is clear that the thesis author regards Perrone & Cooper's paper and Zhou Zhihua's paper as having different subjects and does not consider Zhou's paper plagiarism.
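Since the Sheffield thesis quoted above describes the GASEN procedure only in words (train a pool of nets, evolve a weight vector with a genetic algorithm, then select according to the evolved weights), here is a minimal Python sketch of that kind of selective ensemble, just to make the idea concrete. It is not Zhou et al.'s implementation: the polynomial base models, the toy genetic algorithm, the validation-MSE fitness and the 1/N selection threshold are all simplifying assumptions made here for illustration.

# A minimal, illustrative sketch of the procedure described in the quote above:
# train a pool of base models, evolve a weight vector with a simple genetic
# algorithm against validation error, then keep only the models whose evolved
# weight is large. This is NOT Zhou et al.'s GASEN implementation; the
# polynomial base models, GA settings, fitness function and the 1/N selection
# threshold are assumptions made here for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) + noise, split into training / validation sets.
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)
Xtr, ytr, Xva, yva = X[:300], y[:300], X[300:], y[300:]

def train_base_model(Xs, ys, degree=5):
    # Least-squares polynomial fit on a bootstrap sample (a stand-in for a neural net).
    A = np.vander(Xs[:, 0], degree + 1)
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return lambda Xq: np.vander(Xq[:, 0], degree + 1) @ coef

# 1. Train a pool of N base models on bootstrap resamples of the training data.
N = 10
models = []
for _ in range(N):
    idx = rng.integers(0, len(Xtr), len(Xtr))
    models.append(train_base_model(Xtr[idx], ytr[idx]))
preds_va = np.column_stack([m(Xva) for m in models])      # shape (n_val, N)

def ensemble_mse(w):
    # Validation MSE of the weighted ensemble; lower means fitter.
    w = np.abs(w) / np.abs(w).sum()
    return np.mean((preds_va @ w - yva) ** 2)

# 2. Evolve the weight vector with a very small genetic algorithm
#    (truncation selection plus Gaussian mutation; no crossover, for brevity).
pop = rng.random((40, N))
for _ in range(100):
    fitness = np.array([ensemble_mse(w) for w in pop])
    parents = pop[np.argsort(fitness)[:20]]
    children = parents[rng.integers(0, 20, 20)] + 0.05 * rng.normal(size=(20, N))
    pop = np.vstack([parents, np.abs(children)])

best_w = pop[np.argmin([ensemble_mse(w) for w in pop])]
best_w = best_w / best_w.sum()

# 3. Keep only the models whose evolved weight exceeds the average (1/N) and
#    combine them with equal weights; compare with combining all N models.
selected = np.where(best_w > 1.0 / N)[0]
mse_all = np.mean((preds_va.mean(axis=1) - yva) ** 2)
mse_sel = np.mean((preds_va[:, selected].mean(axis=1) - yva) ** 2)
print(f"kept {len(selected)}/{N} models; MSE all={mse_all:.4f}, selected={mse_sel:.4f}")

The sketch only shows the flavour of "selecting some rather than all" that the quoted reviewers attribute to Zhou et al.; whether the selected subset actually beats the full pool depends on how correlated the individual models are, which relates to the correlation point raised in Reference 3 below.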
[Reference 2]: Niall Rooney, David Patterson, Chris Nugent. Pruning extensions to stacking. Intelligent Data Analysis, 10(1):47-66, 2006.

The authors' institution is University of Ulster at Jordanstown.

The paper's comment on Perrone & Cooper's work ([23] is that paper), which moreover discusses BEM and GEM separately:

The simplest ensemble method for regression is referred to as the Basic Ensemble Method (BEM) [23]. BEM sets the weights αi to be equal to 1/N. This method does not take into account the individual performances of the base models. Bagging [2] is equivalent in its integration approach to BEM, however it requires that the base models be generated using random sampling with replacement. The generalised ensemble method (GEM) and Linear Regression (LR) were developed to give more "optimal" weights to each base model. However, both GEM and LR techniques may suffer from a numerical problem known as the multi-collinear problem. This problem is a consequence of a situation arising where one or more models can be expressed as a linear combination of one or more of the other models. One approach to ameliorate multi-collinear problems is to use weight regularization. An example of this is where the weights are constrained to sum to one.

Its comment on Zhou Zhihua's paper ([36] is that paper):

It has been shown that given the presence of N models it is possible that an ensemble learner can perform better if it only uses a given subset of those models rather than all [36].

Clearly Niall Rooney, David Patterson and Chris Nugent comment on Perrone & Cooper's paper in detail, yet still treat Zhou Zhihua's paper as different work rather than plagiarism.

[Reference 3]: Zainal Ahmad, Jie Zhang. Selective combination of multiple neural networks for improving model prediction in nonlinear systems modelling through forward selection and backward elimination. Neurocomputing, 72(4-6): 1198-1204, 2009.

The first author's institution is Universiti Sains Malaysia; the second author's is Newcastle University.

The comment on Perrone & Cooper's paper ([19] is that paper) and the comment on Zhou Zhihua's paper ([30] is that paper). Note in particular that the two comments appear in a single continuous passage:

Excluding these networks could further improve the generalisation capability of the aggregated network. Perrone and Cooper [19] suggest a heuristics selection method whereby the trained networks are ordered in terms of increasing mean-squared errors (MSE) and only those with lower MSE are included in combination. However, combining these networks with lower MSE may not significantly improve model generalisation since these networks can be severely correlated. Zhou et al. [30] show that combining selected networks may be better than combining all individual networks and propose a genetic algorithm-based approach for selecting individual networks in an ensemble.

Zainal Ahmad and Jie Zhang use the contrastive word "however" to connect Perrone & Cooper's paper with Zhou Zhihua's, which shows that they regard the two papers as different.

Based on the survey above, my conclusion is that Zhou Zhihua's paper is not plagiarism.

I wonder whether accusing others without a basis in fact should itself count as academic misconduct.

(XYS20091025)