◇◇◇ New Threads (新语丝) (www.xys.org)(www.xys2.org)(groups.yahoo.com/group/xys) ◇◇◇

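He Baohong, Lin Shouxun, and Lin Zongkai of the Institute of Computing
Technology, Chinese Academy of Sciences, Plagiarized a Foreign Paper
(Four Items)

(Lin Zongkai is chair of the CAD Technical Committee of the China Computer
Federation and a research professor;
Lin Shouxun is deputy director of the Digital Technology Research
Laboratory, a research professor, and a doctoral supervisor.)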

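【Report】

Mr. Fang:

 Hello!

I am a master's student at Xinjiang University. While looking up material
related to my research topic I found some problems. I describe one of them
below and hope the relevant experts can establish the truth of the matter.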

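The table below is from the paper "The Research of Voice Quality Techniques
in the Internet Applications" (《因特网应用中音质技术的研究》) by He Baohong,
Lin Shouxun, and Lin Zongkai (CAD Open Laboratory, Institute of Computing
Technology, Chinese Academy of Sciences, Beijing 100080), published in
September 1999 in the Journal on Communications (通信学报), Vol. 20, No. 9.

(For the table, see http://www.xys.org/xys/ebooks/others/science/dajia2/hebaohong.gif)

The next table is from Henning Schulzrinne's "NEVOT Implementation and
Program Structure", written at GMD Fokus, Berlin, and completed on
February 9, 1996.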
timestamp    type       AS    cache                transmit

160          talking    2     -                    160 (1)
320          talking    2     -                    320 (2)
480          silent     1     -                    480 (3)
640          silent     0     -                    640 (4)
800          silent     0     800                  -
960          silent     0     800, 960             -
1120         silent     0     800, 960, 1120       -
1280         silent     0     960, 1120, 1280      -
1440         silent     0     1120, 1280, 1440     -
1600         talking    2     -                    1120 (5*), 1280 (6), 1440 (7), 1600 (8)
1760         talking    2     -                    1760 (9)
1920         talking    2     -                    1920 (10)

Table 4: Example of Talkspurt and Silence Handling
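
The behaviour shown in Table 4 can be reproduced with a short simulation.
This is only a sketch inferred from the rows of the table, not code taken
from NeVoT: the function name simulate_talkspurts and the parameter names
hangover and cache_size are hypothetical, and the values 2 and 3 are read
off the AS and cache columns.

```python
from collections import deque


def simulate_talkspurts(packets, hangover=2, cache_size=3):
    """Replay (timestamp, is_talking) packets and return one row per packet.

    Each row is (timestamp, after_spurt, cached_timestamps, transmitted),
    where transmitted is a list of (timestamp, sequence_number) pairs.
    The rules are inferred from Table 4 (an assumption, not NeVoT source).
    """
    after_spurt = 0                    # packets still to send after a talkspurt
    cache = deque(maxlen=cache_size)   # most recent silent packets
    seq = 0
    rows = []
    for ts, talking in packets:
        sent = []
        if talking:
            after_spurt = hangover
            # Send the cached lead-in packets first, then the current one.
            for t in list(cache) + [ts]:
                seq += 1
                sent.append((t, seq))
            cache.clear()
        elif after_spurt > 0:
            # Hang-over: keep transmitting briefly after speech stops.
            after_spurt -= 1
            seq += 1
            sent.append((ts, seq))
        else:
            # Plain silence: remember the packet, dropping the oldest if full.
            cache.append(ts)
        rows.append((ts, after_spurt, list(cache), sent))
    return rows


TABLE_4_INPUT = [(160, True), (320, True)] \
    + [(t, False) for t in range(480, 1600, 160)] \
    + [(1600, True), (1760, True), (1920, True)]
```

Calling simulate_talkspurts(TABLE_4_INPUT) yields the same AS, cache, and
transmit values as the twelve rows above, including the burst of four
packets sent at timestamp 1600.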

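Looking closely at the two tables, we find that they are almost identical,
except that the timestamp values in He Baohong et al.'s table are doubled.
Such a coincidence between two papers seems suspicious to me. Schulzrinne's
paper dates from 1996, he is a world authority on multimedia communications,
and besides he does not read Chinese, so I believe there is no possibility
that he copied from He Baohong et al. Most laughable of all, it is unclear
where the two entries 1760 (9) and 1920 (10) in He Baohong et al.'s table
came from. That such a paper could be published in China's authoritative
communications journal, and be "written" by PhDs and doctoral supervisors at
China's authoritative computing institution, makes me wonder whether there
is any hope left for Chinese science and technology. I prefer not to reveal
my name: I am still young, and in China's current environment I still have
to keep my head down. But I take responsibility for the truth of the above
account, and any consequences arising from it have nothing to do with you.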

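【Assessment 1】

Zhouzi:

I have read the institute's paper carefully and read through the German
paper once. Here is a preliminary answer, as quickly as I can give one.

The institute's paper should be regarded as no more than a report on an
experimental setup.

On the charitable reading:

These gentlemen at the institute took the German report and repeated the
experiment in the institute's existing environment. Judging by American
practice, the German source package NeVoT can probably be obtained for
free. (The German paper itself does not say how to download it.)

The institute's people are not stupid: their report (mostly rewritten)
reproduces only two parts of the German paper, and these are indeed its two
most valuable parts.

1. So-called "silence suppression" (the German paper's term: non-adaptive
silence detector)
2. Recovery handling for lost data packets.

My guess is that part of the work was done by the institute's people
themselves: porting the German software package from the Sun Microsystems
environment to the PC and, for the Windows environment, making some changes
to the buffer size.

This kind of practice is common in China. The paper was also rewritten,
shortened, focused on the key points, and phrased in typical Chinese
vocabulary.

Even an expert reading it would find it hard to object. Repeating an
experiment in a local Chinese environment does have some value.

The problem is that the institute's report, from beginning to end,
references included, never mentions the German author. Yet, thinking
themselves clever, they took over the German table after, as the Xinjiang
student pointed out, multiplying the audio sample counts by two.

The blunder was that the two entries in the lower right corner of the table
were copied without being multiplied by 2. That is evidence that would be
hard to escape even in court, and a laughingstock.

Conclusion: plagiarism!

(On the uncharitable reading, these people may not even have run the
experiment.)

The above is for reference only!

走着瞧 ("Wait and See")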

【Assessment 2】

Notations: Paper 1 represents He's paper
           Paper 2 represents Schulzrinne's paper

Paper 1 describes the design, implementation, and
testing of an Internet audio application, JPhone.

General comments: 

Paper 1 does not provide technical details on many mechanisms/algorithms
"proposed" or "designed" by the authors. They vaguely outlined ideas which are
well understood by the research community. The wording is extremely unclear. I
am really surprised to see that the design and implementation of a complicated
audio application can be described in less than 2.5 pages. I do not see any
contributions. I wonder how this paper passed the journal review process.

Specific comments:

A. Section 3.1 of Paper 1 is the last paragraph on page 7 of Paper 2.

From Paper 2, "This technique, called "hang-over" ensures that the
trailing parts of words, like low-energy aspirated sounds ("s" sounds),
are not chopped off."

I believe that the authors misunderstood "hang-over" and used it to
describe the scenario in which audio is chopped off at the end of a
talk-spurt. In addition, there are not a lot of words in Chinese that end
with "s" sounds.
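
The hang-over idea quoted above from Paper 2 can be illustrated with a
small sketch. This is not the code of either paper: the fixed energy
threshold of 500, the mean-absolute-amplitude measure, and the names
frame_energy and frames_to_send are assumptions chosen for illustration.

```python
def frame_energy(samples):
    """Mean absolute amplitude of one audio frame (an assumed measure)."""
    return sum(abs(s) for s in samples) / len(samples)


def frames_to_send(frames, threshold=500, hangover=2):
    """Decide, frame by frame, whether a frame is transmitted.

    A frame at or above the fixed threshold counts as speech. After speech
    ends, the next `hangover` frames are still sent so that quiet word
    endings are not chopped off.
    """
    decisions = []
    remaining = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            remaining = hangover      # speech: re-arm the hang-over counter
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1            # quiet, but still inside the hang-over
            decisions.append(True)
        else:
            decisions.append(False)   # silence: suppressed
    return decisions
```

With one loud frame followed by three quiet ones and another loud frame,
the first two quiet frames are still sent and only the third is suppressed.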

B. Table 1 of Paper 1 is extremely similar to Table 4 of Paper 2. The
differences are 

(1) The authors of Paper 1 double the number of samples in an audio packet.

(2) Fifth column shows the transmission timestamps and sequence numbers of
packets. The last two entries of the fifth column, i.e., 1760(9) and 1920(10),
are NOT correct and impossible. They should be 3520(9) and 3840(10). They did
do a good job of covering up their asses.

(3) Third column in Paper 2 is "AS" or After_Spurt. This field indicates the
number of additional packets that need to be transmitted after a talk-spurt
terminates. The authors of Paper 1 interpreted it as the "depth" of the cache,
which is completely wrong.
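
The arithmetic behind point (2) can be checked mechanically. The sketch
below doubles the transmit column of Table 4 in Paper 2 and compares the
result with the two entries quoted from Paper 1. The helper names are
hypothetical, and only the two entries named in this review are compared,
since the rest of Paper 1's table is not reproduced here.

```python
# Transmit column of Table 4 in Paper 2: (timestamp, sequence number).
PAPER2_TRANSMIT = [
    (160, 1), (320, 2), (480, 3), (640, 4),
    (1120, 5), (1280, 6), (1440, 7), (1600, 8),
    (1760, 9), (1920, 10),
]

# The last two transmit entries of Paper 1's table, as quoted in point (2).
PAPER1_LAST_TWO = [(1760, 9), (1920, 10)]


def scale_transmit(entries, factor=2):
    """Multiply every timestamp by factor; sequence numbers are unchanged."""
    return [(ts * factor, seq) for ts, seq in entries]


def unscaled_entries(reported, expected):
    """Return (reported, expected) pairs where the reported entry differs."""
    return [(r, e) for r, e in zip(reported, expected) if r != e]


expected_last_two = scale_transmit(PAPER2_TRANSMIT)[-2:]
mismatches = unscaled_entries(PAPER1_LAST_TWO, expected_last_two)
```

Here expected_last_two comes out as 3520 (9) and 3840 (10), so both quoted
entries are flagged as mismatches.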

C. Section 3.2 of Paper 1 talks about TCP-based audio transmission control
mechanisms. TCP is generally considered bad for audio transmission because
TCP ensures end-to-end reliable and in-order delivery of packets, which
results in long packet delays. Excessive delays are bad for audio delivery
in interactive applications. Paper 2 points out "Only UDP is likely to
work" (see page 14 of Paper 2). JPhone is an interactive application. It
seems that the authors of Paper 1 still use a TCP-based mechanism. Details,
however, are not given.
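
For contrast, over a datagram transport such as UDP the receiver can notice
loss itself, from gaps in the per-packet sequence numbers (like the numbers
in parentheses in Table 4 of Paper 2). The function below is a minimal
sketch of that idea with a hypothetical name. It ignores sequence-number
wrap-around and is not taken from either paper.

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers absent between the lowest and highest seen.

    `received` may be out of order or contain duplicates, as datagrams can
    arrive that way. Wrap-around of the counter is not handled.
    """
    seen = set(received)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

A receiver that saw packets 1, 2, 4, 5, 8, 7 would report 3 and 6 as lost
(or not yet arrived) and could then apply whatever recovery scheme it has.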

D. Figure 1 of Paper 1 indicates that packet #n is transmitted three times.

From the context, it is only the FEC (forward error correction) or other
similar information that is transmitted three times, not the packet itself.

E. No figures or data concerning the performance of JPhone in a network
or a test-bed network setting are given in Paper 1.

F. As of 1999, interactive audio applications were in practical use and freely
available. The authors of Paper 1 conclude "it is possible to design an
interactive audio application."  What a waste of their time!

fbw


【Assessment 3】

After carefully reviewing the paper "The Research of Voice Quality 
Techniques in the Internet Applications," I have to say I am amazed 
that this paper could even be published in a journal. Let alone the 
possible copying, the whole paper is seriously fraudulent; even if it 
were original, it would not be worth publishing.

Here are my comments:

In ch. 2.2, the authors say the biggest problem in network voice 
transmission is packet loss, and they list three possible causes of 
packet loss. This sounds reasonable, but it is nothing new either.

In ch. 3.1, the paper describes the method used to suppress silence; 
however, it mistakes the name of the technique that solves the problem, 
hang-over, for the name of the problem itself. Of course, the technique 
had been described thoroughly in Schulzrinne's paper. Since this 
technique has been used in many places, including CDMA cell phones 
transmitting digitized voice to the base station, I cannot be sure 
whether it is copyrighted by Schulzrinne. The most we can say is that 
the gang of three copied this paragraph from Schulzrinne's paper.

In ch. 3.2 and ch. 3.3, the ideas described sound reasonable; however, 
JPhone could not have used the technique mentioned there. The technique 
itself, again, is very similar to the way CDMA/GSM cell phones try to 
maintain transmission by sacrificing voice quality; I need more time to 
see whether this technique is patented. The fraud here is this: JPhone, 
according to the authors, was based on TCP/IP. With TCP as the 
transmission protocol, TCP handles packet loss by automatically 
retransmitting the packet, and TCP also ensures the order of bytes, 
i.e., the order in which bytes are sent is the order in which they are 
received. The exception would be JPhone running on some special embedded 
system, which according to the authors is unlikely. (JPhone is 
web-browser/server-based software running on a multitasking OS, see 
ch. 3 and ch. 3.4, so it is unlikely that JPhone runs on some 3G 
device.) If JPhone runs on a common platform like Linux, Windows, or 
Unix, there is no practical way for JPhone to detect packet loss. Also, 
TCP cannot support real-time applications, so JPhone cannot be used as a 
real-time interactive application. In other words, JPhone could not have 
worked as described.

I wish I could actually look at the JPhone code; then I could tell more 
about it. However, I doubt that a working JPhone actually exists.

Regards,
steve
