◇◇新语丝 (New Threads)(www.xys.org)(xys2.dxiong.com)(www.xysforum.org)(xys-reader.org)◇◇

A Patchwork Paper by Hu Changjun, Associate Dean of the School of Information Engineering at the University of Science and Technology Beijing, and Co-authors

Author: 卿本佳人

Hu Changjun (胡长军), associate dean of the School of Information Engineering at the University of Science and Technology Beijing, secretary-general of its academic committee, and doctoral supervisor, together with doctoral students Wang Jue (王珏) and Zhang Jilin (张纪林) and master's supervisor Li Jianjiang (李建江), has produced a patchwork masterpiece:

  OpenMP Extensions for Irregular Parallel Applications on Clusters
  Jue Wang (王珏), Changjun Hu (Prof. 胡长军), Jilin Zhang (张纪林), Jianjiang Li (Assoc. Prof. 李建江) [the Wang paper]

It was presented at the 3rd International Workshop on OpenMP (IWOMP 2007) and published in Lecture Notes in Computer Science (LNCS), Volume 4935:
http://www.springerlink.com/content/y46605g77724g8m8/

The Wang paper copies from at least four works by other authors:

"Symbolic Communication Set Generation for Irregular Parallel Applications" by Guo, Pan and Liu, The Journal of Supercomputing, vol 25, 2003, pp. 199-214 [the GuoPan paper]

"Effective OpenMP Extensions for Irregular Applications on Cluster Environments" by Guo, Cao, Chang, Li and Liu [the GuoCao paper] (cited in the Wang paper as Ref 8)

"Communication Generation for Aligned and Cyclic(K) Distributions Using Integer Lattice" by Tseng and Gaudiot [the Tseng paper] (cited in the Wang paper as Ref 15)

"Optimizing Irregular Shared-Memory Applications for Distributed-Memory Systems" by Basumallik and Eigenmann [the Basumallik paper] (cited in the Wang paper as Ref 7)

Instances of plagiarism are everywhere; a few examples follow:

1.
Wang paper, Section 1:

    Sparse and unstructured computations are widely used in scientific and engineering applications. This means that the data arrays are indexed either through the value in other arrays, which are called indirection array, or through non-affine subscripts. Indirect/nonlinear indexing causes the data access pattern to be highly irregular. Such a problem is called irregular problem.

Copied from the GuoCao paper, Abstract:

    Sparse and unstructured computations are widely used in Scientific and Engineering Applications.

Section 1:

    This means that the data arrays are indexed either through the value in other arrays, which are called indirection arrays/index arrays, or through non-affine subscripts. The use of indirect/nonlinear indexing causes the data access patterns ... to be highly irregular. Such a problem is called irregular problem.

2.
Wang paper, Section 1:

    If the array subscript expressions are nonlinear form, which appear in some irregular parallel applications, the performance of total execution may not be improved using the techniques mentioned above.

Copied and stitched together from the GuoPan paper, Section 1:

    However, if the array subscript expressions are not of the linear form - called nonlinear - which appears in some irregular parallel applications - the above mentioned techniques cannot be applied in this situation.

and the GuoCao paper, Section 2:

    However, if irregular loops are not parceled ..., the performance of total execution may not be improved ...

3.
Wang paper, Section 1, Fig 1:

    /* a perfectly nested loop phi */
    DO i_1 = 1, N, S_1
      DO i_2 = L_2(i_1), U_2(i_1), S_2(i_1)
        ...
          DO i_n = L_n(i_1, ..., i_n-1), U_n(i_1, ..., i_n-1), S_n(i_1, ..., i_n-1)
            A[f(i_1, ..., i_n)] = F(B[g(i_1, ..., i_n)]);
          ENDDO
        ...
      ENDDO
    ENDDO

Copied from the GuoPan paper, Section 2:

    Given a perfectly nested loop L as shown in the following.

    L_1: DO i_1 = X_1, Y_1, Z_1
           ...
    L_n:   DO i_n = X_n, Y_n, Z_n
    S:       A(f(i_1, i_2, ..., i_n)) = F(B(g(i_1, i_2, ..., i_n)));
           ENDDO
           ...
         ENDDO

4.
Wang paper, Section 1:

    For the sake of simplicity, we assume that the data array A and B have only one dimension. In the loop, the array access functions (f and g), the lower and upper bound (L, U) and stride (S) may be arbitrary symbolic expressions made up of loop-invariant variables and loop indices of enclosing loops. General parallel compiling techniques can not be applied to these kinds of irregular applications, because there is no affine relationship between the array global addresses of LHS (Left Hand Side) and RHS (Right Hand Side).

Copied from the GuoPan paper, Section 2:

    For the sake of simplicity, we will assume that the referenced array A and B have only one dimension. The array access function (f and g), the loop's lower and upper bounds (X_i, Y_i) and stride (Z_i) may be arbitrary symbolic expressions made up of loop-invariant variables and loop indices of enclosing loops.
Section 1:

    General affine communication set generation techniques cannot be applied to these kinds of irregular applications because there is no affine relationship between the array global addresses of LHS and RHS.

5.
Wang paper, Section 2:

    In traditional OpenMP specification, there are four scheduling policies available: static scheduling, dynamic scheduling, guided scheduling, and runtime scheduling. In order to reduce communication overhead and achieve load balance, we extend irregular scheduling to OpenMP. This scheduling follows owner-compute rule, where each iteration will be executed by the processor which own the left hand side array reference of the assignment for that iteration.

Copied from the GuoCao paper, Section 2:

    There are four scheduling policies available in OpenMP: static scheduling, dynamic scheduling, guided scheduling, and runtime scheduling. In order to achieve load balance for irregular loops, it is better to select dynamic or guided sceduling ... the chunk parcel follows the owner-compute rule. ... each iteration will be executed by the processor which owns the left hand side array reference of the assignment for that iteration.

6.
Wang paper, Section 2:

    In this example, the compiler will treat the loop as a partial ordered, i.e. some iterations are executed in ordered way while some other may be executed in parallel.

Copied from the GuoCao paper, Section 3:

    ... in this case the compiler will treat the loop as partially ordered, that is, some iterations are executed sequentially while others may be executed in parallel.

7.
Wang paper, Section 3:

    The sending of messages can be performed individually for each remote processor as soon as each packing is completed, instead of waiting for all message for all processors to be packed. The nonlocal iterations can also be performed based on each received message, instead of waiting for all messages to be received, because the nonlocal iterations are split into groups based on the sending processors.
Copied from the Tseng paper, Section 3.1:

    The sending of messages can be performed individually for each remote processor as soon as each packing is completed, instead of waiting for all messages for all processors to be packed. ... The nonlocal iterations can also be performed based on each received message, instead of waiting for all messages to be received. This is because the nonlocal iterations are split into groups based on the sending processors.

8.
Wang paper, Section 5:

    In this section, we will present our transformation scheme relies on deducing the monotonicity of irregular accesses at compile time.

Copied from the Basumallik paper, Section 1:

    The techniques proposed ... in previous work ... relied on deducing certain properties (such as monotonicity) of irregular accesses at compile-time.

(XYS20080818)