Author: Talkative Time: 2025-3-21 23:51
Einführung in die erste Auflage
...that improves the performance of stream-based Java applications. The approach enables the effective use of Java for HPC applications, due to data locality improvements (i.e., support for efficient data layouts), without losing the object-oriented view of data in the code. The approach extends the J...
Author: 褪色 Time: 2025-3-22 01:15
Author: extinct Time: 2025-3-22 05:00
Author: inferno Time: 2025-3-22 11:44
Author: Commonplace Time: 2025-3-22 14:42
https://doi.org/10.1007/978-3-663-09711-2
...e flexible and thus have many practical applications. In this paper we present the results of our attempt to use recent advances in Reinforcement Learning to automate the management of resources in a compute cloud environment. We describe a new approach to self-adaptation of autonomous manag...
Author: Commonplace Time: 2025-3-22 18:48
Die Sternkunde der Naturvölker
...theoretic approach where players tend to achieve a solution by reaching a Nash equilibrium. We propose a fully distributed algorithm based on applying the Spatial Prisoner’s Dilemma (SPD) game and a phenomenon of collective behavior of players participating in the game composed of two classes of auto...
Author: mutineer Time: 2025-3-23 00:59
Author: 印第安人 Time: 2025-3-23 02:21
Author: PAD416 Time: 2025-3-23 07:18
https://doi.org/10.1007/978-3-0348-6519-7
...computational complexity and large memory requirements, which exceed the capacity of small devices for inference. Knowledge distillation is an efficient approach to compress a large deep model (a teacher model) to a compact model (a student model). Existing online knowledge distillation methods typical...
Author: detach Time: 2025-3-23 09:47
Author: tattle Time: 2025-3-23 14:27
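The teacher/student compression described above is usually trained with a soft-target loss. The fragment is cut off before it describes the paper's own method, so the following is only a minimal sketch of the classic temperature-scaled distillation loss, not the authors' online variant. The function names and the temperature value are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared.

    Hypothetical helper: this is the standard soft-target loss, assumed here
    because the fragment does not describe the paper's actual loss.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Made-up logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge.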
Geschichte der Spiegelteleskope
...improving the energy-performance behavior of data-parallel applications on shared-memory multicore systems. We propose to customize the clock frequency individually for the appropriately selected groups of cores corresponding to the diversified time of actual computation. In consequence, the advant...
Author: Melodrama Time: 2025-3-23 20:09
Author: 衍生 Time: 2025-3-24 00:00
https://doi.org/10.1007/978-3-662-48862-1
...motivated by the bi-objective optimization of a matrix multiplication application on such platforms for performance and energy. We formulate the problem and propose an algorithm of polynomial complexity solving the problem where all the application profiles of objective type one are continuous and...
Author: Ointment Time: 2025-3-24 02:43
Die angelsächsischen Grundlagen
...solutions on the supercomputing market. These architectures often require re-coding of scientific kernels. For example, traditional implementations of algorithms for computing the fast Fourier transform (FFT) cannot take full advantage of vector architectures. In this paper, we present the implementati...
Author: beta-cells Time: 2025-3-24 09:32
https://doi.org/10.1007/978-3-662-31582-8
...ing the optimization require changes to software development practices, through language extensions or constraints on software organization and compilation. This makes such techniques inapplicable for preexisting software in a language like OpenCL. This work introduces an implementation of kernel fu...
Author: 不在灌木叢中 Time: 2025-3-24 11:38
Lecture Notes in Computer Science
Author: 極大痛苦 Time: 2025-3-24 15:50
978-3-031-06155-4
Springer Nature Switzerland AG 2022
Author: 安心地散步 Time: 2025-3-24 19:00
Author: Antigen Time: 2025-3-24 23:16
Author: 玩忽職守 Time: 2025-3-25 04:06
Author: Default Time: 2025-3-25 09:36
0302-9743
...tational problems to full-fledged applications, from architecture, compiler, language and interface design and implementation to tools, support infrastructures, and application performance aspects.
978-3-031-06155-4
978-3-031-06156-1
Series ISSN 0302-9743
Series E-ISSN 1611-3349
Author: hypnogram Time: 2025-3-25 13:30
Einführung in die erste Auflage
...improving the data locality can provide a two-fold performance gain in sequential stream applications, which translates into a similar gain over parallel stream implementations. Moreover, the performance is comparable to similar C implementations using OpenMP.
Author: cogent Time: 2025-3-25 19:33
Author: DRILL Time: 2025-3-25 21:28
Author: 地名詞典 Time: 2025-3-26 04:13
Author: GUEER Time: 2025-3-26 07:29
Author: Encumber Time: 2025-3-26 11:54
Author: 滔滔不絕的人 Time: 2025-3-26 14:26
Author: 旋轉一周 Time: 2025-3-26 20:48
Feasibility Study of Molecular Dynamics Kernels Exploitation Using EngineCL
...only with OpenCL-based technologies. Experimental evaluation shows improvements in all the kernels studied, obtaining on average speedups of up to 1.38 in performance and 1.60 in energy efficiency over the current optimized version.
Author: Engulf Time: 2025-3-26 21:53
Author: 諂媚于人 Time: 2025-3-27 03:51
https://doi.org/10.1007/978-3-663-09711-2
...and discuss the results of an evaluation that includes autonomous management of a sample application deployed to the Amazon Web Services cloud. We also provide the details of training the management policy using the Proximal Policy Optimization algorithm. Finally, we discuss the feasibility of extending the presented approach to further scenarios.
Author: 性上癮 Time: 2025-3-27 08:14
Author: 夾死提手勢 Time: 2025-3-27 11:13
Author: Polydipsia Time: 2025-3-27 15:15
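For readers unfamiliar with Proximal Policy Optimization, its core is the clipped surrogate objective for a single sample. The sketch below shows that standard formula only; it says nothing about how the authors set up the cloud-management policy. The function name and the default epsilon of 0.2 are assumptions.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample clipped surrogate objective used by PPO.

    ratio     -- new_policy_prob / old_policy_prob for the action taken
    advantage -- estimated advantage of that action
    epsilon   -- clipping range (0.2 is a common default, assumed here)
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # The minimum makes the objective pessimistic: once the ratio leaves
    # the trust interval, a larger policy change earns no extra reward.
    return min(ratio * advantage, clipped_ratio * advantage)


# Illustrative values only.
print(ppo_clipped_objective(1.5, 1.0))   # gain is capped by the clip
print(ppo_clipped_objective(0.5, -1.0))  # penalty is kept at the clipped value
```

Training maximizes the average of this quantity over sampled actions, which limits how far one update can move the policy.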
https://doi.org/10.1007/978-3-662-48862-1
...application employing five heterogeneous processors that include two Intel multicore CPUs, an Nvidia K40c GPU, an Nvidia P100 PCIe GPU, and an Intel Xeon Phi. Based on our experiments, a dynamic energy saving of 17% is gained while tolerating a performance degradation of 5% (a saving of 106 J for an execution time increase of 0.05 s).
Author: META Time: 2025-3-27 21:27
https://doi.org/10.1007/978-3-662-31582-8
...n, or even precompiled OpenCL applications, could utilize the optimization. Despite the lack of explicit programmer effort, our compiler was able to deliver an average of 12.3% speedup over a range of applicable benchmarks on a target CPU platform.
Author: 驕傲 Time: 2025-3-28 00:42
Author: miscreant Time: 2025-3-28 05:43
Author: 充足 Time: 2025-3-28 07:35
Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneo...
...logical cores in total. The cost and efficiency of the proposed pruning algorithm for selecting heterogeneous DVFS configurations against the brute-force search are verified and compared experimentally.
Author: GENRE Time: 2025-3-28 12:35
A Novel Algorithm for Bi-objective Performance-Energy Optimization of Applications with Continuous P...
...application employing five heterogeneous processors that include two Intel multicore CPUs, an Nvidia K40c GPU, an Nvidia P100 PCIe GPU, and an Intel Xeon Phi. Based on our experiments, a dynamic energy saving of 17% is gained while tolerating a performance degradation of 5% (a saving of 106 J for an execution time increase of 0.05 s).
Author: pantomime Time: 2025-3-28 15:50
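Trading a small slowdown for an energy saving, as reported above, amounts to picking a point on the Pareto front of (execution time, energy) pairs. The following is a minimal sketch of extracting that front from measured configurations. It is a generic filter, not the paper's polynomial algorithm, and the sample values are made up.

```python
def pareto_front(points):
    """Return the (time, energy) points that no other point dominates.

    A point dominates another if it is no worse in both objectives and
    strictly better in at least one. Both objectives are minimized.
    """
    front = []
    best_energy = float("inf")
    # After sorting by time, a point is on the front only if it uses
    # strictly less energy than every faster (or equally fast) point.
    for time, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((time, energy))
            best_energy = energy
    return front


# Hypothetical measurements: (execution time in s, dynamic energy in J).
measurements = [(10, 100), (11, 80), (11, 90), (12, 85), (14, 70)]
print(pareto_front(measurements))
```

Each point on the front is one admissible trade-off; a user then chooses according to how much performance degradation is tolerable.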
Kernel Fusion in OpenCL
...n, or even precompiled OpenCL applications, could utilize the optimization. Despite the lack of explicit programmer effort, our compiler was able to deliver an average of 12.3% speedup over a range of applicable benchmarks on a target CPU platform.
Author: Geyser Time: 2025-3-28 18:52
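To make the idea of kernel fusion concrete, here is a minimal sketch in plain Python rather than OpenCL. Two element-wise "kernels" run back to back with an intermediate buffer, and a fused version produces the same result in one pass. The kernel names are hypothetical, and the sketch does not reflect how the paper's compiler performs fusion.

```python
def scale_kernel(xs, factor):
    """First kernel: multiply every element by a constant."""
    return [factor * x for x in xs]


def bias_kernel(xs, bias):
    """Second kernel: add a constant to every element."""
    return [x + bias for x in xs]


def unfused(xs, factor, bias):
    """Two launches; the intermediate list stands in for a global-memory buffer."""
    intermediate = scale_kernel(xs, factor)
    return bias_kernel(intermediate, bias)


def fused(xs, factor, bias):
    """One launch; each element is read once and no intermediate buffer exists."""
    return [factor * x + bias for x in xs]


data = [1.0, 2.0, 3.0]
print(unfused(data, 2.0, 0.5))
print(fused(data, 2.0, 0.5))
```

The benefit on real hardware comes from avoiding the round trip of the intermediate buffer through memory and from saving one kernel launch.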
Towards an Efficient Sparse Storage Format for the SpMM Kernel in GPUs
...promising in terms of performance and storage space. In this work, we re-implement the algorithm following the authors’ guidelines, adding two new stages that can benefit performance. The experiments performed using nine sparse matrices of different sizes show significant accelerations with respect to ...’s CSR variant.
Author: FEMUR Time: 2025-3-29 02:10
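For reference, the CSR baseline mentioned above can be sketched as a sparse-times-dense product (SpMM) in plain Python. This shows only the textbook CSR layout and loop order. It is not the storage format the paper proposes, nor any vendor library's implementation, and the function and parameter names are assumptions.

```python
def csr_spmm(n_rows, row_ptr, col_idx, values, dense):
    """Compute C = A @ B, with A in CSR form and B a dense list of rows.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    """
    n_cols = len(dense[0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            b_row = dense[col_idx[k]]
            for j in range(n_cols):
                result[i][j] += a * b_row[j]
    return result


# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 0]] stored in CSR form.
row_ptr = [0, 2, 2, 3]
col_idx = [0, 2, 1]
values = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(csr_spmm(3, row_ptr, col_idx, values, B))
```

Rows with very different numbers of nonzeros give uneven work per row, which is one common motivation for alternative GPU storage formats.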
https://doi.org/10.1007/978-3-662-29027-9
...k advanced sparse linear algebra routines utilizing the converted kernels to assess the efficiency of the DPC++ backend in the hardware-specific performance bounds. We compare the performance of basic building blocks against routines providing the same functionality that ship with Intel’s oneMKL vendor library.
Author: 指耕作 Time: 2025-3-29 03:52
Author: Insensate Time: 2025-3-29 08:17
Author: 俗艷 Time: 2025-3-29 13:35
Die Geschichte der chirurgischen Anaesthesie
...promising in terms of performance and storage space. In this work, we re-implement the algorithm following the authors’ guidelines, adding two new stages that can benefit performance. The experiments performed using nine sparse matrices of different sizes show significant accelerations with respect to ...’s CSR variant.
Author: Myocarditis Time: 2025-3-29 19:04
Author: 過濾 Time: 2025-3-29 22:39
Author: 誘惑 Time: 2025-3-30 03:30
Author: cartilage Time: 2025-3-30 06:15
Author: stroke Time: 2025-3-30 11:02
Accelerating FFT Using NEC SX-Aurora Vector Engine
...of maximizing the vector length usage of the algorithm and that adapting the algorithm to replace memory instructions with register shuffling operations can boost the performance of FFT-like computational kernels.
Author: Urea508 Time: 2025-3-30 15:57
https://doi.org/10.1007/978-3-658-06850-9
...focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering stra...
Author: Ligneous Time: 2025-3-30 18:01
Die Geschichte der Kinderheilkunde
...extracts the size and the frequency of data transfers in an application and visualizes them as a communication matrix. To demonstrate the tool in action, we present communication matrices and some statistics for two applications coming from machine translation and image classification domains.
Author: 積習已深 Time: 2025-3-31 00:10
Author: intention Time: 2025-3-31 01:47
Author: Preamble Time: 2025-3-31 08:18
Author: arthroscopy Time: 2025-3-31 12:13
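A communication matrix of the kind described above can be built from a flat list of transfer records. The sketch below assumes a hypothetical trace format of (source, destination, bytes) tuples. It is not the output format of the tool in the abstract, and the trace values are made up.

```python
def communication_matrix(transfers, n_endpoints):
    """Aggregate transfers into a volume matrix and a frequency matrix.

    transfers   -- iterable of (src, dst, n_bytes) tuples (assumed format)
    n_endpoints -- number of devices or processes, indexed from 0
    """
    volume = [[0] * n_endpoints for _ in range(n_endpoints)]
    count = [[0] * n_endpoints for _ in range(n_endpoints)]
    for src, dst, n_bytes in transfers:
        volume[src][dst] += n_bytes  # total bytes sent from src to dst
        count[src][dst] += 1         # number of individual transfers
    return volume, count


# Hypothetical trace with three endpoints.
trace = [(0, 1, 4096), (0, 1, 1024), (1, 0, 512), (2, 0, 2048)]
volume, count = communication_matrix(trace, 3)
for row in volume:
    print(row)
```

Row i, column j of the volume matrix holds the bytes sent from endpoint i to endpoint j, and the count matrix holds how often such transfers occurred.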
Locality-Aware Scheduling of Independent Tasks for Runtime Systems
...accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient a...