TY  - JOUR
A1  - Christgau, Steffen
A1  - Schnor, Bettina
T1  - Exploring one-sided communication and synchronization on a non-cache-coherent many-core architecture
JF  - Concurrency and computation : practice & experience
N2  - The ongoing many-core design aims at core counts where cache coherence becomes a serious challenge. Therefore, this paper discusses how one-sided communication and the required process synchronization can be realized on a non-cache-coherent many-core CPU. The Intel Single-chip Cloud Computer serves as an exemplary hardware architecture. The presented approach is based on software-managed cache coherence for MPI one-sided communication. The prototype implementation delivers PUT performance up to 5 times faster than the default message-based approach and reduces the communication costs of the NAS Parallel Benchmarks 3-D fast Fourier Transform by a factor of 5. Further, the paper derives conclusions for future non-cache-coherent architectures.
KW  - MPI
KW  - one-sided communication
KW  - programming models and systems for many-cores
KW  - synchronization
KW  - software-managed cache coherence
Y1  - 2017
U6  - https://doi.org/10.1002/cpe.4113
SN  - 1532-0626
SN  - 1532-0634
VL  - 29
PB  - Wiley
CY  - Hoboken
ER  - 
TY  - JOUR
A1  - Ciaccio, Giuseppe
A1  - Ehlert, Marco
A1  - Schnor, Bettina
T1  - Exploiting gigabit ethernet capacity for cluster applications
N2  - In this paper we report on the recently completed porting of GAMMA to the Netgear GA621 Gigabit Ethernet adapter and provide a comparison among GAMMA, MPI/GAMMA, TCP/IP, and MPICH/TCP, based on the Netgear GA621 and the older Netgear GA620 network adapters and using different device drivers, in a Gigabit Ethernet cluster of PCs running Linux 2.4. GAMMA (the Genoa Active Message MAchine) is a lightweight messaging system based on an Active Message-like paradigm, originally designed for efficient exploitation of Fast Ethernet interconnects. The comparison includes a simple latency/bandwidth evaluation of the messaging systems on both adapters, as well as performance comparisons based on the NAS Parallel Benchmarks (NPB) and an end-user fluid dynamics application called Modular Ocean Model (MOM). The analysis of the results provides useful hints concerning the efficient use of Gigabit Ethernet with clusters of PCs. In particular, it emerges that GAMMA on the GA621 adapter, with a combination of low end-to-end latency (8.5 µs) and high throughput (118.4 MByte/s), provides a high-performance, cost-effective alternative to proprietary high-speed networks, e.g. Myrinet, for a wide range of cluster computing applications.
Y1  - 2002
SN  - 0-7695-1591-6
ER  - 
TY  - JOUR
A1  - De Lucia, Marco
A1  - Kühn, Michael
A1  - Lindemann, Alexander
A1  - Lübke, Max
A1  - Schnor, Bettina
T1  - POET (v0.1): speedup of many-core parallel reactive transport simulations with fast DHT lookups
JF  - Geoscientific model development : an interactive open access journal of the European Geosciences Union
N2  - Coupled reactive transport simulations are extremely demanding in terms of required computational power, which hampers their application and leads to coarsened and oversimplified domains. The chemical sub-process represents the major bottleneck: its acceleration is an urgent challenge which gathers increasing interdisciplinary interest along with pressing requirements for subsurface utilization such as spent nuclear fuel storage, geothermal energy and CO2 storage. In this context we developed POET (POtsdam rEactive Transport), a research parallel reactive transport simulator integrating algorithmic improvements which decisively speed up coupled simulations. In particular, POET is designed with a master/worker architecture, which ensures computational efficiency in both multicore and cluster compute environments. POET does not rely on contiguous grid partitions for the parallelization of chemistry but forms work packages composed of grid cells distant from each other. Such scattering prevents particularly expensive geochemical simulations, usually concentrated in the vicinity of a reactive front, from generating load imbalance between the available CPUs (central processing units), as is often the case with classical partitions. Furthermore, POET leverages an original implementation of the distributed hash table (DHT) mechanism to cache the results of geochemical simulations for further reuse in subsequent time steps during the coupled simulation. The caching is hence particularly advantageous for initially chemically homogeneous simulations and for smooth reaction fronts. We tune the rounding employed in the DHT on a 2D benchmark to validate the caching approach, and we evaluate the performance gain of POET's master/worker architecture and the DHT speedup on a 3D benchmark comprising around 650 000 grid elements. The runtime for 200 coupling iterations, corresponding to 960 simulation days, reduced from about 24 h on 11 workers to 29 min on 719 workers. 
Activating the DHT reduces the runtime further to 2 h and 8 min, respectively. Only with these kinds of reduced hardware requirements and computational costs is it possible to realistically perform long-term complex reactive transport simulations, as well as the uncertainty analyses required by pressing societal challenges connected with subsurface utilization.
Y1  - 2021
U6  - https://doi.org/10.5194/gmd-14-7391-2021
SN  - 1991-959X
SN  - 1991-9603
VL  - 14
IS  - 12
SP  - 7391
EP  - 7409
PB  - Copernicus
CY  - Göttingen
ER  - 
TY  - JOUR
A1  - Feider, Henryk
A1  - Schnor, Bettina
A1  - Dramlitsch, Thomas
T1  - Gridmake : the missing link for compilation in the Grid
N2  - In order to take full advantage of Grid environments, applications need to be able to run on various heterogeneous platforms. Distributed runs across several clusters or supercomputers, for example, require matching binaries at each site. Thus, at some stage, each Grid-enabled application needs to be recompiled for every platform. Up to now, creating matching binaries on different platforms was a manual, sequential, slow, and very error-prone process. Developers had to log into each machine, transfer source code, check consistency, and recompile if necessary. This cumbersome procedure is surely one reason for the (still existing) lack of production Grid computing. This paper presents Gridmake, a tool to automate and speed up this procedure.
Y1  - 2003
ER  - 
TY  - JOUR
A1  - Friedrich, Sven
A1  - Krahmer, Sebastian
A1  - Schneidenbach, Lars
A1  - Schnor, Bettina
T1  - Loaded: Server Load Balancing for IPv6
N2  - With the next-generation Internet protocol IPv6 on the horizon, it is time to think about how applications can migrate to IPv6. Web traffic is currently one of the most important applications in the Internet. The increasing popularity of dynamically generated content on the World Wide Web has created the need for fast web servers. Server clustering together with server load balancing has emerged as a promising technique to build scalable web servers. The paper gives a short overview of the new features of IPv6 and of different server load balancing technologies. Further, we present and evaluate Loaded, a user-space server load balancer for IPv4 and IPv6 based on Linux.
Y1  - 2006
SN  - 0-7695-2622-5
ER  - 
TY  - JOUR
A1  - Friedrich, Sven
A1  - Krahmer, Sebastian
A1  - Schneidenbach, Lars
A1  - Schnor, Bettina
T1  - Loaded : Server Load Balancing for IPv6
Y1  - 2004
ER  - 
TY  - JOUR
A1  - Friedrich, Sven
A1  - Schneidenbach, Lars
A1  - Schnor, Bettina
T1  - SLIBNet : Server Load Balancing for InfiniBand Networks
N2  - Today, InfiniBand is an evolving high-speed interconnect technology for building high-performance computing clusters that achieve top 10 rankings in the current TOP500 list of the world's fastest supercomputers. Network interfaces (called host channel adapters) provide transport layer services over connections and datagrams in a reliable or unreliable manner. Additionally, InfiniBand supports remote direct memory access (RDMA) primitives that allow for one-sided communication. Using server load balancing together with a high-performance cluster makes it possible to build a fast, scalable, and reliable service infrastructure. We have designed and implemented a scalable load balancer for InfiniBand clusters called SLIBNet. Our investigations show that the InfiniBand architecture offers features which perfectly support load balancing. We thank Megware Computer GmbH for providing an InfiniBand switch to realize a server load balancing testbed.
Y1  - 2005
ER  - 
TY  - JOUR
A1  - Hallama, Nicole
A1  - Luckow, André
A1  - Schnor, Bettina
T1  - Grid Security for Fault Tolerant Grid Applications
Y1  - 2006
SN  - 978-1-880843-60-4
ER  - 
TY  - JOUR
A1  - Hoheisel, A.
A1  - Müller, S.
A1  - Schnor, Bettina
T1  - Fine-grained Security Management in a Service-oriented Grid Architecture
Y1  - 2007
UR  - http://www.cyfronet.krakow.pl/cgw06/presentations/c4-3.pdf
SN  - 978-0-387-72811-7
ER  - 
TY  - JOUR
A1  - Jeske, Janin
A1  - Luckow, André
A1  - Schnor, Bettina
T1  - Reservation-based Resource-Brokering for Grid Computing
N2  - In this paper we present the design and implementation of the Migol brokering framework. Migol is a Grid middleware which addresses the fault tolerance of long-running and compute-intensive applications. The framework supports, e.g., the automatic and transparent recovery or migration of applications. Another core feature of Migol is the discovery, selection, and allocation of resources using advance reservation. Grid broker systems can significantly benefit from advance reservation. With advance reservation, brokers and users can obtain execution guarantees from local resource management systems (LRMs) without requiring detailed knowledge of current and future workloads or of the resource owner's policies. Migol's Advance Reservation Service (ARS) provides an adapter layer for the reservation capabilities of different LRMs, which is currently not provided by existing Grid middleware platforms. Further, we propose a shortest expected delay (SED) strategy for scheduling advance reservations within the Job Broker Service. SED needs information about the earliest start time of an application, which LRMs currently do not provide. We added this feature for PBSPro. Migol depends on Globus and its security infrastructure. Our performance experiments show the substantial overhead of this service-oriented approach.
Y1  - 2007
UR  - http://edoc.mpg.de/316626
ER  - 
TY  - JOUR
A1  - Jung, Jörg
A1  - Kiertscher, Simon
A1  - Menski, Sebastian
A1  - Schnor, Bettina
T1  - Self-Adapting Load Balancing for DNS
JF  - Journal of networks
N2  - The Domain Name System belongs to the core services of the Internet infrastructure. Hence, DNS availability and performance are essential for the operation of the Internet, and replication as well as load balancing are used for the root and top-level name servers.
 This paper proposes an architecture for credit-based server load balancing (SLB) for DNS. Compared to traditional load balancing algorithms like round robin or least connection, the benefit of credit-based SLB is that the load balancer can adapt more easily to heterogeneous load requests and back-end server capacities. The challenge of this approach is the definition of a suitable credit metric. While this has been done before for TCP-based services like HTTP, the problem had not been solved for UDP-based services like DNS.
 In the following, an approach is presented to define credits for UDP-based services as well. This UDP/DNS approach is implemented within the credit-based SLB implementation salbnet. The presented measurements confirm the benefit of the self-adapting credit-based SLB approach. In our experiments, the mean (first) response time dropped significantly compared to weighted round robin (WRR), from over 4 ms to about 0.6 ms for dynamic pressure relieve (DPR).
KW  - Load Balancing
KW  - Cluster Computing
KW  - Performance Evaluation
Y1  - 2015
U6  - https://doi.org/10.1109/SPECTS.2014.6879994
VL  - 10
IS  - 4
SP  - 222
EP  - 231
PB  - Academy Publisher
CY  - Oulu
ER  - 
TY  - JOUR
A1  - Kiertscher, Simon
A1  - Zinke, Jörg
A1  - Schnor, Bettina
T1  - CHERUB power consumption aware cluster resource management
JF  - Cluster computing : the journal of networks, software tools and applications
N2  - This paper presents an evaluation of ACPI energy saving modes, and deduces the design and implementation of an energy saving daemon for clusters called cherub. The design of the cherub daemon is modular and extensible. Since the only requirement is a central approach for resource management, cherub is suited for Server Load Balancing (SLB) clusters managed by dispatchers like Linux Virtual Server (LVS), as well as for High Performance Computing (HPC) clusters. Our experimental results show that cherub's scheduling algorithm works well, i.e. it will save energy, if possible, and avoids state-flapping.
KW  - Green computing
KW  - Cluster computing
Y1  - 2013
U6  - https://doi.org/10.1007/s10586-011-0176-5
SN  - 1386-7857
VL  - 16
IS  - 1
SP  - 55
EP  - 63
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Kling, Christoph
A1  - Schneidenbach, Lars
A1  - Schnor, Bettina
T1  - A high performance gigabit ethernet messaging method for PVFS
N2  - Parallel file systems like PVFS2 are a necessary component for high-performance computing. The design of efficient communication layers for these systems is still of great research interest. This paper presents a low-latency messaging method for PVFS2 dedicated to Gigabit Ethernet networks and discusses relevant design issues. In contrast to other approaches, we argue that zero-copying can also be achieved for big messages without the use of a rendezvous protocol. Further, efficiency within the communication layer, such as a small call stack, plays an important role.
Y1  - 2005
SN  - 0-88986-525-6
ER  - 
TY  - JOUR
A1  - Lanfermann, Gerd
A1  - Schnor, Bettina
A1  - Seidel, Edward
T1  - Characterizing Grids
N2  - We present a new data model approach to describe the various objects that either represent the Grid infrastructure or make use of it. The data model is based on the experiences and experiments conducted in heterogeneous Grid environments. While very sophisticated data models exist to describe and characterize e.g. compute capacities or web services, we will show that a general description, which combines all of these aspects, is needed to give an adequate representation of objects on a Grid. The Grid Object Description Language (GODsL) is a generic and extensible approach to unify the various aspects that an object on a Grid can have. GODsL provides the content for the XML-based communication in Grid migration scenarios, carried out in the GridLab project. We describe the data model architecture on a general level and focus on the Grid application scenarios.
Y1  - 2003
SN  - 1-4020-7418-2
ER  - 
TY  - JOUR
A1  - Liang, Feng
A1  - Liu, Yunzhen
A1  - Liu, Hai
A1  - Ma, Shilong
A1  - Schnor, Bettina
T1  - A Parallel Job Execution Time Estimation Approach Based on User Submission Patterns within Computational Grids
JF  - International journal of parallel programming
N2  - Scheduling performance in computational grids can potentially benefit greatly from accurate execution time estimation for parallel jobs. Most existing approaches for parallel job execution time estimation, however, require ample past job traces and explicit correlations between the job execution time and outer layout parameters such as the number of processors consumed, the user-estimated execution time, and the job ID, which are hard to obtain or reveal. This paper presents and evaluates a novel execution time estimation approach for parallel jobs, user-behavior clustering for execution time estimation, which can give more accurate execution time estimates for parallel jobs by exploring job similarity and revealing user submission patterns. Experimental results show that, compared to state-of-the-art algorithms, our approach can improve the accuracy of job execution time estimation by up to 5.6 %, while the time our approach spends on calculation can be reduced by up to 3.8 %.
KW  - User submission pattern
KW  - Parallel job execution time estimation
KW  - Computational grid
Y1  - 2015
U6  - https://doi.org/10.1007/s10766-013-0294-1
SN  - 0885-7458
SN  - 1573-7640
VL  - 43
IS  - 3
SP  - 440
EP  - 454
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Liske, Stefan
A1  - Rebensburg, Klaus
A1  - Schnor, Bettina
T1  - SPIT-Erkennung, -Bekanntgabe und -Abwehr in SIP-Netzwerken
N2  - In recent years, SPAM has grown into the greatest threat to e-mail communication, but it is not limited to this communication channel. With the growing number of VoIP connections, users there will also be confronted with SPAM calls (SPIT). Besides the legal measures currently under discussion, technical countermeasures that can detect and prevent SPAM must also be created. This contribution presents two extensions to the VoIP protocol SIP which, first, enable providers to transmit SPIT assessments of the caller to the called user and, second, give the callee the option of responding to potential SPIT calls with a payment request.
Y1  - 2007
SN  - 978-3-540-69961-3
ER  - 
TY  - JOUR
A1  - Lorenz, Claas
A1  - Clemens, Vera Elisabeth
A1  - Schrötter, Max
A1  - Schnor, Bettina
T1  - Continuous verification of network security compliance
JF  - IEEE transactions on network and service management
N2  - Continuous verification of network security compliance is an accepted need. In particular, the analysis of stateful packet filters plays a central role for network security in practice. However, the few existing tools which support the analysis of stateful packet filters are based on generally applicable formal methods like Satisfiability Modulo Theories (SMT) or theorem provers and show runtimes in the order of minutes to hours, making them unsuitable for continuous compliance verification. In this work, we address these challenges and present the concept of state shell interweaving to transform a stateful firewall rule set into a stateless rule set. This allows us to reuse any fast domain-specific engine from the field of data plane verification that leverages smart, very fast, domain-specialized data structures and algorithms, including Header Space Analysis (HSA). First, we introduce the formal language FPL, which enables a high-level, human-understandable specification of the desired state of network security. Second, we demonstrate the instantiation of a compliance process using a verification framework that analyzes the configuration of complex networks and devices, including stateful firewalls, for compliance with FPL policies. Our evaluation results show the scalability of the presented approach for the well-known Internet2 and Stanford benchmarks as well as for large firewall rule sets, where it outscales state-of-the-art tools by a factor of over 41.
KW  - Security
KW  - Tools
KW  - Network security
KW  - Engines
KW  - Benchmark testing
KW  - Analytical models
KW  - Scalability
KW  - Network
KW  - security
KW  - compliance
KW  - formal
KW  - verification
Y1  - 2021
U6  - https://doi.org/10.1109/TNSM.2021.3130290
SN  - 1932-4537
VL  - 19
IS  - 2
SP  - 1729
EP  - 1745
PB  - Institute of Electrical and Electronics Engineers
CY  - New York
ER  - 
TY  - JOUR
A1  - Luckow, Andre
A1  - Jha, Shantenu
A1  - Kim, Joohyun
A1  - Merzky, Andre
A1  - Schnor, Bettina
T1  - Adaptive distributed replica-exchange simulations
N2  - Owing to the loose coupling between replicas, the replica-exchange (RE) class of algorithms should be able to benefit greatly from using as many resources as available. However, the ability to effectively use multiple distributed resources to reduce the time to completion remains a challenge at many levels. Additionally, an implementation of a pleasingly distributed algorithm such as replica-exchange, which is independent of infrastructural details, does not exist. This paper proposes an extensible and scalable framework based on Simple API for Grid Applications that provides a general-purpose, opportunistic mechanism to effectively use multiple resources in an infrastructure-independent way. By analysing the requirements of the RE algorithm and the challenges of implementing it on real production systems, we propose a new abstraction (BIGJOB), which forms the basis of the adaptive redistribution and effective scheduling of replicas.
Y1  - 2009
UR  - http://rsta.royalsocietypublishing.org/
U6  - https://doi.org/10.1098/rsta.2009.0051
SN  - 1364-503X
ER  - 
TY  - JOUR
A1  - Luckow, André
A1  - Schnor, Bettina
T1  - Migol : a Fault-Tolerant Service Framework for MPI Applications in the Grid
N2  - In a distributed, inherently dynamic Grid environment, the reliability of individual resources cannot be guaranteed. The more resources and components are involved, the more error-prone the system becomes. Therefore, it is important to enhance the dependability of the system with fault-tolerance mechanisms. In this paper, we present Migol, a fault-tolerant, self-healing Grid service infrastructure for MPI applications. The benefit of the Grid is that, in case of a failure, an application may be migrated and restarted from a checkpoint file on another site. This approach requires a service infrastructure which handles the necessary activities transparently for the application. However, a migration framework cannot support fault-tolerant applications if it is not fault-tolerant itself.
Y1  - 2005
SN  - 978-3-540-29009-4
ER  - 
TY  - JOUR
A1  - Luckow, André
A1  - Schnor, Bettina
T1  - Migol : a Fault Tolerant Service Framework for Grid Computing : Evolution to WSRF (2006)
Y1  - 2006
ER  -