TY  - JOUR
A1  - Friedrich, Sven
A1  - Schneidenbach, Lars
A1  - Schnor, Bettina
T1  - SLIBNet : Server Load Balancing for InfiniBand Networks
N2  - Today, InfiniBand is an evolving high-speed interconnect technology for building high performance computing clusters that achieve top 10 rankings in the current top 500 list of the world's fastest supercomputers. Network interfaces (called host channel adapters) provide transport layer services over connections and datagrams in a reliable or unreliable manner. Additionally, InfiniBand supports remote direct memory access (RDMA) primitives that allow for one-sided communication. Using server load balancing together with a high performance cluster makes it possible to build a fast, scalable, and reliable service infrastructure. We have designed and implemented a scalable load balancer for InfiniBand clusters called SLIBNet. Our investigations show that the InfiniBand architecture offers features which perfectly support load balancing. We want to thank Megware Computer GmbH for providing us with an InfiniBand switch to realize a server load balancing testbed.
Y1  - 2005
ER  - 
TY  - JOUR
A1  - Feider, Henryk
A1  - Schnor, Bettina
A1  - Dramlitsch, Thomas
T1  - Gridmake : the missing link for compilation in the Grid
N2  - In order to take full advantage of Grid environments, applications need to be able to run on various heterogeneous platforms. Distributed runs across several clusters or supercomputers, for example, require matching binaries at each site. Thus, at some stage, each Grid-enabled application needs to be recompiled for every platform. Up to now, creating matching binaries on different platforms was a manual, sequential, slow, and very error-prone process. Developers had to log into each machine, transfer source code, check consistency, and recompile if necessary. This cumbersome procedure is surely one reason for the (still existing) lack of production Grid computing. Gridmake, a tool to automate and speed up this procedure, is presented in this paper.
Y1  - 2003
ER  - 
TY  - JOUR
A1  - Schneidenbach, Lars
A1  - Schnor, Bettina
A1  - Petri, Stefan
T1  - Architecture and Implementation of the Socket Interface on Top of GAMMA
Y1  - 2003
SN  - 0-7695-2037-5
ER  - 
TY  - BOOK
A1  - Mihahn, Michael
A1  - Schnor, Bettina
T1  - Fault-Tolerant Grid Peer Services
T3  - Technischer Bericht
Y1  - 2004
SN  - 0946-7580
PB  - Universität Potsdam, Institut für Informatik
CY  - Potsdam
ER  - 
TY  - BOOK
A1  - Schnor, Bettina
T1  - Seminarband: Sensornetze
T3  - Technischer Bericht
Y1  - 2004
SN  - 0946-7580
PB  - Universität Potsdam, Institut für Informatik
CY  - Potsdam
ER  - 
TY  - JOUR
A1  - Liang, Feng
A1  - Liu, Yunzhen
A1  - Liu, Hai
A1  - Ma, Shilong
A1  - Schnor, Bettina
T1  - A Parallel Job Execution Time Estimation Approach Based on User Submission Patterns within Computational Grids
JF  - International journal of parallel programming
N2  - Scheduling performance in computational grids can potentially benefit a lot from accurate execution time estimation for parallel jobs. Most existing approaches to parallel job execution time estimation, however, require ample past job traces and explicit correlations between the job execution time and outer layout parameters such as the number of consumed processors, the user-estimated execution time, and the job ID, which are hard to obtain or reveal. This paper presents and evaluates a novel execution time estimation approach for parallel jobs, user-behavior clustering for execution time estimation, which gives more accurate execution time estimates for parallel jobs by exploring job similarity and revealing user submission patterns. Experimental results show that, compared to state-of-the-art algorithms, our approach improves the accuracy of job execution time estimation by up to 5.6 %, while the time it spends on calculation is reduced by up to 3.8 %.
KW  - User submission pattern
KW  - Parallel job execution time estimation
KW  - Computational grid
Y1  - 2015
U6  - https://doi.org/10.1007/s10766-013-0294-1
SN  - 0885-7458
SN  - 1573-7640
VL  - 43
IS  - 3
SP  - 440
EP  - 454
PB  - Springer
CY  - New York
ER  - 
TY  - JOUR
A1  - Jung, Jörg
A1  - Kiertscher, Simon
A1  - Menski, Sebastian
A1  - Schnor, Bettina
T1  - Self-Adapting Load Balancing for DNS
JF  - Journal of networks
N2  - The Domain Name System (DNS) belongs to the core services of the Internet infrastructure. Hence, DNS availability and performance are essential for the operation of the Internet, and replication as well as load balancing are used for the root and top-level name servers.
 This paper proposes an architecture for credit-based server load balancing (SLB) for DNS. Compared to traditional load balancing algorithms like round robin or least connection, the benefit of credit-based SLB is that the load balancer can adapt more easily to heterogeneous load requests and back end server capacities. The challenge of this approach is the definition of a suitable credit metric. While this was done before for TCP-based services like HTTP, the problem was not solved for UDP-based services like DNS.
 In the following, an approach is presented to define credits for UDP-based services as well. This UDP/DNS approach is implemented within the credit-based SLB implementation salbnet. The presented measurements confirm the benefit of the self-adapting credit-based SLB approach. In our experiments, the mean (first) response time dropped significantly compared to weighted round robin (WRR) (from over 4 ms to about 0.6 ms for dynamic pressure relieve (DPR)).
KW  - Load Balancing
KW  - Cluster Computing
KW  - Performance Evaluation
Y1  - 2015
U6  - https://doi.org/10.1109/SPECTS.2014.6879994
VL  - 10
IS  - 4
SP  - 222
EP  - 231
PB  - Kluwer Academic Publishers
CY  - Oulu
ER  - 
TY  - JOUR
A1  - Luckow, André
A1  - Schnor, Bettina
T1  - Migol : a fault-tolerant service framework for MPI applications in the grid
N2  - Especially for the sciences, the provision of massively parallel CPU capacity is one of the most attractive features of a grid. A major challenge in a distributed, inherently dynamic grid is fault tolerance. The more resources and components involved, the more complicated and error-prone the system becomes. In a grid with potentially thousands of machines connected to each other, the reliability of individual resources cannot be guaranteed. The benefit of the grid is that in case of a failure an application may be migrated and restarted from a checkpoint file on another site. This approach requires a service infrastructure which handles the necessary activities transparently. In this article, we present Migol, a fault-tolerant and self-healing grid middleware for MPI applications. Migol is based on open standards and extends the services of the Globus toolkit to support the fault tolerance of grid applications. Further, the Migol framework itself is designed with special focus on fault tolerance. For example, Migol replicates critical services and uses a ring-based replication protocol to achieve data consistency. (c) 2007 Elsevier B.V. All rights reserved.
Y1  - 2008
U6  - https://doi.org/10.1016/j.future.2007.03.007
ER  - 
TY  - JOUR
A1  - Luckow, Andre
A1  - Jha, Shantenu
A1  - Kim, Joohyun
A1  - Merzky, Andre
A1  - Schnor, Bettina
T1  - Adaptive distributed replica-exchange simulations
N2  - Owing to the loose coupling between replicas, the replica-exchange (RE) class of algorithms should be able to benefit greatly from using as many resources as available. However, the ability to effectively use multiple distributed resources to reduce the time to completion remains a challenge at many levels. Additionally, an implementation of a pleasingly distributed algorithm such as replica-exchange, which is independent of infrastructural details, does not exist. This paper proposes an extensible and scalable framework based on Simple API for Grid Applications that provides a general-purpose, opportunistic mechanism to effectively use multiple resources in an infrastructure-independent way. By analysing the requirements of the RE algorithm and the challenges of implementing it on real production systems, we propose a new abstraction (BIGJOB), which forms the basis of the adaptive redistribution and effective scheduling of replicas.
Y1  - 2009
UR  - http://rsta.royalsocietypublishing.org/
U6  - https://doi.org/10.1098/rsta.2009.0051
SN  - 1364-503X
ER  - 
TY  - JOUR
A1  - Kiertscher, Simon
A1  - Zinke, Jörg
A1  - Schnor, Bettina
T1  - CHERUB power consumption aware cluster resource management
JF  - Cluster computing : the journal of networks, software tools and applications
N2  - This paper presents an evaluation of ACPI energy saving modes and derives from it the design and implementation of an energy saving daemon for clusters called cherub. The design of the cherub daemon is modular and extensible. Since the only requirement is a central approach for resource management, cherub is suited for Server Load Balancing (SLB) clusters managed by dispatchers like Linux Virtual Server (LVS), as well as for High Performance Computing (HPC) clusters. Our experimental results show that cherub's scheduling algorithm works well, i.e., it saves energy where possible and avoids state flapping.
KW  - Green computing
KW  - Cluster computing
Y1  - 2013
U6  - https://doi.org/10.1007/s10586-011-0176-5
SN  - 1386-7857
VL  - 16
IS  - 1
SP  - 55
EP  - 63
PB  - Springer
CY  - New York
ER  -