Refine
Has Fulltext
- no (29) (remove)
Year of publication
Document Type
- Article (17)
- Other (10)
- Part of a Book (1)
- Conference Proceeding (1)
Is part of the Bibliography
- yes (29)
Keywords
- Security (2)
- digital identity (2)
- self-sovereign identity (2)
- Anomaly detection (1)
- Attention span (1)
- Attribute aggregation (1)
- Authentication (1)
- Blockchain (1)
- Blockchains (1)
- Cloud Computing (1)
Institute
- Hasso-Plattner-Institut für Digital Engineering gGmbH (29) (remove)
Evaluating creativity of verbal responses or texts is a challenging task due to psychometric issues associated with subjective ratings and the peculiarities of textual data. We explore an approach to objectively assess the creativity of responses in a sentence generation task to 1) better understand what language-related aspects are valued by human raters and 2) further advance the developments toward automating creativity evaluations. Over the course of two prior studies, participants generated 989 four-word sentences based on a four-letter prompt with the instruction to be creative. We developed an algorithm that scores each sentence on eight different metrics including 1) general word infrequency, 2) word combination infrequency, 3) context-specific word uniqueness, 4) syntax uniqueness, 5) rhyme, 6) phonetic similarity, and similarity of 7) sequence spelling and 8) semantic meaning to the cue. The text metrics were then used to explain the averaged creativity ratings of eight human raters. We found six metrics to be significantly correlated with the human ratings, explaining a total of 16% of their variance. We conclude that the creative impression of sentences is partly driven by different aspects of novelty in word choice and syntax, as well as rhythm and sound, which are amenable to objective assessment.
ATIB
(2021)
Identity management is a principle component of securing online services. In the advancement of traditional identity management patterns, the identity provider remained a Trusted Third Party (TTP). The service provider and the user need to trust a particular identity provider for correct attributes amongst other demands. This paradigm changed with the invention of blockchain-based Self-Sovereign Identity (SSI) solutions that primarily focus on the users. SSI reduces the functional scope of the identity provider to an attribute provider while enabling attribute aggregation. Besides that, the development of new protocols, disregarding established protocols and a significantly fragmented landscape of SSI solutions pose considerable challenges for an adoption by service providers. We propose an Attribute Trust-enhancing Identity Broker (ATIB) to leverage the potential of SSI for trust-enhancing attribute aggregation. Furthermore, ATIB abstracts from a dedicated SSI solution and offers standard protocols. Therefore, it facilitates the adoption by service providers. Despite the brokered integration approach, we show that ATIB provides a high security posture. Additionally, ATIB does not compromise the ten foundational SSI principles for the users.
CloudStrike
(2020)
Most cyber-attacks and data breaches in cloud infrastructure are due to human errors and misconfiguration vulnerabilities. Cloud customer-centric tools are imperative for mitigating these issues, however existing cloud security models are largely unable to tackle these security challenges. Therefore, novel security mechanisms are imperative, we propose Risk-driven Fault Injection (RDFI) techniques to address these challenges. RDFI applies the principles of chaos engineering to cloud security and leverages feedback loops to execute, monitor, analyze and plan security fault injection campaigns, based on a knowledge-base. The knowledge-base consists of fault models designed from secure baselines, cloud security best practices and observations derived during iterative fault injection campaigns. These observations are helpful for identifying vulnerabilities while verifying the correctness of security attributes (integrity, confidentiality and availability). Furthermore, RDFI proactively supports risk analysis and security hardening efforts by sharing security information with security mechanisms. We have designed and implemented the RDFI strategies including various chaos engineering algorithms as a software tool: CloudStrike. Several evaluations have been conducted with CloudStrike against infrastructure deployed on two major public cloud infrastructure: Amazon Web Services and Google Cloud Platform. The time performance linearly increases, proportional to increasing attack rates. Also, the analysis of vulnerabilities detected via security fault injection has been used to harden the security of cloud resources to demonstrate the effectiveness of the security information provided by CloudStrike. Therefore, we opine that our approaches are suitable for overcoming contemporary cloud security issues.
Generative multi-adversarial network for striking the right balance in abdominal image segmentation
(2020)
Purpose: The identification of abnormalities that are relatively rare within otherwise normal anatomy is a major challenge for deep learning in the semantic segmentation of medical images. The small number of samples of the minority classes in the training data makes the learning of optimal classification challenging, while the more frequently occurring samples of the majority class hamper the generalization of the classification boundary between infrequently occurring target objects and classes. In this paper, we developed a novel generative multi-adversarial network, called Ensemble-GAN, for mitigating this class imbalance problem in the semantic segmentation of abdominal images. Method: The Ensemble-GAN framework is composed of a single-generator and a multi-discriminator variant for handling the class imbalance problem to provide a better generalization than existing approaches. The ensemble model aggregates the estimates of multiple models by training from different initializations and losses from various subsets of the training data. The single generator network analyzes the input image as a condition to predict a corresponding semantic segmentation image by use of feedback from the ensemble of discriminator networks. To evaluate the framework, we trained our framework on two public datasets, with different imbalance ratios and imaging modalities: the Chaos 2019 and the LiTS 2017. Result: In terms of the F1 score, the accuracies of the semantic segmentation of healthy spleen, liver, and left and right kidneys were 0.93, 0.96, 0.90 and 0.94, respectively. The overall F1 scores for simultaneous segmentation of the lesions and liver were 0.83 and 0.94, respectively. Conclusion: The proposed Ensemble-GAN framework demonstrated outstanding performance in the semantic segmentation of medical images in comparison with other approaches on popular abdominal imaging benchmarks. The Ensemble-GAN has the potential to segment abdominal images more accurately than human experts.
In cloud computing, users are able to use their own operating system (OS) image to run a virtual machine (VM) on a remote host. The virtual machine OS is started by the user using some interfaces provided by a cloud provider in public or private cloud. In peer to peer cloud, the VM is started by the host admin. After the VM is running, the user could get a remote access to the VM to install, configure, and run services. For the security reasons, the user needs to verify the integrity of the running VM, because a malicious host admin could modify the image or even replace the image with a similar image, to be able to get sensitive data from the VM. We propose an approach to verify the integrity of a running VM on a remote host, without using any specific hardware such as Trusted Platform Module (TPM). Our approach is implemented on a Linux platform where the kernel files (vmlinuz and initrd) could be replaced with new files, while the VM is running. kexec is used to reboot the VM with the new kernel files. The new kernel has secret codes that will be used to verify whether the VM was started using the new kernel files. The new kernel is used to further measuring the integrity of the running VM.
High-dimensional data is particularly useful for data analytics research. In the healthcare domain, for instance, high-dimensional data analytics has been used successfully for drug discovery. Yet, in order to adhere to privacy legislation, data analytics service providers must guarantee anonymity for data owners. In the context of high-dimensional data, ensuring privacy is challenging because increased data dimensionality must be matched by an exponential growth in the size of the data to avoid sparse datasets. Syntactically, anonymising sparse datasets with methods that rely of statistical significance, makes obtaining sound and reliable results, a challenge. As such, strong privacy is only achievable at the cost of high information loss, rendering the data unusable for data analytics. In this paper, we make two contributions to addressing this problem from both the privacy and information loss perspectives. First, we show that by identifying dependencies between attribute subsets we can eliminate privacy violating attributes from the anonymised dataset. Second, to minimise information loss, we employ a greedy search algorithm to determine and eliminate maximal partial unique attribute combinations. Thus, one only needs to find the minimal set of identifying attributes to prevent re-identification. Experiments on a health cloud based on the SAP HANA platform using a semi-synthetic medical history dataset comprised of 109 attributes, demonstrate the effectiveness of our approach.
Devices on the Internet of Things (IoT) are usually battery-powered and have limited resources. Hence, energy-efficient and lightweight protocols were designed for IoT devices, such as the popular Constrained Application Protocol (CoAP). Yet, CoAP itself does not include any defenses against denial-of-sleep attacks, which are attacks that aim at depriving victim devices of entering low-power sleep modes. For example, a denial-of-sleep attack against an IoT device that runs a CoAP server is to send plenty of CoAP messages to it, thereby forcing the IoT device to expend energy for receiving and processing these CoAP messages. All current security solutions for CoAP, namely Datagram Transport Layer Security (DTLS), IPsec, and OSCORE, fail to prevent such attacks. To fill this gap, Seitz et al. proposed a method for filtering out inauthentic and replayed CoAP messages "en-route" on 6LoWPAN border routers. In this paper, we expand on Seitz et al.'s proposal in two ways. First, we revise Seitz et al.'s software architecture so that 6LoWPAN border routers can not only check the authenticity and freshness of CoAP messages, but can also perform a wide range of further checks. Second, we propose a couple of such further checks, which, as compared to Seitz et al.'s original checks, more reliably protect IoT devices that run CoAP servers from remote denial-of-sleep attacks, as well as from remote exploits. We prototyped our solution and successfully tested its compatibility with Contiki-NG's CoAP implementation.
LoANs
(2019)
Recently, deep neural networks have achieved remarkable performance on the task of object detection and recognition. The reason for this success is mainly grounded in the availability of large scale, fully annotated datasets, but the creation of such a dataset is a complicated and costly task. In this paper, we propose a novel method for weakly supervised object detection that simplifies the process of gathering data for training an object detector. We train an ensemble of two models that work together in a student-teacher fashion. Our student (localizer) is a model that learns to localize an object, the teacher (assessor) assesses the quality of the localization and provides feedback to the student. The student uses this feedback to learn how to localize objects and is thus entirely supervised by the teacher, as we are using no labels for training the localizer. In our experiments, we show that our model is very robust to noise and reaches competitive performance compared to a state-of-the-art fully supervised approach. We also show the simplicity of creating a new dataset, based on a few videos (e.g. downloaded from YouTube) and artificially generated data.
Cloud Storage Broker (CSB) provides value-added cloud storage service for enterprise usage by leveraging multi-cloud storage architecture. However, it raises several challenges for managing resources and its access control in multiple Cloud Service Providers (CSPs) for authorized CSB stakeholders. In this paper we propose unified cloud access control model that provides the abstraction of CSP's services for centralized and automated cloud resource and access control management in multiple CSPs. Our proposal offers role-based access control for CSB stakeholders to access cloud resources by assigning necessary privileges and access control list for cloud resources and CSB stakeholders, respectively, following privilege separation concept and least privilege principle. We implement our unified model in a CSB system called CloudRAID for Business (CfB) with the evaluation result shows it provides system-and-cloud level security service for cfB and centralized resource and access control management in multiple CSPs.
Many universities record the lectures being held in their facilities to preserve knowledge and to make it available to their students and, at least for some universities and classes, to the broad public. The way with the least effort is to record the whole lecture, which in our case usually is 90 min long. This saves the labor and time of cutting and rearranging lectures scenes to provide short learning videos as known from Massive Open Online Courses (MOOCs), etc. Many lecturers fear that recording their lectures and providing them via an online platform might lead to less participation in the actual lecture. Also, many teachers fear that the lecture recordings are not used with the same focus and dedication as lectures in a lecture hall. In this work, we show that in our experience, full lectures have an average watching duration of just a few minutes and explain the reasons for that and why, in most cases, teachers do not have to worry about that.
Generating a novel and descriptive caption of an image is drawing increasing interests in computer vision, natural language processing, and multimedia communities. In this work, we propose an end-to-end trainable deep bidirectional LSTM (Bi-LSTM (Long Short-Term Memory)) model to address the problem. By combining a deep convolutional neural network (CNN) and two separate LSTM networks, our model is capable of learning long-term visual-language interactions by making use of history and future context information at high-level semantic space. We also explore deep multimodal bidirectional models, in which we increase the depth of nonlinearity transition in different ways to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale, and vertical mirror are proposed to prevent over-fitting in training deep models. To understand how our models "translate" image to sentence, we visualize and qualitatively analyze the evolution of Bi-LSTM internal states over time. The effectiveness and generality of proposed models are evaluated on four benchmark datasets: Flickr8K, Flickr30K, MSCOCO, and Pascal1K datasets. We demonstrate that Bi-LSTM models achieve highly competitive performance on both caption generation and image-sentence retrieval even without integrating an additional mechanism (e.g., object detection, attention model). Our experiments also prove that multi-task learning is beneficial to increase model generality and gain performance. We also demonstrate the performance of transfer learning of the Bi-LSTM model significantly outperforms previous methods on the Pascal1K dataset.
After almost two decades of development, modern Security Information and Event Management (SIEM) systems still face issues with normalisation of heterogeneous data sources, high number of false positive alerts and long analysis times, especially in large-scale networks with high volumes of security events. In this paper, we present our own prototype of SIEM system, which is capable of dealing with these issues. For efficient data processing, our system employs in-memory data storage (SAP HANA) and our own technologies from the previous work, such as the Object Log Format (OLF) and high-speed event normalisation. We analyse normalised data using a combination of three different approaches for security analysis: misuse detection, query-based analytics, and anomaly detection. Compared to the previous work, we have significantly improved our unsupervised anomaly detection algorithms. Most importantly, we have developed a novel hybrid outlier detection algorithm that returns ranked clusters of anomalies. It lets an operator of a SIEM system to concentrate on the several top-ranked anomalies, instead of digging through an unsorted bundle of suspicious events. We propose to use anomaly detection in a combination with signatures and queries, applied on the same data, rather than as a full replacement for misuse detection. In this case, the majority of attacks will be captured with misuse detection, whereas anomaly detection will highlight previously unknown behaviour or attacks. We also propose that only the most suspicious event clusters need to be checked by an operator, whereas other anomalies, including false positive alerts, do not need to be explicitly checked if they have a lower ranking. We have proved our concepts and algorithms on a dataset of 160 million events from a network segment of a big multinational company and suggest that our approach and methods are highly relevant for modern SIEM systems.
Creation, collection and retention of knowledge in digital communities is an activity that currently requires being explicitly targeted as a secure method of keeping intellectual capital growing in the digital era. In particular, we consider it relevant to analyze and evaluate the empathetic cognitive personalities and behaviors that individuals now have with the change from face-to-face communication (F2F) to computer-mediated communication (CMC) online. This document proposes a cyber-humanistic approach to enhance the traditional SECI knowledge management model. A cognitive perception is added to its cyclical process following design thinking interaction, exemplary for improvement of the method in which knowledge is continuously created, converted and shared. In building a cognitive-centered model, we specifically focus on the effective identification and response to cognitive stimulation of individuals, as they are the intellectual generators and multiplicators of knowledge in the online environment. Our target is to identify how geographically distributed-digital-organizations should align the individual's cognitive abilities to promote iteration and improve interaction as a reliable stimulant of collective intelligence. The new model focuses on analyzing the four different stages of knowledge processing, where individuals with sympathetic cognitive personalities can significantly boost knowledge creation in a virtual social system. For organizations, this means that multidisciplinary individuals can maximize their extensive potential, by externalizing their knowledge in the correct stage of the knowledge creation process, and by collaborating with their appropriate sympathetically cognitive remote peers.
Embedded smart home
(2017)
The popularity of MOOCs has increased considerably in the last years. A typical MOOC course consists of video content, self tests after a video and homework, which is normally in multiple choice format. After solving this homeworks for every week of a MOOC, the final exam certificate can be issued when the student has reached a sufficient score. There are also some attempts to include practical tasks, such as programming, in MOOCs for grading. Nevertheless, until now there is no known possibility to teach embedded system programming in a MOOC course where the programming can be done in a remote lab and where grading of the tasks is additionally possible. This embedded programming includes communication over GPIO pins to control LEDs and measure sensor values. We started a MOOC course called "Embedded Smart Home" as a pilot to prove the concept to teach real hardware programming in a MOOC environment under real life MOOC conditions with over 6000 students. Furthermore, also students with real hardware have the possibility to program on their own real hardware and grade their results in the MOOC course. Finally, we evaluate our approach and analyze the student acceptance of this approach to offer a course on embedded programming. We also analyze the hardware usage and working time of students solving tasks to find out if real hardware programming is an advantage and motivating achievement to support students learning success.
Selection of initial points, the number of clusters and finding proper clusters centers are still the main challenge in clustering processes. In this paper, we suggest genetic algorithm based method which searches several solution spaces simultaneously. The solution spaces are population groups consisting of elements with similar structure. Elements in a group have the same size, while elements in different groups are of different sizes. The proposed algorithm processes the population in groups of chromosomes with one gene, two genes to k genes. These genes hold corresponding information about the cluster centers. In the proposed method, the crossover and mutation operators can accept parents with different sizes; this can lead to versatility in population and information transfer among sub-populations. We implemented the proposed method and evaluated its performance against some random datasets and the Ruspini dataset as well. The experimental results show that the proposed method could effectively determine the appropriate number of clusters and recognize their centers. Overall this research implies that using heterogeneous population in the genetic algorithm can lead to better results.
The identification of vulnerabilities relies on detailed information about the target infrastructure. The gathering of the necessary information is a crucial step that requires an intensive scanning or mature expertise and knowledge about the system even though the information was already available in a different context. In this paper we propose a new method to detect vulnerabilities that reuses the existing information and eliminates the necessity of a comprehensive scan of the target system. Since our approach is able to identify vulnerabilities without the additional effort of a scan, we are able to increase the overall performance of the detection. Because of the reuse and the removal of the active testing procedures, our approach could be classified as a passive vulnerability detection. We will explain the approach and illustrate the additional possibility to increase the security awareness of users. Therefore, we applied the approach on an experimental setup and extracted security relevant information from web logs.
This paper discusses a new approach for designing and deploying Security-as-a-Service (SecaaS) applications using cloud native design patterns. Current SecaaS approaches do not efficiently handle the increasing threats to computer systems and applications. For example, requests for security assessments drastically increase after a high-risk security vulnerability is disclosed. In such scenarios, SecaaS applications are unable to dynamically scale to serve requests. A root cause of this challenge is employment of architectures not specifically fitted to cloud environments. Cloud native design patterns resolve this challenge by enabling certain properties e.g. massive scalability and resiliency via the combination of microservice patterns and cloud-focused design patterns. However adopting these patterns is a complex process, during which several security issues are introduced. In this work, we investigate these security issues, we redesign and deploy a monolithic SecaaS application using cloud native design patterns while considering appropriate, layered security counter-measures i.e. at the application and cloud networking layer. Our prototype implementation out-performs traditional, monolithic applications with an average Scanner Time of 6 minutes, without compromising security. Our approach can be employed for designing secure, scalable and performant SecaaS applications that effectively handle unexpected increase in security assessment requests.
Securing e-prescription from medical identity theft using steganography and antiphishing techniques
(2017)
Drug prescription is among the health care process that usually makes references to the patients’ medical and insurance information among other personal data, because this information is very vital and delicate, it should be adequately protected from identity thieves. This article aims at securing Electronic Prescription (EP) in order to minimize patient’s data theft and foster patients’ trust of EP system.
This paper presents a steganography and antiphishing technique for preventing medical identity theft in EP. The proposed EP system design focused on the security features in the prescriber and dispensers’ modules of EP by ensuring the prescriber sends the prescription of the patient in a safe manner and to the right dispenser without the interference of fake third parties. Hexadecimal steganography image system is used to cover and secure the
sent prescription details. Malicious electronic dispensing system is prevented through an authentication technique where a dispenser uses a captcha together with a one-time password, and the web server encrypted token for prescriber’s device authentication. The steganography system is evaluated using Peak Signal to Noise Ratio (PSNR).
The system implementation results showed that steganography
and antiphishing techniques are capable of providing a secure EP systems.
Massive Open Online Courses (MOOCs) have left their mark on the face of education during the recent years. At the Hasso Plattner Institute (HPI) in Potsdam, Germany, we are actively developing a MOOC platform, which provides our research with a plethora of e-learning topics, such as learning analytics, automated assessment, peer assessment, team-work, online proctoring, and gamification. We run several instances of this platform. On openHPI, we provide our own courses from within the HPI context. Further instances are openSAP, openWHO, and mooc.HOUSE, which is the smallest of these platforms, targeting customers with a less extensive course portfolio. In 2013, we started to work on the gamification of our platform. By now, we have implemented about two thirds of the features that we initially have evaluated as useful for our purposes. About a year ago we activated the implemented gamification features on mooc.HOUSE. Before activating the features on openHPI as well, we examined, and re-evaluated our initial considerations based on the data we collected so far and the changes in other contexts of our platforms.
Network Topology Discovery and Inventory Listing are two of the primary features of modern network monitoring systems (NMS). Current NMSs rely heavily on active scanning techniques for discovering and mapping network information. Although this approach works, it introduces some major drawbacks such as the performance impact it can exact, specially in larger network environments. As a consequence, scans are often run less frequently which can result in stale information being presented and used by the network monitoring system. Alternatively, some NMSs rely on their agents being deployed on the hosts they monitor. In this article, we present a new approach to Network Topology Discovery and Network Inventory Listing using only passive monitoring and scanning techniques. The proposed techniques rely solely on the event logs produced by the hosts and network devices present within a network. Finally, we discuss some of the advantages and disadvantages of our approach.
Multiperiod robust optimization for proactive resource provisioning in virtualized data centers
(2014)
Intrusion Detection Systems are widely deployed in computer networks. As modern attacks are getting more sophisticated and the number of sensors and network nodes grow, the problem of false positives and alert analysis becomes more difficult to solve. Alert correlation was proposed to analyse alerts and to decrease false positives. Knowledge about the target system or environment is usually necessary for efficient alert correlation. For representing the environment information as well as potential exploits, the existing vulnerabilities and their Attack Graph (AG) is used. It is useful for networks to generate an AG and to organize certain vulnerabilities in a reasonable way. In this article, a correlation algorithm based on AGs is designed that is capable of detecting multiple attack scenarios for forensic analysis. It can be parameterized to adjust the robustness and accuracy. A formal model of the algorithm is presented and an implementation is tested to analyse the different parameters on a real set of alerts from a local network. To improve the speed of the algorithm, a multi-core version is proposed and a HMM-supported version can be used to further improve the quality. The parallel implementation is tested on a multi-core correlation platform, using CPUs and GPUs.
Intrusion Detection Systems (IDS) have been widely deployed in practice for detecting malicious behavior on network communication and hosts. False-positive alerts are a popular problem for most IDS approaches. The solution to address this problem is to enhance the detection process by correlation and clustering of alerts. To meet the practical requirements, this process needs to be finished fast, which is a challenging task as the amount of alerts in large-scale IDS deployments is significantly high. We identifytextitdata storage and processing algorithms to be the most important factors influencing the performance of clustering and correlation. We propose and implement a highly efficient alert correlation platform. For storage, a column-based database, an In-Memory alert storage, and memory-based index tables lead to significant improvements of the performance. For processing, algorithms are designed and implemented which are optimized for In-Memory databases, e.g. an attack graph-based correlation algorithm. The platform can be distributed over multiple processing units to share memory and processing power. A standardized interface is designed to provide a unified view of result reports for end users. The efficiency of the platform is tested by practical experiments with several alert storage approaches, multiple algorithms, as well as a local and a distributed deployment.
This paper presents the state of the art in the development of Semantic-Web-enabled software using object-oriented programming languages. Object triple mapping (OTM) is a frequently used method to simplify the development of such software. A case study that is based on interviews with developers of OTM frameworks is presented at the core of this paper. Following the results of the case study, the formalization of OTM is kept separate from optional but desirable extensions of OTM with regard to metadata, schema matching, and integration into the Semantic-Web infrastructure. The material that is presented is expected to not only explain the development of Semantic-Web software by the usage of OTM, but also explain what properties of Semantic-Web software made developers come up with OTM. Understanding the latter will be essential to get nonexpert software developers to use Semantic-Web technologies in their software.
Spam has posed a serious problem for users of email since its infancy. Today, automated strategies are required to deal with the massive amount of spam traffic. IPv4 networks offer a variety of solutions to reduce spam, but IPv6 networks' large address space and use of temporary addresses - both of which are particularly vulnerable to spam attacks - makes dealing with spam and the use of automated approaches much more difficult. IPv6 thus poses a unique security issue for ISPs because it's more difficult for them to differentiate between good IP addresses and those that are known to originate spam messages.
In this article, we discuss the notions of experts and expertise in resource discovery in the context of collaborative tagging systems. We propose that the level of expertise of a user with respect to a particular topic is mainly determined by two factors. First, an expert should possess a high-quality collection of resources, while the quality of a Web resource in turn depends on the expertise of the users who have assigned tags to it, forming a mutual reinforcement relationship. Second, an expert should be one who tends to identify interesting or useful resources before other users discover them, thus bringing these resources to the attention of the community of users. We propose a graph-based algorithm, SPEAR (spamming-resistant expertise analysis and ranking), which implements the above ideas for ranking users in a folksonomy. Our experiments show that our assumptions on expertise in resource discovery, and SPEAR as an implementation of these ideas, allow us to promote experts and demote spammers at the same time, with performance significantly better than the original hypertext-induced topic search algorithm and simple statistical measures currently used in most collaborative tagging systems.