Massimo Cafaro
Role
Associate Professor
Organization
Università del Salento
Department
Department of Innovation Engineering
Scientific Area
Area 09 - Industrial and Information Engineering
Scientific Disciplinary Sector
ING-INF/05 - Information Processing Systems
ERC Sector (Level 1)
Not Available
ERC Sector (Level 2)
Not Available
ERC Sector (Level 3)
Not Available
We present a message-passing based parallel version of the Space Saving algorithm designed to solve the $k$-majority problem. The algorithm determines frequent items in parallel, i.e., those whose frequency is greater than a given threshold, and is therefore useful for iceberg queries and many other contexts. We apply our algorithm to the detection of frequent items in both real and synthetic datasets, whose probability distribution functions follow a Hurwitz and a Zipf distribution respectively. We also compare its parallel performance and accuracy against a recently proposed parallel algorithm for merging summaries derived by the Space Saving or Frequent algorithms.
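For reference, the update rule at the core of Space Saving can be sketched as follows; this is a minimal, illustrative Python sketch of the well-known sequential rule, not the paper's message-passing implementation, and the function and parameter names are ours.

    # Minimal sketch of the sequential Space Saving update rule with k counters.
    # counters maps item -> (estimated count, overestimation error).
    def space_saving_update(counters, item, k):
        if item in counters:
            count, err = counters[item]
            counters[item] = (count + 1, err)
        elif len(counters) < k:
            counters[item] = (1, 0)
        else:
            # Evict the item with the minimum count; the new item inherits
            # that count as its overestimation error bound.
            victim = min(counters, key=lambda x: counters[x][0])
            min_count, _ = counters.pop(victim)
            counters[item] = (min_count + 1, min_count)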
We present four CUDA based parallel implementations of the Space-Saving algorithm for determining frequent items on a GPU. The first variant exploits the open-source CUB library to simplify the implementation of a user-defined reduction, whilst the second is based on our own implementation of the parallel reduction. The third and fourth variants, built on the previous two, are meant to improve performance by taking advantage of hardware based atomic instructions. In particular, we implement a warp based ballot mechanism to accelerate the Space-Saving updates. We show that our implementation of the parallel reduction, coupled with the ballot based update mechanism, is the fastest, and we provide extensive experimental results regarding its performance.
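The ballot mechanism itself relies on CUDA warp-level intrinsics; as a rough host-language illustration of the idea only (not the paper's device code, and with hypothetical names), the following simulation shows how lanes holding the same item can be aggregated into a single update instead of issuing one atomic operation per lane.

    # Simulation of a warp ballot: the lowest active lane leads, all lanes
    # holding the leader's item vote (analogue of __ballot_sync), the votes
    # are counted (analogue of __popc), and one aggregated update is issued.
    def simulate_warp_ballot_update(warp_items, counters):
        pending = list(warp_items)            # items held by the 32 lanes
        while pending:
            leader_item = pending[0]
            ballot = 0
            for i, it in enumerate(pending):
                if it == leader_item:
                    ballot |= 1 << i          # set the lane's vote bit
            votes = bin(ballot).count("1")
            counters[leader_item] = counters.get(leader_item, 0) + votes
            pending = [it for i, it in enumerate(pending)
                       if not (ballot >> i) & 1]
        return counters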
The problem of mining Correlated Heavy Hitters (CHH) from a two-dimensional data stream has been introduced recently, and a deterministic algorithm based on the use of the Misra–Gries algorithm has been proposed by Lahiri et al. to solve it. In this paper we present a new counter-based algorithm for tracking CHHs, formally prove its error bounds and correctness and show, through extensive experimental results, that our algorithm outperforms the Misra–Gries based algorithm with regard to accuracy and speed whilst requiring asymptotically much less space.
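A counter-based CHH summary can be organized in two levels, for instance as in the sketch below (an illustrative structure under our own naming, not necessarily the paper's exact layout): a primary summary of k1 counters tracks the frequent primary items x, and each monitored x owns a secondary summary of k2 counters tracking the y values observed with it. The sketch reuses the space_saving_update routine shown above.

    # Illustrative two-level update for Correlated Heavy Hitters.
    def chh_update(primary, secondaries, x, y, k1, k2):
        before = set(primary)
        space_saving_update(primary, x, k1)       # track frequent x values
        for victim in before - set(primary):      # x replaced some victim
            secondaries.pop(victim, None)         # drop the victim's y-summary
        sec = secondaries.setdefault(x, {})
        space_saving_update(sec, y, k2)           # track y conditioned on x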
We present a deterministic parallel algorithm for the k-majority problem, which can be used to find frequent items in parallel, i.e., those whose multiplicity is greater than a given threshold; it is therefore useful for processing iceberg queries and in many other contexts of applied mathematics and information theory. The algorithm can be used both in the online (stream) context and in the offline setting. The difference is that in the former case we are restricted to a single scan of the input elements, so that verifying the frequent items that have been determined is not allowed (e.g., network traffic streams passing through internet routers), whilst in the latter a parallel scan of the input can be used to determine the actual k-majority elements. To the best of our knowledge, this is the first parallel algorithm solving the proposed problem.
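One natural way to organize such a computation, sketched below under our own assumptions (an illustration of the general pattern, not the paper's exact combine rule), is to let each processor summarize its block of the input and then merge the local summaries with a reduction.

    from functools import reduce

    # Each worker builds a local summary of its block (reusing the
    # space_saving_update routine sketched earlier); a reduction then
    # combines the summaries pairwise by summing counts of common items
    # and keeping the k largest counters.
    def summarize(block, k):
        counters = {}
        for item in block:
            space_saving_update(counters, item, k)
        return {item: c for item, (c, _) in counters.items()}

    def combine(s1, s2, k):
        merged = dict(s1)
        for item, c in s2.items():
            merged[item] = merged.get(item, 0) + c
        return dict(sorted(merged.items(), key=lambda kv: kv[1],
                           reverse=True)[:k])

    def parallel_frequent_items(blocks, k):
        return reduce(lambda a, b: combine(a, b, k),
                      (summarize(b, k) for b in blocks))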
This chapter introduces and puts in context Grids, Clouds, and Virtualization. Grids promised to deliver computing power on demand. However, despite a decade of active research, no viable commercial grid computing provider has emerged. On the other hand, it is widely believed, especially in the business world, that HPC will eventually become a commodity. Just as some commercial consumers of electricity have mission requirements that necessitate generating their own power, some consumers of computational resources will continue to need to provision their own supercomputers. Clouds are a recent business-oriented development with the potential to eventually render this as rare as organizations that generate their own electricity today, even among institutions that currently consider themselves the unassailable elite of the HPC business. Finally, Virtualization is one of the key technologies enabling many different Clouds. We begin with a brief history in order to put these concepts in context, and recall the basic principles underlying and clearly differentiating them. A thorough overview and survey of existing technologies provides the basis to delve into details as the reader progresses through the book.
Recently, an algorithm for merging counter-based data summaries which are the output of the Frequent algorithm (Frequent summaries) has been proposed by Agarwal et al. In this paper, we present a new algorithm for merging Frequent summaries. Our algorithm is fast and simple to implement, and retains the same computational complexity as the algorithm presented by Agarwal et al. while providing better frequency estimation.
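For context, the Agarwal et al. merge of two Frequent (Misra–Gries) summaries with k counters can be sketched as follows; the improved merge proposed in the paper is not reproduced here.

    # Sketch of the Agarwal et al. style merge: sum matching counters, then
    # subtract the (k+1)-st largest count and drop counters falling to zero.
    def merge_frequent(s1, s2, k):
        merged = dict(s1)
        for item, c in s2.items():
            merged[item] = merged.get(item, 0) + c
        if len(merged) <= k:
            return merged
        offset = sorted(merged.values(), reverse=True)[k]  # (k+1)-st largest
        return {i: c - offset for i, c in merged.items() if c > offset}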
We present FDCMSS, a new sketch-based algorithm for mining frequent items in data streams. The algorithm cleverly combines key ideas borrowed from forward decay and from the Count-Min and Space Saving algorithms. It works in the time fading model, mining data streams according to the cash register model. We formally prove its correctness and show, through extensive experimental results, that our algorithm outperforms λ-HCount, a recently developed algorithm, with regard to speed, space used, precision attained and error committed, on both synthetic and real datasets.
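The forward decay idea can be illustrated with a small sketch (our own minimal illustration, assuming an exponential decay function g and a landmark time L; FDCMSS's actual data layout and update rule are richer than this): updates add the un-normalized weight g(t - L) to the sketch cells, and queries normalize by g(t_query - L), so the effective weight of an arrival at time t, seen at query time t_q, is g(t - L)/g(t_q - L).

    import math, random

    # Minimal sketch of forward decay applied to a Count-Min sketch.
    class DecayedCountMin:
        def __init__(self, width, depth, lam, landmark=0.0, seed=42):
            rng = random.Random(seed)
            self.w, self.lam, self.L = width, lam, landmark
            self.salts = [rng.randrange(1 << 30) for _ in range(depth)]
            self.cells = [[0.0] * width for _ in range(depth)]

        def _g(self, x):                  # exponential decay function
            return math.exp(self.lam * x)

        def update(self, item, t):
            # Add the un-normalized weight; normalization happens at query.
            for row, salt in enumerate(self.salts):
                self.cells[row][hash((salt, item)) % self.w] += self._g(t - self.L)

        def estimate(self, item, t):
            raw = min(self.cells[row][hash((salt, item)) % self.w]
                      for row, salt in enumerate(self.salts))
            return raw / self._g(t - self.L)   # decayed count estimate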
We deal with the problem of detecting frequent items in a stream under the constraint that items are weighted, and recent items must be weighted more than older ones. This kind of problem naturally arises in a wide class of applications in which recent data is considered more useful and valuable than older, stale data. The weight assigned to an item is therefore a function of its arrival timestamp. As a consequence, whilst in traditional frequent item mining applications we need to estimate frequency counts, here we are required to estimate decayed counts. These applications are said to work in the time fading model. Two sketch-based algorithms for processing time-decayed streams were published independently near the end of 2016. The FSSQ algorithm, besides a sketch, also uses an additional data structure called a Quasi-Heap to maintain frequent items. FDCMSS, our algorithm, cleverly combines key ideas borrowed from forward decay, the Count-Min sketch and the Space Saving algorithm. Therefore, it makes sense to compare and contrast the two algorithms in order to fully understand their strengths and weaknesses. We show, through extensive experimental results, that FSSQ is better for detecting frequent items than for frequency estimation; the use of the Quasi-Heap data structure slows the algorithm down owing to the huge number of maintenance operations, so FSSQ may not be able to cope with high-speed data streams. FDCMSS is better suited for frequency estimation; moreover, it is extremely fast and can also be used in the context of high-speed data streams and for the detection of frequent items, since its recall is always greater than 99% even when using an extremely tiny amount of space. Therefore, FDCMSS proves to be an overall good choice when jointly considering recall, precision, average relative error and speed.
The access control problem in a hierarchy can be solved by using a hierarchical key assignment scheme, where each class is assigned an encryption key and some private information. A formal security analysis for hierarchical key assignment schemes has traditionally been considered in two different settings, the unconditionally secure and the computationally secure setting, and with respect to two different notions: security against key recovery (KR-security) and security with respect to key indistinguishability (KI-security), the latter notion being cryptographically stronger. Recently, Freire, Paterson and Poettering proposed strong key indistinguishability (SKI-security) as a new security notion in the computationally secure setting, arguing that SKI-security is strictly stronger than KI-security in such a setting. In this paper we consider the unconditionally secure setting for hierarchical key assignment schemes. In this setting the security of the schemes is not based on specific unproven computational assumptions; rather, it relies on the theoretical impossibility of breaking them, regardless of the computational power of an adversary coalition. We prove that, in this setting, SKI-security is not stronger than KI-security, i.e., the two notions are fully equivalent from an information-theoretic point of view.
This manuscript deals with a novel approach aimed at identifying multiple damaged sites in structural components through finite frequency changes. Natural frequencies, a privileged set of modal data, are adopted along with a numerical model of the system. The adoption of finite changes efficiently addresses characteristic problems encountered in damage detection techniques, such as the inadvertent comparison of shifted modes and the assessment of the significance of changes in modal data, which are very often affected by experimental and environmental noise. The new procedure extends MDLAC and exploits parallel computing on modern multicore processors. Smart filters, aimed at narrowing down the potential damaged sites, are implemented in order to reduce the computational effort. Several use cases are presented in order to illustrate the potential of the new damage detection procedure.
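For reference, the MDLAC correlation that the procedure builds on compares the measured frequency changes with those predicted by the numerical model for a candidate damage pattern; the sketch below is our own minimal illustration of that criterion, not the paper's extended parallel procedure.

    import numpy as np

    # MDLAC-style correlation between measured frequency changes and the
    # changes predicted by the model for a candidate damage vector.
    def mdlac(df_measured, df_predicted):
        num = np.dot(df_measured, df_predicted) ** 2
        den = (np.dot(df_measured, df_measured)
               * np.dot(df_predicted, df_predicted))
        return num / den

    # Candidate damage patterns are ranked by their MDLAC value; evaluating
    # many candidates is what the parallel implementation accelerates.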
Given an array $A$ of $n$ elements and a value $2 \leq k \leq n$, a frequent item or $k$-majority element is an element occurring in $A$ more than $n/k$ times. The $k$-majority problem requires finding all of the $k$-majority elements. In this paper, we deal with parallel shared-memory algorithms for frequent items; we present a shared-memory version of the Space Saving algorithm, and we study its behavior with regard to accuracy and performance on multi-core and many-core processors, including the Intel Xeon Phi accelerator. We also compare a hybrid MPI/OpenMP version against a pure MPI-based version. Through extensive experimental results, we show that the hybrid MPI/OpenMP parallel version of the algorithm significantly enhances the performance of the earlier pure MPI version of the same algorithm. Results also show that, for this algorithm, the Intel Xeon Phi accelerator does not introduce any improvement with respect to the octa-core Xeon processor.
The sharing and integration of health care data such as medical history, pathology, therapy, radiology images, etc., is a key requirement for improving patient diagnosis and, more generally, patient care. Today, many EPR (Electronic Patient Record) systems, deployed both within the same health center and across different ones, record a huge amount of data regarding a patient. In most cases the care of a patient involves different healthcare facilities, including the care provided by family doctors. Managing these data, typically terabytes or petabytes in size, and optimizing the applications (image analysis, data mining, etc.) for these architectures is one of the challenges that must be tackled. Therefore, there is a clear need for the design and implementation of new scalable approaches to deal with the associated information overload and cognitive complexity issues. A possible solution involves consolidating the data coming from different EPRs into a simplified, structured schema, typically called a meta-EPR. Owing to patient data security requirements, each health center manages its own meta-EPR, while a framework integrates these data among different sites. This work addresses the issue of sharing and integrating health care data, proposing a meta-EPR based on Peer-to-peer (P2P) technology for data fusion. We describe an implementation of a distributed information service that shares meta-EPRs and provides aggregation of relevant clinical information about patients based on a structured P2P overlay.
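As an illustration of how a structured overlay can locate the node responsible for a patient's meta-EPR (a toy consistent-hashing sketch with hypothetical names, not the service's actual protocol):

    import hashlib
    from bisect import bisect_right

    # Toy ring lookup: each node owns an arc of the identifier ring, and a
    # meta-EPR key is routed to the first node clockwise from its hash.
    def ring_position(value, bits=32):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (1 << bits)

    def responsible_node(nodes, key):
        ring = sorted((ring_position(n), n) for n in nodes)
        idx = bisect_right([pos for pos, _ in ring], ring_position(key)) % len(ring)
        return ring[idx][1]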
We deal with the problem of preference-based matchmaking of computational resources belonging to a grid. We introduce CP-nets, a recent development in the field of Artificial Intelligence, as a means to deal with users' preferences in the context of grid scheduling. We discuss CP-nets from a theoretical perspective and then analyze, qualitatively and quantitatively, their impact on the matchmaking process, with the help of a grid simulator we developed for this purpose. Many different experiments have been set up and carried out, and we report here our main findings and the lessons learnt.
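To give a flavor of how CP-nets encode conditional preferences, the sketch below (with made-up matchmaking variables; the paper's model and simulator are not reproduced) finds the most preferred outcome of an acyclic CP-net by sweeping the variables in topological order and assigning each its best value given its parents.

    # Each variable has a conditional preference table (CPT) mapping its
    # parents' values to a ranking of its own values, best first.
    cpnet = {
        "os":   {"parents": [], "cpt": {(): ["linux", "windows"]}},
        "arch": {"parents": [], "cpt": {(): ["x86_64", "ppc"]}},
        # Preferred memory size depends on the chosen architecture.
        "mem":  {"parents": ["arch"], "cpt": {("x86_64",): ["16GB", "8GB"],
                                              ("ppc",):    ["8GB", "16GB"]}},
    }

    def optimal_outcome(cpnet, order):        # order must be topological
        outcome = {}
        for var in order:
            parents = tuple(outcome[p] for p in cpnet[var]["parents"])
            outcome[var] = cpnet[var]["cpt"][parents][0]  # most preferred
        return outcome

    print(optimal_outcome(cpnet, ["os", "arch", "mem"]))
    # -> {'os': 'linux', 'arch': 'x86_64', 'mem': '16GB'}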
Preserving data confidentiality in clouds is a key issue. Secret Sharing, a cryptographic primitive for the distribution of a secret among a group of n participants, designed so that only subsets of shareholders of cardinality 0 < t ≤ n are allowed to reconstruct the secret by pooling their shares, can help mitigate the problem. A desirable feature of Secret Sharing schemes is cheater detection, i.e., the ability to detect one or more malicious shareholders trying to reconstruct the secret by obtaining legal shares from the other shareholders while providing them with fake shares. Verifiable Secret Sharing schemes solve this problem by allowing shareholders to verify each other's shares. We present new verification algorithms providing arbitrary secret sharing schemes with cheater detection capabilities, and prove their space efficiency with respect to other schemes that have appeared in the literature. We also introduce, in one of our schemes, the Exponentiating Polynomial Root Problem (EPRP), which is believed to be NP-intermediate and therefore difficult.
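As a concrete instance of the kind of underlying scheme the verification algorithms can be layered on, here is a minimal sketch of Shamir's (t, n) threshold scheme over a prime field (the canonical example; the paper's verification construction itself is not reproduced here).

    import random

    P = 2**127 - 1                      # Mersenne prime field modulus

    # Split a secret into n shares so that any t of them reconstruct it.
    def make_shares(secret, t, n, rng=random.SystemRandom()):
        coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
        poly = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, poly(x)) for x in range(1, n + 1)]

    # Lagrange interpolation at x = 0 recovers the secret from t shares.
    def reconstruct(shares):
        secret = 0
        for j, (xj, yj) in enumerate(shares):
            num = den = 1
            for m, (xm, _) in enumerate(shares):
                if m != j:
                    num = num * (-xm) % P
                    den = den * (xj - xm) % P
            secret = (secret + yj * num * pow(den, P - 2, P)) % P
        return secret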
This Special Section of Future Generation Computer Systems contains selected high-quality papers from the 4th International Conference on Grid and Pervasive Computing (GPC 2009), held in May 2009 in Geneva, Switzerland, and its related workshops. The research problems in these papers have been analyzed systematically, and for specific approaches or models a detailed evaluation was performed to demonstrate their feasibility and advantages. The papers were selected on this basis and were also thoroughly peer reviewed.