Pier Luigi Mazzeo
Role
III level - Researcher
Organization
Consiglio Nazionale delle Ricerche
Department
Not Available
Scientific Area
AREA 09 - Industrial and Information Engineering
Scientific Disciplinary Sector
ING-INF/03 - Telecommunications
ERC Sector, Level 1
PE - PHYSICAL SCIENCES AND ENGINEERING
ERC Sector, Level 2
PE6 Computer Science and Informatics: Informatics and information systems, computer science, scientific computing, intelligent systems
ERC Sector, Level 3
PE6_7 Artificial intelligence, intelligent systems, multi agent systems
It has been proved that Autism Spectrum Disorders (ASD) are associated with amplified emotional responses and poor emotional control. The underlying mechanisms and characteristics of these difficulties in using, sharing and responding to emotions are still not understood. Recent non-invasive technological frameworks based on computer vision can be applied to overcome this knowledge gap, and this paper aims to demonstrate how facial measurements from images can be exploited to compare how ASD children react to external stimuli with respect to a control set of children.
Distributed networks of sensors have been recognized to be a powerful tool for developing fully automated systems that monitor environments and human activities. Nevertheless, problems such as active control of heterogeneous sensors for high-level scene interpretation and mission execution are open. This paper presents the authors' ongoing research about design and implementation of a distributed heterogeneous sensor network that includes static cameras and multi-sensor mobile robots. The system is intended to provide robot-assisted monitoring and surveillance of large environments. The proposed solution exploits a distributed control architecture to enable the network to autonomously accomplish general-purpose and complex monitoring tasks. The nodes can both act with some degree of autonomy and cooperate with each other. The paper describes the concepts underlying the designed system architecture and presents the results obtained working on its components, including some simulations performed in a realistic scenario to validate the distributed target tracking algorithm.
In recent years, "FragTrack" has become one of the most cited real-time algorithms for visual tracking of an object in a video sequence. However, this algorithm fails when the object model is not present in the image or is completely occluded, and in long-term video sequences. In these sequences, the target object's appearance changes considerably over time, and its comparison with the template established at the first frame becomes unreliable. In this work we introduce improvements to the original FragTrack: the management of total object occlusions and the update of the object template. Basically, we use a voting map generated by a non-parametric kernel density estimation strategy that allows us to compute a probability distribution for the distances of the histograms between template and object patches. In order to automatically determine whether the target object is present in the current frame, an adaptive threshold is introduced. A Bayesian classifier establishes, frame by frame, the presence of the template object in the current frame. The template is partially updated at every frame. We tested the algorithm on well-known benchmark sequences, in which the object is always present, and on video sequences showing total occlusion of the target object, to demonstrate the effectiveness of the proposed method.
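The kernel-density presence test described in this abstract can be sketched roughly as follows. This is a minimal numpy illustration, not the paper's actual implementation: the function names, the Gaussian kernel and the bandwidth value are all assumptions.

```python
import numpy as np

def parzen_density(samples, x, bandwidth=0.05):
    # Non-parametric (Gaussian) kernel density estimate at points x,
    # built from observed histogram-distance samples.
    diffs = (np.atleast_1d(x)[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * diffs ** 2).sum(axis=1) / norm

def target_present(patch_distances, present_samples, absent_samples):
    # Toy Bayesian decision: declare the target present when the mean
    # template-to-patch histogram distance is more likely under the
    # "object present" distance distribution than under the "absent" one.
    d = float(np.mean(patch_distances))
    return parzen_density(present_samples, d)[0] > parzen_density(absent_samples, d)[0]
```

In the paper the decision threshold is adaptive; here the two reference sample sets play that role in the simplest possible way.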
Automatic sport team discrimination, that is, the correct assignment of each player to the relative team, is a fundamental step in high-level sport video analysis applications. In this work we propose a novel set of features based on a variation of classic color histograms, called Positional Histograms: these features try to overcome the main drawback of classic histograms, above all the absence of any relation between the spectral and spatial content of the image. The basic idea is to extract histograms as a function of the position of points in the image, with the goal of maintaining a relationship between the color distribution and the position: this is necessary because the actors on a play field often dress in a similar way, with just a different distribution of the same colors across the silhouettes. Further, different unsupervised classifiers and different feature sets are jointly evaluated with the goal of investigating the feasibility of unsupervised techniques in sport video analysis.
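A minimal sketch of the positional-histogram idea, assuming a regular grid of cells and per-channel histograms (the grid size, bin count and normalization are illustrative choices, not those of the paper):

```python
import numpy as np

def positional_histogram(img, grid=(2, 2), bins=8):
    # Concatenate per-cell color histograms so that the color
    # distribution stays tied to its spatial position in the image.
    h, w, _ = img.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = img[i * h // grid[0]:(i + 1) * h // grid[0],
                       j * w // grid[1]:(j + 1) * w // grid[1]]
            for c in range(3):  # one histogram per color channel
                hist, _ = np.histogram(cell[..., c], bins=bins, range=(0, 256))
                feats.append(hist / max(cell[..., c].size, 1))
    return np.concatenate(feats)
```

Two players wearing the same colors in different arrangements (e.g. stripes vs. solid blocks) then produce different feature vectors, which a plain global histogram could not distinguish.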
In this work a first attempt to undertake the difficult challenge of embedding a technological level into a standardized protocol for Autism spectrum disorders (ASD) diagnose and assessment is introduced. In particular the Autism Diagnostic Observation Schedule (ADOS-2) is taken under consideration and a technological framework is introduced to compute, in an objective and automatic way, the evaluation scores for some of the involved tasks. The proposed technological framework makes use of a hidden RGB-D device for scene acquisition. Acquired data then feed a cascade of algorithmic steps by which people and objects are detected and temporally tracked and then extracted information is exploited by fitting a spatial and temporal model described by means of an ontology-based approach. The ontology metadata are finally processed to find a mapping between them and the behavioral tasks described in the protocol.
Unmanned aerial vehicles (UAVs) have been an active research field for several years. They can be applied in a large variety of different scenarios and supply a test bed to investigate several unsolved problems such as path planning, control and navigation. Furthermore, with the availability of low-cost, robust and small video cameras, UAV video has been one of the fastest growing data sources in the last couple of years; as a consequence, object detection and tracking as well as visual navigation have recently received a lot of attention. This paper proposes an advanced technology framework that, through the use of UAVs, allows the supervision of a specific sensitive area (e.g. traffic monitoring, dangerous zones and so on). In particular, one of the most cited real-time visual trackers proposed in the literature, Struck, is applied to video sequences typically supplied by UAVs equipped with a monocular camera. Furthermore, this paper investigates the feasibility of grafting different feature characterizations into the original tracking architecture (replacing the original ones). The feature extraction methods used are based on Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG). Objects to be tracked can be selected manually or by means of advanced detection techniques based, for example, on change detection or template matching strategies. The experimental results on well-known benchmark sequences show that this feature replacement improves the overall performance of the original real-time visual tracker.
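The basic LBP descriptor mentioned in this abstract can be sketched as follows, assuming the classic 8-neighbour formulation on a grayscale image (a toy numpy version, not the implementation used in the paper):

```python
import numpy as np

def lbp_image(gray):
    # Basic 8-neighbour Local Binary Pattern: each interior pixel gets an
    # 8-bit code, one bit per neighbour, set when the neighbour is >= the
    # centre pixel.
    c = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = gray[1 + dy:gray.shape[0] - 1 + dy,
                     1 + dx:gray.shape[1] - 1 + dx]
        code += (neigh >= c).astype(np.int32) << bit
    return code
```

A histogram of these codes over a window is what would typically feed the tracker as a texture feature.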
In the last decade, soccer video analysis has received a lot of attention from the scientific community. This increasing interest is motivated by the possible applications over a wide spectrum of topics: indexing, summarization, video enhancement, team and players statistics, tactics analysis, referee support, etc. The application of computer vision methodologies in the soccer context requires many problems to be faced: ball and players have to be detected in the images in any light and weather condition, they have to be localized in the field, tracked over time and finally their interactions have to be detected and analyzed. The latter task is fundamental, especially for statistic and referee decision support purposes, but, unfortunately, it has not received adequate attention from the scientific community and a lot of research remains to be done. In this paper a multicamera system is presented to detect the ball player interactions during soccer matches. The proposed method extracts, by triangulation from multiple cameras, the 3D ball and player trajectories and, by estimating the trajectory intersections, detects the ball-player interactions. An inference process is then introduced to determine the player kicking the ball and to estimate the interaction frame. The system was tested during several matches of the Italian first division football championship and experimental results demonstrated that the proposed method is robust and accurate.
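Once the 3D ball and player trajectories are available, the interaction-detection step reduces to finding where the two trajectories come close to each other. A minimal sketch under that assumption (the distance threshold and function name are illustrative, not the paper's):

```python
import numpy as np

def interaction_frames(ball_traj, player_traj, radius=0.5):
    # Frames where the 3D ball and player trajectories intersect,
    # i.e. their Euclidean distance drops below `radius` (in metres).
    d = np.linalg.norm(ball_traj - player_traj, axis=1)
    return np.flatnonzero(d < radius)
```

In the actual system an inference process then decides, among the candidate frames and players, who kicked the ball; the sketch only covers the geometric trigger.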
Automatic sport video analysis has become one of the most attractive research fields in the areas of computer vision and multimedia technologies. In particular, there has been a boom in soccer video analysis research. This paper presents a new multi-step algorithm to automatically detect the soccer ball in image sequences acquired from static cameras. In each image, candidate ball regions are selected by analyzing edge circularity, and then ball patterns are extracted representing locally affine invariant regions around distinctive points which have been highlighted automatically. The effectiveness of the proposed methodologies is demonstrated through a large number of experiments using real balls under challenging conditions, as well as a favorable comparison with some of the leading approaches from the literature.
Mobility and multi-functionality have been recognized as being basic requirements for the development of fully automated surveillance systems in realistic scenarios. Nevertheless, problems such as active control of heterogeneous mobile agents, integration of information from fixed and moving sensors for high-level scene interpretation, and mission execution are open. This paper describes recent and current research of the authors concerning the design and implementation of a multi-agent surveillance system, using static cameras and mobile robots. The proposed solution takes advantage of a distributed control architecture that allows the agents to autonomously handle general-purpose tasks, as well as more complex surveillance issues. The various agents can either take decisions and act with some degree of autonomy, or cooperate with each other. This paper presents an overview of the system architecture and of the algorithms involved in developing such an autonomous, multi-agent surveillance system.
Deep Learning has become a popular and effective way to address a large set of issues. In particular, in computer vision, it has been exploited to obtain satisfying recognition performance in unconstrained conditions. However, this race towards ever better performance in extreme conditions has overshadowed an important step, i.e. the assessment of the impact of this new methodology on the traditional issues on which researchers had worked for years. This is particularly true for biometrics applications, where the evaluation of deep learning has been made directly on the newest, larger and more challenging datasets. This leads to a purely data-driven evaluation that makes it difficult to analyze the relationships between network configurations, the learning process and the observed outcomes. This paper tries to partially fill this gap by applying a DNN for gender recognition on the MORPH dataset and evaluating how a lower cardinality of examples used for learning can bias the recognition performance.
Joint attention is an early-developing social-communicative skill in which two people (usually a young child and an adult) share attention with regard to an interesting object or event, by means of gestures and gaze, and its presence is a key element in evaluating the therapy in the case of autism spectrum disorders. In this work, a novel automatic system able to detect joint attention by using a completely non-intrusive depth camera installed on the room ceiling is presented. In particular, in a scenario where a humanoid robot, a therapist (or a parent) and a child are interacting, the system can detect the social interaction between them. Specifically, a depth camera mounted at the top of the room is employed to detect, first of all, the arising event to be monitored (performed by a humanoid robot) and, subsequently, to detect any joint attention mechanism by analyzing the orientation of the head. The system operates in real time, providing the therapist with a completely non-intrusive instrument to help evaluate the quality and the precise modalities of this predominant feature during the therapy session.
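The head-orientation test underlying such a system can be reduced to a simple angular check: is the estimated viewing direction pointing at the event within some tolerance? A toy sketch under that assumption (the tolerance value and function name are illustrative):

```python
import numpy as np

def attending(head_pos, head_dir, target, max_angle_deg=20.0):
    # True when the head's viewing direction points at the target
    # within an angular tolerance.
    to_target = np.asarray(target, float) - np.asarray(head_pos, float)
    head_dir = np.asarray(head_dir, float)
    cosang = np.dot(head_dir, to_target) / (
        np.linalg.norm(head_dir) * np.linalg.norm(to_target))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return bool(angle <= max_angle_deg)
```

Joint attention would then be flagged when both the child's and the adult's estimated head directions pass this test for the same target in the same time window.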
In this paper a system for automatic visual monitoring of the welding process in dry stainless steel kegs for food storage is proposed. In the considered manufacturing process the upper and lower skirts are welded to the vessel by means of Tungsten Inert Gas (TIG) welding. During the process several problems can arise: 1) residuals on the bottom, 2) darker welds, 3) excessive/poor penetration, and 4) outgrowths. The proposed system deals with all four of the aforementioned problems, and its inspection performance has been evaluated using a large set of kegs, demonstrating both its reliability in terms of defect detection and its suitability for introduction in the manufacturing system in terms of computational cost.
Ball recognition in soccer matches is a critical issue for automatic soccer video analysis. Unfortunately, because of the difficulty in solving the problem, many efforts of numerous researchers have still not produced fully satisfactory results in terms of accuracy. This paper proposes a ball recognition approach that introduces a double level of innovation. Firstly, a randomized circle detection approach based on the local curvature information of the isophotes is used to identify the edge pixels belonging to the ball boundaries. Then, ball candidates are validated by a learning framework formulated into a three-layered model based on a variation of the conventional local binary pattern approach. Experimental results were obtained on a significant set of real soccer images, acquired under challenging lighting conditions during Italian "Serie A" matches. The results have been also favorably compared with the leading state-of-the-art methods.
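A randomized circle detector of the kind referred to here ultimately rests on a classical geometric step: fitting the circle through sampled edge points and checking how many other edge pixels support it. A sketch of that fitting step (standard circumcircle formula; the sampling and validation layers of the paper are not reproduced):

```python
import numpy as np

def circle_from_3pts(p1, p2, p3):
    # Centre and radius of the circle through three non-collinear points,
    # via the standard circumcentre formula.
    ax, ay = p1
    bx, by = p2
    cx, cy = p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    r = float(np.hypot(ax - ux, ay - uy))
    return (ux, uy), r
```

In a randomized scheme this is evaluated for many random triplets of edge pixels, and hypotheses with enough supporting edge pixels are kept as ball candidates.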
This paper focuses on the ball detection algorithm that analyzes candidate ball regions to detect the ball. Unfortunately, at the time of a goal, the goal-posts (and sometimes also some players) partially occlude the ball or alter its appearance (due to the shadows they cast on it). This often makes traditional pattern recognition approaches ineffective and forces the system to make the decision about the event based on estimates rather than on real ball position measurements. To overcome this drawback, this work compares different descriptors of the ball appearance; in particular, it investigates both well-known feature extraction approaches and the recent local descriptor BRISK in a soccer match context. This paper analyzes critical situations in which the ball is heavily occluded in order to measure robustness, accuracy and detection performance. The effectiveness of BRISK compared with other local descriptors is validated by a large number of experiments on heavily occluded ball examples acquired under realistic conditions.
In this paper, a computational approach is proposed and put into practice to assess the capability of children diagnosed with Autism Spectrum Disorders (ASD) to produce facial expressions. The proposed approach is based on computer vision components working on sequences of images acquired by an off-the-shelf camera in unconstrained conditions. Action unit intensities are estimated by analyzing local appearance, and then both temporal and geometrical relationships, learned by Convolutional Neural Networks, are exploited to regularize the gathered estimates. To cope with stereotyped movements and to highlight even subtle voluntary movements of facial muscles, a personalized and contextual statistical model of the non-emotional face is formulated and used as a reference. Experimental results demonstrate how the proposed pipeline can improve the analysis of facial expressions produced by ASD children. A comparison of the system's outputs with the evaluations performed by psychologists on the same group of ASD children makes evident how the performed quantitative analysis of children's abilities helps to go beyond the traditional qualitative ASD assessment/diagnosis protocols, whose outcomes are affected by human limitations in observing and understanding multi-cue behaviors such as facial expressions.
Context analysis is a research field that has attracted growing interest in recent years, especially due to the encouraging results obtained by semantic-based approaches. However, semantic strategies entail the use of trackers capable of showing robustness to long-term occlusions, viewpoint changes and identity swaps, which represent the main problems of many tracking-by-detection solutions. This paper proposes a robust tracking-by-detection framework based on dense SIFT descriptors in combination with an ad-hoc target appearance model update able to overcome the discussed issues. The obtained performance shows how our tracker competes with state-of-the-art results and manages occlusions, clutter, changes of scale, rotation and appearance better than competing tracking methods.
The estimation of demographic information from video sequences of people has been a topic of growing interest in recent years. Indeed, the automatic estimation of audience statistics in digital signage, as well as human interaction in social robotics, requires increasingly robust algorithms for gender, race and age classification. In this paper, some state-of-the-art feature descriptors and subspace reduction approaches for gender, race and age group classification from video/image input are analyzed. Moreover, a wide discussion of the influence of dataset distribution, balancing and cardinality is presented. The aim of our work is to investigate the best solution for each classification problem, both in terms of estimation approach and training dataset. Additionally, the computational cost is considered and discussed in order to contextualize the topic in a practical environment.
Moving object detection is a crucial step in many application contexts such as people detection, action recognition, and visual surveillance for safety and security. The recent advance in depth camera technology has suggested the possibility to exploit a multi-sensor information (color and depth) in order to achieve better results in video segmentation. In this paper, we present a technique that combines depth and color image information and demonstrate its effectiveness through experiments performed on real image sequences recorded by means of a stereo camera.
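The color-depth combination can be illustrated, in its simplest form, as a fusion of two change masks: a pixel is foreground when either the color cue or the depth cue signals change. A toy sketch under that assumption (thresholds, OR fusion and function name are illustrative choices, not the paper's actual rule):

```python
import numpy as np

def combined_foreground(color_diff, depth_diff, tc=30.0, td=0.1):
    # Fuse color- and depth-based change cues: a pixel is marked
    # foreground when either |color difference| or |depth difference|
    # exceeds its own threshold (simple OR fusion).
    return (np.abs(color_diff) > tc) | (np.abs(depth_diff) > td)
```

Depth helps precisely where color fails (shadows, camouflage with the background), while color recovers objects at depths the stereo sensor measures poorly, which is the motivation for combining the two cues.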
In order to perform automatic analysis of sport videos acquired from a multi-sensing environment, it is fundamental to face the problem of automatic football team discrimination. A correct assignment of each player to the relative team is a preliminary task that, together with player detection and tracking algorithms, can strongly affect any high-level semantic analysis. Supervised approaches for object classification require the construction of ad hoc models before the processing and also a manual selection of different player patches belonging to the team classes. The idea of this paper is to collect the player patches coming from six different cameras and, after a pre-processing step based on CBTF (Cumulative Brightness Transfer Function), to study and compare different unsupervised methods for classification. The pre-processing step based on CBTF has been implemented in order to mitigate differences in appearance between images acquired by different cameras. We tested three different unsupervised classification algorithms (MBSAS, a sequential clustering algorithm; BCLS, a competitive one; and k-means, a hard-clustering algorithm) on the transformed patches. Results obtained by comparing different sets of features with different classifiers are proposed. Experimental results have been carried out on different real matches of the Italian Serie A.
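Of the three unsupervised classifiers compared in this abstract, k-means is the simplest to sketch. A minimal numpy version (initialization and iteration count are illustrative; this is not the paper's code):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain hard-clustering k-means: alternate between assigning each
    # sample to its nearest centre and recomputing centres as cluster means.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers
```

Applied to CBTF-normalized player-patch feature vectors with k = 2, the two resulting clusters would ideally correspond to the two teams.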
The detection of moving objects is a crucial step in many application contexts such as people detection, action recognition, and visual surveillance for safety and security. The recent advance in depth camera technology has suggested the possibility to exploit a multi-sensor information (color and depth) in order to achieve better results in video segmentation. In this paper, we present a technique that combines depth and color image information and demonstrate its effectiveness through experiments performed on real image sequences recorded by means of a stereo camera.
ASD diagnosis and assessment make use of medical protocols validated by the scientific community, which is still reluctant to adopt new protocols introducing invasive technologies, such as robots or wearable devices, whose influence on the therapy has not been deeply investigated. This work attempts to undertake the difficult challenge of embedding a technological level into the standardized ASD protocol known as the Autism Diagnostic Observation Schedule (ADOS-2). An intelligent video system is introduced to compute, in an objective and automatic way, the evaluation scores for some of the tasks involved in the protocol. It makes use of a hidden RGB-D device for scene acquisition, whose data feed a cascade of algorithmic steps by which people and objects are detected and temporally tracked; the extracted information is then exploited by fitting a spatial and temporal model described by means of an ontology-based approach. The ontology metadata are finally processed to find a mapping between them and the behavioral tasks described in the protocol.
Convolutional Neural Networks (CNNs) have attracted growing interest in recent years thanks to their high generalization capabilities, which are highly recommended especially for applications working in the wild. However, CNNs rely on a huge number of parameters that must be set during training sessions based on very large datasets in order to avoid over-fitting issues. As a consequence, the lack of training data is one of the greatest limits to the applicability of deep networks. Another problem is the fixed scale of the filters in the first convolutional layer, which limits the analysis performed by the subsequent layers of the network.
In recent decades, the development of very high speed trains in railway transportation has required new maintenance strategies. New trolleys equipped with innovative measuring systems have been employed for monitoring overhead lines (catenaries). Using this system gives two great advantages: i) diagnosis can be performed with minimal interruption of railway traffic; ii) monitoring can be executed at the same speed as ordinary locomotives, in order to point out the stress suffered by the mechanical components of the train and the railroad structure. In this paper we present a vision system for monitoring catenary staggering. We propose a new method able to measure the position of the overhead line with a stereovision system. All these sensors are installed on an innovative maintenance trolley. Experimental results in a real context are presented.
In this paper, a real case study on a Goal Line Monitoring system is presented. The core of the paper is a refined ball detection algorithm that analyzes candidate ball regions to detect the ball. A decision making approach, by means of camera calibration, decides about the goal event occurrence. Differently from other similar approaches, the proposed one provides, as unquestionable proof, the image sequence that records the goal event under consideration. Moreover, it is non-invasive: it does not require any change in the typical football devices (ball, goal posts, and so on). Extensive experiments were performed on both real matches acquired during the Italian Serie A championship, and specific evaluation tests by means of an artificial impact wall and a shooting machine for shot simulation. The encouraging experimental results confirmed that the system could help humans in ambiguous goal line event detection.
In the last years, smart surveillance has been one of the most active research topics in computer vision because of the wide spectrum of promising applications. Its main point is about the use of automatic video analysis technologies for surveillance purposes. In general, a processing framework for smart surveillance consists of a preliminary motion detection step in combination with high-level reasoning that allows automatic understanding of evolutions of observed scenes. In this paper, we propose a surveillance framework based on a set of reliable visual algorithms that perform different tasks: a motion analysis approach that segments foreground regions is followed by three procedures, which perform object tracking, homographic transformations and edge matching, in order to achieve the real-time monitoring of forbidden areas and the detection of abandoned or removed objects. Several experiments have been performed on different real image sequences acquired from a Messapic museum (indoor context) and the nearby archaeological site (outdoor context) to demonstrate the effectiveness and the flexibility of the proposed approach.
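The homographic transformation used for forbidden-area monitoring maps image points into the ground plane, where area checks become simple geometry. A minimal sketch of applying a 3x3 homography in homogeneous coordinates (the function name is an assumption; estimating H itself is a separate step):

```python
import numpy as np

def apply_homography(H, pts):
    # Map Nx2 image points into the plane via a 3x3 homography:
    # lift to homogeneous coordinates, multiply, then de-homogenize.
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Once a tracked object's foot point is mapped into ground-plane coordinates, testing whether it lies inside a forbidden polygon is an ordinary point-in-polygon check.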
In this work, a real-time system able to automatically recognize soft-biometric traits is introduced and used to improve the capability of a humanoid robot to interact with humans. In particular, the proposed system is able to estimate the gender and age of humans in images acquired from the embedded camera of the robot. This knowledge allows the robot to properly react with customized behaviors related to the gender/age of the interacting individuals. The system is able to handle multiple persons in the same acquired image, recognizing the age and gender of each person in the robot's field of view. These features make the robot particularly suitable to be used in socially assistive applications.
This paper presents a robust visual tracking algorithm based on dense local descriptors. These local invariant representations, combined with a robust object/context nearest neighbor classifier, permit building a very powerful visual tracker. The performance is very promising even in very long video sequences. © 2015 OSA.
The present invention relates to a visual inspection system and method for the maintenance of infrastructures, in particular railway infrastructures. It is a system able to operate in real time, wholly automatically, for the automatic detection of the presence/absence of characterizing members of the infrastructure itself, for example the coupling locks fastening the rails to the sleepers.