Cosimo Distante
Role
Level III - Researcher
Organization
Consiglio Nazionale delle Ricerche
Department
Not Available
Scientific Area
AREA 09 - Industrial and Information Engineering
Scientific Disciplinary Sector
ING-INF/03 - Telecommunications
ERC Sector, 1st Level
PE - PHYSICAL SCIENCES AND ENGINEERING
ERC Sector, 2nd Level
PE6 Computer Science and Informatics: Informatics and information systems, computer science, scientific computing, intelligent systems
ERC Sector, 3rd Level
PE6_8 Computer graphics, computer vision, multi media, computer games
This paper presents a new method to automatically locate pupils in images (even at low resolution) containing human faces. In particular, pupils are localized by a two-step procedure: first, self-similarity information is extracted by considering the appearance variability of local regions; it is then combined with an estimator of circular shapes based on a modified version of the Circular Hough Transform. Experimental evidence of the effectiveness of the method was obtained on challenging databases containing facial images acquired under different lighting conditions and with different scales and poses.
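A minimal sketch of the circular-shape estimation step, using OpenCV's standard Circular Hough Transform as a stand-in for the modified version described above; the self-similarity stage is not reproduced, and the file name and parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

# Hypothetical input: a grayscale face/eye crop, e.g. "face.png".
img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)  # suppress noise before the gradient-based voting

# HOUGH_GRADIENT is OpenCV's standard Circular Hough Transform variant;
# the paper's modified CHT and self-similarity weighting are not shown here.
circles = cv2.HoughCircles(
    img, cv2.HOUGH_GRADIENT,
    dp=1,                        # accumulator at full image resolution
    minDist=img.shape[0] // 8,   # minimum distance between detected centers
    param1=100,                  # upper Canny threshold used internally
    param2=20,                   # accumulator threshold: lower -> more candidates
    minRadius=3, maxRadius=img.shape[0] // 10,
)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print(f"pupil candidate at ({x}, {y}), radius {r}")
```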
Face indexing is a very popular research topic and has been investigated over the last 10 years. It can be used for a wide range of applications such as automatic video content analysis, data mining, video annotation and labeling, etc. In this work, a fully automated framework is presented that detects how many people are present in a generic video (even one with low resolution and/or taken from a mobile camera). It also extracts the intervals of frames in which each person appears. The main contribution of the proposed work is that neither initialization nor a priori knowledge of the scene content is required. Moreover, this approach introduces a generalized version of the k-means method that, through different statistical indices, automatically determines the number of people in the scene. Copyright © 2014 SCITEPRESS - Science and Technology Publications. All rights reserved.
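The abstract above mentions a generalized k-means that picks the number of people via statistical indices. A minimal sketch of that idea, assuming the silhouette index as the selection criterion (one of several possible indices; the paper's exact indices and face descriptors are not specified here):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_num_people(face_descriptors, k_max=10):
    """Pick the number of clusters (people) maximizing the silhouette index."""
    best_k, best_score = 1, -1.0
    for k in range(2, min(k_max, len(face_descriptors) - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(face_descriptors)
        score = silhouette_score(face_descriptors, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Toy usage: three synthetic "identities" in a 64-D descriptor space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 64)) for c in (0.0, 2.0, 4.0)])
print(estimate_num_people(X))  # expected: 3
```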
The reliability of diagnostic hysteroscopy depends on operator expertise and the kind of pathology (it is low for endometrial hyperplasia). Atypical areas are characterized by vascular and structural irregularities, as neo-angiogenesis is the first detectable sign of a neoplastic process. In cooperation with INO (National Institute of Optics), we developed original software that delineates the vascular network in order to improve the diagnostic accuracy of hysteroscopy in cases of hyperplastic/neoplastic lesions.
In recent years, "FragTrack" has become one of the most cited real-time algorithms for visual tracking of an object in a video sequence. However, this algorithm fails when the object model is not present in the image or is completely occluded, and in long-term video sequences. In such sequences, the target object's appearance changes considerably over time, and its comparison with the template established at the first frame becomes unreliable. In this work we introduce two improvements to the original FragTrack: the management of total object occlusions and the update of the object template. Basically, we use a voting map generated by a non-parametric kernel density estimation strategy that allows us to compute a probability distribution over the histogram distances between template and object patches. In order to automatically determine whether the target object is present in the current frame, an adaptive threshold is introduced. A Bayesian classifier establishes, frame by frame, the presence of the template object in the current frame. The template is partially updated at every frame. We tested the algorithm on well-known benchmark sequences, in which the object is always present, and on video sequences showing total occlusion of the target object, to demonstrate the effectiveness of the proposed method.
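A small sketch of the occlusion-handling idea described above, assuming Bhattacharyya distances between patch histograms, a scipy Gaussian KDE as the non-parametric density, and a simple running-statistics rule as the adaptive threshold; the paper's Bayesian classifier and template update are not reproduced:

```python
import numpy as np
from scipy.stats import gaussian_kde

def patch_distances(template_hists, frame_hists):
    """Bhattacharyya distance between corresponding patch histograms."""
    bc = np.sum(np.sqrt(template_hists * frame_hists), axis=1)
    return np.sqrt(np.maximum(0.0, 1.0 - bc))

# Toy usage: 50 patches with 8-bin normalized histograms per frame.
rng = np.random.default_rng(1)
template = rng.random((50, 8)); template /= template.sum(1, keepdims=True)

history = []  # mean patch distance of frames previously judged "visible"
for frame_idx in range(30):
    frame = rng.random((50, 8)); frame /= frame.sum(1, keepdims=True)
    d = patch_distances(template, frame)
    kde = gaussian_kde(d)       # non-parametric density over patch distances
    weights = kde(d)            # voting weight: typical distances count more
    if len(history) > 5:
        # adaptive threshold from the statistics of accepted frames
        visible = d.mean() < np.mean(history) + 2 * np.std(history)
    else:
        visible = True          # bootstrap phase: assume the target is visible
    if visible:
        history.append(d.mean())
```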
Soft biometric systems have spread in recent years, both to strengthen classical biometrics and as stand-alone solutions with application scopes ranging from digital signage to human-robot interaction. Among these, the possibility has recently emerged of treating the temporal evolution of human gaze as a soft biometric, and some recent works in the literature have explored this exciting research line using expensive and (perhaps) unsafe devices that require user cooperation to be calibrated. This work is instead the first attempt to perform biometric identification of individuals on the basis of data acquired by a low-cost, non-invasive, safe and calibration-free gaze estimation framework consisting of two main components, conveniently combined, that perform the user's head pose estimation and eye pupil localization on data acquired by an RGB-D device. Experimental evidence of the feasibility of using the proposed framework as a soft biometric is given on a set of users watching three heterogeneous benchmark videos in an unconstrained environment.
Gender recognition is a topic of high interest, especially in the growing field of audience measurement techniques for digital signage applications. Usually, supervised approaches are employed, and they require a preliminary training phase performed on large datasets of annotated facial images that are expensive (e.g. MORPH) and, in any case, cannot be updated to keep track of the continuous mutation of people's appearance due to changes in fashion and style (e.g. hairstyles or makeup). The use of small (and thus more easily updatable) datasets is therefore highly desirable but, unfortunately, when few examples are used for training, gender recognition performance decreases dramatically, since state-of-the-art classifiers are unable to handle the inherent data uncertainty in a reliable way by explicitly modeling the encountered distortions. To address this drawback, an innovative classification scheme for gender recognition is introduced in this work: its core is the Minimax approach, i.e. a smart classification framework that, encompassing a number of existing regularized regression models, allows robust classification even when few examples are used for training. This has been experimentally proved by comparing the proposed classification scheme with state-of-the-art classifiers (SVM, kNN and Random Forests) under various pre-processing methods.
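To make the experimental setting concrete, here is a toy comparison of the baseline classifiers mentioned above (SVM, kNN, Random Forests) on a deliberately small training set; the data are synthetic stand-ins for facial features, and the Minimax scheme itself has no off-the-shelf implementation here and is not reproduced:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for facial gender features.
X, y = make_classification(n_samples=2000, n_features=100, n_informative=20,
                           random_state=0)
X_train, y_train = X[:40], y[:40]      # deliberately tiny training set
X_test, y_test = X[40:], y[40:]

for name, clf in [("SVM", SVC(kernel="linear")),
                  ("kNN", KNeighborsClassifier(n_neighbors=5)),
                  ("RF",  RandomForestClassifier(n_estimators=200, random_state=0))]:
    acc = clf.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: {acc:.3f}")  # expect degraded accuracy with so few examples
```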
Face indexing is a very popular research topic and has been investigated over the last 10 years. It can be used for a wide range of applications such as automatic video content analysis, data mining, video annotation and labeling, etc. In this work, a statistical approach to this challenging issue is presented: the number of persons present in a generic video (even one with low resolution and/or taken from a mobile camera) is automatically detected, and the intervals of frames in which each person appears are also extracted. The main contribution of the proposed work is that neither initialization nor a priori knowledge of the scene content is required. Moreover, this approach introduces a generalized version of the k-means method that, through different statistical indices, automatically determines the number of people in the scene.
This paper presents a detailed study of different algorithmic configurations for estimating soft biometric traits. In particular, a recently introduced common framework is the starting point of the study: it includes an initial facial detection, the subsequent description of facial traits, a data reduction step, and the final classification step. The algorithmic configurations are characterized by different descriptors and different strategies to build the training dataset and to scale the data input to the classifier. Experiments were carried out on both publicly available datasets and image sequences specifically acquired in order to evaluate the performance under real-world conditions, i.e., in the presence of scaling and rotation.
We propose an algorithm for the automatic estimation of the in-focus image and the recovery of the correct reconstruction distance for digital holograms. We tested the proposed approach by applying it to stretched digital holograms. In fact, by stretching a hologram with a variable elongation parameter, it is possible to change the in-focus distance of the reconstructed image. In this way, the reliability of the proposed algorithm can be verified at different distances, dispensing with the recording of different holograms. Experimental results are shown with the aim of demonstrating the usefulness of the proposed method, and a comparative analysis has been performed with respect to other algorithms developed for digital holography.
Searching for and recovering the correct reconstruction distance in digital holography can be a cumbersome and subjective procedure. Here we show an algorithm for automatically estimating the in-focus image and recovering the correct reconstruction distance for speckle holograms. We have tested the approach in determining the reconstruction distances of stretched digital holograms. Stretching a hologram with a variable elongation parameter makes it possible to change the in-focus distance of the reconstructed image. In this way, the proposed algorithm can be verified at different distances, dispensing with the recording of different holograms. Experimental results are shown with the aim of demonstrating the usefulness of the proposed method, and a comparative analysis has been performed with respect to other existing algorithms developed for digital holography.
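A minimal sketch of such an autofocus scan, assuming angular-spectrum propagation and the Tamura coefficient as the sharpness metric (a common choice in digital-holography autofocus; the paper's actual metric is not specified in the abstract, and the optical parameters below are placeholders):

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a complex field by distance z (angular spectrum method)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = np.clip(1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2, 0.0, None)
    H = np.exp(2j * np.pi * z / wavelength * np.sqrt(arg))
    return np.fft.ifft2(np.fft.fft2(field) * H)

def tamura(amplitude):
    """Tamura coefficient, a common sharpness metric for autofocus."""
    return np.sqrt(amplitude.std() / amplitude.mean())

# Hypothetical hologram loaded as a complex field; wavelength, pixel pitch
# and search range are placeholders, not the paper's values.
hologram = np.load("hologram.npy")
wavelength, dx = 632.8e-9, 3.45e-6
distances = np.linspace(0.05, 0.30, 120)
scores = [tamura(np.abs(angular_spectrum(hologram, wavelength, dx, z)))
          for z in distances]
print("estimated in-focus distance:", distances[int(np.argmax(scores))])
```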
In this paper, an artificial olfactory system (Electronic Nose) that mimics the biological olfactory system is introduced. The device consists of a Large-Scale Chemical Sensor Array (16,384 sensors, made of 24 different kinds of conducting polymer materials) that supplies data to software modules performing advanced data processing. In particular, the paper concentrates on the software components, consisting, first, of a crucial step that normalizes the heterogeneous sensor data and reduces their inherent noise. The cleaned data are then supplied as input to a data reduction procedure that extracts the most informative and discriminant directions in order to obtain an efficient representation in a lower-dimensional space, where it is easier to find a robust mapping between the observed outputs and the characteristics of the odors presented to the device. Experimental qualitative proofs of the validity of the procedure are given by analyzing data acquired for two different pure analytes and their binary mixtures. Moreover, a classification task is performed in order to explore the possibility of automatically recognizing pure compounds and predicting binary mixture concentrations.
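A minimal sketch of the normalization and data-reduction steps, assuming zero-mean/unit-variance scaling and PCA as a stand-in for the paper's procedure; the sensor responses below are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical snapshot: one response per sensor, one row per sniff;
# 16,384 sensors as in the described array.
rng = np.random.default_rng(0)
responses = rng.normal(size=(200, 16384))

# Step 1: normalize the heterogeneous sensors to zero mean / unit variance.
scaled = StandardScaler().fit_transform(responses)

# Step 2: project onto the most informative directions. PCA is used here
# as a stand-in for the paper's data-reduction procedure.
pca = PCA(n_components=10)
embedded = pca.fit_transform(scaled)
print(pca.explained_variance_ratio_.round(3))
```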
This paper investigates the possibility of accurately detecting and tracking human gaze using an unconstrained and noninvasive approach based on the head pose information extracted by an RGB-D device. The main advantages of the proposed solution are that it can operate in a totally unconstrained environment, it does not require any initial calibration, and it can work in real time. These features make it suitable for assisting humans in everyday life (e.g., remote device control) or in specific activities (e.g., rehabilitation), and in general in all those applications where it is not possible to ask for user cooperation (e.g., when users with neurological impairments are involved). To evaluate gaze estimation accuracy, the proposed approach was extensively tested, and the results were compared with the leading methods in the state of the art, which, in general, make use of strong constraints on people's movements, invasive/additional hardware and supervised pattern recognition modules. Experimental tests demonstrated that, in most cases, the errors in gaze estimation are comparable to those of state-of-the-art methods, even though it works without additional constraints, calibration or supervised learning.
Automatic facial expression recognition is a topic of high interest, especially due to the growing diffusion of assistive computing applications, such as human-robot interaction, where robust awareness of people's emotions is a key point. This paper proposes a novel automatic pipeline for facial expression recognition based on the analysis of the gradient distribution in a single image, in order to characterize the face deformation under different expressions. First, an accurate investigation of optimal HOG parameters was carried out. Subsequently, a wide experimental session was performed, demonstrating a higher detection rate with respect to other state-of-the-art methods. Moreover, an online testing session was added in order to prove the robustness of our approach in real environments.
In this paper, a new technique to reduce the noise in a reconstructed hologram image is proposed. Unlike existing techniques in the literature, the proposed approach takes into account not only spatial information but also the temporal statistics associated with the pixels. This innovative solution enables, first, the automatic detection of the areas of the image containing the objects (foreground). This way, all pixels not belonging to any object are directly cleaned up, and the contrast between objects and background is consistently increased. The remaining pixels are then processed with a spatio-temporal filter, which cancels out the effects of speckle noise while preserving the structural details of the objects. The proposed approach has been compared with other common speckle denoising techniques and is found to give better results, both visually and quantitatively.
Searching for and recovering the correct reconstruction distance in digital holography (DH) can be a cumbersome and subjective procedure. Here we report on an algorithm for automatically estimating the in-focus image and recovering the correct reconstruction distance for speckle holograms. We have tested the approach in determining the reconstruction distances of stretched digital holograms. Stretching a hologram with a variable elongation parameter makes it possible to change the in-focus distance of the reconstructed image. In this way, the proposed algorithm can be verified at different distances, dispensing with the recording of different holograms. Experimental results are shown with the aim of demonstrating the usefulness of the proposed method, and a comparative analysis has been performed with respect to other existing algorithms developed for DH.
Joint attention is an early-developing social-communicative skill in which two people (usually a young child and an adult) share attention with regard to an interesting object or event by means of gestures and gaze, and its presence is a key element in evaluating therapy in the case of autism spectrum disorders. In this work, a novel automatic system able to detect joint attention using a completely non-intrusive depth camera installed on the room ceiling is presented. In particular, in a scenario where a humanoid robot, a therapist (or a parent) and a child are interacting, the system can detect the social interaction between them. Specifically, a depth camera mounted at the top of the room is employed to detect, first of all, the triggering event to be monitored (performed by a humanoid robot) and, subsequently, to detect any joint attention mechanism by analyzing the orientation of the head. The system operates in real time, providing the therapist with a completely non-intrusive instrument to help evaluate the quality and the precise modalities of this predominant feature during the therapy session.
Ball recognition in soccer matches is a critical issue for automatic soccer video analysis. Unfortunately, because of the difficulty of the problem, the efforts of numerous researchers have still not produced fully satisfactory results in terms of accuracy. This paper proposes a ball recognition approach that introduces two levels of innovation. First, a randomized circle detection approach based on the local curvature information of the isophotes is used to identify the edge pixels belonging to the ball boundaries. Then, ball candidates are validated by a learning framework formulated as a three-layered model based on a variation of the conventional local binary pattern approach. Experimental results were obtained on a significant set of real soccer images acquired under challenging lighting conditions during Italian "Serie A" matches. The results have also been favorably compared with the leading state-of-the-art methods.
This paper focuses on the ball detection algorithm that analyzes candidate ball regions to detect the ball. Unfortunately, at the moment of a goal, the goalposts (and sometimes also some players) partially occlude the ball or alter its appearance (due to the shadows they cast on it). This often renders traditional pattern recognition approaches ineffective and forces the system to decide about the event on the basis of estimates rather than measurements of the real ball position. To overcome this drawback, this work compares different descriptors of the ball's appearance; in particular, it investigates both well-known feature extraction approaches and the recent BRISK local descriptor in a soccer match context. This paper analyzes critical situations in which the ball is heavily occluded in order to measure robustness, accuracy and detection performance. The effectiveness of BRISK compared with other local descriptors is validated by a large number of experiments on heavily occluded ball examples acquired under realistic conditions.
We show that a three-dimensional (3D) scene can be coded by joining different objects through the combination of multiple optically recorded color digital holograms. Recently, an adaptive scheme based on affine transformations, able to correct defocus aberrations of digital holograms, was demonstrated; we name it here Adaptive Transformation in Digital Holography (ATDH). We propose an effective framework to create a dynamic color holographic 3D scene by using a generalization of this scheme, aided by a speckle reduction method called Multilevel Bi-dimensional Empirical Mode Decomposition (MBEMD) and used for the first time in color holography. We also demonstrate its applicability to the synthesis of multiple Color Computer Generated Holograms (CCGHs).
We propose a complete framework for the synthesis of a 3D holographic scene, combining multiple color holograms of different objects by applying adaptive transformations. In particular, it has been demonstrated that affine transformations of digital holograms can be employed to correct defocus and chromatic aberrations. By combining these two features, we are able to synthesize a color scene where multiple objects are jointly multiplexed. Since hologram transformations could introduce artifacts in the holographic reconstructions, principally related to the presence of speckle noise, we also implement a denoising step in which the Bi-dimensional Empirical Mode Decomposition (BEMD) algorithm is employed. We test the proposed framework in two different scenarios, i.e. by coding color three-dimensional scenes and joining different objects that are (i) experimentally recorded and (ii) obtained as color computer generated holograms (CCGHs).
Automatic facial expression recognition (FER) is a topic of growing interest, mainly due to the rapid spread of assistive technology applications, such as human-robot interaction, where robust emotional awareness is a key point for best accomplishing the assistive task. This paper proposes a comprehensive study on the application of the histogram of oriented gradients (HOG) descriptor to the FER problem, highlighting how this powerful technique can be effectively exploited for this purpose. In particular, this paper shows that a proper setting of the HOG parameters can make this descriptor one of the most suitable for characterizing facial expression peculiarities. A large experimental session, divided into three phases, was carried out exploiting a consolidated algorithmic pipeline. The first experimental phase was aimed at proving the suitability of the HOG descriptor for characterizing facial expression traits; to this end, a successful comparison with the most commonly used FER frameworks was carried out. In the second experimental phase, different publicly available facial datasets were used to test the system on images acquired under different conditions (e.g. image resolution, lighting conditions, etc.). As a final phase, a test on continuous data streams was carried out online in order to validate the system under real-world operating conditions that simulated real-time human-machine interaction.
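A minimal sketch of a HOG-based FER pipeline of the kind studied above, assuming scikit-image's HOG implementation and a linear SVM; the cell/block/bin values are illustrative, not the optimal setting identified in the paper, and the faces and labels below are random stand-ins:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(face_gray):
    """HOG descriptor for a 64x64 face crop (illustrative parameters)."""
    return hog(face_gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Toy usage: random stand-ins for normalized face crops and expression labels.
rng = np.random.default_rng(0)
faces = rng.random((60, 64, 64))
labels = rng.integers(0, 6, size=60)          # six expression classes
X = np.array([hog_descriptor(f) for f in faces])
clf = LinearSVC(max_iter=5000).fit(X, labels)
print(clf.predict(X[:5]))
```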
Iris segmentation is driven by three different quality factors: accuracy, usability and speed. Unfortunately, a deep analysis of the literature shows that the greatest efforts of researchers have mainly focused on accuracy and speed. Existing solutions, in fact, do not meet the usability requirement, since they are based on specific optimizations related to the operating context and impose binding conditions on the sensors to be used for the acquisition of periocular images. This paper tries to fill this gap by introducing an innovative iris segmentation technique that can be used in unconstrained environments, under non-ideal imaging conditions and, above all, that does not require any interaction for adaptation to different operating conditions. Experimental results, carried out on challenging databases, demonstrate that the high usability of the proposed solution does not penalize segmentation accuracy which, in some respects, outperforms that of the leading approaches in the literature.
An investigation is reported of the identification and measurement of regions of interest (ROIs) in quantitative phase-contrast maps of biological cells obtained by digital holographic microscopy. In particular, two different methods have been developed for in vitro bull sperm head morphometry analysis. We show that semen analysis can be accomplished by means of the proposed techniques. Extraction and measurement of various parameters are performed. It is demonstrated that both proposed methods are effective in skimming the data set in a preselective analysis for discarding anomalous data. (C) 2011 Optical Society of America
Automatic facial expression recognition is one of the most interesting problems, as it impacts important applications in the human-computer interaction area. Many applications in this field require real-time performance, but not all approaches are able to satisfy this requirement. Geometric features are usually the lightest in terms of computational load, but sometimes they rely on a huge number of features and still do not cover all the possible geometric aspects. To address this problem, we propose an automatic pipeline for facial expression recognition that exploits a new set of 32 geometric facial features extracted from a single face side and covering a wide range of geometric peculiarities. As a result, the proposed approach showed a facial expression recognition accuracy of 95.46% with a six-class expression set and an accuracy of 94.24% with a seven-class expression set.
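A small sketch of how geometric facial features of this kind can be computed from 2D landmarks; the landmark indices and the five features below are hypothetical illustrations, not the paper's 32 single-side features:

```python
import numpy as np

def geometric_features(landmarks):
    """A few illustrative geometric features from (N, 2) facial landmarks.
    The indices below assume a dlib-style 68-point layout and are hypothetical."""
    def dist(i, j):
        return np.linalg.norm(landmarks[i] - landmarks[j])
    face_scale = dist(0, 16)                     # jaw width as normalization
    return np.array([
        dist(37, 41),                            # eye opening
        dist(48, 54),                            # mouth width
        dist(51, 57),                            # mouth opening
        dist(17, 21),                            # eyebrow length
        dist(21, 39),                            # eyebrow-to-eye distance
    ]) / face_scale

# Toy usage with random 68-point landmarks.
print(geometric_features(np.random.default_rng(0).random((68, 2))))
```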
The paper presents a new automatic technique for speckle reduction in the context of digital holography. Speckle noise is a superposition of unwanted spots over the objects of interest, due to the interaction of a coherent radiation source with the object's surface characteristics. In the proposed denoising method, bidimensional empirical mode decomposition is used to decompose the image signal, which is then filtered through the Frost filter. The proposed technique was preliminarily tested on the "Lena" image for quality assessment in terms of peak signal-to-noise ratio. Then, its denoising capability was assessed on different holographic images, on which a comparison (using both blind metrics and visual inspection) with the leading strategies in the state of the art was also favorably performed. (C) 2014 Society of Photo-Optical Instrumentation Engineers (SPIE)
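A minimal sketch of the Frost filtering stage, assuming a simple exponential-weight implementation driven by the local coefficient of variation; the bidimensional empirical mode decomposition step is not reproduced:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def frost_filter(img, size=5, damping=2.0):
    """Minimal Frost filter: exponential weights shaped by local statistics."""
    mean = uniform_filter(img, size)
    var = np.maximum(uniform_filter(img ** 2, size) - mean ** 2, 0.0)
    cv = var / np.maximum(mean ** 2, 1e-12)      # local coefficient of variation
    half = size // 2
    pad = np.pad(img, half, mode="reflect")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            w = np.exp(-damping * cv * np.hypot(dy, dx))
            num += w * pad[half + dy: half + dy + img.shape[0],
                           half + dx: half + dx + img.shape[1]]
            den += w
    return num / den

# Toy usage: multiplicative speckle over a smooth ramp; the filtered image
# should typically be closer to the clean one than the speckled input.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.2, 1.0, 128), (128, 1))
speckled = clean * rng.exponential(1.0, clean.shape)
print(np.abs(frost_filter(speckled) - clean).mean() < np.abs(speckled - clean).mean())
```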
Assistive technology is a generic term for systems used to increase, help or improve the functional capabilities of people with disabilities. Recently, its use has generated innovative solutions also in the field of Autism Spectrum Disorder (ASD), where it is extremely challenging to obtain feedback or to extract meaningful data. In this work, a study on the possibility of understanding visual exploration in children with ASD is presented. In order to obtain an automatic evaluation, an algorithm for free gaze estimation is employed. The proposed gaze estimation method can work without constraints and without additional hardware, IR light sources or other intrusive methods; furthermore, no initial calibration is required. This relaxation of the constraints makes the technique particularly suitable for the critical context of autism, where the child is certainly not inclined to wear invasive devices. In particular, the technique is used in a scenario where a closet containing specific toys, neatly arranged by the therapist, is opened to the child. After a brief exploration of the environment, the child freely chooses the desired toy, which is subsequently used during therapy. Video acquisition is accomplished by a Microsoft Kinect sensor hidden in the closet, providing both RGB and depth images that are processed by the estimation algorithm; gaze tracking is then computed by intersection with the known initial arrangement of the toys. The system has been tested with children with ASD, making it possible to understand their choices and preferences and to optimize the toy arrangement for cognitive-behavioural therapy.
This paper addresses the issue of plane detection in three-dimensional (3D) range images. The identification of planar structures is a crucial task in many vision-aided autonomous robotic applications. The proposed method consists in implementing, in cascade, two algorithms: Random Sample Consensus (RANSAC) and the more recent Least Entropy-like Estimator (LEL), a nonlinear prediction error estimator that minimizes a cost function inspired by the definition of Gibbs entropy. The LEL estimator improves RANSAC performance while maintaining its robustness; kernel density estimation is used to classify data into inliers and outliers. The method has been experimentally applied to 3D images acquired by a Time-Of-Flight camera and compared with a stand-alone RANSAC solution. The proposed solution does not require an accurate estimation of the noise variance or outlier scale. This is of fundamental practical importance, as the outlier scale, while severely influencing standard RANSAC, is usually unknown a priori and hard to estimate. © 2013 IFAC.
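A minimal sketch of the RANSAC stage for plane detection in a point cloud; the LEL refinement and the kernel-density inlier classification described above are not reproduced, and the tolerance and iteration count are illustrative:

```python
import numpy as np

def ransac_plane(points, n_iter=500, tol=0.02, rng=None):
    """Minimal RANSAC plane estimator for an (N, 3) point cloud."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_model = None, None
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue                      # degenerate (collinear) sample
        normal /= norm
        inliers = np.abs((points - p0) @ normal) < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, p0)
    return best_model, best_inliers

# Toy usage: a noisy z=0 plane plus uniform outliers.
rng = np.random.default_rng(1)
plane = np.column_stack([rng.uniform(-1, 1, (500, 2)), rng.normal(0, 0.005, 500)])
outliers = rng.uniform(-1, 1, (100, 3))
model, inliers = ransac_plane(np.vstack([plane, outliers]))
print("estimated normal:", model[0].round(3), "inliers:", int(inliers.sum()))
```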
Circle detection is a critical issue in image analysis and object detection. Although Hough transform based solvers are widely used, randomized approaches, based on iterative sampling of the edge pixels, are an active research area aimed at providing less computationally expensive solutions. This work presents a randomized iterative work-flow that exploits the geometrical properties of isophotes in the image to select the most meaningful edge pixels and to classify them into subsets of equal isophote curvature. The analysis of candidate circles is then performed with a voting strategy based on kernel density estimation, followed by a refinement algorithm based on linear error compensation. The method has been applied to a set of real images, on which it has also been compared with two leading state-of-the-art approaches and with Hough transform based solutions. The achieved results show how, discarding up to 57% of unnecessary edge pixels, it is able to accurately detect circles within a limited number of iterations, maintaining sub-pixel accuracy even in the presence of high levels of noise.
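A small sketch of the isophote-curvature computation that drives the edge-pixel selection, using Gaussian derivatives; the sampling, voting and refinement stages of the work-flow are not reproduced:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def isophote_curvature(img, sigma=2.0):
    """Per-pixel isophote curvature from Gaussian derivatives:
    k = -(Ly^2*Lxx - 2*Lx*Ly*Lxy + Lx^2*Lyy) / (Lx^2 + Ly^2)^(3/2)."""
    Lx  = gaussian_filter(img, sigma, order=(0, 1))   # d/dx (axis 1)
    Ly  = gaussian_filter(img, sigma, order=(1, 0))   # d/dy (axis 0)
    Lxx = gaussian_filter(img, sigma, order=(0, 2))
    Lyy = gaussian_filter(img, sigma, order=(2, 0))
    Lxy = gaussian_filter(img, sigma, order=(1, 1))
    num = -(Ly ** 2 * Lxx - 2 * Lx * Ly * Lxy + Lx ** 2 * Lyy)
    den = (Lx ** 2 + Ly ** 2) ** 1.5 + 1e-12
    return num / den, np.hypot(Lx, Ly)

# Toy usage: edge pixels of a synthetic disk of radius 20 should report an
# isophote curvature radius (1/|k|) close to 20.
yy, xx = np.mgrid[:128, :128]
disk = (((xx - 64) ** 2 + (yy - 64) ** 2) < 20 ** 2).astype(float)
k, grad = isophote_curvature(disk)
edge = grad > 0.5 * grad.max()
print(np.median(1.0 / np.abs(k[edge])))   # roughly 20
```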
The paper proposes a robust estimation method which implements, in cascade, two algorithms: (i) a Random Sample Consensus (RANSAC) algorithm and (ii) a novel nonlinear prediction error estimator minimizing a cost function inspired by the mathematical definition of Gibbs entropy. The minimization of the nonlinear cost function refines the consensus set found with standard RANSAC in order to reach optimal estimates of the geometric transformation parameters in the image stitching context. The method has been experimentally tested and compared with a standard RANSAC-MSAC algorithm, with noticeable improvements recorded in terms of computational complexity and quality of the stitching process, namely the mean squared symmetric re-projection error.
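A minimal sketch of the standard RANSAC consensus step in an image-stitching setting, using OpenCV's ORB features and findHomography; the entropy-based refinement proposed above has no off-the-shelf implementation and is not shown, and the input file names are assumptions:

```python
import cv2
import numpy as np

# Hypothetical pair of overlapping images to stitch.
img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# ORB keypoints + brute-force matching give putative correspondences.
orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# OpenCV's RANSAC consensus step; the consensus set (inlier_mask) is what the
# paper's entropy-based estimator would subsequently refine.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
print("inliers:", int(inlier_mask.sum()), "/", len(matches))
```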
A novel plane estimation algorithm for 3D range data is presented. The proposed solution is based on the minimization of a nonlinear prediction error cost function inspired by the mathematical definition of Gibbs entropy. The method has been experimentally tested and compared with a standard implementation of the RANSAC algorithm. Results suggest that the proposed approach has the potential to perform better in terms of precision and reliability while requiring a lower computational effort.
Circle detection is a critical issue in image analysis: it is undoubtedly a fundamental step in many application contexts, among which one of the most challenging is the detection of the ball in soccer games. Hough transform based circle detectors are widely used, but a large open research area attempts to provide more effective and less computationally expensive solutions based on randomized approaches, i.e. based on iterative sampling of the edge pixels. To this end, this work presents an ad-hoc randomized iterative work-flow that exploits a geometrical property of isophotes, the curvature, to identify edge pixels belonging to the ball boundaries; this allows a large number of edge pixels to be considered while limiting most of the time-consuming computation to a restricted subset of pixels with a high probability of lying on a circular structure. The method, coupled with a background suppression algorithm, has been applied to a set of real images acquired by a fixed camera, providing performance higher than a standard circular Hough transform solver, with a detection rate > 86%.
Soft biometric systems have spread in recent years, both to strengthen classical biometrics and as stand-alone solutions with application scopes ranging from digital signage to human-robot interaction. Among these, the possibility has recently emerged of treating the temporal evolution of human gaze as a soft biometric, and some recent works in the literature have explored this exciting research line using expensive and (perhaps) unsafe devices which, moreover, require user cooperation to be calibrated. To our knowledge, the use of a low-cost, non-invasive, safe and calibration-free gaze estimator to obtain soft-biometric data has not yet been investigated. This paper fills this gap by analyzing the soft-biometric performance obtained by modeling the series of gazes estimated by combining head poses and eye pupil locations on data acquired by an off-the-shelf RGB-D device.
In this work, a real-time system able to automatically recognize soft-biometric traits is introduced and used to improve the capability of a humanoid robot to interact with humans. In particular, the proposed system is able to estimate the gender and age of humans in images acquired by the robot's embedded camera. This knowledge allows the robot to react appropriately, with customized behaviors related to the gender/age of the interacting individuals. The system is able to handle multiple persons in the same acquired image, recognizing the age and gender of each person in the robot's field of view. These features make the robot particularly suitable for socially assistive applications.
A new method to automatically locate pupils in images (even at low resolution) containing near-frontal human faces is presented. In particular, pupils are localized by an unsupervised procedure consisting of two steps: first, self-similarity information is extracted by considering the appearance variability of local regions, and then it is combined with an estimator of circular shapes based on a modified version of the circular Hough transform. Experimental evidence of the effectiveness of the method was achieved on challenging databases and video sequences containing facial images acquired under different lighting conditions and with different scales and poses. (C) 2013 SPIE and IS&T
The automatic detection and tracking of human eyes and, in particular, the precise localization of their centers (pupils) is a widely debated topic in the international scientific community. In fact, the extracted information can be effectively used in a large number of applications, ranging from advanced interfaces to biometrics, and including the estimation of gaze direction, the monitoring of human attention and the early screening of neurological pathologies. Independently of the application domain, the detection and tracking of eye centers are currently performed mainly using invasive devices. Cheaper and more versatile systems have only recently been introduced: they make use of image processing techniques working on periocular patches, which can be specifically acquired or preliminarily cropped from facial images. In the latter case, the involved algorithms must work even under non-ideal acquisition conditions (e.g., in the presence of noise, low spatial resolution or non-uniform lighting) and without the user's awareness (thus with possible variations of the eye in scale, rotation and/or translation). Getting satisfactory pupil localization results under such challenging operating conditions is still an open scientific topic in computer vision. Indeed, the best-performing solutions in the literature are, unfortunately, based on supervised machine learning algorithms that require initial sessions to set the working parameters and to train the embedded models of the eye: this way, experienced operators have to work on the system each time it is moved from one operational context to another. It follows that the use of unsupervised approaches is more and more desirable; unfortunately, their performance is still not satisfactory and further investigation is required. To this end, this paper proposes a new unsupervised approach to automatically detect the center of the eye: its algorithmic core is a representation of the eye's shape obtained through a differential analysis of image intensities, subsequently combined with the local variability of the appearance represented by self-similarity coefficients. Experimental evidence of the effectiveness of the method was obtained on challenging databases containing facial images. Moreover, its capability to accurately detect the centers of the eyes was also favourably compared with that of the leading state-of-the-art methods.
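A small sketch in the spirit of the differential (gradient-based) analysis described above, scoring each candidate center by the alignment between displacement vectors and image gradients with a dark-pixel prior; the paper's self-similarity term is omitted, so this is an illustration, not the proposed method:

```python
import numpy as np

def eye_center(patch):
    """Unsupervised eye-center estimate from image gradients: dark pixels
    whose surrounding gradients point radially away from them score highest."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    mask = mag > mag.mean()                      # keep only significant gradients
    ys, xs = np.nonzero(mask)
    gxn, gyn = gx[mask] / mag[mask], gy[mask] / mag[mask]
    h, w = patch.shape
    best, best_score = (0, 0), -1.0
    for cy in range(h):
        for cx in range(w):
            dx, dy = xs - cx, ys - cy
            norm = np.hypot(dx, dy) + 1e-12
            dot = (dx / norm) * gxn + (dy / norm) * gyn
            score = np.mean(np.maximum(dot, 0.0) ** 2) * (255 - patch[cy, cx])
            if score > best_score:
                best, best_score = (cx, cy), score
    return best

# Toy usage: a bright patch with a dark disk ("pupil") centered at (20, 14).
yy, xx = np.mgrid[:32, :48]
patch = 220 - 170 * (((xx - 20) ** 2 + (yy - 14) ** 2) < 36)
print(eye_center(patch))  # expected near (20, 14)
```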
This work introduces biometrics as a way to improve human-robot interaction. In particular, gender and age estimation algorithms are used to provide a humanoid robot (Aldebaran NAO) with awareness of the user's biometrics, so that it can react with gender- and age-specific behavior. The system can also manage multiple persons at the same time, recognizing the age and gender of each participant. All the estimation algorithms employed have been validated through k-fold testing and subsequently tested in a real human-robot interaction environment, allowing for a more natural interaction. Our system is able to work at a frame rate of 13 fps with 640×480 images taken from NAO's embedded camera. The proposed application is well suited for all assisted environments that involve a socially assistive robot, such as therapy for people with disabilities, dementia, post-stroke rehabilitation, Alzheimer's disease or autism.
This paper proposes an advanced technology framework that, through the use of UAVs, allows archaeological sites to be monitored. In particular, this paper focuses on the development of computer vision techniques, such as super-resolution and mosaicking, aimed at extracting detailed and panoramic views of the sites. Super-resolution aims at providing imagery solutions (from aerial and remote sensing platforms) that create higher-resolution views, making visible details that are not perceivable in the acquired images. Mosaicking aims, instead, at creating a single large still image from the sequence of video frames contained in a motion imagery clip. In this way, large areas can be observed and a global analysis of their temporal changes can be performed. In general, super-resolution and mosaicking can be exploited for both touristic and surveillance purposes. In particular, they can be used to support the enjoyment of cultural heritage through a fascinating visual experience, possibly enriched with augmented information, but also for surveillance tasks that can help detect or prevent illegal activities.
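A minimal sketch of the mosaicking step, assuming OpenCV's high-level Stitcher pipeline on frames sampled from a UAV clip; the frame file names and sampling rate are illustrative:

```python
import cv2

# Hypothetical aerial frames extracted from a UAV video clip.
frames = [cv2.imread(f"frame_{i:03d}.jpg") for i in range(0, 120, 10)]

# OpenCV's stitching pipeline (feature matching, homography estimation,
# seam finding and blending) builds the panoramic mosaic; SCANS mode
# suits roughly planar scenes observed from above.
stitcher = cv2.Stitcher.create(cv2.Stitcher_SCANS)
status, mosaic = stitcher.stitch(frames)
if status == cv2.Stitcher_OK:
    cv2.imwrite("mosaic.jpg", mosaic)
else:
    print("stitching failed with status", status)
```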
This paper presents a robust visual tracking algorithm based on dense local descriptors. These local invariant representations, combined with a robust object/context nearest-neighbor classifier, permit building a very powerful visual tracker. Performance is very promising, even on very long video sequences. © 2015 OSA.