
doi 10.1109/DS-RT55542.2022.9932080

Multilevel Modeling as a Methodology for the Simulation of Human Mobility

Authors: Luca Serena, Moreno Marzolla, Gabriele D'Angelo, Stefano Ferretti

Abstract: Multilevel modeling is increasingly relevant in the context of modeling and simulation, since it offers several potential benefits, such as software reuse and integration, the split of semantically separated levels into sub-models, the possibility to employ different levels of detail, and the potential for parallel execution. The coupling that inevitably exists between the sub-models, however, implies the need for maintaining consistency between the various components, more so when different simulation paradigms are employed (e.g., sequential vs parallel, discrete vs continuous). In this paper we argue that multilevel modeling is well suited for the simulation of human mobility, since it naturally leads to the decomposition of the model into two layers, the "micro" and "macro" layer, where individual entities (micro) and long-range interactions (macro) are described. We also investigate the challenges of multilevel modeling, and describe some preliminary results using prototype implementations of multilayer simulators in the context of epidemic diffusion and vehicle pollution.

Submitted 25 March, 2024; originally announced March 2024.

MSC Class: 68U99 ACM Class: I.6.5

Journal ref: proc. 2022 IEEE/ACM 26th International Symposium on Distributed Simulation and Real-Time Applications (DS-RT'22), Alès, France, September 26-28, 2022, pp. 49-56

arXiv:2403.16713 [pdf, other]

doi 10.1109/DS-RT58998.2023.00015

Design Patterns for Multilevel Modeling and Simulation

Authors: Luca Serena, Moreno Marzolla, Gabriele D'Angelo, Stefano Ferretti

Abstract: Multilevel modeling and simulation (M&S) is becoming increasingly relevant due to the benefits that this methodology offers. Multilevel models allow users to describe a system at multiple levels of detail. On the one hand, this can make better use of computational resources, since the more detailed and time-consuming models can be executed only when and where required. On the other hand, multilevel models can be assembled from existing components, cutting down development and verification/validation time. A downside of multilevel M&S is that the development process becomes more complex due to some recurrent issues caused by the very nature of multilevel models: how to make sub-models interoperate, how to orchestrate execution, how state variables are to be updated when changing scale, and so on. In this paper, we address some of these issues by presenting a set of design patterns that provide a systematic approach for designing and implementing multilevel models. The proposed design patterns cover multiple aspects, including how to represent different levels of detail, how to combine incompatible models, how to exchange data across models, and so on. Some of the patterns are derived from the general software engineering literature, while others are specific to the multilevel M&S application area.

Submitted 25 March, 2024; originally announced March 2024.

MSC Class: 68 ACM Class: D.2.10; I.6.5

Journal ref: proc. 2023 IEEE/ACM 27th International Symposium on Distributed Simulation and Real-Time Applications (DS-RT'23), Singapore, October 4-5, 2023, pp. 48-55
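One of the recurring issues listed above, updating state variables when changing scale, can be illustrated with a minimal aggregation/disaggregation sketch. It assumes a hypothetical 1-D corridor split into equal cells: the micro level tracks individual agents, while the macro level only keeps per-cell counts. The names (Agent, aggregate, disaggregate) are illustrative and are not taken from the paper, which may structure this pattern differently.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Micro-level entity: a position along a hypothetical 1-D corridor."""
    x: float


def aggregate(agents, length, n_cells):
    """Micro -> macro: replace individual agents with per-cell counts."""
    width = length / n_cells
    counts = [0] * n_cells
    for a in agents:
        # Clamp so that an agent sitting exactly at `length` lands in the last cell.
        idx = min(int(a.x // width), n_cells - 1)
        counts[idx] += 1
    return counts


def disaggregate(counts, length, rng):
    """Macro -> micro: recreate agents, placing each one uniformly inside its cell.

    Individual positions are lost during aggregation, so they are resampled here;
    only the per-cell counts are preserved.
    """
    width = length / len(counts)
    agents = []
    for i, c in enumerate(counts):
        for _ in range(c):
            agents.append(Agent(x=(i + rng.random()) * width))
    return agents


if __name__ == "__main__":
    rng = random.Random(1)
    counts = [3, 0, 2, 5, 1]
    agents = disaggregate(counts, 10.0, rng)
    # The round trip preserves the per-cell counts, not the original positions.
    print(aggregate(agents, 10.0, len(counts)) == counts)
```

The design choice here is deliberately lossy: going from macro to micro requires inventing detail (uniform placement), which is exactly the kind of consistency concern the abstract raises.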

arXiv:2209.04220 [pdf, ps, other]

A Software Package for Queueing Networks and Markov Chains analysis

Authors: Moreno Marzolla

Abstract: Queueing networks and Markov chains are widely used for conducting performance and reliability studies. In this paper we describe the queueing package, a free software package for queueing networks and Markov chain analysis for GNU Octave. The queueing package provides implementations of numerical algorithms for computing transient and steady-state performance measures of discrete and continuous Markov chains, and for steady-state analysis of single-station queueing systems and queueing networks. We illustrate the design principles of the queueing package, describe its most salient features and provide some usage examples.

Submitted 9 September, 2022; originally announced September 2022.

MSC Class: 58-04 ACM Class: B.8.2
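As a flavor of the steady-state measures such a package computes for single-station systems, here is a sketch of the textbook M/M/1 formulas: utilization U = λ/μ, mean number in system N = U/(1 - U), mean response time R = 1/(μ - λ). It is written in Python for illustration and does not reproduce the queueing package's Octave API; the function name mm1 is hypothetical.

```python
def mm1(lam, mu):
    """Steady-state measures of an M/M/1 queue (Poisson arrivals, exponential service).

    lam: arrival rate, mu: service rate. Requires lam < mu for stability.
    Returns utilization U, mean number in system N, mean response time R, throughput X.
    """
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    if lam >= mu:
        raise ValueError("unstable system: lam must be smaller than mu")
    U = lam / mu
    N = U / (1 - U)
    R = 1 / (mu - lam)
    X = lam  # in steady state, throughput equals the arrival rate
    return U, N, R, X


U, N, R, X = mm1(lam=1.0, mu=2.0)
print(U, N, R, X)  # 0.5 1.0 1.0 1.0
```

A quick sanity check is Little's law, N = X · R, which the four values above satisfy.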

arXiv:2107.11949 [pdf, other]

Dissecting FLOPs along input dimensions for GreenAI cost estimations

Authors: Andrea Asperti, Davide Evangelista, Moreno Marzolla

Abstract: The term GreenAI refers to a novel approach to Deep Learning that is more aware of the ecological impact and the computational efficiency of its methods. The promoters of GreenAI suggested the use of Floating Point Operations (FLOPs) as a measure of the computational cost of Neural Networks; however, that measure does not correlate well with the energy consumption of hardware equipped with massively parallel processing units like GPUs or TPUs. In this article, we propose a simple refinement of the formula used to compute floating point operations for convolutional layers, called α-FLOPs, which explains and corrects the traditional discrepancy across different layers and is closer to reality. The notion of α-FLOPs relies on the crucial insight that, in the case of inputs with multiple dimensions, there is no reason to believe that the speedup offered by parallelism will be uniform along all different axes.

Submitted 26 July, 2021; originally announced July 2021.

Comments: Article accepted at the 7th International Conference on Machine Learning, Optimization, and Data Science. October 4-8, 2021, Grasmere, Lake District, UK

MSC Class: 68T07 ACM Class: I.2
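For reference, the conventional FLOP count for a 2-D convolutional layer, which the α-FLOPs proposal refines, multiplies the number of output elements by the multiply-accumulate work per element. The sketch below assumes 2 FLOPs per multiply-accumulate and ignores bias terms and activations; it does not implement the α-FLOPs correction itself, whose exact formula is given in the paper.

```python
def conv_out_size(n_in, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (n_in + 2 * padding - kernel) // stride + 1


def conv2d_flops(h_in, w_in, c_in, c_out, k_h, k_w, stride=1, padding=0):
    """Conventional FLOP count of a 2-D convolution (no bias, 2 FLOPs per MAC)."""
    h_out = conv_out_size(h_in, k_h, stride, padding)
    w_out = conv_out_size(w_in, k_w, stride, padding)
    # Each output element is a dot product of length k_h * k_w * c_in.
    macs = h_out * w_out * c_out * (k_h * k_w * c_in)
    return 2 * macs


# 32x32 RGB input, 16 filters of size 3x3, "same" padding.
print(conv2d_flops(32, 32, 3, 16, 3, 3, padding=1))  # 884736
```

This count depends only on the product of the dimensions, which is why it cannot capture a parallel speedup that differs from one input axis to another.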

arXiv:2107.03133 [pdf, other]

A Heuristic for Direct Product Graph Decomposition

Authors: Luca Calderoni, Luciano Margara, Moreno Marzolla

Abstract: In this paper we describe a heuristic for decomposing a directed graph into factors according to the direct product (also known as Kronecker, cardinal or tensor product). Given a directed, unweighted graph $G$ with adjacency matrix $\mathrm{Adj}(G)$, our heuristic searches for a pair of graphs $G_1$ and $G_2$ such that $G = G_1 \otimes G_2$, where $G_1 \otimes G_2$ is the direct product of $G_1$ and $G_2$. For undirected, connected graphs it has been shown that graph decomposition is "at least as difficult" as graph isomorphism; therefore, polynomial-time algorithms for decomposing a general directed graph into factors are unlikely to exist. Although graph factorization is a problem that has been extensively investigated, the heuristic proposed in this paper represents -- to the best of our knowledge -- the first computational approach for general directed, unweighted graphs. We have implemented our algorithm using the MATLAB environment; we report on a set of experiments that show that the proposed heuristic solves reasonably-sized instances in a few seconds on general-purpose hardware.

Submitted 7 July, 2021; originally announced July 2021.

MSC Class: 05C70 ACM Class: F.2.2; F.2.1
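The forward direction of the direct product is easy to compute: the adjacency matrix of G1 ⊗ G2 is the Kronecker product of the two adjacency matrices, with vertex (u1, u2) mapped to index u1 · n2 + u2. The heuristic in the paper tackles the much harder inverse problem (recovering G1 and G2 from G), which this sketch does not attempt. It uses plain Python lists of 0/1 entries.

```python
def direct_product(a, b):
    """Adjacency matrix of the direct (Kronecker) product of two directed graphs.

    a, b: square 0/1 adjacency matrices (lists of lists) of G1 and G2.
    Vertex (u1, u2) of the product gets index u1 * len(b) + u2; there is an edge
    (u1, u2) -> (v1, v2) iff u1 -> v1 in G1 and u2 -> v2 in G2.
    """
    n1, n2 = len(a), len(b)
    size = n1 * n2
    return [
        [a[i // n2][j // n2] * b[i % n2][j % n2] for j in range(size)]
        for i in range(size)
    ]


g1 = [[0, 1],
      [0, 0]]  # single edge 0 -> 1
g2 = [[0, 1],
      [1, 0]]  # directed 2-cycle
for row in direct_product(g1, g2):
    print(row)
# [0, 0, 0, 1]
# [0, 0, 1, 0]
# [0, 0, 0, 0]
# [0, 0, 0, 0]
```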

arXiv:2003.01591 [pdf, ps, other]

Direct Product Primality Testing of Graphs is GI-hard

Authors: Luca Calderoni, Luciano Margara, Moreno Marzolla

Abstract: We investigate the computational complexity of the graph primality testing problem with respect to the direct product (also known as Kronecker, cardinal or tensor product). In [1] Imrich proves that both primality testing and a unique prime factorization can be determined in polynomial time for (finite) connected and nonbipartite graphs. The author states as an open problem how results on the direct product of nonbipartite, connected graphs extend to bipartite connected graphs and to disconnected ones. In this paper we partially answer this question by proving that the graph isomorphism problem is polynomial-time many-one reducible to the graph compositeness testing problem (the complement of the graph primality testing problem). As a consequence of this result, we prove that the graph isomorphism problem is polynomial-time Turing reducible to the primality testing problem. Our results show that connectedness plays a crucial role in determining the computational complexity of the graph primality testing problem.

Submitted 3 March, 2020; originally announced March 2020.

arXiv:1911.03456 [pdf, other]

doi 10.1145/3369759

Parallel Data Distribution Management on Shared-Memory Multiprocessors

Authors: Moreno Marzolla, Gabriele D'Angelo

Abstract: The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification -- a standard framework for interoperability among simulators -- includes a Data Distribution Management (DDM) service whose responsibility is to report all intersections between a set of subscription and update regions. The algorithms at the core of the DDM service are CPU-intensive, and could greatly benefit from the large computing power of modern multi-core processors. In this paper we propose two parallel solutions to the DDM problem that can operate effectively on shared-memory multiprocessors. The first solution is based on a data structure (the Interval Tree) that allows concurrent computation of intersections between subscription and update regions. The second solution is based on a novel parallel extension of the Sort Based Matching algorithm, whose sequential version is considered among the most efficient solutions to the DDM problem. Extensive experimental evaluation of the proposed algorithms confirms their effectiveness in taking advantage of multiple execution units in a shared-memory architecture.

Submitted 26 February, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1703.06680

Journal ref: ACM Transactions on Modeling and Computer Simulation (TOMACS), Vol. 30, No. 1, Article 5. ACM, February 2020. ISSN: 1049-3301
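The DDM matching problem can be stated compactly: two axis-parallel rectangles intersect if and only if their projections overlap on every axis. The brute-force sketch below checks every subscription/update pair in O(n · m · d) time; it is the baseline that interval trees and Sort Based Matching improve upon, not one of the paper's algorithms. Regions are assumed to be half-open intervals [lo, hi) in each dimension.

```python
def regions_overlap(r, s):
    """True if two d-dimensional regions intersect.

    Each region is a sequence of (lo, hi) half-open intervals, one per dimension.
    """
    return all(rlo < shi and slo < rhi for (rlo, rhi), (slo, shi) in zip(r, s))


def brute_force_matching(subscriptions, updates):
    """Report all (subscription, update) index pairs whose regions intersect."""
    return [
        (i, j)
        for i, s in enumerate(subscriptions)
        for j, u in enumerate(updates)
        if regions_overlap(s, u)
    ]


subs = [[(0, 2), (0, 2)], [(5, 6), (5, 6)]]
upds = [[(1, 3), (1, 3)], [(2, 4), (0, 1)]]
print(brute_force_matching(subs, upds))  # [(0, 0)]
```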

arXiv:1907.07009 [pdf, other]

doi 10.1145/3344948.3344966

Gender Balance in Computer Science and Engineering in Italian Universities

Authors: Moreno Marzolla, Raffaela Mirandola

Abstract: Multiple studies have shown that gender balance in the fields of Science, Technology, Engineering and Maths -- and in particular in ICT -- is still far from being achieved. Several initiatives have recently been taken to increase the participation of women, but it is difficult, at present, to evaluate their impact and their potential to change the situation. This paper contributes to the discussion by presenting a descriptive analysis of the gender balance in Computer Science and Computer Engineering in Italian Universities.

Submitted 16 July, 2019; originally announced July 2019.

MSC Class: 97P70 ACM Class: K.4.2; K.3.2

Journal ref: Proceedings of the 13th European Conference on Software Architecture - Volume 2 Pages 82-87, Paris, France, September 09-13, 2019

arXiv:1810.00596 [pdf, ps, other]

doi 10.1016/j.simpat.2018.09.012

Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

Authors: Gabriele D'Angelo, Stefano Ferretti, Moreno Marzolla

Abstract: This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has been designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units found in multicore processors, clusters of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.

Submitted 26 March, 2019; v1 submitted 1 October, 2018; originally announced October 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1606.07310

Journal ref: Simulation Modelling Practice and Theory, Elsevier, vol. 93 (May 2019)
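One simple way a receiver can discard corrupted copies of a replicated message is majority voting. The sketch below assumes that each message arrives as 2f+1 replicas, of which at most f may be corrupted, so any payload received at least f+1 times must be the correct one. This voting rule and the function name accept_message are illustrative assumptions; the abstract does not specify the exact mechanism FT-GAIA uses.

```python
from collections import Counter


def accept_message(copies, f):
    """Return the payload agreed on by at least f+1 replicas, or None.

    copies: payloads (hashable) received for the same logical message.
    f: maximum number of corrupted replicas to tolerate; needs 2f+1 copies.
    """
    if len(copies) < 2 * f + 1:
        raise ValueError("need at least 2f+1 copies to tolerate f corrupted ones")
    payload, votes = Counter(copies).most_common(1)[0]
    # With at most f corrupted copies, no wrong payload can reach f+1 votes.
    # None signals that the assumption was violated (more than f bad copies).
    return payload if votes >= f + 1 else None


print(accept_message(["move(3,4)", "move(3,4)", "move(9,9)"], f=1))  # move(3,4)
```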

arXiv:1808.02231 [pdf, other]

doi 10.1109/DISTRA.2018.8600922

Anonymity and Confidentiality in Secure Distributed Simulation

Authors: Antonio Magnani, Gabriele D'Angelo, Stefano Ferretti, Moreno Marzolla

Abstract: Research on data confidentiality, integrity and availability is gaining momentum in the ICT community, due to the intrinsically insecure nature of the Internet. While many distributed systems and services are now based on secure communication protocols to avoid eavesdropping and protect confidentiality, the techniques usually employed in distributed simulations do not consider these issues at all. This is probably due to the fact that many real-world simulators rely on monolithic, offline approaches and therefore the issues above do not apply. However, the complexity of the systems to be simulated, and the rise of distributed and cloud based simulation, now impose the adoption of secure simulation architectures. This paper presents a solution to ensure both anonymity and confidentiality in distributed simulations. A performance evaluation based on an anonymized distributed simulator is used for quantifying the performance penalty for being anonymous. The obtained results show that this is a viable solution.

Submitted 16 January, 2019; v1 submitted 7 August, 2018; originally announced August 2018.

Comments: Proceedings of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2018)

arXiv:1806.04544 [pdf, other]

doi 10.1145/3211933.3211950

A Blockchain-based Flight Data Recorder for Cloud Accountability

Authors: Gabriele D'Angelo, Stefano Ferretti, Moreno Marzolla

Abstract: Many companies rely on Cloud infrastructures for their computation, communication and data storage requirements. While Cloud services provide some benefits, e.g., replacing high upfront costs for an IT infrastructure with a pay-as-you-go model, they also introduce serious concerns that are notoriously difficult to address. In essence, Cloud customers are storing data and running computations on infrastructures that they cannot control directly. Therefore, when problems arise -- violations of Service Level Agreements, data corruption, data leakage, security breaches -- both customers and Cloud providers face the challenge of agreeing on which party is to be held responsible. In this paper, we review the challenges and requirements for enforcing accountability in Cloud infrastructures, and argue that smart contracts and blockchain technologies might provide a key contribution towards accountable Clouds.

Submitted 12 June, 2018; originally announced June 2018.

Comments: 1st Workshop on Cryptocurrencies and Blockchains for Distributed Systems (CryBlock 2018)
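The tamper-evidence property that makes blockchains attractive for accountability can be illustrated with a much simpler structure: a hash-chained, append-only log in which each entry commits to the hash of its predecessor, so that altering any past record invalidates the chain from that point on. This is only a sketch of that underlying idea using Python's hashlib; it has no distributed consensus or smart contracts, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    # Canonical JSON so that the same record always hashes to the same value.
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log, record):
    """Append a record, linking it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": _digest(prev, record)})


def verify(log):
    """Recompute the chain; any modified record or broken link returns False."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"t": 1, "event": "vm_started", "party": "provider"})
append(log, {"t": 7, "event": "sla_violation", "party": "provider"})
print(verify(log))  # True
log[0]["record"]["t"] = 2  # tamper with an old entry
print(verify(log))  # False
```

On its own, a single party could simply rebuild the whole chain after tampering; replicating the log among mutually distrustful parties is what a blockchain adds on top of this idea.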

arXiv:1804.07981 [pdf, ps, other]

doi 10.1007/978-3-319-99813-8_46

Parallel Implementations of Cellular Automata for Traffic Models

Authors: Moreno Marzolla

Abstract: The Biham-Middleton-Levine (BML) traffic model is a simple two-dimensional, discrete Cellular Automaton (CA) that has been used to study self-organization and phase transitions arising in traffic flows. From the computational point of view, the BML model exhibits the usual features of discrete CA, where the states of the cells are updated according to simple rules that depend on the state of each cell and its neighbors. In this paper we study the impact of various optimizations for speeding up CA computations by using the BML model as a case study. In particular, we describe and analyze the impact of several parallel implementations that rely on CPU features, such as multiple cores or SIMD instructions, and on GPUs. Experimental evaluation provides quantitative measures of the payoff of each technique in terms of speedup with respect to a plain serial implementation. Our findings show that the performance gap between CPU and GPU implementations of the BML traffic model can be reduced by clever exploitation of all CPU features.

Submitted 21 April, 2018; originally announced April 2018.

MSC Class: 68W10 ACM Class: D.1.3; D.3.3; J.2

Journal ref: In: Mauri G., El Yacoubi S., Dennunzio A., Nishinari K., Manzoni L. (eds) Cellular Automata. ACRI 2018. Lecture Notes in Computer Science, vol 11115. Springer
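The BML update rule is simple enough to state in a few lines. In the sketch below, a plain serial version rather than one of the optimized implementations from the paper, each cell is empty, holds an east-moving car, or holds a south-moving car, on a grid with periodic boundaries. In each half-step, all cars of one kind move simultaneously if the cell ahead was empty at the start of that half-step. Letting east-moving cars go first within an iteration is an assumed convention.

```python
EMPTY, EAST, SOUTH = 0, 1, 2


def move(grid, kind):
    """Move all cars of one kind by one cell, simultaneously, with wrap-around.

    Moves are decided on the old grid, so a car blocked by another car of the
    same kind stays put even if the blocking car moves during this half-step.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != kind:
                continue
            if kind == EAST:
                nr, nc = r, (c + 1) % cols
            else:
                nr, nc = (r + 1) % rows, c
            if grid[nr][nc] == EMPTY:
                new[r][c] = EMPTY
                new[nr][nc] = kind
    return new


def bml_step(grid):
    """One full BML iteration: east-moving cars first, then south-moving cars."""
    return move(move(grid, EAST), SOUTH)


g = [[EAST, EAST, EMPTY],
     [EMPTY, SOUTH, EMPTY],
     [EMPTY, EMPTY, EMPTY]]
for row in bml_step(g):
    print(row)
# [1, 0, 1]
# [0, 0, 0]
# [0, 2, 0]
```

Because every cell is updated from a read-only copy of the previous grid, each half-step is data-parallel, which is what makes the model a natural target for the multicore, SIMD, and GPU implementations discussed in the paper.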

arXiv:1710.02282 [pdf, ps, other]

doi 10.1109/DISTRA.2017.8167672

The Quest for Scalability and Accuracy in the Simulation of the Internet of Things: an Approach based on Multi-Level Simulation

Authors: Stefano Ferretti, Gabriele D'Angelo, Vittorio Ghini, Moreno Marzolla

Abstract: This paper presents a methodology for simulating the Internet of Things (IoT) using multi-level simulation models. Compared to conventional simulators, this approach allows us to tune the level of detail of different parts of the model without compromising the scalability of the simulation. As a use case, we have developed a two-level simulator to study the deployment of smart services over rural territories. The higher level is based on a coarse-grained, agent-based adaptive parallel and distributed simulator. When needed, this simulator spawns OMNeT++ model instances to evaluate in more detail the issues concerned with wireless communications in restricted areas of the simulated world. The performance evaluation confirms the viability of multi-level simulations for IoT environments.

Submitted 7 August, 2018; v1 submitted 6 October, 2017; originally announced October 2017.

Comments: Proceedings of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2017)

arXiv:1703.06680 [pdf, ps, other]

doi 10.1109/DISTRA.2017.8167660

Parallel Sort-Based Matching for Data Distribution Management on Shared-Memory Multiprocessors

Authors: Moreno Marzolla, Gabriele D'Angelo

Abstract: In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the context of High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed; however, many of them are either inefficient or inherently sequential. These are serious limitations since multicore processors are now ubiquitous, and DDM algorithms -- being CPU-intensive -- could benefit from additional computing power. We propose a parallel version of the Sort-Based Matching algorithm for shared-memory multiprocessors. Sort-Based Matching is one of the most efficient serial algorithms for the DDM problem, but is quite difficult to parallelize due to data dependencies. We describe the algorithm and compute its asymptotic running time; we complete the analysis by assessing its performance and scalability through extensive experiments on two commodity multicore systems based on a dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.

Submitted 7 August, 2018; v1 submitted 20 March, 2017; originally announced March 2017.

Comments: Proceedings of the 21st ACM/IEEE International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2017). Best Paper Award at DS-RT 2017
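The sweep idea behind Sort-Based Matching can be sketched sequentially for one dimension. The sketch sorts all interval endpoints and scans them in order while maintaining the sets of currently active subscription and update intervals. Whenever an interval starts while intervals of the other kind are active, it reports a match. This is a simplified serial illustration assuming half-open intervals [lo, hi), not the parallel algorithm from the paper; for d-dimensional regions, one would intersect the per-dimension match sets.

```python
def _endpoints(intervals, kind):
    """Yield (coordinate, is_start, kind, index) events for non-empty intervals."""
    for idx, (lo, hi) in enumerate(intervals):
        if lo < hi:
            yield (lo, 1, kind, idx)
            yield (hi, 0, kind, idx)


def sort_based_matching_1d(subscriptions, updates):
    """Return the set of (subscription, update) pairs whose intervals overlap."""
    # At equal coordinates, end events (0) sort before start events (1), so
    # intervals that merely touch, like [0, 5) and [5, 6), are not matched.
    events = sorted(list(_endpoints(subscriptions, "S")) + list(_endpoints(updates, "U")))
    active = {"S": set(), "U": set()}
    matches = set()
    for _, is_start, kind, idx in events:
        if not is_start:
            active[kind].discard(idx)
            continue
        if kind == "S":
            matches.update((idx, j) for j in active["U"])
        else:
            matches.update((i, idx) for i in active["S"])
        active[kind].add(idx)
    return matches


subs = [(0, 5), (10, 12)]
upds = [(4, 11), (5, 6)]
print(sorted(sort_based_matching_1d(subs, upds)))  # [(0, 0), (1, 0)]
```

After the O(n log n) sort, each pair is reported exactly once, when the later-starting interval of the pair begins.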

arXiv:1606.07310 [pdf, ps, other]

doi 10.1109/DS-RT.2016.11

Fault-Tolerant Adaptive Parallel and Distributed Simulation

Authors: Gabriele D'Angelo, Stefano Ferretti, Moreno Marzolla, Lorenzo Armaroli

Abstract: Discrete Event Simulation is a widely used technique to model and analyze complex systems in many fields of science and engineering. The increasingly large size of simulation models poses a serious computational challenge, since the time needed to run a simulation can be prohibitively large. For this reason, Parallel and Distributed Simulation techniques have been proposed to take advantage of multiple execution units found in multicore processors, clusters of workstations or HPC systems. The current generation of HPC systems includes hundreds of thousands of computing nodes and a vast amount of ancillary components. Despite improvements in manufacturing processes, failures of some components are frequent, and the situation will get worse as larger systems are built. In this paper we describe FT-GAIA, a software-based fault-tolerant extension of the GAIA/ARTÌS parallel simulation middleware. FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some protection against Byzantine failures since synchronization messages are replicated as well, so that the receiving entity can identify and discard corrupted messages. We provide an experimental evaluation of FT-GAIA on a running prototype. Results show that a high degree of fault tolerance can be achieved, at the cost of a moderate increase in the computational load of the execution units.

Submitted 29 December, 2016; v1 submitted 23 June, 2016; originally announced June 2016.

Comments: Proceedings of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2016)

arXiv:1509.00773 [pdf, other]

doi 10.1007/s00607-015-0480-7

A Big Data Analyzer for Large Trace Logs

Authors: Alkida Balliu, Dennis Olivetti, Ozalp Babaoglu, Moreno Marzolla, Alina Sîrbu

Abstract: The current generation of Internet-based services is typically hosted in large data centers that take the form of warehouse-size structures housing tens of thousands of servers. The continued availability of a modern data center is the result of a complex orchestration among many internal and external actors, including computing hardware, multiple layers of intricate software, networking and storage devices, and electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques that exploit hidden statistical patterns and correlations that may be present in the data. The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task. This paper presents BiDAl, a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched so that a custom tool for a specific task can be easily constructed. BiDAl has a modular architecture so that it can be extended with other backends and analysis languages in the future.
In this paper we present the design of BiDAl and describe our experience using it to analyze publicly available traces from Google data clusters, with the goal of building a realistic model of a complex data center.

Submitted 2 September, 2015; originally announced September 2015.

Comments: 26 pages, 10 figures

Journal ref: Computing, 98(12), Dec 2016, pp. 1225-1249
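BiDAl lets users mix analysis languages and storage backends such as SQL over SQLite. As a rough illustration of that kind of query, the sketch below loads a few trace events into an in-memory SQLite database with Python's standard sqlite3 module and counts events per type. The table layout, column names, and event names are hypothetical and do not reproduce BiDAl's interface or the Google trace schema.

```python
import sqlite3


def count_events_by_type(rows):
    """Count trace events per event type using SQL over an in-memory SQLite DB.

    rows: iterable of (ts, machine_id, event_type) tuples (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE task_events (ts INTEGER, machine_id INTEGER, event_type TEXT)"
        )
        conn.executemany("INSERT INTO task_events VALUES (?, ?, ?)", rows)
        return conn.execute(
            "SELECT event_type, COUNT(*) FROM task_events "
            "GROUP BY event_type ORDER BY event_type"
        ).fetchall()
    finally:
        conn.close()


trace = [(0, 1, "SUBMIT"), (5, 1, "SCHEDULE"), (9, 2, "SUBMIT"), (12, 2, "FAIL")]
print(count_events_by_type(trace))  # [('FAIL', 1), ('SCHEDULE', 1), ('SUBMIT', 2)]
```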

arXiv:1507.04720 [pdf, other]

doi 10.1016/j.joi.2016.01.009

Assessing evaluation procedures for individual researchers: the case of the Italian National Scientific Qualification

Authors: Moreno Marzolla

Abstract: The Italian National Scientific Qualification (ASN) was introduced as a prerequisite for applying for tenured associate or full professor positions at state-recognized universities. The ASN is meant to attest that an individual has reached a suitable level of scientific maturity to apply for professorship positions. A five-member panel, appointed for each scientific discipline, is in charge of evaluating applicants by means of quantitative indicators of impact and productivity, and through an assessment of their research profile. Many concerns were raised about the appropriateness of the evaluation criteria, and in particular about the use of bibliometrics for the evaluation of individual researchers. Additional concerns were related to the perceived poor quality of the final evaluation reports. In this paper we assess the ASN in terms of the appropriateness of the applied methodology and the quality of the feedback provided to the applicants. We argue that the ASN is not fully compliant with the best practices for the use of bibliometric indicators for the evaluation of individual researchers; moreover, the quality of final reports varies considerably across the panels, suggesting that measures should be put in place to prevent sloppy practices in future ASN rounds.

Submitted 18 March, 2016; v1 submitted 16 July, 2015; originally announced July 2015.

MSC Class: 62P99

Journal ref: Journal of Informetrics 10(2), May 2016, pp. 408-438

arXiv:1505.02435 [pdf, ps, other]

doi 10.1007/978-3-319-08234-9_39-1

Cloud for Gaming

Authors: Gabriele D'Angelo, Stefano Ferretti, Moreno Marzolla

Abstract: Cloud for Gaming refers to the use of cloud computing technologies to build large-scale gaming infrastructures, with the goal of improving scalability and responsiveness, improving the user's experience, and enabling new business models.

Submitted 17 May, 2016; v1 submitted 10 May, 2015; originally announced May 2015.

Comments: Encyclopedia of Computer Graphics and Games. Newton Lee (Editor). Springer International Publishing, 2015, ISBN 978-3-319-08234-9

ACM Class: C.2.4; I.6.8

arXiv:1412.4081 [pdf, other]

doi 10.1016/j.joi.2015.02.006

Quantitative Analysis of the Italian National Scientific Qualification

Authors: Moreno Marzolla

Abstract: The Italian National Scientific Qualification (ASN) was introduced in 2010 as part of a major reform of the national university system. Under the new regulation, the scientific qualification for a specific role (associate or full professor) and field of study is required to apply for a permanent professor position. The ASN is peculiar since it makes use of bibliometric indicators with associated thresholds as one of the parameters used to assess applicants. Overall, more than 59000 applications were submitted, and the results were made publicly available for a short period of time, including the values of the quantitative indicators for each applicant. The availability of this wealth of information provides an opportunity to draw a fairly detailed picture of a nation-wide evaluation exercise, and to study the impact of the bibliometric indicators on the qualification results. In this paper we provide a first account of the Italian ASN from a quantitative point of view. We show that significant differences exist among scientific disciplines, in particular with respect to the fraction of qualified applicants, that cannot be easily explained. Furthermore, we describe some issues related to the definition and use of the bibliometric indicators and thresholds. Our analysis aims at drawing attention to potential problems that should be addressed by decision-makers in future ASN rounds.

Submitted 12 March, 2015; v1 submitted 12 December, 2014; originally announced December 2014.

Comments: ISSN 1751-1577

MSC Class: 62P99

Journal ref: Journal of Informetrics, Volume 9, Issue 2, April 2015, Pages 285-316

arXiv:1410.1309 [pdf, other]

BiDAl: Big Data Analyzer for Cluster Traces

Authors: Alkida Balliu, Dennis Olivetti, Ozalp Babaoglu, Moreno Marzolla, Alina Sîrbu

Abstract: Modern data centers that provide Internet-scale services are stadium-size structures housing tens of thousands of heterogeneous devices (server clusters, networking equipment, power and cooling infrastructures) that must operate continuously and reliably. As part of their operation, these devices produce large amounts of data in the form of event and error logs that are essential not only for identifying problems but also for improving data center efficiency and management. These activities employ data analytics and often exploit hidden statistical patterns and correlations among different factors present in the data. Uncovering these patterns and correlations is challenging due to the sheer volume of data to be analyzed. This paper presents BiDAl, a prototype "log-data analysis framework" that incorporates various Big Data technologies to simplify the analysis of data traces from large clusters. BiDAl is written in Java with a modular and extensible architecture so that different storage backends (currently, HDFS and SQLite are supported), as well as different analysis languages (current implementation supports SQL, R and Hadoop MapReduce) can be easily selected as appropriate. We present the design of BiDAl and describe our experience using it to analyze several public traces of Google data clusters for building a simulation model capable of reproducing observed behavior.

Submitted 6 October, 2014; originally announced October 2014.

Comments: published in E. Plödereder, L. Grunske, E. Schneider, D. Ull (editors), proc. INFORMATIK 2014 Workshop on System Software Support for Big Data (BigSys 2014), September 25--26 2014, Stuttgart, Germany, Lecture Notes in Informatics (LNI) Proceedings, Series of the Gesellschaft für Informatik (GI), Volume P-232, pp. 1781--1795, ISBN 978-3-88579-626-8, ISSN 1617-5468

Journal ref: proc. INFORMATIK 2014 Workshop on System Software Support for Big Data (BigSys 2014), Lecture Notes in Informatics (LNI), Volume P-232, pp. 1781-1795, ISBN 978-3-88579-626-8, ISSN 1617-5468

arXiv:1407.6470 [pdf, other]

doi 10.1016/j.simpat.2014.06.007

New Trends in Parallel and Distributed Simulation: from Many-Cores to Cloud Computing

Authors: Gabriele D'Angelo, Moreno Marzolla

Abstract: Recent advances in computing architectures and networking are bringing parallel computing systems to the masses, thus increasing the number of potential users of these kinds of systems. In particular, two important technological evolutions are happening at the ends of the computing spectrum: at the "small" scale, processors now include an increasing number of independent execution units (cores), to the point that a single CPU can be considered a parallel shared-memory computer; at the "large" scale, the Cloud Computing paradigm allows applications to scale by offering resources from a large pool on a pay-as-you-go model. Multi-core processors and Clouds both require applications to be suitably modified to take advantage of the features they provide. In this paper, we analyze the state of the art of parallel and distributed simulation techniques, and assess their applicability to multi-core architectures or Clouds. It turns out that most of the current approaches exhibit limitations in terms of usability and adaptivity which may hinder their application to these new computing architectures. We propose an adaptive simulation mechanism, based on the multi-agent system paradigm, to partially address some of those limitations. While it is unlikely that a single approach will work well on both settings above, we argue that the proposed adaptive mechanism has useful features which make it attractive both in a multi-core processor and in a Cloud system.
These features include the ability to reduce communication costs by migrating simulation components, and the support for adding (or removing) nodes to the execution architecture at runtime. We also show that, with the help of an additional support layer, parallel and distributed simulations can be executed on top of unreliable resources.

Submitted 4 April, 2017; v1 submitted 24 July, 2014; originally announced July 2014.

Comments: Simulation Modelling Practice and Theory (SIMPAT), Elsevier, vol. 49 (December 2014)

arXiv:1406.6311 [pdf]

doi 10.1140/epjc/s10052-014-3026-9

The Physics of the B Factories

Authors: A. J. Bevan, B. Golob, Th. Mannel, S. Prell, B. D. Yabsley, K. Abe, H. Aihara, F. Anulli, N. Arnaud, T. Aushev, M. Beneke, J. Beringer, F. Bianchi, I. I. Bigi, M. Bona, N. Brambilla, J. Brodzicka, P. Chang, M. J. Charles, C. H. Cheng, H. -Y. Cheng, R. Chistov, P. Colangelo, J. P. Coleman, A. Drutskoy, et al. (2009 additional authors not shown)

Abstract: This work is on the Physics of the B Factories. Part A of this book contains a brief description of the SLAC and KEK B Factories as well as their detectors, BaBar and Belle, and data taking related issues. Part B discusses tools and methods used by the experiments in order to obtain results. The results themselves can be found in Part C. Please note that version 3 on the archive is the auxiliary version of the Physics of the B Factories book. This uses the notation alpha, beta, gamma for the angles of the Unitarity Triangle. The nominal version uses the notation phi_1, phi_2 and phi_3. Please cite this work as Eur. Phys. J. C74 (2014) 3026.

Submitted 31 October, 2015; v1 submitted 24 June, 2014; originally announced June 2014.

Comments: 928 pages, version 3 (arXiv:1406.6311v3) corresponds to the alpha, beta, gamma version of the book, the other versions use the phi1, phi2, phi3 notation

Report number: SLAC-PUB-15968, KEK Preprint 2014-3

Journal ref: Eur. Phys. J. C74 (2014) 3026

arXiv:1405.4329 [pdf, other]

doi 10.1109/TNSE.2015.2425961

Spreading processes in Multilayer Networks

Authors: Mostafa Salehi, Rajesh Sharma, Moreno Marzolla, Matteo Magnani, Payam Siyari, Danilo Montesi

Abstract: Several systems can be modeled as sets of interconnected networks or networks with multiple types of connections, here generally called multilayer networks. Spreading processes such as information propagation among users of an online social network, or the diffusion of pathogens among individuals through their contact network, are fundamental phenomena occurring in these networks. However, while information diffusion in single networks has received considerable attention from various disciplines for over a decade, the study of spreading processes in multilayer networks is still a young research area presenting many challenging research issues. In this paper we review the main models, results and applications of multilayer spreading processes and discuss some promising research directions.

Submitted 4 December, 2014; v1 submitted 16 May, 2014; originally announced May 2014.

Comments: 21 pages, 3 figures, 4 tables

Journal ref: IEEE Transactions on Network Science and Engineering (TNSE), 2015

arXiv:1309.3458 [pdf, ps, other]

doi 10.1109/DS-RT.2013.23

A Parallel Data Distribution Management Algorithm

Authors: Moreno Marzolla, Gabriele D'Angelo, Marco Mandrioli

Abstract: Identifying intersections among a set of d-dimensional rectangular regions (d-rectangles) is a common problem in many simulation and modeling applications. Since algorithms for computing intersections over a large number of regions can be computationally demanding, an obvious solution is to take advantage of the multiprocessing capabilities of modern multicore processors. Unfortunately, many solutions employed for the Data Distribution Management service of the High Level Architecture are either inefficient, or can only partially be parallelized. In this paper we propose the Interval Tree Matching (ITM) algorithm for computing intersections among d-rectangles. ITM is based on a simple Interval Tree data structure, and exhibits an embarrassingly parallel structure. We implement the ITM algorithm, and compare its sequential performance with two widely used solutions (brute force and sort-based matching). We also analyze the scalability of ITM on shared-memory multicore processors. The results show that the sequential implementation of ITM is competitive with sort-based matching; moreover, the parallel implementation provides good speedup on multicore processors.

Submitted 17 May, 2016; v1 submitted 13 September, 2013; originally announced September 2013.

Comments: In proc. of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2013), October 30-November 1, 2013, Delft, the Netherlands

ACM Class: F.2.2
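The abstract benchmarks ITM against brute-force matching. As a minimal sketch of that baseline (not of ITM itself), the Python code below relies on the fact that two d-rectangles intersect if and only if their projections overlap in every dimension. Representing a d-rectangle as a list of half-open `(low, high)` intervals, and the names `overlaps` and `brute_force_matching`, are illustrative assumptions rather than details taken from the paper.

```python
from itertools import product

# Hypothetical representation: a d-rectangle is a list of d half-open
# intervals [low, high), one per dimension.
Rect = list[tuple[float, float]]


def overlaps(a: Rect, b: Rect) -> bool:
    """Two d-rectangles intersect iff their intervals overlap in every dimension."""
    return all(lo_a < hi_b and lo_b < hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))


def brute_force_matching(updates: list[Rect],
                         subscriptions: list[Rect]) -> list[tuple[int, int]]:
    """Test every (update, subscription) pair: len(updates) * len(subscriptions) checks."""
    return [(i, j)
            for (i, u), (j, s) in product(enumerate(updates), enumerate(subscriptions))
            if overlaps(u, s)]


updates = [[(0, 2), (0, 2)], [(5, 6), (5, 6)]]
subscriptions = [[(1, 3), (1, 3)], [(2, 4), (0, 1)]]
print(brute_force_matching(updates, subscriptions))  # [(0, 0)]
```

Every pair is tested independently, so the loop is trivially parallel, but its quadratic number of overlap tests is the cost that sort-based and interval-tree approaches aim to reduce.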

arXiv:1209.5243 [pdf, other]

Walking with the Oracle: Efficient Use of Mobile Networks through Location-Awareness

Authors: Stefano Ferretti, Vittorio Ghini, Moreno Marzolla, Fabio Panzieri

Abstract: Always Best Packet Switching (ABPS) is a novel approach for wireless communications that enables mobile nodes, equipped with multiple network interface cards (NICs), to dynamically determine the most appropriate NIC to use. Using ABPS, a mobile node can seamlessly switch to a different NIC in order to get better performance, without causing communication interruptions at the application level. To make this possible, NICs are kept always active and a software monitor constantly probes the channels for available access points. While this ensures maximum connection availability, considerable energy may be wasted when no access points are available for a given NIC. In this paper we address this issue by investigating the use of an "oracle" able to provide information on network availability. This makes it possible to switch NICs on and off dynamically based on the reported availability, thus reducing power consumption. We present a Markov model which allows us to estimate the impact of the oracle on the ABPS mechanism: results show that a significant reduction in energy consumption can be achieved with minimal impact on connection availability. We conclude by describing a prototype implementation of the oracle based on Web services and geolocalization.

Submitted 24 September, 2012; originally announced September 2012.

Comments: A revised version of this paper appears in Proceedings of Wireless Days 2012, November 21-23 2012, Dublin, Ireland

ACM Class: C.2.2; C.4

arXiv:1206.2775 [pdf, ps, other]

doi 10.1145/2364474.2364487

Parallel Discrete Event Simulation with Erlang

Authors: Luca Toscano, Gabriele D'Angelo, Moreno Marzolla

Abstract: Discrete Event Simulation (DES) is a widely used technique in which the state of the simulator is updated by events happening at discrete points in time (hence the name). DES is used to model and analyze many kinds of systems, including computer architectures, communication networks, street traffic, and others. Parallel and Distributed Simulation (PADS) aims at improving the efficiency of DES by partitioning the simulation model across multiple processing elements, in order to enable larger and/or more detailed studies to be carried out. Interest in PADS is increasing due to the widespread availability of multicore processors and affordable high performance computing clusters. However, designing parallel simulation models requires considerable expertise, the result being that PADS techniques are not as widespread as they could be. In this paper we describe ErlangTW, a parallel simulation middleware based on the Time Warp synchronization protocol. ErlangTW is entirely written in Erlang, a concurrent, functional programming language specifically targeted at building distributed systems. We argue that writing parallel simulation models in Erlang is considerably easier than using conventional programming languages. Moreover, ErlangTW allows simulation models to be executed on single-core, multicore, and distributed computing architectures. We describe the design and prototype implementation of ErlangTW, and report some preliminary performance results on multicore and distributed architectures using the well-known PHOLD benchmark.

Submitted 24 July, 2014; v1 submitted 13 June, 2012; originally announced June 2012.

Comments: Proceedings of ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC 2012) in conjunction with ICFP 2012. ISBN: 978-1-4503-1577-7

ACM Class: D.1.3; I.6.8

arXiv:1206.2772 [pdf, ps, other]

doi 10.4108/icst.simutools.2012.247736

Time Warp on the Go (Updated Version)

Authors: Gabriele D'Angelo, Stefano Ferretti, Moreno Marzolla

Abstract: In this paper we deal with the impact of multi- and many-core processor architectures on simulation. Despite the fact that modern CPUs have an increasingly large number of cores, most software is still unable to take advantage of them. In recent years, many tools, programming languages and general methodologies have been proposed to help build scalable applications for multi-core architectures, but those solutions are somewhat limited. Parallel and distributed simulation is an interesting application area in which efficient and scalable multi-core implementations would be desirable. In this paper we investigate the use of the Go Programming Language to implement optimistic parallel simulations based on the Time Warp mechanism. Specifically, we describe the design, implementation and evaluation of a new parallel simulator. The scalability of the simulator is studied on a modern multi-core CPU, and the effects of the Hyper-Threading technology on optimistic simulation are analyzed.

Submitted 29 July, 2014; v1 submitted 13 June, 2012; originally announced June 2012.

Comments: Proceedings of 3rd ICST/CREATE-NET Workshop on DIstributed SImulation and Online gaming (DISIO 2012). In conjunction with SIMUTools 2012. Desenzano, Italy, March 2012. ISBN: 978-1-936968-47-3

arXiv:1109.0397 [pdf, ps, other]

doi 10.1109/Mobilware.2013.16

Auction-Based Resource Allocation in Digital Ecosystems

Authors: Moreno Marzolla, Stefano Ferretti, Gabriele D'Angelo

Abstract: The proliferation of portable devices (PDAs, smartphones, digital multimedia players, and so forth) allows mobile users to carry around a pool of computing, storage and communication resources. Sharing these resources with other users ("Digital Organisms" -- DOs) opens the door to novel and interesting scenarios, where people trade resources to allow the execution, anytime and anywhere, of applications that require a mix of capabilities. In this paper we present a fully distributed approach for resource sharing among multiple devices owned by different mobile users. Our scheme enables DOs to trade computing/networking facilities through an auction-based mechanism, without the need for central control. We use a set of numerical experiments to compare our approach with an optimal (centralized) allocation strategy that, given the set of resource demands and offers, maximizes the number of matches. Results confirm the effectiveness of our approach since it produces a fair allocation of resources with low computational cost, providing DOs with the means to form an altruistic digital ecosystem.

Submitted 30 July, 2014; v1 submitted 2 September, 2011; originally announced September 2011.

Comments: Proceedings of the 6th International Conference on MOBILe Wireless MiddleWARE, Operating Systems, and Applications (MobilWare 2013). Bologna, Italy, November 11-12, 2013

ACM Class: C.2.4; H.m

arXiv:1104.5392 [pdf, other]

A Framework for QoS-aware Execution of Workflows over the Cloud

Authors: Moreno Marzolla, Raffaela Mirandola

Abstract: The Cloud Computing paradigm is providing system architects with a new powerful tool for building scalable applications. Clouds allow allocation of resources on a "pay-as-you-go" model, so that additional resources can be requested during peak loads and released after that. However, this flexibility calls for appropriate dynamic reconfiguration strategies. In this paper we describe SAVER (qoS-Aware workflows oVER the Cloud), a QoS-aware algorithm for executing workflows involving Web Services hosted in a Cloud environment. SAVER allows execution of arbitrary workflows subject to response time constraints. SAVER uses a passive monitor to identify workload fluctuations based on the observed system response time. The information collected by the monitor is used by a planner component to identify the minimum number of instances of each Web Service which should be allocated in order to satisfy the response time constraint. SAVER uses a simple Queueing Network (QN) model to identify the optimal resource allocation. Specifically, the QN model is used to identify bottlenecks, and predict the system performance as Cloud resources are allocated or released. The parameters used to evaluate the model are those collected by the monitor, which means that SAVER does not require any particular knowledge of the Web Services and workflows being executed. Our approach has been validated through numerical simulations, whose results are reported in this paper.

Submitted 28 April, 2011; originally announced April 2011.

Journal ref: Proc. 2nd International Conference on Cloud Computing and Services Science (CLOSER 2012), Frank Leymann, Ivan Ivanov, Marten van Sinderen, Tony Shan (Editors), April 18-21 2012, Porto, Portugal, ISBN 978-989-8565-05-1, pp. 216-221
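SAVER uses a Queueing Network model to decide how many instances of each Web Service to allocate. The Python sketch below illustrates that kind of planning under simplifying assumptions that are ours, not the paper's: the workflow visits each service once in sequence, service k has a known service demand D_k, its load is split evenly over n_k instances, and each instance behaves like an M/M/1 queue, so the residence time at service k is D_k / (1 - λ·D_k / n_k). Instances are then added greedily to the current bottleneck until the predicted response time meets the constraint; this heuristic does not guarantee the true minimum in general, and `response_time` and `plan_instances` are hypothetical names.

```python
import math


def response_time(demands, arrival_rate, instances):
    """Predicted end-to-end response time of a sequential workflow.

    Assumes each Web Service k runs on instances[k] identical M/M/1
    stations that share its load evenly."""
    total = 0.0
    for demand, n in zip(demands, instances):
        utilization = arrival_rate * demand / n
        if utilization >= 1.0:
            return math.inf  # saturated station: response time is unbounded
        total += demand / (1.0 - utilization)
    return total


def plan_instances(demands, arrival_rate, max_response_time):
    """Greedy heuristic: grow the bottleneck service until the constraint holds."""
    if max_response_time <= sum(demands):
        # With unlimited instances the response time only approaches sum(demands).
        raise ValueError("constraint cannot be met by adding instances")
    # Smallest instance counts that keep every station stable (utilization < 1).
    instances = [math.floor(arrival_rate * d) + 1 for d in demands]
    while response_time(demands, arrival_rate, instances) > max_response_time:
        residence = [d / (1.0 - arrival_rate * d / n)
                     for d, n in zip(demands, instances)]
        bottleneck = residence.index(max(residence))
        instances[bottleneck] += 1
    return instances


print(plan_instances([0.1, 0.05], arrival_rate=15.0, max_response_time=0.5))  # [3, 1]
```

In this example the stable starting point [2, 1] predicts 0.6 s, so one instance is added to the first service (the bottleneck), bringing the prediction to 0.4 s.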

arXiv:1102.0720 [pdf, ps, other]

doi 10.4108/icst.simutools.2011.245539

Adaptive Event Dissemination for Peer-to-Peer Multiplayer Online Games

Authors: Gabriele D'Angelo, Stefano Ferretti, Moreno Marzolla

Abstract: In this paper we show that gossip algorithms may be effectively used to disseminate game events in Peer-to-Peer (P2P) Multiplayer Online Games (MOGs). Game events are disseminated through an overlay network. The proposed scheme exploits the typical behavior of players to tune the data dissemination. In fact, it is well known that users playing a MOG typically generate game events at a rate that can be approximated using some (game-dependent) probability distribution. Hence, as soon as a given node experiences a reception rate, for messages coming from a given peer, which is lower than expected, it can send a stimulus to the neighbor that usually forwards these messages, asking it to increase its dissemination probability. Three variants of this approach are studied. According to the first one, upon reception of a stimulus from a neighbor, a peer increases its dissemination probability towards that node irrespective of the sender. In the second protocol a peer increases only the dissemination probability for a given sender towards all its neighbors. Finally, the third protocol takes into consideration both the sender and the neighbor in order to decide how to increase the dissemination probability. We performed extensive simulations to assess the efficacy of the proposed scheme, and based on the simulation results we compare the different dissemination protocols. The results confirm that adaptive gossip schemes are indeed effective and deserve further investigation.

Submitted 28 July, 2014; v1 submitted 3 February, 2011; originally announced February 2011.

Comments: ICST/CREATE-NET DISIO 2011: 2nd Workshop on DIstributed SImulation and Online gaming. March 21, 2011, Barcelona, Spain

ACM Class: D.2.8; H.4; K.8.0
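To illustrate the first variant described in the abstract, here is a minimal Python sketch of a peer that keeps one forwarding probability per neighbor and raises it when that neighbor sends a stimulus, regardless of which player originated the event. The class and function names, the initial probability, the fixed additive increment and the cap at 1 are illustrative assumptions; the abstract does not specify the paper's actual update rule.

```python
import random


class AdaptiveGossipPeer:
    """Hypothetical peer for the first variant: one forwarding probability
    per neighbor, raised when that neighbor sends a stimulus."""

    def __init__(self, neighbors, initial_prob=0.5, increment=0.1):
        self.prob = {n: initial_prob for n in neighbors}
        self.increment = increment  # assumed fixed additive step

    def on_stimulus(self, neighbor):
        # Forward more eagerly towards this neighbor, whoever originated the event.
        self.prob[neighbor] = min(1.0, self.prob[neighbor] + self.increment)

    def forward_targets(self, rng=random):
        # Each neighbor independently receives the event with its own probability.
        return [n for n, p in self.prob.items() if rng.random() < p]


def should_send_stimulus(observed_rate, expected_rate):
    # Receiver side: events from some peer arrive more slowly than the
    # (game-dependent) expected rate, so ask the usual forwarder to push harder.
    return observed_rate < expected_rate


peer = AdaptiveGossipPeer(["a", "b"], initial_prob=0.5, increment=0.3)
peer.on_stimulus("a")
peer.on_stimulus("a")
print(peer.prob)  # {'a': 1.0, 'b': 0.5}
```

The second and third variants would change only the key of the probability table: per original sender, or per (sender, neighbor) pair.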

arXiv:cs/0305054 [pdf, ps, other]

A Monitoring System for the BaBar INFN Computing Cluster

Authors: M. Marzolla, V. Melloni

Abstract: Monitoring large clusters is a challenging problem. It is necessary to observe a large quantity of devices with a reasonably short delay between consecutive observations. The set of monitored devices may include PCs, network switches, tape libraries and other equipment. The monitoring activity should not impact the performance of the system. In this paper we present PerfMC, a monitoring system for large clusters. PerfMC is driven by an XML configuration file, and uses the Simple Network Management Protocol (SNMP) for data collection. SNMP is a standard protocol implemented by many kinds of networked equipment, so the tool can be used to monitor a wide range of devices. System administrators can display information on the status of each device by connecting to a Web server embedded in PerfMC. The Web server can produce graphs showing the value of different monitored quantities as a function of time; it can also produce arbitrary XML pages by applying XSL Transformations to an internal XML representation of the cluster's status. XSL Transformations may be used to produce HTML pages which can be displayed by ordinary Web browsers. PerfMC aims at being relatively easy to configure and operate, and highly efficient. It is currently being used to monitor the Italian Reprocessing farm for the BaBar experiment, which is made of about 200 dual-CPU Linux machines.

Submitted 29 May, 2003; originally announced May 2003.

Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003, 10 pages, LaTeX, 4 eps figures. PSN MOET006

ACM Class: B.8.2; C.2.3

Journal ref: ECONFC0303241:MOET006,2003

Showing 1–31 of 31 results for author: Marzolla, M