A Simulation Model of the Distributed Data Collection Process
distributed data collection, simulation model, data collection, distributed systems, queueing model
Abstract
In this paper, the authors consider the process of collecting and processing data from various web sources. A simple data collection model based on cyclic iteration is investigated, and its main disadvantages are identified. A model of distributed data collection is then described as a multi-channel queueing system with unlimited waiting. This model uses multiple nodes to access an online resource, and a message queue stores information about tasks and balances them between nodes. The distributed model is also fault-tolerant and horizontally scalable. The authors compare the simple and distributed models using the AnyLogic simulation tool. Additionally, various distributions of the server response time of an external web resource are used in the modeling process. The simulation results show the effectiveness of the distributed data collection process in terms of both time and unit cost.
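The contrast between the two models can be illustrated with a minimal Python sketch. This is not the authors' AnyLogic model: it assumes that all tasks are already waiting in the queue at time zero, that response times are exponentially distributed, and that every node is identical. The function names and parameter values are hypothetical.

```python
import heapq
import random


def sequential_makespan(service_times):
    """Simple model: one collector iterates over the tasks one after another."""
    return sum(service_times)


def distributed_makespan(service_times, workers):
    """Multi-channel FIFO queue: each task goes to the node that frees up first."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    free_at = [0.0] * workers  # time at which each node becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for t in service_times:
        start = heapq.heappop(free_at)  # earliest available node
        end = start + t
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def sample_response_times(n, mean, seed=0):
    """Assumed workload: exponentially distributed server response times."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


if __name__ == "__main__":
    times = sample_response_times(n=1000, mean=0.5)
    print("sequential:", sequential_makespan(times))
    print("8 nodes:   ", distributed_makespan(times, workers=8))
```

Replacing `sample_response_times` with another distribution changes only the workload, so the same comparison can be repeated for different response-time assumptions.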
