Control Systems and Computers, N3, 2017, Article 1

DOI: https://doi.org/10.15407/usim.2017.03.006

Upr. sist. maš., 2017, Issue 3 (269), pp. 6-19.

UDC 004.9:004.75:004.451.82:004.738.52:004.823

A.P. Lozinskiy1, V.M. Simakhin2, A.A. Oursatyev3

Technologies Modeling for Processing Large Data on the Local Cloud Platform

1 Junior Research Associate, International Research and Training Centre of Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03186, Ukraine, loza@irtc.org.ua

2 Engineer, International Research and Training Centre of Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03186, Ukraine, sima@irtc.org.ua

3 PhD in Techn. Sciences, Leading Research Associate, International Research and Training Centre of Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03186, Ukraine, aleksei@irtc.org.ua

Introduction. This article considers the implementation of a working model of a local cloud platform that provides two services at the SaaS level. The first service addresses optimizing how workplaces are organized within an enterprise. The second implements a model of a big data processing environment. The main problem is combining heterogeneous tasks within a common environment and redistributing computing resources between them.

Purpose. The purpose of this article is to build a model of a multi-purpose local cloud platform with flexible redistribution of computing power between workloads, and to consider two applications of the model: a service for terminal access to desktops and the Hadoop software platform for big data analysis. In addition, a search engine component, the search robot, is modeled as a big data analysis task.

Methods. Methods of modeling and abstraction were used.

Results. A functioning model of the local cloud platform is proposed. It provides a flexible mechanism for modeling and deploying platforms of a wide range of architectures, purposes, and vendors. The possibilities of using existing search robot solutions are analyzed, and the authors' own experimental search robot is developed. Both variants of the search robot deliver the necessary information in a form suitable for the subsequent components of a search engine.

Conclusion. The model is proposed as a general solution for deploying private local clouds in enterprises. One possible application, built on the platform's big data analytics environment, is the creation of a search engine.

Keywords: cloud platform, processing large data, big data analytics, SaaS level.



Received 19.05.2017