
Inspur: a look at "intelligent storage management", the "next wave" of technology

Inspur focuses on the research, development, and integration of smart storage management (SSM), an intelligent storage management technology for distributed file systems. Through a number of new technologies and features, it addresses various challenges in storage scenarios and helps big data products provide more efficient and intelligent storage solutions.

As a general-purpose distributed file system, HDFS (Hadoop Distributed File System) provides massive data storage with high scalability, low cost, and high reliability, and is widely used in big data storage and analysis.

In recent years, with the rapid development of 5G, IoT, artificial intelligence, and other fields, data volumes have continued to grow. At the same time, as big data applications diversify, the use of data has become more mature and in-depth. Larger data volumes and more flexible data processing scenarios place ever higher demands on HDFS data storage and read-write throughput.

To meet these challenges, smart storage management (SSM) came into being: an intelligent storage management technology for distributed file systems whose research, development, and integration Inspur focuses on. Through a number of new technologies and features, it addresses various challenges in storage scenarios and helps big data products provide more efficient and intelligent storage solutions.

What difficult challenges does today's storage technology face?

The first challenge is data storage management based on heterogeneous storage media. From the hardware platform perspective, HDFS is designed to provide reliable, high-throughput data storage and access on general-purpose, low-cost hardware. However, with the rapid development of hardware, the performance and capacity of traditional disks have hit a bottleneck, and new hardware such as solid-state drives, non-volatile memory, and SMR disks has attracted widespread attention.

At present, HDFS can work with multiple types of heterogeneous storage media and can access and use them. However, it has no good mechanism for intelligently perceiving the I/O characteristics of different devices and dynamically changing how data is stored according to its access characteristics, so it cannot fully exploit the performance advantages of the various hardware in a heterogeneous environment.

The second challenge is capacity pressure in large-scale storage. To keep the system reliable, traditional HDFS protects data through a replica strategy, which usually defaults to three replicas, so storage utilization is only 1/3. Replacing replicas with erasure coding (EC) can provide the same fault tolerance while using less storage space: a typical erasure code adds no more than 50% storage overhead, but it consumes more computing resources. Therefore, when the system is under storage pressure, users often want to store infrequently used data with erasure coding to relieve that pressure.
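The utilization figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `storage_profile` is made up for this example, and the Reed-Solomon (6, 3) layout is an assumed example of a "typical" erasure code, which the article does not name.

```python
from fractions import Fraction


def storage_profile(data_units: int, redundancy_units: int) -> dict:
    """Utilization and extra overhead for a layout that stores
    `data_units` of user data plus `redundancy_units` of redundancy."""
    total = data_units + redundancy_units
    return {
        # Share of raw capacity that holds user data.
        "utilization": Fraction(data_units, total),
        # Extra raw capacity needed, relative to the user data.
        "overhead": Fraction(redundancy_units, data_units),
    }


# Three replicas: one copy of the data plus two extra copies.
replication = storage_profile(1, 2)

# Assumed example: Reed-Solomon with 6 data blocks and 3 parity blocks.
rs_6_3 = storage_profile(6, 3)

print("3 replicas:", replication)
print("RS(6,3):   ", rs_6_3)
```

Under these assumptions, three replicas give a utilization of 1/3 with 200% overhead, while the assumed RS(6, 3) layout gives a utilization of 2/3 with 50% overhead.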

However, current HDFS technology only supports directory-based conversion from replicas to erasure codes. After conversion, the path that the business uses to access the files changes, and there is no convenient mechanism for automating the process.

The third challenge, adaptive storage for application loads, cannot be ignored either. From the perspective of upper-layer applications, on the one hand, as the Hadoop big data ecosystem has developed, the stability, reliability, simplicity, and high scalability of HDFS have led more and more upper-layer applications and systems to treat it as a unified underlying storage layer. The data types stored on it and the analysis loads it supports have become increasingly diverse.

On the other hand, within an enterprise, different departments and users often run queries and analyses on the same full data set, so the same data serves a variety of query loads. In this scenario, storage optimization based on manual strategies is hard to make effective, and adaptive optimization technology driven by application load is needed.

Intelligent storage management (SSM) technology: two cores, three scenarios, four technologies, and five features

Intelligent storage management (SSM) provides intelligent solutions to these challenges: how to maximize the performance advantages of the various hardware in heterogeneous environments, how to handle increasingly diverse data types and analysis loads, and how to optimize storage adaptively for application loads.

What is smart storage management (SSM)?

Conceptually, intelligent storage management (SSM) is defined as an intelligent management architecture for HDFS. It mainly provides storage optimization and data optimization solutions for new storage devices, high-speed networks, and new computing, and delivers end-to-end data management services. Its focus can be summarized as two cores, three scenarios, four technologies, and five features.

"Two cores": the core of SSM is intelligent management based on data heat, which enables automated, storage-oriented optimization across the full data life cycle. In terms of data heat, roughly 80% of the computing workload in typical application scenarios is spent processing 20% of the data, and optimizing for this subset of the data is particularly difficult in a dynamic environment.

To address this problem, SSM collects file system operation data and status information, uses multiple indicators to analyze data access patterns, defines data heat at the file level, and optimizes data management according to the heat information.
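The heat analysis described above could be sketched roughly as follows. This is a minimal illustration and not SSM's implementation: the access-log format, the time window, and the hot threshold are all assumptions, and only one indicator (access count) is used where the text mentions several.

```python
from collections import Counter


def file_heat(access_log, now, window_seconds):
    """Count accesses per file within the last `window_seconds`.

    `access_log` is an assumed format: an iterable of (path, timestamp)
    pairs, with timestamps in seconds.
    """
    cutoff = now - window_seconds
    return Counter(path for path, ts in access_log if ts >= cutoff)


def classify(heat, all_paths, hot_threshold):
    """Label each file 'hot' or 'cold' from its recent access count."""
    return {
        path: "hot" if heat.get(path, 0) >= hot_threshold else "cold"
        for path in all_paths
    }


# Hypothetical access records: (file path, access time in seconds).
access_log = [
    ("/logs/a", 100), ("/logs/a", 950), ("/logs/a", 990),
    ("/logs/b", 120),
    ("/logs/c", 980),
]

heat = file_heat(access_log, now=1000, window_seconds=100)
labels = classify(heat, ["/logs/a", "/logs/b", "/logs/c"], hot_threshold=2)
print(labels)
```

In this toy log only `/logs/a` is accessed at least twice inside the window, so it is the only file labeled hot. A real system would combine more indicators, such as file age or access recency.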

In terms of intelligent decision-making, SSM has established a rule-based intelligent decision-making system and built a practical solution around existing big data storage modes. In the future, SSM aims to learn from historical data and indicators, so that the system can predict data access patterns, keep learning over time, and achieve stable and sustainable intelligent management.

"Three scenarios": at present, SSM performs prominently in three typical scenarios. Multiple storage modes: SSM suits application scenarios with rich data storage modes and offers more flexible storage mode selection. Data optimization: it provides new functions such as small file merging, data disaster recovery, and data compression for application scenarios that need data optimization. Intelligent management: it automates data lifecycle management for large-scale clusters.

"Four technologies": SSM realizes intelligent storage management mainly through four technologies. Distributed cluster autonomy technology provides a decentralized storage management cluster and makes the management service highly available. Distributed event-driven technology implements a lightweight computing service and supervision mechanism for high-concurrency scenarios, improving the execution efficiency and fault tolerance of management operations.

Rule-based intelligent storage management technology addresses the problems of huge stored data volumes, rapid data growth, mixed data types, and difficult management, and realizes intelligent management of the data life cycle. Data heat perception technology addresses uneven utilization and waste of storage resources, and realizes tiering of hot and cold data.

"Five features": for user scenarios, SSM is ultimately reflected in five typical feature enhancements:

Heterogeneous storage enhancement: combines intelligent rule management with data heat perception to make full use of the access efficiency of heterogeneous storage

Erasure code enhancement: fast file-level conversion between replicas and erasure codes, and between erasure codes, with efficiency improved by 30%; the access path remains unchanged

Small file merge enhancement: automatically detects small files, reduces NameNode pressure, and doubles read performance

Automatic data disaster recovery: fully automatic, cross-domain incremental data backup

Transparent automatic compression: selectable compression modes, with compression that is imperceptible to the business
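As an illustration of the small file merge idea in the list above, the sketch below packs small files into one container and keeps an index, so each file can still be read by its original path. It is a simplified in-memory model under assumed names (`merge_small_files`, `read_merged`) and an assumed size threshold; it does not reflect how SSM stores merged files on HDFS.

```python
def merge_small_files(files: dict, threshold: int):
    """Pack files smaller than `threshold` bytes into one container.

    `files` maps path -> bytes. Returns (container, index), where
    `index` maps each merged path to its (offset, length) in the container.
    """
    container = bytearray()
    index = {}
    for path, data in sorted(files.items()):
        if len(data) < threshold:
            index[path] = (len(container), len(data))
            container.extend(data)
    return bytes(container), index


def read_merged(container: bytes, index: dict, path: str) -> bytes:
    """Read one merged file back through its original path."""
    offset, length = index[path]
    return container[offset:offset + length]


# Hypothetical files: two small ones and one large one that is left alone.
files = {"/s/a": b"hello", "/s/b": b"hi", "/big": b"x" * 100}
container, index = merge_small_files(files, threshold=10)
print(index)
print(read_merged(container, index, "/s/a"))
```

The point of the design is that many small files collapse into a single stored object, which is what reduces metadata pressure, while the index keeps the original paths usable.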

In general, intelligent storage management (SSM) takes data heat identification and the intelligent decision-making system as its core and provides an overall technical optimization scheme for application scenarios (heterogeneous storage enhancement, erasure code enhancement, small file merging, automated disaster recovery, transparent compression). It automatically and intelligently addresses the challenges HDFS storage faces in heterogeneous media, storage space, and application load, and greatly improves the ease of use and the range of applicable scenarios of HDFS distributed storage.

Based on intelligent storage management (SSM) technology, Inspur Yunhai Insight brings a better experience

As a one-stop, enterprise-level big data solution for massive data storage, computing, and mining, the Inspur Yunhai Insight big data platform adopts a new technical architecture. It can handle the collection and integration of large-scale data, diverse storage, computing at scale, and intelligent analysis and mining, and it supports the rapid implementation of enterprise data center business models, helping enterprises with their informatization and intelligent transformation.

Specifically, the Yunhai Insight team carried out comprehensive verification and enhancement of intelligent storage management (SSM) technology based on customer needs and business scenarios, and finally commercialized it in the big data platform. This includes one-click installation of intelligent storage, visual operation and maintenance, and a ticket-based authentication architecture. It addresses users' needs for backup and disaster recovery, data lifecycle management, and small file merging, and provides a better user experience.

For example, in one customer's business scenario, the Yunhai Insight team defined data with high access frequency in the past two months as hot data, which is stored as three replicas, and data with low access frequency in the last four months as cold data, which is stored with erasure codes. SSM was used to define the data heat determination strategy and to complete automatic archiving and the conversion from replicas to erasure codes. In the end, total storage space was reduced by one third, and the business did not need any change, which greatly improved the availability of the system.
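The customer scenario above can be illustrated with a small sketch that applies a hot/cold policy and measures the raw storage saved. Everything beyond the three-replica figure is an assumption made for illustration: the 1.5x erasure-code factor (for example an RS(6, 3) layout), the use of days since last access as the only cold-data test, the 120-day cutoff, and the sample file sizes. The article does not state the erasure-code policy or the share of cold data.

```python
from dataclasses import dataclass
from fractions import Fraction

REPLICA_FACTOR = Fraction(3)   # three replicas, as stated in the article
EC_FACTOR = Fraction(3, 2)     # assumption: 1.5x raw storage, e.g. RS(6,3)


@dataclass
class FileInfo:
    path: str
    size: int               # logical size, arbitrary units
    days_since_access: int


def storage_policy(f: FileInfo, cold_after_days: int = 120) -> str:
    """Hypothetical rule: data untouched for about four months is cold."""
    return "erasure_code" if f.days_since_access >= cold_after_days else "replica"


def raw_usage(files, policy) -> Fraction:
    """Raw capacity consumed when each file is stored per `policy`."""
    factor = {"replica": REPLICA_FACTOR, "erasure_code": EC_FACTOR}
    return sum((factor[policy(f)] * f.size for f in files), Fraction(0))


# Hypothetical data set in which two thirds of the data is cold.
files = [
    FileInfo("/warehouse/archive", size=200, days_since_access=300),
    FileInfo("/warehouse/current", size=100, days_since_access=3),
]

before = raw_usage(files, lambda f: "replica")  # everything as 3 replicas
after = raw_usage(files, storage_policy)        # cold data erasure coded
saved = (before - after) / before
print(before, after, saved)
```

With these assumed numbers, raw usage drops from 900 to 600 units, a saving of one third. More generally, under the 1.5x assumption the saving equals half the cold share of the data, so a one-third saving corresponds to about two thirds of the data being cold.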

Practice across multiple deployments shows that intelligent storage management technology for HDFS can complete data management without external triggers, further refine management granularity, and implement data lifecycle management through a single customized rule. The results achieved:

Rapid migration of hot and cold data between heterogeneous media improves data access efficiency by more than 2 times

Automatic fast conversion of stored data between replicas and erasure codes saves more than 50% of storage space

Transparent data compression, small file merging, and platform-level automatic data backup and migration are imperceptible to the business and comprehensively improve the intelligent data management capability of the big data platform

With the vigorous development of big data and artificial intelligence, artificial intelligence opens up vast possibilities for storage management. Using intelligent algorithms to improve the scheduling and intelligent management of big data has become an inevitable trend in technological development. In the future, intelligent storage management (SSM) technology will build on a deep learning optimization computing framework, and Inspur Yunhai Insight will further raise the overall intelligence of storage management and provide users with better solutions.

Source: Inspur Information
