The literature differs on a definition of data quality, but one thing is certain: data quality depends not only on the data's own features but also on the business environment in which the data are used, including business processes and business users. Only data that conform to their relevant uses and meet requirements can be considered qualified, good-quality data.
Usually, data quality standards are developed from the perspective of data producers. In the past, data consumers were either direct or indirect data producers, which ensured the data quality. However, in the age of big data, with the diversity of data sources, data users are not necessarily data producers.
Thus, it is very difficult to measure data quality. Therefore, we propose a hierarchical data quality standard from the perspective of the users, as shown in Figure 1. We chose commonly accepted and widely used data quality dimensions as big data quality standards and redefined their basic concepts based on actual business needs.
At the same time, each dimension was divided into many typical elements associated with it, and each element has its own corresponding quality indicators. In this way, hierarchical quality standards for big data were used for evaluation. Figure 2 shows a universal two-layer data quality standard. Some detailed data quality indicators are given in Table 1.
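This hierarchy (dimension → element → indicator) can be sketched as a nested mapping. The five dimension names follow Figure 2 as quoted in the text; the element lists for availability, reliability, and presentation quality match the definitions given in the text, while the entries for usability and relevance, and all indicator wordings, are illustrative placeholders:

```python
# Sketch of the two-layer big data quality standard: each dimension
# maps to its elements, and each element to example indicator wordings.
# Elements for usability and relevance, and all indicator strings,
# are placeholders -- the excerpt does not enumerate them.
QUALITY_STANDARD = {
    "availability": {
        "accessibility": ["data can be obtained conveniently"],
        "authorization": ["use of the data is authorized"],
        "timeliness": ["data are up to date when used"],
    },
    "usability": {
        "credibility": ["data sources can be trusted"],  # placeholder element
    },
    "reliability": {
        "accuracy": ["values reflect the real-world entities"],
        "consistency": ["no contradictions within or across sources"],
        "completeness": ["required fields are populated"],
        "adequacy": ["data volume suffices for the task"],
        "auditability": ["quality can be traced and verified"],
    },
    "relevance": {
        "fitness": ["data match the intended use"],  # placeholder element
    },
    "presentation_quality": {
        "readability": ["data are understandable to users"],
        "structure": ["data are organized for interpretation"],
    },
}

def elements_of(dimension):
    """Return the sorted element names under a quality dimension."""
    return sorted(QUALITY_STANDARD[dimension])
```

Each leaf list would hold the concrete indicators of Table 1 in a full implementation.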
In Figure 2, the data quality standard is composed of five dimensions of data quality: availability, usability, reliability, relevance, and presentation quality. For each dimension, we identified one to five elements with good practices. The first four dimensions are regarded as indispensable, inherent features of data quality, while the final dimension comprises additional properties that improve customer satisfaction. Availability is defined as the degree of convenience for users to obtain data and related information; it is divided into the three elements of accessibility, authorization, and timeliness.
Reliability refers to whether we can trust the data; this consists of accuracy, consistency, completeness, adequacy, and auditability elements. Presentation quality refers to a valid description method for the data, which allows users to fully understand the data.
Its elements are readability and structure. Descriptions of the data quality elements are given below. We present a big data quality assessment framework in Table 1, which lists the common quality elements and their associated indicators. Generally, a quality element has multiple indicators of its own. An appropriate quality assessment method for big data is necessary to draw valid conclusions. Determining the goals of data collection is the first step of the whole assessment process.
Big data users rationally choose the data to be used according to their strategic objectives or business requirements, such as operations, decision making, and planning. The data sources, types, volume, quality requirements, assessment criteria, and specifications, as well as the expected goals, need to be determined in advance. The selection of data quality elements will differ across business environments. For example, for social media data, timeliness and accuracy are two important quality features; because such data often carry unverified or false information, credibility has also become an important quality dimension.
However, social media data are usually unstructured, and their consistency and integrity are not suitable for evaluation.
The field of biology is an important source of big data. However, due to the lack of uniform standards, data storage software and data formats vary widely. Thus, it is difficult to regard consistency as a quality dimension, and there is little need to treat timeliness and completeness as quality dimensions.
To take the quality assessment further, we need to choose specific assessment indicators for every dimension.
These require the data to comply with specific conditions or features. The formulation of assessment indicators also depends on the actual business environment. Each quality dimension needs different measurement tools, techniques, and processes, which leads to differences in assessment times, costs, and human resources.
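As one concrete example of such an indicator, the completeness element can be measured as the fraction of records whose required fields are populated. A minimal sketch, with hypothetical field names:

```python
def completeness(records, required_fields):
    """One simple quantitative indicator for the completeness element:
    the fraction of records in which every required field is present
    and non-empty."""
    if not records:
        return 0.0
    ok = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return ok / len(records)

rows = [
    {"id": 1, "name": "alice", "email": "a@example.com"},
    {"id": 2, "name": "bob", "email": ""},               # empty email
    {"id": 3, "name": None, "email": "c@example.com"},   # missing name
]
score = completeness(rows, ["name", "email"])  # 1 of 3 records complete
```

Other indicators (accuracy, consistency, timeliness) would need their own measurement logic, which is why assessment costs differ across dimensions.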
The preliminary assessment of the data quality dimensions determines the baseline, while the remaining assessment, carried out as part of the business process, is used for continuous monitoring and improvement. After the quality assessment preparation is completed, the process enters the data acquisition phase. In the age of big data, data acquisition is relatively easy, but the collected data are not always of good quality.
We need to improve data quality as far as possible under these conditions, without a large increase in acquisition cost. Big data come from a wide range of sources, and their structures are complex. The data received may have quality problems, such as data errors, missing information, inconsistencies, and noise.
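Such problems are typically caught with rule-based checks during cleaning. A minimal sketch, assuming hypothetical field names and thresholds:

```python
def clean(records):
    """Rule-based cleaning sketch: drop records with a missing or
    duplicate key, drop obviously noisy values, and normalize an
    inconsistently formatted field. Field names are hypothetical."""
    seen, cleaned = set(), []
    for r in records:
        if r.get("id") is None:                      # missing key -> drop
            continue
        if r["id"] in seen:                          # duplicate -> drop
            continue
        age = r.get("age")
        if age is not None and not 0 <= age <= 130:  # noise -> drop
            continue
        # normalize inconsistent casing/whitespace in 'city'
        r = dict(r, city=(r.get("city") or "").strip().lower())
        seen.add(r["id"])
        cleaned.append(r)
    return cleaned

raw = [
    {"id": 1, "city": " Beijing ", "age": 30},
    {"id": 1, "city": "beijing", "age": 31},      # duplicate id
    {"id": None, "city": "shanghai", "age": 25},  # missing id
    {"id": 2, "city": "Shenzhen", "age": 200},    # implausible age
]
tidy = clean(raw)  # keeps only the first record, city normalized
```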
The purpose of data cleaning (data scrubbing) is to detect and remove errors and inconsistencies from data in order to improve their quality. Of these four approaches, the third has good practical value and can be applied successfully. The process then enters the data quality assessment and monitoring phases. The core of data quality assessment is how to evaluate each dimension. Current methods fall into two categories: qualitative and quantitative. The qualitative method describes and assesses data resources from the perspective of qualitative analysis, based on certain evaluation criteria and requirements and according to the assessment purposes and user demands.
Qualitative analysis should be performed by subject experts or professionals.
The quantitative method is a formal, objective, and systematic process in which numerical data are utilized to obtain information. Therefore, objectivity, generalizability, and numbers are features often associated with this method, whose evaluation results are more intuitive and concrete. After assessment, the data can be compared with the baseline for the data quality assessment established above.
If the data quality accords with the baseline standard, a follow-up data analysis phase can be entered, and a data quality report will be generated. Otherwise, if the data quality fails to satisfy the baseline standard, it is necessary to acquire new data. Strictly speaking, data analysis and data mining do not belong to the scope of big data quality assessment, but they play an important role in the dynamic adjustment and feedback of data quality assessment.
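This comparison against the baseline can be sketched as a simple threshold check; the dimension names follow Figure 2, and the baseline values are hypothetical:

```python
# Hypothetical baseline scores per dimension, fixed during the
# preliminary assessment described in the text.
BASELINE = {"availability": 0.90, "reliability": 0.85, "relevance": 0.80}

def meets_baseline(scores, baseline=BASELINE):
    """Return (passed, failures): the data proceed to the analysis
    phase only when every assessed dimension reaches its baseline."""
    failures = {
        dim: (score, baseline[dim])
        for dim, score in scores.items()
        if dim in baseline and score < baseline[dim]
    }
    return not failures, failures

passed, failures = meets_baseline(
    {"availability": 0.95, "reliability": 0.80, "relevance": 0.85}
)
# reliability (0.80) falls short of its baseline (0.85), so passed is False
```

A failing result would trigger new data acquisition, as described above.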
We can use these two methods to discover whether valuable information or knowledge exists in big data and whether the knowledge can be helpful for policy proposals, business decisions, scientific discoveries, disease treatments, etc.
If the analysis results meet the goal, the results are output and fed back to the quality assessment system to provide better support for the next round of assessment. If the results do not reach the goal, the data quality assessment baseline may not be reasonable, and we need to adjust it in a timely fashion to obtain results in line with our goals. The arrival of the big data era has brought explosive growth of data across industries and fields. How to ensure big data quality, and how to analyze and mine the information and knowledge hidden behind the data, have become major issues for industry and academia.
Poor data quality will lead to low data utilization efficiency and even bring serious decision-making mistakes. We analyzed the challenges faced by big data quality and proposed the establishment and hierarchical structure of a data quality framework. Then, we formulated a dynamic big data quality assessment process with a feedback mechanism, which has laid a good foundation for further study of the assessment model.
The next stage of research will involve the construction of a big data quality assessment model and the formation of a weight coefficient for each assessment indicator. At the same time, the research team will develop an algorithm to make a practical assessment of big data quality in a specific field.
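One plausible form for such a weighted model is a normalized weighted sum of indicator scores; the indicator names and weight values below are hypothetical, not the paper's actual coefficients:

```python
def weighted_quality(indicator_scores, weights):
    """Aggregate per-indicator scores into a single quality score via
    a normalized weighted sum -- one plausible shape for the planned
    assessment model; indicators and weights are hypothetical."""
    total = sum(weights.values())
    return sum(indicator_scores[k] * w for k, w in weights.items()) / total

scores = {"accuracy": 0.9, "completeness": 0.8, "timeliness": 0.7}
weights = {"accuracy": 0.5, "completeness": 0.3, "timeliness": 0.2}
overall = weighted_quality(scores, weights)  # 0.45 + 0.24 + 0.14 = 0.83
```

Deriving the weights themselves (e.g. per application field) is the open problem the authors identify.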