DAMA DMBOK 2.0 Complete Knowledge Point Summary (Chapters 10-12: Reference and Master Data; Data Warehousing and Business Intelligence; Metadata Management)
CDMP, the Certified Data Management Professional credential, was established by DAMA International. It is a comprehensive certification covering formal education, work experience, and a professional knowledge exam. These notes summarize all knowledge points, exam topics, and past questions for the English CDMP exam, and are suited to senior professional certification in data management, data governance, and digital transformation. Because there are many chapters and knowledge points, the notes are released by chapter group as they are completed: Chapters 1-3 (Data Management, Data Handling Ethics, Data Governance); Chapters 4-6 (Data Architecture, Data Modeling and Design, Data Storage and Operations); Chapters 7-9 (Data Security, Data Integration and Interoperability, Document and Content Management); Chapters 10-12 (Reference and Master Data, Data Warehousing and Business Intelligence, Metadata Management); Chapters 13-17 (Data Quality, Big Data and Data Science, Data Management Maturity Assessment, Data Management Organization and Role Expectations, Data Management and Organizational Change Management). Tags: certification, CDMP, data management, DMBOK, digital transformation, DAMA, data management professional.
DAMA Knowledge Points (Chapters 10-12)
Chapter 10: Reference and Master Data

1. Introduction
1.1. Intro
1.1.1. In any organization, certain data is required across business areas, processes, and systems.
28. Data that is required across business processes, areas, and systems is called A:Event data B:A Data Mart C:Reference and master Data D:Important Data E:Static Data Correct answer: C Your answer: C Explanation (10.1): In any organization, certain data is required across business areas, processes, and systems. If this data is shared, all business units can access the same customer lists, geographic location codes, business unit lists, delivery options, part lists, cost center codes, government tax codes, and other data used to run the business, and the whole organization and its customers benefit. Until they see inconsistent data, data consumers generally assume the data is reasonably consistent across the organization.
32. Initiatives focused on building a 'single view of customer' mainly rely on which DMBOK knowledge area? A:Data Storage and Operations B:Data Security C:Data Architecture D:Metadata Management E:Reference and master data Correct answer: E Your answer: E Explanation (10.1): In any organization, certain data is required across business areas, processes, and systems. If this data is shared, all business units can access the same customer lists, geographic location codes, business unit lists, and so on.
1.1.2. In most organizations, systems and data evolve more organically than data management professionals would like.
1.1.3. This variability increases costs and risks. Both can be reduced through the management of Master Data and Reference Data.
1.2. Business Drivers
1.2.1. The most common drivers for initiating a Master Data Management program are:
1. Meeting organizational data requirements:
Multiple areas within an organization need access to the same data sets, with the confidence that the data sets are complete, current, and consistent. Master Data often form the basis of these data sets
21. A strong argument for pursuing a Reference Data and/or Master Data management initiative A:By centralizing the management of Reference and Master data, the organization can conform critical data needed for analysis B:Job security for the data people C:It will not require a lot of effort D:They are essential functions in the data management framework E:It will not require a lot of time Correct answer: A Your answer: A Explanation (10.1): The most common drivers for initiating Master Data Management include: 1) Meeting organizational data requirements. Multiple areas within the organization need access to the same data sets and need confidence that those data sets are complete, current, and consistent. Master Data often forms the basis of these data sets (for example, determining whether an analysis should include all customers depends first on a consistent definition of a customer).
2. Managing data quality:
Data inconsistencies, quality issues, and gaps lead to incorrect decisions or lost opportunities. Master Data Management reduces these risks by enabling a consistent representation of the entities critical to the organization.
3. Managing the costs of data integration:
The cost of integrating new data sources into an already complex environment is higher in the absence of Master Data, which reduces variation in how critical entities are defined and identified.
4. Reducing risk:
Master Data can enable simplification of data sharing architecture to reduce costs and risk associated with a complex environment.
1.2.2. The drivers for managing Reference Data are similar. Centrally managed Reference Data enables organizations to:
1. Meet data requirements for multiple initiatives and reduce the risks and costs of data integration through use of consistent Reference Data
2. Manage the quality of Reference Data
15. What is a common motivation for Reference and Master Data management? A:The need to improve data quality and data integrity across multiple data sources B:Business Intelligence / Data warehousing C:Regulatory acts such as BCBS 239, GDPR and SOX D:The need to build a Data Dictionary of all core data entities and attributes E:The need to consolidate all data into one physical database Correct answer: A Your answer: A Explanation (10.1): 2) Managing data quality. Data inconsistencies, quality issues, and gaps lead to incorrect decisions or lost opportunities. Master Data Management reduces these risks by using consistent identifiers to define the entities critical to the organization. 3) Managing the costs of data integration.
19. A common driver for initiating a Reference Data Management program is A:It fosters the creative use of data B:It will consolidate the process of securing third party code sets C:It can be a one-time-only project D:It will improve data quality and facilitate analysis across the organization E:Managing codes and descriptions requires little effort and low cost Correct answer: D Your answer: D Explanation (10.1): The drivers for managing Reference Data are similar to those for Master Data. Centrally managed Reference Data enables an organization to: 1) Meet the data requirements of multiple projects and reduce the risks and costs of data integration through use of consistent Reference Data. 2) Improve the quality of Reference Data. Data-driven organizational initiatives usually focus on transactional data (increasing sales or market share, reducing costs, demonstrating compliance), but the ability to leverage such transactional data is highly dependent on the availability and quality of Reference and Master Data. Improving their availability and quality has a dramatic impact on overall data quality and business confidence in data. These processes also bring other benefits, including simplification of the IT landscape, improved efficiency and productivity, and, with these, the potential to improve the customer experience.
1.2.3. While data-driven organizational initiatives focus on transactional data, the ability to leverage such transactional data is highly dependent on the availability and quality of Reference and Master Data.
Improving the availability and quality of Reference and Master Data has a dramatic impact on overall quality of the data and business confidence in data.
These processes have additional benefits to an organization, including simplification of IT landscape, improved efficiency and productivity, and with these, the potential to improve the customer experience.
1.3. Goals and Principles
1.3.1. The goals of a Reference and Master Data Management program include:
1. Ensuring the organization has complete, consistent, current, authoritative Master and Reference Data across organizational processes
2. Enabling Master and Reference Data to be shared across enterprise functions and applications
3. Lowering the cost and reducing the complexity of data usage and integration through standards, common data models, and integration patterns
1.3.2. Reference and Master Data Management follow these guiding principles:
1. Shared Data:
Reference and Master Data must be managed so that they are shareable across the organization.
2. Ownership:
Reference and Master Data belong to the organization, not to a particular application or department. Because they are widely shared, they require a high level of stewardship.
3. Quality:
Reference and Master Data Management require ongoing Data Quality monitoring and governance.
4. Stewardship:
Business Data Stewards are accountable for controlling and ensuring the quality of Reference Data.
5. Controlled Change:
At a given point of time, Master Data values should represent the organization's best understanding of what is accurate and current. Matching rules that change values should be applied with caution and oversight. Any identifier merged or split should be reversible.
Changes to Reference Data values should follow a defined process; changes should be approved and communicated before they are implemented.
6. Authority:
Master Data values should be replicated only from the system of record. A system of reference may be required to enable sharing of Master Data across an organization.
1.4. Essential Concepts
1.4.1. Differences Between Master and Reference Data
Malcolm Chisholm has proposed a six-layer taxonomy of data that includes Metadata, Reference Data, enterprise structure data, transaction structure data, transaction activity data, and transaction audit data (Chisholm, 2008; Talburt and Zhou, 2015).
Reference Data, for example, code and description tables, is data that is used solely to characterize other data in an organization, or solely to relate data in a database to information beyond the boundaries of the organization.
12. In a data warehouse, where the classification lists for organization type are inconsistent in different source systems, there is an indication of a lack of focus on A:Metadata Management B:Master data C:Reference data D:Data Storage E:Data Modelling Correct answer: C Your answer: C Explanation (10.1.3): In this taxonomy, Master Data is defined as an aggregation of Reference Data, enterprise structure data, and transaction structure data. 1) Reference Data, for example code and description tables, is used solely to characterize other data in the organization, or solely to relate data in a database to information beyond the organization's boundaries.
Enterprise Structure Data, for example, a chart of accounts, enables reporting of business activity by business responsibility.
Transaction Structure Data, for example customer identifiers, describes the things that must be present for a transaction to occur: products, customers, vendors.
Master Data is “the data that provides the context for business activity data in the form of common and abstract concepts that relate to the activity. It includes the details (definitions and identifiers) of internal and external objects involved in business transactions, such as customers, products, employees, vendors, and controlled domains (code values)” (DAMA, 2009).
18. Master data differs from Reference data in the following way A:Master data do not require business definitions B:Master data should be held to a higher data quality standard than Reference data C:Master data does not require a data steward D:Master data is stipulated and controlled by data governance where Reference data is not E:Unlike Reference data, Master data is not usually limited to predefined domain values Correct answer: E Your answer: E Explanation (10.1.3): In this taxonomy, Master Data is defined as an aggregation of Reference Data, enterprise structure data, and transaction structure data. Reference Data, for example code and description tables, is used solely to characterize other data in the organization, or solely to relate data in a database to information beyond the organization's boundaries.
The primary challenge with Master Data is entity resolution (also called identity management), the process of discerning and managing associations between data from different systems and processes.
Master Data requires identifying and / or developing a trusted version of truth for each instance of conceptual entities such as product, place, account, person, or organization and maintaining the currency of that version.
Reference Data and Master Data share conceptually similar purposes. Both provide context critical to the creation and use of transactional data. (Reference Data also provides context for Master Data.) They enable data to be meaningfully understood. Importantly, both are shared resources that should be managed at the enterprise level. Having multiple instances of the same Reference Data is inefficient and inevitably leads to inconsistency between them. Inconsistency leads to ambiguity, and ambiguity introduces risk to an organization. A successful Reference Data or Master Data Management program involves the full range of data management functions (Data Governance, Data Quality, Metadata Management, Data Integration, etc.).
Reference Data also has characteristics that distinguish it from other kinds of Master Data (e.g., enterprise and transactional structure data). It is less volatile. Reference Data sets are generally less complex and smaller than either Transactional or Master Data sets. They have fewer columns and fewer rows. The challenges of entity resolution are not part of Reference Data Management.
20. Reference data A:Usually has more attributes than master data B:Usually has fewer attributes than Master Data C:Is free D:ls also known as external data E:Is more difficult to Govern than master data 正确答案:B 你的答案:B
Master Data Management (MDM) entails control over Master Data values and identifiers that enable consistent use, across systems, of the most accurate and timely data about essential business entities. The goals of MDM include ensuring availability of accurate, current values while reducing risks associated with ambiguous identifiers (those identified with more than one instance of an entity and those that refer to more than one entity).
13. Which of the following is NOT a primary Master Data Management area of focus? A:Identifying duplicate records B:Generating a golden record best version of the truth C:Producing clear data definitions for Master Data D:Producing read only versions of key data items E:Providing access to golden data records Correct answer: D Your answer: D Explanation (10.1.3): 2) Master Data Management (MDM) requires control over Master Data values and identifiers so that the most accurate and timely data about core business entities can be used consistently across systems. Its goals include ensuring the accuracy and availability of current values while reducing the risks posed by ambiguous identifiers (those identified with more than one instance of an entity and those that refer to more than one entity).
6. Master Data Management encompasses all these activities EXCEPT A:integration of new data sources B:classification of data C:administering golden data D:development of new procedures and policies E:None Correct answer: B Your answer: D Explanation: B — classification of data belongs to Document and Content Management.
Reference Data Management (RDM) entails control over defined domain values and their definitions. The goal of RDM is to ensure the organization has access to a complete set of accurate and current values for each concept represented.
One challenge of Reference Data Management is that of ownership or responsibility for definition and maintenance. Some Reference Data originates outside of the organizations that use it. Some crosses internal organizational boundaries and may not be owned by a single department. Other Reference Data may be created and maintained within a department but have potential value elsewhere in an organization.
2. Reference data A:should NOT be sourced from outside the company B:can be sourced from inside and outside the company C:is irrelevant EXCEPT for large corporations D:has been replaced by master data E:All Correct answer: B Your answer: B Explanation (10.1.3): One challenge of Reference Data Management is who owns or is responsible for defining and maintaining the data. Some Reference Data originates outside the organizations that use it; it crosses internal organizational boundaries and is not owned by a single department. Other Reference Data may be created and maintained within one department but have potential value elsewhere in the organization.
1.4.2. Reference Data
Reference Data is any data used to characterize or classify other data, or to relate data to information external to an organization
Common storage techniques use
Code tables in relational databases
linked via foreign keys to other tables to maintain referential integrity functions within the database management system (a minimal sketch follows the questions below)
37. A database uses foreign keys from code tables for column values. This is a way of implementing A:temporal data B:master data C:reference data D:event data E:star schema data Correct answer: C Your answer: C Explanation (10.1.3): Reference Data, for example code and description tables, is used solely to characterize other data in the organization, or solely to relate data in a database to information beyond the organization's boundaries.
Reference Data Management systems
that maintain business entities; allowed, future-state, or deprecated values; and term mapping rules to support broader application and data integration use
Object attribute specific Metadata to specify permissible values
with a focus on API or user interface access
11. Reference data A:represents B:identifies C:locates D:categorizes other data E:metadata Correct answer: D Your answer: E Explanation (10.1.3): As noted earlier, Reference Data is any data used to characterize or classify other data, or to relate data to information external to the organization (Chisholm, 2001).
22. Reference data A:Have limited value B:Have obvious definitions C:Are used to categorize and classify other data D:Are always supplied by outside vendors E:When incorrect has a greater impact than errors in Master/Transaction data Correct answer: C Your answer: C Explanation (10.1.3): As noted earlier, Reference Data is any data used to characterize or classify other data, or to relate data to information external to the organization (Chisholm, 2001).
24. Which of these is a valid definition of reference data? A:Data that has a common and widely understood data definition B:Data that is fixed and never changes C:Data that provides metadata about other data entities D:Data that is widely accessed and referenced across an organization E:Data used to classify or categorize other data Correct answer: E Your answer: C Explanation (10.1.3): As noted earlier, Reference Data is any data used to characterize or classify other data, or to relate data to information external to the organization (Chisholm, 2001).
27. The loading of country codes into a CRM is a classic: A:reference data integration B:analytics data integration C:fact data integration D:master data integration E:transaction data integration Correct answer: A Your answer: A Explanation (10.1.3): Reference Data is any data used to characterize or classify other data, or to relate data to information external to the organization (Chisholm, 2001).
34. Reference data is used to A:categorize or classify data B:dedupe customer records C:populate fact tables in a data mart D:enforce enterprise security standards E:describe backup strategies Correct answer: A Your answer: A Explanation (10.1.3): Reference Data is any data used to characterize or classify other data, or to relate data to information external to the organization (Chisholm, 2001).
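As promised above, a minimal sketch of reference data stored as a code table and enforced through a foreign key. This is illustrative only, not from DMBOK: the table names, code values, and the choice of SQLite are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

# Reference data: a simple code/description list (hypothetical values).
conn.execute("CREATE TABLE country_code (code TEXT PRIMARY KEY, description TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO country_code VALUES (?, ?)",
    [("US", "United States of America"), ("GB", "United Kingdom"), ("CN", "China")],
)

# Other data references the code table via a foreign key, so the DBMS
# maintains referential integrity.
conn.execute(
    """CREATE TABLE customer (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           country TEXT NOT NULL REFERENCES country_code(code)
       )"""
)
conn.execute("INSERT INTO customer (name, country) VALUES ('Acme Ltd', 'GB')")  # accepted

try:
    # 'XX' is not in the code table, so the insert is rejected by the DBMS.
    conn.execute("INSERT INTO customer (name, country) VALUES ('Bad Row', 'XX')")
except sqlite3.IntegrityError as e:
    print("Rejected by referential integrity:", e)
```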
Reference Data Management entails control and maintenance of defined domain values, definitions, and the relationships within and across domain values. The goal of Reference Data Management is to ensure values are consistent and current across different functions and that the data is accessible to the organization.
Reference Data Structure
1. Lists
The simplest form of Reference Data pairs a code value with a description in a list
A highly detailed list will likely cause data quality issues and adoption challenges. Similarly, a list of values that is too generic would prevent knowledge workers from capturing a sufficient level of detail.
2. Cross-Reference Lists
Different applications may use different code sets to represent the same concept. These code sets may be at different granularities, or at the same granularity with different values. Cross-reference data sets translate between code values (see the sketch after this list).
3. Taxonomies
Taxonomic Reference Data structures capture information at different levels of specificity.
Taxonomies enable content classification and multi-faceted navigation to support Business Intelligence. Taxonomic Reference Data can be stored in a recursive relationship. Taxonomy management tools also maintain hierarchical information.
4. Ontologies
Ontologies can also be understood as a form of Metadata. Ontologies and other complex taxonomies need to be managed in ways similar to how Reference Data is managed. Values need to be complete, current, and clearly defined.
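As referenced in the cross-reference item above, a minimal sketch of a cross-reference list and of a taxonomy stored as a recursive (self-referencing) relationship. All code sets, category names, and values are invented for illustration.

```python
# Cross-reference list: two applications encode the same concept with
# different code sets; the mapping translates between them.
app_a_to_app_b = {"M": "MALE", "F": "FEMALE", "U": "UNKNOWN"}

def translate(code: str) -> str:
    """Translate an application-A code value to its application-B equivalent."""
    return app_a_to_app_b[code]

# Taxonomy stored recursively: each code carries a pointer to its parent code.
taxonomy = {
    "EQUIP":       {"description": "Equipment",    "parent": None},
    "EQUIP-IT":    {"description": "IT equipment", "parent": "EQUIP"},
    "EQUIP-IT-LT": {"description": "Laptops",      "parent": "EQUIP-IT"},
}

def ancestors(code: str) -> list[str]:
    """Walk the recursive relationship up to the taxonomy root."""
    chain = []
    parent = taxonomy[code]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = taxonomy[parent]["parent"]
    return chain

print(translate("F"))            # FEMALE
print(ancestors("EQUIP-IT-LT"))  # ['EQUIP-IT', 'EQUIP']
```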
Proprietary or Internal Reference Data
Many organizations create Reference Data to support internal processes and applications. This proprietary Reference Data often grows organically over time. Part of RDM includes managing these data sets and, ideally, creating consistency between them, where that consistency serves the organization.
In helping manage internal Reference Data sets, Data Stewards must balance between the need to have common words for the same information and the need for flexibility where processes differ from one another.
Industry Reference Data
Industry Reference Data is a broad term to describe data sets that are created and maintained by industry associations or government bodies, rather than by individual organizations, in order to provide a common standard for codifying important concepts.
Geographic or Geo-statistical Data
Geographic or geo-statistical reference data enables classification or analysis based on geography.
Computational Reference Data
Many business activities rely on access to common, consistent calculations.
Computational Reference Data differs from other types because of the frequency with which it changes.
Many organizations purchase this kind of data from third parties who ensure that it is complete and accurate.
Attempting to maintain this data internally is likely to be fraught with latency issues.
Standard Reference Data Set Metadata
Reference Data, like other data, can change over time. Given its prevalence within any organization, it is important to maintain key Metadata about Reference Data sets to ensure their lineage and currency are understood and maintained.
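One hedged way to hold such Metadata, assuming a minimal, invented set of fields (the field list and example values are illustrative, not a standard):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ReferenceDataSetMetadata:
    name: str           # what the set is, e.g. a country code list
    source: str         # lineage: external body or internal team that maintains it
    version: str        # published version or snapshot identifier
    last_updated: date  # supports currency checks
    steward: str        # who is accountable for the content

# Hypothetical example record.
iso_countries = ReferenceDataSetMetadata(
    name="ISO 3166-1 country codes",
    source="ISO (external)",
    version="snapshot 2020-03",
    last_updated=date(2020, 3, 1),
    steward="Enterprise Data Stewardship team",
)
print(iso_countries)
```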
1.4.3. Master Data
Master Data is data about the business entities (e.g., employees, customers, products, financial structures, assets, and locations) that provide context for business transactions and analysis. An entity is a real world object (person, organization, place, or thing). Entities are represented by entity instances, in the form of data / records.
3. Master data management is a set of processes that defines and manages A:non-transactional data entities B:raw data C:business rules D:transactional data entities E:All Correct answer: A Your answer: D Explanation (10.1.3): A and D conflict. Master Data is data about business entities (e.g., employees, customers, products, financial structures, assets, and locations) that provide context for business transactions and analysis. An entity is a real-world object (person, organization, place, or thing); entities are represented by entity instances in the form of data/records.
17. Plant Equipment is an example of A:Transaction Data B:Reference data C:Master Data D:Inverted data E:None of these Correct answer: C Your answer: C Explanation (10.1.3): Master Data is data about business entities (e.g., employees, customers, products, financial structures, assets, and locations) that provide context for business transactions and analysis.
23. Master Data Management: A:Ensures coded values are always used B:Is synonymous with Reference Data Management C:Controls the definition of business entities D:Allows applications to define business entities as needed and manages the mappings between common data in a central location E:Is time-consuming with questionable impact on data quality Correct answer: C Your answer: D Explanation (10.1.3): Master Data is data about business entities that provide context for business transactions and analysis. Master Data should represent the authoritative, most accurate data available about key business entities; when managed well, Master Data values are trusted and can be used with confidence.
35. Master data is data about A:business transactions, e.g. financial transactions, enquiries and service call interactions B:authority to grant access to data across an organization C:business entities, e.g. products, customers, assets and locations D:database structures and response time performance targets E:data values stored and displayed in uppercase Correct answer: C Your answer: C Explanation (10.1.3): Master Data is data about business entities (e.g., employees, customers, products, financial structures, assets, and locations) that provide context for business transactions and analysis.
Master Data should represent the authoritative, most accurate data available about key business entities. When managed well, Master Data values are trusted and can be used with confidence.
Business rules typically dictate the format and allowable ranges of Master Data values. Common organizational Master Data includes data about:
1. Parties
made up of individuals and organizations and their roles, such as customers, citizens, patients, vendors, suppliers, agents, business partners, competitors, employees, or students
2. Products and Services
both internal and external
3. Financial structures
such as contracts, general ledger accounts, cost centers, or profit centers
4. Locations
such as addresses and GPS coordinates
System of Record, System of Reference
A System of Record is an authoritative system where data is created/captured, and/or maintained through a defined set of rules and expectations
An ERP system may be the System of Record for sell-to customers
14. According to the DMBOK, the system that contains the best version of the Master Data is the A:Golden record B:Consuming system C:System of record D:Spoke E:Source system Correct answer: C Your answer: A Explanation (10.1.2): 6) Authority. Master Data values should be replicated only from the System of Record. A System of Reference may be required to enable Master Data sharing across the organization. A System of Record is an authoritative system that creates, captures, and maintains data through a defined set of rules and expectations (e.g., an ERP system can be the System of Record for sell-to customers).
A System of Reference is an authoritative system where data consumers can obtain reliable data to support transactions and analysis, even if the information did not originate in the system of reference.
MDM applications, Data Sharing Hubs, and Data Warehouses often serve as systems of reference.
7. Data governance and master data management make the definition of ___ one of the top priorities. A:communication plans B:risk mitigation C:authoritative sources D:policies and procedures E:risk plans Correct answer: C Your answer: C Explanation (10.1.3): (2) Trusted source, golden record. A Trusted Source, based on a combination of automated rules and manual stewardship of data content, is recognized as the 'best version of the truth'.
25. By comparing the system of record and systems of reference to each other, it is possible to A:validate the consistency of the master data B:update the core reference values C:validate the accuracy of the master data D:construct time variant sequences E:validate the completeness of the master data Correct answer: A Your answer: E Explanation (10.1.3): (1) System of record, system of reference. When different versions of 'the truth' may exist, it is necessary to distinguish among them. To do so, one must know where the data originated or was accessed, and for what specific use and purpose it was prepared. A System of Record is an authoritative system that creates, captures, and maintains data through a defined set of rules and expectations (e.g., an ERP system can be the System of Record for sell-to customers). A System of Reference is also an authoritative system, from which data consumers can obtain reliable data to support transactions and analysis even if that data did not originate there. MDM applications, Data Sharing Hubs (DSH), and Data Warehouses (DW) are often used as systems of reference.
Trusted Source, Golden Record
A Trusted Source is recognized as the 'best version of the truth' based on a combination of automated rules and manual stewardship of data content.
A trusted source may also be referred to as a Single View, 360° View
Within a trusted source, records that represent the most accurate data about entity instances can be referred to as Golden Records.
Tech Target defines a Golden Record as "the 'single version of the truth', where 'truth' is understood to mean the reference to which data users can turn when they want to ensure that they have the correct version of a piece of information. The golden record encompasses all the data in every system of record (SOR) within a particular organization."
However, the two parts of this definition bring the concept into question, as data in different systems may not align into 'a single version of the truth.'
1. DBMS functions may include all of the following EXCEPT A:meta-data repository B:committing/aborting database changes C:data storage, retrieval, and update D:administering 'golden' data E:all Correct answer: D Your answer: D Explanation (10.1.3): Option D is too narrow. Within a trusted source, the records that represent the most accurate data about an entity instance can be referred to as Golden Records.
This is why some prefer the term Trusted Source to refer to the “best version we have” of the Master Data.
The Trusted Source provides multiple perspectives of business entities as identified and defined by Data Stewards.
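A hedged sketch of one common way a trusted 'best version' record is assembled: a survivorship rule picks, per attribute, the most recently updated non-null value across source records. The records, systems, and rule are invented; real MDM tools support far richer survivorship logic.

```python
from datetime import date

# Invented source records for one customer, from three hypothetical systems.
records = [
    {"source": "CRM",  "updated": date(2023, 1, 5), "name": "A. Smith",
     "email": None, "phone": "555-0100"},
    {"source": "ERP",  "updated": date(2023, 3, 2), "name": "Alice Smith",
     "email": "a.smith@ex.com", "phone": None},
    {"source": "eCom", "updated": date(2022, 9, 9), "name": "Alice Smyth",
     "email": "alice@old.ex.com", "phone": "555-0199"},
]

def survivorship(records, attributes):
    """Most-recent non-null value wins, per attribute (one simple rule of many)."""
    golden = {}
    for attr in attributes:
        candidates = [r for r in records if r[attr] is not None]
        golden[attr] = max(candidates, key=lambda r: r["updated"])[attr]
    return golden

print(survivorship(records, ["name", "email", "phone"]))
# {'name': 'Alice Smith', 'email': 'a.smith@ex.com', 'phone': '555-0100'}
```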
Master Data Management
Master Data Management entails control over Master Data values and identifiers that enable consistent use, across systems, of the most accurate and timely data about essential business entities. The goals include ensuring availability of accurate, current values while reducing the risk of ambiguous identifiers
Gartner defines Master Data Management as “a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise’s official shared Master Data assets. Master Data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies, and chart of accounts.”
Gartner's definition stresses that MDM is a discipline, made up of people, processes, and technology. It is not a specific application solution. Unfortunately, the acronym MDM (Master Data Management) is often used to refer to systems or products used to manage Master Data. MDM applications can facilitate the methods, and sometimes quite effectively, but using an MDM application does not guarantee that Master Data is being managed to meet the organizational needs.
Assessing an organization’s MDM requirements includes identifying:
1. Which roles, organizations, places, and things are referenced repeatedly
2. What data is used to describe people, organizations, places, and things
3. How the data is defined and structured, including the granularity of the data
4. Where the data is created/sourced, stored, made available, and accessed
5. How the data changes as it moves through systems within the organization
6. Who uses the data and for what purposes
7. What criteria are used to understand the quality and reliability of the data and its sources
Master Data Management is challenging. It illustrates a fundamental challenge with data: People choose different ways to represent similar concepts and reconciliation between these representations is not always straightforward; as importantly, information changes over time and systematically accounting for these changes takes planning, data knowledge, and technical skills. In short, it takes work.
Because of this complexity, it is best to approach Master Data Management one data domain at a time. Start small, with a handful of attributes, and build out over time.
Planning for Master Data Management includes several basic steps. Within a domain:
Identify candidate sources that will provide a comprehensive view of the Master Data entities
Develop rules for accurately matching and merging entity instances
Establish an approach to identify and restore inappropriately matched and merged data
Establish an approach to distribute trusted data to systems across the enterprise
Executing the process, though, is not as simple as these steps imply, as MDM is a lifecycle management process. Activities critical to the lifecycle include:
1. Establishing the context of Master Data entities, including definitions of associated attributes and the conditions of their use. This process requires governance.
2. Identifying multiple instances of the same entity represented within and across data sources; building and maintaining identifiers and cross-references to enable information integration.
3. Reconciling and consolidating data across sources to provide a master record or the best version of the truth. Consolidated records provide a merged view of information across systems and seek to address attribute naming and data value inconsistencies.
4. Identifying improperly matched or merged instances and ensuring they are resolved and correctly associated with identifiers.
5. Provisioning of access to trusted data across applications, either through direct reads, data services, or by replication feeds to transactional, warehousing, or analytical data stores.
6. Enforcing the use of Master Data values within the organization. This process also requires governance and change management to assure a shared enterprise perspective.
Master Data Management Key Processing Steps
1. Key processing steps for MDM include data model management; data acquisition; data validation, standardization, and enrichment; entity resolution; and stewardship and sharing.

2. Data Model Management
Master Data work brings to light the importance of clear and consistent logical data definitions. The model should help the organization overcome ‘system speak’.
Terms and definitions used within a source system may make sense within the confines of that system but they do not always make sense at an enterprise level.
For attributes that make up Master Data, the granularity of the definition and associated data values must also make sense across the organization.
3. Data Acquisition
Planning for, evaluating, and incorporating new data sources into the Master Data Management solution must be a reliable, repeatable process.
Data acquisition activities involve:
1. Receiving and responding to new data source acquisition requests
2. Performing rapid, ad hoc match and high-level data quality assessments using data cleansing and data profiling tools
3. Assessing and communicating the complexity of data integration to the requesters to help them with their cost-benefit analysis
4. Piloting acquisition of data and its impact on match rules
5. Finalizing data quality metrics for the new data source
6. Determining who will be responsible for monitoring and maintaining the quality of a new source’s data
7. Completing integration into the overall data management environment
4. Data Validation, Standardization, and Enrichment
To enable entity resolution, data must be made as consistent as possible. This entails, at a minimum, reducing variation in format and reconciling values. Consistent input data reduces the chance of errors in associating records.
Preparation processes include:
Validation:
Identifying data provably erroneous, likely incorrect, or defaulted (for example, removal of clearly fake email addresses)
Standardization:
Ensuring data content conforms to standard Reference Data values (e.g., country codes), formats (e.g., telephone numbers), or fields (e.g., addresses)
Enrichment:
Adding attributes that can improve entity resolution services (e.g., Dun & Bradstreet DUNS Number and Ultimate DUNS Number for relating company records, Acxiom or Experian consumer IDs for individual records)
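A minimal sketch of the validation and standardization steps, assuming simplistic stand-in rules. The code sets, aliases, and patterns are invented and far cruder than real data preparation logic.

```python
import re

VALID_COUNTRY_CODES = {"US", "GB", "CN"}  # would come from managed Reference Data

def validate_email(email: str) -> bool:
    """Reject clearly fake addresses (a crude 'provably erroneous' check)."""
    return bool(re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", email))

def standardize_country(value: str) -> str:
    """Conform free-text input to a standard Reference Data code value."""
    aliases = {"usa": "US", "united states": "US", "uk": "GB", "great britain": "GB"}
    v = value.strip()
    if v.upper() in VALID_COUNTRY_CODES:
        return v.upper()
    return aliases[v.lower()]  # raises KeyError for values needing stewardship

def standardize_phone(raw: str) -> str:
    """Reduce format variation: keep digits only."""
    return re.sub(r"\D", "", raw)

print(validate_email("noreply@example"))       # False (no domain suffix)
print(standardize_country(" United States "))  # US
print(standardize_phone("(555) 010-0199"))     # 5550100199
```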
5. Entity Resolution and Identifier Management
Entity resolution is the process of determining whether two references to real world objects refer to the same object or to different objects (Talburt, 2011).
Entity resolution is a decision-making process. Models for executing the process differ based on the approach they take to determining similarity between two references. While resolution always takes place between pairs of references, the process can be systematically extended to include large data sets. Entity resolution is critical to MDM, as the process of matching and merging records enables the construction of the Master Data set.
Entity resolution includes a set of activities (reference extraction, reference preparation, reference resolution, identity management, relationship analysis) that enable the identity of entity instances, and the relationships between entity instances, to be managed over time.
Within the process of reference resolution, two references may be identified as representing the same entity through the process of determining equivalency. These references can then be linked through a value (a global identifier) that indicates that they are equivalent (Talburt, 2011).
Matching
Matching, or candidate identification, is the process of identifying how different records may relate to a single entity.
29. The process of identifying how different records may relate to a single entity is called A:mangling B:munging C:mirroring D:matching E:meshing Correct answer: D Your answer: D Explanation (10.1.3): Matching, or candidate identification, is the process of identifying how different records may relate to a single entity.
The risks with this process are:
False positives:
Two references that do not represent the same entity are linked with a single identifier. This results in one identifier that refers to more than one real-world entity instance.
False negatives:
Two references represent the same entity but they are not linked with a single identifier. This results in multiple identifiers that refer to the same real-world entity when each instance is expected to have one-and-only-one identifier.
Both situations are addressed through a process called similarity analysis or matching, in which the degree of similarity between any two records is scored, often based on weighted approximate matching between corresponding attribute values.
Deterministic algorithms
like parsing and standardization, rely on defined patterns and rules for assigning weights and scores for determining similarity. Deterministic algorithms are predictable in that the patterns matched and the rules applied will always yield the same results. This type of matching works out-of-the-box with relatively good performance, but it is only as good as the situations anticipated by the people who developed the rules.
Probabilistic algorithms
rely on statistical techniques for assessing the probability that any pair of records represents the same entity. This relies on the ability to take data samples for training purposes by looking at the expected results for a subset of the records and tuning the matcher to self-adjust based on statistical analysis. These matchers are not reliant on rules, so the results may be nondeterministic. However, because the probabilities can be refined based on experience, probabilistic matchers are able to improve their matching precision as more data is analyzed.
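A minimal sketch of weighted approximate matching in the deterministic style: fixed weights and a fixed threshold always yield the same score for the same inputs. The fields, weights, and threshold are invented for illustration.

```python
from difflib import SequenceMatcher

# Invented weights and threshold; real matchers tune these carefully.
WEIGHTS = {"name": 0.5, "email": 0.3, "postcode": 0.2}
MATCH_THRESHOLD = 0.85

def field_similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec1: dict, rec2: dict) -> float:
    """Weighted approximate matching across corresponding attributes."""
    return sum(w * field_similarity(rec1[f], rec2[f]) for f, w in WEIGHTS.items())

r1 = {"name": "Alice Smith", "email": "a.smith@ex.com", "postcode": "10001"}
r2 = {"name": "Alice Smyth", "email": "a.smith@ex.com", "postcode": "10001"}

score = match_score(r1, r2)
print(round(score, 3), "match" if score >= MATCH_THRESHOLD else "needs steward review")
```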
Identity Resolution
Some matches occur with great confidence, based on exact data matches across multiple fields. Other matches are suggested with less confidence due to conflicting values.
Despite the best efforts, match decisions sometimes prove to be incorrect. It is essential to maintain the history of matches so that matches can be undone when discovered to be incorrect. Match rate metrics enable organizations to monitor the impact and effectiveness of their matching inference rules. Reprocessing of match rules can help identify better match candidates as new information is received by the entity resolution process.
Matching Workflows / Reconciliation Types
Match rules for different scenarios require different workflows
Duplicate identification match rules
focus on a specific set of data elements that uniquely identify an entity and identify merge opportunities without taking automatic action. Business Data Stewards can review these occurrences and decide to take action on a case-by-case basis.
Match-link rules
identify and cross-reference records that appear to relate to a master record without updating the content of the cross-referenced record. Match-link rules are easier to implement and much easier to reverse.
Match-link is a simpler operation, as it acts on the cross-reference registry and not the individual attributes of the merged Master Data record, even though it may be more difficult to present comprehensive information from multiple records.
Match-merge rules
match records and merge the data from these records into a single, unified, reconciled, and comprehensive record. If the rules apply across data sources, create a single, unique, and comprehensive record in each data store. Minimally, use trusted data from one data store to supplement data in other data stores, replacing missing values or values thought to be inaccurate.
Match-merge rules are complex, and seek to provide the unified, reconciled version of information across multiple records and data sources.
The challenges with match-merge rules include the operational complexity of reconciling the data and the cost of reversing the operation if there is a false merge.
16. Which one of the following statements is true? A:Reference Data Management involves identifying the 'best' or 'golden' record for each domain B:Managing reference data requires the same activities and techniques as does managing master data C:Master Data Management requires techniques for splitting or merging an instance of a business entity D:Master Data Management involves identifying and maintaining approved coded values E:Business data stewards maintain lists of valid data values for master data instances Correct answer: C Your answer: B Explanation (10.1.3): The challenges with match-merge rules include the operational complexity of reconciling the data and the cost of reversing a false merge.
Periodically re-evaluate match-merge and match-link rules because confidence levels change over time.
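A hedged sketch contrasting the two rule types (all data structures invented): match-link writes only to a cross-reference registry, so reversal is trivial, while match-merge produces a consolidated record, so reversal depends on retained lineage.

```python
# Invented structures for illustration.
xref_registry = {}   # (source_system, local_id) -> master_id

def match_link(source: str, local_id: str, master_id: str) -> None:
    """Record that a source record relates to a master record; content untouched."""
    xref_registry[(source, local_id)] = master_id

def unlink(source: str, local_id: str) -> None:
    """Reversal is trivial: delete the cross-reference entry."""
    del xref_registry[(source, local_id)]

def match_merge(records: list[dict]) -> dict:
    """Merge attributes into one reconciled record (first non-null wins here).
    Reversing a false merge needs the input lineage, so it is retained."""
    merged = {}
    lineage = [r["_id"] for r in records]
    for r in records:
        for k, v in r.items():
            if k != "_id" and merged.get(k) is None:
                merged[k] = v
    merged["_lineage"] = lineage  # kept so a false merge can be unwound
    return merged

match_link("CRM", "c-42", "M-1001")
print(xref_registry)
print(match_merge([{"_id": "c-42", "name": "Alice", "phone": None},
                   {"_id": "e-7",  "name": None,    "phone": "555-0100"}]))
```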
Master Data ID Management
Managing Master Data involves managing identifiers. There are two types of identifiers that need to be managed across data sources in an MDM environment: Global IDs and Cross-Reference (X-Ref) information.
A Global ID is the MDM solution-assigned and -maintained unique identifier attached to reconciled records. Its purpose is to uniquely identify the entity instance.
Global IDs should be generated by only one authorized solution, regardless of which technology is performing Master Data integration activities, to avoid any risk of duplicate values. Global IDs can be numbers or GUIDs (Globally Unique Identifiers), as long as uniqueness can be maintained.
The key complexity for Global ID generation is how to maintain the right Global ID (so that appropriate downstream data updates can be performed) after an unmerge-remerge.
X-Ref Management is management of the relationship between source IDs and the Global ID. X-Ref management should include capabilities to maintain a history of such mappings to support match rate metrics, and to expose lookup services to enable data integration.
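A minimal sketch of Global ID and X-Ref management under the single-authorized-issuer rule described above. The class, method names, and history format are invented.

```python
import uuid
from datetime import datetime, timezone

class GlobalIdService:
    """Single authorized issuer of Global IDs, with an x-ref mapping history."""

    def __init__(self):
        self.xref = {}      # (source_system, source_id) -> global_id
        self.history = []   # append-only log supporting match-rate metrics / unmerge

    def assign(self, source: str, source_id: str, global_id: str | None = None) -> str:
        """Map a source record to an existing Global ID, or mint a new GUID."""
        gid = global_id or str(uuid.uuid4())
        self.xref[(source, source_id)] = gid
        self.history.append((datetime.now(timezone.utc), source, source_id, gid))
        return gid

    def lookup(self, source: str, source_id: str) -> str:
        """Lookup service exposed to enable data integration."""
        return self.xref[(source, source_id)]

svc = GlobalIdService()
gid = svc.assign("CRM", "c-42")           # new entity: mint a Global ID
svc.assign("ERP", "e-7", global_id=gid)   # matched record: reuse the same ID
print(svc.lookup("ERP", "e-7") == gid)    # True
```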
Affiliation Management
Affiliation Management is establishing and maintaining relationships between Master Data records of entities that have real-world relationships.
Data architecture design of an MDM solution must resolve whether to leverage parent-child relationships, affiliation relationships, or both for a given entity.
Affiliation relationships
provide the greatest flexibility through programming logic. The relationship type can be used to expose such data in a parent-child hierarchy. Many downstream solutions, such as reporting or account navigation tools, would want to see a hierarchical view of the information.
Parent-Child relationships
require less programming logic, as the navigation structure is implied. However, if the relationship changes and there isn't an available affiliation structure, this may influence the quality of the data and Business Intelligence dimensions.
6. Data Sharing and Stewardship
Although much of the work of Master Data Management can be automated through tools that enable processing of large numbers of records, it still requires stewardship to resolve situations where data is incorrectly matched. Ideally, lessons learned from the stewardship process can be used to improve matching algorithms and reduce instances of manual work
Party Master Data
Party Master Data includes data about individuals, organizations, and the roles they play in business relationships.
38. A kind of Master Data includes data about individuals, organizations and the roles they play in business relationships. This term is called A:product B:Financial C:Party D:Location E:Industry Correct answer: C Your answer: C Explanation: Common organizational Master Data includes data about: 1) Parties: individuals and organizations and the roles they play, such as customers, citizens, patients, vendors, suppliers, agents, business partners, competitors, employees, or students.
Customer Relationship Management (CRM) systems manage Master Data about customers. The goal of CRM is to provide complete and accurate information about each and every customer.
An essential aspect of CRM is identifying duplicate, redundant, or conflicting data from different systems and determining whether the data represents one or more than one customer.
Managing business party Master Data poses unique challenges:
1. The complexity of roles and relationships played by individuals and organizations
2. Difficulties in unique identification
3. The number of data sources and the differences between them
4. The multiple mobile and social communications channels
5. The importance of the data
6. The expectations of how customers want to be engaged
Master Data is particularly challenging for parties playing multiple roles across an organization (e.g., an employee who is also a customer) and utilizing differing points of contact or engagement methods (e.g., interaction via mobile device application that is tied to a social media site).
Financial Master Data
Financial Master Data includes data about business units, cost centers, profit centers, general ledger accounts, budgets, projections, and projects. Typically, an Enterprise Resource Planning (ERP) system serves as the central hub for financial Master Data (chart of accounts), with project details and transactions created and maintained in one or more spoke applications.
Financial Master Data solutions not only create, maintain, and share information; many can also simulate how changes to existing financial data may affect the organization’s bottom line. Financial Master Data simulations are often part of Business Intelligence reporting, analysis, and planning modules, as well as more straightforward budgeting and projecting.
Legal Master Data
Legal Master Data includes data about contracts, regulations, and other legal matters. Legal Master Data allows analysis of contracts for different entities providing the same products or services, to enable better negotiation or to combine contracts into Master Agreements
41. A kind of Master Data includes data about contracts, regulations and other legal matters. This term is called A:Financial B:Product C:Party D:Legal E:Industry Correct answer: D Your answer: D Explanation: (7) Legal Master Data includes data about contracts, regulations, and other legal matters. It allows analysis of contracts for different entities providing the same products or services, to enable better negotiation or to combine contracts into Master Agreements.
Product Master Data
Product Master Data can focus on an organization’s internal products and services or on industry-wide (including competitor) products and services. Different types of product Master Data solutions support different business functions.
1. Product Lifecycle Management (PLM)
focuses on managing the lifecycle of a product or service from conception, through development, manufacturing, sale / delivery, service, and disposal.
2. Product Data Management (PDM)
supports engineering and manufacturing functions by capturing and enabling secure sharing of product information such as design documents (e.g., CAD drawings), recipes (manufacturing instructions), standard operating procedures, and bills of materials.
3. Product data in Enterprise Resource Planning (ERP) systems
focuses on SKUs to support order entry down to inventory level, where individual units can be identified through a variety of techniques.
4. Product data in Manufacturing Execution Systems (MES)
focuses on raw inventory, semi-finished goods, and finished goods, where finished goods tie to products that can be stored and ordered through the ERP system. This data is also important across the supply chain and logistics systems.
5. Product data in a Customer Relationship Management (CRM) system
that supports marketing, sales, and support interactions can include product family and brands, sales rep association, and customer territory management, as well as marketing campaigns.
39. A kind of ___ can focus on an organization's internal products and services or on industry-wide (including competitor) products and services. This term is called A:Financial B:Product C:Party D:Location E:Industry Correct answer: B Your answer: B Explanation: Common organizational Master Data includes data about: 2) Products and services, both internal and external.
Many product masters closely tie to Reference Data Management systems.
Location Master Data
Location Master Data provides the ability to track and share geographic information and to create hierarchical relationships or territories based on geographic information. The distinction between reference and Master Data blurs for location data. Here is the difference:
Location Reference Data
typically includes geopolitical data, such as countries, states or provinces, counties, cities or towns, postal codes, and geographic positioning coordinates, such as latitude, longitude, and altitude. This data rarely changes, and changes are handled by external organizations.
Location Master Data
includes business party addresses and business party locations, as well as facility addresses for locations owned by the organization. As organizations grow or contract, these addresses change more frequently than other Location Reference Data.
40. A kind of Master Data provides the ability to track and share geographic information and to create hierarchical relationships or territories based on geographic information. This term is called A:Financial B:Product C:Party D:Location E:Industry Correct answer: D Your answer: D Explanation: Common organizational Master Data includes data about: 4) Locations, such as addresses and GPS coordinates.
Industry Master Data – Reference Directories
Reference Directories are authoritative listings of Master Data entities (companies, people, products, etc.) that organizations can purchase and use as the basis of their transactions. While reference directories are created by external organizations, a managed and reconciled version of the information is maintained in the organization’s own systems.
Reference directories enable Master Data use by:
Providing a starting point for matching and linking new records.
Providing additional data elements that may not be as easily available at the time of record creation
As an organization’s records match and reconcile with the reference directories, the trusted record will deviate from the reference directory with traceability to other source records, contributing attributes, and transformation rules.
1.4.4. Data Sharing Architecture
There are several basic architectural approaches to reference and Master Data integration. Each Master Data subject area will likely have its own system of record.
The data sharing hub architecture model shown in Figure 77 represents a hub-and-spoke architecture for Master Data. The Master Data hub can handle interactions with spoke items such as source systems, business applications, and data stores while minimizing the number of integration points. A local data hub can extend and scale the Master Data hub.

Each of the three basic approaches to implementing a Master Data hub environment has pros and cons:
Registry
is an index that points to Master Data in the various systems of record. The systems of record manage Master Data local to their applications. Access to Master Data comes from the master index.
A registry is relatively easy to implement because it requires few changes in the systems of record.
However, assembling a complete view of a Master Data entity requires retrieving and combining data from multiple systems of record at query time, which can be complex, and data quality issues in those source systems are not addressed at the source.
Transaction Hub
applications interface with the hub to access and update Master Data. The Master Data exists within the Transaction Hub and not within any other applications. The Transaction Hub is the system of record for Master Data.
Transaction Hubs enable better governance and provide a consistent source of Master Data.
However, it is costly to remove the functionality to update Master Data from existing systems of record. Business rules are implemented in a single system: the Hub.
A Consolidated approach
is a hybrid of Registry and Transaction Hub. The systems of record manage Master Data local to their applications. Master Data is consolidated within a common repository and made available from a data-sharing hub, the system of reference for Master Data. This eliminates the need to access the data directly from the systems of record.
The Consolidated approach provides an enterprise view with limited impact on systems of record.
However, it entails replication of data, and there will be latency between the hub and the systems of record.
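For comparison, a minimal sketch of the Registry approach (all systems and records invented): the hub holds only an index, and a master view is assembled at query time from the systems of record.

```python
# Invented source systems: in a registry model, data stays where it is.
crm = {"c-42": {"name": "Alice Smith", "email": "a.smith@ex.com"}}
erp = {"e-7":  {"name": "Alice Smith", "credit_limit": 5000}}
sources = {"CRM": crm, "ERP": erp}

# The registry: a master index pointing at Master Data in the systems of record.
registry = {"M-1001": [("CRM", "c-42"), ("ERP", "e-7")]}

def master_view(master_id: str) -> dict:
    """Assemble a view on demand by reading from each system of record."""
    view = {}
    for system, local_id in registry[master_id]:
        view.update(sources[system][local_id])
    return view

print(master_view("M-1001"))
# A Consolidated hub would instead persist this merged view in its own
# repository and serve it as the system of reference.
```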
2. Activities
2.1. MDM Activities
2.1.1. Define MDM Drivers and Requirements
Drivers often include opportunities to improve customer service and/or operational efficiency, as well as to reduce risks related to privacy and compliance.
Obstacles include differences in data meaning and structure between systems.
It is relatively easy to define requirements for Master Data within an application. It is more difficult to define standard requirements across applications.
Start with the simplest category in order to learn from the process.
2.1.2. Evaluate and Assess Data Sources
One outcome from an MDM effort can be improvements in Metadata generated through the effort to assess the quality of existing data.
One goal of assessment is to understand how complete data is with respect to the attributes that comprise Master Data. This process includes clarifying the definitions and granularity of those attributes.
Semantic issues will arise at some point when defining and describing attributes. The Data Stewards will need to collaborate with the business areas on reconciliation and agreement on attribute naming and enterprise level definitions.
10. Which of the following would most likely be included in a data steward's role description? A:Document the origination and source of authority B:Develop and approve technical data standards C:Maintain the master data management version controls D:Ensure clear, unambiguous data element definitions E:All Correct answer: D Your answer: D Explanation (10.2.1): Semantic issues sometimes arise when defining and describing attributes. Data Stewards need to collaborate with the business areas and reach agreement on attribute naming and enterprise-level definitions.
The other part of assessing sources is to understand the quality of the data.
33. One of the first steps in a master data management program is to A:review data security protocols B:decommission similar data collection systems C:secure funding for 20 years of operations D:build multiple data marts E:evaluate and assess data sources Correct answer: E Your answer: E Explanation: Executing the process is not as simple as the steps imply, because MDM is a lifecycle management process. Activities critical to the lifecycle include: 1) Establishing the context of Master Data entities, including definitions of associated attributes and the conditions of their use, with governance. 2) Identifying multiple instances of the same entity represented within and across data sources; building and maintaining identifiers and cross-references to support information integration.
2.1.3. Define Architectural Approach
The architectural approach to MDM depends on business strategy, the platforms of existing data sources, and the data itself, particularly its lineage and volatility, and the implications of high or low latency.
Architecture must account for data consumption and sharing models. Tooling for maintenance depends on both business requirements and architecture options.
Small organizations may effectively utilize a transaction hub.
A global organization with multiple systems is more likely to utilize a registry.
An organization with 'siloed' business units and various source systems may decide that a consolidated approach is appropriate.
The data sharing hub architecture is particularly useful when there is no clear system of record for Master Data.
The data-sharing hub becomes the source of Master Data content for data warehouses or marts, reducing the complexity of extracts and the processing time for data transformation.
2.1.4. Model Master Data
Master Data Management is a data integration process. To achieve consistent results and to manage the integration of new sources as an organization expands, it is necessary to model the data within subject areas. A logical or canonical model can be defined over the subject areas within the data-sharing hub. This would allow establishment of enterprise level definitions of subject area entities and attributes.
2.1.5. Define Stewardship and Maintenance Processes
Technical solutions can do remarkable work matching, merging, and managing identifiers for master records. However, the process also requires stewardship, not only to address records that fall out of the process, but also to remediate and improve the processes that cause them to fall out in the first place.
2.1.6. Establish Governance Policies to Enforce Use of Master Data
The real benefits (operational efficiency, higher quality, better customer service) come once people and systems start using the Master Data.
The overall effort has to include a roadmap for systems to adopt master values and identifiers as input to processes. Establish unidirectional closed loops between systems to maintain consistency of values across systems.
2.2. Reference Data Activities
2.2.1. Define Drivers and Requirements
The primary drivers for Reference Data Management are operational efficiency and higher data quality.
The most important Reference Data sets should drive requirements for a Reference Data Management system. Once such a system is in place, new Reference Data sets can be set up as part of projects. Existing Reference Data sets should be maintained based on a published schedule.
2.2.2. Assess Data Sources
Most industry standard Reference Data sets can be obtained from the organizations that create and maintain them.
These organizations may provide value-added features, such as the delivery of updates on a set schedule.
2.2.3. Define Architectural Approach
Before purchasing or building a tool to manage Reference Data, it is critical to account for requirements and for the challenges posed by the Reference Data to be managed: for example, the volatility of the data, the frequency of updates, and the consumption models.
In cases where Reference Data drives programming logic, the potential impact of changes should be assessed and accounted for before the changes are introduced.
2.2.4. Model Reference Data Sets
Models help data consumers understand the relationships within the Reference Data set and they can be used to establish data quality rules.
2.2.5. Define Stewardship and Maintenance Processes
Reference Data requires stewardship to ensure that values are complete and current and that definitions are clear and understandable
Many Reference Data Management tools include workflows to manage review and approval of changes to Reference Data. These workflows themselves depend on identifying who within an organization is responsible for Reference Data content.
2.2.6. Establish Reference Data Governance Policies
It is important to have policies in place that govern the quality and mandate the use of Reference Data from that repository, whether directly through publication from that repository or indirectly from a system of reference that is populated with data from the central repository.
3. Tools and Techniques
3.1. MDM requires tooling specifically designed to enable identity management.
3.1.1. data integration tools
36. When assessing tools to implement master data management solutions, functionality must include A:document and content management. B:advanced analytics capabilities C:auto-normalization features D:backup and recovery utilities. E:sophisticated integration capability. Correct answer: E. Your answer: E. Explanation: 10.4 Implementation Guidelines: Master and Reference Data Management are forms of data integration. The implementation principles used in the Data Integration and Interoperability area also apply to MDM and RDM (see Chapter 8).
3.1.2. data remediation tools
3.1.3. operational data stores (ODS)
3.1.4. data sharing hubs (DSH)
3.1.5. specialized MDM applications
4. Implementation Guidelines
4.1. Master and Reference Data Management are forms of data integration. The implementation principles that apply to data integration and interoperability apply to MDM and RDM.
4.2. MDM and RDM capabilities cannot be implemented overnight. Solutions require specialized business and technical knowledge. Organizations should expect to implement Reference and Master Data solutions incrementally through a series of projects defined in an implementation roadmap, prioritized based on business needs and guided by an overall architecture.
4.3. Note that MDM programs will fail without proper governance. Data governance professionals must understand the challenges of MDM and RDM and assess the organization’s maturity and ability to meet them.
4.4. Adhere to Master Data Architecture
4.4.1. Establishing and following proper reference architecture is critical to managing and sharing Master Data across an organization. The integration approach should take into account the organizational structure of the business, the number of distinct systems of record, the data governance implementation, the importance of access and latency of data values, and the number of consuming systems and applications.
4.5. Monitor Data Movement
4.5.1. Data integration processes for Master and Reference Data should be designed to ensure timely extraction and distribution of data across the organization.
4.5.2. data flow should be monitored in order to:
1. Show how data is shared and used across the organization
2. Identify data lineage from / to administrative systems and applications
3. Assist root cause analysis of issues
4. Show effectiveness of data ingestion and consumption integration techniques
5. Denote latency of data values from source systems through consumption
6. Determine validity of business rules and transformations executed within integration components
4.6. Manage Reference Data Change
4.6.1. Since Reference Data is a shared resource, it cannot be changed arbitrarily.
The key to successful Reference Data Management is organizational willingness to relinquish local control of shared data.
To sustain this support, provide channels to receive and respond to requests for changes to Reference Data.
4.6.2. Types of changes include
1. Row level changes to external Reference Data sets
2. Structural changes to external Reference Data sets
3. Row level changes to internal Reference Data sets
4. Structural changes to internal Reference Data sets
5. Creation of new Reference Data sets
4.6.3. Changes can be planned / scheduled or ad hoc. Planned changes, such as monthly or annual updates to industry standard codes, require less governance than ad hoc updates.
4.6.4. Change requests should follow a defined process, as illustrated in Figure 78. When requests are received, stakeholders should be notified so that impacts can be assessed. If changes need approval, discussions should be held to get that approval. Changes should be communicated.

4.7. Data Sharing Agreements
4.7.1. To assure proper access and use, establish sharing agreements that stipulate what data can be shared and under what conditions
4.7.2. This effort should be driven by the Data Governance program. It may involve Data Architects, Data Providers, Data Stewards, Application Developers, Business Analysts as well as Compliance / Privacy Officers and Security Officers.
4.7.3. SLAs and metrics should be established to measure the availability and quality of shared data.
4.8. Organization and Cultural Change
4.8.1. Reference and Master Data Management require people to relinquish control of some of their data and processes in order to create shared resources.
4.8.2. Improving the availability and quality of reference and Master Data will undoubtedly require changes to procedures and traditional practices.
4.8.3. Perhaps the most challenging cultural change is central to governance: determining which individuals are accountable for which decisions – business Data Stewards, Architects, Managers, and Executives – and which decisions data stewardship teams, program steering committees, and the Data Governance Council should make collaboratively.
5. Reference and Master Data Governance
5.1. Because they are shared resources, Reference and Master Data require governance and stewardship. Not all data inconsistencies can be resolved through automation. Some require that people talk to each other. Without governance, Reference and Master Data solutions will just be additional data integration utilities, unable to deliver their full potential.
5.1.1. 26. In order to gain endorsement to extend the use of reference and master data across the enterprise a key supporting DMBOK knowledge area is A:Data Governance B:Document and Content Management C:Data Architecture D:Data Storage and Operations E:Data Security Correct answer: A. Your answer: A. Explanation: 10.4 Implementation Guidelines: Master and Reference Data Management are forms of data integration; the implementation principles used in the Data Integration and Interoperability area also apply to them (see Chapter 8). MDM and RDM capabilities cannot be implemented overnight, and solutions require specialized business and technical knowledge. Organizations should expect to implement Reference and Master Data solutions incrementally, through a series of milestones defined in a roadmap, prioritized by business need and guided by an overall architecture. Note that without proper governance, MDM programs will fail. Data governance professionals must understand the challenges of MDM and RDM and assess the organization's maturity and ability to meet them (see Chapter 15).
5.2. Governance processes will determine
5.2.1. The data sources to be integrated
5.2.2. The data quality rules to be enforced
5.2.3. The conditions of use rules to be followed
5.2.4. The activities to be monitored and the frequency of monitoring
5.2.5. The priority and response levels of data stewardship efforts
5.2.6. How information is to be represented to meet stakeholder needs
5.2.7. Standard approval gates, expectations in RDM and MDM deployment
5.3. Metrics
5.3.1. Data quality and compliance:
DQ dashboards can describe the quality of Reference and Master Data. These metrics should denote the confidence (as a percentage) of a subject area entity or associated attribute and its fitness for purpose across the organization.
5.3.2. Data change activity:
Auditing the lineage of trusted data is imperative to improving data quality in a data-sharing environment. Metrics should denote the rate of change of data values. These metrics provide insight into the systems supplying data to the sharing environment, and can be used to tune the algorithms in MDM processes.
5.3.3. Data ingestion and consumption:
Data is supplied by upstream systems and used by downstream systems and processes. These metrics should denote and track which systems are contributing data and which business areas are subscribing to data from the sharing environment.
5.3.4. Service Level Agreements:
SLAs should be established and communicated to contributors and subscribers to ensure usage and adoption of the data-sharing environment. The level of adherence to SLAs can provide insight into both support processes and the technical and data problems that might slow down the MDM application.
5.3.5. Data Steward coverage:
These metrics should note the name or group responsible for data content, and how often the coverage is evaluated. They can be used to identify gaps in support.
5.3.6. Total Cost of Ownership:
There are multiple factors in this metric and different ways to represent it. From a solution view, costs can include environment infrastructure, software licenses, support staff, consulting fees, training, etc. The effectiveness of this metric is largely based on its consistent application across the organization.
5.3.7. Data sharing volume and usage:
Data ingestion and consumption volumes need to be tracked to determine the effectiveness of the data-sharing environment. These metrics should denote the volume and velocity of data defined, ingested, and subscribed to and from the data-sharing environment.
6. Works Cited / Recommended
6.1. 4. Each of the following objects is included in an enterprise-wide model EXCEPT A:business function. B:timeline C:business location D:process E:Domain Correct answer: B. Your answer: E. Explanation: 10.1.3 A timeline is generally not included in the model; the model mainly describes relationships. Multiple attributes with different names can be merged into a single attribute in the enterprise-wide model, with their data values placed in the appropriate context. Sometimes multiple attributes presented in a single data source have their respective data values merged into the data value of a single attribute defined by the enterprise-wide model.
6.2. 5. Master Data Management (MDM) creating a single unified view of an organization requires that an organization address all of the following EXCEPT A:duplicate customer records. B:vendor information systems C:broken processes D:internal disagreements. E:All Correct answer: B. Your answer: B. Explanation: 'Address' here means to resolve a problem; B is not a problem to be resolved.
6.3. 8. Which approach is the best MDM approach to implement in an organization where the data structures are simple without significant links to business objects? A:Virtual MDM B:Semantic MDM C:static MDM D:Federated MDM E:none Correct answer: C. Your answer: C. Explanation: A static approach reduces change and dependencies, which best fits the scenario in the question.
6.4. 9. Master data management in organizations may be part of all of the following EXCEPT A:an ERP program B:a business process management program C:a vendor's particular solution to inconsistent corporate data D:a customer data integration program E:an CRM program Correct answer: C. Your answer: C. Explanation: C exists for data quality improvement.
6.5. 30. All the systems in the enterprise, apart from a website, are showing updated pricing information. This may be due to A:the slowly changing dimension has both from and to dates. B:the website software not integrating with the reference data repository C:the pricing information is not updated in the reference data repository. D:the reference data strategy has not been approved by the executive E:the website software is not using standard XML schemas Correct answer: B. Your answer: B. Explanation: none available.
6.6. 31. Emergency contact phone number would be found in which master data management program? A:Product B:Asset C:Location D:Employee E:Service Correct answer: D. Your answer: E. Explanation: none available.
Chapter 11: Data Warehousing and Business Intelligence

1. Introduction
1.1. Definition
1.1.1. The concept of the Data Warehouse emerged in the 1980s as technology enabled organizations to integrate data from a range of sources into a common data model. Integrated data promised to provide insight into operational processes and open up new possibilities for leveraging data to make decisions and create organizational value.
1.1.2. As importantly, data warehouses were seen as a means to reduce the proliferation of decision support systems (DSS), most of which drew on the same core enterprise data.
1.1.3. concept of an enterprise warehouse promised a way to reduce data redundancy, improve the consistency of information, and enable an enterprise to use its data to make better decisions.
1.1.4. Data warehouses began to be built in earnest in the 1990s. Since then (and especially with the co-evolution of Business Intelligence as a primary driver of business decision-making), data warehouses have become ‘mainstream’. Most enterprises have data warehouses and warehousing is the recognized core of enterprise data management.
1.2. Business Drivers
1.2.1. The primary driver for data warehousing is to support operational functions, compliance requirements, and Business Intelligence (BI) activities
BI has evolved from retrospective assessment to predictive analytics.
1.3. Goals and Principles
1.3.1. Goals
1. Support Business Intelligence activity
2. Enable effective business analysis and decision-making
3. Find ways to innovate based on insights from the organization's data
1.3.2. Principles
1. Focus on business goals:
Make sure the DW serves organizational priorities and solves business problems.
2. Start with the end in mind:
Let the business priority and the scope of end-data-delivery in the BI space drive the creation of the DW content.
3. Think and design globally; act and build locally:
Let the end-vision guide the architecture, but build and deliver incrementally, through focused projects or sprints that enable more immediate return on investment.
4. Summarize and optimize last, not first:
Build on the atomic data. Aggregate and summarize to meet requirements and ensure performance, not to replace the detail.
5. Promote transparency and self-service:
The more context (Metadata of all kinds) provided, the better able data consumers will be to get value out of the data. Keep stakeholders informed about the data and the processes by which it is integrated.
6. Build Metadata with the warehouse:
Critical to DW success is the ability to explain the data. For example, being able to answer basic questions like “Why is this sum X?”, “How was that computed?”, and “Where did the data come from?” Metadata should be captured as part of the development cycle and managed as part of ongoing operations.
7. Collaborate:
Collaborate with other data initiatives, especially those for Data Governance, Data Quality, and Metadata.
8. One size does not fit all:
Use the right tools and products for each group of data consumers.
1.4. Essential Concepts
1.4.1. Business Intelligence
The term Business Intelligence (BI) has two meanings.
First, it refers to a type of data analysis aimed at understanding organizational activities and opportunities.
Results of such analysis are used to improve organizational success. When people say that data holds the key to competitive advantage, they are articulating the promise inherent in Business Intelligence activity: that if an organization asks the right questions of its own data, it can gain insights about its products, services, and customers that enable it to make better decisions about how to fulfill its strategic objectives.
Secondly, Business Intelligence refers to a set of technologies that support this kind of data analysis.
An evolution of decision support tools, BI tools enable querying, data mining, statistical analysis, reporting, scenario modeling, data visualization, and dashboarding.
11. which of the following provides heuristic and simulation model capabilities to business managers? A:Transaction processing system B:Networked system C:Business intelligence system D:Management information system E:Above all Correct answer: C. Your answer: C. Explanation: 11.1.3: Business intelligence systems provide managers with insight. In its second sense, business intelligence refers to the set of technologies that support this kind of data analysis. The ongoing evolution of decision support and BI tools has enabled a range of applications, including data querying, data mining, statistical analysis, reporting, scenario modeling, data visualization, and dashboards, used for everything from budgeting to advanced analytics.
12. which of the following is not a good example of BI? A:Decision Support Systems B:Statutory reporting to a Regulatory Body C:supporting Risk Management Decision Reporting D:strategic Analytics for Business Decision E:strategic Analytics for Business Decision Correct answer: B. Your answer: B. Explanation: none available.
1.4.2. Data Warehouse
A Data Warehouse (DW) is a combination of two primary components:
An integrated decision support database and the related software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources.
A data warehouse may also include dependent data marts, which are subset copies of data from the warehouse.
In its broadest context, a data warehouse includes any data stores or extracts used to support the delivery of data for BI purposes.
14. in its broadest context the data warehouse includes: A:either an Inmon or Kimball approach B:any data stores or extracts used to support the delivery for BI purposes C:An integrated data store, ETL logic, and extensive data cleansing routines D:data stores and extracts that can be transformed into star schemas E:all the data in the enterprise Correct answer: B. Your answer: C. Explanation: 11.1: The concept of the Data Warehouse (DW) emerged in the 1980s. The technology enables organizations to integrate data from different sources into a common data model. The integrated data provides insight into business operations and opens new possibilities for decision support and the creation of organizational value.
An Enterprise Data Warehouse (EDW) is a centralized data warehouse designed to service the BI needs of the entire organization.
1.4.3. Data Warehousing
Data Warehousing describes the operational extract, cleansing, transformation, control, and load processes that maintain the data in a data warehouse.
Data warehousing also includes processes that interact with Metadata repositories.
Traditionally, data warehousing focuses on structured data:
structured data: elements in defined fields, whether in files or tables, as documented in data models.
the BI and DW space now embraces semi-structured and unstructured data.
Semi-structured data, defined as electronic elements organized as semantic entities with no required attribute affinity, predates XML but not HTML
Unstructured data refers to data that is not predefined through a data model.
1.4.4. Approaches to Data Warehousing
Inmon defines a data warehouse as “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.”
A normalized relational model is used to store and manage data.
21. A key feature of Bill Inmon's approach to data warehousing is A:tight management of data dimensions B:a preference for supporting operational reporting C:its ability to operate on open source platforms D:a normalized relational model to store and manage data E:an exclusive focus on star schemas and cubes Correct answer: D. Your answer: D. Explanation: 11.1.3.5 Corporate Information Factory (Inmon): Bill Inmon's Corporate Information Factory (CIF) is one of the two principal patterns for building data warehouses. Inmon describes the data warehouse as a subject-oriented, integrated, time-variant, non-volatile collection of summarized and detailed historical data.
Kimball defines a warehouse as “a copy of transaction data specifically structured for query and analysis.”
Kimball’s approach calls for a dimensional model, NOT a normalized relational model.
Their definitions recognize similar core ideas:
1. Warehouses store data from other systems
2. The act of storage includes organizing the data in ways that increase its value
3. Warehouses make data accessible and usable for analysis
4. Organizations build warehouses because they need to make reliable, integrated data available to authorized stakeholders
5. Warehouse data serves many purposes, from support of workflow to operational management to predictive analytics
1.4.5. Corporate Information Factory (Inmon)
Bill Inmon’s Corporate Information Factory (CIF) describes the differences between warehouses and operational systems:
1. Subject-oriented:
The data warehouse is organized based on major business entities, rather than on a function or application.
2. Integrated:
Data in the warehouse is unified and cohesive. The same key structures, encoding and decoding of structures, data definitions, and naming conventions are applied consistently throughout the warehouse. Because data is integrated, warehouse data is not simply a copy of operational data. Instead, the warehouse becomes a system of record for the data.
3. Time variant:
The data warehouse stores data as it exists at a set point in time. Records in the DW are like snapshots. Each one reflects the state of the data at a moment of time. This means that querying data based on a specific time period will always produce the same result, regardless of when the query is submitted.
4. Non-volatile:
In the DW, records are not normally updated as they are in operational systems. Instead, new data is appended to existing data. A set of records may represent different states of the same transaction (see the sketch below).
5. Aggregate and detail data:
The data in the DW includes details of atomic level transactions, as well as summarized data. Operational systems rarely aggregate data. When warehouses were first established, cost and space considerations drove the need to summarize data. Summarized data can be persistent (stored in a table) or non-persistent (rendered in a view) in contemporary DW environments. The deciding factor in whether to persist data is usually performance.
6. Historical:
The focus of operational systems is current data. Warehouses contain historical data as well. Often they house vast amounts of it.
15. one of the key differences between operational systems and data warehouses is A:operational systems are available 24 x 7; data warehouses are available during business hours B:operational systems focus on historical data; data warehouses contain current data C:operational systems focus on current data; data warehouses contain historical data D:operational systems focus on data quality; data warehouses focus on data security E:operational systems focus on business processes; data warehouses focus on business strategies Correct answer: C. Your answer: C. Explanation: A data warehouse has two important components: an integrated decision-support database and the related software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources. To support historical, analytical, and BI needs, it also includes dependent data marts, which are copies of subsets of warehouse data. In its broadest context, the data warehouse includes any data stores or extracts used to support the delivery of data for BI purposes. 6.1.3: In practice, row-oriented storage layouts suit online transaction processing (OLTP) workloads centered on interactive transactions, while column-oriented layouts suit online analytical processing (OLAP) workloads, such as data warehouses that run a small number of highly complex queries over all the data (possibly gigabytes in size).
7. 8. which of the following statements about a data warehouse is NOT true? A:Performance is more important in a data warehouse environment than in an operational environment B:A user should know where the data comes from C:Programs may have to be written to provide relationships between data in a data warehouse that are NOT obvious in the operational environment D:Some data in a data warehouse may be derived rather than original data E:No answer Correct answer: E. Your answer: B. Explanation: 11.1.3.2 Data Warehouse: A data warehouse has two important components: an integrated decision-support database and the related software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources. In its broadest context, the data warehouse includes any data stores or extracts used to support the delivery of data for BI purposes. An Enterprise Data Warehouse (EDW) is a centralized data warehouse that serves the BI needs of the entire organization; it follows an enterprise data model to ensure consistency of decision-support activities across the enterprise.
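A minimal sketch of the time-variant, non-volatile pattern (plain Python; the field names are illustrative assumptions): new states are appended rather than updated in place, and a point-in-time query filters on the snapshot date, so the same query for the same date always returns the same result.

```python
from datetime import date

# Append-only history: each row is a snapshot of a customer's state.
history = [
    {"customer_id": 1, "status": "PROSPECT", "snapshot_date": date(2023, 1, 31)},
    {"customer_id": 1, "status": "ACTIVE",   "snapshot_date": date(2023, 2, 28)},
    {"customer_id": 1, "status": "CLOSED",   "snapshot_date": date(2023, 6, 30)},
]

def state_as_of(customer_id: int, as_of: date) -> dict | None:
    """Latest snapshot on or before as_of; never mutates history."""
    rows = [r for r in history
            if r["customer_id"] == customer_id and r["snapshot_date"] <= as_of]
    return max(rows, key=lambda r: r["snapshot_date"], default=None)

# Re-running this query later yields the same answer for the same date.
print(state_as_of(1, date(2023, 3, 15)))  # -> the ACTIVE snapshot
```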
CIF components include
1. Applications:
Applications perform operational processes. Detail data from applications is brought into the data warehouse and the operational data stores (ODS), where it can be analyzed.
2. Staging Area:
A database that stands between the operational source databases and the target databases. The data staging area is where the extract, transform, and load effort takes place. It is not used by end users. Most data in the data staging area is transient, although typically there is some relatively small amount of persistent data.
3. Integration and transformation:
In the integration layer, data from disparate sources is transformed so that it can be integrated into the standard corporate representation / model in the DW and ODS.
4. Operational Data Store (ODS):
An ODS is an integrated database of operational data. It may be sourced directly from applications or from other databases. ODSs generally contain current or near-term data (30-90 days), while a DW contains historical data as well (often several years of data). Data in ODSs is volatile, while warehouse data is stable. Not all organizations use ODSs. They evolved to meet the need for low latency data. An ODS may serve as the primary source for a data warehouse; it may also be used to audit a data warehouse.
2. One implementation for master data that serves as the hub for all reference and master data for all Online Transactional Processing (OLTP) applications is the A:federated database. B:data bus C:Operational Data Store (ODS) D:messaging system E:none Correct answer: C. Your answer: B. Explanation: 10.3: MDM requires tooling specifically designed to enable identity management. MDM can be implemented through data integration tools, data remediation tools, operational data stores (ODS), data sharing hubs (DSH), or specialized MDM applications.
5. Data marts:
Data marts provide data prepared for analysis. This data is often a sub-set of warehouse data designed to support particular kinds of analysis or a specific group of data consumers. For example, marts can aggregate data to support faster analysis. Dimensional modeling (using denormalization techniques) is often used to design user-oriented data marts.
20. A design approach for managing the risk of errors in data marts is A:purge the data in the data marts and reload from the data warehouse B:purge the data in the source system and reload the data warehouse C:purge the data generally and reload from the best system D:purge the data in the data warehouse and reload from the source systems E:purge the data in the data warehouse and copy back to the data mart Correct answer: B. Your answer: A. Explanation: 11.1.3: 5) Data marts provide data for subsequent analysis. This data is usually a subset of the data warehouse, used to support particular analyses or particular classes of consumers. For example, a data mart can aggregate data to support faster analysis. Dimensional models (using denormalization techniques) are often used for user-oriented data marts.
6. Operational Data Mart (OpDM):
An OpDM is a data mart focused on tactical decision support. It is sourced directly from an ODS, rather than from a DW. It shares characteristics of the ODS: it contains current or near-term data. Its contents are volatile.
7. Data Warehouse:
The DW provides a single integration point for corporate data to support management decision-making, and strategic analysis and planning. The data flows into a DW from the application systems and ODS, and flows out to the data marts, usually in one direction only. Data that needs correction is rejected, corrected at its source, and ideally re-fed through the system.
8. Operational reports:
Reports are output from the data stores.
9. Reference, Master, and external data:
In addition to transactional data from applications, the CIF also includes data required to understand transactions, such as reference and Master Data. Access to common data simplifies integration in the DW. While applications consume current master and Reference Data, the DW also requires historical values and the timeframes during which they were valid (see Chapter 10).
Figure 80 depicts movement within the CIF.

Movement from left to right
1. The purpose shifts from execution of operational functions to analysis
2. End users of systems move from front line workers to decision-makers
3. System usage moves from fixed operations to ad hoc uses
4. Response time requirements are relaxed (strategic decisions take more time than do daily operations)
5. Much more data is involved in each operation, query, or process
The data in DW and marts differs from that in applications
1. Data is organized by subject rather than function
2. Data is integrated data rather than ‘siloed’
3. Data is time-variant vs. current-valued only
4. Data has higher latency in DW than in applications
5. Significantly more historical data is available in DW than in applications
1.4.6. Dimensional DW (Kimball)
Kimball defines a data warehouse simply as “a copy of transaction data specifically structured for query and analysis”
Often referred to as Star Schema, dimensional models are composed of two kinds of tables (see the example below):
facts, which contain quantitative data about business processes (e.g., sales numbers),
dimensions, which store descriptive attributes related to fact data and allow data consumers to answer questions about the facts (e.g., how many units of product X were sold this quarter?)
5. The star schema is A:a common data model for data warehouse applications B:built on a central dimension table and a set of fact table C:very efficient for transaction processing D:based on dimensions such as "customer" and "product" E:All Correct answer: A. Your answer: A. Explanation: 11.1.3: Dimensional models, commonly called star schemas, are composed of fact tables (containing quantitative data about business processes, such as sales figures) and dimension tables (storing descriptive attributes related to the fact data, enabling data consumers to answer questions about the facts, such as how many units of product X were sold this quarter).
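A tiny star-schema sketch (Python; table and column names are illustrative assumptions): the fact table holds quantitative measures keyed to dimension tables, and the question "how many units of Product X were sold this quarter?" becomes a join plus an aggregation.

```python
# Dimension tables: descriptive attributes keyed by surrogate keys.
dim_product = {1: {"name": "Product X"}, 2: {"name": "Product Y"}}
dim_date = {101: {"day": "2023-01-15", "quarter": "2023-Q1"},
            102: {"day": "2023-02-20", "quarter": "2023-Q1"}}

# Fact table: one row per sale, with measures plus dimension keys.
fact_sales = [
    {"product_key": 1, "date_key": 101, "units": 5},
    {"product_key": 1, "date_key": 102, "units": 3},
    {"product_key": 2, "date_key": 102, "units": 7},
]

# "How many units of Product X were sold in 2023-Q1?"
units = sum(
    f["units"] for f in fact_sales
    if dim_product[f["product_key"]]["name"] == "Product X"
    and dim_date[f["date_key"]]["quarter"] == "2023-Q1"
)
print(units)  # 8
```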
The DW bus matrix shows the intersection of business processes that generate fact data and data subject areas that represent dimensions.
9. A data warehouse bus matrix represents A:conformed dimensions and their keys B:an ETL schema C:business processes and common facts D:business processes and common dimensions E:All Correct answer: D. Your answer: C. Explanation: see Table 11-1, an example data warehouse bus matrix.
The enterprise DW bus matrix can be used to represent the long-term data content requirements for the DW/BI system, independent of technology. This tool enables an organization to scope manageable development efforts.
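The bus matrix itself can be represented as a simple grid of business processes against conformed dimensions (a hypothetical illustration, not the DMBOK example):

```python
# Rows: business processes that generate facts.
# Columns: conformed dimensions shared across those processes.
dimensions = ["Date", "Product", "Customer", "Store"]
bus_matrix = {
    "Retail Sales":     {"Date", "Product", "Customer", "Store"},
    "Inventory":        {"Date", "Product", "Store"},
    "Customer Returns": {"Date", "Product", "Customer", "Store"},
}

# Print an X where a process uses a dimension.
print(f"{'Process':<18}" + "".join(f"{d:<10}" for d in dimensions))
for process, used in bus_matrix.items():
    row = "".join(f"{('X' if d in used else ''):<10}" for d in dimensions)
    print(f"{process:<18}{row}")
```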
Figure 81 represents Kimball’s Data Warehouse Chess Pieces view of DW/BI architecture

1. Operational source systems:
Operational / transactional applications of the enterprise. These create the data that is integrated into the ODS and DW. This component is equivalent to the application systems in the CIF diagram.
2. Data staging area:
Kimball’s staging includes the set of processes needed to integrate and transform data for presentation. It can be compared to a combination of CIF’s integration, transformation, and DW components. Kimball’s focus is on efficient end-delivery of the analytical data, a scope smaller than Inmon’s corporate management of data. Kimball’s enterprise DW can fit into the architecture of the data staging area.
3. Data presentation area:
Similar to the data marts in the CIF. The key architectural difference is an integrating paradigm of a ‘DW Bus,’ such as shared or conformed dimensions unifying the multiple data marts.
4. Data access tools:
Kimball’s approach focuses on end users’ data requirements. These needs drive the adoption of appropriate data access tools.
1.4.7. DW Architecture Components
Figure 82 depicts the architectural components of the DW/BI and Big Data Environment discussed in this section.

The evolution of Big Data has changed the DW/BI landscape by adding another path through which data may be brought into an enterprise.
Figure 82 also depicts aspects of the data lifecycle. Data moves from source systems, into a staging area where it may be cleansed and enriched as it is integrated and stored in the DW and/or an ODS.
From the DW it may be accessed via marts or cubes and used for various kinds of reporting. Big Data goes through a similar process, but with a significant difference: while most warehouses integrate data before landing it in tables, Big Data solutions ingest data before integrating it.
Source Systems
Source Systems, on the left side of Figure 82, include the operational systems and external data to be brought into the DW/BI environment.
These typically include operational systems such as CRM, Accounting, and Human Resources applications, as well as operational systems that differ based on industry. Data from vendors and external sources may also be included, as may DaaS, web content, and any Big Data computation results.
Data Integration
Data integration covers Extract, Transform, and Load (ETL), data virtualization, and other techniques of getting data into a common form and location. In a SOA environment, the data services layers are part of this component.
Data Storage Areas
1. Staging area:
A staging area is an intermediate data store between an original data source and the centralized data repository. Data is staged so that it can be transformed, integrated, and prepped for loading to the warehouse.
2. Reference and Master Data conformed dimensions:
Reference and Master Data may be stored in separate repositories. The data warehouse feeds new Master Data and is fed by conformed dimension contents from the separate repositories.
3. Central Warehouse:
Once transformed and prepped, the DW data usually persists in the central or atomic layer. This layer maintains all historical atomic data as well as the latest instance of the batch run. The data structure of this area is developed and influenced by performance needs and usage patterns. Several design elements are brought to bear:
The relationship between the business key and surrogate keys for performance
Creation of indices and foreign keys to support dimensions
Change Data Capture (CDC) techniques that are used to detect, maintain, and store history
4. Operational Data Store (ODS):
The ODS is a version of a central persisted store that supports lower latencies, and therefore operational use. Since the ODS contains a time window of data rather than the full history, it can be refreshed much more quickly than a warehouse. Sometimes real-time streams are snapshotted at predefined intervals into the ODS to enable integrated reporting and analysis.
5. Data marts:
A data mart is a type of data store often used to support presentation layers of the data warehouse environment. It is also used for presenting a departmental or functional sub-set of the DW for integrated reporting, query, and analysis of historical information. The data mart is oriented to a specific subject area, a single department, or a single business process. It can also form the basis of a virtualized warehouse where the combined marts comprise the resulting warehouse entity. Data integration processes will refresh, update, or expand the contents of the various marts from the persistence layer.
6. Cubes:
Three classic implementation approaches support Online Analytical Processing (OLAP). Their names relate to the underlying database types: Relational, Multi-dimensional, and Hybrid.
1.4.8. Types of Load Processing
Data warehousing involves two main types of data integration processes: historical loads and ongoing updates. Historical data is usually loaded only once, or a few times while working out data issues, and then never again. Ongoing updates are consistently scheduled and executed to keep the data in the warehouse up-to-date.
Historical Data
One advantage of a data warehouse is that it can capture detailed history of the data it stores. There are different methods to capture this detail. An organization that wants to capture history should design based on requirements. Being able to reproduce point-in-time snapshots requires a different approach than simply presenting current state.
The Inmon data warehouse suggests that all data is stored in a single data warehouse layer
The Kimball data warehouse suggests that the data warehouse is composed of a combination of departmental data marts containing cleansed, standardized, and governed data.
The data marts will store the history at the atomic level. Conformed dimensions and conformed facts will deliver enterprise level information.
Another approach, the Data Vault, also cleanses and standardizes as part of the staging process. History is stored in a normalized atomic structure; dimensional surrogate, primary, and alternate keys are defined. Ensuring that the business and surrogate key relationship remains intact becomes the secondary role of the vault – this is the data mart history.
By retaining the history inside the vault, reloading facts is possible when later increments introduce grain changes. It is possible to virtualize the presentation layer, facilitating agile incremental delivery and collaborative development with the business community. A final materialization process can implement a more traditional star data mart for production end user consumption.
Batch Change Data Capture
Data Warehouses are often loaded daily and serviced by a nightly batch window. The load process can accommodate a variety of change detection techniques (one of them is sketched after this list), as each source system may require a different change capture technique.

Time stamped Delta Load
Changes in the source system are stamped with the system date and time.
Log Table Delta Load
Source system changes are captured and stored in log tables.
Database Transaction Log
The database captures changes in the transaction log.
Message Delta
Source system changes are published as [near] real-time messages.
Full Load
No change indicator; tables are extracted in full and compared to identify changes. This is the easiest technique to implement, though full extracts and comparisons are the most costly to process.
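A minimal sketch of a time-stamped delta load (Python; the column names and watermark handling are assumptions for illustration): each run extracts only rows changed since the last successful load, then advances the watermark.

```python
from datetime import datetime

# Source rows carry a last-updated timestamp set by the source system.
source_rows = [
    {"id": 1, "amount": 100, "updated_at": datetime(2023, 3, 1, 9, 0)},
    {"id": 2, "amount": 250, "updated_at": datetime(2023, 3, 2, 14, 30)},
]

def delta_load(rows: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Extract rows changed since the last load and return the new watermark."""
    delta = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in delta), default=watermark)
    return delta, new_watermark

watermark = datetime(2023, 3, 1, 12, 0)   # end of the previous batch window
delta, watermark = delta_load(source_rows, watermark)
print(len(delta), watermark)              # 1 changed row; watermark advances
```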
Near-real-time and Real-time
With the onset of Operational BI (or Operational Analytics) pushing for lower latency and more integration of real-time or near-real-time data into the data warehouse, new architectural approaches emerged to deal with the inclusion of volatile data.
The impact of changes from new volatile data must be isolated from the bulk of the historical, non-volatile DW data. Typical architectural approaches for isolation include a combination of building partitions and using union queries across the different partitions.
Trickle feeds (Source accumulation):
Rather than run on a nightly schedule, trickle feeds execute batch loads on a more frequent schedule (e.g., hourly, every 5 minutes) or when a threshold is reached (e.g., 300 transactions, 1 GB of data). This allows some processing to happen during the day, but not as intensely as with a dedicated nightly batch process. Care is needed to ensure that if a trickle feed batch takes longer to complete than the time between feeds, the next feed is delayed so that the data is still loaded in proper order (a threshold-based sketch follows this list).
Messaging (Bus accumulation):
Message interaction in real-time or near-real-time is useful when extremely small packets of data (messages, events, or transactions) are published to a bus as they occur. Target systems subscribe to the bus, and incrementally process the packets into the warehouse as needed. Source systems and target systems are independent of each other. Data-as-a-Service (DaaS) frequently uses this method.
Streaming (Target accumulation):
Rather than waiting on a source-based schedule or threshold, a target system collects data as it is received into a buffer area or queue, and processes it in order. The resulting interaction or some aggregate may later appear as an additional feed to the warehouse.
2. Activities
2.1. Understand Requirements
2.1.1. Data warehouses bring together data that will be used in a range of different ways. Moreover, usage will evolve over time as users analyze and explore data.
begin with business goals and strategy.
Identify and scope the business areas,
then identify and interview the appropriate business people.
Ask what they do and why.
Capture specific questions they are asking now, and those they want to ask of the data.
Document how they distinguish between and categorize important aspects of the information.
Where possible, define and capture key performance metrics and calculations.
These can uncover business rules that provide the foundation for automation of data quality expectations.
Catalog requirements and prioritize them into those necessary for production go-live and adoption of the warehouse, and those that can wait.
Look for items that are simple and valuable to jump-start the productivity of the initial project release.
A DW/BI project requirements write-up should frame the whole context of the business areas and / or processes that are in scope.
2.2. Define and Maintain the DW/BI Architecture
1. The ‘how’ includes the hardware and software detail and the organizing framework to bring all the activities together.
2. Define DW/BI Technical Architecture
The best DW/BI architectures will design a mechanism to connect back to transactional level and operational level reports in an atomic DW.
A conceptual architecture is a starting point.
Prototyping can quickly prove or disprove key points before making expensive commitments to technologies or architectures.
A natural extension to this transformation process is the maintenance, or at least validation, of the logical model:
check the physical deployment against the logical model, and make updates if omissions or errors arise.
3. Define DW/BI Management Processes
Address production management with a coordinated and integrated maintenance process, delivering regular releases to the business community.
It is crucial to establish a standard release plan
2.3. Develop the Data Warehouse and Data Marts
2.3.1. Typically, DW/BI projects have three concurrent development tracks:
Data:
The data necessary to support the analysis the business wants to do. This track involves identifying the best sources for the data and designing rules for how the data is remediated, transformed, integrated, stored, and made available for use by the applications. This step also includes deciding how to handle data that doesn’t fit expectations.
Technology:
The back-end systems and processes supporting the data storage and movement. Integration with the existing enterprise is fundamental, as the warehouse is not an island unto itself. Enterprise Architecture teams, specifically Technology and Application specialists, usually manage this track.
Business Intelligence tools:
The suite of applications necessary for data consumers to gain meaningful insight from deployed data products.
2.3.2. Map Sources to Targets
Source-to-target mapping establishes transformation rules for entities and data elements from individual sources to a target system. Such mapping also documents lineage for each data element available in the BI environment back to its respective source(s).
The most difficult part of any mapping effort is determining valid links or equivalencies between data elements in multiple systems.
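A source-to-target mapping can be captured as a structured specification that documents lineage and drives the transformation at the same time (a minimal Python sketch; the system, field, and rule names are hypothetical):

```python
# Each entry maps a target element to its source and transformation rule,
# documenting lineage for the BI environment at the same time.
MAPPING = {
    "customer_name": {"source": "crm.cust.full_nm", "rule": "trim_upper"},
    "country_code":  {"source": "erp.addr.ctry",    "rule": "iso_alpha2"},
}

RULES = {
    "trim_upper": lambda v: v.strip().upper(),
    "iso_alpha2": lambda v: {"UNITED STATES": "US", "GERMANY": "DE"}.get(v.upper(), "??"),
}

def transform(source_record: dict) -> dict:
    """Apply the mapping to one flattened source record."""
    return {target: RULES[spec["rule"]](source_record[spec["source"]])
            for target, spec in MAPPING.items()}

row = {"crm.cust.full_nm": "  acme corp ", "erp.addr.ctry": "Germany"}
print(transform(row))   # {'customer_name': 'ACME CORP', 'country_code': 'DE'}
```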
2.3.3. Remediate and Transform Data
Data remediation or cleansing activities enforce standards and correct and enhance the domain values of individual data elements.
To reduce the complexity of the target system, source systems should be made responsible for data remediation and correction.
Data transformation focuses on activities that implement business rules within a technical system. Data transformation is essential to data integration.
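For instance, a remediation step might standardize domain values before a transformation applies a business rule (a minimal sketch; the code values and the rule are invented for illustration):

```python
# Remediation: enforce a standard domain for a status code.
STATUS_DOMAIN = {"A": "ACTIVE", "ACT": "ACTIVE", "I": "INACTIVE", "INACT": "INACTIVE"}

def remediate_status(raw: str) -> str:
    value = STATUS_DOMAIN.get(raw.strip().upper())
    if value is None:
        raise ValueError(f"status {raw!r} outside the approved domain")
    return value

# Transformation: implement a business rule on the cleansed value.
def is_billable(status: str, balance: float) -> bool:
    return status == "ACTIVE" and balance > 0

print(remediate_status(" act "), is_billable("ACTIVE", 12.5))  # ACTIVE True
```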
2.4. Populate the Data Warehouse
2.4.1. The largest part of the work in any DW/BI effort is the preparation and processing of the data.
2.4.2. The key factors to consider when defining a population approach are
required latency
availability of sources,
batch windows or upload intervals
target databases
dimensional aspects
timeframe consistency of the data warehouse and data mart.
2.4.3. The approach must also address
data quality processing
time to perform transformations
late-arriving dimensions
data rejects
2.5. Implement the Business Intelligence Portfolio
2.5.1. Group Users According to Needs
Users may move from one class to another as their skills increase or as they perform different functions
2.5.2. Match Tools to User Requirements
Remember that every BI tool comes with a price, requiring system resources, support, training, and architectural integration.
2.6. Maintain Data Products
2.6.1. Release Management

Release Management is critical to an incremental development process that grows new capabilities, enhances the production deployment, and ensures provision of regular maintenance across the deployed assets.
This process will keep the warehouse up-to-date, clean, and operating at its best.
17. Critical to the incremental development of the data warehouse is A:a strong release management process B:a strong incident management process C:a agile development team D:the assurance to include velocity, variety and veracity measurement E:a strong capacity management process Correct answer: A. Your answer: C. Explanation: 11.2.6.1 Release Management: Release management is critical to an incremental development process that adds new capabilities, enhances the production deployment, and ensures regular maintenance of the deployed assets. This process keeps the warehouse up-to-date, clean, and operating at its best.
2.6.2. Manage Data Product Development Lifecycle
While data consumers are using the existing DW, the DW team is preparing for the next iteration, with the understanding that not all items will go to production. Align iterations with the release plan.
2.6.3. Monitor and Tune Load Processes
Monitor load processing across the system for bottlenecks and dependencies. Employ database tuning techniques where and when needed, including partitioning, tuned backup, and recovery strategies. Archiving is a difficult subject in data warehousing.
2.6.4. Monitor and Tune BI Activity and Performance
A best practice for BI monitoring and tuning is to define and display a set of customer-facing satisfaction metrics. Average query response time and the number of users per day, week, or month are examples of useful metrics. In addition to the statistical measures available from the systems, it is useful to survey DW/BI customers regularly.
Transparency and visibility are the key principles that should drive DW/BI monitoring. The more one can expose the details of the DW/BI activities, the more data consumers can see and understand what is going on (and have confidence in the BI), and less direct end-customer support will be required. Providing a dashboard that exposes the high-level status of data delivery activities, with drill-down capability, is a best practice that allows an on-demand-pull of information by both support personnel and customers.
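As an illustration, customer-facing satisfaction metrics can be computed directly from a query log (a minimal sketch; the log fields are assumed for the example):

```python
from datetime import date
from statistics import mean

# Hypothetical BI query log: one entry per executed query.
query_log = [
    {"user": "ana", "day": date(2023, 5, 2), "response_sec": 1.8},
    {"user": "ben", "day": date(2023, 5, 2), "response_sec": 4.2},
    {"user": "ana", "day": date(2023, 5, 3), "response_sec": 2.1},
]

avg_response = mean(q["response_sec"] for q in query_log)
users_per_day: dict[date, set[str]] = {}
for q in query_log:
    users_per_day.setdefault(q["day"], set()).add(q["user"])

print(f"average query response: {avg_response:.2f}s")
for day, users in sorted(users_per_day.items()):
    print(day, "distinct users:", len(users))
```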
3. Tools
3.1. Choosing the initial set of tools can be a long process. It includes attempting to satisfy near-term requirements, non-functional specifications, and next-generation requirements that have yet to be defined.
3.1.1. 3. The following are all data warehousing and business intelligence tools EXCEPT A:performance management workbenches B:volumetric measures. C:database management systems D:data security devices. E:none Correct answer: B. Your answer: C. Explanation: B is unrelated to data warehousing and business intelligence.
3.2. Metadata Repository
3.2.1. Key to this effort is the ability to stitch Metadata together from a variety of sources. Automating and integrating population of this repository can be achieved with a variety of techniques.
3.2.2. Data Dictionary / Glossary
A data dictionary is necessary to support the use of a DW. The dictionary describes data in business terms and includes other information needed to use the data (e.g., data types, details of structure, security restrictions).
Often the content for the data dictionary comes directly from the logical data model. Plan for high quality Metadata by ensuring modelers take a disciplined approach to managing definitions as part of the modeling process.
In some organizations, business users actively participate in the development of the data dictionary by supplying, defining, and then stewarding corrections to definitions of subject area data elements.
19. A data warehouse deployment with multiple ETL, storage, and querying tools often suffers due to the lack of A:conflict between software vendors B:integration of the dictionaries to achieve common understanding C:disk space on the big data platform D:common data types in the source datasets E:quality data modelers Correct answer: B. Your answer: D. Explanation: 1. Data Dictionary / Glossary: A data dictionary is a necessary component supporting the use of a data warehouse. It describes data in business terms and includes other information needed to use the data (e.g., data types, structural details, security restrictions). Its content often comes directly from the logical data model. During modeling, modelers should take a disciplined approach to managing definitions in order to plan for high-quality Metadata.
3.2.3. Data and Data Model Lineage
Documented data lineage serves many purposes:
Investigation of the root causes of data issues
Impact analysis for system changes or data issues
Ability to determine the reliability of data, based on its origin
It is critical to ensure that this information is not discarded and that the logical and physical models are updated after deployment and are in sync.
3.3. Data Integration Tools
3.3.1. In selecting a tool, also account for these features that enable management of the system:
Process audit, control, restart, and scheduling
The ability to selectively extract data elements at execution time and pass that extract to a downstream system for audit purposes
Controlling which operations can or cannot execute and restarting a failed or aborted run
3.3.2. A variety of data integration tools also offer integration capabilities with the BI portfolio, supporting import and export of workflow messages, email, or even semantic layers.
3.4. Business Intelligence Tools Types
3.4.1. Operational reporting
is the application of BI tools to analyze business trends, both short-term (month-over-month) and longer-term (year-over-year). Operational reporting can also help discover trends and patterns. Use Tactical BI to support short-term business decisions.
Operational Reporting involves business users generating reports directly from transactional systems, operational applications, or a data warehouse.
Data exploration and reporting tools, sometimes called ad-hoc query tools, enable users to author their own reports or create outputs for use by others.
Often the reports created by business users become standard reports, not exclusively used for ad hoc business questions.
3.4.2. Business performance management (BPM)
includes the formal assessment of metrics aligned with organizational goals. This assessment usually happens at the executive level. Use Strategic BI to support long-term corporate goals and objectives.
Performance management is a set of integrated organizational processes and applications designed to optimize execution of business strategy; applications include budgeting, planning, and financial consolidation
Measurement and a feedback loop with positive reinforcement are key elements.
Another specialization has formed in this area: creating scorecards driven by dashboards for user interaction.
3.4.3. Descriptive, self-service analytics
provides BI to the front lines of the business, where analytical capabilities guide operational decisions. Operational analytics couples BI applications with operational functions and processes, to guide decisions in near-real-time.
3.4.4. Operational Analytic Applications
Henry Morris of IDC coined the term Analytic Applications in the 1990s, clarifying how they differ from general OLAP and BI tools (Morris, 1999).
3.4.5. Multi-dimensional Analysis – OLAP
Online Analytical Processing (OLAP) refers to an approach to providing fast performance for multi-dimensional analytic queries.
16. Which approach is considered most effective when supporting multi-dimensional business report requests? A:Bl B:OLAP C:CEDI D:OLTP E:ODS Correct answer: B. Your answer: B. Explanation: 11.3.3: Online analytical processing (OLAP) is an approach that provides fast performance for multi-dimensional analytic queries. The term arose in part to distinguish it clearly from OLTP (online transaction processing). Typical OLAP output is a matrix format: dimensions form the rows and columns of the matrix, and factors or measures are the values inside it.
Typically, OLAP tools have both a server component and an end-user, client-facing component installed on the desktop or available on the web. Some desktop components are accessible from within a spreadsheet, appearing as an embedded menu or function item. The architecture selected (ROLAP, MOLAP, HOLAP) will guide the development effort, but common to all will be definition of cube structure, aggregation needs, Metadata augmentation, and analysis of data sparsity.
Relational Online Analytical Processing (ROLAP):
ROLAP supports OLAP by using techniques that implement multi-dimensionality in the two-dimensional tables of relational database management systems (RDBMS). Star schema joins are a common database design technique used in ROLAP environments.
Multi-dimensional Online Analytical Processing (MOLAP):
MOLAP supports OLAP by using proprietary and specialized multi-dimensional database technology.
Hybrid Online Analytical Processing (HOLAP):
This is simply a combination of ROLAP and MOLAP. HOLAP implementations allow part of the data to be stored in MOLAP form and another part in ROLAP. Implementations vary in the control a designer has over the mix of partitioning.
The value of Online Analytical Processing (OLAP) tools and cubes is the reduction of the chance of confusion and erroneous interpretation, by aligning the data content with the analyst’s mental model. Common OLAP operations include the following (mimicked in the sketch after this list):
1. Slice:
A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
2. Dice:
The dice operation is a slice on more than two dimensions of a data cube, or more than two consecutive slices.
3. Drill down / up:
Drilling down or up is a specific analytical technique whereby the user navigates among levels of data, ranging from the most summarized (up) to the most detailed (down).
4. Roll-up:
A roll-up involves computing all of the data relationships for one or more dimensions. To do this, define a computational relationship or formula.
5. Pivot:
A pivot changes the dimensional orientation of a report or page display.
6. 18. Slice, Dice, Roll-up and Pivot are terms used in what kind of data processing? A:OLAP B:OLTP C:EIEIO D:EDI E:ODS Correct answer: A. Your answer: A. Explanation: 11.3.3: Common OLAP operations include slice and dice, drill down, drill up, roll-up, and pivot.
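These operations can be mimicked on a small data set with pandas (an illustrative sketch with invented data; real OLAP servers operate on cube structures rather than DataFrames):

```python
import pandas as pd

# A tiny fact set: a measure described by three dimensions.
df = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["X", "Y", "X", "Y"],
    "quarter": ["Q1", "Q1", "Q1", "Q2"],
    "units":   [5, 3, 7, 2],
})

# Slice: fix one dimension member (quarter = Q1).
q1 = df[df["quarter"] == "Q1"]

# Dice: restrict several dimensions at once.
dice = df[(df["quarter"] == "Q1") & (df["region"] == "East")]

# Roll-up: aggregate away a dimension (total units per region).
rollup = df.groupby("region")["units"].sum()

# Pivot: re-orient the report (regions as rows, products as columns).
pivot = pd.pivot_table(df, values="units", index="region",
                       columns="product", aggfunc="sum")
print(q1, dice, rollup, pivot, sep="\n\n")
```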
4. Techniques
4.1. Prototypes to Drive Requirements
4.1.1. Quickly prioritize requirements before the implementation activities begin by creating a demonstration set of data and applying discovery steps in a joint prototype effort. Advances in data virtualization technologies can alleviate some of the traditional implementation pains through collaborative prototyping techniques.
4.1.2. Profiling the data contributes to prototyping and helps reduce the risk associated with unexpected data. The DW is often the first place where the pain of poor quality data in source systems or data entry functions becomes apparent. Profiling also discloses differences between sources that may present obstacles to data integration. Data may be of high quality within its sources, but because sources differ, the data integration process becomes more complicated.
4.1.3. Evaluation of the state of the source data leads to more accurate up-front estimates for feasibility and scope of effort. The evaluation is also important for setting appropriate expectations. Plan to collaborate with the Data Quality and Data Governance team(s) and to draw on the expertise of other SMEs to understand data discrepancies and risks
4.2. Self-Service BI
4.2.1. Self-service is a fundamental delivery channel within the BI portfolio.
4.2.2. Visualization and statistical analysis tooling allows for rapid data exploration and discovery.
4.3. Audit Data that can be Queried
4.3.1. In order to maintain lineage, all structures and processes should have the capability to create and store audit information at a grain useful for tracking and reporting.
4.3.2. Allowing users to query this audit data enables the users to verify for themselves the condition and arrival of the data, which improves user confidence.
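For example, loads can stamp each row with audit metadata that users can later query to verify the arrival and condition of the data (a minimal sketch; the audit fields are illustrative):

```python
from datetime import datetime, timezone

audit_runs: list[dict] = []   # queryable audit store, one row per load run

def load_with_audit(batch_id: str, rows: list[dict]) -> list[dict]:
    """Stamp rows with lineage metadata and record the run."""
    loaded_at = datetime.now(timezone.utc)
    stamped = [{**r, "_batch_id": batch_id, "_loaded_at": loaded_at} for r in rows]
    audit_runs.append({"batch_id": batch_id, "rows": len(stamped),
                       "loaded_at": loaded_at})
    return stamped

facts = load_with_audit("2023-06-30-nightly", [{"id": 1}, {"id": 2}])

# A user can verify for themselves when and how much data arrived:
print([run for run in audit_runs if run["batch_id"] == "2023-06-30-nightly"])
```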
5. Implementation Guidelines
5.1. A stable architecture that can scale to meet future requirements is paramount to the success of a data warehouse. A production support team capable of dealing with the daily loading, analysis and end user feedback is mandatory. In addition, to sustain success, ensure that the warehouse and the business unit teams are aligned.
5.2. Readiness Assessment / Risk Assessment
5.2.1. Successful projects start with a Prerequisite Checklist. All IT projects should have business support, be aligned with strategy, and have a defined architectural approach. In addition, a DW should:
Define data sensitivity and security constraints
Perform tool selection
23. When performing an evaluation of analytic applications which of the following questions is least relevant to identify the level of effort needed? A:How much of the tool infrastructure meets our organizational infrastructure B:How much do the canned processes in the tool match our business C:No. of source systems we need to integrate into the tool D:The standard source systems for which ETL is supplied E:Annual costs such as license, maintenance, etc. Correct answer: E. Your answer: D. Explanation: 11.5.1 Readiness Assessment / Risk Assessment: There can be a gap between when an organization is willing to take on a new risk and when it is able to sustain that risk. Successful projects start with a Prerequisite Checklist. All IT projects should have business support, be aligned with strategy, and have a defined architectural approach. In addition, a DW should: 1) define data sensitivity and security constraints; 2) perform tool selection; 3) secure resources; 4) create an ingestion process to evaluate and receive source data.
Secure resources
Create an ingestion process to evaluate and receive source data
5.2.2. Identify and inventory sensitive or restricted data elements in the warehouse.
5.2.3. Account for security constraints before selecting tools and assigning resources.
5.3. Release Roadmap
5.3.1. Because they require a large development effort, warehouses are built incrementally. Whichever implementation method is chosen, be it waterfall, iterative, or agile, it should account for the desired end state. That is why a roadmap is a valuable planning tool.
22. During the implementation of a data warehouse a roadmap is used to A:construct intricate security authorization B:demonstrate alignment to the project plan C:articulate user requirements D:demonstrate progress towards the desired end state E:articulate data quality checkpoints Correct answer: D. Your answer: D. Explanation: 11.5.2 Release Roadmap: Because they require a large development effort, data warehouses are built incrementally. Whatever implementation method is chosen, be it waterfall, iterative, or agile, it should account for the desired end state. That is why a roadmap is a valuable planning tool.
5.3.2. An incremental approach leveraging the DW bus matrix as a communication and marketing tool is suggested.
5.4. Configuration Management
5.4.1. Configuration management aligns with the release roadmap and provides the necessary back office stitching and scripts to automate development, testing, and transportation to production.
5.5. Organization and Cultural Change
5.5.1. Starting with and keeping a consistent business focus throughout the DW/BI lifecycle is essential to success.
Business sponsorship: 业务倡议
Is there appropriate executive sponsorship?
Business goals and scope: 业务目标和范围
Is there a clearly identified business need, purpose, and scope for the effort?
Business resources: 业务资源
Is there a commitment by business management to the availability and engagement of the appropriate business subject matter experts?
Business readiness: 业务准备情况
Is the business partner prepared for a long-term incremental delivery? Have they committed themselves to establishing centers of excellence to sustain the product in future releases? How broad is the average knowledge or skill gap within the target community, and can that be crossed within a single increment?
Vision alignment: 愿景一致
How well does the IT Strategy support the Business Vision? It is vital to ensure that desired functional requirements correspond to business capabilities that are or can be sustained in the immediate IT roadmap. Any significant departures or material gaps in capability alignment can stall or stop a DW/BI program.
5.5.2. Many organizations have a dedicated team to manage the ongoing operations of the production environment.
6. DW/BI Governance
6.1. Enabling Business Acceptance 业务接受度
6.1.1. A key success factor is business acceptance of data, including the data being understandable, having verifiable quality, and having a demonstrable lineage.
Sign-off by the Business on the data should be part of the User Acceptance Testing.
6.1.2. A few critically important architectural sub-components, along with their supporting activities:
1. Conceptual Data Model: 概念数据模型
What information is core to the organization? What are the key business concepts and how are they related to each other?
2. Data quality feedback loop: 数据质量反馈循环
How are data issues identified and remediated? How are owners of the systems in which issues originate informed about problems and held accountable for fixing them? What is the remediation process for issues that are caused by the DW data integration processes?
3. End-to-end Metadata: 端到端元数据
How does the architecture support the integrated end-to-end flow of Metadata? In particular, is access to meaning and context designed into the architecture? How do data consumers answer basic questions like "What does this report mean?" or "What does this metric mean?"
4. End-to-end verifiable data lineage: 端到端可验证数据血缘
Are the items exposed to business users traceable to the source systems in an automated, maintained manner? Is a system of record identified for all data?
6.2. Customer / User Satisfaction
6.2.1. Perceptions of the quality of data will drive customer satisfaction but satisfaction is dependent on other factors as well, such as data consumers’ understanding of the data and the operations team’s responsiveness to identified issues.
6.3. Service Level Agreements
6.3.1. Business and technical expectations for the environments should be specified in Service Level Agreements (SLAs). Often the response time, data retention, and availability requirements differ greatly between classes of business needs and their respective supporting systems (e.g., ODS versus DW versus data mart).
6.4. Reporting Strategy 报表策略
6.4.1. Ensure that a reporting strategy exists within and across the BI Portfolio. A reporting strategy includes standards, processes, guidelines, best practices, and procedures. It will ensure users have clear, accurate, and timely information. The reporting strategy must address:
1. Security access to ensure that only entitled users will gain access to sensitive data elements
2. Access mechanisms to describe how users want to interact, report, examine or view their data
3. User community type and appropriate tool to consume it with
4. Nature of the reports (summary, detailed, exception), as well as frequency, timing, distribution, and storage formats
5. Potential use of visualization capabilities to provision graphical output
6. Trade-offs between timeliness and performance
6.4.2. Standard reports 标准报表 should be evaluated periodically to ensure they still provide value, since merely executing reports incurs storage and processing costs.
6.4.3. Data source governance monitoring and control are also vital. Ensure that appropriate levels of data are provisioned securely for authorized personnel, and that subscription data is accessible according to agreed-upon levels.
6.4.4. A Center of Excellence 卓越中心 can provide training, start-up sets, design best practices, data source tips and tricks and other point solutions or artifacts to help empower business users towards a self-service model. In addition to knowledge management, this center can provide timely communications across the developer, designer, analyst and subscribing user communities.
6.5. Metrics
6.5.1. Usage Metrics 使用指标
DW usage metrics typically include the number of registered users, as well as connected users or concurrent connected users.
6.5.2. Subject Area Coverage Percentages 主题域覆盖率
Subject area coverage percentages measure how much of the warehouse (from a data topology拓扑 perspective) is being accessed by each department.
6.5.3. Response and Performance Metrics 响应时间和性能指标
Most query tools measure response time.
Harvest load times 收集加载时间 for each data product in raw format from the population processes.
Most tools will retain, in a log or repository, query records, data refresh, and data extract times for the objects provided to the users.
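As an illustration of the usage and response-time metrics above, a minimal sketch follows; the log format and user names are hypothetical stand-ins for what a query tool would retain in its log or repository.

```python
from datetime import datetime

# (user, query_start, query_end) tuples as a stand-in for a tool's query log.
log = [
    ("alice", "2023-03-01T09:00:00", "2023-03-01T09:00:04"),
    ("bob",   "2023-03-01T09:00:02", "2023-03-01T09:00:07"),
    ("alice", "2023-03-01T09:10:00", "2023-03-01T09:10:01"),
]

registered_users = {user for user, _, _ in log}

def concurrent_at(ts: str) -> int:
    """Connected users whose queries span the given instant."""
    t = datetime.fromisoformat(ts)
    return sum(
        datetime.fromisoformat(s) <= t <= datetime.fromisoformat(e)
        for _, s, e in log
    )

response_secs = [
    (datetime.fromisoformat(e) - datetime.fromisoformat(s)).total_seconds()
    for _, s, e in log
]

print(len(registered_users))                     # 2 registered users
print(concurrent_at("2023-03-01T09:00:03"))      # 2 concurrent connections
print(sum(response_secs) / len(response_secs))   # mean response time ~3.33s
```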
7. Works Cited / Recommended
7.1. 1. Service oriented architecture services enable data warehousing in the following scenarios EXCEPT A:making Data Warehousing batch processing unnecessary B:assisting near-real time Data Warehousing C:assisting in data integration D:making OLAP and OLTP integration possible E:All 正确答案:A 你的答案:A 解析:题解:A逻辑错误。批处理依然必要
7.2. 4. Which is NOT true of the Information Supply Chain (ISC)? A:A developer can trace relationships and transformations. B:An analyst can drill down to the transactional level but can NOT review business event C:An analyst can drill down to the transactional level and review business events D:An analyst can drill down to the dimensional level E:none 正确答案:B 你的答案:B 解析:BC冲突
7.3. 6. Components of a Business Intelligence strategy do NOT include A:simplified data delivery B:zero latency. C:seamless data and technology integration D:end user training modules. E:All 正确答案:D 你的答案:B 解析:D与商务智能无关
7.4. 7. The purpose of affinity analysis 亲和性 is to A:support development of an event model B:develop object groupings for project definitions C:normalize a logical data model D:determine the granularity in a data model E:all 正确答案:B 你的答案:B 解析:11:Affinity analysis falls under the umbrella term of data mining which uncovers meaningful correlations between different entities according to their co-occurrence in a data set.
7.5. 10. The assumption that the data in a data warehouse is accurate at a specific point in time is:__ A:always understood between the business and IT B:NOT required C:time variant. 时间变量 D:real time E:None 正确答案:C 你的答案:D 解析:OLTP的数据可能随时间变化并隔一定时间同步到OLAP中
7.6. 13. A comparatively new architectural approach is where volatile data 不稳定数据 is provisioned in a data warehouse structure to provide transactional systems with a combination of historical and near real time data to meet customer needs this is a definition of A:Behavioral Decision Support Systems B:Operational Data store C:Active Data Warehousing 主动数据仓库 D:On Line Analytical Processing Cube E:On Line Transactional Processing System 正确答案:C 你的答案:B 解析:题解:Active Data Warehouse(ADW)is a data warehouse designed to provide real time or near-real time operational decision support whereas traditional data warehouses are aimed at providing decision making support to business executives for strategic purposes. An ADW is also differentiated from traditional data warehouse by its event driven actions and queries &triggers designed to operate continuously on large amounts of data as and when a change is occurred.
7.7. 24. Top down and 'bottom up' data analysis and profiling is best done in concert 同时进行 because A:Data quality tools are more productive when they are effectively configured B:It gets everyone involved C:It allows the profiler to show the business the true state of the data D:It balances business relevance and the actual state of the data E:It gives something for the architects to do while the profilers get on with the work. 正确答案:D 你的答案:A 解析:Top和业务相关,down和数据状态相关
Chapter 12: Metadata Management 元数据管理

1. Introduction
1.1. Definition
1.1.1. The most common definition of Metadata, “data about data,” is misleadingly simple.
1.1.2. The kind of information that can be classified as Metadata is wide-ranging. Metadata includes information about technical and business processes, data rules and constraints, and logical and physical data structures
1.1.3. It describes the data itself (e.g., databases, data elements, data models), the concepts the data represents (e.g., business processes, application systems, software code, technology infrastructure), and the connections (relationships) between the data and concepts.
18. which of these statements are true of a metadata repository? A:It is always decentralized B:None of these C:It is always centralized D:Data models are components of a Metadata repository E:It is always a hybrid architecture 正确答案:D 你的答案:D 解析:12.1:元数据最常见的定义是“关于数据的数据"。这个定义非常简单,但也容易引起误解。可以归类为元数据的信息范围很广,不仅包括技术和业务流程、数据规则和约束,还包括逻辑数据结构与物理数据结构等。它描述了数据本身(如数据库、数据元素、数据模型),数据表示的概念(如业务流程、应用系统、软件代码、技术基础设施),数据与概念之间的联系(关系)。元数据可以帮助组织理解其自身的数据、系统和流程,同时帮助用户评估数据质量,对数据库与其他应用程序的管理来说是不可或缺的。它有助于处理、维护、集成、保护和治理其他数据。
29. Critical to the success of the data warehouse is the ability to explain the data. the DMBOK knowledge area that practices these techniques is A:Document Content Management B:Metadata Management C:Reference and master data D:Data Storage and Operations E:Data Architecture 正确答案:B 你的答案:B 解析:12.1引言元数据最常见的定义是“关于数据的数据”。这个定义非常简单,但也容易引起误解。可以归类为元数据的信息范围很广,不仅包括技术和业务流程、数据规则和约束,还包括逻辑数据结构与物理数据结构等。它描述了数据本身(如数据库、数据元素、数据模型)
38. The role of metadata in data management is A:to help organizations understand its data,its systems and its workflows B:to group common data concepts C:to display appropriate data on screens and reports D:to build a big data solution E:to provide effective decision making 正确答案:A 你的答案:A 解析:12.1:元数据最常见的定义是“关于数据的数据”。这个定义非常简单,但也容易引起误解。可以归类为元数据的信息范围很广,不仅包括技术和业务流程、数据规则和约束,还包括逻辑数据结构与物理数据结构等。它描述了数据本身(如数据库、数据元素、数据模型),数据表示的概念(如业务流程、应用系统、软件代码、技术基础设施),数据与概念之间的联系(关系)。元数据可以帮助组织理解其自身的数据、系统和流程,同时帮助用户评估数据质量,对数据库与其他应用程序的管理来说是不可或缺的。它有助于处理、维护、集成、保护和治理其他数据。
1.1.4. Importance
Metadata is essential to data management as well as data usage.
It is also a risk management necessity.
Metadata is necessary to ensure an organization can identify private or sensitive data and that it can manage the data lifecycle for its own benefit, in order to meet compliance requirements and minimize risk exposure.
Without reliable Metadata, an organization does not know what data it has, what the data represents, where it originates, how it moves through systems, who has access to it, or what it means for the data to be of high quality.
1.1.5. DCMM
1.2. Business Drivers
1.2.1. Reliable, well-managed Metadata helps:
1. Increase confidence in data by providing context and enabling the measurement of data quality
2. Increase the value of strategic information (e.g., Master Data) by enabling multiple uses
3. Improve operational efficiency by identifying redundant data and processes
4. Prevent the use of out-of-date or incorrect data
5. Reduce data-oriented research time
6. Improve communication between data consumers and IT professionals
7. Create accurate impact analysis thus reducing the risk of project failure
8. Improve time-to-market by reducing system development life-cycle time
9. Reduce training costs and lower the impact of staff turnover through thorough documentation of data context, history, and origin
10. Support regulatory compliance
1.2.2. Metadata assists in representing information consistently, streamlining workflow capabilities, and protecting sensitive information, particularly when regulatory compliance is required.
25. The library of information about our data (our metadata) is built so that A:we can have a shared formalized view of requirements (eg what data quality we need) B:we can be consistent in our use of terminology C:All of these D:we can better manage it E:we can better understand it 正确答案:C 你的答案:C 解析:暂无解析
1.2.3. Poorly managed Metadata leads to:
1. Redundant data and data management processes
2. Replicated and redundant dictionaries, repositories, and other Metadata storage
3. Inconsistent definitions of data elements and risks associated with data misuse
4. Competing and conflicting sources and versions of Metadata which reduce the confidence of data consumers
5. Doubt about the reliability of Metadata and data
1.3. Goals and Principles
1.3.1. Goals
1. Document and manage organizational knowledge of data-related business terminology in order to ensure people understand data content and can use data consistently
39. A goal of metadata management is to manage data related business terminology in order to A:ensure people understand data content and can use data consistently B:ensure the business processes align to the data model C:ensure people understand data definition in Bl systems D:successfully size the database E:ensure accurate data requirements are gathered for reporting 正确答案:A 你的答案:A 解析:暂无解析
2. Collect and integrate Metadata from diverse sources to ensure people understand similarities and differences between data from different parts of the organization
3. Ensure Metadata quality, consistency, currency, and security
4. Provide standard ways to make Metadata accessible to Metadata consumers (people, systems, and processes)
5. Establish or enforce the use of technical Metadata standards to enable data exchange
1.3.2. principles
1. Organizational commitment:
Secure organizational commitment (senior management support and funding) to Metadata management as part of an overall strategy to manage data as an enterprise asset.
2. Strategy:
Develop a Metadata strategy that accounts for how Metadata will be created, maintained, integrated, and accessed. The strategy should drive requirements, which should be defined before evaluating, purchasing, and installing Metadata management products. The Metadata strategy must align with business priorities.
3. Enterprise perspective: 企业视角
Take an enterprise perspective to ensure future extensibility, but implement through iterative and incremental delivery to bring value.
4. Socialization: 潜移默化
Communicate the necessity of Metadata and the purpose of each type of Metadata; socialization of the value of Metadata will encourage business use and, as importantly, the contribution of business expertise.
5. Access:
Ensure staff members know how to access and use Metadata.
6. Quality:
Recognize that Metadata is often produced through existing processes (data modeling, SDLC, business process definition) and hold process owners accountable for the quality of Metadata.
(The producers of Metadata are accountable for its quality, not its managers.)
7. Audit:
Set, enforce, and audit standards for Metadata to simplify integration and enable use.
8. Improvement:
Create a feedback mechanism so that consumers can inform the Metadata Management team of Metadata that is incorrect or out-of-date.
1.4. Essential Concepts
1.4.1. Metadata vs. Data 元数据与数据
Metadata is a kind of data, and it should be managed as such.
A rule of thumb might be that one person’s Metadata is another’s data.
To manage their Metadata, organizations should not worry about the philosophical distinctions. Instead, they should define Metadata requirements focused on what they need Metadata for (to create new data, understand existing data, enable movement between systems, access data, and share data) and source data to meet these requirements.
1.4.2. Types of Metadata 元数据的类型
1. Metadata is often categorized into three types: business, technical, and operational. These categories enable people to understand the range of information that falls under the overall umbrella of Metadata, as well as the functions through which Metadata is produced.
1. 1. Types of meta-data include all of the following EXCEPT A:Operational meta-data B:technical meta-data C:business meta-data D:executive meta-data E:All 正确答案:D 你的答案:D 解析:12.1.3:2.元数据的类型元数据通常分为三种类型:业务元数据、技术元数据和操作元数据。
2. 22. The Meta Data repository enables us to establish multiple perspectives of data. These are A:The Business and Technical Perspective B:Structured and unstructured C:Dimensional and non dimensional perspective D:Internal and external E:3rd normal form and un normalized 正确答案:A 你的答案:A 解析:题解:(1)业务元数据(Business Metadata)主要关注数据的内容和条件,另包括与数据治理相
3. 36. Metadata is often categorized into three types, they are A:technical,infrastructure and instance B:business, technical and operational C:business, strategic and meta-metadata D:business, technical and strategic E:operational, reporting and analytical 正确答案:B 你的答案:B 解析:解析:12.1.3:2.元数据的类型元数据通常分为三种类型:业务元数据、技术元数据和操作元数据
4. Business Metadata 业务元数据
Business Metadata focuses largely on the content and condition of the data and includes details related to data governance.
1. Definitions and descriptions of data sets, tables, and columns
2. Business rules, transformation rules, calculations, and derivations
3. Data models
4. Data quality rules and measurement results
15. Data quality rules and measurement results are examples of: A:operational metadata B:industry metadata C:business metadata D:strategic metadata E:technical metadata 正确答案:C 你的答案:E 解析:12.1.3:业务元数据的示例包括:1)数据集、表和字段的定义和描述。2)业务规则、转换规则、计算公式和推导公式。3)数据模型(4)数据质量规则和检核结果。5)数据的更新计划。6)数据溯源和数据血缘。7)数据标准。
5. Schedules by which data is updated
6. Data provenance and data lineage
30. Data provenance and data lineage are examples of A:operational metadata B:industry metadata C:business metadata D:strategic metadata E:technical metadata 正确答案:C 你的答案:A 解析:12.1.3:(1)业务元数据(Business Metadata)主要关注数据的内容和条件,另包括与数据治理相关的详细信息,业务元数据包括主题域、概念、实体、属性的非技术名称和定义、属性的数据类型和其他特征,如范围描述、计算公式、算法和业务规则、有效的域值及其定义。业务元数据的示例包括:1)数据集、表和字段的定义和描述。2)业务规则、转换规则、计算公式和推导公式。3)数据模型。4)数据质量规则和检核结果。5)数据的更新计划。6)数据溯源和数据血缘。7)数据标准。8)特定的数据元素记录系统。9)有效值约束。10)利益相关方联系信息(如数据所有者、数据管理专员)。11)数据的安全/隐私级别。12)已知的数据问题,13)数据使用说明。
7. Data standards
8. Designations of the system of record for data elements
9. Valid value constraints
11. Data standards and Valid value constraints are examples of: A:operational metadata B:industry metadata C:business metadata D:strategic metadata E:technical metadata 正确答案:C 你的答案:E 解析:12.1.3:(1)业务元数据(Business Metadata)主要关注数据的内容和条件,另包括与数据治理相关的详细信息。业务元数据包括主题域、概念、实体、属性的非技术名称和定义、属性的数据类型和其他特征,如范围描述、计算公式、算法和业务规则、有效的域值及其定义。业务元数据的示例包括:
10. Stakeholder contact information (e.g., data owners, data stewards)
11. Security/privacy level of data
12. Known issues with data
13. Data usage notes
5. Technical Metadata 技术元数据
Technical Metadata provides information about the technical details of data, the systems that store data, and the processes that move it within and between systems
1. Physical database table and column names
10. Physical database table and column names are examples of: A:operational metadata B:industry metadata C:business metadata D:strategic metadata E:technical metadata 正确答案:E 你的答案:E 解析:解析:12.1.3:(2)技术元数据(Technical Metadata)提供有关数据的技术细节、存储数据的系统以及在系统内和系统之间数据流转过程的信息。技术元数据示例包括:1)物理数据库表名和字段名
2. Column properties
35. The Family name of a person is recorded in a system the column name is pname. pname is an example of A:Normalized data B:Data C:Poor table design D:Metadata E:Megadata 正确答案:D 你的答案:D 解析:12.1.3:(2)技术元数据(Technical Metadata)提供有关数据的技术细节、存储数据的系统以及在系统内和系统之间数据流转过程的信息。技术元数据示例包括:1)物理数据库表名和字段名。
3. Database object properties
4. Access permissions
5. Data CRUD (create, replace, update and delete) rules
9. Access permissions and Data CRUD(create,replace,update and delete) rules are examples of A:operational metadata B:industry metadata C:business metadata D:strategic metadata E:technical metadata 正确答案:E 你的答案:E 解析:12.1.3:(2)技术元数据(Technical Metadata)提供有关数据的技术细节、存储数据的系统以及在系统内和系统之间数据流转过程的信息。技术元数据示例包括:1)物理数据库表名和字段名。2)字段属性。3)数据库对象的属性。4)访问权限。5)数据CRUD(增、删、改、查)规则。6)物理数据模型,包括数据表名、键和索引。
6. Physical data models, including data table names, keys, and indexes
7. Documented relationships between the data models and the physical assets
8. ETL job details
9. File format schema definitions
10. Source-to-target mapping documentation
11. Data lineage documentation, including upstream and downstream change impact information
12. Program and application names and descriptions
13. Content update cycle job schedules and dependencies
14. Recovery and backup rules
15. Data access rights, groups, roles
27. What type of Meta-Data provides developers and administrators with knowledge and information about systems? A:Business meta-Data B:Data Stewardship Meta-Data C:Technical Operational Meta-Data D:Unstructured Meta-Data E:Process Meta-Data 正确答案:C 你的答案:C 解析:暂无解析
6. Operational Metadata 操作元数据
Operational Metadata describes details of the processing and accessing of data. For example:
1. Logs of job execution for batch programs
13. Logs of job execution for batch programs are examples of A:operational metadata B:industry metadata C:business metadata D:strategic metadata E:technical metadata 正确答案:A 你的答案:A 解析:12.1.3:(3)操作元数据(Operational Metadata)描述了处理和访问数据的细节,例如:1) 任务或批处理日志 3)调度异常处理。4)审计、平衡、控制度量的结果。5)错误日志。
2. History of extracts and results
3. Schedule anomalies
4. Results of audit, balance, control measurements
12. Results of audit balance control measurements 审计余额控制计量结果 are examples of: A:operational metadata B:industry metadata C:business metadata D:strategic metadata E:technical metadata 正确答案:A 你的答案:C 解析:12.1.3:(3)操作元数据(Operational Metadata)描述了处理和访问数据的细节,例如:3)调度异常处理。4)审计、平衡、控制度量的结果。5)错误日志。
5. Error Logs
6. Reports and query access patterns, frequency, and execution time
7. Patches and Version maintenance plan and execution, current patching level
8. Backup, retention, date created, disaster recovery provisions
9. SLA requirements and provisions
10. Volumetric and usage patterns
11. Data archiving and retention rules, related archives
12. Purge criteria
13. Data sharing rules and agreements
14. Technical roles and responsibilities, contacts
2. Outside of information technology, for example, in library or information science, Metadata is described using a different set of categories:
Descriptive Metadata 描述元数据 (e.g., title, author, and subject) describes a resource and enables identification and retrieval.
Structural Metadata 结构元数据 describes relationships within and among resources and their component parts (e.g., number of pages, number of chapters).
Administrative Metadata 管理元数据 (e.g., version numbers, archive dates) is used to manage resources over their lifecycle.
1.4.3. ISO / IEC 11179 Metadata Registry Standard 元数据的注册标准
ISO’s Metadata Registry Standard, ISO/IEC 11179, provides a framework for defining a Metadata registry. It is designed to enable Metadata-driven data exchange, based on exact definitions of data, beginning with data elements. The standard is structured in several parts:
Part 1: Framework for the Generation and Standardization of Data Elements
Part 2: Classification
Part 3: Basic Attributes of Data Elements
Part 4: Rules and Guidelines for the Formulation of Data Definitions
Part 5: Naming and Identification Principles for Data Elements
Part 6: Registration of Data Elements
37. The ISO Metadata Registry standard that provides a framework for defining a metadata registry A:lSO 4590 B:IsO 9001 C:ISO MD 1 D:ISO 4-20-99 E:ISO/IEC11179 正确答案:E 你的答案:E 解析:12.1.3 ISO/IEC 11179元数据注册标准ISO的元数据注册标准ISO/IEC11179中提供了用于定义元数据注册的框架,旨在基于数据的精确定义,从数据元素开始,实现元数据驱动的数据交换。
The corresponding Chinese national standard is GB/T 18391.1–18391.6 (2009).
Other metadata standards include ID3, Exif, ISO 3166-1, ISO 19115, Dublin Core, and ICD.
1.4.4. Metadata for Unstructured Data 非结构化数据的元数据
1. descriptive Metadata 描述元数据 such as catalog information and thesauri keywords;
2. structural Metadata 结构元数据 such as tags, field structures, format;
3. administrative Metadata 管理元数据 such as sources, update schedules, access rights, and navigation information; bibliographic Metadata 书目元数据 such as library catalog entries;
4. record keeping Metadata 记录元数据 such as retention policies
5. preservation Metadata 保存元数据 such as storage, archival condition, and rules for conservation.
21. which of the following is a Meta-Data scheme focused specifically on documents? A:Business Meta-Data B:Descriptive Meta-Data C:Structural Meta-Data D:Administrative Meta-Data E:Preservation Meta-Data 正确答案:E 你的答案:B 解析:12.1.3. 保存元数据,如存储、归档条件和保存规则
1.4.5. Sources of Metadata 元数据来源
1. The majority of operational Metadata is generated as data is processed.
2. Much of the technical Metadata required to manage databases and the business Metadata required to use data can be collected and developed as part of project work.
3. Well-defined business Metadata is reusable from project-to-project and can drive a consistent understanding of how business concepts are represented in different data sets.
4. Application Metadata Repositories 应用程序的元数据存储库
A Metadata repository refers to the physical tables in which the Metadata is stored. Often these are built into modeling tools, BI tools, and other applications. As an organization matures, it will want to integrate Metadata from repositories in these applications to enable data consumers to look across the breadth of information.
5. Business Glossary 业务术语表
1. The purpose of a business glossary is to document and store an organization’s business concepts and terminology, definitions, and the relationships between those terms.
23. A business perspective product in the meta Data repository is A:Systems Inventory B:Physical Data Model C:Data Dictionary D:ETL flow E:Data Glossary 正确答案:E 你的答案:C 解析:12.1.3:(13)参考数据库参考数据记录各种类型的枚举数据(值域)的业务价值和描述,在系统中的上下文中使用。用于管理参考数据的工具,还能够管理相同或不同业务领域内不同编码值之间的关系。这些工具套件通常提供将收集的参考数据发送到元数据存储库的功能,元数据存储库则提供将参考数据与业务词汇表以及物理实现该数据的位置(如列或字段)相关联的机制。
2. The business glossary application is structured to meet the functional requirements of the three core audiences:
Business users 业务用户
Data analysts, research analysts, management, and executive staff use the business glossary to understand terminology and data.
Data Stewards 数据管理专员
Data Stewards use the business glossary to manage the lifecycle of terms and definitions and to enhance enterprise knowledge by associating data assets with glossary terms.
Technical users 技术用户
Technical users use the business glossary to make architecture, systems design, and development decisions, and to conduct impact analysis.
3. The business glossary should capture business term attributes such as the following (a sketch of such a record follows the list):
1. Term name, definition, acronym or abbreviation, and any synonyms
2. Business unit and or application responsible for managing the data associated with the terminology
3. Name of the person identifying the term, and date updated
4. Categorization or taxonomy association for the term (business functional association)
5. Conflicting definitions that need resolution, nature of the problem, action timeline
6. Common misunderstandings in terms
7. Algorithms supporting definitions
8. Lineage
9. Official or authoritative source for the data supporting the term
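As promised above, here is a minimal sketch of one glossary term record carrying these attributes; the field names and the example values are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    name: str
    definition: str
    acronym: str = ""
    synonyms: list[str] = field(default_factory=list)
    owning_business_unit: str = ""              # unit managing the data
    identified_by: str = ""                     # person who identified the term
    last_updated: str = ""                      # ISO date
    taxonomy: list[str] = field(default_factory=list)
    conflicting_definitions: list[str] = field(default_factory=list)
    common_misunderstandings: str = ""
    algorithm: str = ""                         # calculation supporting the definition
    lineage: str = ""
    authoritative_source: str = ""

term = GlossaryTerm(
    name="Customer",
    definition="A party that has purchased or contracted for a product.",
    synonyms=["Client", "Account holder"],
    owning_business_unit="Sales Operations",
    authoritative_source="CRM system of record",
)
print(term.name, "-", term.definition)
```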
4. Every business glossary implementation should have a basic set of reports to support the governance processes.
5. Ease of use and functionality can vary widely across business glossary products.
6. Business Intelligence (BI) Tools 商务智能工具
Business Intelligence tools produce various types of Metadata relevant to the Business Intelligence design including overview information, classes, objects, derived and calculated items, filters, reports, report fields, report layout, reports users, report distribution frequency, and report distribution channels.
7. Configuration Management Tools 配置管理工具
Configuration management tools or databases (CMDB) provide the capability to manage and maintain Metadata specifically related to the IT assets, the relationships among them, and contractual details of the assets. Each asset in the CMDB is referred to as a configuration item (CI). Standard Metadata is collected and managed for each CI type. Many organizations integrate the CMDB with the change management processes to identify the related assets or applications impacted by a change to a specific asset. Repositories provide mechanisms to link the assets in the Metadata repository to the actual physical implementation details in the CMDB to give a complete picture of the data and the platforms.
8. Data Dictionaries 数据字典
A data dictionary defines the structure and contents of data sets, often for a single database, application, or warehouse. The dictionary can be used to manage the names, descriptions, structure, characteristics, storage requirements, default values, relationships, uniqueness, and other attributes of every data element in a model. It should also contain table or file definitions.
9. Data Integration Tools 数据集成工具
Many of these tools generate transient files, which might contain copies or derived copies of the data. These tools are capable of loading data from various sources and then operating on the loaded data through grouping, remediation, re-formatting, joining, filtering, or other operations, and then generating output data, which is distributed to the target locations. They document lineage as data moves between systems. Any successful Metadata solution should be able to use the lineage Metadata as it moves through the integration tools and expose it as a holistic lineage from the actual sources to the final destinations.
Data integration tools also provide Metadata about the execution of the various data integration jobs, including last successful run, duration, and job status. Some Metadata repositories can extract the data integration runtime statistics and Metadata and expose it alongside the data elements.
10. Database Management and System Catalogs 数据库管理和系统目录
Database catalogs are an important source of Metadata. They describe the content of databases, along with sizing information, software versions, deployment status, network uptime, infrastructure uptime, availability, and many other operational Metadata attributes. The most common form of database is relational. Relational databases manage the data as a set of tables and columns, where a table contains one or more columns, indexes, constraints, views, and procedures.
32. Where is the best place to find the following metadata: database table names, column names and indexes A:Logical data model B:Enterprise data model C:Detailed business processes D:Security access authorization E:Database catalogue 正确答案:E 你的答案:A 解析:12.1.3.:(7)数据库管理和系统目录数据库目录是元数据的重要来源,它们描述了数据库的内容、信息大小、软件版本、部署状态、网络正常运行时间、基础架构正常运行时间、可用性,以及许多其他操作元数据属性。最常见的数据库形式是关系型的,关系型数据库将数据作为一组表和列进行管理,其中表包含一个或多个列、索引、约束、视图和存储过程。元数据解决方案应该能够连接到各种数据库和数据集,并读取数据库公开的所有元数据。一些元数据存储库工具可以集成系统管理工具中公开的元数据,以提供描述物理资产的更全面的图像。
A Metadata solution should be able to connect to the various databases and data sets and read all of the Metadata exposed by the database. Some of the Metadata repository tools can integrate the exposed Metadata from the system management tools to provide a more holistic picture about the captured physical assets.
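A hedged sketch of reading technical Metadata straight from a relational database catalog: the `information_schema.columns` view is standard SQL (shown here as it works in PostgreSQL), while the psycopg2 driver, connection string, and schema name are assumptions for illustration.

```python
import psycopg2

conn = psycopg2.connect("dbname=warehouse user=metadata_reader")
cur = conn.cursor()
cur.execute("""
    SELECT table_name, column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'public'
    ORDER BY table_name, ordinal_position
""")
# Each row is a piece of technical Metadata a repository scanner could harvest.
for table, column, dtype, nullable in cur.fetchall():
    print(f"{table}.{column}: {dtype} (nullable={nullable})")
cur.close()
conn.close()
```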
11. Data Mapping Management Tools 数据映射管理工具
Mapping management tools are used during the analysis and design phase of a project to transform requirements into mapping specifications, which can then be consumed directly by a data integration tool or used by the developers to generate data integration code.
Mapping documentation is also often held in Excel documents across the enterprise.
12. Data Quality Tools 数据质量工具
Data quality tools assess the quality of data through validation rules. Most of these tools provide the capability to exchange the quality scores and profiles patterns with other Metadata repositories, enabling the Metadata repository to attach the quality scores to the relevant physical assets.
13. Directories and Catalogs 字典和目录
A directory or catalog contains information about systems, sources, and locations of data within an organization.
14. Event Messaging Tools 事件消息工具
Event messaging tools move data between diverse systems. To do so, they require a lot of Metadata. They also generate Metadata that describes this movement. These tools include graphic interfaces through which they manage the logic of data movement. They can export the interfaces implementation details, movement logic, and processing statistics to other Metadata repositories.
MQ
15. Modeling Tools and Repositories 建模工具和存储库
Data modeling tools are used to build various types of data models: conceptual, logical, and physical. These tools produce Metadata relevant to the design of the application or system model, like subject areas, logical entities, logical attributes, entity and attribute relationships, super types and subtypes, tables, columns, indexes, primary and foreign keys, integrity constraints, and other types of attribution from the models. Metadata repositories can ingest the models created by these tools and integrate the imported Metadata into the repository. Modeling tools are often the source of data dictionary content.
As-designed vs. as-implemented (production) metadata
16. Reference Data Repositories 参考数据库
Reference Data documents the business values and descriptions of the various types of enumerated data (domains) and their contextual use in a system. Tools used to manage Reference Data are also capable of managing relationships between the various codified values within the same or across domains. These suites of tools normally provide capabilities to send the collected Reference Data to a Metadata repository, which in turn will provide mechanisms to associate the Reference Data to the business glossary and to the locations where it is physically implemented like columns or fields.
2. A set of allowable data values is A:called codes B:a value domain C:considered conformed D:called valid values E:all 正确答案:B 你的答案:D 解析:12.1.3:参考数据记录各种类型的枚举数据(值域)的业务价值和描述,在系统中的上下文中使用。
40. Which of the following statements regarding a value domain is False A:Conforming value domains across the organization facilitates data quality B:A value domain is a set of allowed values for a given code set C:Value domains are defined by external standard organizations D:More than one set of reference data value domains may refer to the same conceptual domain E:a value domain provides a set of permissible values by which a data element can be implemented 正确答案:C 你的答案:B 解析:12.1.3.:(13)参考数据库参考数据记录各种类型的枚举数据(值域)的业务价值和描述,在系统中的上下文中使用。用于管理参考数据的工具,还能够管理相同或不同业务领域内不同编码值之间的关系。这些工具套件通常提供将收集的参考数据发送到元数据存储库的功能,元数据存储库则提供将参考数据与业务词汇表以及物理实现该数据的位置(如列或字段)相关联的机制。
17. Service Registries 服务注册
A service registry manages and stores the technical information about services and service end-points from a service oriented architecture (SOA) perspective. For example, definitions, interfaces, operations, input and output parameters, policies, versions, and sample usage scenarios. Some of the most important Metadata related to services includes service version, location of service, data center, availability, deployment date, service port, IP address, stats port, connection timeout, and connection retry timeout
18. Other Metadata Stores 其他元数据存储
Other Metadata stores include specialized lists such as event registries, source lists or interfaces, code sets, lexicons, spatial and temporal schema, spatial reference, and distribution of digital geographic data sets, repositories of repositories, and business rules
1.4.6. Types of Metadata Architecture 元数据架构的类型
1. Conceptually, all Metadata management solutions include architectural layers that correspond to points in the Metadata lifecycle:
1. Metadata creation and sourcing
2. Metadata storage in one or more repositories
3. Metadata integration
4. Metadata delivery
5. Metadata usage
6. Metadata control and management
2. Non-Managed Metadata Architecture 没有管理的元数据架构
P2P (point-to-point): tools exchange Metadata directly with one another, with no managed repository in between.
3. Point to point meta data architecture requires a separate usually bi-directional___ for each unique tool pair. A:bridge B:data flow C:environment D:database E:storage 正确答案:A 你的答案:A 解析:12.1.3.(4)双向元数据架构另一种高级架构方法是双向元数据架构,它允许元数据在架构的任何部分(源、数据集成、用户界面)中进行更改,然后将变更从存储库(代理)
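A small illustration of why the point-to-point approach scales poorly: with a separate (usually bi-directional) bridge per unique tool pair, the number of bridges grows quadratically with the number of tools, whereas a centralized repository needs only one connector per tool.

```python
def p2p_bridges(n_tools: int) -> int:
    """Number of unique tool pairs, each requiring its own bridge."""
    return n_tools * (n_tools - 1) // 2

for n in (3, 5, 10, 20):
    print(f"{n} tools: {p2p_bridges(n)} bridges vs. {n} repository connectors")
# 20 tools: 190 bridges vs. 20 repository connectors
```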
3. Centralized Metadata Architecture 集中式元数据架构

A centralized architecture consists of a single Metadata repository that contains copies of Metadata from the various sources.
Organizations with limited IT resources, or those seeking to automate as much as possible, may choose to avoid this architecture option. Organizations seeking a high degree of consistency within the common Metadata repository can benefit from a centralized architecture.
Advantages of a centralized repository include:
1. High availability, since it is independent of the source systems
2. Quick Metadata retrieval, since the repository and the query reside together
3. Resolved database structures not affected by the proprietary nature of third party or commercial systems
4. Extracted Metadata may be transformed, customized, or enhanced with additional Metadata that maynot reside in the source system, improving quality
Some limitations of the centralized approach include:
1. Complex processes are necessary to ensure that changes in source Metadata are quickly replicated into the repository
2. Maintenance of a centralized repository can be costly
3. Extraction could require custom modules or middleware
4. Validation and maintenance of customized code can increase the demands on both internal IT staff and the software vendors
4. Distributed Metadata Architecture 分布式元数据架构

A completely distributed architecture maintains a single access point. The Metadata retrieval engine responds to user requests by retrieving data from source systems in real time; there is no persistent repository.
Advantages of distributed Metadata architecture include:
1. Metadata is always as current and valid as possible because it is retrieved from its source
2. Queries are distributed, possibly improving response and process time
3. Metadata requests from proprietary systems are limited to query processing rather than requiring a detailed understanding of proprietary data structures, therefore minimizing the implementation and maintenance effort required
4. Development of automated Metadata query processing is likely simpler, requiring minimal manual intervention
5. Batch processing is reduced, with no Metadata replication or synchronization processes
Distributed architectures also have limitations:
1. No ability to support user-defined or manually inserted Metadata entries since there is no repository in which to place these additions
2. Standardizing the presentation of Metadata from various systems is difficult
3. Query capabilities are directly affected by the availability of the participating source systems
4. The quality of Metadata depends solely on the participating source systems
5. Hybrid Metadata Architecture 混合式元数据架构

A hybrid architecture combines characteristics of centralized and distributed architectures
Many organizations can benefit from a hybrid architecture,
Organizations with more static Metadata and smaller Metadata growth profiles may not see the maximum potential from this architecture alternative.
6. Bi-Directional Metadata Architecture 双向元数据架构
A bi-directional architecture allows Metadata to change in any part of the architecture (source, data integration, user interface); changes are then coordinated from the repository (broker) back into the original source.
Various challenges are apparent in this approach.
2. Activities
2.1. Define Metadata Strategy
2.1.1. Initiate Metadata strategy planning:
The goal of initiation and planning is to enable the Metadata strategy team to define its short- and long-term goals. Planning includes drafting a charter, scope, and objectives aligned with overall governance efforts and establishing a communications plan to support the effort. Key stakeholders should be involved in planning.
2.1.2. Conduct key stakeholder interviews:
Interviews with business and technical stakeholders provide a foundation of knowledge for the Metadata strategy.
2.1.3. Assess existing Metadata sources and information architecture:
Assessment determines the relative degree of difficulty in solving the Metadata and systems issues identified in the interviews and documentation review. During this stage, conduct detailed interviews of key IT staff and review documentation of the system architectures, data models, etc.
2.1.4. Develop future Metadata architecture:
Refine and confirm the future vision, and develop the long-term target architecture for the managed Metadata environment in this stage. This phase must account for strategic components, such as organization structure, alignment with data governance and stewardship, managed Metadata architecture, Metadata delivery architecture, technical architecture, and security architecture.
2.1.5. Develop a phased implementation plan:
Validate, integrate, and prioritize findings from the interviews and data analyses. Document the Metadata strategy and define a phased implementation approach to move from the existing to the future managed Metadata environment.
2.2. Understand Metadata Requirements
2.2.1. Metadata requirements start with content: What Metadata is needed and at what level.
2.2.2. There are also many functionality-focused requirements associated with a comprehensive Metadata solution
1. Volatility 更新频次: How frequently Metadata attributes and sets will be updated
2. Synchronization 同步情况: Timing of updates in relation to source changes
3. History: 历史信息 Whether historical versions of Metadata need to be retained
4. Access rights 访问权限: Who can access Metadata and how they access it, along with specific user interface functionality for access
5. Structure 存储结构: How Metadata will be modeled for storage
6. Integration 集成要求: The degree of integration of Metadata from different sources; rules for integration
7. Maintenance 运维要求: Processes and rules for updating Metadata (logging and referring for approval)
8. Management 管理要求: Roles and responsibilities for managing Metadata
9. Quality 质量要求: Metadata quality requirements
10. Security 安全要求: Some Metadata cannot be exposed because it will reveal the existence of highly protected data
2.3. Define Metadata Architecture
2.3.1. Rules
A Metadata Management system must be capable of extracting Metadata from many sources.
A managed Metadata environment should isolate the end user from the various and disparate Metadata sources.
Design of the architecture depends on the specific requirements of the organization. Three technical architectural approaches to building a common Metadata repository mimic the approaches to designing data warehouses: centralized, distributed, and hybrid
2.3.2. Steps
1. Create MetaModel 创建元模型
Create a data model for the Metadata repository, or metamodel 元模型, as one of the first design steps after the Metadata strategy is complete and the business requirements are understood.
The metamodel is in itself a valuable source of Metadata.
Metadata repository metamodel
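As an illustration, a minimal sketch (with invented entity and field names) of the kind of structure a repository metamodel records: business terms linked to the physical columns that implement them.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnAsset:
    """A physical column harvested from a database catalog."""
    table_name: str
    column_name: str
    data_type: str

@dataclass
class BusinessTerm:
    """A glossary term; this metamodel says a term may be implemented
    by zero or more physical columns."""
    name: str
    definition: str
    implemented_by: list[ColumnAsset] = field(default_factory=list)

term = BusinessTerm(
    name="Customer ID",
    definition="Unique identifier assigned to a customer.",
    implemented_by=[ColumnAsset("dim_customer", "customer_id", "INTEGER")],
)
print(term.name, "->",
      [f"{c.table_name}.{c.column_name}" for c in term.implemented_by])
```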

2. Apply Metadata Standards 应用元数据标准
The Metadata solution should adhere to the agreed-upon internal and external standards as identified in the Metadata strategy.
Metadata should be monitored for compliance by governance activities. An organization's internal Metadata standards include naming conventions, custom attributions, security, visibility, and processing documentation.
3. Manage Metadata Stores 管理元数据存储
Implement control activities to manage the Metadata environment
Control activities should have data governance oversight.
Control activities
1. Job scheduling and monitoring
2. Load statistical analysis
3. Backup, recovery, archive, purging
4. Configuration modifications
5. Performance tuning
6. Query statistics analysis
7. Query and report generation
8. Security management
Quality control activities include:
1. Quality assurance, quality control
2. Frequency of data update – matching sets to timeframes
3. Missing Metadata reports
4. Aging Metadata report
Metadata management activities include:
1. Loading, scanning, importing and tagging assets
2. Source mapping and movement
3. Versioning
4. User interface management
5. Linking data sets Metadata maintenance – for NoSQL provisioning
6. Linking data to internal data acquisition – custom links and job Metadata
7. Licensing for external data sources and feeds
8. Data enhancement Metadata, e.g., Link to GIS
And training, including:
1. Education and training of users and data stewards
2. Management metrics generation and analysis
3. Training on the control activities and query and reporting
2.4. Create and Maintain Metadata
2.4.1. Several general principles of Metadata management describe the means to manage Metadata for quality:
Accountability 责任: Recognize that Metadata is often produced through existing processes (data modeling, SDLC, business process definition) and hold process owners accountable 流程的执行者负责 for the quality of Metadata.
Standards 标准: Set, enforce, and audit standards for Metadata to simplify integration and enable use.
31. By setting enforcing and auditing metadata standards organizations hope to A:ensure the appropriate classification or meta-metadata B:simplify integration and enable use C:standardize business rules in operational processes D:ease of understanding data dictionaries E:provide activities for the data governance office 正确答案:B 你的答案:B 解析:正确答案:B来源:12.2.4题解:如12.2.4节所述,元数据是通过一系列过程创建的,并存储在组织中的不同地方。为保证高质量的元数据,应把元数据当作产品来进行管理。好的元数据不是偶然产生的,而是认真计划的结果(参见第13章)。元数据管理的几个一般原则描述了管理元数据质量的方法:1)责任(Accountability)。认识到元数据通常通过现有流程产生(数据建模,SDLC,业务流程定义),因此流程的执行者对元数据的质量负责。2)标准(Standards)。制定、执行和审计元数据标准,简化集成过程,并且适用。3)改进(Improvement)。建立反馈机制保障用户可以将不准确或已过时的元数据通知元数据管理团队。如其他类型数据一样,可以对元数据进行剖析和质量的检查。作为项目工作的可审计部分,元数据维护工作应按计划进行或完成。
Improvement 改进: Create a feedback mechanism so that consumers can inform the Metadata Management team of Metadata that is incorrect or out-of-date.
2.4.2. Integrate Metadata 整合元数据
Integration processes gather and consolidate Metadata from across the enterprise, including Metadata from data acquired outside the enterprise.
The Metadata repository should integrate extracted technical Metadata with relevant business, process, and stewardship Metadata.
Challenges arise in integration that will require governance.
Repository scanning can be accomplished through two distinct approaches.
Proprietary interface 专用接口:
In a single-step scan and load process, a scanner collects the Metadata from a source system, then directly calls the format-specific loader component to load the Metadata into the repository. In this process, there is no format-specific file output, and the collection and loading of Metadata occur in a single step.
Semi-proprietary interface 半专用接口:
In a two-step process, a scanner collects the Metadata from a source system and outputs it into a format-specific data file. The scanner only produces a data file that the receiving repository needs to be able to read and load appropriately. The interface is a more open architecture, as the file is readable by many methods (see the sketch after the file list below).
A scanning process uses and produces several types of files during the process.
1. Control file 控制文件: Containing the source structure of the data model
2. Reuse file 重用文件: Containing the rules for managing reuse of process loads
3. Log files 日志文件: Produced during each phase of the process, one for each scan or extract and one for each load cycle
4. Temporary and backup files 临时和备份文件: Used during the process or for traceability
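A minimal sketch of the two-step, semi-proprietary approach referenced above; the file layout, source name, and repository structure are invented for illustration. The intermediate file is what makes the architecture more open: other processes can read it too.

```python
import json

def scan(source_name: str, source_tables: dict) -> str:
    """Step 1: extract Metadata from a source into a format-specific file."""
    path = "scan_output.json"
    with open(path, "w") as f:
        json.dump({"source": source_name, "tables": source_tables}, f)
    return path   # the open, readable intermediate data file

def load(path: str, repository: list) -> None:
    """Step 2: a separate loader reads the file into the repository."""
    with open(path) as f:
        extract = json.load(f)
    for table, columns in extract["tables"].items():
        repository.append(
            {"source": extract["source"], "table": table, "columns": columns}
        )

repository: list = []
load(scan("sales_db", {"orders": ["order_id", "order_date"]}), repository)
print(repository)
# [{'source': 'sales_db', 'table': 'orders', 'columns': ['order_id', 'order_date']}]
```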
Data Integration tools used for data warehousing and Business Intelligence applications are often used effectively in Metadata integration processes.
2.4.3. Distribute and Deliver Metadata 分发和传递元数据
1. Metadata is delivered to data consumers and to applications or tools that require Metadata feeds. Delivery mechanisms include:
1. Metadata intranet websites for browse, search, query, reporting, and analysis
2. Reports, glossaries and other documents
3. Data warehouses, data marts, and BI (Business Intelligence) tools
4. Modeling and software development tools
5. Messaging and transactions
6. Web services and Application Programming Interfaces (APIs)
7. External organization interface solutions (e.g., supply chain solutions)
2.5. Query, Report, and Analyze Metadata
2.5.1. Metadata guides the use of data assets. Use Metadata in Business Intelligence (reporting and analysis), business decisions (operational, tactical, strategic), and in business semantics (what they say, what they mean – 'business lingo').
2.5.2. A Metadata repository must have a front-end application that supports the search-and-retrieval functionality required for all this guidance and management of data assets.
2.5.3. Some reports facilitate future development, such as change impact analysis, or help troubleshoot varying definitions for data warehouse and Business Intelligence projects, such as data lineage reports.
19. we would expect to consult the Metadata Library when: A:Accessing the internet B:Implementing a data Quality tool C:Selecting a Data Storage device D:Formulating a Governance policy E:Assessing the impact of change 正确答案:E 你的答案:A 解析:12.2.5查询、报告和分析元数据 元数据指导如何使用数据资产:在商务智能(报表和分析)、商业决策(操作型、运营型和战略型)以及业务语义(业务所述内容及其含义)方面使用元数据。元数据存储库应具有前端应用程序,并支持查询和获取功能,从而满足以上各类数据资产管理的需要。提供给业务用户的应用界面和功能与提供给技术用户和开发人员的界面和功能有所不同,后者可能会包括有助于新功能开发(如变更影响分析)或有助于解决数据仓库和商务智能项目中数据定义问题(如数据血缘关系报告)的功能
2.5.4. 24. We do not expect to consult the metadata repository when A:Undertaking a data quality assessment B:Investigating a data issue C:Updating the operating system that the Master Data management toolset is running on D:None of these E:Assessing the impact of change 正确答案:C 你的答案:C 解析:12.2.5查询、报告和分析元数据元数据指导如何使用数据资产:在商务智能(报表和分析)、商业决策(操作型、运营型和战略型)以及业务语义(业务所述内容及其含义)方面使用元数据。元数据存储库应具有前端应用程序,并支持查询和获取功能,
3. Tools
3.1. Metadata Repository Management Tools 元数据存储库管理工具
3.1.1. Metadata Management tools provide capabilities to manage Metadata in a centralized location (repository). The Metadata can be either manually entered or extracted from various other sources through specialized connecters. Metadata repositories also provide capabilities to exchange Metadata with other systems.
3.1.2. Metadata management tools and repositories themselves are also a source of Metadata, especially in a hybrid Metadata architectural model or in large enterprise implementations. Metadata management tools allow for the exchange of the collected Metadata with other Metadata repositories, enabling the collection of various and diverse Metadata from different sources into a centralized repository, or enabling the enriching and standardization of the diverse Metadata as it moves between the repositories.
4. Techniques
4.1. Data Lineage and Impact Analysis
4.1.1. A key benefit of discovering and documenting Metadata about the physical assets is to provide information on how data is transformed as it moves between systems.
33. Discovering and documenting metadata about physical data assets provides A:insights into the temporal data quality B:an estimation of balance sheet value of enterprise data C:information on how data is transformed as it moves between systems D:effective project scope management E:scoping boundaries of the data dictionary 正确答案:C 你的答案:C 解析:1.2.5:组织中数据生命周期的细节可能非常复杂,因为数据不仅具有生命周期,而且具有血缘(它从起点移动到使用点的路径,也称为数据链)。了解数据血缘需要记录数据集的起源,以及它们在访问和使用它们的系统中的移动和转换。生命周期和血缘相互交叉,有助于相互理解。一个组织越了解数据的生命周期和血缘关系,管理数据的能力就越强。
Many Metadata tools carry information about what is happening to the data within their environments and provide capabilities to view the lineage across the span of the systems or applications they interface with.
The current version of the lineage, based on programming code, is referred to as 'As Implemented Lineage'. 实现态
In contrast, lineage described in mapping specification documents is referred to as 'As Designed Lineage'. 设计态
7. Data lineage provides A:data content-specifying for example kinds of data to be collected, i. B:data sources-derivations,calculations cleansing and other processes C:data structure-that describes the data objects D:data meaning-business definitions and distinction between similar kinds of thingsE.none E:codes,text. numbers currencies. dates, etc 正确答案:B 你的答案:B 解析::B是描述数据变更的生命周期,符合血缘所代表的含义12.4.1数据血缘和影响分析发现和记录数据资产的元数据的一个重要意义在于提供了数据如何在系统间转移的信息。许多元数据工具中存储着某个环境中数据现况的信息,并提供查看跨系统或应用程序接口的血缘功能。
4.1.2. The limitations of a lineage build are based on the coverage of the Metadata management system. Function-specific Metadata repositories or data visualization tools have information about the data lineage within the scope of the environments they interact with but will not provide visibility to what is happening to the data outside their environments.
4.1.3. The process of connecting the pieces of the data lineage is referred to as stitching 拼接. It results in a holistic visualization 全景视图 of the data as it moves from its original locations (official source or system of record) until it lands in its final destination.
4.1.4. As the number of data elements in a system grows, the lineage discovery becomes complex and difficult to manage. In order to successfully achieve the business goals, a strategy for discovering and importing assets into the Metadata repository requires planning and design. Successful lineage discovery needs to account for both business and technical focus:
Business focus 业务焦点: Limit the lineage discovery to data elements prioritized by the business. Start from the target locations and trace back to the source systems where the specific data originates. By limiting the scanned assets to those that move, transfer, or update the selected data elements, this approach will enable business data consumers to understand what is happening to the specific data element as it moves through systems. If coupled with data quality measurements, lineage can be used to pinpoint where system design adversely impacts the quality of the data.
Technical focus 技术焦点: Start at the source systems and identify all the immediate consumers, then identify all the subsequent consumers of the first set identified, and keep repeating these steps until all systems are identified. Technology users benefit more from the system discovery strategy in order to help answer the various questions about the data. This approach will enable technology and business users to answer questions about discovering data elements across the enterprise, like "Where is social security number?", or generate impact reports like "What systems are impacted if the width of a specific column is changed?" This strategy can, however, be complex to manage. (Both strategies are sketched in code below.)
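As noted above, here is a minimal sketch of both discovery strategies over a stitched lineage graph; the system names and edges are a toy example. Tracing backward from a prioritized target is the business focus; tracing forward from a source to all downstream consumers is the technical focus (impact analysis).

```python
from collections import defaultdict

# (upstream, downstream) edges stitched together from integration tools.
edges = [("crm", "staging"), ("staging", "dw"), ("dw", "sales_mart"),
         ("dw", "finance_mart"), ("erp", "staging")]

downstream = defaultdict(set)
upstream = defaultdict(set)
for src, tgt in edges:
    downstream[src].add(tgt)
    upstream[tgt].add(src)

def trace(start: str, graph: dict) -> set:
    """All systems reachable from start in the given direction."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen

# Business focus: where does sales_mart's data come from?
print(trace("sales_mart", upstream))   # {'dw', 'staging', 'crm', 'erp'}
# Technical focus: what is impacted if crm changes?
print(trace("crm", downstream))        # {'staging', 'dw', 'sales_mart', 'finance_mart'}
```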
4.1.5. Documented lineage helps both business and technical people use data.
4.2. Metadata for Big Data Ingest 应用于大数据采集的元数据
4.2.1. Some unstructured sources will be internal to the organization, and some will be external. In either case, there is no longer a need to physically bring the data to one place.
4.2.2. Through the new technologies, the program will go to the data as opposed to moving the data to the program, reducing the amount of data movement, and speeding up the execution of the process.
4.2.3. How to proceed
Metadata tags should be applied to data upon ingestion 采集. Metadata then can be used to identify data content available for access in the data lake.
Data profiling 剖析 can identify data domains, relationships, and data quality issues.
On ingestion, Metadata tags can be added 打标 to identify sensitive or private data (like Personally Identifiable Information – PII)
Data scientists may add confidence, textual identifiers, and codes representing behavior clusters 关联簇
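A minimal sketch of the tagging-on-ingestion step above; the source name, tag keys, and the toy PII pattern are illustrative assumptions. Tags attached at ingestion can later drive discovery and protection in the data lake.

```python
import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")   # toy PII detector

def ingest(record: dict) -> dict:
    """Attach Metadata tags to a record as it lands in the data lake."""
    tags = {"ingested_from": "partner_feed"}        # assumed source name
    if any(isinstance(v, str) and SSN_PATTERN.match(v)
           for v in record.values()):
        tags["sensitivity"] = "PII"                 # flag private data early
    return {"data": record, "tags": tags}

print(ingest({"name": "A. Smith", "ssn": "123-45-6789"}))
# tags: {'ingested_from': 'partner_feed', 'sensitivity': 'PII'}
```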
5. Implementation Guidelines
5.1. Readiness Assessment / Risk Assessment
5.1.1. the lack of high quality Metadata might result in:
1. Errors in judgment due to incorrect, incomplete, or invalid assumptions, or lack of knowledge about the context of the data
2. Exposure of sensitive data, which may put customers or employees at risk, or impact the credibility of the business and lead to legal expenses
3. Risk that the small set of SMEs who know the data will leave and take their knowledge with them
5.1.2. Risk is reduced when an organization adopts a solid Metadata strategy. Organizational readiness is addressed by a formal assessment of the current maturity in Metadata activities. The assessment should include the critical business data elements, available Metadata glossaries, lineage, data profiling and data quality processes, MDM (Master Data Management) maturity, and other aspects. Findings from the assessment, aligned with business priorities, will provide the basis for a strategic approach to improvement of Metadata Management practices. A formal assessment also provides the basis for a business case, sponsorship and funding.
5.1.3. The Metadata strategy may be part of an overall data governance strategy or it may be the first step in implementing effective data governance. A Metadata assessment should be conducted via objective inspection of existing Metadata, along with interviews with key stakeholders. The deliverables from a risk assessment include a strategy and roadmap.
5.2. Organizational and Cultural Change
5.2.1. Like other data management efforts, Metadata initiatives often meet with cultural resistance.
5.2.2. Metadata Management is a low priority in many organizations, yet an essential set of Metadata needs coordination and commitment across the organization.
5.2.3. Implementation of an enterprise data governance strategy needs senior management support and engagement. It requires that business and technology staff be able to work closely together in a cross-functional manner.
6. Metadata Governance
6.1. Process Controls 过程控制
6.1.1. Integration of the Metadata strategy into the SDLC is needed to ensure that Metadata is collected whenever it changes. This helps ensure Metadata remains current.
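As one possible process control, a minimal sketch (Python; the inputs and threshold are hypothetical) of an SDLC gate that blocks a release until new schema elements are documented in the Metadata repository:

def undocumented_columns(deployed_schema, metadata_entries):
    """Columns present in the new schema but missing from the repository."""
    return sorted(set(deployed_schema) - set(metadata_entries))

schema = ["cust_id", "cust_name", "loyalty_tier"]   # release adds a new column
documented = ["cust_id", "cust_name"]               # repository is behind
missing = undocumented_columns(schema, documented)
if missing:
    print("Block release: document", missing, "in the Metadata repository first")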
6.2. Documentation of Metadata Solutions 元数据解决方案的文档
6.2.1. A master catalog of Metadata will include the sources and targets currently in scope. This is a resource for IT and business users, and it can be published to the user community as a guide to ‘what is where’ and to set expectations on what they will find (a structural sketch follows this list):
1. Metadata implementation status
2. Source and target Metadata stores
3. Schedule information for updates
4. Retention and versions kept
5. Contents
6. Quality statements or warnings (e.g., missing values)
7. System of record and other data source statuses (e.g., data contents history coverage, retiring or replacing flags)
8. Tools, architectures, and people involved
9. Sensitive information and removal or masking strategy for the source
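For illustration only, a minimal sketch (Python; all field names are hypothetical, not a prescribed schema) of what one entry in such a master catalog could capture:

from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    implementation_status: str            # e.g., "in production"
    source_store: str
    target_store: str
    update_schedule: str                  # e.g., "daily 02:00"
    retention: str                        # e.g., "7 years, 3 versions kept"
    contents: str
    quality_warnings: list = field(default_factory=list)
    system_of_record: bool = False
    sensitive: bool = False
    masking_strategy: str = "none"

entry = CatalogEntry(
    name="Customer",
    implementation_status="in production",
    source_store="CRM", target_store="Warehouse",
    update_schedule="daily 02:00",
    retention="7 years, 3 versions kept",
    contents="customer profile attributes",
    quality_warnings=["birth_date has missing values"],
    system_of_record=True, sensitive=True, masking_strategy="hash emails",
)
print(entry.name, entry.quality_warnings)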
6.3. Metadata Standards and Guidelines 元数据标准和指南
6.3.1. Metadata standards are essential to the exchange of data with operational trading partners. Companies realize the value of information sharing with customers, suppliers, partners, and regulatory bodies. The need to share common Metadata to support optimal use of shared information has spawned many sector-based standards.
6.3.2. Tool vendors provide XML, JSON, or REST support to exchange data between their data management products.
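For example, a minimal sketch (Python; the payload shape is hypothetical and not any vendor's actual format) of round-tripping Metadata as JSON, as such a tool might do over a REST endpoint:

import json

payload = {
    "entity": "Customer",
    "attributes": [
        {"name": "cust_id", "type": "integer", "definition": "surrogate key"},
        {"name": "cust_name", "type": "string", "definition": "legal name"},
    ],
}
message = json.dumps(payload)      # what would be POSTed to the partner's API
received = json.loads(message)     # what the partner tool would parse
print(received["entity"], len(received["attributes"]))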
6.4. Metrics
6.4.1. Metadata repository completeness 元数据存储库完整性
Compare ideal coverage of the enterprise Metadata (all artifacts and all instances within scope) to actual coverage. Reference the strategy for scope definitions.
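A minimal worked example (Python; the counts are invented) of computing this metric as actual documented artifacts divided by the ideal in-scope total:

in_scope_artifacts = 1200          # ideal coverage defined by the strategy
documented_artifacts = 950         # actually present in the repository
completeness = documented_artifacts / in_scope_artifacts
print(f"Repository completeness: {completeness:.0%}")   # 79%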
6.4.2. Metadata Management Maturity 元数据管理成熟度
Metrics developed to judge the Metadata maturity of the enterprise, based on the Capability Maturity Model (CMM-DMM) approach to maturity assessment. (See Chapter 15.)
6.4.3. Steward representation 专职人员配置
Organizational commitment to Metadata as assessed by the appointment of stewards, coverage across the enterprise for stewardship, and documentation of the roles in job descriptions.
6.4.4. Metadata usage 元数据使用情况
User uptake of the Metadata repository can be measured by repository login counts. Reference to Metadata by users in business practice is a more difficult measure to track. Anecdotal measures from qualitative surveys may be required to capture this measure.
6.4.5. Business Glossary activity 业务术语活动
Usage, update, resolution of definitions, coverage.
6.4.6. Master Data service data compliance 主数据服务数据遵从性
Shows the reuse of data in SOA solutions. Metadata on the data services assists developers in deciding when new development could use an existing service.
6.4.7. Metadata documentation quality 元数据文档质量
Assess the quality of Metadata documentation through both automatic and manual methods. Automatic methods include performing collision logic on two sources, measuring how much they match, and trending over time. Another metric would measure the percentage of attributes that have definitions, trending over time. Manual methods include random or complete surveys, based on enterprise definitions of quality. Quality measures indicate the completeness, reliability, currency, etc., of the Metadata in the repository.
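A minimal sketch (Python; the attribute definitions are invented) of two such automatic measures: definition coverage, and collision logic matching definitions between two sources:

# Hypothetical attribute definitions from two Metadata sources.
source_a = {"cust_id": "surrogate key", "cust_name": "legal name", "tier": ""}
source_b = {"cust_id": "surrogate key", "cust_name": "trading name"}

# Percentage of attributes that have (non-empty) definitions.
defined = sum(1 for d in source_a.values() if d.strip())
print(f"Attributes with definitions: {defined / len(source_a):.0%}")   # 67%

# Collision logic: how closely the two sources' shared definitions match.
shared = source_a.keys() & source_b.keys()
matches = sum(1 for k in shared if source_a[k] == source_b[k])
print(f"Definitions matching across sources: {matches / len(shared):.0%}")   # 50%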
6.4.8. Metadata repository availability 元数据存储库可用性
Uptime, processing time (batch and query).
7. Works Cited / Recommended
7.1. 4. Each meta-data software tool must be integrated _ before it can be integrated _ A:by the vendor; by the developer B:at the selection process; at the implementation process C:at the meta-data level; at the data level D:at the requirements stage; at the user acceptance stage E:metadata, metadata Correct answer: C. Your answer: C. Explanation (source 12.4.1): C is reasonable; many metadata tools store information about the current state of data in an environment and provide lineage views across systems or application interfaces.
7.2. 5. Meta-data should be stored _ and not stored _ A:in one huge set, categorized B:categorized, in one huge set C:remotely, in the enterprise D:in spreadsheets, a central database E:all Correct answer: B. Your answer: B. Explanation: A and B are direct opposites; B is correct (store metadata categorized, not in one huge set).
7.3. 6. Taxonomy meta-data is used to A:enforce constraints upon the data B:specify roles, classify descriptions, guide and control C:help people find data items or group of data items. D:help the blind lead the blind through the data mind-field. E:None Correct answer: C. Your answer: C. Explanation: taxonomy exists to group things, helping people find data items or groups of data items.
7.4. 8. A piece of software that can translate meta-data from one product into that of another is called a meta-data A:interface B:router C:bridge D:translator E:all Correct answer: D. Your answer: D. Explanation: translation is done by a translator.
7.5. 14. Data and information are A:intertwined and dependent on each other. B:used only in the context of business intelligence C:pillars of the modern organizational Parthenon D:representations of truth E:completely separate things Correct answer: A. Your answer: A. Explanation (source 12.2.2): the pyramid model implies that data and information are separate, but in fact the two concepts are intertwined and interdependent. Data is a form of information, and information is a form of data.
7.6. 17. What would you not expect to find in the Metadata repository? A:Data Dictionary B:Data storage devices C:Data models D:Data Lineage diagrams and models E:Data Requirements Correct answer: B. Your answer: B. Explanation: physical storage devices are not metadata.
7.7. 20. The number of artefacts 工件 that must be searched in the metadata repository for all business change projects is A:Conceptual data models and the Business Data Glossary must be examined B:The Business Data Glossary and Data Dictionary must be examined C:Conceptual, Logical and Physical models must be examined D:The Business Data Glossary Systems Inventory must be consulted E:There is no mandatory number of artefacts to be searched, but it is highly recommended that the library is examined Correct answer: E. Your answer: A.
7.8. 26. Metadata repository processes will not include: A:Managing change to data product (e.g., Data Dictionary or Business Data Glossary) entries, e.g., a new data term to be defined, a new data requirement, new database tables added, a new system included in the technical landscape B:Selecting Data Management Library software, search, and storage technologies C:Controlling versions of data products, required to manage the single published master copy in conjunction with the variants potentially established as work in progress D:All of these E:Assessing impact where changes to existing data product entries are proposed, e.g., the impact of change on related data in other systems Correct answer: B. Your answer: B.
7.9. 28. What is the difference between an industry and a consensus meta-data standard? A:The terms are used interchangeably to describe the same concept B:Consensus standards are formed by an international panel of experts, whereas industry standards are dictated by a panel of vendors C:Industry standards are determined by regulators within a given global region, and consensus standards are agreed on by the Data Governance Council within an organization D:Industry standards refer to internationally approved global standards such as ISO, whereas consensus standards refer to those agreed to within an organization E:Consensus standards are formed by government legislation, whereas industry standards evolve from best practice Correct answer: A. Your answer: D.
7.10. 34. The search function associated with a document management store is failing to return known artefacts; this is due to a failure of: A:maintaining public access to all documents in the document management store B:effective data quality metrics C:data privacy and confidentiality procedures D:maintaining appropriate metadata on each document E:business intelligence implementation Correct answer: D. Your answer: D.
7.11. 41. Which of these statements has the most meaningful relationship label? A:An order line contains orders B:An order is composed of order lines. 组成 C:An order is related to order lines D:An order is associated with order lines E:An order is connected with order lines Correct answer: B. Your answer: D. Explanation: ‘is composed of’ states the relationship precisely; ‘related to’, ‘associated with’, and ‘connected with’ are generic labels that carry no meaning.
7.12. M.97. A type of Master Data architecture is A:Virtualized B:Repository C:Hybrid D:All of the above E:Registry Correct answer: D. Your answer: C.
7.13. M.43. These are examples of which type of meta-data: Data Stores Involved; Government/Regulatory Bodies; Roles and Responsibilities; Process Dependencies and Decomposition A:Process Meta-Data B:Operational Meta-Data C:Business Meta-Data D:Data Stewardship Meta-Data E:Technical Meta-Data Correct answer: A. Your answer: B.