DAMA DMBOK 2.0 Knowledge Point Summary (Chapters 7-9: Data Security, Data Integration and Interoperability, Documents and Content Management)
CDMP (Certified Data Management Professional) is a certification for data management professionals established by DAMA International, combining educational background, work experience, and a professional knowledge exam. These notes summarize the knowledge points, exam focus areas, and past exam questions for the CDMP English exam, and are intended for senior professional certification in data management, data governance, and digital transformation. Because there are many chapters and knowledge points, the notes are released in installments: Chapters 1-3 (Data Management, Data Handling Ethics, Data Governance), Chapters 4-6 (Data Architecture, Data Modeling and Design, Data Storage and Operations), Chapters 7-9 (Data Security, Data Integration and Interoperability, Documents and Content Management), Chapters 10-12 (Reference and Master Data, Data Warehousing and Business Intelligence, Metadata Management), and Chapters 13-17 (Data Quality, Big Data and Data Science, Data Management Maturity Assessment, Data Management Organization and Role Expectations, Data Management and Organizational Change Management). Keywords: CDMP, data management, DMBOK, digital transformation, DAMA, data management professional.
DAMA Knowledge Points: Chapters 7-9
Chapter 7: Data Security 数据安全

1. Introduction
1.1. Definition:
1.1.1. Definition, planning, development, and execution of security policies and procedures to provide proper authentication 身份验证, authorization, access, and auditing of data and information assets.
31. which of these statements best defines data security management? A:The planning ,implementation , and testing of security technologies, authentication mechanisms,and other controls to prevent access to information B:The implementation and execution of checkpoints,checklists, controls,and technical mechanisms to govern the access to information in an enterprise C:None of these D:The planning,development, and execution of security policies and procedures to provide proper authentication, authorization, access, and auditing of data and information assets E:The definition of controls technical standards frameworks and audit trail capabilities to identify who has or has had access to information 正确答案:D 你的答案:D 解析:7.1.1. 数据安全包括安全策略和过程的规划、建立与执行,为数据和信息资产提供正确的身份验证、授权、访问和审计。
1.1.2. These requirements come from:

Stakeholders
Organizations must recognize the privacy and confidentiality needs of their stakeholders, including clients, patients, students, citizens, suppliers, or business partners.
38. stakeholders whose concerns must be addressed in data security management include A:External Standards organizations Regulators or the Media B:Media analysts, Internal Risk Management, Suppliers, or Regulators C:The Internal Audit and risk committees of the organization D:All of these E:Clients,Patients,Citizens,Suppliers, or Business Partners 正确答案:E 你的答案:E 解析:7.1:虽然数据安全的详细情况(如哪些数据需要保护)因行业和国家有所不同,但是数据安全实线的目标是相同的,即根据隐私和保密法规、合同协议和业务要求来保护信息资产,这些要求来自以下几个方面。(1)利益相关方应识别利益相关方的隐私和保密需求,包括客户、病人、学生、公民、供应商或商业伙伴等。组织中的每个人必须是对利益相关方数据负有责任的受托人。
Government regulations:
Government regulations are in place to protect the interests of some stakeholders.
Proprietary business concerns
Each organization has proprietary data to protect. An organization's data provides insight into its customers and, when leveraged effectively, can provide a competitive advantage.
Legitimate access needs
Business processes require individuals in certain roles be able to access, use, and maintain data.
Contractual obligations
Contractual and non-disclosure agreements also influence data security requirements.
1.1.3. Effective data security policies and procedures ensure that the right people can use and update data in the right way, and that all inappropriate access and update is restricted
39. which of these are characteristics of an effective data security policy A:None of these B:The procedures defined are benchmarked,supported by technology framework based and peer reviewed C:The defined procedures are tightly defined with rigid and effective enforcement sanctions 制裁, and alignment with technology capabilities D:The defined procedures ensure that the right people can use and update data in the right way, and that all inappropriate access and update is restricted E:The policies are specific,measurable,achievable,realistic,and technology aligned 正确答案:D 你的答案:D 解析:7.1:有效的数据安全策略和过程确保合法用户能以正确的方式使用和更新数据,并且限制所有不适当的访问和更新。了解并遵守所有利益相关方隐私、保密需求,符合每个组织的最高利益。客户、供应商和各相关方都信任并依赖数据的可靠使用。
1.2. Business Drivers
1.2.1. Risk reduction and business growth are the primary drivers of data security activities.
Data security risks are associated with regulatory compliance, fiduciary responsibility for the enterprise and stockholders, reputation, and a legal and moral responsibility to protect the private and sensitive information of employees, business partners, and customers.
Business growth includes attaining and sustaining operational business goals. Data security issues, breaches, and unwarranted restrictions on employee access to data can directly impact operational success.
1.2.2. The goals of mitigating risks and growing the business can be complementary and mutually supportive if they are integrated into a coherent strategy of information management and protection
1.2.3. Risk Reduction
As data regulations increase — usually in response to data thefts and breaches — so do compliance requirements.
As with other aspects of data management, it is best to address data security as an enterprise initiative.
Information security begins by classifying an organization’s data in order to identify which data requires protection.
Steps:
Identify and classify sensitive data assets:
Locate sensitive data throughout the enterprise
Determine how each asset needs to be protected
Identify how this information interacts with business processes:
In addition to classifying the data itself, it is necessary to assess external threats (such as those from hackers and criminals) and internal risks (posed by employees and processes).
1.2.4. Business Growth
Product and service quality relate to information security in a quite direct fashion: Robust 强健的 information security enables transactions and builds customer confidence.
1.2.5. Security as an Asset
One approach to managing sensitive data is via Metadata.
Security classifications and regulatory sensitivity can be captured at the data element and data set level. Technology exists to tag data so that Metadata travel with the information as it flows across the enterprise. Developing a master repository of data characteristics means all parts of the enterprise can know precisely what level of protection sensitive information requires.
Standard security Metadata can optimize data protection and guide business usage and technical support processes, leading to lower costs.
When sensitive data is correctly identified as such, organizations build trust with their customers and partners. Security-related Metadata itself becomes a strategic asset, increasing the quality of transactions, reporting, and business analysis, while reducing the cost of protection and associated risks that lost or stolen information cause.
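As an illustration of the metadata-driven approach described above, the sketch below shows how security classification might be captured at the data-element level and queried to decide whether protection is required. It is a minimal sketch; the element names, classification values, and categories are assumptions, not a DMBOK-prescribed structure.

```python
# Illustrative sketch: security classification captured as Metadata at the
# data-element level. Element names, levels, and categories are assumptions.
element_metadata = {
    "customer.ssn":       {"confidentiality": "Restricted confidential", "regulatory": ["PII"]},
    "customer.email":     {"confidentiality": "Confidential",            "regulatory": ["PII"]},
    "order.total_amount": {"confidentiality": "Internal use only",       "regulatory": []},
}

def protection_required(element: str) -> bool:
    """An element needs explicit protection if it is classified above
    'Internal use only' or carries any regulatory category."""
    meta = element_metadata[element]
    open_levels = ("For general audiences", "Internal use only")
    return meta["confidentiality"] not in open_levels or bool(meta["regulatory"])

print(protection_required("customer.ssn"))        # True
print(protection_required("order.total_amount"))  # False
```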
1.2.6. 54. Primary drivers of data security activities are A:data quality and intellectual property protection B:risk reduction and business growth C:data protection and flexible database design D:risk control and content management. E:glossary management and risk reduction 正确答案:B 你的答案:B 解析:7.1.1业务驱动因素降低风险和促进业务增长是数据安全活动的主要驱动因素。确保组织数据安全,可降低风险并增加竞争优势。安全本身就是宝贵的资产。
1.3. Goals and Principles
1.3.1. Goals:
1. Enable appropriate, and prevent inappropriate, access to enterprise data assets.
2. Understand and comply with all relevant regulations and policies for privacy, protection, and confidentiality.
3. Ensure that the privacy and confidentiality needs of all stakeholders are enforced and audited.
36. A part from security requirements internal to the organization what other strategic goals should a Data security management system address? A:Ensuring the organization doesn't engage in SPAM marketing B:Regulatory requirements for privacy and confidentiality AND Privacy and Confidentiality needs of all stakeholders C:compliance with ISO27001 and HIPPA D:Compliance with ISO29100 and PCI-DSS E:None of these 正确答案:B 你的答案:B 解析:·7.1.2目标和原则1.目标数据安全活动目标,包括以下几个方面:1)支持适当访问并防止对企业数据资产的不当访问。2)支持对隐私、保护和保密制度、法规的遵从。3)确保满足利结相关方对隐私和保密的要求。
42. A security mechanism that searches for customer bank account details in outgoing emails is achieving the goal of A:ensuring stakeholder requirements for openness and transparency are met B:ensuring stakeholder requirements for service design and experience are met C:ensuring stakeholder requirements for confidentiality and privacy are met. D:ensuring stakeholder requirements for concise definitions and usage are met. E:ensuring stakeholder requirements for response time and availability levels are met. 正确答案:C 你的答案:C 解析:7.1.2:1.目标数据安全活动目标,包括以下几个方面:1)支持适当访问并防止对企业数据资产的不当访问。2)支持对隐私、保护和保密制度、法规的遵从。3)确保满足利益相关方对隐私和保密的要求。
53. The stakeholder requirements for privacy and confidentiality are goals found in: A:data quality. B:data security C:data architecture D:document and content management. E:metadata management. 正确答案:B 你的答案:B 解析:7.1.2:1.目标数据安全活动目标,包括以下几个方面:1)支持适当访问并防止对企业数据资产的不当访问。2)支持对隐私、保护和保密制度、法规的遵从。3)确保满足利益相关方对隐私和保密的要求。
1.3.2. Principles
1. Collaboration
Data Security is a collaborative effort involving IT security administrators, data stewards/data governance, internal and external audit teams, and the legal department.
2. Enterprise approach 企业统筹
Data Security standards and policies must be applied consistently across the entire organization.
3. Proactive management
Success in data security management depends on being proactive and dynamic, engaging all stakeholders, managing change, and overcoming organizational or cultural bottlenecks such as traditional separation of responsibilities between information security, information technology, data administration, and business stakeholders.
4. Clear accountability
Roles and responsibilities must be clearly defined, including the 'chain of custody' for data across organizations and roles.
5. Metadata-driven
Security classification for data elements is an essential part of data definitions.
6. Reduce risk by reducing exposure 减少接触
Minimize sensitive/confidential data proliferation, especially to non-production environments.
1.4. Essential Concepts
1.4.1. Vulnerability 脆弱性
A vulnerability is a weakness or defect in a system that allows it to be successfully attacked and compromised – essentially a hole in an organization's defenses. Some vulnerabilities are called exploits 漏洞敞口.
20. A weaknesses or defect of system that allows it to be successfully attacked and compromised This is called A:analysis B:replication C:archiving D:auditing E:vulnerability 正确答案:E 你的答案:E 解析: 7.1.3题解:脆弱性(Vulnerability)是系统中容易遭受攻击的弱点或缺陷,本质上是组织防御中的漏洞。某些脆弱性称为漏洞敞口。
43. A weakness or defect in a system that allows it to be successfully attacked and compromised A:vulnerability. B:feature C:chasm D:risk E:threat 正确答案:A 你的答案:A 解析:7.1.3.:1.脆弱性脆弱性(Vulnerability)是系统中容易遭受攻击的弱点或缺陷,本质上是组织防御中的漏洞。某些脆弱性称为漏洞敞口。例如,存在过期安全补丁的网络计
Examples include network computers with out-of-date security patches, web pages not protected with robust passwords, users not trained to ignore email attachments from unknown senders, or corporate software unprotected against technical commands that will give the attacker control of the system.
Non-production environments are more vulnerable to threats than production environments.
1.4.2. Threat 威胁
A threat is a potential offensive action that could be taken against an organization. Threats can be internal or external.
19. A ___ is a potential offensive action that could be taken against an organization A:analysis B:replication C:archiving D:auditing E:threat 正确答案:E 你的答案:E 解析:7.1.3:威胁(Threat)是一种可能对组织采取的潜在进攻行动。威胁包括发送到组织感染病毒的电子邮件附件、使网络服务器不堪重负以致无法执行业务(拒绝服务攻击)的进程,以及对已知漏洞的利用等。威胁可以是内部的,也可以是外部的。他们并不总是恶意的。
They are not always malicious. An uninformed insider can take offensive actions against the organization without even knowing it. Threats may relate to specific vulnerabilities, which then can be prioritized for remediation. Each threat should match to a capability that either prevents the threat or limits the damage it might cause. An occurrence of a threat is also called an attack surface. 攻击面
Examples of threats include virus-infected email attachments being sent to the organization, processes that overwhelm network servers and result in an inability to perform business transactions (also called denial-of-service attacks), and exploitation of known vulnerabilities.
47. A denial of service attack 拒绝服务攻击 is typically accomplished by A:emailing virus laden attachments B:interrupting the mains electricity supply. C:corrupting the user name and password D:a stop-work action by the workforce E:flooding the target machine with superfluous requests 正确答案:E 你的答案:E 解析:7.1.3:2.威胁威胁(Threat)是一种可能对组织采取的潜在进攻行动。威胁包括发送到组织感染病毒的电子邮件附件、使网络服务器不堪重负以致无法执行业务(拒绝服务攻击)的进程,以及对已知漏洞的利用等。威胁可以是内部的,也可以是外部的。
1.4.3. Risk 风险
The term risk refers both to the possibility of loss and to the thing or condition that poses the potential loss.
17. The term refers both to the possibility of loss and to the thing or condition that poses the potential loss. This term is called A:analysis B:replication C:archiving D:auditing E:Risk 正确答案:E 你的答案:E 解析:7.1.3. 风险(Risk)既指损失的可能性,也指构成潜在损失的事物或条件。
Risk is calculated based on the following factors:
1. Probability that the threat will occur and its likely frequency
2. The type and amount of damage created each occurrence might cause, including damage to reputation
3. The effect damage will have on revenue or business operations
4. The cost to fix the damage after an occurrence
5. The cost to prevent the threat, including by remediation of vulnerabilities
6. The goal or intent of the probable attacker
Risks can be prioritized by potential severity of damage 损害程度 to the company, or by likelihood of occurrence 发生可能性, with easily exploited vulnerabilities creating a higher likelihood of occurrence. Often a priority list combines both metrics. Prioritization of risk must be a formal process among the stakeholders.
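To make the prioritization idea concrete, here is a minimal sketch that scores each risk as likelihood times impact and sorts the list. The scoring model, field names, and figures are illustrative assumptions; DMBOK does not prescribe a specific formula.

```python
# Minimal risk-prioritization sketch: score = likelihood x impact.
# The scoring model, field names, and figures are illustrative assumptions,
# not a formula prescribed by DMBOK.
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: float  # probability of occurrence, 0.0 - 1.0
    impact: float      # estimated damage per occurrence, in cost units

def prioritize(risks):
    """Return risks ordered by descending likelihood-times-impact score."""
    return sorted(risks, key=lambda r: r.likelihood * r.impact, reverse=True)

risks = [
    Risk("Unpatched web server", likelihood=0.6, impact=500_000),
    Risk("Lost unencrypted backup tape", likelihood=0.1, impact=2_000_000),
    Risk("Shared service account abuse", likelihood=0.3, impact=750_000),
]
for r in prioritize(risks):
    print(f"{r.name}: score={r.likelihood * r.impact:,.0f}")
```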
1.4.4. Risk Classifications 风险分类
Risk classifications describe the sensitivity of the data and the likelihood that it might be sought after for malicious purposes.
The highest security classification of any datum within a user entitlement determines the security classification of the entire aggregation.
17. Documents and records should be classified based on the level of confidentiality for information found in the record A:Average B:Highest C:overall D:General E:General 正确答案:B 你的答案:B 解析:9.2.2:(3)处理敏感数据组织有义务通过识别和保护敏感数据来保护隐私。数据安全或数据治理通常会建立保密方案,并确定哪些资产是机密的或受限制的。制作或拼装内容的人必须要应用这些分类。必须根据制度和法律要求将文件、网页和其他内容组件标记为是否敏感。一旦被标记为敏感,机密数据要么被屏蔽,要么在适当的情况下被删除(参见第7章 7.1.3.用户权限内所有数据中的最高安全分类决定了整体的安全分类)。
Include
1. Critical Risk Data (CRD) 关键风险数据
Personal information aggressively sought for unauthorized use by both internal and external parties due to its high direct financial value. Compromise of CRD would not only harm individuals, but would result in financial harm to the company from significant penalties, costs to retain customers and employees, as well as harm to brand and reputation.
2. High Risk Data (HRD) 高风险数据
HRD is actively sought for unauthorized use due to its potential direct financial value. HRD provides the company with a competitive edge. If compromised, it could expose the company to financial harm through loss of opportunity. Loss of HRD can cause mistrust leading to the loss of business and may result in legal exposure, regulatory fines and penalties, as well as damage to brand and reputation.
3. Moderate Risk Data (MRD) 中风险数据
Company information that has little tangible value to unauthorized parties; however, the unauthorized use of this non-public information would likely have a negative effect on the company.
1.4.5. Data Security Organization 数据安全组织
Depending on the size of the enterprise, the overall Information Security function may be the primary responsibility of a dedicated Information Security group, usually within the Information Technology (IT) area. Larger enterprises often have a Chief Information Security Officer (CISO) who reports to either the CIO or the CEO. In organizations without dedicated Information Security personnel, responsibility for data security will fall on data managers.
In all cases, data managers need to be involved in data security efforts.
1.4.6. Security Processes 数据安全过程
Data security requirements and procedures are categorized into four groups, known as the four A’s:
The Four A's
1. Access 访问
Enable individuals with authorization to access systems in a timely manner.
2. Audit 审计
Review security actions and user activity
Information security professionals periodically review logs and documents to validate compliance with security regulations, policies, and standards. Results of these audits are published periodically.
48. A staff member has been detected inappropriately accessing client records from usage logs. the security mechanism being used is an: A:entitlement B:audit. C:authorization D:access E:authentication 正确答案:B 你的答案:B 解析:7.1.3 2)审计(Audit),审查安金操作和用户活动,以确保符合法规和遵守公司制度和标准.信息安全专业人员金定期查看日志和文档,以脸证是否符合安全法规、策略和标准。这些审核的结果会定期发布。
3. Authentication 验证
Validate users’ access. When a user tries to log into a system, the system needs to verify that the person is who he or she claims to be. Passwords are one way of doing this.
16. Validate users' access. When a user tries to log into a system the system needs to verify that the person is who he or she claims to be. passwords are one way of doing this This is called A:analysis B:replication C:archiving D:Authentication E:threat 正确答案:D 你的答案:D 解析:7.1.3:3)验证(Authentication)。验证用户的访问权限。当用户试图登录到系统时,系统需要验证此人身份是否属实。除密码这种方式外,更严格的身份验证方法包括安全令牌、回答问题或提交指纹。
4. Authorization 授权
Grant individuals privileges to access specific views of data, appropriate to their role.
4. Authorization is the process of determining A:the identity of a user trying to access a network system or resource B:what information should be stored about users that occur on the firewall or network C:the capability that allows the receiver of an electronic message to prove who the send D:what type of data and functions an individual has access to within the enterprise E:none 正确答案:D 你的答案:D 解析: 7.1.3题解:4)授权(Authorization)。授予个人访问与其角色相适应的特定数据视图的权限。在获得授权后,访问控制系统在每次用户登录时都会检查授权令牌的有效性。从技术上讲,这是公司活动目录中数据字段中的一个条目,表示此人已获得授权访问数据。它进一步表明,用户凭借其工作或公司地位有权获得此权限,这些权限由相关负责人授予。
6. Protecting the data in a database is the function of management A:privacy B:redundancy C:authorization D:transaction E:None 正确答案:C 你的答案:A 解析:7.1.3:C授权.4)授权(Authorization)。授予个人访问与其角色相适应的特定数据视图的权限。在获得授权后,访问控制系统在每次用户登录时都会检查授权令牌的有效性。从技术上讲,这是公司活动目录中数据字段中的一个条目,表示此人已获得授权访问数据。它进一步表明,用户凭借其工作或公司地位有权获得此权限,这些权限由相关负责人授予。
15. Grant individuals privileges to access specific views of data appropriate to their role this is called A:analysis B:replication C:archiving D:Authorization E:threat 正确答案:D 你的答案:D 解析:7.1.3:C授权.4)授权(Authorization)。授予个人访问与其角色相适应的特定数据视图的权限。在获得授权后,访问控制系统在每次用户登录时都会检查授权令牌的有效性。从技术上讲,这是公司活动目录中数据字段中的一个条目,表示此人已获得授权访问数据。它进一步表明,用户凭借其工作或公司地位有权获得此权限,这些权限由相关负责人授予。
5. Entitlement 权限
An Entitlement is the sum total of all the data elements that are exposed to a user by a single access authorization decision.
6. 51. In data security ,which of the following is not one of the four A's A:Audit B:Authentication C:Authorization D:Agile E:Access 正确答案:D 你的答案:D 解析: 7.1.3题解:(1)4A1)访问(Access)2)审计 3 验证(Authentication)4)授权(Authorization).
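A minimal sketch of how the A's fit together in code: authentication checks who the user is, authorization checks the entitlement granted to that user, and every access attempt is written to an audit trail. The user names, entitlements, and in-memory stores are assumptions for illustration, not a DMBOK-defined API; a real system would use an enterprise directory and a salted, slow password hash.

```python
# Minimal sketch of the four A's working together. User names, entitlements,
# and the in-memory stores are illustrative assumptions, not a DMBOK API.
import datetime
import hashlib
import hmac

USERS = {"alice": hashlib.sha256(b"s3cret").hexdigest()}   # authentication store
ENTITLEMENTS = {"alice": {"customer_view", "order_view"}}  # authorization store
AUDIT_LOG = []                                             # audit trail

def authenticate(user: str, password: str) -> bool:
    # Illustration only: a real system would use a salted, slow hash.
    digest = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(USERS.get(user, ""), digest)

def authorize(user: str, resource: str) -> bool:
    return resource in ENTITLEMENTS.get(user, set())

def access(user: str, password: str, resource: str) -> bool:
    ok = authenticate(user, password) and authorize(user, resource)
    AUDIT_LOG.append((datetime.datetime.now().isoformat(), user, resource, ok))
    return ok

print(access("alice", "s3cret", "customer_view"))  # True, and recorded for audit
print(access("alice", "s3cret", "payroll_view"))   # False, and recorded for audit
```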
Security Monitoring is also essential for proving the success of the other processes.
Monitoring
Systems should include monitoring controls that detect unexpected events, including potential security violations.
actively interrupt activities 主动中断访问活动
Some security systems will actively interrupt activities that do not follow specific access profiles. The account or activity remains locked until security support personnel evaluate the details.
passive monitoring by taking snapshots 被动定期快照
passive monitoring tracks changes over time by taking snapshots of the system at regular intervals, and comparing trends against a benchmark or other criteria.
1.4.7. Data Integrity 数据完整性
data integrity is the state of being whole – protected from improper alteration, deletion, or addition.
For example, in the U.S., Sarbanes-Oxley regulations are mostly concerned with protecting financial information integrity by identifying rules for how financial information can be created and edited.
1.4.8. Encryption 加密
Encryption is the process of translating plain text into complex codes to hide privileged information, verify complete transmission, or verify the sender’s identity.
44. The process of translating plain text into complex codes to hide privileged information is A:enhancement. B:exaggeration C:elimination D:encryption E:encapsulation 正确答案:D 你的答案:D 解析:7.1.3:8.加密(Encryption)是将施文本转换为复杂代稠,以隐藏特权信息、验证传送完整性或验证发送者身份的过程。加密数据不能在没有解密密钥或门法的情况下该取,解密密钥或门法通常单独存储,不能基于同一数据集中的其他数据元素来进行计算,加密方法主要有3种类型,即哈希,对称加密、非对称加密,其复杂程度和密钥结构各不相同。
Encrypted data cannot be read without the decryption key or algorithm, which is usually stored separately and cannot be calculated based on other data elements in the same data set.
hash 哈希
Hash encryption uses algorithms to convert data into a mathematical representation.
hashing is used as verification of transmission integrity or identity
Message Digest 5 (MD5)
Secure Hashing Algorithm (SHA).
symmetric 对称加密 private-key 私钥
Private-key encryption uses one key to encrypt the data.
Data Encryption Standard (DES)
Triple DES (3DES)
Advanced Encryption Standard (AES)
International Data Encryption Algorithm (IDEA)
Cyphers Twofish
Serpent
Asymmetric 非对称加密 public-key 公钥
In public-key encryption, the sender and the receiver have different keys. The sender uses a public key that is freely available, and the receiver uses a private key to reveal the original data.
Rivest-Shamir-Adelman (RSA) Key Exchange
Diffie-Hellman Key Agreement
PGP (Pretty Good Privacy) is a freely available application of public-key encryption.
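A short sketch of the first two encryption families above: a one-way hash for integrity checking and symmetric (private-key) encryption. The symmetric part assumes the third-party cryptography package is installed; asymmetric (public-key) encryption such as RSA follows the same encrypt/decrypt pattern but with separate public and private keys.

```python
# Hashing vs. symmetric encryption, sketched with hashlib (standard library)
# and Fernet from the third-party 'cryptography' package (assumed installed).
import hashlib
from cryptography.fernet import Fernet

# Hashing is one-way: the digest verifies integrity but cannot be reversed.
message = b"wire transfer: 100.00 to account 42"
digest = hashlib.sha256(message).hexdigest()

# Symmetric (private-key): the same key both encrypts and decrypts,
# so the key itself must be stored and exchanged securely.
key = Fernet.generate_key()
token = Fernet(key).encrypt(b"card number 4111-1111-1111-1111")
recovered = Fernet(key).decrypt(token)

print(digest[:16], recovered)
```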
1.4.9. Obfuscation or Masking 混淆和脱敏
Data can be made less available by obfuscation (making obscure or unclear) or masking, which removes, shuffles, or otherwise changes the appearance of the data, without losing the meaning of the data or the relationships the data has to other data sets, such as foreign key relationships to other objects or systems.
21. Obfuscation 混淆 or redaction 编校 of data is the practice of A:selling data B:making information available to the public C:reducing the size of large databases D:making information anonymous or removing sensitive information E:organizing data into meaningful groups 正确答案:D 你的答案:D 解析:7.1.3:9.混淆或脱敏可通过混处理(变得模糊或不明确)或脱敏(删除、打乱或以其他方式更改数据的外观等)的方式来降低数据可用性,同时避免丢失数据的含义或数据与其他数据集的关系。
52. Obfuscation of data is to A:put it in different databases B:make the result clear. C:collect data from obscure sources D:use synonyms for the same term E:make it obscure 晦涩难懂 or unclear. 正确答案:E 你的答案:E 解析:7.1.3:9.混淆或脱敏可通过混淆处理(变得模糊或不明确)或脱敏(删除、打乱或以其他方式更改数据的外观等)的方式来降低数据可用性,同时避免丢失数据的含义或数据与其他数据集的关系。
Obfuscation is useful when displaying sensitive information on screens for reference, or creating test data sets from production data that comply with expected application logic.
Data masking is a type of data-centric security. There are two types
1. Persistent Data Masking 静态数据脱敏
Persistent data masking permanently and irreversibly 不可逆 alters the data. This type of masking is not typically used in production environments, but rather between a production environment and development or test environments.
In-flight persistent masking 不落地静态脱敏
occurs when the data is masked or obfuscated while it is moving between the source (typically production) and destination (typically non-production) environment. In-flight masking is very secure when properly executed because it does not leave an intermediate file or database with unmasked data. Another benefit is that it is re-runnable if issues are encountered partway through the masking.
In-place persistent masking 落地静态脱敏
is used when the source and destination are the same. The unmasked data is read from the source, masked, and then used to overwrite the unmasked data. In-place masking assumes the sensitive data is in a location where it should not exist and the risk needs to be mitigated, or that there is an extra copy of the data in a secure location to mask before moving it to the non-secure location.
There are risks to this process. If the masking process fails mid-masking, it can be difficult to restore the data to a useable format. This technique has a few niche uses, but in general, in-flight masking will more securely meet project needs.
2. Dynamic Data Masking 动态数据脱敏
Dynamic data masking changes the appearance of the data to the end user or system without changing the underlying 基础的 data.
This can be extremely useful when users need access to some sensitive production data, but not all of it.
3. Masking Methods 脱敏方法
1. Substitution 替换
Replace characters or whole values with those in a lookup or as a standard pattern
2. Shuffling 混排
Swap data elements of the same type within a record, or swap data elements of one attribute between rows.
3. Temporal variance 时空变异
Move dates +/– a number of days – small enough to preserve trends, but significant enough to render them non-identifiable.
4. Value variance 数值变异
Apply a random factor +/– a percent, again small enough to preserve trends, but significant enough to be non-identifiable.
5. Nulling or deleting 空值或删除
Remove data that should not be present in a test system.
6. Randomization 随机选择
Replace part or all of data elements with either random characters or a series of a single character.
7. Encryption 加密技术
Convert a recognizably meaningful character stream to an unrecognizable character stream by means of a cipher code.
8. Expression masking 表达式脱敏
Change all values to the result of an expression. For example, a simple expression would just hard code all values in a large free form database field.
9. Key masking 键值脱敏
Designate that the result of the masking algorithm/process must be unique and repeatable, because it is being used to mask a database key field (or similar).
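The sketch below illustrates three of the masking methods listed above: substitution, value variance, and randomization. Patterns, field formats, and parameters are assumptions chosen for demonstration; production masking would normally be done with dedicated tooling.

```python
# Illustrative sketches of three of the masking methods above: substitution,
# value variance, and randomization. Patterns and parameters are assumptions.
import random
import string

def substitute(value: str, pattern: str = "XXX-XX-####") -> str:
    """Substitution: replace characters with a standard pattern, exposing only
    the trailing digits that the pattern allows."""
    digits = [c for c in value if c.isdigit()]
    kept = iter(digits[-pattern.count("#"):])
    return "".join(next(kept) if ch == "#" else ch for ch in pattern)

def value_variance(amount: float, pct: float = 0.05) -> float:
    """Value variance: apply a small random +/- percentage, preserving trends
    while making individual values non-identifiable."""
    return round(amount * (1 + random.uniform(-pct, pct)), 2)

def randomize(value: str) -> str:
    """Randomization: replace the value with random characters of equal length."""
    return "".join(random.choices(string.ascii_uppercase, k=len(value)))

print(substitute("123-45-6789"))   # XXX-XX-6789
print(value_variance(1000.00))     # e.g. 1023.77
print(randomize("Margaret"))       # e.g. QWZKPLRD
```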
1.4.10. Network Security Terms 网络安全术语
1. Backdoor 后门
A backdoor refers to an overlooked or hidden entry into a computer system or application.
Any backdoor is a security risk.
2. Bot or Zombie 机器人或僵尸
A bot (short for robot) or Zombie is a workstation that has been taken over by a malicious hacker using a Trojan, a Virus, a Phish, or a download of an infected file.
A Bot-Net is a network of robot computers (infected machines)
3. Cookie
A cookie is a small data file that a website installs on a computer’s hard drive, to identify returning visitors and profile their preferences.
Cookies are used for Internet commerce.
4. Firewall 防火墙
A firewall is software and/or hardware that filters network traffic to protect an individual computer or an entire network from unauthorized attempts to access or attack the system.
5. Perimeter 周界
A perimeter is the boundary between an organization's environments and exterior systems. Usually, a firewall will be in place between all internal and external environments.
50. A term is the boundary between an organization's environments and exterior systems. This term is called A:analysis B:replication C:archiving D:perimeter E:threat 正确答案:D 你的答案:D 解析:7.1.3:(5)周界(Perimeter)是指组织环境与外部系统之间的边界。通常将防火墙部署在所有内部和外部环境之间。
6. DMZ 非军事区

a DMZ is an area on the edge or perimeter of an organization, with a firewall between it and the organization.
DMZ environments are used to pass or temporarily store data moving between organizations.
25. A term is an area on the edge or perimeter of an organization,with a firewall between it and the organization. This term is called A:analysis B:replication C:archiving D:DMZ E:threat 正确答案:D 你的答案:D 解析:7.1.3:(6)DMZ是非军事区(De-militarized Zone)的简称,指组织边缘或外围区域,在DMZ和组织之间设有防火墙。DMZ环境与Internet互联网之间始终设有防火墙。DMZ环境用于传递或临时存储在组织之间移动的数据。
46. A DMZ is bordered by 2 firewalls. These are between the DMZ and the: A:internet and extranet B:internet; for added security. C:internet and intranet. D:Korean peninsula E:internet and internal systems 正确答案:E 你的答案:C 解析:7.1.3:(6)DMZ是非军事区(De-militarized Zone)的简称,指组织边缘或外围区域。在DMZ和组织之间设有防火墙。DMZ环境与Internet互联网之间始终设有防火墙(图7-3)。DMZ环境用于传递或临时存储在组织之间移动的数据。
7. Super User Account 超级用户
A Super User Account is an account that has administrator or root access to a system to be used only in an emergency.
Super User Account credentials are highly secured, only released in an emergency with appropriate documentation and approvals, and they expire within a short time.
24. A term is an account that has administrator or root access to a system to be used only in an emergency. This term is called A:analysis B:replication C:super User Account D:DMZ E:threat 正确答案:C 你的答案:C 解析:7.1.3:(7)超级用户账户超级用户(Super User)账户是具有系统管理员或超级用户访问权限的账户,仅在紧急情况下使用。这些账户的凭据保存要求具有高度安全性,只有在紧急情况下才能通过适当的文件和批准发布,并在短时间内到期。
8. Key Logger 键盘登录器
Key Loggers are a type of attack software that records all the keystrokes that a person types into their keyboard, then sends them elsewhere on the Internet.
Thus, every password, memo, formula, document, and web address is captured.
30. A term is a type of attack software that records all the keystrokes that a person types into their keyboard, then sends them else where on the Internet. This term is called A:analysis B:Key Loggers C:super User Account D:DMZ E:Threat 正确答案:B 你的答案:B 解析:7.1.3:(8)键盘记录器(Key Logger)是一种攻击软件,对键盘上键入的所有击键进行记录,然后发送到互联网上的其他地方。
9. Penetration Testing 渗透测试
In Penetration Testing (sometimes called ‘penn test’), an ethical hacker, either from the organization itself or hired from an external security firm, attempts to break into the system from outside, as would a malicious hacker, in order to identify system vulnerabilities.
10. Virtual Private Network (VPN) 虚拟专用网络
VPN connections use the unsecured internet to create a secure path or ‘tunnel’ into an organization’s environment
It allows communication between users and the internal network by using multiple authentication elements to connect with a firewall on the perimeter of an organization’s environment.
The tunnel is highly encrypted.
1.4.11. Types of Data Security 数据安全类型
1. ‘Least Privilege’ is an important security principle. A user, process, or program should be allowed to access only the information allowed by its legitimate purpose.
2. Facility Security 设施安全
Facility security is the first line of defense against bad actors.
3. Device Security 设备安全
Mobile devices often contain corporate emails, spreadsheets, addresses, and documents that, if exposed, can be damaging to the organization, its employees, or its customers.
Device security standards include:
1. Access policies regarding connections using mobile devices
2. Storage of data on portable devices such as laptops, DVDs, CDs, or USB drives
3. Data wiping and disposal of devices in compliance with records management policies
4. Installation of anti-malware and encryption software
5. Awareness of security vulnerabilities
4. Credential Security 凭证安全
Each user is assigned credentials to use when obtaining access to a system.
Identity Management Systems 身份管理系统
Identity management systems manage credentials across heterogeneous resources to ease user password management. Ideally, a user authenticates once, usually when logging into the workstation, after which all authentication and authorization executes through a reference to the enterprise user directory. An identity management system implementing this capability is known as 'single-sign-on' 单点登录.
User ID Standards for Email Systems 邮件系统的用户ID标准
User IDs should be unique within the email domain.
Password Standards 密码标准
Passwords are the first line of defense in protecting access to data.
Do not permit blank passwords.
Multiple Factor Identification 多因素识别
Some systems require additional identification procedures. These can include a return call to the user’s mobile device that contains a code, the use of a hardware item that must be used for login, or a biometric factor such as fingerprint, facial recognition, or retinal scan.
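To illustrate the password-storage side of credential security, here is a minimal sketch using the standard library's PBKDF2 function to store passwords as salted, slow hashes rather than plain text. The iteration count and salt length are illustrative assumptions.

```python
# Salted, slow password hashing with the standard library (illustrative only).
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 with many iterations makes brute-force guessing expensive.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(hash_password(password, salt), stored)

salt = os.urandom(16)
stored = hash_password("correct horse battery staple", salt)
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("guess", salt, stored))                         # False
```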
5. Electronic Communication Security 电子通信安全
Users should not send personal information or any restricted or confidential company information over email or direct communication applications; these insecure methods of communication can be read or intercepted by outside sources.
Social media also applies here. Blogs, portals, wikis, forums, and other Internet or Intranet social media should be considered insecure and should not contain confidential or restricted information.
1.4.12. Types of Data Security Restrictions 数据安全制约因素
Two concepts drive security restrictions
Confidentiality level 保密等级
Confidential means secret or private.
only on a ‘need-to-know’ basis
Regulation 监管要求
Regulatory categories are assigned based on external rules
shared on an ‘allowed-to-know’ basis
The main difference between confidential and regulatory restrictions is where the restriction originates: confidentiality restrictions originate internally, while regulatory restrictions are externally defined.
Another difference is that any data set, such as a document or a database view, can only have one confidentiality level. This level is established based on the most sensitive (and highest classified) item in the data set. Regulatory categories, however, are additive. A single data set may have data restricted based on multiple regulatory categories. To assure regulatory compliance, enforce all actions required for each category, along with the confidentiality requirements.
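The two rules above can be expressed directly in code: a data set takes the single highest confidentiality level of its elements, while its regulatory categories are the union (additive) of all categories present. The level ordering and category names below are assumptions for illustration.

```python
# Sketch of the two rules above: one confidentiality level (the highest present)
# per data set, but regulatory categories accumulate. Level ordering is an
# illustrative assumption.
LEVELS = ["For general audiences", "Internal use only", "Confidential",
          "Restricted confidential", "Registered confidential"]

def classify(elements):
    """elements: iterable of (confidentiality_level, set_of_regulatory_categories)."""
    level = max((lvl for lvl, _ in elements), key=LEVELS.index)
    categories = set().union(*(cats for _, cats in elements))
    return level, categories

dataset = [
    ("Internal use only", set()),
    ("Restricted confidential", {"PII"}),
    ("Confidential", {"PCI-DSS"}),
]
print(classify(dataset))  # ('Restricted confidential', {'PII', 'PCI-DSS'})
```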
Confidential Data 机密数据
1. For general audiences 普通受众公开
Information available to anyone, including the public.
2. Internal use only 仅内部使用
Information limited to employees or members, but with minimal risk if shared. For internal use only; may be shown or discussed, but not copied, outside the organization.
3. Confidential 机密
Information that cannot be shared outside the organization without a properly executed non-disclosure agreement or similar in place. Client confidential information may not be shared with other clients.
4. Restricted confidential 受限机密
Information limited to individuals performing certain roles with the 'need to know.' Restricted confidential may require individuals to qualify through clearance.
5. Registered confidential 绝密
Information so confidential that anyone accessing the information must sign a legal agreement to access the data and assume responsibility for its secrecy.
Regulated Data 监管限制的数据
1. Certain types of information are regulated by external laws, industry standards, or contracts that influence how data can be used, as well as who can access it and for what purposes. As there are many overlapping regulations, it is easier to collect them by subject area into a few regulatory categories or families to better inform data managers of regulatory requirements.
2. Each enterprise, of course, must develop regulatory categories that meet their own compliance needs. Further, it is important that this process and the categories be as simple as possible to allow for an actionable protection capability. When category protective actions are similar, they should be combined into a regulation ‘family’. Each regulatory category should include auditable protective actions. This is not an organizational tool but an enforcement method.
3. Sample Regulatory Families 法规系列举例
1. Personal Identification Information (PII) 个人身份信息
Also known as Personally Private Information (PPI)
EU Privacy Directives, Canadian Privacy law (PIPEDA), PIP Act 2003 in Japan, PCI standards, US FTC requirements, GLB, FTC standards, and most Security Breach of Information Acts
2. Financially Sensitive Data 财务敏感数据
All financial information, including what may be termed 'shareholder' or 'insider' data, including all current financial information that has not yet been reported publicly
SOX(Sarbanes-Oxley Act), or GLBA (Gramm-Leach-Bliley/Financial Services Modernization Act).
3. Medically Sensitive Data/Personal Health Information (PHI) 医疗敏感数据/个人健康信息
In the US, this is covered by HIPAA (Health Insurance Portability and Accountability Act).
4. Educational Records 教育记录
In the US, this is covered by FERPA (Family Educational Rights and Privacy Act).
4. Industry or Contract-based Regulation 行业法规或基于合同的法规
Payment Card Industry Data Security Standard (PCI-DSS) 支付卡行业数据安全标准
Competitive advantage or trade secrets 竞争优势和商业秘密
Contractual restrictions 合同限制
1.4.13. System Security Risks 系统安全风险
1. Abuse of Excessive Privilege 滥用特权
In granting access to data, the principle of least privilege should be applied. A user, process, or program should be allowed to access only the information allowed by its legitimate purpose. The risk is that users with privileges that exceed the requirements of their job function may abuse these privileges for malicious purpose or accidentally.
The DBA may not have the time or Metadata to define and update granular access privilege control mechanisms for each user entitlement
This lack of oversight to user entitlements is one reason why many data regulations specify data management security.
Query-level access control is useful for detecting excessive privilege abuse by malicious employees.
automated tools are usually necessary to make real query-level access control functional.
2. Abuse of Legitimate Privilege 滥用合法特权
Users may abuse legitimate database privileges for unauthorized purposes.
There are two risks to consider: intentional 故意 and unintentional 无意 abuse.
Intentional abuse occurs when an employee deliberately misuses organizational data.
Unintentional abuse is a more common risk: The diligent employee who retrieves and stores large amounts of patient information to a work machine for what he or she considers legitimate work purposes.
The partial solution to the abuse of legitimate privilege is database access control that not only applies to specific queries, but also enforces policies for end-point machines using time of day, location monitoring, and amount of information downloaded, and reduces the ability of any user to have unlimited access to all records containing sensitive information unless it is specifically demanded by their job and approved by their supervisor.
3. Unauthorized Privilege Elevation 未经授权的特权升级
Attackers may take advantage of database platform software vulnerabilities to convert access privileges from those of an ordinary user to those of an administrator. Vulnerabilities may occur in stored procedures, built-in functions, protocol implementations, and even SQL statements
Prevent privilege elevation exploits with a combination of traditional intrusion prevention systems (IPS) 入侵防护系统 and query-level access control intrusion prevention
4. Service Account or Shared Account Abuse 服务账户或共享账户滥用
Use of service accounts (batch IDs) and shared accounts (generic IDs) increases the risk of data security breaches and complicates the ability to trace the breach to its source. Some organizations further increase their risk when they configure monitoring systems to ignore any alerts related to these accounts. Information security managers should consider adopting tools to manage service accounts securely
Service Accounts 服务账户
Service accounts are convenient because they can tailor enhanced access for the processes that use them.
12. An application uses a single service account for all database access. One of the risks of this approach is A:the ability to trace who made changes to the data B:the data becomes out of order. C:the application freezes more often D:the database runs out of threads E:it constrains the application from running parallel processes 正确答案:A 你的答案:E 解析:同一账号无法追踪数据更改操
Restrict the use of service accounts to specific tasks or commands on specific systems, and require documentation and approval for distributing the credentials
Shared Accounts 共享账户
Shared accounts are created when an application cannot handle the number of user accounts needed or when adding specific users requires a large effort or incurs additional licensing costs
They should never be used by default.
5. Platform Intrusion Attacks 平台入侵攻击
Protection of database assets requires a combination of regular software updates (patches) and the implementation of a dedicated Intrusion Prevention System (IPS). 入侵防御系统
An IPS is usually, but not always, implemented alongside of an Intrusion Detection System (IDS). 入侵检测系统
The most primitive form of intrusion protection is a firewall, but with mobile users, web access, and mobile computing equipment a part of most enterprise environments, a simple firewall, while still necessary, is no longer sufficient.
6. SQL Injection Vulnerability 注入漏洞
In a SQL injection attack, a perpetrator inserts (or ‘injects’) unauthorized database statements into a vulnerable SQL data channel, such as stored procedures and Web application input spaces. These injected SQL statements are passed to the database, where they are often executed as legitimate commands. Using SQL injection, attackers may gain unrestricted access to an entire database
Mitigate this risk by sanitizing all inputs before passing them back to the server.
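A minimal sketch of the difference between an injectable query and a parameterized one, using the standard-library sqlite3 module; the table, column, and values are invented for the example. Binding user input as a parameter keeps it from ever being interpreted as SQL.

```python
# Sketch contrasting an injectable query with a parameterized one, using the
# standard-library sqlite3 module. Table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, owner TEXT, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 'alice', 100.0)")

user_input = "alice' OR '1'='1"   # a typical injection payload

# Vulnerable: user input concatenated directly into the SQL text.
unsafe = conn.execute(
    "SELECT * FROM account WHERE owner = '" + user_input + "'").fetchall()

# Safer: the driver binds the value as data, never as SQL.
safe = conn.execute(
    "SELECT * FROM account WHERE owner = ?", (user_input,)).fetchall()

print(len(unsafe), len(safe))  # 1 0 -> the injection matched everything; the bound query matched nothing
```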
7. Default Passwords 默认密码
It is a long-standing practice in the software industry to create default accounts during the installation of software packages. Some are used in the installation itself. Others provide users with a means to test the software out of the box.
Eliminating the default passwords is an important security step after every implementation.
8. Backup Data Abuse 备份数据滥用
Backups are made to reduce the risks associated with data loss, but backups also represent a security risk.
Encrypt all database backups. Encryption prevents loss of a backup either in tangible media or in electronic transit. Securely manage backup decryption keys. Keys must be available off-site to be useful for disaster recovery.
1.4.14. Hacking / Hacker 黑客行为和黑客
The term hacking came from an era when finding clever ways to perform some computer task was the goal.
29. A term came from an era when finding clever ways to perform some computer task was the goal. This term is called A:analysis B:Hacking C:super User Account D:DMZ E:threat 正确答案:B 你的答案:B 解析:7.1.3:14.黑客行为/黑客“黑客行为“一词产生于以寻找执行某些计算机任务的聪明方法为目标的时代。黑客是在复杂的计算机系统中发现未知操作和路径的人。黑客有好有坏。
An ethical or ‘White Hat’ hacker works to improve a system.
A malicious hacker is someone who intentionally breaches or ‘hacks’ into a computer system to steal confidential information or to cause damage.
1.4.15. Social Threats to Security / Phishing 社工威胁和网络钓鱼
Social engineering 社会工程 refers to how malicious hackers try to trick people into giving them either information or access.
49. A term refers to how malicious hackers try to trick people into giving them either information or access this term is called A:analysis B:Hacking C:social engineering D:DMZ E:threat 正确答案:C 你的答案:C 解析:7.1.3.:社会工程(Social Engineering)是指恶意黑客试图诱骗人们提供信息或访问信息的方法。黑客利用所获得的各种信息来说服有关员工他们有合法的请求。有时,黑客会按顺序联系几个人,在每一步收集信息以用于获得下一个更高级别员工的信任。
Phishing refers to a phone call, instant message, or email meant to lure recipients into giving out valuable or private information without realizing they are doing so.
1.4.16. Malware 恶意软件
Malware refers to any malicious software created to damage, change, or improperly access a computer or network.
28. A term refers to any malicious software created to damage change or improperly access a computer or network. This term is called A:Malware B:Hacking C:Social engineering D:DMZ E:threat 正确答案:A 你的答案:A 解析:7.1.3:16.恶意软件是指为损坏、更改或不当访问计算机或网络而创建的软件。
Adware 广告软件
Adware is a form of spyware that enters a computer from an Internet download. Adware monitors computer usage, such as which sites are visited, and may insert objects and toolbars into the user's browser. It is not illegal, but it is used to develop complete profiles of the user's browsing and buying habits, which are sold to other marketing firms.
27. A term is a form of spyware that enters a computer from an Internet download This term is called A:Adware B:Hacking C:social engineering D:DMZ E:threat 正确答案:A 你的答案:A 解析:7.1.3:1)广告软件(Adware)是一种从互联网下载至计算机的间课软件。广告软件监控计算机的使用,如访问了哪些网站。广告软件也可能在用户的浏览器中插入对象和工具栏。广告软件并不违法,但它用于收集完整的用户浏览和购买习惯的个人资料并出售给其他营销公司。恶意软件也很容易利用它来窃取身份信息。
Spyware 间谍软件
Spyware refers to any software program that slips into a computer without consent, in order to track online activity. These programs tend to piggyback on other software; downloading and installing free programs from the Internet often installs tracking spyware along with them.
26. A term refers to any software program that slips into a computer without consent,in order to rack online activity. This term is called A:Adware B:Spyware C:Social engineering D:DMZ E:threat 正确答案:B 你的答案:B 解析:7.1.3:2)间谍软件(Spyware)是指未经同意而潜入计算机以跟踪在线活动的任何软件程序。这些程序倾向于搭载在其他软件程序上,当用户从互联网站点下载并安装免费软件时,通常用户不知情时就安装了
Trojan Horse 特洛伊木马
a Trojan horse refers to a malicious program that enters a computer system disguised or embedded within legitimate software.
Virus 病毒
A virus is a program that attaches itself to an executable file or vulnerable application and delivers a payload that ranges from annoying to extremely destructive
14. A term is a program that attaches itself to an executable file or vulnerable application and delivers a payload that ranges from annoying to extremely destructive. This term is called A:Adware B:virus C:Social engineering D:DMZ E:threat 正确答案:B 你的答案:B 解析: 7.1.3题解:(4)病毒(Virus)是一种计算机程序,它将自身附加到可执行文件或易受攻击的应用程序上,能造成从让人讨厌到极具破坏性的后果。一旦受感染文件被打开就可执行病毒文件。病毒程序总是需要依附于另一个程序上。下载打开这些受感染的程序可能会释放病毒。
Worm 蠕虫
A computer worm is a program built to reproduce and spread across a network by itself. A worm harms the network chiefly by consuming large amounts of bandwidth, which can interrupt service; it may also carry out other malicious activities.
34. A term is a program built to reproduce and spread across a network by itself. This term is called A:Adware B:computer worm C:social engineering D:DMZ E:threat 正确答案:B 你的答案:B 解析:7.1.3:(5)计算机蠕虫(Worm)是一种自己可以在网络中进行复制和传播的程序。受蠕虫感染的计算机将源源不断地发送感染信息。其主要功能是通过消耗大量带宽来危害网络,从而导致网络中断。蠕虫也可能会执行多种其他恶意的活动。
Malware Sources 恶意软件来源
1. Instant Messaging (IM) 即时消息
2. Social Networking Sites 社交网站
3. Spam 垃圾邮件
Domains known for spam transmission
CC: or BCC: address count above certain limits
Email body has only an image as a hyperlink
Specific text strings or words
33. A term refers to unsolicited 来历不明 commercial email messages sent out in bulk usually to tens of millions of users in hopes that a few may reply. this term is called A:Adware B:Spam C:Social engineering D:DMZ E:threat 正确答案:B 你的答案:B 解析7.1.3:3)垃圾邮件。垃圾邮件(Spam)是指批量发送那些未经请求的商业电子邮。通常发送给数干万用户,希望获得一些用户回复
2. Activities
2.1. Identify Data Security Requirements
2.1.1. Business Requirements 业务需求
Implementing data security within an enterprise begins with a thorough understanding of business requirements. The business needs of an enterprise, its mission, strategy and size, and the industry to which it belongs define the degree of rigidity required for data security.
Analyze business rules and processes to identify security touch points. Every event in the business workflow may have its own security requirements.
37. which of the following define the data security touch points in an organization? A:Industry standards wered B:Internal Audit C:Risk Assessment D:Legislation E:Business rules and process workflow 正确答案:E 你的答案:E 解析:7.2.1.:1.业务需求在组织内实施数据安全的第一步是全面了解组织的业务需求。组织的业务需求、使命、战略和规模以及所属行业,决定了所需数据安全的严格程度,例如,美国的金融证券行业受到高度监管,需要保持严格的数据安全标准。相比之下,一个小型零售企业可能不大会选择大型零售商的同类型数据安全功能,即使他们都具有相似的核心业务活动。通过分析业务规则和流程,确定安全接触点。业务工作流中的每个事件都可能有自己的安全需求。
Data-to-process and data-to-role relationship matrices are useful tools to map these needs and guide definition of data security role-groups, parameters, and permissions
2.1.2. Regulatory Requirements 监管需求

Today’s fast changing and global environment requires organizations to comply with a growing set of laws and regulations. The ethical and legal issues facing organizations in the Information Age are leading governments to establish new laws and standards. These have all imposed strict security controls on information management. Create a central inventory of all relevant data regulations and the data subject area affected by each regulation.
US
Sarbanes-Oxley Act of 2002
Health Information Technology for Economic and Clinical Health (HITECH) Act, enacted as part of the American Recovery and Reinvestment Act of 2009
Health Insurance Portability and Accountability Act of 1996 (HIPAA) Security Regulations
Gramm-Leach-Bliley I and II
SEC laws and Corporate Information Security Accountability Act
Homeland Security Act and USA Patriot Act
Federal Information Security Management Act (FISMA)
California: SB 1386, California Security Breach Information Act
California: AB 1901, Theft of electronic files or databases
EU
Data Protection Directive (EU DPD 95/46/EC)
Canada
Canadian Bill 198
Australia
The CLERP Act of Australia
Regulations that impact data security include
Payment Card Industry Data Security Standard (PCI DSS), in the form of a contractual agreement for all companies working with credit cards
EU: The Basel II Accord, which imposes information controls for all financial institutions doing business in its related countries
US: FTC Standards for Safeguarding Customer Info
2.2. Define Data Security Policy 制定数据安全策略
2.2.1. Organizations should create data security policies based on business and regulatory requirements.
2.2.2. A policy is a statement of a selected course of action and high-level description of desired behavior to achieve a set of goals.
2.2.3. Security Policy Contents 安全政策的内容
Enterprise Security Policy
Global policies for employee access to facilities and other assets, email standards and policies, security access levels based on position or title, and security breach reporting policies
IT Security Policy
Directory structures standards, password policies, and an identity management framework
Data Security Policy
Categories for individual application, database roles, user groups, and information sensitivity
Commonly, the IT Security Policy and Data Security Policy are part of a combined security policy. The preference, however, should be to separate them. Data security policies are more granular in nature, specific to content, and require different controls and procedures. The Data Governance Council should review and approve the Data Security Policy. The Data Management Executive owns and maintains the policy
32. Definition of data security policies should be A:Based on defined standards and templates B:A collaborative effort between Business and lT C:Reviewed by external Regulators D:Conducted by external consultants E:Determined by external Regulators 正确答案:B 你的答案:B 解析:正确答案:B来源:7.2.2题解:公司的制度通常具有法律含义。法院可认为,为支持法律监管要求而制定的制度是该组织为法律遵从而努力的内在组成部分。如发生数据泄露事件,未能遵守公司制度可能会带来负面的法律后果。制定安全制度需要IT安全管理员、安全架构师、数据治理委员会、数据管理专员、内部和外部审计团队以及法律部门之间的协作。数据管理专员还必须与所有隐私官(萨班斯-奥克斯利法案主管、HIPAA官员等)以及具有数据专业知识的业务经理协作,以开发监管类元数据并始终如一地应用适当的安全分类。所有数据法规遵从行动必须协调一致,以降低成本、工作指令混乱和不必要的本位之争。
Data security policies, procedures, and activities should be periodically reevaluated to strike the best possible balance between the data security requirements of all stakeholders
2.3. Define Data Security Standards 定义数据安全细则
2.3.1. Policies provide guidelines for behavior. They do not outline every possible contingency. Standards supplement policies and provide additional detail on how to meet the intention of the policies.
Define Data Confidentiality Levels 定义数据保密等级
Confidentiality classification is an important Metadata characteristic, guiding how users are granted access privileges.
Define Data Regulatory Categories 定义数据监管类别
A growing number of highly publicized data breaches, in which sensitive personal information has been compromised, have resulted in data-specific laws being introduced.
Define Security Roles 定义安全角色
Role groups enable security administrators to define privileges by role and to grant these privileges by enrolling users in the appropriate role group.
There are two ways to define and organize roles: as a grid (starting from the data), or in a hierarchy (starting from the user).
Role Assignment Grid 角色分配矩阵

Role Assignment Hierarchy 角色分配层次结构
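A minimal sketch of a role assignment grid expressed as data: role groups map to the data sets they may read, and users gain privileges only by being enrolled in role groups. Role names, data sets, and users are illustrative assumptions.

```python
# Sketch of a role assignment grid as data: role groups map to the data sets
# they may read; users gain privileges only through role-group membership.
# Role names, data sets, and users are illustrative assumptions.
ROLE_GRID = {
    "ClaimsAnalyst": {"claims", "policy"},
    "HRGeneralist":  {"employee", "payroll"},
    "MarketingUser": {"customer"},
}
USER_ROLES = {"jsmith": {"ClaimsAnalyst"}, "mlee": {"HRGeneralist", "MarketingUser"}}

def can_read(user: str, dataset: str) -> bool:
    return any(dataset in ROLE_GRID[role] for role in USER_ROLES.get(user, set()))

print(can_read("jsmith", "claims"))   # True
print(can_read("jsmith", "payroll"))  # False
```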

2.4. Assess Current Security Risks 评估当前安全风险
2.4.1. Security risks include elements that can compromise a network and/or database. The first step in identifying risk is identifying where sensitive data is stored, and what protections are required for that data. Evaluate each system for the following:
The sensitivity of the data stored or in transit
The requirements to protect that data, and
The current security protections in place
2.4.2. Document the findings, as they create a baseline for future evaluations
2.4.3. In larger organizations, white-hat hackers may be hired to assess vulnerabilities. A white hat exercise can be used as proof of an organization’s impenetrability, which can be used in publicity for market reputation.
2.5. Implement Controls and Procedures 实施控制和规程
2.5.1. Controls and procedures should (at a minimum) cover:
1. How users gain and lose access to systems and/or applications
2. How users are assigned to and removed from roles
3. How privilege levels are monitored
4. How requests for access changes are handled and monitored
5. How data is classified according to confidentiality and applicable regulations
6. How data breaches are handled once detected
2.5.2. a policy to ‘maintain appropriate user privileges’ could have a control objective of ‘Review DBA and User rights and privileges on a monthly basis’.
Validate assigned permissions against a change management system used for tracking all userpermission requests
Require a workflow approval process or signed paper form to record and document each changerequest
Include a procedure for eliminating authorizations for employees who change roles or leave the organization
2.5.3. Assign Confidentiality Levels 分配密级
The classification for documents and reports should be based on the highest level of confidentiality for any information found within the document.
Label each page or screen with the classification in the header or footer. Information products classified at the lowest confidentiality level ('For general audiences') do not require a label.
2.5.4. Assign Regulatory Categories 分配监管类别
Organizations should create or adopt a classification approach to ensure that they can meet the demands of regulatory compliance.
2.5.5. Manage and Maintain Data Security 管理和维护数据安全
Control Data Availability / Data-centric Security
An enterprise data model is essential to identifying and locating sensitive data.
Data masking can protect data even if it is inadvertently exposed
Relational database views can be used to enforce data security levels
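As a small illustration of the last point, the sketch below uses the standard-library sqlite3 module to build a view that exposes only non-sensitive columns; in a real DBMS, SELECT rights would be granted on the view rather than the base table. The table, column, and view names are assumptions.

```python
# Minimal sketch of using a relational view to expose only non-sensitive
# columns, so consumers are granted the view rather than the base table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary REAL, ssn TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ada Lovelace', 120000, '123-45-6789')")

# The view omits salary and ssn; access would be granted on the view only.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")
print(conn.execute("SELECT * FROM employee_directory").fetchall())
# [(1, 'Ada Lovelace')]
```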
Monitor User Authentication and Access Behavior
Reporting on access is a basic requirement for compliance audits.
Monitoring also helps detect unusual, unforeseen, or suspicious transactions that warrant investigation.
It can be implemented within a system or across dependent heterogeneous systems.
Monitoring can be automated or executed manually or executed through a combination of automation and oversight.
Lack of automated monitoring represents serious risks:
1. Regulatory risk 监管风险
2. Detection and recovery risk 检测和恢复风险
3. Administrative and audit duties risk 管理和审计职责风险
4. Risk of reliance on inadequate native audit tools 依赖不适当的本地审计工具风险
Implementing a network-based audit appliance has the following benefits:
1. High performance 高性能
2. Separation of duties 职责分离
3. Granular transaction tracking 精细事务跟踪
2.5.6. Manage Security Policy Compliance 管理安全制度遵从性
Manage Regulatory Compliance 管理法规遵从性
1. Measuring compliance with authorization standards and procedures
2. Ensuring that all data requirements are measurable and therefore auditable (i.e., assertions like "be careful" are not measurable)
3. Ensuring regulated data in storage and in motion is protected using standard tools and processes
4. Using escalation procedures and notification mechanisms when potential non-compliance issues are discovered, and in the event of a regulatory compliance breach
Audit Data Security and Compliance Activities 审计数据安全和合规活动
Internal audits of activities to ensure data security and regulatory compliance policies are followed should be conducted regularly and consistently.
Internal or external auditors may perform audits.
Auditors must be independent of the data and/or processes involved in the audit to avoid any conflict of interest and to ensure the integrity of the auditing activity and results.
Auditing is not a fault-finding mission. The goal of auditing is to provide management and the data governance council with objective, unbiased assessments, and rational, practical recommendations.
Audits often include performing tests and checks, such as:
1. Analyzing policy and standards to assure that compliance controls are defined clearly and fulfill regulatory requirements
2. Analyzing implementation procedures and user-authorization practices to ensure compliance with regulatory goals, policies, standards, and desired outcomes
3. Assessing whether authorization standards and procedures are adequate and in alignment with technology requirements
4. Evaluating escalation procedures and notification mechanisms to be executed when potential non-compliance issues are discovered or in the event of a regulatory compliance breach
5. Reviewing contracts, data sharing agreements, and regulatory compliance obligations of outsourced and external vendors, to ensure business partners meet their obligations and that the organization meets its legal obligations for protecting regulated data
6. Assessing the maturity of security practices within the organization and reporting to senior management and other stakeholders on the ‘State of Regulatory Compliance’
7. Recommending Regulatory Compliance policy changes and operational compliance improvements
Auditing data security is not a substitute for management of data security. It is a supporting process that objectively assesses whether management is meeting goals.
3. Tools
3.1. Anti-Virus Software / Security Software
3.1.1. update security software regularly
3.2. HTTPS
3.3. Identity Management Technology
3.3.1. Lightweight Directory Access Protocol (LDAP) 轻量级目录访问协议
3.4. Intrusion Detection and Prevention Software 入侵侦测和入侵防御
3.4.1. IDS
3.4.2. IPS
3.5. Firewalls (Prevention)
3.6. Metadata Tracking
3.6.1. Tools that track Metadata can help an organization track the movement of sensitive data.
3.6.2. While these tools create a risk that outside agents could detect internal information from Metadata associated with documents, identifying sensitive information through Metadata remains the best way to ensure that data is protected properly. Since most data loss incidents result from sensitive data being left unprotected because its sensitivity was unknown, the benefit of documenting sensitivity in Metadata far outweighs the hypothetical risk of the Metadata repository itself being exposed. The risk is further reduced because an experienced hacker can easily locate unprotected sensitive data on the network without such Metadata.
3.6.3. The people most likely to be unaware of the need to protect sensitive data are often the organization’s own employees and managers.
3.7. Data Masking/Encryption
4. Techniques
4.1. CRUD Matrix Usage
4.1.1. Creating and using data-to-process and data-to-role relationship (CRUD–Create, Read, Update, Delete) matrices help map data access needs and guide definition of data security role groups, parameters, and permissions. Some versions add an E for Execute to make CRUDE
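A minimal sketch of how a data-to-role CRUD matrix can be captured and checked in Python; the entity and role names are invented for illustration.
    CRUD_MATRIX = {
        # (entity, role): allowed operations (C, R, U, D)
        ("Customer", "SalesRep"): {"C", "R", "U"},
        ("Customer", "Auditor"): {"R"},
        ("SalaryRecord", "HRManager"): {"C", "R", "U", "D"},
        ("SalaryRecord", "SalesRep"): set(),  # no access at all
    }

    def is_allowed(entity: str, role: str, operation: str) -> bool:
        """Check whether a role may perform an operation on an entity."""
        return operation in CRUD_MATRIX.get((entity, role), set())

    assert is_allowed("Customer", "SalesRep", "U")
    assert not is_allowed("SalaryRecord", "SalesRep", "R")
The same structure, filled in from requirements workshops, guides the definition of data security role groups, parameters, and permissions.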
1. You are engaged in a consulting position to advice a company on how best to understand how its data is used in the company by its applications as a best approach, you would recommend A:the development of an enterprise data model B:conducting an inventory of data C:the development of CRUD matrices for all application D:the development of a conceptual model E:the development of RACl matrices for all applications 正确答案:C 你的答案:C 解析: 7.4.1应用CRUD矩阵 创建和使用数据-流程矩阵和数据-角色关系(CRUD-创建、读取、更新、删除)矩阵有助于映射数据访问需求,并指导数据安全角色组、参数和权限的定义。某些版本中添加E(Execute)执行,以创建CRUDE矩阵。 负责、批注、咨询、通知(RACI)矩阵也有助于明确不同角色的角色、职责分离和职责,包括他们的数据安全义务。
35. A CRUD matrix helps organizations map responsibilities for data changes in the business process workflow. CRUD stands for A:Cost,Revenue,Uplift, Depreciate B:Create,Read,Update,Delete C:create,Review, Use,Destroy D:Create, React, Utilize, Delegate E:Confidential,Restricted,Unclassified, Destroy 正确答案:B 你的答案:B 解析:7.4.1应用CRUD矩阵创建和使用数据-流程矩阵和数据-角色关系(CRUD一创建、读取、更新、删除)矩阵有助于缺射数据访问需求,并指导数据安全角色组、参数和权限的定义,某些版本中添加E(Execute)执行,以创建CRUDE矩阵。
4.2. Immediate Security Patch Deployment 即时安全修复程序部署
4.2.1. A process for installing security patches as quickly as possible on all machines should be in place. A malicious hacker only needs root access to one machine to conduct an attack successfully on the network. Users should not be able to delay this update.
4.3. Data Security Attributes in Metadata 元数据中的数据安全属性
4.3.1. A Metadata repository is essential to assure the integrity and consistent use of an Enterprise Data Model across business processes. Metadata should include security and regulatory classifications for data.
4.4. Security Needs in Project Requirements 项目需求中的安全需求
4.4.1. Every project that involves data must address system and data security. Identify detailed data and application security requirements in the analysis phase. Identification up front guides the design and prevents having to retrofit security processes.
4.5. Efficient Search of Encrypted Data 高效搜索加密数据
4.5.1. Searching encrypted data obviously includes the need to decrypt the data.
4.5.2. One way to reduce the amount of data that must be decrypted is to encrypt the search criteria with the same method used on the stored data, match ciphertext against ciphertext first, and decrypt only the matching rows; this is faster than decrypting everything and searching the plaintext.
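A minimal sketch of the ‘encrypt the search criteria’ idea, using a keyed HMAC as a deterministic search token. This is an illustrative assumption, not a prescribed design: a production system would pair such tokens with proper key management and stronger encryption of the stored values themselves.
    import hmac
    import hashlib

    SECRET_KEY = b"example-key-loaded-from-a-vault"   # illustrative only

    def search_token(value: str) -> str:
        """Derive a deterministic token so equal plaintexts produce equal ciphertext tokens."""
        return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

    # What is stored instead of the plaintext national ID:
    stored_rows = [{"id": 1, "national_id_token": search_token("AB123456")},
                   {"id": 2, "national_id_token": search_token("CD789012")}]

    # Searching: encrypt the criteria, then match token against token; nothing is decrypted.
    criteria_token = search_token("AB123456")
    matches = [row["id"] for row in stored_rows if row["national_id_token"] == criteria_token]
    print(matches)   # [1]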
4.6. Document Sanitization 文件清理
4.6.1. Document sanitization is the process of cleaning Metadata, such as tracked change history, from documents before sharing. Sanitization mitigates the risk of sharing confidential information that might be embedded in comments.
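A minimal sketch of clearing identifying core properties from a .docx file before sharing, assuming the third-party python-docx package is available. Note that this only clears core document properties; tracked changes and comments embedded in the body need separate handling.
    from docx import Document   # assumption: python-docx is installed

    def sanitize(path: str, out_path: str) -> None:
        """Blank out identifying core properties and save a clean copy."""
        doc = Document(path)
        props = doc.core_properties
        props.author = ""
        props.last_modified_by = ""
        props.comments = ""
        doc.save(out_path)

    # sanitize("quarterly_report.docx", "quarterly_report_clean.docx")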
5. Implementation Guidelines
5.1. Readiness Assessment / Risk Assessment
5.1.1. Training 培训
Promotion of standards through training on security initiatives at all levels of the organization.
5.1.2. Consistent policies 制度的一致性
Definition of data security policies and regulatory compliance policies for workgroups and departments that complement and align with enterprise policies
5.1.3. Measure the benefits of security 衡量收益
Link data security benefits to organizational initiatives.
5.1.4. Set security requirements for vendors 为供应商设置安全要求,SLA
Include data security requirements in service level agreements and outsourcing contractual obligations.
5.1.5. Build a sense of urgency 增强紧迫感
Emphasize legal, contractual, and regulatory requirements to build a sense of urgency and an internal framework for data security management.
5.1.6. Ongoing communications 持续沟通
Supporting a continual employee security-training program informing workers of safe computing practices and current threats.
5.2. Organization and Cultural Change
5.2.1. Data Stewards are generally responsible for data categorization.
5.2.2. Information security teams assist with compliance enforcement and establish operational procedures based on data protection policies, and security and regulatory categorization.
5.3. Visibility into User Data Entitlement 用户数据授权的可见性
5.3.1. Each user data entitlement, which is the sum total of all the data made available by a single authorization, must be reviewed during system implementation to determine if it contains any regulated information
5.4. Data Security in an Outsourced World
5.4.1. Anything can be outsourced except liability.
5.4.2. Outsourcing IT operations introduces additional data security challenges and responsibilities. Outsourcing increases the number of people who share accountability for data across organizational and geographic boundaries. Previously informal roles and responsibilities must be explicitly defined as contractual obligations. Outsourcing contracts must specify the responsibilities and expectations of each role.
5.4.3. Any form of outsourcing increases risk to the organization, including some loss of control over the technical environment and the people working with the organization’s data. Data security measures and processes must look at the risk from the outsource vendor as both an external and internal risk.
5.4.4. The maturity of IT outsourcing has enabled organizations to re-look at outsourced services. A broad consensus has emerged that architecture and ownership of IT, which includes data security architecture, should be an in-sourced function. In other words, the internal organization owns and manages the enterprise and security architecture. The outsourced partner may take the responsibility for implementing the architecture
5.4.5. Transferring control, but not accountability, requires tighter risk management and control mechanisms. Some of these mechanisms include:
1. Service level agreements
2. Limited liability provisions in the outsourcing contract
3. Right-to-audit clauses in the contract
4. Clearly defined consequences 后果 of breaching 违反 contractual obligations 合同义务
5. Frequent data security reports from the service vendor
6. Independent monitoring of vendor system activity
7. Frequent and thorough 定期且彻底的 data security auditing
8. Constant communication with the service vendor
9. Awareness of legal differences in contract law should the vendor be located in another country and a dispute arises
5.4.6. Outsourcing organizations especially benefit from developing CRUD (Create, Read, Update, and Delete) matrices that map data responsibilities across business processes, applications, roles, and organizations, tracing the transformation, lineage, and chain of custody for data
5.4.7. Responsible, Accountable, Consulted, and Informed (RACI) matrices also help clarify roles, the separation of duties, and responsibilities of different roles, including their data security obligations.
11. RACl matrices also help clarify roles,the separation of duties, and responsibilities of different roles,including their data security obligations. RAIC stands for A:Responsible,Answer,Consulted, and Inquiry B:Responsible,Accountable,Consulted and Informed C:Responsible,Accountable,Charge and lnformed D:Responsible,answer,Consulted and lnformed E:Responsible,accountable,Conscientious and lnformed 正确答案:B 你的答案:B 解析:7.5.4.负责、批注、咨询、通知(RACI)矩阵也有助于明确不同角色的角色、职责分离和职责,包括他们的数据安全义务。RACI矩阵可成为合同协议和数据安全制度的一部分。通过定义责任矩阵(如RACI)在参与外包的各方之间确立明确的问责制和所有权,从而支持总体数据安全制度及其实施。
40. A RACl matrix is a useful tool to support the ______ in an outsourced arrangement A:Segregation of duties SOD 职责隔离 B:Attributing Costs C:Transfer of access controls D:Alignment of Business goals E:Service level Agreement 正确答案:A 你的答案:E 解析:7.5.4:负责、批注、咨询、通知(RACI)矩阵也有助于明确不同角色的角色、职责分离和职责,包括他们的数据安全义务。
5.5. Data Security in Cloud Environments
5.5.1. Data security policies need to account for the distribution of data across the different service models. This includes the need to leverage external data security standards.
5.5.2. Shared responsibility, defining chain of custody 监管链 of data and defining ownership and custodianship 托管权 rights, is especially important in cloud computing.
5.5.3. Fine-tuning or even creating a new data security management policy geared towards cloud computing is necessary for organizations of all sizes.
5.5.4. The same data proliferation 数据扩散 security principles apply to sensitive/confidential production data.
6. Data Security Governance
6.1. Data Security and Enterprise Architecture
6.1.1. Enterprise Architecture defines the information assets and components of an enterprise, their interrelationships, and business rules regarding transformation, principles, and guidelines.
6.1.2. Data Security architecture is the component of enterprise architecture that describes how data security is implemented within the enterprise to satisfy the business rules and external regulations.
6.1.3. Architecture influences:
1. Tools used to manage data security
2. Data encryption standards and mechanisms
3. Access guidelines to external vendors and contractors
4. Data transmission protocols over the internet
5. Documentation requirements
6. Remote access standards
7. Security breach incident-reporting procedures
6.1.4. Security architecture is particularly important for the integration of data between:
1. Internal systems and business units
2. An organization and its external business partners
3. An organization and regulatory agencies
6.2. Metrics
6.2.1. Security Implementation Metrics 安全实施的指标
positive value percentages
1. Percentage of enterprise computers having the most recent security patches installed
2. Percentage of computers having up-to-date anti-malware software installed and running
3. Percentage of new-hires who have had successful background checks
4. Percentage of employees scoring more than 80% on annual security practices quiz
5. Percentage of business units for which a formal risk assessment analysis has been completed
6. Percentage of business processes successfully tested for disaster recovery in the event of fire, earthquake, storm, flood, explosion or other disaster
7. Percentage of audit findings that have been successfully resolved
Trends can be tracked
1. Performance metrics of all security systems
2. Background investigations and results
3. Contingency planning and business continuity plan status
4. Criminal incidents and investigations
5. Due diligence examinations for compliance, and number of findings that need to be addressed
7. An information security due care and due diligence activity should include which of the following? A:Steps can be verified,measured or produce tangible artifacts on a continual basis B:An incident response plan is created C:Leaders are accountable and staff is aware and trained D:Due care are steps are taken to show that company has taken responsibility E:all 正确答案:C 你的答案:C 解析:7.6.2:5)合规的尽职调查以及需要解决的调查结果数量。
6. Informational risk management analysis performed and number of those resulting in actionable changes
7. Policy audit implications and results, such as clean desk policy checks, performed by evening-shift security officers during rounds
8. Security operations, physical security, and premises protection statistics
9. Number of documented, accessible security standards (a.k.a. policies)
10. The motivation of relevant parties to comply with security policies can also be measured
11. Business conduct and reputational risk analysis, including employee training
12. Business hygiene and insider risk potential based on specific types of data such as financial, medical, trade secrets, and insider information
13. Confidence and influence indicators among managers and employees as an indication of how data and information security efforts and policies are perceived
6.2.2. Security Awareness Metrics 安全意识指标
Risk assessment findings
provide qualitative data that needs to be fed back to appropriate business units to make them more aware of their accountability.
Risk events and profiles
identify unmanaged exposures that need correction. Determine the absence or degree of measurable improvement in risk exposure or conformance to policy by conducting follow-up testing of the awareness initiative to see how well the messages got across.
Formal feedback surveys and interviews
identify the level of security awareness. Also, measure the number of employees who have successfully completed security awareness training within targeted populations.
Incident post mortems, lessons learned, and victim interviews 受害者访谈
provide a rich source of information on gaps in security awareness. Measures may include how much vulnerability has been mitigated.
Patching effectiveness audits 补丁有效性审计
involve specific machines that work with confidential and regulated information to assess the effectiveness of security patching. (An automated patching system is advised whenever possible.)
6.2.3. Data Protection Metrics 数据保护的指标
1. Criticality ranking
of specific data types and information systems that, if made inoperable, would have profound impact on the enterprise.
2. Annualized loss expectancy
of mishaps, hacks, thefts, or disasters related to data loss, compromise, or corruption.
3. Risk of specific data losses
related to certain categories of regulated information, and remediation priority ranking.
4. Risk mapping of data to specific business processes.
Risks associated with Point of Sale devices would be included in the risk profile of the financial payment system.
5. Threat assessments
performed based on the likelihood of an attack against certain valuable data resources and the media through which they travel.
6. Vulnerability assessments
of specific parts of the business process where sensitive information could be exposed, either accidentally or intentionally.
6.2.4. Security Incident Metrics 安全事件指标
Intrusion attempts 入侵尝试 detected and prevented
Return on Investment for security costs using savings from prevented intrusions
6.2.5. Confidential Data Proliferation 机密数据扩散
The number of copies of confidential data should be measured in order to reduce this proliferation.
7. Works Cited / Recommended
7.1. 2. All of the following are desirable security characteristics of database users EXCEPT A:identifiable B:monitored C:efficient D:authorized E:audit 正确答案:C 你的答案:C 解析:与安全无关。4)有效性。如数据符合其定义的语法(格式、类型、范围),则数据有效。
7.2. 3. Data security policies should address all of the following EXCEPT A:physical security B:encryption schemes. C:levels of control for documents D:access speed E:none 正确答案:D 你的答案:D 解析:D是性能指标
7.3. 5. Two data authorization commands available in SQL are __and__ A:Allow; Disallow. B:Grant 授予; Revoke撤回 C:Lock; Unlock. D:Freeze; Release E:Allow; Release 正确答案:B 你的答案:B 解析:SQL基础知识
7.4. 8. Data security policies include all of the following EXCEPT A:role-based access. B:role-based update C:data classification D:data quality requirements E:role-based delete 正确答案:D 你的答案:D 解析:D是数据质量不是数据安全
7.5. 9. ____and___ are two techniques that must be applied by policy to information to eliminate business risk once Data is no longer needed A:Backup recovery B:Declassification redaction解密修订 C:sanitization 清洗; recovery D:Shredding 分解; expungement 消除 E:Sanitization, recovery 正确答案:D 你的答案:A 解析:排除法,业务风险无法恢复与解密。
7.6. 10. The information risk universe 信息风险宇宙 is informed by all of the following EXCEPT A:internal and external audit findings. B:data quality security, privacy and confidentiality issues C:regulatory 监管 non-compliance issues D:non-conformance 不合格 issues (policies. Standards, architecture, and procedures) E:All 正确答案:D 你的答案:A 解析::ABC和风险相关。Architecture and procedures较难涉及风险。
7.7. 13. Knowing how data has been used and abused in the past is an indicator of how it might be ___ and ___ in the future A:available: pursued B:compromised 泄露 ; disclosed 披露 C:reported; used D:ignored; unused E:reported; unused 正确答案:B 你的答案:E 解析:了解滥用,指导未来如何披露数据
7.8. 18. Consent 同意 is given when an organization has A:captured and stored data in its database B:been hacked and information about people has become public C:"obtained permission from an individual for the collection,use or disclosure of person all information“ D:reviewed its current policies with its data governance committee E:published specific information about its policies and practices 正确答案:C 你的答案:C 解析:从个人获得收集、使用或披露个人信息的许可
7.9. 22. The disclosure of sensitive addresses may occur through: A:software ignoring privacy tags on the data 隐私标签 B:inappropriate use of photocopier toner C:ineffective implementation of data architecture D:stored procedures being called directly E:cloud-based database 正确答案:A 你的答案:A 解析:数据安全
7.10. 23. De-identifying sensitive data in a data warehouse enables 数据仓库中的敏感数据进行去标识化可以: A:increased utilization without compromising data privacy B:certainty of complete datasets for interrogation C:a reduced complexity in the data models D:a focus on re-identifying records with 3rd party datasets E:a bypass in the need to assess data quality 正确答案:A 你的答案:E 解析:暂无解析
7.11. 41. How does data Security contribute to competitive advantage? A:Data Security helps to protect proprietary information and intellectual property, as well as customer and partner information B:Data Security makes it harder for your competitors to find out about who you do business with C:Stops organizations going out of business due to an information leak D:Governments do not allow organizations to trade if they do not manage Data Security E:Data security makes your competitors invest more effort into trying to find out your trade secrets 正确答案:A 你的答案:A 解析:7.1.3:2)行业法规或基于合同的法规。某些行业对如何记录、保留和加密信息有特定的标准,有些还不允许删除、编辑或分发到禁止的地方。例如,有关药品、危险品、食品、化妆品和先进技术的法规,禁止在原产国之外传送或存储某些信息,或要求在传送过程中对数据进行加密。 ②竞争优势或商业秘密。使用专有方法、组合、方案、来源、设计、工具、配方或操作技术以实现竞争优势的公司,可受到行业法规和/或知识产权法的保护。
7.12. 45. A workforce is locked out of a building due to a threat to personnel. The BCP Business Continuity Plan 业务持续性计划 is stored in the DMS. Luckily A:the workforce are expendable 可牺牲的. B:The metadata repository holds the BCP C:the document is printed and stored off site D:the security policy includes such a scenario E:the data warehouse catalogue is fully operational 正确答案:C 你的答案:B 解析:数据安全
Chapter 8: Data Integration and Interoperability 数据集成与互操作

1. Introduction
1.1. Definition
1.1.1. Data Integration and Interoperability (DII) describes processes related to the movement and consolidation of data within and between data stores, applications and organizations.
Integration consolidates data into consistent forms, either physical or virtual.
Data Interoperability is the ability for multiple systems to communicate.
3. The data architect needs to propagate data across the landscape in real time. This requires the leveraging of the following DMBOK knowledge areas A:data architecture data quality and content and document management. B:data architecture, data governance and metadata management. C:data architecture integration and interoperability data storage and operations. D:data architecture data modelling and design and data security. E:data architecture, metadata management and data security. 正确答案:C 你的答案:C 解析:8.1:数据集成和互操作(DIl)描述了数据在不同数据存储、应用程序和组织这三者内部和之间进行移动和整合的相关过程。
12. The purpose and rationale 根本原因 for data integration should be defined by A:reporting requirements B:the developers C:interoperability requirements 互操作 D:the business E:industry standards 正确答案:C 你的答案:C 解析:8.1.2. 数据集成和互操作(DIl)描述了数据在不同数据存储、应用程序和组织这三者内部和之间进行移动和整合的相关过程。数据集成是将数据整合成物理的或虚拟的一致格式,数据互操作是多个系统之间进行通信的能力。
1.1.2. DII solutions enable basic data management functions on which most organizations depend:
1. Data migration and conversion
2. Data consolidation into hubs or marts
3. Integration of vendor packages into an organization’s application portfolio
4. Data sharing between applications and across organizations
13. The ability of a photo app to share its images with various social media applications is an example of: A:replication B:metadata C:integration D:rendering 呈现 E:interoperability 正确答案:E 你的答案:E 解析:8.1:数据集成和互操作的解决方案提供了大多数组织所依赖的基本数据管理职能:4)在不同应用程序或组织之间数据共享。
5. Distributing data across data stores and data centers
6. Archiving data
7. Managing data interfaces
8. Obtaining and ingesting external data
9. Integrating structured and unstructured data
10. Providing operational intelligence and management decision support
1.1.3. DII is dependent on these other areas of data management:
1. Data Governance: For governing the transformation rules and message structures
2. Data Architecture: For designing solutions
3. Data Security: For ensuring solutions appropriately protect the security of data, whether it is persistent, virtual, or in motion between applications and organizations
4. Metadata: For tracking the technical inventory of data (persistent, virtual, and in motion), the business meaning of the data, the business rules for transforming the data, and the operational history and lineage of the data
5. Data Storage and Operations: For managing the physical instantiation of the solutions
6. Data Modeling and Design: For designing the data structures including physical persistence in databases, virtual data structures, and messages passing information between applications and organizations
1.1.4. Data Integration and Interoperability is critical to Data Warehousing and Business Intelligence, as well as Reference Data and Master Data Management, because all of these focus on transforming and integrating data from source systems to consolidated data hubs and from hubs to the target systems where it can be delivered to data consumers, both system and human.
1.1.5. Data Integration and Interoperability is central to the emerging area of Big Data management.
Big Data seeks to integrate various types of data, including data structured and stored in databases, unstructured text data in documents or files, and other types of unstructured data such as audio, video, and streaming data.
1.1.6. This integrated data can be mined, used to develop predictive models, and deployed in operational intelligence activities.
1.2. Business Drivers
1.2.1. The need to manage data movement efficiently is a primary driver for DII.
1. The need to manage complexity and the costs associated with complexity are reasons to architect data integration from an enterprise perspective.
7. The creation of overly complex enterprise integration over time is often a symptom of: A:multiple data warehouses. B:multiple data owners C:multiple integration technologies. D:multiple metadata tags. E:multiple application coding languages 正确答案:C 你的答案:C 解析:8.1.1:对企业来说,管理数据集成的复杂性以及相关成本是建立数据集成架构的原因。企业级的数据集成设计远远比分散的或点对点的解决方案效率更高、成本更低。在应用程序之间采用点对点的解决方案,可能产生出成干上万的接口,即使最有效率和最有能力的IT支撑组织也会被迅速拖垮。
An enterprise design of data integration is demonstrably more efficient and cost effective than distributed or point-to-point solutions.
Data hubs such as data warehouses and Master Data solutions help to alleviate this problem
using enterprise data integration techniques such as hub-and-spoke integration and canonical message models.
2. Another business driver is managing the cost of support.
Standard tool implementations can reduce support and staffing costs and improve the efficiency of troubleshooting efforts.
3. DII also supports an organization’s ability to comply with data handling standards and regulations.
Enterprise-level DII systems enable re-use of code to implement compliance rules and simplify verification of compliance.
1.3. Goals and Principles
1.3.1. Goals
1. Make data available in the format and timeframe needed by data consumers, both human and system
2. Consolidate data physically and virtually into data hubs
3. Lower cost and complexity of managing solutions by developing shared models and interfaces
4. Identify meaningful events (opportunities and threats) and automatically trigger alerts and actions
5. Support Business Intelligence, analytics, Master Data Management, and operational efficiency efforts
1.3.2. Principles
1. Take an enterprise perspective in design to ensure future extensibility, but implement through iterative and incremental delivery
2. Balance local data needs with enterprise data needs, including support and maintenance.
3. Ensure business accountability for Data Integration and Interoperability design and activity. Business experts should be involved in the design and modification of data transformation rules, both persistent and virtual.
1.4. Essential Concepts
1.4.1. Extract, Transform, and Load 抽取转换和加载

1. Central to all areas in Data Integration and Interoperability is the basic process of Extract, Transform, and Load (ETL).
These are the essential steps in moving data around and between applications and organizations.
2. Depending on data integration requirements, ETL can be performed as a periodically scheduled event (batch) or whenever new or updated data is available (real-time or event-driven). Operational data processing tends to be real-time or near real-time, while data needed for analysis or reporting is often scheduled in batch jobs.
3. Extract 抽取
The extract process includes selecting the required data and extracting it from its source.
Extracted data is then staged, in a physical data store on disk or in memory.
Ideally, if this process executes on an operational system, it is designed to use as few resources as possible, in order to avoid negatively affecting the operational processes.
4. Transform 转换
The transform process makes the selected data compatible with the structure of the target data store.
Transformation includes cases
1. where data is removed from the source when it moves to the target
14. Mapping requirements and rules for moving data from source to target enables A:load B:backups C:extract D:transformation E:analysis 正确答案:D 你的答案:D 解析:8.1.3:(2)转换过程是让选定的数据与目标数据库的结构相兼容。转换包括多种情况。例如,当数据向目标移动时将它从源数据中移除,或是数据复制到多个目标中,或是数据用于触发事件但不会持久化。
2. where data is copied to multiple targets
3. where the data is used to trigger events but is not persisted.
Examples of transformation
1. Format changes 格式变化
Conversion of the technical format of the data; for example, from EBCDIC to ASCII format
2. Structure changes 结构变化
Changes to the structure of the data; for example, from denormalized to normalized records
3. Semantic conversion 语义转换
Conversion of data values to maintain consistent semantic representation. For example, the source gender codes might include 0, 1, 2, and 3, while the target gender codes might be represented as UNKNOWN, FEMALE, MALE, or NOT PROVIDED.
4. De-duping 消除重复
Ensuring that if rules require unique key values or records, a means for scanning the target, and detecting and removing duplicate rows, is included
5. Re-ordering 重新排序
Changing the order of the data elements or records to fit a defined pattern
Transformation may be performed in batch or real-time, either physically storing the result in a staging area or virtually storing the transformed data in memory until it is ready for the load step.
Data resulting from the transformation stage should be ready to integrate with data in the target structure.
5. Load 加载
The load step of ETL is physically storing or presenting the result of the transformations in the target system.
Depending on the transformations performed, the target system’s purpose, and the intended use, the data may need further processing to be integrated with other data, or it may be in a final form, ready to present to consumers.
6. ELT 抽取加载转换

If the target system has more transformation capability than either the source or an intermediary application system, the order of processes may be switched to ELT – Extract, Load, and Transform.
ELT allows transformations to occur after the load to the target system, often as part of the process.
ELT allows source data to be instantiated on the target system as raw data, which can be useful for other processes.
This is common in Big Data environments where ELT loads the data lake.
7. Mapping 映射
A synonym for transformation, a mapping is both the process of developing the lookup matrix from source to target structures and the result of that process.
A mapping defines the sources to be extracted, the rules for identifying data for extraction, targets to be loaded, rules for identifying target rows for update (if any), and any transformation rules or calculations to be applied.
Many data integration tools offer visualizations of mappings that enable developers to use graphical interfaces to create transformation code.
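A schematic Extract-Transform-Load flow in Python, mirroring the steps above; the source rows, gender-code mapping, and target structure are illustrative assumptions rather than a real interface.
    source_rows = [
        {"cust_id": "001", "gender_code": "1", "name": " Alice "},
        {"cust_id": "002", "gender_code": "0", "name": "Bob"},
    ]

    GENDER_MAP = {"0": "UNKNOWN", "1": "FEMALE", "2": "MALE", "3": "NOT PROVIDED"}

    def extract(rows):
        """Select only the required attributes from the source."""
        return [{"cust_id": r["cust_id"], "gender_code": r["gender_code"], "name": r["name"]} for r in rows]

    def transform(rows):
        """Apply semantic conversion and format changes to fit the target structure."""
        return [{"customer_id": r["cust_id"],
                 "gender": GENDER_MAP.get(r["gender_code"], "UNKNOWN"),
                 "customer_name": r["name"].strip().upper()} for r in rows]

    def load(rows, target):
        """Physically store the transformed rows in the target."""
        target.extend(rows)

    target_table = []
    load(transform(extract(source_rows)), target_table)
    print(target_table)
Swapping the order of the last two calls (loading raw rows first and transforming inside the target platform) would correspond to the ELT variant described above.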
1.4.2. Latency 时延
Latency is the time difference between when data is generated in the source system and when the data is available for use in the target system. Different approaches to data processing result in different degrees of data latency. Latency can be high (batch) or low (event-driven) to very low (real-time synchronous).
1. 10. Integrating two data stores using batch or real-time synchronous approaches results in a difference in A:data quality B:lethargy 昏睡 C:latency 延迟 D:timestamping E:source of truth 正确答案:C 你的答案:C 解析:8.1.3:2.时延(Latency)是指从源系统生成数据到目标系统可用该数据的时间差。不同的数据处理方法会导致不同程度的数据延迟,延迟可以是很高(批处理)或较高(事件驱动),甚至是非常低(实时同步)。
2. Batch 批处理
Most data moves between applications and organizations in clumps or files either on request by a human data consumer or automatically on a periodic schedule. This type of interaction is called batch or ETL.
The set of changed data is called the delta 增量, and the data from a point in time is called a snapshot 快照.
With batch data integration solutions, there is often a significant delay between when data changes in the source and when it is updated in the target, resulting in high latency.
To achieve fast processing and lower latency, some data integration solutions use micro-batch 微批处理 processing, which schedules batch processing to run at a much higher frequency than daily, such as every five minutes.
Batch data integration is used for data conversions, migrations, and archiving, as well as for extracting from and loading data warehouses and data marts.
There are risks associated with the timing of batch processing.
To minimize issues with application updates, schedule data movement between applications at the end of logical processing for the business day, or after special processing of the data has occurred at night.
To avoid incomplete data sets, jobs moving data to a data warehouse should be scheduled based on the daily, weekly, or monthly reporting schedule.
3. Change Data Capture 变更数据捕获
Change Data Capture is a method of reducing bandwidth by filtering to include only data that has been changed within a defined timeframe.
Change data capture may be data-based or log-based. (See Chapter 6.)
There are three techniques for data-based change data capture (a sketch of the first, timestamp-based technique follows this list):
1. The source system populates specific data elements, such as timestamps within a range, or codes or flags, which serve as change indicators. The extract process uses rules to identify rows to extract.
2. The source system processes add to a simple list of objects and identifiers when changing data, which is then used to control selection of data for extraction.
3. The source system processes copy data that has changed into a separate object as part of the transaction, which is then used for extract processing. This object does not need to be within the database management system.
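A minimal sketch of the first (timestamp-based) technique: only rows whose change indicator falls after the previous extract are selected as the delta. The table and column names are invented for illustration.
    from datetime import datetime

    source_table = [
        {"order_id": 1, "amount": 100, "last_updated": datetime(2023, 3, 1, 8, 0)},
        {"order_id": 2, "amount": 250, "last_updated": datetime(2023, 3, 2, 9, 30)},
    ]

    def capture_changes(rows, last_extract_time):
        """Return only the delta: rows updated after the previous extract."""
        return [r for r in rows if r["last_updated"] > last_extract_time]

    delta = capture_changes(source_table, datetime(2023, 3, 1, 23, 59))
    print([r["order_id"] for r in delta])   # [2]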
4. Near-real-time and Event-driven 准实时和事件驱动
Most data integration solutions that are not performed in batches use a near-real-time or event-driven solution. Data is processed in smaller sets spread across the day in a defined schedule, or data is processed when an event happens, such as a data update.
Near-real-time processing has a lower latency than batch processing and often a lower system load as the work is distributed over time, but it is usually slower than a synchronized data integration solution.
Near-real-time data integration solutions are usually implemented using an enterprise service bus.
5. Asynchronous 异步
In an asynchronous data flow, the system providing data does not wait for the receiving system to acknowledge update before continuing processing. Asynchronous implies that either the sending or receiving system could be off-line for some period without the other system also being off-line
Asynchronous data integration does not prevent the source application from continuing its processing, or cause the source application to be unavailable if any of the target applications are unavailable. Since the data updates made to applications in an asynchronous configuration are not immediate, the integration is called near-real-time.
The delay between updates made in the source and relayed to target data sets in a near-real-time environment is usually measured in seconds or minutes.
6. Real-time, Synchronous 实时,同步
In a synchronous integration solution, an executing process waits to receive confirmation from other applications or processes prior to executing its next activity or transaction
This means that the solution can process fewer transactions because it has to spend time waiting for confirmation of data synchronization.
If any of the applications that need the update are not available then the transaction cannot be completed in the primary application. This situation keeps data synchronized but has the potential to make strategic applications dependent on less critical applications.
Data sets may be kept in synch through database capabilities such as two-phase commits 两阶段提交, which ensure that all updates in a business transaction are all successful, or none is made.
7. Low Latency or Streaming 低延迟或流处理
Tremendous advances have been made in developing extremely fast data integration solutions. These solutions require a large investment in hardware and software
Low latency data integration solutions are designed to minimize the response time to events. They may include the use of hardware solutions like solid-state disk or software solutions like in-memory databases
Asynchronous solutions are usually used in low latency solutions
Massive multi-processing, or simultaneous processing, is also a common configuration in low latency solutions
1.4.3. Replication 复制
Some applications maintain exact copies of data sets in multiple physical locations (e.g., to improve response times for geographically distributed users).
Such a solution must synchronize the physically distributed data set copies.
Because the benefits of replication solutions — minimal effect on the source data set and minimal amount of data being passed — are very desirable, replication is used in many data integration solutions, even those that do not include long distance physical distribution.
Replication utilities work optimally when source and target data sets are exact copies of each other. Differences between source and target introduce risks to synchronization
Data replication solutions are not optimal if changes to the data may occur at multiple copy sites. If it is possible that the same piece of data is changed at two different sites, then there is a risk that the data might get unsynchronized, or one of the sites may have their changes overwritten without warning.
1.4.4. Archiving 归档
Data that is used infrequently or not actively used may be moved to an alternate data structure or storage solution that is less costly to the organization.
ETL functions can be used to transport and possibly transform the archive data to the data structures in the archive environment.
Use archives to store data from applications that are being retired, as well as data from production operational systems that have not been used for a long time, to improve operational efficiency.
It is critical to monitor archive technology to ensure that the data is still accessible when technology changes. Having an archive in an older structure or format unreadable by newer technology can be a risk, especially for data that is still legally required.
1.4.5. Enterprise Message Format / Canonical Model 企业消息格式/规范格式
A canonical data model is a common model used by an organization or data exchange group that standardizes the format in which data will be shared.
In a hub-and-spoke 中心辐射形 data interaction design pattern, all systems that want to provide or receive data interact only with a central information hub.
limits the number of data transformations needed
While developing and agreeing on a shared message format is a major undertaking, having a canonical model can significantly reduce the complexity of data interoperability in an enterprise, and thus greatly lower the cost of support
A canonical model is worthwhile when integrating more than three systems and is critical for managing data interactions in environments of more than 100 application systems (a minimal sketch follows below).
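A minimal sketch of a canonical (hub) message: each spoke system maps its own field names into one shared structure instead of mapping to every other system directly. The system and field names are invented for illustration.
    CANONICAL_CUSTOMER = ["customer_id", "full_name", "email"]

    def from_crm(row):
        """Map a CRM-specific record into the canonical customer message."""
        return {"customer_id": row["CustID"], "full_name": row["Name"], "email": row["EMail"]}

    def from_billing(row):
        """Map a billing-system record into the same canonical structure."""
        return {"customer_id": row["acct_no"], "full_name": row["acct_holder"], "email": row["contact_email"]}

    crm_row = {"CustID": "C-17", "Name": "Alice Ng", "EMail": "alice@example.com"}
    message = from_crm(crm_row)
    assert list(message) == CANONICAL_CUSTOMER
With this pattern, adding a new system requires one mapping to the canonical model instead of one mapping per existing system.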
1.4.6. Interaction Models 交互模型
Interaction models describe ways to make connections between systems in order to transfer data.
1. Point-to-point 点到点
The vast majority of interactions between systems that share data do so ‘point-to-point’.
risk
Impacts to processing 影响处理
Managing interfaces 管理接口
Potential for inconsistency 潜在的不一致
2. Hub-and-spoke 中心辐射型
The hub-and-spoke model, an alternative to point-to-point, consolidates shared data (either physically or virtually) in a central data hub that many applications can use.
Data Warehouses, Data Marts, Operational Data Stores, and Master Data Management hubs
11. The implementation of a `master data repository 仓库`, which is integrated across the enterprise,Is an example of which integration approach? A:Hub and Spoke B:Point to point C:Change Data Capture D:Replication E:Publish and Subscribe 正确答案:A 你的答案:A 解析:8.1.3(2)中心幅射型(Hub-and-Spoke)横型是点对点的替代方案,它将共事数据(物理或虚拟)整合到应用程序可以使用的一个中央数据中心。所有想交换数据的系统都是通过一个中央公共数据控制系统进行交换的,而不是直接与其他系统(点对点)进行交换。数据仓库、数据集市、操作数据存储和主数据管理中心都是数据中心的最佳示范
Hub-and-spoke interaction is more efficient and can be cost-justified even if the number of systems involved is relatively small, but it becomes critical to managing a portfolio of systems in the hundreds or thousands.
Enterprise Service Buses (ESB) are the data integration solution for near real-time sharing of data between many systems, where the hub is a virtual concept of the standard format or the canonical model for sharing data in the organization.
Hub-and-spoke may not always be the best solution: for some situations, the latency introduced by the hub is unacceptable or its performance is insufficient.
3. Publish - Subscribe 发布与订阅
A publish and subscribe model involves systems pushing data out (publish), and other systems pulling data in (subscribe).
When multiple data consumers want a certain set of data or data in a certain format, developing that data set centrally and making it available to all who need it ensures that all constituents receive a consistent data set in a timely manner.
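A minimal in-memory publish-and-subscribe sketch in Python: the publishing system pushes a change once, and every registered subscriber receives the same, consistent message. In practice this role is played by messaging middleware rather than a dictionary; the topic and handlers here are illustrative.
    from collections import defaultdict

    subscribers = defaultdict(list)

    def subscribe(topic, handler):
        """Register a consumer for a topic."""
        subscribers[topic].append(handler)

    def publish(topic, message):
        """Push one message to every subscriber of the topic."""
        for handler in subscribers[topic]:
            handler(message)

    subscribe("customer.updated", lambda m: print("Data warehouse got:", m))
    subscribe("customer.updated", lambda m: print("CRM got:", m))
    publish("customer.updated", {"customer_id": "C-17", "email": "new@example.com"})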
4. 8. Three common interaction models for data integration are A:point to point,wheel and spoke 车轮和辐条,public and share B:record and pass, copy and send, read and write C:point to point,hub and spoke,publish and subscribe D:plane to point harvest and seed 平面点收播种, publish and subscribe E:straight copy, curved copy round about copy 正确答案:C 你的答案:C 解析:8.1.3
1.4.7. DII Architecture Concepts 数据集成和互操作的架构概念
1. Application Coupling 应用耦合
Coupling describes the degree to which two systems are entwined.
Two systems that are tightly coupled usually have a synchronous interface, where one system waits for a response from the other.
Tight coupling represents a riskier operation:
if one system is unavailable then they are both effectively unavailable,
and the business continuity plan for both has to be the same.
Loose coupling is a preferred interface design, where data is passed between systems without waiting for a response and one system may be unavailable without causing the other to be unavailable.

Loose coupling can be implemented using various techniques involving services, APIs, or message queues.
Service Oriented Architecture using an Enterprise Service Bus is an example of a loosely coupled data interaction design pattern.
Replacement of systems in the application inventory can theoretically be performed without rewriting the systems that interact with them.
2. Orchestration and Process Controls 编排和流程控制
Orchestration 编排 is the term used to describe how multiple processes are organized and executed in a system.
All systems handling messages or data packets must be able to manage the order of execution of those processes, in order to preserve consistency and continuity.
Process Controls 流程控制 are the components that ensure shipment, delivery, extraction, and loading of data is accurate and complete.
Often-overlooked 经常被忽略的 process control components include:
1. Database activity logs
2. Batch job logs
3. Alerts
4. Exception logs 异常日志
5. Job dependence charts 作业依赖图 with remediation options 补救方案, standard responses 标准回复
6. Job ‘clock’ information 作业时钟信息, such as the timing of dependent jobs, the expected length of the jobs, and the computing (available) window time 计算可用窗口时间
3. Enterprise Application Integration (EAI) 企业应用集成
In an enterprise application integration model (EAI), software modules interact with one another only through well-defined interface calls (application programming interfaces – APIs).
Data stores are updated only by their own software modules; other software cannot reach into the data in an application and can only access it through the defined APIs.
4. Enterprise Service Bus (ESB) 企业服务总线

An Enterprise Service Bus is a system that acts as an intermediary between systems, passing messages between them. Applications can send and receive messages or files using the ESB, and are encapsulated from other processes existing on the ESB.
5. The acronym ESB stands for A:Enterprise Service Bus B:Enterprise Service Board C:Enterprise Service Business D:Enterprise System Board E:Enterprise System Business 正确答案:A 你的答案:A 解析:8.1.3:(4)企业服务总线企业服务总线(Enterprise Service Bus,ESB)是一个系统,它充当系统之间的中介,在它们之间传送消息。
An example of loose coupling, the ESB acts as the service between the applications.
5. Service-Oriented Architecture (SOA) 面向服务的架构
Most mature enterprise data integration strategies utilize the idea of service-oriented architecture (SOA), where the functionality of providing data or updating data (or other data services) can be provided through well-defined service calls between applications.
15. The acronym SOA stands for: A:Service Oriented Architecture B:Service Oriented Actuaries C:service Oriented Accuracy D:System Oriented Architecture E:System Oriented Accuracy 正确答案:A 你的答案:A 解析:8.1.3:(5)大多数成熟的企业数据集成策略都采用面向服务的架构(Service-Oriented Architecture,SOA)思想,通过在应用程序之间定义良好的服务调用,可以提供推送数据或更新数据(或其他数据服务)的功能,使用这种方法,应用程序不必与其他应用程序直接交互或了解其他应用程序的内部工作。
SOA enables application independence and the ability for an organization to replace systems
The goal of service-oriented architecture is to have well-defined interaction between self-contained software modules.
SOA can be implemented with various technologies, including web services, messaging, RESTful APIs, etc.
Services are usually implemented as APIs (application programming interfaces) that are available to be called by application systems (or human consumers).
6. Complex Event Processing (CEP) 复杂事件处理
Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events), and deriving a conclusion from them.
Complex event processing (CEP) combines data from multiple sources to identify meaningful events (such as opportunities or threats) to predict behavior or activity and automatically trigger real-time response, such as suggesting a product for a consumer to purchase. Rules are set to guide the event processing and routing.
Supporting complex event processing requires an environment that can integrate vast amounts of data of various types.
complex event processing is often tied to Big Data.
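A minimal rule-driven event processing sketch: events from a stream are matched against a simple rule that combines data points to trigger an automatic response (here, an alert). The thresholds and field names are illustrative assumptions, not a real CEP engine.
    events = [
        {"card": "4111", "amount": 20,  "country": "DE"},
        {"card": "4111", "amount": 950, "country": "BR"},
    ]

    def rule_high_value_foreign(event):
        """A simple rule: high-value transaction outside the home country."""
        return event["amount"] > 500 and event["country"] != "DE"

    def process_stream(stream):
        for event in stream:
            if rule_high_value_foreign(event):
                print("ALERT: review transaction", event)

    process_stream(events)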
7. Data Federation and Virtualization 数据联邦和虚拟化
Data Federation provides access to a combination of individual data stores, regardless of structure.
Data Virtualization enables distributed databases, as well as multiple heterogeneous data stores, to be accessed and viewed as a single database.
8. Data-as-a-Service (DaaS) 数据即服务
Software-as-a-service (SaaS) is a delivery and licensing model.
Similar ‘as-a-service’ offerings exist for other capabilities (IT-as-a-service, platform-as-a-service, database-as-a-service).
One definition of Data-as-a-Service (DaaS) is data licensed from a vendor and provided on demand, rather than stored and maintained in the data center of the licensing organization
9. Cloud-based Integration 云化集成
Cloud-based integration (also known as integration platform-as-a-service or IPaaS) is a form of systems integration
delivered as a cloud service that addresses data, process, service oriented architecture (SOA), and application integration use cases.
Prior to the emergence of cloud computing, integration could be categorized as either internal or business to business (B2B).
Internal integration requirements are serviced through an on-premises middleware platform, and typically use a service bus (ESB) to manage exchange of data between systems.
Business-to-business integration is serviced through EDI (electronic data interchange) gateways or value-added networks (VAN) or market places.
1. Electronic data interchange (EDI) through XML typically involves each of the following EXCEPT A:customers transactions.客户交易 B:a reduction of paper flow C:contracts between trading partners D:transaction converters E:All 正确答案:A 你的答案:B 解析:8.1.3题解:A是客户交易,不相关。在云计算出现之前,集成可以分为内部集成和企业间集成(B2B)。内部集成需求是通过内部中间件平台提供服务,并且通常使用服务总线(ESB)来管理系统之间的数据交换。企业间集成是通过电子数据交换(EDI)网关、增值网络(VAN)或市场完成。
Cloud-based integration involves interacting with the SaaS application data to be integrated using SOA interaction services.
1.4.8. Data Exchange Standards 数据交换标准
Data Exchange Standards are formal rules for the structure of data elements.
An exchange pattern defines a structure for data transformations needed by any system or organization exchanging data.
Data needs to be mapped to the exchange specification.
The National Information Exchange Model (NIEM 国家信息交换模型) uses Extensible Markup Language (XML) for schema definitions and element representation
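A minimal sketch of mapping an internal record onto an XML exchange payload, in the spirit of XML-based exchange specifications such as NIEM; the element names below are invented for illustration, not actual NIEM elements.
    import xml.etree.ElementTree as ET

    record = {"person_id": "P-42", "surname": "Doe", "given_name": "Jane"}

    root = ET.Element("PersonExchange")
    person = ET.SubElement(root, "Person", attrib={"id": record["person_id"]})
    ET.SubElement(person, "GivenName").text = record["given_name"]
    ET.SubElement(person, "Surname").text = record["surname"]

    print(ET.tostring(root, encoding="unicode"))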
2. Data Integration Activities
2.1. Data Integration and Interoperability involves getting data where it is needed, when it is needed, and in the form in which it is needed. Data integration activities follow a development lifecycle. They begin with planning and move through design, development, testing, and implementation. Once implemented, integrated systems must be managed, monitored and enhanced.
2.2. Plan and Analyze 规划和分析
2.2.1. Define Data Integration and Lifecycle Requirements 定义
Defining data integration requirements involves understanding the organization’s business objectives, as well as the data required and the technology initiatives proposed to meet those objectives.
It is also necessary to gather any relevant laws or regulations regarding the data to be used
Requirements must also account for organizational policy on data retention and other parts of the data lifecycle
Data integration and lifecycle requirements are usually defined by business analysts, data stewards, and architects in various functions, including IT.
These requirements determine the type of DII interaction model, which in turn determines the technology and services necessary to fulfill them.
The process of defining requirements creates and uncovers valuable Metadata. This Metadata should be managed throughout the data lifecycle, from discovery through operations. The more complete and accurate an organization’s Metadata, the better its ability to manage the risks and costs of data integration.
2.2.2. Perform Data Discovery 执行数据探索
Data discovery should be performed prior to design.
The goal of data discovery is to identify potential sources of data for the data integration effort.
Discovery will identify where data might be acquired and where it might be integrated.
Discovery also includes high-level assessment of data quality, to determine whether the data is fit for the purposes of the integration initiative.
Data discovery produces or adds to an inventory of organizational data. This inventory should be maintained in a Metadata repository.
Most organizations have a need to integrate data from their internal systems.
Data integration solutions may also involve the acquisition of data from outside the organization.
2.2.3. Document Data Lineage 记录数据血缘
The process of data discovery will also uncover information about how data flows through an organization.
How the data under analysis is acquired or created by the organization,
where it moves and is changed within the organization,
how the data is used by the organization for analytics, decision-making, or event triggering.
Detailed lineage can include the rules according to which data is changed, and the frequency of changes.
Analysis of lineage may identify updates required to documentation of systems in use.
The analysis process may also identify opportunities for improvements in the existing data flow.
2.2.4. Profile Data 剖析数据
1. Understanding data content and structure is essential to successful integration of data.
4. Data profiling A:provides guidance to turn bad habits into good habits B:creates data models with a precise set of symbols and text C:aids requirements analysis D:examines existing data to understand content and structure. E:Above is all right 正确答案:D 你的答案:D 解析:8.2.1.题解:4.剖析数据理解数据的内容和结构是实现数据集成成功的关键。数据剖析(Data Profiling)有助于实现这一目标
6. which of these is not an expected role of a data Quality oversight board? A:Establishing communications feedback mechanisms B:Setting data quality improvement priorities C:Producing certification compliance policies D:Developing maintaining data quality E:Data profiling analysis 正确答案:E 你的答案:E 解析:8.2.1.题解:4.剖析数据理解数据的内容和结构是实现数据集成成功的关键。数据剖析(Data Profiling)有助于实现这一目标
2. Data profiling contributes to this end.
Actual data structure and contents always differ from what is assumed
Sometimes differences are small; other times they are large enough to derail an integration effort.
profiling can help integration teams discover these differences and use that knowledge to make better decisions about sourcing and design.
3. Basic profiling involves analysis of:
1. Data format as defined in the data structures and inferred from the actual data
2. Data population, including the levels of null, blank, or defaulted data
3. Data values and how closely they correspond to a defined set of valid values
4. Patterns and relationships internal to the data set, such as related fields and cardinality rules
5. Relationships to other data sets
4. More extensive profiling of the potential source and target data sets is required to understand how well the data meets the requirements of the particular data integration initiative. Profile both the sources and targets to understand how to transform the data to match requirements.
5. One goal of profiling is to assess the quality of data.
16. A Term is a form of data analysis used to inspect data and assess quality. this term is called A:Data Profiling B:Data parsing C:Data Modeling D:Data analysis E:Data Cleansing 正确答案:A 你的答案:A 解析:8.2.1.题解:4.剖析数据理解数据的内容和结构是实现数据集成成功的关键。数据剖析(Data Profiling)有助于实现这一目标
6. The requirement to profile data must be balanced with an organization’s security and privacy regulations.
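A minimal data profiling sketch in plain Python covering population (null/blank counts), distinct values, and conformance to a defined set of valid values; the sample rows and valid codes are illustrative assumptions.
    rows = [{"gender": "F"}, {"gender": "M"}, {"gender": ""}, {"gender": "X"},
            {"gender": "F"}, {"gender": None}]

    VALID_GENDER_CODES = {"F", "M", "U"}

    values = [r["gender"] for r in rows]
    populated = [v for v in values if v not in (None, "")]

    profile = {
        "row_count": len(values),
        "null_or_blank": len(values) - len(populated),
        "distinct_values": sorted(set(populated)),
        "invalid_values": sorted({v for v in populated if v not in VALID_GENDER_CODES}),
    }
    print(profile)
Findings like the unexpected code "X" are exactly the kind of difference between assumed and actual content that profiling is meant to surface before design decisions are made.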
2.2.5. Collect Business Rules 收集业务规则
Business rules are a critical subset of requirements. A business rule is a statement that defines or constrains an aspect of business processing.
Business rules are intended to assert business structure or to control or influence the behavior of the business.
Business rules fall into four categories: definitions of business terms, facts relating terms to each other, constraints or action assertions 断言, and derivations 派生.
In data integration, business rules are used to:
1. Assess data in potential source and target data sets
2. Direct the flow of data in the organization
3. Monitor the organization’s operational data
4. Direct when to automatically trigger events and alerts
Gathering business rules is also called rules harvesting or business rule mining.
The business analyst or data steward can extract the rules from existing documentation
For Master Data Management,
business rules include match rules 匹配规则, merge rules 合并, survivorship rules 存活, and trust rules 信任.
For data archiving, data warehousing, and other situations where a data store is in use,
the business rules also include data retention rules 保留.
2.3. Design Data Integration Solutions 设计数据集成解决方案
2.3.1. Design Data Integration Architecture 设计数据集成架构
Data integration solutions should be specified at both the enterprise level and the individual solution level
Design a solution to meet the requirements, reusing as many of the existing Data Integration and Interoperability components as is feasible.
1. Select Interaction Model 选择交互模式
Determine which interaction model or combination will fulfill the requirements – hub-and-spoke, point-to-point, or publish-subscribe.
2. Design Data Services or Exchange Patterns 设计数据服务或交换方式
Create or re-use existing integration flows to move the data. These data services should be companions to existing similar data services
Any data exchange specification design should start with industry standards, or other exchange patterns already existing. When possible, make any changes to existing patterns generic enough to be useful to other systems;
2.3.2. Model Data Hubs, Interfaces, Messages, and Data Services 建模数据中心、接口、消息、数据服务
1. Data structures where data persists
such as Master Data Management hubs, data warehouses and marts, and operational data stores
2. Data structures used only for moving or transforming data
such as interfaces, message layouts, and canonical models
2.3.3. Map Data Sources to Targets 映射数据源到目标
Almost all data integration solutions include transforming data from source to target structures. Mapping sources to targets involves specifying the rules for transforming data from one location and format to another.
For each attribute mapped, a mapping specification (a minimal example follows this list):
1. Indicates the technical format of the source and target
2. Specifies transformations required for all intermediate staging points between source and target
3. Describes how each attribute in a final or intermediate target data store will be populated
4. Describes whether data values need to be transformed; for example, by looking up the source value in a table that indicates the appropriate target value
5. Describes what calculations are required
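A minimal sketch of a source-to-target mapping specification captured as data and applied row by row; the attribute names, lookup table, and calculation are illustrative assumptions rather than a real interface design.
    COUNTRY_LOOKUP = {"DEU": "Germany", "FRA": "France"}

    MAPPING_SPEC = [
        # (source attribute, target attribute, transformation rule)
        ("cust_no",      "customer_id",  lambda v: v.strip()),
        ("country_iso3", "country_name", lambda v: COUNTRY_LOOKUP.get(v, "UNKNOWN")),
        ("net_amount",   "gross_amount", lambda v: round(v * 1.19, 2)),  # example calculation
    ]

    def apply_mapping(source_row):
        """Populate each target attribute by applying its rule to the mapped source value."""
        return {target: rule(source_row[source]) for source, target, rule in MAPPING_SPEC}

    print(apply_mapping({"cust_no": " 0042 ", "country_iso3": "DEU", "net_amount": 100.0}))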
2.3.4. Design Data Orchestration 设计数据编排
The flow of data in a data integration solution must be designed and documented. Data orchestration is the pattern of data flows from start to finish, including intermediate steps, required to complete the transformation and/or transaction.
1. Batch data integration orchestration will indicate the frequency of the data movement and transformation.
is usually coded into a scheduler 调度器
2. Real-time data integration orchestration is usually triggered by an event, such as new or updated data.
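A rough sketch of both styles, assuming a toy in-memory "scheduler" (job names, schedules, and dependencies are made up; a real implementation would use an enterprise scheduler or integration tool):
# Sketch of batch orchestration metadata plus an event-triggered flow; all names are hypothetical.
batch_jobs = {
    "extract_orders":   {"schedule": "daily 01:00", "depends_on": []},
    "transform_orders": {"schedule": "daily 02:00", "depends_on": ["extract_orders"]},
    "load_orders_dw":   {"schedule": "daily 03:00", "depends_on": ["transform_orders"]},
}

def run_in_dependency_order(jobs):
    done = set()
    while len(done) < len(jobs):
        for name, job in jobs.items():
            if name not in done and all(d in done for d in job["depends_on"]):
                print(f"running {name}")   # a real scheduler would submit the job here
                done.add(name)

def on_new_record(record):
    """Real-time orchestration: triggered by an event such as a new or updated record."""
    transformed = {**record, "loaded_at": "now"}
    print("publishing", transformed)

run_in_dependency_order(batch_jobs)
on_new_record({"order_id": 1, "status": "NEW"})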
2.4. Develop Data Integration Solutions 开发数据集成解决方案
1. Develop Data Services 开发数据服务
Develop services to access, transform, and deliver data as specified, matching the interaction model selected.
Using consistent tools or standard vendor suites across the organization for these various purposes can simplify operational support and lower operating costs
2. Develop Data Flows 开发数据流编排
Integration or ETL data flows will usually be developed within tools specialized to manage those flows in a proprietary way.
Batch data flows will be developed in a scheduler (usually the enterprise standard scheduler - CTRL-M) that will manage the order, frequency, and dependency of executing the data integration pieces that have been developed.
Interoperability requirements may include developing mappings or coordination points between data stores.
3. Develop Data Migration Approach 制定数据迁移方法
Data needs to be moved when new applications are implemented or when applications are retired or merged. This process involves transformation of data to the format of the receiving application.
Migration is not quite a one-time process, as it needs to be executed for testing phases as well as final implementation.
Data migration projects are frequently under-estimated or under-designed,
because programmers are told to simply move the data; they do not engage in the analysis and design activities required for data integration. When data is migrated without proper analysis, it often looks different from the data that came in through the normal processing. Or the migrated data may not work with the application as anticipated.
Profiling data of core operational applications will usually highlight data that has been migrated from one or more generations of previous operational systems and does not meet the standards of the data that enters the data set through the current application code.
4. Develop a Publication Approach 制定发布方式
Systems where critical data is created or maintained need to make that data available to other systems in the organization.
Best practice is to define common message definitions (canonical model) for the various types of data in the organization and let data consumers (either applications or individuals) who have appropriate access authority subscribe to receive notification of any changes to data of interest.
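This idea can be illustrated with a toy publish-subscribe sketch in Python; the canonical message fields, topic name, and subscriber are assumptions for illustration, not a standard:
# Sketch of a canonical customer-change message and a simple subscribe/notify mechanism.
canonical_customer_change = {
    "message_type": "CustomerChanged",
    "customer_id": "C-10042",
    "changed_attributes": {"email": "new@example.com"},
    "source_system": "CRM",
    "event_time": "2023-03-01T10:15:00Z",
}

subscribers = {}

def subscribe(message_type, callback):
    subscribers.setdefault(message_type, []).append(callback)

def publish(message):
    for callback in subscribers.get(message["message_type"], []):
        callback(message)

subscribe("CustomerChanged", lambda m: print("billing system notified:", m["customer_id"]))
publish(canonical_customer_change)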
5. Develop Complex Event Processing Flows 开发复杂事件处理流
requires
1. Preparation of the historical data about an individual, organization, product, or market and pre-population of the predictive models
2. Processing the real-time data stream to fully populate the predictive model and identify meaningful events (opportunities or threats)
3. Executing the triggered action in response to the prediction
Preparation and pre-processing of the historical data needed in the predictive model may be performed in nightly batch processes or in near real-time.
The response to the identification of a meaningful event may be as simple as a warning being sent out or as complex as the automatic deployment of armed forces.
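A minimal sketch of these three steps, with a made-up "unusual spend" rule standing in for a real predictive model (customer IDs, thresholds, and the alert action are illustrative):
# Minimal complex-event-processing sketch: pre-populated history plus a real-time check.
historical_average_spend = {"C-10042": 120.0}   # prepared in a nightly batch process

def score_event(event):
    avg = historical_average_spend.get(event["customer_id"], 0.0)
    return event["amount"] / avg if avg else 0.0

def handle_stream(events, threshold=5.0):
    for event in events:
        if score_event(event) > threshold:
            print("ALERT: unusual transaction", event)   # triggered action (e.g., a warning)

handle_stream([{"customer_id": "C-10042", "amount": 900.0}])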
6. Maintain DII Metadata 维护DII的元数据
As previously noted (see Section 2.1), an organization will create and uncover valuable Metadata during the process of developing DII solutions. This Metadata should be managed and maintained to ensure proper understanding of the data in the system, and to prevent the need to rediscover it for future solutions. Reliable Metadata improves an organization’s ability to manage risks, reduce costs, and obtain more value from its data.
Document the data structures of all systems involved in data integration as source, target, or staging. Include business definitions and technical definitions (structure, format, size), as well as the transformation of data between the persistent data stores. Whether data integration Metadata is stored in documents or a Metadata repository, it should not be changed without a review and approval process from both business and technical stakeholders
2.5. Implement and Monitor 实施与监控
2.5.1. Activate the data services that have been developed and tested.
2.5.2. Establish parameters that indicate potential problems with processing, as well as direct notification of issues.
2.5.3. Data interaction capabilities must be monitored and serviced at the same service level as the most demanding target application or data consumer
3. Tools
3.1. Data Transformation Engine/ETL Tool 数据转换引擎/ETL工具
3.1.1. A data transformation engine (or ETL tool) is the primary tool in the data integration toolbox, central to every enterprise data integration program
3.1.2. Basic considerations in selecting a data transformation engine should include whether it is necessary to handle batch as well as real-time functionality, and whether unstructured as well as structured data needs to be accommodated
as the most mature tools exist for batch-oriented processing of structured data only.
3.2. Data Virtualization Server 数据虚拟化服务器
3.2.1. Data transformation engines usually perform extract, transform, and load physically on data; however, data virtualization servers perform data extract, transform, and integrate virtually
3.2.2. Data virtualization servers can combine structured and unstructured data.
3.2.3. A data warehouse is frequently an input to a data virtualization server, but a data virtualization server does not replace the data warehouse in the enterprise information architecture.
3.3. Enterprise Service Bus 企业服务总线
3.3.1. An enterprise service bus (ESB) refers to both a software architecture model and a type of message-oriented middleware used to implement near real-time messaging between heterogeneous data stores, applications, and servers that reside within the same organization
3.3.2. Most commonly, an ESB is used in asynchronous format to enable the free flow of data. An ESB can also be used synchronously in certain situations.
3.3.3. This model is called ‘near real-time’
3.3.4. This is a loosely coupled model
3.4. Business Rules Engine 业务规则引擎
3.4.1. Many data integration solutions are dependent on business rules
3.4.2. A business rules engine that allows non-technical users to manage business rules implemented by software is a very valuable tool that will enable evolution of the solution at a lower cost,
3.5. Data and Process Modeling Tools 数据和流程建模工具
3.5.1. Data modeling tools should be used to design not only the target but also the intermediate data structures needed in data integration solutions.
3.6. Data Profiling Tool 数据剖析工具
3.6.1. Data profiling involves statistical analysis of data set contents to understand format, completeness, consistency, validity, and structure of the data.
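For example, a very basic column profile (null counts, distinct counts, sample values) can be sketched with pandas, assuming pandas is available and using hypothetical column names:
# Sketch of basic column profiling with pandas; column names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({"customer_id": [1, 2, 2, None], "country": ["US", "UK", "us", None]})

for column in df.columns:
    series = df[column]
    print(column,
          "nulls:", int(series.isna().sum()),
          "distinct:", int(series.nunique(dropna=True)),
          "sample:", series.dropna().unique()[:3])
A dedicated profiling tool would add pattern analysis, value distributions, and cross-column checks, but the principle is the same.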
3.7. Metadata Repository 元数据存储库
3.7.1. During data integration projects, one or more Metadata repositories may be used to document the technical structure and business meaning of the data being sourced, transformed, and targeted.
4. Techniques
4.1. The basic goals are
4.1.1. keep the applications coupled loosely
4.1.2. limit the number of interfaces developed and requiring management, by using a hub-and-spoke approach
4.1.3. create standard (or canonical) interfaces.
5. Implementation Guidelines
5.1. Readiness Assessment / Risk Assessment
5.1.1. Because organizations typically already have some form of data integration in place, the readiness/risk assessment should center on enterprise integration tool implementation, or on enhancing capabilities to allow interoperability.
Implementing enterprise data integration solutions is usually cost-justified based on implementation between many systems.
5.1.2. making sure that some participants in every project are business- or application-oriented, and not just data integration tool experts
5.2. Organization and Cultural Change
5.2.1. Organizations must determine whether responsibility for managing data integration implementations is centralized or whether it resides with decentralized application teams.
5.2.2. Many organizations develop a Center of Excellence specializing in the design and deployment of the enterprise data integration solutions.
Local teams understand the data in their applications.
Central teams can build deep knowledge of tools and technologies
5.2.3. Local and central teams collaborate to develop solutions connecting an application into an enterprise data integration solution.
5.2.4. Review all data transformation mapping designs and changes with business subject matter experts from each involved system.
6. DII Governance
6.1. Intro
6.1.1. Decisions about the design of data messages, data models, and data transformation rules have a direct impact on an organization’s ability to use its data. These decisions must be business-driven.
6.1.2. Business stakeholders are responsible for defining rules for how data should be modeled and transformed.
6.1.3. In DII, the landscape of governance controls to support trust can be complex and detailed.
6.1.4. Policies need to be established to ensure that the organization benefits from an enterprise approach to DII.
1. policies can be put in place to ensure that SOA principles are followed,
2. new services are created only after a review of existing services
3. all data flowing between systems goes through the enterprise service bus.
6.2. Data Sharing Agreements 数据共享协议
6.2.1. Prior to the development of interfaces or the provision of data electronically, develop a data sharing agreement or memorandum of understanding (MOU) which stipulates the responsibilities and acceptable use of data to be exchanged, approved by the business data stewards of the data in question.
9. A document that stipulates 规定 the responsibilities and acceptable use of data to be exchanged is a: A:project charter B:data model. C:data sharing agreement. D:data quality assessment. E:interface contract 正确答案:C 你的答案:C 解析:8.6.1数据共享协议在开发接口或以电子方式提供数据之前,应制定一份数据共享协议或谅解备忘录(MOU)。该协议规定了交换数据的责任和可接受的使用用途,并由相关数据的业务数据主管批准。
6.2.2. The data sharing agreements should specify anticipated use and access to the data, restrictions on use, as well as expected service levels, including required system up times and response times. These agreements are especially critical for regulated industries, or when personal or secure information is involved.
6.3. DII and Data Lineage
6.3.1. Data lineage is useful to the development of DII solutions. It is also often required for data consumers to use data, but it is becoming even more important as data is integrated between organizations.
6.3.2. Governance is required to ensure that knowledge of data origins and movement is documented. Data sharing agreements may stipulate limitations to the uses of data and in order to abide by these, it is necessary to know where data moves and persists.
6.3.3. In addition, data lineage information is required when making changes to data flows. This information must be managed as a critical part of solution Metadata. Forward and backward data lineage (i.e., where did data get used and where did it come from) is critical as part of the impact analysis needed when making changes to data structures, data flows, or data processing.
6.4. Data Integration Metrics
6.4.1. To measure the scale and benefits from implementing Data Integration solutions, include metrics on availability, volume, speed, cost, and usage:
1. Data Availability 可用性
Availability of data requested
2. Data Volumes and Speed 数据量和速度
Volumes of data transported and transformed
Volumes of data analyzed
Speed of transmission
Latency between data update and availability
Latency between event and triggered action
Time to availability of new data sources
3. Solution Costs and Complexity 方案成本和复杂性
Cost of developing and managing solutions
Ease of acquiring new data
Complexity of solutions and operations
Number of systems using data integration solutions
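As a small illustration of one such metric, the latency between a data update at the source and its availability to consumers could be computed from captured timestamps (the times shown are made up):
# Sketch: update-to-availability latency from two captured timestamps.
from datetime import datetime

source_update_time = datetime(2023, 3, 1, 2, 0, 0)       # when the record changed at the source
target_available_time = datetime(2023, 3, 1, 2, 45, 0)   # when it became queryable in the target

latency_minutes = (target_available_time - source_update_time).total_seconds() / 60
print(f"update-to-availability latency: {latency_minutes:.0f} minutes")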
7. Works Cited / Recommended
7.1. 2. Data mashups 混合.集成 deliver A:coincident 一致 data through a presentation layer. B:deep analytics C:contents of the data warehouse only D:data staff creative ideas E:All 正确答案:A 你的答案:A 解析:Data Mashing is the process of integrating business-related heterogeneous and application data from numerous sources to give a more unified view from a "big picture" perspective.
Chapter 9: Document and Content Management 文件与内容管理

1. Introduction
1.1. Definition
1.1.1. Document and Content Management entails controlling the capture, storage, access, and use of data and information stored outside relational databases.
Its focus is on maintaining the integrity of and enabling access to documents and other unstructured or semi-structured information which makes it roughly equivalent to data operations management for relational databases
32. Integrating data security with document and content management knowledge areas guides the implementation of: A:appropriate access and authorization to unstructured data B:appropriate privacy controls on data marts. C:straight-through processing for NoSQL queries D:fitness for purpose metrics for unstructured data E:appropriate access and authorization to structured data 正确答案:A 你的答案:B 解析:9.1. 文件和内容管理是指针对存储在关系型数据库之外的数据和信息的采集、存储、访问和使用过程的管理。它的重点在于保持文件和其他非结构化或半结构化信息的完整性,并使这些信息能够接访问。
enabling access to documents and other unstructured or semi-structured information
Ensuring security and quality requires governance, reliable architecture, and well-managed Metadata.
1.1.2. Planning, implementation, and control activities for lifecycle management of data and information found in any form or medium.
1.1.3. Deliverables
1. Content and Records Management Strategy
2. Policy and procedure
3. Content Repository
4. Managed record in many media formats
5. Audit trail and log
6. 15. Which of the following are primary deliverables of proper document and record management? A:"Data from tracking devices, building sensor data" B:"Relational databases, database logs,paper documents" C:"Local drives of laptops,transcripts of phone calls" D:"Managed records in many media formats , e-discovery records , policies and procedures, contracts and financial documents“ E:"Spreadsheets, company library books ,sales transactions“ 正确答案:D 你的答案:C 解析:9.1.1
1.2. Business Drivers
1.2.1. The primary business drivers for document and content management include regulatory compliance, the ability to respond to litigation and e-discovery requests, and business continuity requirements.
1. regulatory compliance 法规要求
Laws and regulations require that organizations maintain records of certain kinds of activities
Records include both paper documents and electronically stored information (ESI).
2. respond to litigation 诉讼响应
Good records management is necessary for business continuity. It also enables an organization to respond in the case of litigation.
3. e-discovery requests 电子取证
E-discovery is the process of finding electronic records that might serve as evidence in a legal action.
4. business continuity requirements 业务连续性
Gaining efficiencies is a driver for improving document management.
1.3. Goals and Principles
1.3.1. goals
1. Ensuring effective and efficient retrieval and use of data and information in unstructured formats
29. A goal of Document and Content Management' is to ensure effective and efficient retrieval and use of A:data and information in unstructured formats B:"data but not information in unstructured formats" C:"information but not data in unstructured formats." D:data and information in structured formats E:data and information in relational formats 正确答案:A 你的答案:A 解析:9.1.2题解:实施文件和内容管理最佳实践的目标,包括:1)确保能够高速有效地采集和使用非结构化的数据和信息。2)确保结构化和非结构化数据之间的整合能力。3)遵守法律义务并达到客户预期。
2. Ensuring integration capabilities between structured and unstructured data
3. Complying with legal obligations and customer expectations
1.3.2. principles
1. Everyone in an organization has a role to play in protecting the organization’s future. Everyone must create, use, retrieve, and dispose of records in accordance with the established policies and procedures
2. Experts in the handling of records and content should be fully engaged in policy and planning. Regulations and best practices 法规和最佳实践 can vary significantly based on industry sector and legal jurisdiction.
1.3.3. ARMA International, a not-for-profit professional association for managing records and information, published a set of Generally Acceptable Record keeping Principles® (GARP) 普遍接受的档案保存指导原则
Principle of Accountability 问责原则
An organization shall assign a senior executive to oversee the program, delegate responsibility for records and information management to appropriate individuals, adopt policies and processes to guide staff, and ensure program auditability.
Principle of Integrity 完整原则
An information governance program shall be constructed so the records and information generated or managed by or for the organization have a reasonable and suitable guarantee of authenticity and reliability.
Principle of Protection 保护原则
An information governance program shall be constructed to ensure a reasonable level of protection to information that is personal or that otherwise requires protection.
Principle of Compliance 遵从原则
An information governance program shall be constructed to comply with applicable laws and other binding authorities, as well as the organization’s policies.
Principle of Availability 可用原则
An organization shall maintain its information in a manner that ensures timely, efficient, and accurate retrieval of its information.
Principle of Retention 保留原则
An organization shall retain its information for an appropriate time, taking into account all operational, legal, regulatory and fiscal requirements, and those of all relevant binding authorities.
9. Regulatory data retention requirements drive the need to keep data for set periods for A:submitting datasets to regulators B:historical reporting C:customer analytics. D:legal obligations. E:alternate backup sources 正确答案:D 你的答案:D 解析: 9.1.2题解:6)保留原则(Retention)。组织的信息应保留适当的时间,并考虑所有运营、法律、监管和财政以及其他所有相关约束的要求。
Principle of Disposition 处置原则
An organization shall provide secure and appropriate disposition of information in accordance with its policies, and applicable laws, regulations and other binding authorities.
Principle of Transparency 透明原则
An organization shall document its policies, processes and activities, including its information governance program, in a manner that is available to and understood by staff and appropriate interested parties.
22. In 2009,ARMA International published GARP for managing records and information. GARP stands for. A:Global Accredited Recordkeeping principles Answered B:Gregarious Archive of Recordkeeping Processes C:Generally Available Recordkeeping Practices D:G20 Approved Recordkeeping Principles E:Generally Acceptable Recordkeeping Principles 正确答案:E 你的答案:E 解析:9.1.3:ARMA国际(非营利性的档案和信息管理专业协会)在2009年发布了一套被普遍接受的档案保存指导原则(GARP),它描述了应该如何维护业务档案。
1.4. Essential Concepts
1.4.1. Content 内容
1. Content refers to the data and information inside the file, document, or website.
Content also has a lifecycle. In its completed form, some content becomes a matter of record for an organization. Official records are treated differently from other content.
2. Content Management 内容管理
Content management includes the processes, techniques, and technologies for organizing, categorizing, and structuring information resources so that they can be stored, published, and reused in multiple ways.
The lifecycle of content can be active
with daily changes through controlled processes for creation and modification;
or it can be more static
with only minor, occasional changes.
Content may be managed formally (strictly stored, managed, audited, retained or disposed of)
or informally through ad hoc 临时的 updates.
Content management is particularly important in websites and portals, but the techniques of indexing based on keywords and organizing based on taxonomies can be applied across technology platforms. When the scope of content management includes the entire enterprise, it is referred to as Enterprise Content Management (ECM).企业内容管理
3. Content Metadata 内容元数据
Metadata is essential to managing unstructured data, both what is traditionally thought of as content and documents and what we now understand as ‘Big Data’. Without Metadata, it is not possible to inventory and organize content. Metadata for unstructured data content is based on:
1. Format 格式
2. Search-ability 可搜索性
3. Self-documentation 自我描述性
4. Existing patterns 既有模式
5. Content subjects 内容主题
6. Requirements 需求
the maintenance of Metadata for unstructured data becomes the maintenance of a cross-reference between various local patterns 本地模式 and the official set of enterprise Metadata
a centralized team maintains cross-reference patterns between records management indexes, taxonomies, and even variant thesauri 变体主题词表.
4. Content Modeling 内容建模
Content modeling is the process of converting logical content concepts into content types, attributes, and data types with relationships.
An attribute describes something specific and distinguishable about the content to which it relates.
A data type restricts the type of data the attribute may hold, enabling validation and processing.
There are two levels
The first is at the information product level 信息产品级别
The second is at the component level 组件级别
The level of detail in the model depends on the granularity 颗粒度 desired for reuse and structure.
They support adaptive content, which is format-free and device-independent.
The models become the specifications for content implemented in structures such as XML Schema Definitions (XSDs), forms, or stylesheets 样式表.
5. Content Delivery Methods 内容分发方法
Delivery methods include web pages, print, and mobile apps as well as eBooks with interactive video and audio.
Content delivery systems are ‘push’, ‘pull’, or interactive.
Push 推式: In a push delivery system, users choose the type of content delivered to them on a pre-determined schedule.
Really Simple Syndication (RSS)
Pull 拉式: In a pull delivery system, users pull the content through the Internet.
An example of a pull systemis when shoppers visit online retail stores.
Interactive 交互式: Interactive content delivery methods, such as third-party electronic point of sale (EPOS)apps
Options for sharing data between applications include Enterprise Application Integration (EAI), Changed Data Capture, Data Integration, and EII. (See Chapter 8.)
1.4.2. Controlled Vocabularies 受控词表
1. A controlled vocabulary is a defined list of explicitly allowed terms used to index, categorize, tag, sort, and retrieve content through browsing and searching.
A controlled vocabulary is necessary to systematically organize documents, records, and content. Vocabularies range in complexity from simple lists or pick lists, to synonym rings and authority lists, to taxonomies, and, most complex of all, thesauri and ontologies.
An example of a controlled vocabulary is the Dublin Core 都柏林核心元素集, used to catalog publications.
Defined policies control who adds terms to the vocabulary (e.g., a taxonomist 分类学家, indexer 索引员, or librarian 图书管理员).
folksonomy 大众分类法,通俗分类法
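A minimal sketch of how a controlled vocabulary might be enforced when content is tagged (the terms themselves are hypothetical):
# Sketch: validate tags against a controlled vocabulary before content is indexed.
controlled_vocabulary = {"invoice", "contract", "purchase order", "policy"}

def validate_tags(tags):
    invalid = [t for t in tags if t.lower() not in controlled_vocabulary]
    if invalid:
        raise ValueError(f"terms not in the controlled vocabulary: {invalid}")
    return tags

print(validate_tags(["Invoice", "Contract"]))   # accepted; unknown terms would raise an error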
2. Vocabulary Management 词汇表管理
Vocabulary management is the function of defining, sourcing, importing, and maintaining any given vocabulary.
Key questions to enable vocabulary management focus on uses, consumers, standards, and maintenance:
1. What information concepts will this vocabulary support?
2. Who is the audience for this vocabulary? What processes do they support? What roles do they play?
3. Why is the vocabulary necessary? Will it support an application, content management, or analytics?
4. What decision-making body is responsible for designating preferred terms?
5. What existing vocabularies do different groups use to classify this information? Where are they located? How were they created? Who are their subject matter experts? Are there any security or privacy concerns for any of them?
6. Is there an existing standard that can fulfill this need? Are there concerns about using an external standard vs. an internal one? How frequently is the standard updated, and what is the degree of change of each update? Are standards accessible in an easy-to-import / maintain format, in a cost-efficient manner?
preferred vocabularies 首选词汇表 would still be defined in an organization
3. Vocabulary Views and Micro-controlled Vocabulary 词汇表视图和微控制词汇表
A vocabulary view is a subset of a controlled vocabulary, covering a limited range of topics within the domain of the controlled vocabulary.
Vocabulary views increase information’s usability by limiting the content to what is appropriate to the users.
A micro-controlled vocabulary is a vocabulary view containing highly specialized terms not present in the general vocabulary.
Micro-controlled vocabularies are necessary when the goal is to take advantage of a standard vocabulary, but the content is not sufficient and there is a need to manage additions/extensions for a specific group of information consumers.
4. Term and Pick Lists 术语和选择列表
Lists of terms are just that: lists. They do not describe relationships between the terms.
Pick lists are often buried in applications.
These pick lists are managed as faceted taxonomies 分面分类法 inside the software.
5. Term Management 术语管理
ANSI/NISO Z39.19-2005 defines a term as “one or more words designating a concept.”
Term Management includes specifying how terms are initially defined and classified and how this information is maintained once it starts being used in different systems.
The same standard defines a preferred term as one of two or more synonyms or lexical variants selected as the term for inclusion in a controlled vocabulary.
Term management includes establishing relationships between terms within a controlled vocabulary. There are three types of relationships:
Equivalent term relationship 等价术语关系
Hierarchical relationship 层次化术语关系
Related term relationship 关联关系
6. Synonym Rings and Authority Lists 同义词环和规范表
A synonym ring is a set of terms with roughly equivalent meaning.
A synonym ring allows users who search on one of the terms to access content related to any of the terms.
Synonym rings are used for retrieval 检索, not for indexing 索引.
For indexing, one term is preferred and the others are treated as variants.
An authority list is a controlled vocabulary of descriptive terms designed to facilitate retrieval of information within a specific domain or scope.
An authority file cross-references synonyms and variants for each term to guide the user from a non-preferred to a preferred term.
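A toy sketch of the difference: the synonym ring expands a search query, while the authority list maps variants to the preferred term used for indexing (all terms are illustrative):
# Sketch: synonym ring for retrieval vs. authority list for indexing.
synonym_ring = {"car", "automobile", "motor vehicle"}           # roughly equivalent terms
authority_list = {"automobile": "car", "motor vehicle": "car"}  # variant -> preferred term

def expand_query(term):
    """Expand a search term to its synonym ring so any variant retrieves the same content."""
    return synonym_ring if term in synonym_ring else {term}

print(expand_query("automobile"))            # retrieval uses all variants
print(authority_list.get("motor vehicle"))   # indexing uses the preferred term: 'car'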
7. Taxonomies 分类法
Taxonomy is an umbrella 总称 term referring to any classification or controlled vocabulary
The best-known example of taxonomy is the classification system for all living things developed by the Swedish biologist Linnaeus.
4. Classifying data helps to A:group together similar kinds of data B:categorize data architecture and definitions C:create data tables files and records. D:apply common standards and processes to create meta-data E:all 正确答案:A 你的答案:B 解析:9.1.3(6)分类法分类法(Taxonomies)是指任何分类或受控词表的总称。最著名的例子是瑞典生物学家林奈(Linnaeus)开发的所有生物的分类系统。在内容管理中,分类法是一种命名结构,包含用于概述主题、启用导航和搜索系统的受控词表。分类法有助于减少歧义并控制同义词。层次分类法包含了对索引者和搜索者都有帮助的多种类型的
19. An umbrella term for any classification or controlled vocabulary is A:dictionary B:data model C:metadata D:English E:taxonomy. 正确答案:E 你的答案:E 解析:9.1.3分类法(Taxonomies)是指任何分类或受控词表的总称
In content management, a taxonomy is a naming structure containing a controlled vocabulary used for outlining topics and enabling navigation and search systems.
Taxonomies can have different structures:
1. A flat taxonomy 扁平分类法 has no relationships among the set of controlled categories. All the categories are equal.
a list of countries
10. A kind of taxonomy has no relationships among the set of controlled categories. this taxonomy is called A:facet B:network C:polyhierarchy D:hierarchical E:flat 正确答案:E 你的答案:E 解析:9.1.3:1)扁平分类法(Flat Taxonomy)。在受控类别集之间没有关系,所有类别都是平等的。这类似于列表。例如,一个包含多个国家的列表。
2. A hierarchical taxonomy 层次分类法 is a tree structure where nodes are related by a rule. A hierarchy has at least two levels and is bi-directional.
geography
3. A hierarchical taxonomy in content management is similar to what type of relationship? A:N-ary N元 B:Ternary 三元 C:Subtype- supertype 子类-超类 D:CRUD(create, retrieve, update, delet) E:Recursive 递归 正确答案:C 你的答案:C 解析:9.1.3:2)层次分类法(Hierarchical Taxonomy)。它是一种树结构,其中节点通过规则相互关联。层次结构至少具有两个级别并且是双向的。向上移动层级会扩展类别;向下移动会细化类别。一个能够说明这一点的例子是地理信息,从所属大陆直到详细的街道地址。
8. A kind of taxonomy is a tree structure where nodes are related by a rule. this taxonomy is called A:facet B:network C:polyhierarchy D:hierarchical E:flat 正确答案:D 你的答案:D 解析:9.1.3:2)层次分类法(Hierarchical Taxonomy)。它是一种树结构,其中节点通过规则相互关联。层次结构至少具有两个级别并且是双向的。向上移动层级会扩展类别:向下移动会细化类别。一个能够说明这一点的例子是地理信息,从所属大陆直到详细的街道地址。
28. An enterprise's organization chart has multiple levels each with a single reporting line this is an example of a: A:hierarchical taxonomy. B:flat taxonomy. C:hybrid taxonomy. D:compound taxonomy. E:ecological taxonomy. 正确答案:A 你的答案:A 解析:9.1.3:2)层次分类法(Hierarchical Taxonomy)。它是一种树结构,其中节点通过规则相互关联。层次结构至少具有两个级别并且是双向的。向上移动层级会扩展类别:向下移动会细化类别。一个能够说明这一点的例子是地理信息,从所属大陆直到详细的街道地址。
3. A polyhierarchy 多重层级结构 is a tree-like structure with more than one node relation rule. Child nodes may have multiple parents. Those parents may also share grandparents.
11. A kind of taxonomy is a tree-like structure with more than one node relation rule. child nodes may have multiple parents this taxonomy is called A:facet B:network C:polyhierarchy D:hierarchical E:flat 正确答案:C 你的答案:C 解析:9.1.3:3)多重层级结构(Polyhierarchy)。它是具有多个节点关系规则的树状结构。 子节点可能有多个父节点,父节点也可以共用一个祖父节点。因此,遍历路径可能会很复杂,所以必须注意避免潜在的无效遍历:从与父节点相关的节点开始向上遍历而非祖父节点。然而,复杂的多重层级结构可能更适合面分类法。
4. A facet taxonomy 面分类法(刻面) looks like a star where each node is associated with the center node. Facets are attributes of the object in the center.
An example is Metadata, where each attribute (creator, title, access rights, keywords, version, etc.) is a facet of the content object.
6. A kind of taxonomy looks like a star where each node is associated with the center node. this taxonomy is called A:facet B:network C:polyhierarchy D:hierarchical E:flat 正确答案:A 你的答案:B 解析:9.1.3. 4)面分类法(Facet Taxonomy)。它指的是每个节点与中心节点相关联,其形状看起来像星形国。每个面是中心对象的一个属性。这里的例子是元数据,其中每个属性(创建者、标题、访问权限、关键字、版本等)是内容对象的一个面。
5. A network taxonomy 网络分类法 uses both hierarchical and facet structures. Any two nodes in a network taxonomy establish linkages based on their associations.
recommender engine 推荐引擎
thesaurus 主题词表
2. Network taxonomy organizes content into ___ categories A:hierarchical 分层 and facet 分面 B:semantic and ontological C:flat and facet D:ontological and hierarchical E:All 正确答案:A 你的答案:A 解析:9.1.3. 5)网状分类法(Network Taxonomy)。既可用于层级结构,也可用于刻面结构。网状分类中的任何两个节点都基于它们的关联来建立链接,其中一个例子就是推荐引擎(如果你喜欢那个,你可能也会喜欢这个);另一个例子是主题词表。
12. A kind of taxonomy uses both hierarchical and facet structures. this taxonomy is called A:facet B:network C:polyhierarchy D:hierarchical E:flat 正确答案:B 你的答案:B 解析:9.1.3. 5)网状分类法(Network Taxonomy)。既可用于层级结构,也可用于刻面结构。网状分类中的任何两个节点都基于它们的关联来建立链接,其中一个例子就是推荐引擎(如果你喜欢那个,你可能也会喜欢这个);另一个例子是主题词表。
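The structures above can be pictured with small, illustrative data sketches (the category names are made up):
# Sketches of the taxonomy structures described above; all categories are illustrative.
flat_taxonomy = ["France", "Germany", "Japan"]                       # no relationships

hierarchical_taxonomy = {                                            # parent -> children
    "Europe": {"France": {}, "Germany": {}},
    "Asia": {"Japan": {}},
}

polyhierarchy_parents = {"Screwdriver": ["Hand tools", "Assembly equipment"]}  # multiple parents

facet_taxonomy = {                                                   # facets of one content object
    "creator": "J. Smith", "title": "Q1 report", "keywords": ["finance"], "version": "1.2",
}

print(flat_taxonomy, hierarchical_taxonomy, polyhierarchy_parents, facet_taxonomy, sep="\n")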
Taxonomies are particularly important for presenting and finding information on websites, as many search engines rely on exact word matches and can only find items tagged or using the same words in the same way.
8. Classification Schemes and Tagging 分类方案和打标签
Classification schemes are codes that represent a controlled vocabulary.
Dewey Decimal System 杜威十进制分类法 and the US Library of Congress Classification 美国国会图书馆分类
Folksonomies 大众分类法 are classification schemes for online content terms and names obtained through social tagging 社交标签.
Folksonomies are not usually considered authoritative or applied to document indexing because experts do not compile them.
they offer the potential to enhance information retrieval. Folksonomy terms can be linked to structured controlled vocabularies.
1. All of the following techniques can be used to facilitate automated retrieval of structured and unstructured data EXCEPT A:classification of unstructured data in a taxonomy. B:Social book marking (adding tags to websites) C:implementation of tabular design to provide easy access. D:development of a meta-data layer to provide a common access method E:all 正确答案:B 你的答案:C 解析: 9.1.3题解:大众分类法是通过社交标签对在线内容术语和名称分类的方案。个人用户和团体使用它们来注释和分类数字内容。它们通常没有层次结构或优选术语。对已有术语分类,并不服务检索
14. Data quality includes all of the following EXCEPT A:data quality requirements B:timeliness C:data quality rules. D:data classification E:Data Accuracy 正确答案:D 你的答案:B 解析:9.1.3:在内容管理中,分类法是一种命名结构,包含用于概述主题、启用导航和搜索系统的受控词表。分类法有助于减少歧义并控制同义词。层次分类法包含了对素引者和搜奈者都有帮助的多种类型的父/子关系。
9. Thesauri 主题词表
A thesaurus is type of controlled vocabulary used for content retrieval 内容检索. It combines characteristics of synonym lists and taxonomies.
A thesaurus provides information about each term and its relationship to other terms.
23. A term is type of controlled vocabulary used for content retrieval. lt combines characteristics of synonym lists and taxonomies. this term is called A:Taxonomies B:Data Map C:Ontology D:thesaurus E:Records 正确答案:D 你的答案:C 解析:9.1.3:(8)主题词表(Thesauri)又称叙词表,是一种用于内容检索的受控词表。它结合了同义词列表和分类方案的特征。主题词表提供相关的每个术语及其与其他术语的关系信息。
Thesauri can be used to organize unstructured content, uncover relationships between content from different media, improve website navigation, and optimize search.
Standards that provide guidance on creating thesauri include ISO 25964 and ANSI/NISO Z39.19.
10. Ontology 本体
An ontology is a type of taxonomy that represents a set of concepts and their relationships within a domain.
21. A term is a type of taxonomy that represents a set of concepts and their relationships within a domain this term is called A:Taxonomies B:Data Map C:Ontology D:thesaurus E:Records 正确答案:C 你的答案:B 解析:9.1.3:(9)本体(Ontology)是一种分类法,它代表一套概念和它们在某个领域内概念之间的关联。
Ontology languages such as Resource Description Framework Schema (RDFS) are used to develop ontologies by encoding the knowledge about specific domains
They may include reasoning rules to support processing of that knowledge. OWL (Web Ontology Language), an extension to RDFS, is a formal syntax for defining ontologies.
Ontologies describe classes (concepts), individuals (instances), attributes, relations, and events.
There are two key differences between a taxonomy (like a data model) and an ontology:
1. A taxonomy provides data content classifications for a given concept area. A data model specifically calls out the entity to which an attribute belongs and the valid values for that attribute. In an ontology, though, entity, attribute, and content concepts can be completely mixed. Differences are identified through Metadata or other relationships.
2. In a taxonomy or data model, what is defined is what is known – and nothing else. This is referred to as a closed-world assumption 封闭世界假设. In an ontology, possible relationships are inferred based on the nature of existing relationships, so something that is not explicitly declared can be true. This is referred to as the open-world assumption 开放世界假设.
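A small sketch using the rdflib Python library (assumed to be installed; the namespace, class, and instance names are hypothetical) shows how RDFS classes, a subclass, and an instance might be declared. The open-world point is that a reasoner could infer that ex:alice is also a Customer even though that triple is never asserted:
# Sketch of a tiny RDFS ontology with rdflib; names are illustrative only.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")     # hypothetical namespace
g = Graph()
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.PremiumCustomer, RDFS.subClassOf, EX.Customer))
g.add((EX.alice, RDF.type, EX.PremiumCustomer))
g.add((EX.alice, RDFS.label, Literal("Alice")))

# A reasoner applying RDFS semantics could infer (EX.alice, RDF.type, EX.Customer)
# without it being asserted explicitly - the open-world assumption at work.
print(g.serialize(format="turtle"))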
While taxonomy management evolved under the Library Sciences, today the art and science of taxonomy and ontology management fall under the semantics management space 纳入语义管理领域.
common pitfalls 常见陷阱
1. Failure to distinguish between an instance-of relationship and a subclass-of relationship
2. Modeling events as relations
3. Lack of clarity and uniqueness of terms
4. Modeling roles as classes
5. Failure to reuse
6. Mixing semantics of modeling language and concepts
7. Use of a web-based, platform-independent tool (e.g., OOPS!) for ontology validation helps with diagnosis and repair of these pitfalls
1.4.3. Documents and Records 文件和档案
1. Documents 文件 are electronic or paper objects that contain instructions for tasks, requirements for how and when to perform a task or function, and logs of task execution and decisions.
Documents can communicate and share information and knowledge. Examples of documents include procedures, protocols, methods, and specifications.
2. Records 档案 provide evidence that actions were taken and decisions were made in keeping with procedures; they can serve as evidence of the organization’s business activities and regulatory compliance.
Only a subset of documents will be designated as records.
16. which statement best describes the relationship between documents and records? A:Documents are a sub-set of records B:Documents and records are not related C:Documents and records are the same thing D:Documents are written and records are audio E:Records are a sub-set of documents 正确答案:E 你的答案:E 解析: 9.1.3.3 只有部分文件才能称为档案
3. Document Management 文件管理
Document management encompasses the processes, techniques, and technologies for controlling and organizing documents and records throughout their lifecycle.
In general, document management concerns files, with little attention to file content. The information content within a file may guide how to manage that file, but document management treats the file as a single entity.
Managing the lifecycle of documents and records includes:
1. Inventory 编目: Identification of existing and newly created documents / records.
2. Policy 制度: Creation, approval, and enforcement of documents / records policies, including a document /records retention policy.
3. Classification 分类 of documents / records.
4. Storage 存储: Short- and long-term storage of physical and electronic documents / records.
5. Retrieval and Circulation 检索和流转: Allowing access to and circulation of documents / records in accordance with policies, security and control standards, and legal requirements.
6. Preservation and Disposal 保存和处置: Archiving and destroying documents / records according to organizational needs, statutes, and regulations.
7. 20. which of these describes activities in the document/record management lifecycle? A:"Storage, disposal, managing access" B:"Identification,management of policies, classification,retention,storage,retrieval and circulation,preservation and disposal" C:"Acquisition,editing storage,printing, backup, disposal D:"Acquisition classification, storage,purging" E:"Encryption,backup,disposal, extraction" 正确答案:B 你的答案:B 解析:9.1.3. :管理文件和档案的生命周期包括:1)编目。识别已有的和新建的文件/档案。2)制度。文件/档案制度的创建、批准和实施,包括文件/档案的保管制度。3)分类。文件/档案的分类。4)存储。纸质和电子文件/档案的短期和长期存储。5)检索和流转。在遵守制度、安全、控制标准和法律的情况下,允许文件/档案的访问和流通。6)保存和处置。在遵守组织需求、规章和法规的情况下,对文件/档案进行归档和销毁。
Documents are often developed within a hierarchy, with some documents more detailed than others.
ISO 9001-4.2

4. Records Management 档案管理
Document management includes records management. Managing records has special requirements
A Vital Record is a type of record required to resume an organization’s operations in the event of a disaster.
Well-prepared records have characteristics such as:
1. Content 内容: Content must be accurate, complete and truthful.
2. Context 背景: Descriptive information (Metadata) about the record’s creator, date of creation, or relationship to other records should be collected, structured and maintained with the record at the time of record creation.
3. Timeliness 及时性: A record should be created promptly after the event, action or decision occurs.
4. Permanency 永久性: Once they are designated as records, records cannot be changed for the legal length of their existence.
5. Structure 结构: The appearance and arrangement of a record’s content should be clear. Records should be recorded on the correct forms or templates, content should be legible, and terminology should be used consistently.
Many records exist in both electronic and paper formats. Records Management requires the organization to know which copy (electronic or paper) is the official ‘copy of record’ to meet record keeping obligations. Once the copy of record is determined, the other copy can be safely destroyed.
5. Digital Asset Management 数字资产管理
Digital Asset Management (DAM) is a process similar to document management that focuses on the storage, tracking and use of rich media documents like video, logos, photographs, etc.
1.4.4. Data Map 数据地图
1. A Data Map is an inventory of all ESI (electronically stored information 以电子方式存储的信息) data sources, applications, and IT environments that includes the owners of the applications, custodians, relevant geographical locations, and data types.
1.4.5. E-discovery 电子取证
1. Discovery is a legal term that refers to the pre-trial phase of a lawsuit, in which both parties request information from each other to find facts for the case and to see how strong the arguments are on either side.
Examples include the UK Bribery Act 英国反贿赂法, Dodd-Frank Act 多德-弗兰克法案, Foreign Account Tax Compliance Act (FATCA) 外国账户税收合规法案, Foreign Corrupt Practices Act 反海外腐败法, EU Data Protection Regulations and Rules 欧盟数据保护条例和细则, global anti-trust regulations 全球反垄断的法案, sector-specific regulations 特定行业法规, and local court procedural rules.
30. A legal term is a legal term that refers to pre-trial 预审 phase of a lawsuit where both parties request information from each other to find facts for the case and to see how strong the arguments are on either side this term is called A:Taxonomies B:GDPR C:Discovery 电子取证 D:FRCP美国联邦民事诉讼规则 E:EDRM 正确答案:C 你的答案:C 解析:9.1.3:电子取证“取证”(Discovery)是一个法律术语,指诉讼的预审阶段,双方当事人互相要求对方提供信息,以查明案件事实,并了解双方的论点有多强。
2. the legal hold notification (LHN) process 法律保留通知
LHN includes identifying information that may be requested in a legal proceeding, locking that data or document down to prevent editing or deletion, and then notifying all parties in an organization that the data or document in question is subject to a legal hold.
3. a high-level Electronic Discovery Reference Model developed by EDRM, a standards and guidelines organization for e-discovery 电子取证和标准组织

The EDRM model assumes that data or information governance is in place.
The model includes eight e-discovery phases that can be iterative.
1. The first phase, Identification 辨认阶段, has two sub-phases:
Early Case Assessment 早期案例评估
In Early Case Assessment, the legal case itself is assessed for pertinent information, called descriptive information or Metadata
Early Data Assessment 早期数据评估
In Early Data Assessment, the types and location of data relevant to the case are assessed.
2. Preservation 保全阶段 ensures that the data that has been identified as potentially relevant is placed in a legal hold so it is not destroyed.
3. Collection 收集阶段 includes the acquisition and transfer of identified data from the company to their legal counsel in a legally defensible manner.
4. During the Processing phase 处理阶段, data is de-duplicated, searched, and analyzed to determine which data items will move forward to the Review phase
5. In the Review 审查阶段 phase, documents are identified to be presented in response to the request.
6. Processing and Review depend on analysis 分析阶段, but Analysis is called out as a separate phase with a focus on content.
7. In the Production phase 生成阶段, data and information are turned over to opposing counsel, based on agreed-to specifications.
Native production 原生产品 retains the original format of the files.
Near-native production 近原生产品 alters the original format through extraction and conversion.
8. Displaying the ESI at depositions, hearings, and trials is part of the Presentation 演示阶段 phase.
1.4.6. Information Architecture 信息架构
1. Information Architecture is the process of creating structure for a body of information or content.
2. It includes the following components:
1. Controlled vocabularies 受控词表
2. Taxonomies and ontologies 分类法和本体
3. Navigation maps 导航地图
4. Metadata maps 元数据映射
5. Search functionality specifications 搜索功能规格
6. Use cases 用例
7. User flows 用户流
3. The information architecture and the content strategy together describe the ‘what’ – what content will be managed in a system. The design phases describe ‘how’ the content management strategy will be implemented.
4. For a document or content management system, the information architecture identifies the links and relationships between documents and content, specifies document requirements and attributes, and defines the structure of content in a document or content management system.
1.4.7. Search Engine 搜索引擎
1. A search engine is software that searches for information based on terms and retrieves websites that have those terms within their content.
Search functionality requires several components: search engine software proper, spider software that roams the Web 漫游网络的爬虫 and stores the Uniform Resource Locators (URLs) of the content it finds, indexing of the encountered keywords and text, and rules for ranking.
1.4.8. Semantic Model 语义模型
1. Semantic modeling is a type of knowledge modeling that describes a network of concepts (ideas or topics of concern) and their relationships.
Semantic objects 语义对象 are things represented in the model.
Bindings 语义约束 represent associations or association classes in UML.
1.4.9. Semantic Search 语义搜索
1. Semantic searching focuses on meaning and context rather than predetermined 预先设定 keywords.
2. Requirements for semantic search involve figuring out what users want which means thinking like the users.
3. Users of Business Intelligence (BI) and analytics tools often have semantic search requirements.
1.4.10. Unstructured Data 非结构化数据
1. It is estimated that as much as 80% of all stored data is maintained outside of relational databases
2. Unstructured data is found in various electronic formats: word processing documents, electronic mail, social media, chats, flat files, spreadsheets, XML files, transactional messages, reports, graphics, digital images, microfiche, video recordings, and audio recordings. An enormous amount of unstructured data also exists in paper files.
3. The fundamental principles of data management apply to both structured and unstructured data.
Storage, integrity, security, content quality, access, and effective use guide the management of unstructured data.
Unstructured data requires data governance, architecture, security Metadata, and data quality.
4. Unstructured and semi-structured data have become more important to data warehousing and Business Intelligence.
1.4.11. Workflow 工作流
1. Content development should be managed through a workflow that ensures content is created on schedule and receives proper approvals.
25. The addition of workflow to a content management system will : A:enable the controlled review and approval of documents B:restructure an enterprise glossary. C:enforce the controlled review and approval of database designs D:implement a data warehouse landing zone E:allow the approval of system access requests 正确答案:A 你的答案:C 解析:9.1.3:11.工作流应该通过一个工作流(Workflow)管理内容开发,以确保内容按时创建并获得适当的批准。工作流组件可以包括创建、处理、路由、规则、管理、安全性、电子签名、截止日期、升级(如果出现问题)、报告和交付等过程。
2. Workflow components can include the creation, processing, routing, rules, administration, security, electronic signature, deadline, escalation (if problems occur), reporting and delivery.
3. Workflows should be automated through the use of a content management system (CMS) 内容管理系统 or a standalone system, rather than manual processes.
The workflow needs to be repeatable, ideally containing process steps common across a variety of content.
Alignment 对齐 of the stakeholders and distribution points (including technology) is important.
2. Activities
2.1. Plan for Lifecycle Management 规划生命周期管理
2.1.1. The practice of document management involves planning for a document’s lifecycle, from its creation or receipt, through its distribution, storage, retrieval, archiving and potential destruction.
First, identify the organizational unit responsible for managing the documents and records.
That unit then develops an overall document management plan that includes a business continuity plan for vital documents and records.
Plan for Records Management 规划档案管理
Records management starts with a clear definition of what constitutes a record.
Managing electronic records requires decisions about where to store current, active records and how to archive older records.
Develop a Content Strategy 制定内容策略
Planning for content management should directly support the organization’s approach to providing relevant and useful content in an efficient and comprehensive manner.
A plan should account for content drivers (the reasons content is needed), content creation and delivery.
A content strategy should start with an inventory of current state and a gap assessment.
2.1.2. Create Content Handling Policies 创建内容处理制度
Policies codify requirements by describing principles, direction, and guidelines for action. They help employees understand and comply with the requirements for document and records management.
Most document management programs have policies related to:
1. Scope and compliance with audits
2. Identification and protection of vital records
3. Purpose and schedule for retaining records (a.k.a retention schedule)
4. How to respond to information hold orders (special protection orders); these are requirements for retaining information for a lawsuit, even if retention schedules have expired
5. Requirements for onsite and offsite storage of records
6. Use and maintenance of hard drive and shared network drives
7. Email management, addressed from content management perspective
8. Proper destruction methods for records (e.g., with pre-approved vendors and receipt of destruction certificates)
Social Media Policies 社交媒体制度
Many organizations are developing policies to respond to new media.
especially if employees post in the course of conducting business using organizational accounts.
5. An organization's social media policy, if implemented. MUST map to its A:corporate governance B:employee security policy. C:mission statement. D:human resources policy. E:none 正确答案:D 你的答案:B 解析:9.2.2:和人力资源相关。(1)社交媒体制度(Policy)除了这些常规的话题外,许多组织正在制定应对新媒体的制度。例如,组织必须明确在Facebook、Twitter、Linkedln、聊天室、博客、维基或在线论坛上发布的社交媒体内容是否构成档案?特别是员工在使用组织账户开展业务的过程中发布的内容。
Device Access Policies 设备访问制度
Since the pendulum is swinging towards user driven IT with BYOD (bring-your-own-devices), BYOA (bring-your-own-apps), and WYOD (wear-your-own-devices), the content and records management functions need to work with these scenarios in order to ensure compliance, security and privacy.
put controls 进行控制 on formal content.
provide guidance 提供指导 on informal content.
Handling Sensitive Data 处理敏感数据
Organizations are legally required to protect privacy by identifying and protecting sensitive data.
Responding to Litigation 应对诉讼
Organizations should prepare for the possibility of litigation requests through proactive e-discovery. (Hope for the best; prepare for the worst.)
2.1.3. Define Content Information Architecture 定义内容信息架构
Searches use either content-based indexing or Metadata. Indexing designs look at decision options for key aspects or attributes of indexes based on needs and preferences of users. They also look at the vocabulary management and the syntax for combining individual terms into headings or search statements.
2.2. Manage the Lifecycle 实施生命周期管理
2.2.1. Capture Records and Content 获取档案和内容
Capturing content is the first step to managing it. Electronic content is often already in a format to be stored in electronic repositories. To reduce the risk of losing or damaging records, paper content needs to be scanned and then uploaded to the corporate system, indexed, and stored in the repository. Use electronic signatures if possible.
Some social media platforms offer the capability of capturing records
Web crawlers can capture versions of websites
Web capture tools, application programming interfaces (APIs), RSS feeds, and social media export tools can capture content.
Social media records can also be captured manually or via predefined, automated workflows.
When content is captured, it should be tagged (indexed) with appropriate Metadata, such as (at minimum) a document or image identifier, the date and time of capture, the title, and author(s). Metadata is necessary for retrieval of the information, as well as for understanding the context of the content. Automated workflows and recognition technologies can help with the capture and ingestion process, providing audit trails.
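A minimal sketch of such capture Metadata as a record attached at ingestion time (the field names and values are illustrative, not a prescribed schema):
# Sketch of minimum capture Metadata attached when a document is ingested; fields are hypothetical.
from datetime import datetime, timezone

capture_metadata = {
    "document_id": "DOC-2023-000123",
    "captured_at": datetime.now(timezone.utc).isoformat(),
    "title": "Supplier contract - Acme Ltd",
    "authors": ["J. Smith"],
    "source": "scanned paper original",
    "classification": "contract",          # term drawn from the controlled vocabulary
}
print(capture_metadata)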
2.2.2. Manage Versioning and Control 管理版本的控制
ANSI Standard 859 has three levels of control of data, based on the criticality of the data and the perceived harm that would occur if data were corrupted or otherwise unavailable: formal, revision, and custody:

1. Formal control 正式控制 requires formal change initiation, thorough evaluation for impact, decision by a change authority, and full status accounting of implementation and validation to stakeholders
2. Revision control 修订控制 is less formal, notifying stakeholders and incrementing versions when a change is required
3. Custody control 托管控制 is the least formal, merely requiring safe storage and a means of retrieval
ANSI 859 recommends taking into account the following criteria when determining which control level applies to a data asset:
1. Cost of providing and updating the asset
2. Project impact, if changes will have significant cost or schedule consequences
3. Other consequences of change to the enterprise or project
4. Need to reuse the asset or earlier versions of the asset
5. Maintenance of a history of change (when required by the enterprise or the project)
2.2.3. Backup and Recovery 备份和恢复
The document / record management system needs to be included in the organization’s overall corporate backup and recovery activities, including business continuity and disaster recovery planning.
Vital 重要 records must be identified, and plans for their protection and recovery must be developed and maintained.
A Business Continuity Plan 业务连续性计划 (or Disaster Recovery Plan 灾难恢复计划) contains written policies, procedures, and information designed to mitigate the impact of threats to an organization’s data, including documents, and to recover them as quickly as possible, with minimum disruption, in the event of a disaster.
7. "When defining your business continuity plan,which of the following should one consider doing? " A:Write a report and discuss with management the required budget B:"Determine the risk,probability and impact,check document backup frequency" C:"Have the contracts in place to acquire new hardware in case of technical problems,define policies" D:Make sure that the data is retained sufficiently long check that critical data is encrypted check access rights E:Consider written policies and procedures,impact mitigating measures,required recovery time and acceptable amount of disruption,the criticality of the documents 正确答案:E 你的答案:B 解析:9.2.4. 业务连续性计划(或灾难恢复计划)包含书面制度、程序和信息,旨在缓解对组织数据(包括文件)威胁的影响。在发生灾难时,尽快恢复这些数据,同时尽量减少中断。
2.2.4. Manage Retention and Disposal 管理保管和处置
Effective document / records management requires clear policies and procedures, especially regarding retention and disposal of records.
Non-value-added information should be removed from the organization’s holdings and disposed of to avoid wasting physical and electronic space, as well as the cost associated with its maintenance. There is also risk associated with retaining records past their legally required timeframes. This information remains discoverable for litigation.
Still, many organizations do not prioritize removal of non-value added information because:
1. Policies are not adequate
2. One person’s non-value-added information is another’s valued information
3. Inability to foresee future possible needs for current non-value-added physical and / or electronic records
4. There is no buy-in for Records Management
5. Inability to decide which records to delete
6. Perceived cost of making a decision and removing physical and electronic records
7. Electronic space is cheap. Buying more space when required is easier than archiving and removal processes
8. 27. which of the following is a reason why organizations do not dispose 丢掉 of non-value-adding information? A:Storage is cheap and easily expanded B:The information is never out of date. C:The organization's data quality benchmark diminishes D:Data modelling the content is hard to reproduce E:the metadata repository can not be updated 正确答案:A 你的答案:A 解析:9.2.4. 题解:无附加值的信息应该从组织的资产中移除并处理掉,以避免浪费实体和电子空间以及与其维护相关的成本。超过法定时限保留档案也是存在风险的,在诉讼时仍能发现这些信息。然而,许多组织并没有优先删除无附加值的信息,这是因为:1)制度不适用。2)对某一个人来说是无附加值信息,但对另一个人来说却是有价值的信息。3)无法预见当前的无附加值实体和/或电子档案未来可能的需求。4)对档案管理的不认可。5)无法决定删除哪些档案。6)做决定与移除实体和电子档案的感知成本。7)电子空间很便宜,购买更多的空间比归档和移除过程更容易。
2.2.5. Audit Documents / Records 审计文件/档案
Document / records management requires periodic auditing to ensure that the right information is getting to the right people at the right time for decision-making or performing operational activities.

An audit usually involves the following steps:
1. Defining organizational drivers and identifying the stakeholders that comprise the ‘why’ of document / records management
2. Gathering data on the process (the ‘how’), once it is determined what to examine / measure and what tools to use (such as standards, benchmarks, interview surveys)
3. Reporting the outcomes
4. Developing an action plan of next steps and timeframes
2.2.6. 18. Which of the following is not a step in the 'document and content management lifecycle" A:Create a content strategy. B:Audit documents and records C:Capture records and content D:Manage retention and disposal E:Manage versions and control 正确答案:A 你的答案:A 解析:9.2.2.实施生命周期管理的活动包括:Capture Records and Content 获取档案和内容;Manage Versioning and Control 管理版本的控制;Backup and Recovery 备份和恢复;Manage Retention and Disposal 管理保管和处置;Audit Documents/Records 审计文件/档案
2.3. Publish and Deliver Content 发布和分发内容
2.3.1. Provide Access, Search, and Retrieval 开放访问、搜索和检索
Once the content has been described by Metadata / key word tagging and classified within the appropriate information content architecture, it is available for retrieval and use.
2.3.2. Deliver Through Acceptable Channels 通过可接受的渠道分发
There is the potential that any changed content may need to be brought back into the original format.
When structured data from databases is formatted into HTML, it becomes difficult to recover the original structured data, as separating the data from the formatting is not always straightforward.
3. Tools
3.1. Enterprise Content Management Systems 企业内容管理系统
3.1.1. An ECM may consist of a platform of core components or a set of applications that can be integrated wholly or used separately.
3.1.2. Document Management 文件管理
A document management system is an application used to track and store electronic documents and electronic images of paper documents.
Document management systems commonly provide storage, versioning, security, Metadata Management, content indexing, and retrieval capabilities (see the sketch below).
Digital Asset Management 数字资产管理
Image Processing 图像处理系统
Records Management System 档案管理系统
24. A document management system is an application used to track and store electronic documents and electronic images of paper documents which provides the following capabilities: A:Local disk storage and indexing of documents B:Storage, versioning, security, metadata management, indexing and retrieval C:Scanning and transcoding of documents D:Wiki, collaboration, online editing E:Secure forwarding of documents to colleagues; never having to dispose of documents 正确答案:B 你的答案:B 解析:9.3.1:1.文件管理 文件管理系统是用于跟踪和存储电子文件和纸质文件的电子影像的应用程序。文件库系统、电子邮件系统和影像管理系统是专门的文件管理系统,文件管理系统通常具有存储、版本控制、安全性、元数据管理、内容索引和检索功能。某些系统的扩展功能可以包括文件的元数据视图。
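Below is a toy, non-authoritative sketch of two of the capabilities named above, versioning and retrieval of documents with their metadata; the class and field names are invented and real products add security, content indexing, and far more.

# Toy in-memory document store (invented names): check-in creates a new version,
# retrieval returns a chosen version or the latest one together with its metadata.
class DocumentStore:
    def __init__(self):
        self._versions = {}   # doc_id -> list of (content, metadata); index = version - 1

    def check_in(self, doc_id, content, metadata):
        """Store a new version of the document and return its version number."""
        self._versions.setdefault(doc_id, []).append((content, metadata))
        return len(self._versions[doc_id])

    def retrieve(self, doc_id, version=None):
        """Return a specific version, or the latest one when no version is given."""
        history = self._versions[doc_id]
        return history[(version or len(history)) - 1]

store = DocumentStore()
store.check_in("POL-001", b"Retention policy v1", {"owner": "Records Management", "format": "pdf"})
store.check_in("POL-001", b"Retention policy v2", {"owner": "Records Management", "format": "pdf"})
print(store.retrieve("POL-001")[1])   # metadata of the latest version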
3.1.3. Content Management System 内容管理系统
A content management system is used to collect, organize, index, and retrieve content, storing it either as components or whole documents, while maintaining links between components.
3.1.4. Content and Document Workflow 内容和文件工作流
Workflow tools support business processes, route content and documents, assign work tasks, track status, and create audit trails. A workflow provides for review and approval of content before it is published.
3.2. Collaboration Tools 协作工具
3.2.1. Team collaboration tools enable the collection, storage, workflow, and management of documents pertinent to team activities.
Social networking enables individuals and teams to share documents and content inside the team and to reach out to an external group for input using blogs, wikis, RSS, and tagging.
3.3. Controlled Vocabulary and Metadata Tools 受控词汇表和元数据工具
3.3.1. Tools that help develop or manage controlled vocabularies and Metadata range from office productivity software, Metadata repositories, and BI tools, to document and content management systems. For example (a minimal code sketch follows this list):
1. Data models used as guides to the data in an organization
2. Document management systems and office productivity software
3. Metadata repositories, glossaries, or directories
4. Taxonomies and cross-reference schemes between taxonomies
5. Indexes to collections (e.g., particular product, market or installation), filesystems, opinion polls, archives, locations, or offsite holdings
6. Search engines
7. BI tools that incorporate unstructured data
8. Enterprise and departmental thesauri
9. Published reports libraries, contents and bibliographies, and catalogs
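The sketch below is a minimal, hypothetical example of a controlled vocabulary held as a data structure, with broader/narrower relations and synonyms; the terms and the preferred_term helper are invented, and real tools add hierarchy management, governance workflow, and search integration.

# Minimal controlled vocabulary sketch (invented terms): each entry records broader /
# narrower relations and synonyms, so tagging can be normalized to a preferred term.
VOCABULARY = {
    "invoice": {"broader": "financial document", "narrower": [], "synonyms": ["bill"]},
    "financial document": {"broader": "document", "narrower": ["invoice", "receipt"], "synonyms": []},
}

def preferred_term(term):
    """Map a synonym back to its preferred term so content tagging stays consistent."""
    for preferred, entry in VOCABULARY.items():
        if term == preferred or term in entry["synonyms"]:
            return preferred
    raise KeyError(f"'{term}' is not in the controlled vocabulary")

print(preferred_term("bill"))   # -> 'invoice'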
3.4. Standard Markup and Exchange Formats 标准标记和交换格式
3.4.1. Computer applications cannot process unstructured data / content directly. Standard markup and exchange formats facilitate the sharing of data across information systems and the Internet.
3.4.2. XML (Extensible Markup Language) 可扩展标示语言
Extensible Markup Language (XML) provides a language for representing both structured and unstructured data and information. XML uses Metadata to describe the content, structure, and business rules of any document or database.
XML requires translating the structure of the data into a document structure for data exchange. XML tags data elements to identify the meaning of the data. Simple nesting and references provide the relationships between data elements.
XML namespaces provide a method to avoid a name conflict when two different documents use the same element names. Older methods of markup include HTML and SGML, to name a few.
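A small illustration of these ideas, using Python's standard-library xml.etree.ElementTree to parse a document in which tags identify the meaning of data elements, nesting expresses relationships, and a namespace keeps element names from clashing; the element names and namespace URI are invented for the example.

# XML sketch (invented element names): tags give data elements meaning, nesting
# expresses relationships, and the "fin" namespace avoids name conflicts.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<order xmlns:fin="http://example.com/finance">
  <customer id="C-100">
    <name>Acme Corp</name>
  </customer>
  <fin:total currency="USD">129.95</fin:total>
</order>"""

root = ET.fromstring(doc)
ns = {"fin": "http://example.com/finance"}
print(root.find("customer/name").text)                # Acme Corp
print(root.find("fin:total", ns).attrib["currency"])  # USD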
The need for XML-capable content management has grown for several reasons:
1. XML provides the capability of integrating structured data into relational databases with unstructured data.
2. XML can integrate structured data with unstructured data in documents, reports, email, images, graphics, audio, and video files.
3. XML also can build enterprise or corporate portals (Business-to-Business [B2B], Business-to-Customer [B2C]), which provide users with a single access point to a variety of content.
4. XML provides identification and labeling of unstructured data / content so that computer applications can understand and process them.
XML Metadata Interchange (XMI) XML元数据交换, an OMG standard for exchanging metadata in XML format
3.4.3. JSON (JavaScript Object Notation) 基于JavaScript语言的轻量级的数据交换格式
JSON (JavaScript Object Notation) is an open, lightweight standard format for data interchange. Its text format is language-independent and easy to parse. JSON is built on two structures:
a collection of unordered name / value pairs 无序 名称/值 配对 的集合 known as objects 对象
an ordered list of values 值的有序集合 realized as an array. 数组
It is emerging as the preferred format in web-centric, NoSQL databases.
An alternative to XML, JSON is used to transmit data between a server and a web application.
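A brief sketch of the two JSON structures using Python's standard json module; the field names and values are invented examples.

# JSON sketch: an object (name / value pairs) containing an array (ordered list of values),
# serialized for transmission and parsed again on the receiving side.
import json

record = {                                 # JSON object
    "documentId": "POL-001",
    "tags": ["policy", "retention"],       # JSON array
    "metadata": {"owner": "Records Management", "version": 2},
}

text = json.dumps(record)   # serialize, e.g., to send from a server to a web application
parsed = json.loads(text)   # parse back into native structures
print(parsed["metadata"]["version"])   # 2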
3.4.4. RDF (Resource Description Framework) 资源描述框架 and Related W3C Specifications 相关万维网规范
Resource Description Framework (RDF), a common framework used to describe information about any Web resource, is a standard model for data interchange on the Web.
The RDF resources are saved in a triplestore,
which is a database built to store triples and retrieve them via semantic queries using SPARQL.
RDF makes statements about a resource in the form of subject (resource)-predicate (property name)-object (property value) expressions or triples.
Usually the subject, the predicate, and the object are each identified by a URI (Uniform Resource Identifier). 统一资源标识符
SKOS (Simple Knowledge Organization System 简单知识组织系统) is an RDF vocabulary
Any type of classification, taxonomy, or thesaurus can be represented in SKOS.
OWL (W3C Web Ontology Language 网络本体语言) is a vocabulary extension of RDF.
RDF can help with the ‘variety’ characteristic of Big Data.
Both RDF and OWL are Semantic Web standards 语义网标准
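A hedged sketch of RDF triples and a SPARQL query, assuming the third-party rdflib package is installed (pip install rdflib); the URIs, properties, and values are invented, and the small in-memory graph stands in for a triplestore.

# RDF sketch with rdflib (assumed third-party library): each statement is a
# subject-predicate-object triple, and SPARQL retrieves matching triples.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.com/")
g = Graph()
g.add((EX.doc1, EX.title, Literal("Retention Policy")))
g.add((EX.doc1, EX.author, Literal("Records Management")))

results = g.query("""
    PREFIX ex: <http://example.com/>
    SELECT ?title WHERE { ?doc ex:title ?title . }
""")
for row in results:
    print(row.title)   # Retention Policy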
3.4.5. Schema.org
Schema.org provides a collection of shared vocabularies or schemas for on-page markup so that the major search engines can understand them
Snippets are the text that appears under every search result. Rich snippets are the detailed information on specific searches (e.g., gold star ratings under the link).
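An illustrative schema.org snippet expressed as JSON-LD, one common form of on-page markup behind rich snippets such as star ratings; the product name and rating values are invented examples.

# Schema.org markup sketch as JSON-LD (invented values); embedded in a page inside a
# <script type="application/ld+json"> element so search engines can parse it.
import json

markup = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.6", "reviewCount": "128"},
}

print(json.dumps(markup, indent=2))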
3.5. E-discovery Technology
3.5.1. E-discovery often involves review of large volumes of documents.
E-discovery technology provides capabilities and techniques such as early case assessment, collection, identification, preservation, processing, optical character recognition (OCR) 光学字符识别, culling 剔除, similarity analysis, and email thread analysis.
Technology-assisted review (TAR) 技术辅助审查 is a workflow or process where a team can review selected documents and mark them relevant or not.
4. Techniques
4.1. Litigation Response Playbook 诉讼应诉手册
4.1.1. E-discovery starts at the beginning of a lawsuit. However, an organization can plan for litigation response through the development of a playbook containing objectives, metrics and responsibilities before a major discovery project begins.
4.1.2. The playbook defines the target environment for e-discovery and assesses if gaps exist between current and target environments. It documents business processes for the lifecycle of e-discovery activities and identifies roles and responsibilities of the e-discovery team. A playbook can also enable an organization to identify risks and proactively prevent situations that might result in litigation.
1. Establish an inventory of policies and procedures for specific departments (Legal, Records Management, IT).
2. Draft policies for topics, such as litigation holds, document retention, archiving, and backups.
3. Evaluate IT tool capabilities such as e-discovery indexing, search and collection, data segregation and protection tools, as well as the unstructured ESI sources / systems.
4. Identify and analyze pertinent legal issues.
5. Develop a communication and training plan to train employees on what is expected.
6. Identify materials that may be prepared in advance for tailoring to a legal case.
7. Analyze vendor services in case outside services are required.
8. Develop processes on how to handle a notification and keep the playbook current.
4.2. Litigation Response Data Map 诉讼应诉数据映射
4.2.1. A data map is a catalog of information systems. It describes the systems and their uses, the information they contain, retention policies, and other characteristics.
E-discovery often has a limited timeframe (e.g., 90 days). Providing attorneys with a data map of the IT and ESI environment available can enable an organization to respond more effectively.
4.2.2. An e-discovery data map should indicate which records are readily accessible and which are not.
The inaccessible data needs to be identified and the reasons why it is inaccessible need to be documented.
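A simple sketch of what a litigation response data map entry might capture, including whether the system's ESI is readily accessible and the documented reason when it is not; all system names, retention values, and reasons are hypothetical.

# Data map sketch (hypothetical systems): a catalog recording what each system holds,
# its retention policy, and whether its ESI is readily accessible for e-discovery.
DATA_MAP = [
    {"system": "EmailArchive", "content": "Corporate email", "retention": "7 years",
     "accessible": True, "inaccessible_reason": None},
    {"system": "LegacyTapeBackups", "content": "Pre-2010 file shares", "retention": "Indefinite",
     "accessible": False, "inaccessible_reason": "Offline tape; restore requires retired hardware"},
]

def inaccessible_sources(data_map):
    """List the sources counsel must be warned about, with the documented reason."""
    return [(s["system"], s["inaccessible_reason"]) for s in data_map if not s["accessible"]]

print(inaccessible_sources(DATA_MAP))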
5. Implementation Guidelines
5.1. Implementing ECM is a long-term effort that can be perceived as expensive
5.1.1. To minimize risks, ensure that the content, not the technology, drives decisions for ECM implementation.
5.1.2. Configure the workflow around the organizational needs to show value.
5.2. Readiness Assessment / Risk Assessment 就绪评估/风险评估
5.2.1. The purpose of an ECM readiness assessment is to identify areas where content management improvement is needed and to determine how well adapted the organization is to changing its processes to meet these needs.
5.2.2. Risks can arise with ECM implementations due to project size, complexity in integrating with other software applications, process and organizational issues, and the effort required to migrate content
Other risks include failure to put policies, processes, and procedures in place or lack of communication with stakeholders.
5.2.3. Records Management Maturity 档案管理成熟度
Information Governance Maturity Model 信息治理成熟度模型 describes the characteristics of the information governance and recordkeeping environment at five levels of maturity for each of the eight GARP principles:
1. Level 1 Sub-Standard 低于标准的
2. Level 2 In Development 发展中的
3. Level 3 Essential 基本的
4. Level 4 Proactive 积极的
5. Level 5 Transformational 完成变革的
Several standards can be applied for technical assessments of records management systems and applications. For example,
1. DoD 5015.2 Electronic Records Management Software Applications Design Criteria Standard
2. ISO 16175, Principles and Functional Requirements for Records in Electronic Office Environments
3. The Model Requirements for the Management of Electronic Records (MoReq2)
4. The Records Management Services (RMS) specification from the Object Management Group (OMG)
5.2.4. E-discovery Assessment 电子取证评估
A readiness assessment should examine and identify improvement opportunities for the litigation response program.
The risks of not having defined a proactive litigation response should be assessed and quantified.
5.3. Organization and Cultural Change 组织和文化变革
5.3.1. People can be a greater challenge than the technology.
5.3.2. Often organizations manage information, including records, departmentally, creating information silos 信息孤岛 that hinder the sharing and proper management of data.
5.3.3. Both content and records management need to be elevated organizationally. For example,
the Records and Information Management (RIM) 档案与信息管理 function may be
aligned with the corporate legal function along with the e-discovery function, or
aligned with marketing or an operational support group to improve operational efficiency
6. Documents and Content Governance
6.1. Information Governance Frameworks 信息治理架构
6.1.1. Documents, records, and other unstructured content represent risk to an organization. Managing this risk and getting value from this information both require governance. Drivers include:
1. Legal and regulatory compliance
2. Defensible disposition of records
3. Proactive preparation for e-discovery
4. Security of sensitive information
5. Management of risk areas such as email and Big Data
6.1.2. One set of principles is the ARMA GARP® principles (see Section 1.2). Other principles include:
1. Assign executive sponsorship for accountability
2. Educate employees on information governance responsibilities
3. Classify information under the correct record code or taxonomy category
4. Ensure authenticity and integrity of information
5. Determine that the official record is electronic unless specified differently
6. Develop policies for alignment of business systems and third-parties to information governance standards
7. Store, manage, make accessible, monitor, and audit approved enterprise repositories and systems for records and content
8. Secure confidential or personally identifiable information
9. Control unnecessary growth of information
10. Dispose of information when it reaches the end of its lifecycle
11. Comply with requests for information (e.g., discovery, subpoena, etc.)
12. Improve continuously
6.1.3. The Information Governance Reference Model (IGRM)信息治理参考模型 shows the relationship of Information Governance to other organizational functions.

The outer ring includes the stakeholders who put policies, standards, processes, tools and infrastructure in place to manage information
The center shows a lifecycle diagram with each lifecycle component within the color or colors of the stakeholder(s) who executes that component.
The IGRM complements ARMA’s GARP®.
6.2. Proliferation of Information 信息的激增
6.2.1. Unstructured data grows much faster than structured data.
ownership can be difficult to ascertain. 所有权
difficult to classify the content 分类
purpose of the content cannot always be inferred 目的
without required Metadata 元数据
misinterpreted and, if content is not known 曲解
mishandled or present privacy concerns. 错误处理
6.3. Govern for Quality Content 管理高质量的内容
6.3.1. Managing unstructured data requires effective partnership between data stewards and other data management professionals and records managers.
13. In content management business data stewards help with all of the following EXCEPT A:performance issues B:controlled vocabularies C:enterprise taxonomies. D:communicating content issues E:all 正确答案:A 你的答案:A 解析:系统性能因素由DBA负责
6.3.2. Accurate, complete, and up-to-date information will aid in making decisions
6.3.3. High quality information improves competitive advantage and increases organizational effectiveness
6.3.4. Defining quality content requires understanding the context of its production and use.
1. Producers 生产者: Who creates the content and why do they create it?
2. Consumers 消费者: Who uses the information and for what purposes?
3. Timing 时间: When is the information needed? How frequently does it need to be updated or accessed?
4. Format 格式: Do consumers need the content in a specific format to meet their goals? Are there unacceptable formats?
5. Delivery 分发: How will information be delivered? How will consumers access the information? How will security be enforced to prevent inappropriate access to electronic content?
6.4. Metrics 度量指标
6.4.1. Records Management 档案管理
Measures of the success of a records management system implementation include (a small calculation sketch follows this list):
Percentage of total documents and email per user identified as corporate records
Percentage of the identified corporate records declared as such and put under records control
Percentage of total stored records that have the proper retention rules applied
ARMA’s GARP principle categories and maturity model can guide the definition of KPIs.
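A small calculation sketch for the percentage KPIs listed above, using hypothetical counts; the denominators chosen here are one possible interpretation rather than a prescribed formula.

# KPI sketch (hypothetical counts): percentages for identification, declaration,
# and retention-rule coverage of corporate records.
def pct(part, whole):
    return round(100.0 * part / whole, 1) if whole else 0.0

total_items = 250_000          # total documents and email per user (hypothetical)
identified_records = 40_000    # identified as corporate records
declared_records = 31_000      # declared as such and placed under records control
with_retention_rules = 28_500  # stored records with the proper retention rules applied

print(pct(identified_records, total_items))        # % of items identified as corporate records
print(pct(declared_records, identified_records))   # % of identified records declared and controlled
print(pct(with_retention_rules, declared_records)) # % of stored records with retention rules applied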
6.4.2. E-discovery 电子取证
One common KPI of e-discovery is cost reduction
Another KPI is efficiency gained in collecting information ahead of time rather than reactively
Measurement of e-discovery is critical to a better rate of litigation wins.
The EDRM model can guide development of KPIs based on what is required by each phase
EDRM also publishes a Metrics Model for e-discovery metrics. The primary elements of Volume, Time, and Cost are in the center, surrounded by the seven aspects of e-discovery work (Activities, Custodians, Systems, Media, Status, Format, and QA) which affect the outcome of the center elements.
6.4.3. ECM 企业内容管理
KPIs should be developed to measure both tangible and intangible benefits of ECM
Tangible 有形的 benefits include increased productivity, cost reduction, improved information quality, and improved compliance.
Intangible 无形的 benefits include improved collaboration, and simplification of job routines and workflow.
As ECM is being established, KPIs will focus on program and operational metrics.
Program metrics 规划指标 include number of ECM projects, adoption, and user satisfaction levels
Operational metrics 运营指标 include the typical system type KPIs, such as the amount of downtime, number of users, etc.
Specific ECM metrics such as storage utilization (e.g., comparison of amount used with ECM implementation vs. amount used before ECM) and search retrieval performance can also be used as KPIs.
Over time, KPIs related to the value of business solutions can be developed.
1. Financial KPIs can include the cost of the ECM system, reduced costs related to physical storage, and percentage decrease in operational costs.
2. Customer KPIs can include percentage of incidents resolved at first contact and number of customer complaints.
3. KPIs representing more effective and productive internal business processes can include percentage of paperwork reduced and percentage of error reduction using workflow and process automation.
4. Training KPIs can include number of training sessions for management and non-management.
5. Risk mitigation KPIs can include reduction of discovery costs, and number of audit trails tracking e-discovery requests.
7. Works Cited / Recommended
7.1. 26. Users continue to use a shared drive 共享驱动器 instead of a new document management system This may be due to: A:onerous 繁琐 classification requirements when adding documents B:concurrent updates to the document are handled better by the shared drive. C:concern about the ability to version documents D:a failure to backup the shared drive. E:the document management system is too expensive 正确答案:A 你的答案:A 解析:文件管理内容
7.2. 31. A 'Content Distribution Network' supporting a multi-national website is likely to use A:an extract transform and load solution B:a records disposal solution C:a replication solution 复制方案 D:a database backup and restore solution E:an archiving solution 正确答案:C 你的答案:B 解析:题解:CDN(内容分发网络)描的是一组分布在各个地区的服务器。这些服务器存储着数据的副本,因此服务器可以根据哪些服务器与用户距离最近,来满足数据的请求。