
DATA WAREHOUSE

Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon, a way to learn more about customers' needs.

“So”, you may ask, full of intrigue, “what exactly is a data warehouse?”

Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.

According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems. Let us take a closer look at each of these four features.

(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.

“OK”, you now ask, “what, then, is data warehousing?”

Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on the information in the warehouse. Some authors use the term “data warehousing” only for the process of constructing a data warehouse, and the term “warehouse DBMS” for the management and use of one. We will not make this distinction here.

“How are organizations using the information from data warehouses?” Many organizations are using this information to support business decision making activities, including:

(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending).

(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies.

(3) analyzing operations and looking for sources of profit.

(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.
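Comparing sales by quarter, by year, and by region, as in the second use listed above, comes down to grouping sales records and summing them. The following is a minimal Python sketch of that idea. The `SALES` records, their field layout, and the function names are hypothetical and chosen only for illustration; they do not come from the text.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales facts: (sale date, region, product, amount).
SALES = [
    (date(2023, 1, 15), "North", "widget", 120.0),
    (date(2023, 2, 3), "North", "gadget", 80.0),
    (date(2023, 4, 20), "North", "widget", 200.0),
    (date(2023, 1, 9), "South", "widget", 50.0),
    (date(2023, 5, 30), "South", "gadget", 75.0),
]


def quarter_of(d):
    """Return a label such as '2023-Q1' for a date."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def sales_by_quarter_and_region(sales):
    """Sum sale amounts for every (quarter, region) pair."""
    totals = defaultdict(float)
    for sold_on, region, _product, amount in sales:
        totals[(quarter_of(sold_on), region)] += amount
    return dict(totals)


def sales_by_year(sales):
    """Roll the same facts up to a coarser level: one total per year."""
    totals = defaultdict(float)
    for sold_on, _region, _product, amount in sales:
        totals[sold_on.year] += amount
    return dict(totals)
```

Calling `sales_by_quarter_and_region(SALES)` gives one total per quarter and region, and `sales_by_year(SALES)` summarizes the same records at a coarser level of granularity.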

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer. This query-driven approach requires complex information filtering and integration processes, and it competes for resources with processing at the local sources. It is inefficient and expensive for frequent queries, especially for queries requiring aggregation.

Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system, because data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantically consistent data store. Query processing in the data warehouse does not interfere with processing at the local sources. Moreover, data warehouses can store and integrate historical information and support complex queries. As a result, data warehousing has become very popular in industry.

1. Differences between operational database systems and data warehouses

Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of different users. These systems are known as on-line analytical processing (OLAP) systems.

The major distinguishing features between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.

(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.

Other features that distinguish OLTP and OLAP systems include database size, frequency of operations, and performance metrics.

2. But, why have a separate data warehouse?

“Since operational databases store huge amounts of data”, you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”

A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.
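The two kinds of workload contrasted above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `stock` and `orders` tables and both function names are hypothetical, and a single in-memory database stands in for what would in practice be two separate systems.

```python
import sqlite3

# One in-memory database is used here purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (
        product TEXT PRIMARY KEY,
        qty INTEGER NOT NULL CHECK (qty >= 0)
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        product TEXT NOT NULL,
        region TEXT NOT NULL,
        qty INTEGER NOT NULL
    );
    INSERT INTO stock VALUES ('widget', 10), ('gadget', 5);
""")


def place_order(conn, product, region, qty):
    """OLTP-style access: a short, atomic read-write transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (product, region, qty) VALUES (?, ?, ?)",
                (product, region, qty),
            )
            # Fails the CHECK constraint when stock would go negative,
            # which also undoes the INSERT above.
            conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE product = ?",
                (qty, product),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def units_by_region(conn):
    """OLAP-style access: a read-only query that summarizes many rows."""
    rows = conn.execute(
        "SELECT region, SUM(qty) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())
```

`place_order` is the kind of operation that needs locking and logging to stay consistent, while `units_by_region` only reads and aggregates. Running many such aggregations against the operational tables is what the separation into a warehouse is meant to avoid.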

Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources.

DATA WAREHOUSE (TRANSLATION)

Data warehousing provides architectures and tools for business operations to systematically organize, understand, and use data to make decisions. Many organizations have found that data warehouses are very useful tools in today's competitive, fast-developing world.

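In the last few years, many companies have spent millions of dollars building enterprise-wide data warehouses. Many people also believe that, as competition intensifies, the data warehouse has become a must-have marketing tool, a weapon for understanding customers' needs.

“So”, you may ask, full of curiosity, “what exactly is a data warehouse?”

Data warehouses have been defined in different ways, but a rigorous definition is hard to give. Loosely speaking, a data warehouse is a database that is maintained separately from an organization's operational databases. Data warehouses allow the integration of different application systems and support information processing by providing a solid platform of consolidated historical data for analysis.

According to W. H. Inmon, a leading architect in data warehouse construction, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management decision making.” This short but comprehensive definition states the main features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data storage systems. Let us now look at these four features.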

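(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product, and sales. It focuses on modeling and analyzing data for decision makers rather than on an organization's day-to-day operations and transaction processing. A data warehouse therefore excludes data that are not useful for decision support.

(2) Integrated: A data warehouse is usually built from multiple data sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.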

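(3) Time-variant: Data are stored to provide information from a historical perspective. Every key structure in the data warehouse contains, explicitly or implicitly, an element of time.

(4) Nonvolatile: A data warehouse stores its data physically apart from the operational environment. Because of this separation, it does not need transaction processing, recovery, or concurrency control mechanisms. It usually needs only two data access operations: initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and holds the information an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, built from integrated data, that supports structured and ad hoc queries, analytical reporting, and decision making.

“OK”, you now ask, “what, then, is data warehousing?”

Based on the above, we view data warehousing as the process of constructing and using data warehouses. Constructing a data warehouse requires data integration, data cleaning, and data consolidation. Using a data warehouse often requires a set of decision support technologies. These let knowledge workers use the warehouse to obtain an overview of the data quickly and conveniently, and to make sound decisions based on the information in the warehouse. Some people use the term “data warehousing” only for the process of constructing a data warehouse, and “warehouse DBMS” for managing and using one. We will not distinguish between the two.

“How do organizations use the information from data warehouses?” Many organizations use this information to support decision-making activities, including:

(1) increasing customer focus, which includes analyzing customer buying patterns (such as buying preferences, buying times, budget cycles, and spending habits);

(2) repositioning products and managing product portfolios by comparing sales performance by quarter, year, and region, in order to adjust production strategies;

(3) analyzing operations and looking for sources of profit;

(4) managing customer relationships, making environmental corrections, and managing the cost of corporate assets.

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations collect diverse kinds of data and maintain large databases from multiple heterogeneous, autonomous, and distributed data sources. Integrating these data and providing easy, efficient access to them is highly desirable, yet challenging. The database industry and the research community have both put great effort into this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple heterogeneous databases. Examples include IBM's Data Joiner and Informix's DataBlade. When a query is posed to a client site, a metadata dictionary first translates it into queries appropriate for the individual heterogeneous sites. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer. This query-driven approach requires complex information filtering and integration, and it competes for resources with processing at the local data sources. It is inefficient, and it is expensive for frequent queries, especially those requiring aggregation.

Data warehousing offers an interesting alternative to the traditional approach to heterogeneous database integration. It uses an update-driven approach rather than a query-driven one: information from multiple heterogeneous sources is integrated in advance and stored in a data warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system, because data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantically consistent data store. Query processing in the data warehouse does not interfere with processing at the local sources. In addition, data warehouses store and integrate historical information and support complex queries. As a result, data warehousing has become very popular in industry.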
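The update-driven approach can be sketched in a few lines of Python: two sources with different field names, encodings, and units are cleaned and integrated ahead of time, and later queries run against the integrated store only. The source layouts (`CRM_ROWS`, `FLAT_FILE`), the `GENDER_CODES` mapping, and all function names are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical source 1: rows exported from a relational system,
# gender coded as 'M'/'F', amounts in dollars.
CRM_ROWS = [
    {"cust_id": "1", "gender": "M", "total_usd": "120.50"},
    {"cust_id": "2", "gender": "F", "total_usd": "80.00"},
]

# Hypothetical source 2: a flat file with other names, codes, and units.
FLAT_FILE = "customer,sex,amount_cents\n3,male,4500\n4,female,2000\n"

# One agreed encoding for the warehouse.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def clean_crm(rows):
    """Map the first source's names and units onto the warehouse schema."""
    for row in rows:
        yield {
            "customer_id": int(row["cust_id"]),
            "gender": GENDER_CODES[row["gender"].lower()],
            "amount_cents": round(float(row["total_usd"]) * 100),
            "source": "crm",
        }


def clean_flat_file(text):
    """Map the flat file's names and codes onto the same schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(row["customer"]),
            "gender": GENDER_CODES[row["sex"].lower()],
            "amount_cents": int(row["amount_cents"]),
            "source": "flat_file",
        }


def build_warehouse(crm_rows, flat_file_text):
    """Integrate both sources in advance (the update-driven step)."""
    return list(clean_crm(crm_rows)) + list(clean_flat_file(flat_file_text))


def spend_by_gender(warehouse):
    """Answer a query from the integrated store, without touching the sources."""
    totals = defaultdict(int)
    for row in warehouse:
        totals[row["gender"]] += row["amount_cents"]
    return dict(totals)
```

Once `build_warehouse` has run, `spend_by_gender` never touches the original sources. A query-driven design would instead translate each query and send it to every source at query time.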

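1. Differences between operational database systems and data warehouses

Since most people are familiar with commercial relational database systems, comparing a data warehouse with them makes it easy to understand what a data warehouse is.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of an organization's day-to-day operations, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in data analysis and decision making. Such systems can organize and present data in various formats to meet the diverse needs of different users. These systems are called on-line analytical processing (OLAP) systems.

The major differences between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that are typically too detailed to be used easily for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use for decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship data model and an application-oriented database design. An OLAP system adopts a star or snowflake model and a subject-oriented database design.

(4) View: An OLTP system focuses on the current data of an enterprise or department, without referring to historical data or data in other organizations. In contrast, an OLAP system often spans multiple versions of a database schema because of the organization's evolution. OLAP systems also handle data from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: Access to an OLTP system consists mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. Accesses to OLAP systems, however, are mostly read-only, although many may be complex queries.

Other features that distinguish OLTP and OLAP systems include database size, frequency of operations, and performance metrics.

2. But why have a separate data warehouse?

“Since operational databases store huge amounts of data”, you observe, “why not perform on-line analytical processing directly on such databases instead of spending a great deal of time and resources to build a separate data warehouse?”

A major reason for this separation is to promote the high performance of both systems. An operational database is designed and tuned for known tasks and workloads, such as indexing and hashing on primary keys, searching for particular records, and optimizing “canned” queries. Data warehouse queries, on the other hand, are usually complex. They involve computation over large groups of data at summarized levels, and some need special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

In addition, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are needed to ensure the consistency and robustness of transactions. An OLAP query usually needs read-only access to data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied to such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems.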
