版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡介
1、1,Database Systems(資料庫系統(tǒng)),November 7, 2005Lecture #7By Hao-hua Chu (朱浩華),2,Announcement,Assignment #2 solution will be posted on the course homepage.Assignment #3 is due on Thur (11/10) outside TA’s office.Do not ac
2、cept late assignments.Assignment solution will be posted on the course homepage on Friday (11/11).Practicum assignment #1 is available on course homepage.Midterm next MondayChapters 1,2 (except 2.7),3, 4.1~4.2, 5,8,9
3、,3,,,Storing Data: Disks and Files,Chapter 9,4,,,Disks and Files,DBMS stores information on (“hard”) disks.This has major performance implications for DB system design!READ: transfer data from disk to main memory (RAM)
4、.WRITE: transfer data from RAM to disk.Both are high-cost operations, relative to in-memory operations, so must be planned carefully!,5,,,Why Not Store Everything in Main Memory?,Costs too much. $100 for 1G of SDRAM$
5、100 for 250 GB of HD (cost x250) $40 for 50 GB of tapes. (cost same as HD) -> “Is Tape for backup dead?”Main memory is volatile. We want data to be saved between runs. Typical storage hierarchy:Main memory (R
6、AM) for currently used data.Disk for the main database (secondary storage).Tapes for archiving older versions of the data (backup storage) or just disk-to-disk backup.,6,,,Disks,Secondary storage device of choice. Mai
7、n advantage over tapes: random access vs. sequential.Tapes are for data backup, not for operational data.Access the last byte in a tape requires winding through the entire tape.Data is stored and retrieved in units c
8、alled disk blocks or pages.Unlike RAM, time to retrieve a disk page varies depending upon location on disk. Therefore, relative placement of pages on disk has major impact on DBMS performance!,7,,,Components of a Disk
9、,The platters spin.The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder.Only one head reads/writes at any one time.Block size is a multiple of secto
10、r size (which is fixed).,,8,9,,,Accessing a Disk Page,Time to access (read/write) a disk block, called access time, is a sum of:seek time (moving arm to position disk head on right track)rotational delay (waiting for b
11、lock to rotate under head)transfer time (actually moving data to/from disk surface)Seek time and rotational delay (mechanical parts) dominate the access timeSeek time varies from about 1 to 20msec (avg 10msec)Rotati
12、onal delay varies from 0 to 8msec (avg. 4msec)Transfer rate is about 100MBps (0.025msec per 4KB page)Key to lower I/O cost: reduce seek/rotation delays!If two pages of records are accessed together frequently, put the
13、m close together on disk.,10,11,,,Arranging Pages on Disk,Next block concept (measure the closeness of blocks)(1) blocks on same track (no movement of arm), followed by(2) blocks on same cylinder (switch head, but almo
14、st no movement of arm), followed by(3) blocks on adjacent cylinder (little movement of arm)Blocks in a file should be arranged sequentially on disk (by `next’), to minimize seek and rotational delay.For a sequential s
15、can, pre-fetching several pages at a time is a big win!,12,,,,,,,,,,,Platters,,,Spindle,,Disk head,,,Sector,,1,2,3,Next Block Concept,13,RAID,RAID = Redundant Arrays of Independent (Inexpensive) DisksDisk Array: Arrange
16、ment of several disks that gives abstraction of a single, large disk.Goals: Increase performance and reliability.Say you have D disks & each I/O request wants D blocksHow to improve the performance (data transfer
17、rate)?How to improve the performance (request service rate)?How to improve reliability (in case of disk failure)?,14,Two main techniques in RAID,Data striping improves performance.Data (e.g., in the same time file) is
18、 partitioned across multiple HDs; size of a partition is called the striping unit. Performance gain is from reading/writing multiple HDs at the same time.Redundancy improves reliability. Data striping lowers reliabili
19、ty: More disks → more failures.Store redundant information on different disks. When a disk fails, you can reconstruct data from other disks.,15,RAID Levels,Level 0: No redundancy (only data striping)Level 1: Mirrored
20、(two identical copies)Level 0+1: Striping and Mirroring(Level 2: Error-Correcting Code)Level 3: Bit-Interleaved ParityLevel 4: Block-Interleaved ParityLevel 5: Block-Interleaved Distributed Parity(Level 6: Error-Co
21、rrecting Code)More Levels (01-10, 03/30, …),16,RAID Level 0,Strip data across all drives (minimum 2 drives)Sequential blocks of data (in the same file) are written across multiple disks in stripes.Two performance crit
22、erions:Data transfer rate: net transfer rate for a single (large) filerequest service rate: rate at which multiple requests (from different files) can be serviced,17,RAID Level 0,Improve data transfer rate:Read 10 blo
23、cks (1~10) takes only 2-block access time (worse of 5 disks).Theoretical speedup over single disk = N (number of disks)Improve request service rate:File 1 occupies blocks 1 and file 2 occupies block 2. Service two re
24、quests (two files) at the same time.Theoretical speedup over single disk = N.,18,RAID Level 0,Poor reliability:Mean Time To Failure (MTTF) of one disk = 50K hours (5.7 years).MTTF of a disk array of 100 disks is 50K/1
25、00 = 500 hours (21 days)! MTTF decreases linearly with the number of disks.No space redundancy No overhead,19,Mirrored (RAID Level 1),Redundancy by duplicating data on different disks:Mirror means copy each file to
26、both disksSimple but expensive.Fault-tolerant to a single disk failureRecovery by copying data from the other disk to new disk.The other copy can continue to service requests (availability) during recovery.,20,Mirror
27、ed (RAID Level 1),Performance is not the objective, but reliability.Mirroring frequently used when availability is more important than storage efficiency.Data transfer rate:Write performance may be slower than single
28、disk, why?Worse of 2 disksRead performance can be faster than single disk, why?Consider reading block 1 from disk 0 and block 2 from disk 1 at the same time.Compare read performance to RAID Level 0?B
29、etter, but why?,21,Mirrored (RAID Level 1),Data reliability:Assume Mean-Time-To-Repair (MTTR) is 1 hour.Shorter with Hotswap HDs.MTTF of Mirrored 2-disks = 1 / (probability that 2 disks will fail within the same hour)
30、 = MTTR2/2 = (50K) 2/2 hours = many many years.Space redundancy overhead:50% overhead,22,Striping and Mirrors (RAID 0+1),Disk 5,Disk 6,Disk 7,Disk 8,Disk 9,23,Bit-Interleaved Parity (RAID Level 3),Fine-grained striping
31、 at the bit levelOne parity disk:Parity bit value = XOR across all data bit values If one disk fails, recover the lost data: XOR across all good data bit values and parity bit value,?,1,0,0,24,Bit-Interleaved Parity
32、(RAID Level 3),Performance:Transfer rate speedup?x32 of single diskRequest service rate improvement?Same as single disk (do one request at a time),Reliability:Can tolerate 1 disk failure.Space overhead:One parity
33、disk (1/33 overhead),25,Block-Interleaved Parity (RAID Level 4),Coarse-grained striping at the block levelOtherwise, it is similar to RAID 3If one disk fails, recovery the lost block:Read same block of all disks (incl
34、uding parity disk) to reconstruct the lost block.,26,Block-Interleaved Parity (RAID Level 4),Performance:If error, read/write of same block on all disks (worse-of-N on one block)If no error, write also needs to update
35、(read-n-write) the parity block. (no need to read other disks)Can compute new parity based on old data, new data, and old parityNew parity = (old data XOR new data) XOR old parity Result in bottleneck on the parity d
36、isk! (can do only one write at a time)How to remove this bottleneck?,27,Block-Interleaved Parity (RAID Level 4),Reliability:Can tolerate 1 disk failure.Space redundancy overhead:1 parity disk,28,Block-Interleaved Di
37、stributed-Parity (RAID Level 5),Remove the parity disk bottleneck in RAID L4 by distributing the parity uniformly over all of the disks. No single parity disk as bottleneck; otherwise, it is the same as RAID 4.Performa
38、nce improvement in write.You can write to multiple disks (in 2-disk pairs) in parallel.Reliability & space redundancy are the same as RAID L4.,,29,Structure of DBMS,Disk Space Managermanage space (pages) on disk.
39、Buffer Managermanage traffic between disk and main memory. (bring in pages from disk to main memory).File and Access MethodsOrganize records into pages and files.,Query Optimizationand Execution,Relational Operators,
40、Files and Access Methods,Buffer Manager,Disk Space Manager,,,,,,Applications,,Queries,30,,,Disk Space Manager,Lowest layer of DBMS software manages space on disk.Higher levels call upon this layer to:allocate/de-alloca
41、te a pageread/write a pageRequest for a sequence of pages should be satisfied by allocating the pages sequentially on disk! Support the “Next” block concept (reduce I/O cost when multiple sequential pages are request
42、ed at the same time).Higher levels (buffer manager) don’t need to know how this is done, or how free space is managed.,31,More on Disk Space Manager,Keep track of free (used) blocks:List of free blocks + the pointer to
43、 the first free blockBitmap with one bit for each disk block. Bit=1 (used), bit=0 (free)Bitmap approach can be used to identify contiguous areas on disk.,32,,,Buffer Manager,Typically, DBMS has more data than main mem
44、ory.Bring Data into main memory for DBMS to operate on it!Table of pairs is maintained.,,MAIN MEMORY,DISK,,disk page,,free frame,,Page Requests from Higher Levels,BUFFER POOL,,choice of frame dictatedby replacement p
45、olicy,,33,,,When a Page is Requested ...,If the requested page is not in pool (and no free frame):Choose an occupied frame for replacementPage replacement policy (minimize page miss rate)If the replaced frame is dirty
46、, write it to diskRead requested page into chosen framePin the page and return its address. For each frame, you maintainPin_count: number of outstanding requestsDirty: modified and need to written back to diskIf re
47、quests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time.,34,,,More on Buffer Manager,Requestor of page must unpin it (no longer need it), and indicate whether the page has been
48、modified: dirty bit is used for this.Page in pool may be requested many times, a pin count is used. A page is a candidate for replacement iff pin count = 0.,35,,,Buffer Replacement Policy,Frame is chosen for replacem
49、ent by a replacement policy:Least-recently-used (LRU): have LRU queue of frames with pin_count = 0Clock (approximate LRU with less overhead)Use an additional reference_bit per page; set to 1 when the frame is accessed
50、Clock hand moving from frame 0 to frame n.Reset reference_bit of recently accessed frames.Replace frame(s) with reference_bit = 0 & pin_count = 0.FIFO, MRU, Random, etc.Policy can have big impact on # of I/O’s;
51、depends on the access pattern.,36,Clock Algorithm Example,,disk page,,free frame,BUFFER POOL,,,,,,,Pin_count=0,1,1,1,0,0,,Clock Hand,37,Sequential Flooding,Use LRU + repeated sequential scans + (# buffer frames < # pa
52、ges in file)What many page I/O replacements?Example:#buffer frames = 2#pages in a file = 3 (P1, P2, P3)Repeated scan of fileEvery scan of the file result in reading every page of the file.,38,,,DBMS vs. OS File Sys
53、tem,OS does disk space & buffer mgmt: why not let OS manage these tasks?DBMS can better predict the page reference patterns & pre-fetch pages. Adjust replacement policy, and pre-fetch pages based on access patt
54、erns in typical DB operations.DBMS needs to be able to pin a page in memory and force a page to disk.Differences in OS support: portability issuesDBMS can maintain a virtual file that spans multiple disks.,39,,,Files
55、of Records,Higher levels of DBMS operate on records, and files of records.FILE: A collection of pages, each containing a collection of records. Must support:Insert/delete/modify record(s)Read a particular record (spec
56、ified using record id)Scan all records (possibly with some conditions on the records to be retrieved)To support record level operations, we must keep track of:Fields in a record: Record formatRecords on a page: Page
57、formatPages in a file: File format,40,L1,,L2,L3,L4,F1,F2,F3,F4,,,Record Formats (how to organize fields in a record): Fixed Length,Information about field types and offset same for all records in a file; stored in syste
58、m catalogs.Finding i-th field requires adding offsets to base address.,,Base address (B),,,Address = B+L1+L2,41,Fields Delimited by Special Symbols,,,Record Formats: Variable Length,Two alternative formats (# fields is
59、fixed):,Second alternative offers direct access to the i-th field, efficient storage of nulls (special don’t know value); small directory overhead.,,FieldCount,F1 F2 F3
60、 F4,F1 F2 F3 F4,Array of Field Offsets,42,,,Page Formats (How to store records in a page): Fixed Length Records,Record id = . They differ on how deletion (which creates a hole)
61、is handled. In first alternative, shift remaining records to fill hole => changes rid; may not be acceptable given external reference.,,,,,,,,,,,,,,,,,,,,,,Slot 1,Slot 2,Slot N,. . .,. . .,N,M,1,0,. . .,M ... 3
62、2 1,PACKED,UNPACKED, BITMAP,Slot 1,Slot 2,Slot N,,FreeSpace,,,,,Slot M,1,1,number of records,,numberof slots,,43,,,Page Formats: Variable Length Records,,Slot directory contains one slot per record.Each slot contai
63、ns (record offset, record length)Deletion is by setting the record offset to -1.Can move records on page without changing rid (change the record offset, but same slot number); so, attractive for fixed-length records.,P
64、age i,,,,,Rid = (i,N),Rid = (i,2),Rid = (i,1),,,,,,,Pointerto startof freespace,SLOT DIRECTORY,N . . . 2 1,,,,20,16,24,,,N,# slots,44,,,Unordered (Heap) Files,Simplest file structure conta
65、ins records in no particular order.As file grows and shrinks, disk pages are allocated and de-allocated.There are two ways to implement a heap fileDouble-Linked listsPage directory,45,,,Heap File (Doubly Linked Lists
66、),The header page id and Heap file name must be stored someplace.Each page contains 2 `pointers’ plus data.The problem is that inserting a variable size record requires walking through free space list to find a page wi
67、th enough space.,,,,,,,,HeaderPage,DataPage,DataPage,DataPage,DataPage,DataPage,DataPage,,,,,,,,,,,,,,,,,Pages withFree Space,Full Pages,,,46,,,Heap File (Page Directory),The directory is a collection of pages.E
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 眾賞文庫僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 14.資料庫database
- 資料庫系統(tǒng)
- 液壓支架系統(tǒng)資料庫
- 資料庫系統(tǒng)的架構(gòu)
- 數(shù)據(jù)庫系統(tǒng)基礎(chǔ)教程database systems(1)
- 資料庫授權(quán)內(nèi)容
- 資料庫系統(tǒng)理論與實(shí)務(wù)
- ch01資料庫系統(tǒng)理論
- 資料庫系統(tǒng)-健行科技大學(xué)
- 物資采購管理資料庫
- 環(huán)評資料庫_27991
- 二.資料庫系統(tǒng)建立與管理
- 高速資料庫引擎開發(fā)
- 物件導(dǎo)向式資料庫
- 第10章資料庫
- 資料庫-淡江大學(xué)
- 第1章資料庫系統(tǒng)的基礎(chǔ)觀念
- 藝術(shù)概論復(fù)習(xí)資料庫
- 電子書與資料庫
- 進(jìn)入總體經(jīng)濟(jì)資料庫
評論
0/150
提交評論