版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
Advanced Computer Architecture
Lecture 1: Introduction
Cheng Xu (程旭), February 21, 2011

Textbook and instructor
- Main textbook: Computer Architecture: A Quantitative Approach, 4th Edition (Oct. 2006), Hennessy and Patterson
- Lecturer: Cheng Xu
- Time and place: Mondays, 15:10-18:00, Room 102, Teaching Building 2
- http://mprc.pku.edu.cn

Teaching objectives of "Advanced Computer Architecture"
- Learn and master the design techniques, machine structures, technology factors, and evaluation methods that will determine the concrete form of computers in the twenty-first century.
- What do computer applications need?
- What functional support does the operating system need?
- Which features can an optimizing compiler exploit and implement?
- What kinds of machines can we build?
- What will future computers look like?
- Computer architecture researchers must have broad and deep professional knowledge!

The place of this course in the curriculum
- Prerequisite courses: computer fundamentals, digital logic, computer organization and architecture, operating systems, compiler technology
- Background topics: data structures, application basics, and C programming; memory management, scheduling, and concurrency; code generation and optimization; basic logic units and processor fundamentals
- How to implement it, the concrete details: knowing what!
- Analysis plus evaluation: knowing why!
- Advanced computer architecture
- Parallel computer architecture

Charles Babbage (1791-1871)
- Lucasian Professor of Mathematics, Cambridge University, 1827-1839
- Difference Engine, 1823
- Analytical Engine, 1833: the forerunner of the modern digital computer!
- Application: mathematical tables (astronomy), nautical tables (Navy)
- Background: any continuous function can be approximated by a polynomial (Weierstrass)
- Technology: mechanical (gears, Jacquard's loom, simple calculators)

Difference Engine: a machine to compute mathematical tables
- Weierstrass: any continuous function can be approximated by a polynomial.
- Any polynomial can be computed from difference tables.
- An example:
    $f(n) = n^2 + n + 41$
    $d_1(n) = f(n) - f(n-1) = 2n$
    $d_2(n) = d_1(n) - d_1(n-1) = 2$
    $f(n) = f(n-1) + d_1(n) = f(n-1) + (d_1(n-1) + 2)$
- All you need is an adder!

Babbage's Difference Engine 1 (1832)

Analytical Engine
- 1833: Babbage's paper was published; the engine was conceived during a hiatus in the development of the Difference Engine.
- Inspiration: Jacquard looms
  - Looms were controlled by punched cards.
  - The set of cards with fixed punched holes dictated the pattern of the weave => program.
  - The same set of cards could be used with different colored threads => numbers.
- 1871: Babbage dies; the machine remains unrealized.
- It is not clear whether the Analytical Engine could be built even today using only mechanical technology.

Babbage's Difference Engine 2 and Analytical Engine

Babbage Analytical Engine (1834)
- The Store: memory unit consisting of counter wheels
- The Mill: the arithmetic unit, capable of four operations; it used a pair of registers and produced results stored in another register in the Store
- Operation cards: specified one of the four operations
- Variable cards: specified the memory location to be used
- Output: printer or punch

Analytical Engine: the first conception of a general-purpose computer
- The Store, in which all variables to be operated upon, as well as all those quantities which have arisen from the results of the operations, are placed.
- The Mill, into which the quantities about to be operated upon are always brought.

The first programmer: Ada Byron, a.k.a. "Lady Lovelace" (1815-52)
- Ada's tutor was Babbage himself!

Alan Turing (1937)
While not using the practical technology of the era, Alan Turing developed the idea of a "Universal Machine" capable of executing any describable algorithm, which formed the basis for the concept of "computability". Perhaps more importantly, Turing's ideas differed from those of others who were solving arithmetic problems by introducing the concept of "symbol processing".
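To make "symbol processing" concrete, here is a minimal sketch of a one-tape Turing machine simulator. It is an illustration only and is not taken from the lecture: the names `run_turing_machine`, `INCREMENT_RULES`, and `BLANK`, and the choice of a binary-increment machine, are all assumptions made for this example. The machine never does arithmetic on numbers; it only reads a symbol, writes a symbol, and moves the head.

```python
BLANK = "_"  # hypothetical blank-cell marker chosen for this sketch

# Transition table: (state, symbol read) -> (symbol to write, head move, next state).
# This illustrative machine adds 1 to a binary number written on the tape.
INCREMENT_RULES = {
    ("start", "0"): ("0", +1, "start"),    # scan right to the end of the number
    ("start", "1"): ("1", +1, "start"),
    ("start", BLANK): (BLANK, -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),    # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("1", 0, "halt"),      # 0 + carry = 1, done
    ("carry", BLANK): ("1", 0, "halt"),    # ran off the left edge: new leading 1
}


def run_turing_machine(rules, tape, state="start", head=0, max_steps=1000):
    """Run the machine until it enters the 'halt' state and return the tape contents."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, BLANK)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, BLANK) for i in range(lo, hi + 1)).strip(BLANK)


if __name__ == "__main__":
    print(run_turing_machine(INCREMENT_RULES, "1011"))  # prints 1100
```

Swapping in a different rule table gives a different "program" without touching the simulator, which is the sense in which one machine can execute any algorithm that can be described this way.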
The first general-purpose electronic computer: ENIAC
- February 14, 1946
- J. Presper Eckert and John Mauchly, Moore School, University of Pennsylvania
- Size: 80 feet long, 8.5 feet high
- 18,000 vacuum tubes
- 5,000 additions per second
- The world's first general-purpose electronic computer
- Conditional jumps and programmability distinguished it from earlier machines.
- Used for computing artillery firing tables
- ENIAC: Electronic Numerical Integrator and Computer
- Figure: an accumulator (28 vacuum tubes)

ENIAC's application: ballistic calculations (World War II effort)
- angle = f(location, tail wind, cross wind, air density, temperature, weight of shell, propellant charge, ...)

ENIAC was NOT a "stored program" device
- For each problem, someone analyzed the arithmetic processing needed and prepared wiring diagrams for the computers (the human operators) to use when wiring the machine.
- The process was time-consuming and error-prone.
- Cleaning personnel often knocked cables out of place and just put them back somewhere.
- Figure: wiring the machine

Electronic Discrete Variable Automatic Computer (EDVAC)
- ENIAC's programming system was external.
  - Sequences of instructions were executed independently of the results of the calculation.
  - Human intervention was required to take instructions "out of order".
- Eckert, Mauchly, John von Neumann, and others designed EDVAC (1944) to solve this problem.
  - The solution was the stored-program computer => "program can be manipulated as data".
- The First Draft of a Report on the EDVAC was published in 1945, but carried only von Neumann's signature!
- In 1973 the court in Minneapolis attributed the honor of inventing the computer to John Atanasoff.

The von Neumann Machine (1946)
- Stored-program computer: the IAS (Institute for Advanced Study) computer
- Block diagram: main memory, arithmetic logic unit, program control unit, I/O equipment
- The stored-program idea: the instructions that make up a program can be placed in memory ahead of time, just like data, and the computer then fetches and executes them one by one. This idea leads naturally to branch instructions and to the notion of modifying the address part of an instruction, so that a sequence of instructions can be executed automatically, and meaningfully, many times.

The world's first full-scale, operational, stored-program computer: EDSAC
- EDSAC began running in 1949.
- Its accumulator-based structure and its instruction set design strongly influenced machine design for some time afterward.
- Maurice Wilkes, Cambridge University
- EDSAC: Electronic Delay Storage Automatic Calculator

Bell Labs
- 1940: Ohl develops the PN junction.
- 1945: Shockley's laboratory is established.
- 1947: Bardeen and Brattain create the point-contact transistor (U.S. Patent 2,524,035). Figure: diagram from the patent application.
- 1951: Shockley develops a junction transistor manufacturable in quantity (U.S. Patent 2,623,105). Figure: diagram from the patent application.

The Integrated Circuit
- 1959: Jack Kilby, working at TI, dreams up the idea of a monolithic "integrated circuit".
- Components were connected by hand-soldered wires and isolated by "shaping"; PN diodes were used as resistors (U.S. Patent 3,138,743). Figure: diagram from the patent application.

Integrated Circuits
- 1961: TI and Fairchild introduce the first logic ICs ($50 in quantity).
- 1962: RCA develops the first MOS transistor.
- Figures: RCA 16-transistor MOSFET IC; Fairchild bipolar RTL flip-flop

The Microprocessor
- 1971: Intel introduces the 4004.
- A general-purpose programmable computer instead of a custom chip for a Japanese calculator company

Microprocessor performance
Processor                   Clock      Performance
4004                        108 kHz    0.06 MIPS
8080                        2 MHz      0.64 MIPS
8088                        8 MHz      0.75 MIPS
Intel386™ SX CPU            33 MHz     2.9 MIPS
Intel486™ DX CPU            50 MHz     41 MIPS
Pentium® Processor          233 MHz    (not given)
Intel® Celeron® Processor   1.3 GHz    (not given)
Pentium® 4 Processor        3 GHz      (not given)

Sea Change in Chip Design
- Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm^2 chip
- RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm^2 chip
- A 125 mm^2 chip in 0.065 micron CMOS = 2312 copies of RISC II + FPU + Icache + Dcache
  - RISC II shrinks to ~0.02 mm^2 at 65 nm
- Processor is the new transistor?

Multicore
- Small number of cores, shared memory
- Some systems have multithreaded cores
- Trend to simplicity in cores (e.g. no branch prediction)
- Multiple threads share resources (L2 cache, maybe FP units)
- Deployment in the embedded market as well as other sectors
- Examples: IBM Power4 (2001); Sun T-1 (Niagara) (2005); AMD true quad-core die (2007); Cell from IBM and Sony

Intel 80-core chip (2007)
- 80 processing cores
- 1 teraflop
- 10 billion operations per watt
- Clock frequency: 3.1 GHz
- Area: 300 mm² (13.75 mm × 22 mm)
- Each CPU core is connected one-to-one to memory, each with 256 MBps of memory bandwidth
- 32 MB of on-chip static RAM
- Total memory bandwidth of the single chip reaches 1 TB/s

IBM POWER7 (2010)

A brief history of CPU technology
- Charles Babbage's engines (1823)
- Turing machine (1937)
- ENIAC (1946)
- EDSAC (1949)
- CPU
- Microprocessor
- General-purpose microprocessor vs. special CPU for HPC
- Multicore, manycore, or ...

Evolution of Internet access methods
- Figure: devices used to access the Internet. Time axis: 1985, 1995, 2005. Volumes shown: 9 million, 60 million, and 250 million units.
- Categories: standalone PC (email); connected PC (WWW); information appliances, including hand-helds, wireless devices and cellphones (cellphones & phone access), game consoles, and set-top boxes and network computers (set-tops & NCs)
- Sources: Network Computer Inc. & IDC
Digital interfaces between cyberspace and people and the rest of the physical world
- Figure labels: platform, content; local-area and home networks; public and private wide-area networks; the Internet: a network of networks
- Computing, communications, digitization
- Cyberspace: an upward spiral

The post-PC era (the PC+ era)
Two technologies driving the post-PC era:
- 1) Mobile consumer devices, e.g. next-generation PDAs, next-generation mobile communication devices, wearable computers
- 2) Infrastructure to support these devices, e.g. next-generation "Big Fat Web Servers" and database servers

The new wave: microprocessors will be everywhere
- Source: Richard Newton

Embedded microprocessors
- What? A programmable processor whose programming interface is not accessible to the end user of the product. The only user interaction is through the actual application.
- Examples:
  - Sharp PDAs are encapsulated products with fixed functionality.
  - 3COM Palm Pilots were originally intended as embedded systems. Opening up the programmer's interface turned them into more generic computer systems.

Some interesting numbers
- The Intel 4004 was intended for an embedded application (a calculator).
- Of today's microprocessors, 95% go into embedded applications.
  - SH3/4 (Hitachi): best-selling RISC microprocessor (1997)
  - ARM: best-selling embedded microprocessor (2001-)
- 50% of microprocessor revenue stems from embedded systems.
- Often focused on a particular application area:
  - Microcontrollers
  - DSPs
  - Media processors
  - Graphics processors
  - Network and communication processors

Different evaluation criteria
- Flexibility
- Power
- Cost
- Performance as a functionality constraint ("Just-in-Time Computing")
- Components of cost:
  - Area of die / yield
  - Code density (memory is the major part of die size)
  - Packaging
  - Design effort
  - Programming cost
  - Time-to-market
  - Reusability

VLSI process technology is advancing faster (gate length)
- Figure: chip fabrication flow

Cost of integrated circuits
- If $\alpha = 3$, die cost grows roughly with the fourth power of die size.
- Packaging cost depends on the number of pins and on heat-dissipation requirements.

Chip          Die cost   Pins   Package type   Package cost   Test & assembly   Total
386DX         $4         132    QFP            $1             $4                $9
486DX2        $12        168    PGA            $11            $12               $35
PowerPC 601   $53        304    QFP            $3             $21               $77
HP PA 7100    $73        504    PGA            $35            $16               $124
DEC Alpha     $149       431    PGA            $30            $23               $202
SuperSPARC    $272       293    PGA            $20            $34               $326
Pentium       $417       273    PGA            $19            $37               $473

Other costs
Cost/performance: what is the relationship of cost to price?
- Component costs
- Direct costs (add 25% to 40%): recurring costs such as labor, purchasing, scrap, warranty
- Gross margin (add 82% to 186%): nonrecurring costs such as R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
- Average discount to get list price (add 33% to 66%): volume discounts and/or retailer markup

iPad: Apple's profit comes from margins in hardware
- Figure: breakdown of the $499 price. Labels: cost of materials and manufacturing; cost of sales (approx. 30%); average industry margin (approx. 30%); + Apple margin; margin: 40%. Values shown: $230, $70, $90, $110.
- Source: iSuppli

Power density keeps getting worse
- Surpassed hot-plate power density at 0.5 µm
- Not too long to reach nuclear-reactor levels
- Figure: power density (Watts/cm^2, log scale from 1 to 1000) versus process generation (1.5 µm, 1 µm, 0.7 µm, 0.5 µm, 0.35 µm, 0.25 µm, 0.18 µm, 0.13 µm, 0.1 µm, 0.07 µm) for the i386, i486, Pentium, Pentium Pro, Pentium II, and Pentium III processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface

Optimizing energy consumption
- High-performance general-purpose microprocessors (e.g. Pentiums): 10-100 Watts, 100-1000 MIPS = 0.01 MIPS/mW
- Energy-efficient general-purpose microprocessors (e.g. StrongARM): 0.5 Watts, 160 MIPS = 0.3 MIPS/mW
- Energy-efficient special-purpose processors (e.g. MPEG2): 100 MOPS/mW
- Figure: switching energy

MIMD
- Multiprocessors and multicomputer clusters

Nine Computer Price Tiers (2000)
- $1: embeddables, e.g. greeting card
- $10: wrist watch & wallet computers
- $100: pocket/palm computers
- $1,000: portable computers
- $10,000: personal computers (desktop)
- $100,000: departmental computers (closet)
- $1,000,000: site computers (glass house)
- $10,000,000: regional computers (glass castle)
- $100,000,000: national centers
- Super server: costs more than $100,000
- "Mainframe": costs more than $1 million; an array of processors, disks, tapes, comm ports

What is Computer Architecture?
- Figure labels: Application, Physics (but there are exceptions, e.g. magnetic compass)
- In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.

Abstraction Layers in Modern Systems (top to bottom)
- Application
- Algorithm
- Programming Language
- Operating System/Virtual Machine
- Instruction Set Architecture (ISA)
- Microarchitecture
- Gates/Register-Transfer Level (RTL)
- Circuits
- Devices
- Physics

The End of the Uniprocessor Era
- Single biggest change in the history of computing systems

Conventional Wisdom in Computer Architecture
- Old Conventional Wisdom (CW): power is free, transistors are expensive
- New CW: "Power wall": power is expensive, transistors are free (can put more on a chip than can afford to turn on)
- Old CW: sufficient increase in Instruction-Level Parallelism (ILP) via compilers and innovation (out-of-order, speculation, VLIW, ...)
- New CW: "ILP wall": law of diminishing returns on more hardware for ILP
- Old CW: multiplies are slow, memory access is fast
- New CW: "Memory wall": memory is slow, multiplies are fast (200 clock cycles to DRAM memory, 4 clocks for multiply)
- Old CW: uniprocessor performance 2X / 1.5 yrs
- New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
  - Uniprocessor performance now 2X / 5(?) yrs
  - => Sea change in chip design: multiple "cores" (2X processors per chip / ~2 years)
  - More, simpler processors are more power efficient

Uniprocessor Performance
- VAX: 25%/year, 1978 to 1986
- RISC + x86: 52%/year, 1986 to 2002
- RISC + x86: ??%/year, 2002 to present
- From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006

Problems with Sea Change
- Algorithms, programming languages, compilers, operating systems, architectures, libraries, ... are not ready to supply Thread-Level Parallelism or Data-Level Parallelism for 1000 CPUs / chip.
- Architectures are not ready for 1000 CPUs / chip.
  - Unlike Instruction-Level Parallelism, this cannot be solved by computer architects and compiler writers alone, but it also cannot be solved without the participation of architects.
- The 4th edition of the textbook "Computer Architecture: A Quantitative Approach" explores the shift from Instruction-Level Parallelism to Thread-Level Parallelism / Data-Level Parallelism.

Instruction Set Architecture: Critical Interface
- Figure: software / instruction set / hardware
- Properties of a good abstraction:
  - Lasts through many generations (portability)
  - Used in many different ways (generality)
  - Provides convenient functionality to higher levels
  - Permits an efficient implementation at lower levels

Instruction Set Architecture
"... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964)
- Organization of programmable storage
- Data types & data structures: encodings & representations
- Instruction formats
- Instruction (or operation code) set
- Modes of addressing and accessing data items and instructions
- Exceptional conditions

Example: MIPS
- Programmable storage:
  - 2^32 x bytes
  - 31 x 32-bit GPRs (R0 = 0)
  - 32 x 32-bit FP regs (paired DP)
  - HI, LO, PC
- Register diagram: r0 (= 0), r1, ..., r31; PC, lo, hi
- Data types? Format? Addressing modes?
- Arithmetic logical: Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU, AddI, AddIU, SLTI, SLTIU
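The MIPS programmable storage and a few of the arithmetic/logical instructions listed above can be sketched as a small model. This is an illustration under stated assumptions, not a MIPS simulator from the lecture: the class `RegisterFile` and the helper functions `add`, `addu`, `slt`, `sltu`, and `to_signed` are hypothetical names, and only the general-purpose registers are modeled (no memory, FP registers, HI/LO, or PC). It shows three things: register 0 always reads as zero, Add traps on signed overflow while AddU wraps, and SLT and SLTU compare the same bit patterns as signed and as unsigned values.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit patterns


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


class RegisterFile:
    """Hypothetical model of the 32 GPR slots; r0 is hardwired to zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        return self._regs[n]

    def write(self, n, value):
        if n != 0:  # writes to r0 are silently discarded
            self._regs[n] = value & MASK32


def addu(rf, rd, rs, rt):
    """AddU: add modulo 2**32, never raises."""
    rf.write(rd, rf.read(rs) + rf.read(rt))


def add(rf, rd, rs, rt):
    """Add: like AddU, but signed overflow raises (models the overflow trap)."""
    total = to_signed(rf.read(rs)) + to_signed(rf.read(rt))
    if not -(1 << 31) <= total < (1 << 31):
        raise OverflowError("signed 32-bit overflow")
    rf.write(rd, total)


def slt(rf, rd, rs, rt):
    """SLT: set rd to 1 if rs < rt as signed integers, else 0."""
    rf.write(rd, int(to_signed(rf.read(rs)) < to_signed(rf.read(rt))))


def sltu(rf, rd, rs, rt):
    """SLTU: set rd to 1 if rs < rt as unsigned integers, else 0."""
    rf.write(rd, int(rf.read(rs) < rf.read(rt)))


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(1, 0xFFFFFFFF)  # -1 when read as signed, 2**32 - 1 as unsigned
    rf.write(2, 1)
    slt(rf, 3, 1, 2)
    sltu(rf, 4, 1, 2)
    print(rf.read(3), rf.read(4))  # prints 1 0
```

The same bit pattern in r1 compares as less than 1 under SLT and greater than 1 under SLTU, which is why the instruction set carries both variants.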