
FlexSC: Flexible System Call Scheduling with Exception-Less System Calls

Livio Soares, University of Toronto
Michael Stumm, University of Toronto

Abstract

For the past 30+ years, system calls have been the de facto interface used by applications to request services from the operating system kernel. System calls have almost universally been implemented as a synchronous mechanism, where a special processor instruction is used to yield user-space execution to the kernel. In the first part of this paper, we evaluate the performance impact of traditional synchronous system calls on system-intensive workloads. We show that synchronous system calls negatively affect performance in a significant way, primarily because of pipeline flushing and pollution of key processor structures (e.g., TLB, data and instruction caches, etc.).

We propose a new mechanism for applications to request services from the operating system kernel: exception-less system calls. They improve processor efficiency by enabling flexibility in the scheduling of operating system work, which in turn can lead to significantly increased temporal and spatial locality of execution in both user and kernel space, thus reducing pollution effects on processor structures. Exception-less system calls are particularly effective on multicore processors. They primarily target highly threaded server applications, such as Web servers and database servers.

We present FlexSC, an implementation of exception-less system calls in the Linux kernel, and an accompanying user-mode thread package (FlexSC-Threads), binary compatible with POSIX threads, that translates legacy synchronous system calls into exception-less ones transparently to applications. We show how FlexSC improves the performance of Apache by up to 116%, MySQL by up to 40%, and BIND by up to 105%, while requiring no modifications to the applications.

1 Introduction

[Figure 1: User-mode instructions per cycles (IPC) of Xalan (from SPEC CPU 2006) in response to a system call exception event, as measured on an Intel Core i7 processor. The plot shows user-mode IPC (higher is faster) versus time in cycles, marking the syscall exception and the lost performance in cycles.]

System calls are the de facto interface to the operating system kernel. They are used to request services offered by, and implemented in the operating system kernel. While different operating systems offer a variety of different services, the basic underlying system call mechanism has been common on all commercial multiprocessed operating systems for decades. System call invocation typically involves writing arguments to appropriate registers and then issuing a special machine instruction that raises a synchronous exception, immediately yielding user-mode execution to a kernel-mode exception handler. Two important properties of the traditional system call design are that: (1) a processor exception is used to communicate with the kernel, and (2) a synchronous execution model is enforced, as the application expects the completion of the system call before resuming user-mode execution. Both of these effects result in performance inefficiencies on modern processors.
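As a concrete illustration of this invocation sequence, the following is a minimal sketch of the conventional synchronous path on x86-64 Linux (ours, not code from the paper): the arguments of a write are placed in the registers dictated by the kernel's syscall ABI, and the syscall instruction transfers control to the kernel, with the caller blocked until the result comes back.

    /* Minimal sketch of a conventional synchronous system call on x86-64 Linux:
     * arguments go into registers, the syscall instruction raises the mode
     * switch into the kernel, and the caller resumes only once the result is
     * back in %rax (equivalent to glibc's syscall(SYS_write, ...)). */
    #include <stdio.h>
    #include <sys/syscall.h>

    static long raw_write(int fd, const void *buf, unsigned long len)
    {
        long ret;
        /* Kernel ABI: %rax = syscall number, %rdi/%rsi/%rdx = first three
         * arguments; the kernel clobbers %rcx and %r11. */
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory");
        return ret;  /* negative values encode -errno */
    }

    int main(void)
    {
        const char msg[] = "hello via a synchronous system call\n";
        long n = raw_write(1, msg, sizeof msg - 1);
        printf("syscall returned %ld\n", n);
        return 0;
    }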

The increasing number of available transistors on a chip (Moore's Law) has, over the years, led to increasingly sophisticated processor structures, such as superscalar and out-of-order execution units, multi-level caches, and branch predictors. These processor structures have, in turn, led to a large increase in the performance potential of software, but at the same time there is a widening gap between the performance of efficient software and the performance of inefficient software, primarily due to the increasing disparity of accessing different processor resources (e.g., registers vs. caches vs. memory). Server and system-intensive workloads, which are of particular [...]

Syscall             Instructions  Cycles  IPC   i-cache  d-cache  L2    L3    d-TLB
stat                4972          13585   0.37  32       186      660   2559  21
pread               3739          12300   0.30  32       294      679   2160  20
pwrite              5689          31285   0.18  50       373      985   3160  44
open+close          6631          19162   0.34  47       240      900   3534  28
mmap+munmap         8977          19079   0.47  41       233      869   3913  7
open+write+close    9921          32815   0.30  78       481      1462  5105  49

Table 1: System call footprint of different processor structures. For the processor structures (caches and TLB), the numbers represent the number of entries evicted; the processor's cache lines are 64 bytes. i-cache and d-cache refer to the instruction and data sections of the L1 cache, respectively. The d-TLB represents the data portion of the TLB.

[...] kernel stack, changing the protection domain, and redirecting execution to the registered exception handler. Subsequently, a return from exception is necessary to resume execution in user-mode. We measured the mode switch time by implementing a new system call, gettsc, that obtains the time stamp counter of the processor and immediately returns to user-mode. We created a simple benchmark that invoked gettsc 1 billion times, recording the time-stamp before and after each call. The differences between the three time-stamps identify the number of cycles necessary to enter and leave the operating system kernel, namely 79 cycles and 71 cycles, respectively. The total round-trip time for the gettsc system call is modest at 150 cycles, being less than the latency of a memory access that misses the processor caches (250 cycles on our machine). [3]

[3] For all experiments presented in this paper, user-mode applications execute in 64-bit mode and, when using synchronous system calls, use the "syscall" x86_64 instruction, which is currently the default in Linux.
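The idea behind this measurement can be sketched in user space (our sketch, not the paper's benchmark, since gettsc is a custom kernel addition): read the time stamp counter immediately before and after a cheap existing system call and average the round trip. Without the in-kernel timestamp that gettsc returns, only the combined entry-plus-exit cost is visible.

    /* Sketch of the round-trip measurement (not the paper's code). The paper's
     * gettsc syscall also samples the TSC inside the kernel, splitting the
     * round trip into entry (79 cycles) and exit (71 cycles); here a stock
     * lightweight syscall stands in, so only the total is observed. */
    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <x86intrin.h>              /* __rdtsc() */

    int main(void)
    {
        const long iters = 10 * 1000 * 1000;
        uint64_t total = 0;

        for (long i = 0; i < iters; i++) {
            uint64_t before = __rdtsc();
            syscall(SYS_getppid);       /* cheap syscall standing in for gettsc */
            uint64_t after = __rdtsc();
            total += after - before;
        }
        printf("average syscall round trip: %llu cycles\n",
               (unsigned long long)(total / iters));
        return 0;
    }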

2.2 System Call Footprint

The mode switch time, however, is only part of the cost of a system call. During kernel-mode execution, processor structures including the L1 data and instruction caches, translation look-aside buffers (TLB), branch prediction tables, prefetch buffers, as well as larger unified caches (L2 and L3), are populated with kernel-specific state. The replacement of user-mode processor state by kernel-mode processor state is referred to as the processor state pollution caused by a system call.

To quantify the pollution caused by system calls, we used the Core i7 hardware performance counters (HPC). We ran a high instructions-per-cycle (IPC) workload, Xalan, from the SPEC CPU 2006 benchmark suite, which is known to invoke few system calls. We configured an HPC to trigger infrequently (once every 10 million user-mode instructions) so that the processor structures would be dominated by application state. We then set up the HPC exception handler to execute specific system calls, while measuring the replacement of application state in the processor structures caused by kernel execution (but not by the performance counter exception handler itself).
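A rough user-space approximation of this measurement can be sketched with perf_event_open(2) (our sketch, not the paper's HPC-interrupt methodology, and with our own choice of event and test call): program a hardware cache-miss counter, enable it around a single system call, and read the delta. Counting kernel-side events usually requires sufficient privileges or a permissive perf_event_paranoid setting.

    /* Sketch: count L1 d-cache read misses incurred across one system call
     * using perf_event_open(2). This only approximates the paper's method,
     * which drives the system call from an HPC overflow handler and reports
     * evicted entries per processor structure. */
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdint.h>

    static int open_l1d_miss_counter(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof attr);
        attr.size = sizeof attr;
        attr.type = PERF_TYPE_HW_CACHE;
        attr.config = PERF_COUNT_HW_CACHE_L1D |
                      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
        attr.disabled = 1;              /* enabled explicitly around the call */
        /* pid = 0 (this thread), cpu = -1, group_fd = -1, flags = 0;
         * kernel-side events are counted since exclude_kernel stays 0 */
        return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    }

    int main(void)
    {
        int cfd = open_l1d_miss_counter();
        if (cfd < 0) { perror("perf_event_open"); return 1; }

        int fd = open("/etc/hostname", O_RDONLY);
        char buf[64];
        uint64_t misses = 0;

        ioctl(cfd, PERF_EVENT_IOC_RESET, 0);
        ioctl(cfd, PERF_EVENT_IOC_ENABLE, 0);
        pread(fd, buf, sizeof buf, 0);  /* the system call being measured */
        ioctl(cfd, PERF_EVENT_IOC_DISABLE, 0);

        if (read(cfd, &misses, sizeof misses) == (ssize_t)sizeof misses)
            printf("L1d read misses around pread(): %llu\n",
                   (unsigned long long)misses);
        return 0;
    }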

Table 1 shows the footprint on several processor structures for three different system calls and three system call combinations. The data shows that, even though the number of i-cache lines replaced is modest (between 2 and 5 KB), the number of d-cache lines replaced is significant. Given that the size of the d-cache on this processor is 32 KB, we see that the system calls listed pollute at least half of the d-cache, and almost all of the d-cache in the "open+write+close" case. The 64-entry first-level d-TLB is also significantly polluted by most system calls. Finally, it is interesting to note that the system call impact on the L2 and L3 caches is larger than on the L1 caches, primarily because the L2 and L3 caches use more aggressive prefetching.
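The d-cache fraction follows directly from the evicted-entry counts in Table 1 and the 64-byte line size; the short sketch below (our arithmetic, not the paper's) works it out for two of the rows, giving roughly 73% of the 32 KB d-cache for pwrite and 94% for open+write+close.

    /* Convert Table 1's d-cache evictions into bytes and into the fraction
     * of the 32 KB L1 d-cache they displace, assuming 64-byte cache lines. */
    #include <stdio.h>

    int main(void)
    {
        const struct { const char *call; int dcache_lines; } rows[] = {
            { "pwrite",           373 },
            { "open+write+close", 481 },
        };
        for (unsigned i = 0; i < sizeof rows / sizeof rows[0]; i++) {
            int bytes = rows[i].dcache_lines * 64;
            printf("%-18s %3d lines = %5d bytes = %2.0f%% of a 32 KB d-cache\n",
                   rows[i].call, rows[i].dcache_lines, bytes,
                   100.0 * bytes / (32 * 1024));
        }
        return 0;
    }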

2.3 System Call Impact on User IPC

Ultimately, the most important measure of the real cost of system calls is the performance impact on the application. To quantify this, we executed an experiment similar to the one described in the previous subsection. However, instead of measuring kernel-mode events, we only measured user-mode instructions per cycle (IPC), ignoring all kernel execution. Ideally, user-mode IPC should not decrease as a result of invoking system calls, since the cycles and instructions executed as part of the system call are ignored in our measurements. In practice, however, user-mode IPC is affected by two sources of overhead:

Direct: The processor exception associated with the system call instruction that flushes the processor pipeline.

Indirect: System call pollution on the processor structures, as quantified in Table 1.

Figures 2 and 3 show the degradation in user-mode IPC when running Xalan (from SPEC CPU 2006) and SPEC JBB, respectively, given different frequencies of pwrite calls. These benchmarks were chosen since they have been created to avoid significant use of system services, and should spend only 1-2% of their time executing in kernel-mode. The graphs show that different workloads can have different sensitivities to system call pollution. Xalan has a baseline user-mode IPC of 1.46, but its IPC degrades by up to 65% when executing a pwrite [...]
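The shape of this experiment can be sketched with a stand-alone microbenchmark (ours; the paper instead instruments the benchmarks themselves to issue a pwrite every N user-mode instructions): a compute loop issues a pwrite once every period iterations, so a smaller period means a higher system call frequency and a correspondingly larger expected drop in the loop's user-mode IPC.

    /* Sketch of an interference microbenchmark in the spirit of Figures 2/3:
     * user-mode work interleaved with pwrite calls at a configurable rate.
     * User-mode IPC of the loop can then be observed with an external profiler. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        long period = argc > 1 ? atol(argv[1]) : 100000;  /* iterations per pwrite */
        int fd = open("/tmp/flexsc-demo", O_CREAT | O_WRONLY, 0644);
        char byte = 0;
        volatile unsigned long acc = 0;   /* keeps the work from being optimized away */

        for (long iter = 0; iter < 200 * period; iter++) {
            acc += (unsigned long)iter * 2654435761UL;    /* stand-in user-mode work */
            if (iter % period == 0)
                pwrite(fd, &byte, 1, 0);                  /* the interfering system call */
        }
        printf("done, acc=%lu\n", (unsigned long)acc);
        close(fd);
        return 0;
    }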
