2,950 English words and 16,000 English characters; 4,900 Chinese characters.
Source: Errattahi R, Hannani A E, Ouahmane H. Automatic Speech Recognition Errors Detection and Correction: A Review[J]. Procedia Computer Science, 2018, 128: 32-37.

Automatic Speech Recognition Errors Detection and Correction: A Review

Rahhal Errattahi, Asmaa El Hannani, Hassan Ouahmane

Abstract

Even though Automatic Speech Recognition (ASR) has matured to the point of commercial applications, high error rates in some speech recognition domains remain one of the main factors impeding the wide adoption of speech technology, especially for large vocabulary continuous speech recognition applications. The persistent presence of ASR errors has intensified the need to find alternative techniques to automatically detect and correct such errors. Correcting transcription errors is crucial not only to improve speech recognition accuracy, but also to prevent the errors from propagating to subsequent language processing modules such as machine translation. In this paper, the basic principles of ASR evaluation are first summarized, and then the state of current research on ASR error detection and correction is reviewed. We focus on emerging techniques that use the word error rate metric.

Keywords: Automatic Speech Recognition; ASR Error Detection; ASR Error Correction; ASR evaluation

1. Introduction

Automatic Speech Recognition (ASR) systems aim at converting a speech signal into a sequence of words, either for text-based communication or for device control. The purpose of evaluating ASR systems is to simulate human judgement of system performance, in order to measure their usefulness and assess the remaining difficulties, especially when comparing systems. The standard metric of ASR evaluation is the Word Error Rate, defined as the proportion of word errors to words processed.

ASR has matured to the point of commercial applications, providing transcriptions with a level of performance acceptable for integration into many applications. In general, ASR systems are effective when conditions are well controlled. Nevertheless, they are overly dependent on the task being performed, and the results are far from ideal, especially for Large Vocabulary Continuous Speech Recognition (LVCSR) applications. The latter remains one of the most challenging tasks in the field, due to a number of factors, including poor articulation, variable speaking rate and a high degree of acoustic variability caused by noise, side-speech, accents, sloppy pronunciation, hesitation, repetition, interruptions, channel mismatch and/or distortions. To deal with all these problems, the scientific community has proposed a plethora of algorithms and technologies over the last decade for every step of LVCSR: pre-processing, feature extraction, acoustic modeling, language modeling, decoding and result post-processing. Nevertheless, LVCSR systems are not yet robust, with error rates of up to 50% under certain conditions [21], [8].

The persistent presence of ASR errors motivates the search for alternative techniques that assist users in correcting transcription errors or that fully automate the correction process.

… evaluation procedure. In other words, the reference and recognised words are matched in order to decide which words have been deleted or inserted, and which reference-recognised string pairs have been aligned to each other, each of which may result in a hit or a substitution. This is normally done using the Viterbi Edit Distance [17], which efficiently selects the alignment of the reference and recognised word sequences that minimizes the weighted error score. The Edit Distance usually assigns identical weights (1 for the Levenshtein distance) to all three operations: insertion, substitution and deletion. Yet uniform weights can make the choice of the best alignment ambiguous when several alignments have the same score. To avoid this problem, Morris et al. [12] suggest using different weights, such that substitution is favoured over insertion and deletion. In general, it is recommended to set $W_I = W_D$ and $W_S < W_I + W_D$, where $W_I$, $W_S$ and $W_D$ are respectively the weights of insertion, substitution and deletion.

2.3. ASR Evaluation Metrics

According to McCowan et al. [11], an ideal ASR evaluation metric should be: (i) direct, measuring the ASR component independently of the ASR application; (ii) objective, calculated in an automated manner; (iii) interpretable, so that its absolute value gives an idea of the performance; and (iv) modular, general enough to allow thorough application-dependent analysis.

Word Error Rate (WER) is the most popular metric for ASR evaluation. It measures the percentage of incorrect words (substitutions (S), insertions (I) and deletions (D)) relative to the total number of words processed. It is defined as

$$\mathrm{WER} = \frac{S + D + I}{N_1} = \frac{S + D + I}{H + S + D} \qquad (1)$$

where I is the total number of insertions, D the total number of deletions, S the total number of substitutions, H the total number of hits, and $N_1$ the total number of input words.

Despite being the most commonly used metric, WER has many shortcomings [10]. First of all, WER is not a true percentage because it has no upper bound, so it does not tell you how good a system is, only that one system is better than another.
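
The alignment step and Eq. (1) can be made concrete with a short sketch. The Python code below is an illustration, not the authors' implementation: it aligns a recognised word sequence against a reference with a standard dynamic-programming (weighted Levenshtein) edit distance, counts hits, substitutions, deletions and insertions along one optimal path, and then applies Eq. (1). The weights W_I = W_D = 3 and W_S = 4 are assumed example values, chosen only because they satisfy W_I = W_D and W_S < W_I + W_D; the function names `align_counts` and `wer` are hypothetical.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str],
                 w_ins: int = 3, w_del: int = 3,
                 w_sub: int = 4) -> Tuple[int, int, int, int]:
    """Return (H, S, D, I) for one minimum-weight alignment of hyp to ref.

    The default weights are assumed example values that satisfy
    W_I = W_D and W_S < W_I + W_D; a hit costs 0.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum weighted cost of aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * w_del          # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = j * w_ins          # insert every recognised word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if ref[i - 1] == hyp[j - 1] else w_sub
            cost[i][j] = min(cost[i - 1][j - 1] + step,   # hit or substitution
                             cost[i - 1][j] + w_del,      # deletion
                             cost[i][j - 1] + w_ins)      # insertion

    # Trace one optimal path back from the bottom-right cell, counting edits.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            match = ref[i - 1] == hyp[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + (0 if match else w_sub):
                if match:
                    hits += 1
                else:
                    subs += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + w_del:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def wer(ref: List[str], hyp: List[str]) -> float:
    """Eq. (1): WER = (S + D + I) / N1, where N1 = H + S + D (assumes N1 > 0)."""
    hits, subs, dels, ins = align_counts(ref, hyp)
    return (subs + dels + ins) / (hits + subs + dels)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(align_counts(ref, hyp))  # (4, 1, 1, 0)
    print(wer(ref, hyp))           # 0.3333333333333333
    # Insertions add to the numerator but not to N1, so WER has no upper bound.
    print(wer(["hello", "world"], ["uh", "hello", "uh", "world", "uh", "uh"]))  # 2.0
```

The last call shows the missing upper bound in practice: four inserted fillers against a two-word reference give H = 2 and I = 4, so WER = 4/2 = 2.0, that is 200%. Because W_S is smaller than W_I + W_D, a mismatched word pair is scored as one substitution rather than as a deletion plus an insertion.
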
Moreover, WER is not D/I symmetric: it gives far more weight to insertions than to deletions, since an insertion increases the numerator of Eq. (1) without increasing $N_1$, so in noisy conditions WER can exceed 100%.

WER remains effective for speech recognition applications in which errors can be corrected by typing, such as dictation. However, for almost any other type of speech recognition system, where the goal is more than transcription, it is necessary to look for an alternative, or additional, evaluation framework.

Many researchers have proposed alternative measures to overcome the evident limitations of WER. In [12], Morris et al. introduced two information-theoretic measures of the word information communicated. The first one, named Relative Information Lost (RIL), is based on Mutual Information (I, or MI) [7], which measures the statistical dependence between the input words X and the output words Y, and is calculated using the Shannon entropy …
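
The mutual information that RIL builds on can be illustrated with a short Python sketch. It is not the method of [12] or [7]: it simply estimates the probabilities of input words X, output words Y and aligned pairs (X, Y) by relative frequency over a list of aligned word pairs, and applies the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y) with Shannon entropies in bits. The pair format, the use of `None` for the empty word in deletions and insertions, and the function names `entropy` and `mutual_information` are assumptions of this sketch, which computes only I(X;Y), not RIL itself.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical input format: one (x, y) pair per alignment slot, where x is the
# input (reference) word and y the output (recognised) word. None stands for the
# empty word on the missing side of a deletion (y is None) or an insertion (x is None).
Pair = Tuple[Optional[str], Optional[str]]


def entropy(counts: Counter, total: int) -> float:
    """Shannon entropy in bits, with probabilities estimated as count / total."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(pairs: List[Pair]) -> float:
    """Estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from aligned word pairs."""
    total = len(pairs)
    h_x = entropy(Counter(x for x, _ in pairs), total)
    h_y = entropy(Counter(y for _, y in pairs), total)
    h_xy = entropy(Counter(pairs), total)
    return h_x + h_y - h_xy


if __name__ == "__main__":
    # Perfect recognition of two equally frequent words: 1 bit is communicated.
    print(mutual_information([("yes", "yes"), ("no", "no")]))  # 1.0
    # Output independent of input: no information is communicated.
    print(mutual_information([("yes", "yes"), ("yes", "no"),
                              ("no", "yes"), ("no", "no")]))  # 0.0
```

A perfect transcript of two equally frequent words communicates 1 bit per word, while an output that is statistically independent of the input communicates 0 bits, which is the kind of dependence the measure is meant to capture.
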