[Bilingual translation] Big data foreign-literature translation: Data learning from big data

Chinese: 5,850 characters; English: 3,500 words, 18,500 characters
Source: Torrecilla J L, Romo J. Data learning from big data[J]. Statistics […]

[…] John Walker, 2014; James, 2018). Two ideas mainly drive this interest. First, at present most activities generate data, at very low cost, that contain (potentially valuable) information. The second is well summarized by John Walker (2014): “Data-driven decisions are better decisions - it is as simple as that. Using big data enables managers to decide on the basis of evidence rather than intuition”. The opportunities offered by big data are undeniable, but their scope and usefulness are still debated (Secchi, 2018; Bühlmann and van de Geer, 2018). The most fervent advocates speak of the end of theory and models; in articles such as the controversial “The end of theory” (Anderson, 2008), they argue that “with enough data, the numbers speak for themselves”. On the other hand, more critical voices question whether the optimism and faith being placed in big data are really justified. Along these lines, Tim Harford asks whether “we are making a mistake” in another provocative article (Harford, 2014). In this paper we review some aspects of big data that can raise doubts from the point of view of a statistician trying to determine whether the data are sufficient by themselves or whether they need to be given meaning.

First, it helps to be more specific. Although there is no single definition, there seems to be a certain consensus that big data encompasses the study of problems so “big” that conventional tools and models cannot handle them, either because they are not adequate or because they require too much time. Whatever definition we choose and wherever we put the emphasis, it is clear that current technology generates huge amounts of data, so we must be able to extract the best information from them and use it to make the best decisions. How to achieve this, and the challenges associated with this new framework, have become a common topic of discussion in recent years (Lynch, 2008; Fan et al., 2014; Gandomi and Haider, 2015), and the best way to tackle the problem has also been the subject of debate. As an example, we can cite the earlier paper by Breiman et al. (2001) on the two cultures of statistical modeling: stochastic models and algorithms (see Dunson, 2018 for a recent discussion in the context of big data). In what follows, we discuss the role of statistics regarding some of the issues raised by big data in this new […]

[…] and statistics (Singh and Reddy, 2015). Furthermore, non-relational or NoSQL databases arise from the need both to process and search the data quickly and to organize unstructured information. These models go a step beyond the parallelization of conventional relational databases (which is possible using Hive or HBase) by proposing new approaches that allow for greater dynamism and facilitate maintenance and scalability. We could go deeper into these points, which are of great interest, and discuss other technical difficulties related to velocity (for example, in visualization or in keeping the data consistent), but the central task of statisticians here is, first of all, the development of computationally efficient algorithms. This aspect, usually forgotten or overlooked in the statistical literature, is critical in this context because even an optimal algorithm is completely useless if it cannot be applied.

3. Complexity

The complexity inherent to big data comes mostly from the high dimensionality of the observations and from the unstructured nature of the data we generate with smartphones, sensors, social networks, internet searches, GPS devices, emails, and so on.

Problems associated with dimensionality are well known to statisticians and other researchers; they have been extensively studied and many proposals have been made. Beyond the particular properties of each technique, dimension reduction methods can be grouped into two broad families, which we could call projection and variable selection. Both approaches have been extensively studied and compared in different contexts. The difference between them is the role of the original variables in the reduced space. While variable selection is restricted to the original variables, projection methods allow certain combinations of them in order to obtain the new components. As a result, variable selection techniques provide reductions that remain meaningful in terms of the original variables (highly appreciated in areas such as biology or medicine).

Both methodologies have proven effective in a wide range of problems and are already being applied to big data questions, where the issues typically associated with high dimension take on special relevance. For example, Singh et al. (2014) use random forests in a distributed framework for variable selection, Peralta et al. (2015) propose an evolutionary algorithm based on MapReduce for big data classification, and Bolón-Canedo et al. (2015) survey feature selection algorithms for big data problems ranging from DNA microarray analysis to face recognition. Furthermore, parallel coordinate descent methods for penalized problems, such as those used for lasso optimization, are studied in Richtárik and Takáč (2016), and fast versions of other traditional algorithms have been proposed, such as a randomized version of PCA (Abraham and Inouye, 2014). Finally, reduction methods applied to more complex structures, for example subgraphs, can also be found (Pan et al., 2015).

An important example of big data with high-dimensional observations is functional data. Functional Data Analysis (FDA) deals with objects of infinite dimension, and the problems related to this functional nature are sometimes similar to those associated with big data (see, e.g., Ahmed, 2017; Goia and Vieu, 2016). In particular, a very relevant question in functional data analysis is dimension reduction (see Vieu, 2018). Despite the number of works on these topics […]
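
To make the contrast between projection and variable selection concrete, here is a minimal sketch in Python, assuming NumPy and simulated data. The helper names `pca_projection`, `soft_threshold` and `lasso_cd` are hypothetical, and the lasso is fitted by plain serial cyclic coordinate descent, not by the parallel methods studied in Richtárik and Takáč (2016).

```python
import numpy as np


def pca_projection(X, k):
    """Projection: top-k principal directions and the projected data.

    Each direction is a linear combination of *all* original variables.
    """
    Xc = X - X.mean(axis=0)                      # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                                 # p x k loading matrix
    return W, Xc @ W                             # loadings and n x k scores


def soft_threshold(z, t):
    """Shrink the scalar z toward zero by t (proximal map of t * |.|)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Variable selection: lasso fitted by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||yc - Xc b||^2 + lam * ||b||_1 on centered data.
    A coefficient set exactly to zero drops that original variable.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    col_sq = (Xc ** 2).sum(axis=0) / n           # (1/n) * ||x_j||^2 per column
    b = np.zeros(p)
    r = yc.copy()                                # current residual yc - Xc b
    for _ in range(n_iter):
        for j in range(p):
            r += Xc[:, j] * b[j]                 # partial residual without variable j
            rho = Xc[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= Xc[:, j] * b[j]                 # restore residual with updated b[j]
    return b


# Simulated example (hypothetical data): only variables 0 and 3 drive y.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.5 * rng.standard_normal(n)

W, scores = pca_projection(X, k=2)
b = lasso_cd(X, y, lam=0.1)
print("nonzero loadings per principal direction:", np.count_nonzero(np.abs(W) > 1e-12, axis=0))
print("variables kept by the lasso:", np.flatnonzero(b))
```

The principal directions load on all ten variables, so each new component mixes every original measurement, whereas the lasso sets coefficients exactly to zero and keeps a subset of the original columns (here it should at least retain variables 0 and 3, which generate y). This is the interpretability advantage of variable selection described above.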

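Randomized PCA, cited above as an example of a faster version of a traditional algorithm (Abraham and Inouye, 2014), can be illustrated with the following sketch. It is a generic randomized range-finder approach, not necessarily the algorithm of that paper; it assumes NumPy, and `randomized_pca`, `oversample` and `n_power_iter` are hypothetical names chosen for illustration.

```python
import numpy as np


def randomized_pca(X, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-k principal directions of X (sketch).

    Generic randomized range finder: project the centered data onto a few
    random directions, orthonormalize, refine with power iterations, then run
    an exact SVD only on the small projected matrix.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    ell = min(k + oversample, p)                 # sketch size (k plus a margin)
    Omega = rng.standard_normal((p, ell))        # Gaussian test matrix
    Q, _ = np.linalg.qr(Xc @ Omega)              # n x ell basis for the sampled range
    for _ in range(n_power_iter):                # re-orthonormalize at each step for stability
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)
    B = Q.T @ Xc                                 # small ell x p matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k].T, s[:k]                       # p x k directions, top-k singular values


# Simulated low-rank-plus-noise data (hypothetical): rank-3 signal in 50 variables.
rng = np.random.default_rng(1)
n, p = 2000, 50
X = 5.0 * rng.standard_normal((n, 3)) @ rng.standard_normal((3, p))
X += 0.1 * rng.standard_normal((n, p))

V_approx, s_approx = randomized_pca(X, k=3)
Xc = X - X.mean(axis=0)
_, s_all, Vt_exact = np.linalg.svd(Xc, full_matrices=False)
print("randomized:", s_approx)
print("exact:     ", s_all[:3])
```

The only full-size operations are matrix products with the data; the SVD itself is computed on an ℓ × p matrix with ℓ = k + oversample, which is what makes the approach attractive when n is very large. The power iterations cost a few extra passes over the data but improve accuracy when the singular values decay slowly.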