  • What Types of Movies Do You Like?

    1,000+ views 2019-10-08 18:22:19

    [1] fiction n.虚构作品;小说
    [2] non-fiction n.非虚构作品,纪实性文学
    [3] poem n.诗,诗体
    [4] prose n.散文
    [5] novel n.(长篇)小说
    [6] novella n.短篇故事,中篇小说
    [7] novelette n.微型小说
    [8] essay n.随笔,小品文
    [9] biography n.传记
    [10] autobiography n.自传

    [1] blockbuster n.大片
    a Hollywood blockbuster 好莱坞大片
    [2] visual and sound effect 视听效果
    [3] light effect 灯光效果
    [4] director n.导演
    [5] shoot v.拍摄
    [6] close-up 特写镜头
    [7] caption/subtitle n.字幕
    [8] cast n.演员表,全体演员
    [9] documentary n.纪录片
    [10] cartoon n.卡通片
    [11] anime n.动漫
    [12] romance n.爱情片
    [13] horror movie n.恐怖片
    [14] thriller n.惊悚片
    [15] disaster n.灾难片
    [16] fantasy n.奇幻片
    [17] sci-fi (science-fiction) n.科幻片
    [18] drama n.剧情片
    [19] action n.动作片

    Reposted from: https://www.cnblogs.com/ron123/p/5311299.html

  • Account Receipt      Account Alias Receipt Cycle Count Adjust     Intransit Receipt      ISO(Direct)     ISO(Intransit)   PO Receipt ...Requisition Move O
    Account Receipt
     
     
     Account Alias Receipt

    Cycle Count Adjust
     
     
    Intransit Receipt 
     
     
    ISO(Direct)
     
     
    ISO(Intransit)
     
    PO Receipt

    RMA Receipt

    Requisition Move Order

    Sales Order Pick

    Sales Order Issue

    Sales Order Issue(DropShip)

    WIP Component Issue

    WIP Move Transaction(Completion Return)

    WIP Move Transaction(Completion)

     

    Please credit the source when reposting: http://blog.csdn.net/pan_tian/article/details/7631181
  • Types of Data

    1,000+ views 2013-09-23 12:04:53
    How is enterprise data classified?

    A coarse classification
    At a coarse level, enterprise data falls into two categories: master data and transactional data.
    Master Data
    “Master Data is your business critical data that is stored in disparate systems spread across your Enterprise.”
    Master data describe the people, places, and things that are involved in an organization’s business.
    Because these data tend to be used by multiple business processes and IT systems, standardizing master data formats and synchronizing values are critical for successful system integration.
    Master data is usually divided into four categories:
    • Parties: represents all parties the enterprise conducts business with such as customers, prospects, individuals, suppliers, partners, etc.
    • Places: represents the physical places and their segmentations such as geographies, locations, subsidiaries, sites, areas, zones, etc.
    • Things: usually represents what the enterprise actually sells such as products, services, packages, items, financial services, etc.
    • Financial and Organizational: represents all roll-up hierarchies used in many places for reporting and accounting purposes such as organization structures, sales territories, chart of accounts, cost centers, business units, profit centers, price lists, etc.
    Transactional Data
    Transactional data, such as purchase orders, invoices or financial statements, is not usually considered master data, since it registers a “fact” that happened at a certain point in time.
    Transactional data is what drives the business indicators of the enterprise, and it relies entirely on master data.
    Examples include sales orders, invoices, purchase orders, shipping documents, passport applications, credit card payments, and insurance claims.
    These data are typically grouped into transactional records, which include associated master data.
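The dependency of transactional records on master data can be sketched in R. This is only an illustration: the table names, column names, and values (`customers`, `products`, `orders`, and so on) are hypothetical and do not come from any real system.

```r
# Master data: relatively stable descriptions of parties and things.
# All names and values below are hypothetical.
customers <- data.frame(customer_id = c(1L, 2L),
                        name = c("Acme", "Globex"),
                        stringsAsFactors = FALSE)
products <- data.frame(product_id = c(10L, 20L),
                       product = c("Widget", "Gadget"),
                       unit_price = c(2.5, 4.0),
                       stringsAsFactors = FALSE)

# Transactional data: facts recorded at a point in time,
# each one keyed to the master data it relies on.
orders <- data.frame(order_id = 1:3,
                     customer_id = c(1L, 2L, 1L),
                     product_id = c(10L, 10L, 20L),
                     quantity = c(4L, 2L, 1L),
                     order_date = as.Date(c("2013-09-01", "2013-09-02", "2013-09-02")))

# Resolve every transaction against its master data.
report <- merge(merge(orders, customers, by = "customer_id"),
                products, by = "product_id")
report$amount <- report$quantity * report$unit_price

# A business indicator derived from the transactions: revenue per customer.
revenue <- tapply(report$amount, report$name, sum)
print(revenue)
```

Without the two master tables, the order rows would carry only opaque keys, which is one way to read the statement that transactional data relies on master data.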

    Relationship between the two types of data

    --------------------------------------------------------------------------------------------------------------------------------------------
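    A more detailed classification

    Some people divide data types more finely, into the six categories shown in the figure above. In that data model, a deeper blue indicates stronger semantic content and greater importance of data quality, while a deeper yellow indicates larger data volume, more frequent updates, faster real-time capture, and a shorter data lifespan.
    From this it follows that metadata has the strongest semantics, is almost never updated, has the smallest volume, and has the longest life cycle.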

    Metadata
    This is data that describes the data held in the enterprise information architecture,
    e.g. definitions of tables and columns in the system catalog of a database, or entities and attributes in a data model. 

    Reference Data
    Tables in databases that are also called "domains", or "lookup tables". These are used to hold information about entities the enterprise does manage in its business (e.g. countries and currencies), or hold information that categorizes the enterprise's information. We define reference data this way: Reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise.

    Master Data
    See the detailed explanation above.

    Enterprise Structure Data
    Data that describes the structure of the enterprise, e.g. organizational structure or chart of accounts. This information is used to track business activities by responsibility. Formal definition: Data that permits business activity to be reported or analyzed by business responsibility.

    Transaction Activity Data
    This is the traditional focus of IT. It is the data that forms the transactions processed by the operational systems of the enterprise, e.g. sales, trades, etc.

    Transaction Audit Data
    An individual transaction may pass through several steps, and in each step its state may change. Audit information tracks these state changes. Web logs and database logs also track this kind of data.

    --------------------------------------------------------------------------------------------------------------------------------------------
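    Classification by storage form
    Structured data: data stored in a database.
    Unstructured data: as the name suggests, information stored in file systems rather than in a database, such as documents, e-mail, and social media. According to an IDC survey report, 80% of enterprise data is unstructured, and this data grows exponentially, by 60% every year.

    Structured data: the structure comes first, then the data.
    Unstructured data: there is data but no structure.

    The biggest challenge of the big data era also comes from processing unstructured data, and structured data is often not the most critical input to a decision. Traditional BI (business intelligence) analysis software is still based mainly on structured data and answers only Who, What, When, and Where questions, not Why or How. Answering Why and How may in future depend on analysis of unstructured data.
    For example, traditional BI can show that a product is selling poorly, but it rarely reveals why. BI over unstructured data can analyze negative keywords about the product on social networks and so find the root cause of the poor sales.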




  • R Language Notes (1)

    10,000+ views, many likes 2016-06-19 21:44:10

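    Common functions

    object.size() ## size of an object in memory
    names() ## variable names of a data set
    head(x, 10), tail(x, 10) ## first/last 10 rows of the data
    summary() ## summary statistics for a data set
    table(x$y) ## count how often each value of y occurs
    str() ## detailed structure of a data set or function
    nrow(), ncol() ## number of rows and columns
    sqrt(x) ## square root of x
    abs(x) ## absolute value of x
    names(vect2) <- c("foo", "bar", "norf") ## name the elements of a vector
    identical(vect, vect2) ## TRUE; checks whether two vectors are identical
    vect[c("foo", "bar")] ## select vector elements by name
    colnames(my_data) <- cnames ## change the column names of a data frame
    t() ## transpose the rows and columns of a data frame
    length("") ## number of elements in a vector, not characters; an empty string is still one element, so the result is 1
    nchar("") ## number of characters in a string; 0 for an empty string
    tolower() ## convert characters to lower case
    toupper() ## convert characters to upper case
    chartr("A", "B", x) ## replace every A with B in the string x
    na.omit() ## remove every observation that contains a missing value (listwise deletion)
    paste() ## concatenate strings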

    paste("Var",1:5,sep="")
    [1] "Var1" "Var2" "Var3" "Var4" "Var5"
    
    > x<-list(a='aaa',b='bbb',c="ccc")
    > y<-list(d="163.com",e="qq.com")
    > paste(x,y,sep="@")
    [1] "aaa@163.com" "bbb@qq.com"  "ccc@163.com"
    
    # add the collapse argument to join the results into a single string with the given separator
    > paste(x,y,sep="@",collapse=';')
    [1] "aaa@163.com;bbb@qq.com;ccc@163.com"
    > paste(x,collapse=';')
    [1] "aaa;bbb;ccc"
    

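    strsplit(): splitting strings

    strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
    x is the character vector to split.
    split is a character vector giving where to split; by default it is matched as a regular expression (fixed = FALSE).
    Setting fixed = TRUE matches split as literal text rather than as a regular expression. Literal matching is faster.
    perl = TRUE switches to Perl-compatible regular expressions (PCRE); it is unrelated to any installed version of Perl. For long regular expressions, a correctly written pattern with perl = TRUE can run faster.
    useBytes sets whether matching is done byte by byte; the default is FALSE, i.e. matching by character rather than by byte.
    strsplit returns a list; how to process it further depends on the situation.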
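A short sketch of how the `fixed` argument changes the result, and of one way to handle the returned list. The input strings are made up for illustration.

```r
x <- c("a.b.c", "2016-06-19")

# Default (fixed = FALSE): "." is a regular-expression metacharacter that
# matches any character, so every character acts as a separator and only
# empty strings remain.
regex_split <- strsplit(x[1], ".")

# fixed = TRUE treats "." as a literal dot.
literal_split <- strsplit(x[1], ".", fixed = TRUE)

# The result is always a list with one element per input string.
parts <- strsplit(x, "[.-]")

# sapply() with `[` is one way to pull the same piece out of every element.
first_piece <- sapply(parts, `[`, 1)

print(regex_split)
print(literal_split)
print(first_piece)
```

When the separator is plain text, `fixed = TRUE` is usually the safer choice because no character in it is interpreted as a metacharacter.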
    

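    String replacement: sub(), gsub()

    Strictly speaking, R has no function that replaces text inside an existing string.
    R passes arguments by value, not by reference, so sub() and gsub() return new strings and leave the original unchanged.
    The difference between them is that sub replaces only the first match in each string, whereas gsub replaces every match.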
    > text<-c("Hello, Adam","Hi,Adam!","How are you,Ava")
    > sub(pattern="Adam",replacement="word",text)
    [1] "Hello, word"     "Hi,word!"        "How are you,Ava"
    > sub(pattern="Adam|Ava",replacement="word",text)
    [1] "Hello, word"      "Hi,word!"         "How are you,word"
    > gsub(pattern="Adam|Ava",replacement="word",text)
    [1] "Hello, word"      "Hi,word!"         "How are you,word"
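In the transcript above each string contains at most one match, so sub and gsub print the same thing. A string with several matches makes the difference visible. The sentence used here is made up for illustration.

```r
text <- "Adam met Adam and Ava"

# sub() replaces only the first match in each string.
first_only <- sub("Adam|Ava", "word", text)

# gsub() replaces every match.
all_matches <- gsub("Adam|Ava", "word", text)

print(first_only)
print(all_matches)

# Both functions return new strings; the input is left untouched.
print(text)
```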
    

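    Extracting substrings: substr(), substring()

    substr and substring split or extract strings by position; they do not use regular expressions themselves.
    Combined with the regular-expression functions regexpr, gregexpr, or regexec, they make it easy to extract information from large amounts of text.
    Syntax
    substr(x, start, stop) 
    substring(text, first, last = 1000000L)
    In both, the first argument is the character vector to extract from, the second is the vector of start positions, and the third is the vector of end positions.
    substr returns as many strings as the length of its first argument.
    substring returns as many strings as the longest of its three arguments; shorter vectors are recycled.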
    > x <- "123456789" 
    > substr(x, c(2,4), c(4,5,8)) 
    [1] "234" 
    > substring(x, c(2,4), c(4,5,8)) 
    [1] "234"     "45"      "2345678"
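    Because x has length 1, substr returns only one string,
    i.e. only the first pair from the second and third arguments is used: start 2, end 4.
    In the substring call, the longest of the three arguments is c(4,5,8), so under the recycling rule the first argument is effectively c(x,x,x)
    and the second becomes c(2,4,2); the extracted position pairs are therefore 2-4, 4-5, and 2-8.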
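One possible way to combine regexpr with substring, as suggested above, is sketched here. The log lines and the `ids` variable are hypothetical.

```r
logs <- c("order 1042 shipped", "no id here", "order 77 pending")

# regexpr() returns the start of the first match in each string (-1 if
# there is none) and stores the match lengths in the "match.length" attribute.
m <- regexpr("[0-9]+", logs)
starts <- as.integer(m)
lens <- attr(m, "match.length")

# substring() then extracts by position; strings without a match become NA.
ids <- ifelse(starts > 0,
              substring(logs, starts, starts + lens - 1),
              NA_character_)
print(ids)
```

`regmatches(logs, m)` is a shorter alternative, but it returns only the strings that matched, so its positions no longer line up with the input.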
    

    Workspace and Files

    ls() ## list objects in the workspace
    list.files(), dir() ## list all files in the working directory
    dir.create("testdir") ## create the directory testdir
    file.create("mytest.R") ## create the file mytest.R
    file.exists("mytest.R") ## check whether the file exists
    file.info("mytest.R"), file.info("mytest.R")$mode ## file metadata, or one specific field
    file.rename("mytest.R", "mytest2.R") ## rename to mytest2.R
    file.remove("mytest.R") ## delete a file
    file.copy("mytest2.R", "mytest3.R") ## copy to mytest3.R
    file.path("mytest3.R") ## build a relative path to a file
    file.path("folder1", "folder2") ## "folder1/folder2"; builds a platform-independent path

    Create a directory in the current working directory called “testdir2” and a subdirectory for it called “testdir3”, all in one command by using dir.create() and file.path().

     dir.create(file.path('testdir2','testdir3'),recursive = TRUE)
    
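     unlink("testdir2", recursive = TRUE)    ## delete the directory and everything in it (without recursive = TRUE, R will not delete a directory); the name comes from the Unix command
    setwd('testdir')     ## make testdir the working directory
    > old.dir <- getwd()
    args()  ## show the arguments of a function
    sample(x) ## can also be used to shuffle x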
    > sample(1:6, 4, replace = TRUE)
    [1] 4 5 1 3
    
    > flips <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7)) ## prob sets the probabilities of drawing 0 and 1
    
    > flips
      [1] 1 1 1 1 1 1 1 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
     [47] 1 0 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 0 0 1 1 1
     [93] 1 1 1 1 1 0 1 1
    

    Sequence of Numbers

    > 1:10
     [1]  1  2  3  4  5  6  7  8  9 10
    
    > pi:10   ## real numbers
    [1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
    

    ?`:` ## look up the help page for the : operator

    > seq(1,10)
     [1]  1  2  3  4  5  6  7  8  9 10
    
    > seq(0, 10, by=0.5)
     [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5
    [19]  9.0  9.5 10.0
    
    > my_seq <- seq(5, 10, length = 30)  ## 30 equally spaced numbers from 5 to 10, endpoints included
    > 1:length(my_seq)
     [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
    > seq(along.with = my_seq)
     [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
    
    > seq_along(my_seq)
     [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
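The three calls above agree for a non-empty vector. They differ for an empty one, which is a common reason to prefer seq_along() in loops. A small sketch, with `empty` and `visits` as made-up names:

```r
empty <- numeric(0)

# 1:length(x) counts DOWN from 1 to 0 when x is empty.
bad_index <- 1:length(empty)

# seq_along(x) returns an empty index vector instead.
good_index <- seq_along(empty)

# A loop over seq_along() therefore never runs for an empty vector.
visits <- 0
for (i in seq_along(empty)) visits <- visits + 1

print(bad_index)
print(good_index)
print(visits)
```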
    
    >rep(c(0,1,2),times=10)
     [1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
    >rep(c(0,1,2),each=10)
     [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
    

    Vector

    > paste(1:3,c("X", "Y", "Z"),sep="")
    [1] "1X" "2Y" "3Z"
    

    Vector recycling!

    > paste(LETTERS, 1:4, sep = "-")
     [1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4" "M-1" "N-2" "O-3"
    [16] "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" "W-3" "X-4" "Y-1" "Z-2"
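Recycling is not specific to paste(); arithmetic on vectors of different lengths follows the same rule. A minimal sketch with made-up values:

```r
short <- c(10, 20)
long <- 1:6

# The shorter vector is repeated until it matches the longer one:
# c(1, 2, 3, 4, 5, 6) + c(10, 20, 10, 20, 10, 20)
recycled <- long + short
print(recycled)

# When the longer length is not a multiple of the shorter one, R still
# recycles but raises a warning; tryCatch() captures its message here.
warned <- tryCatch(1:5 + short, warning = function(w) conditionMessage(w))
print(warned)
```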
    

    Data Types

    Objects and Attributes

    Objects

    R has five basic or “atomic” classes of objects:

    • character
    • numeric (real numbers)
    • integer
    • complex
    • logical (True/False)

    The most basic object is a vector

    • A vector can only contain objects of the same class
    • BUT: The one exception is a list, which is represented as a vector but can contain objects of different classes (indeed, that’s usually why we use them)

    Empty vectors can be created with the vector() function.

    Numbers

    • Numbers in R are generally treated as numeric objects (i.e. double precision real numbers)
    • If you explicitly want an integer, you need to specify the L suffix
    • Ex: Entering *1* gives you a numeric object; entering *1L* explicitly gives you an integer
    • There is also a special number *Inf* which represents infinity; e.g. 1 / 0; Inf can be used in ordinary calculations; e.g. 1 / Inf is 0
    • The value *NaN* represents an undefined value (“not a number”); e.g. 0 / 0; *NaN* can also be thought of as a missing value (more on that later)
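The points in this list can be checked directly at the console. A short sketch:

```r
# A bare 1 is numeric (double); the L suffix requests an integer.
class_plain <- class(1)
class_suffixed <- class(1L)

# Inf takes part in ordinary arithmetic.
pos_inf <- 1 / 0
zero <- 1 / Inf

# 0 / 0 is undefined and yields NaN, which also counts as missing.
undefined <- 0 / 0

print(c(class_plain, class_suffixed))
print(c(pos_inf, zero, undefined))
print(c(is.nan(undefined), is.na(undefined)))
```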

    Attributes

    R objects can have attributes

    • names, dimnames
    • dimensions (e.g. matrices, arrays)
    • class
    • length
    • other user-defined attributes/metadata
      Attributes of an object can be accessed using the attributes() function
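A small sketch of inspecting and adding attributes. The matrix and the `source` attribute are made up for illustration.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# attributes() returns the attributes stored on the object as a list.
print(attributes(m))

# attr() reads or sets a single attribute, including user-defined metadata.
attr(m, "source") <- "hypothetical user-defined metadata"
attribute_names <- names(attributes(m))
print(attribute_names)

# length and class are usually queried with their own functions.
print(length(m))
print(class(m))
```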

    Vectors and Lists

    Creating Vectors

    The c() function can be used to create vectors of objects.

    > x <- c(0.5, 0.6) ## numeric
    > x <- c(TRUE, FALSE) ## logical
    > x <- c(T, F) ## logical
    > x <- c("a", "b", "c") ## character
    > x <- 9:29 ## integer
    > x <- c(1+0i, 2+4i) ## complex
    

    Using the vector() function

    > x <- vector("numeric", length = 10)
    > x
     [1] 0 0 0 0 0 0 0 0 0 0
    

    Mixing Objects

    When different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class.

    > y <- c(1.7, "a") ## character
    > y <- c(TRUE, 2) ## numeric
    > y <- c("a", TRUE) ## character
    

    Explicit Coercion

    Objects can be explicitly coerced from one class to another using the as.* functions, if available.

    > x <- 0:6
    > class(x)
    [1] "integer"
    > as.numeric(x)
    [1] 0 1 2 3 4 5 6
    > as.logical(x)
    [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE
    > as.character(x)
    [1] "0" "1" "2" "3" "4" "5" "6"
    

    Nonsensical coercion results in NAs

    > x <- c("a", "b", "c")
    > as.numeric(x)
    [1] NA NA NA
    Warning message:
    NAs introduced by coercion
    > as.logical(x)
    [1] NA NA NA
    > as.complex(x)
    [1] NA NA NA
    Warning message:
    NAs introduced by coercion 
    

    Lists

    Lists are a special type of vector that can contain elements of different classes. Lists are a very important data type in R and you should get to know them well.

    > x <- list(1, "a", TRUE, 1 + 4i)
    > x
    [[1]]
    [1] 1
    [[2]]
    [1] "a"
    [[3]]
    [1] TRUE
    [[4]]
    [1] 1+4i
    

    Matrices

    Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol)

    > m <- matrix(nrow = 2, ncol = 3)
    > m
         [,1] [,2] [,3]
    [1,]   NA   NA   NA
    [2,]   NA   NA   NA
    > dim(m)
    [1] 2 3
    > attributes(m)
    $dim
    [1] 2 3
    

    Matrices (cont’d)

    Matrices are constructed column-wise, so entries can be thought of starting in the “upper left” corner and running down the columns.

    > m <- matrix(1:6, nrow = 2, ncol = 3)
    > m
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    

    Matrices can also be created directly from vectors by adding a dimension attribute.

    > m <- 1:10
    > m
     [1]  1  2  3  4  5  6  7  8  9 10
    > dim(m) <- c(2, 5)
    > m
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    3    5    7    9
    [2,]    2    4    6    8   10
    

    cbind-ing and rbind-ing

    Matrices can be created by column-binding or row-binding with cbind() and rbind().

    > x <- 1:3
    > y <- 10:12
    > cbind(x, y)
         x  y
    [1,] 1 10
    [2,] 2 11
    [3,] 3 12
    > rbind(x, y)
      [,1] [,2] [,3]
    x    1    2    3
    y   10   11   12
    

    Factors

    Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.

    • Factors are treated specially by modelling functions like *lm()* and *glm()*
    • Using factors with labels is *better* than using integers because factors are self-describing; having a variable that has values “Male” and “Female” is better than a variable that has values 1 and 2.

       x <- factor(c("yes", "yes", "no", "yes", "no"))
       x
      [1] yes yes no yes no
      Levels: no yes
       table(x)
      x
      no yes
      2 3
       unclass(x)
      [1] 2 2 1 2 1
      attr(,"levels")
      [1] "no" "yes"
      

    The order of the levels can be set using the levels argument to factor(). This can be important in linear modelling because the first level is used as the baseline level.

    > x <- factor(c("yes", "yes", "no", "yes", "no"),
                  levels = c("yes", "no"))
    > x
    [1] yes yes no yes no
    Levels: yes no
    

    Missing Values

    Missing values are denoted by NA or NaN for undefined mathematical operations.

    • is.na() is used to test objects if they are NA
    • is.nan() is used to test for NaN
    • NA values have a class also, so there are integer NA, character NA, etc
    • A NaN value is also NA but the converse is not true

      > x <- c(1, 2, NA, 10, 3)
      > is.na(x)
      [1] FALSE FALSE TRUE FALSE FALSE
      > is.nan(x)
      [1] FALSE FALSE FALSE FALSE FALSE
      > x <- c(1, 2, NaN, NA, 4)
      > is.na(x)
      [1] FALSE FALSE TRUE TRUE FALSE
      > is.nan(x)
      [1] FALSE FALSE TRUE FALSE FALSE
      

    Data Frames

    Data frames are used to store tabular data

    • They are represented as a special type of list where every element of the list has to have the same length
    • Each element of the list can be thought of as a column and the length of each element of the list is the number of rows
    • Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class
    • Data frames also have a special attribute called *row.names*
    • Data frames are usually created by calling *read.table()* or *read.csv()*
    • Can be converted to a matrix by calling *data.matrix()*

      > x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
      > x
        foo   bar
      1   1  TRUE
      2   2  TRUE
      3   3 FALSE
      4   4 FALSE
      > nrow(x)
      [1] 4
      > ncol(x)
      [1] 2
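The data.matrix() conversion mentioned in the list above can be sketched with the same data frame. Because a matrix holds a single type, the logical column is converted to numbers.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# Each column of a data frame keeps its own class.
column_classes <- sapply(df, class)
print(column_classes)

# data.matrix() converts to a matrix, so every column is turned into
# numbers: TRUE becomes 1 and FALSE becomes 0.
m <- data.matrix(df)
print(m)
print(dim(m))
```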
      

    Names Attribute

    Names

    R objects can also have names, which is very useful for writing readable code and self-describing objects.

    > x <- 1:3
    > names(x)
    NULL
    > names(x) <- c("foo", "bar", "norf")
    > x
    foo bar norf
     1 2 3
    > names(x)
    [1] "foo" "bar" "norf"
    

    Lists can also have names.

    > x <- list(a = 1, b = 2, c = 3)
    > x
    $a
    [1] 1
    $b
    [1] 2
    $c
    [1] 3
    

    And matrices.

    > m <- matrix(1:4, nrow = 2, ncol = 2)
    > dimnames(m) <- list(c("a", "b"), c("c", "d"))
    > m
      c d
    a 1 3
    b 2 4
    

    Summary

    Data Types

    • atomic classes: numeric, logical, character, integer, complex
    • vectors, lists
    • factors
    • missing values
    • data frames
    • names

    Reading Writing Data

    Reading Data

    There are a few principal functions reading data into R.

    • *read.table()*, *read.csv()*, for reading tabular data
    • *readLines()*, for reading lines of a text file
    • *source()*, for reading in R code files (inverse of dump)
    • *dget()*, for reading in R code files (inverse of dput)
    • *load()*, for reading in saved workspaces
    • *unserialize()*, for reading single R objects in binary form

    Writing Data

    There are analogous functions for writing data to files.

    • write.table()
    • writeLines()
    • dump()
    • dput()
    • save()
    • serialize()

    Reading Data Files with read.table

    The read.table function is one of the most commonly used functions for reading data. It has a few important arguments:

    • *file*, the name of a file, or a connection
    • *header*, logical indicating if the file has a header line
    • *sep*, a string indicating how the columns are separated
    • *colClasses*, a character vector indicating the class of each column in the dataset
    • *nrows*, the number of rows in the dataset
    • *comment.char*, a character string indicating the comment character
    • *skip*, the number of lines to skip from the beginning
    • *stringsAsFactors*, should character variables be coded as factors?

    read.table
    For small to moderately sized datasets, you can usually call read.table without specifying any other arguments.

    data <- read.table("foo.txt")

    R will automatically

    • skip lines that begin with a #
    • figure out how many rows there are (and how much memory needs to be allocated)
    • figure out what type of variable is in each column of the table

    Telling R all these things directly makes R run faster and more efficiently.
    *read.csv* is identical to *read.table* except for its defaults: the separator is a comma and header = TRUE.

    Reading in Larger Datasets with read.table

    With much larger datasets, doing the following things will make your life easier and will prevent R from choking.

    • Read the help page for read.table, which contains many hints
    • Make a rough calculation of the memory required to store your dataset. If the dataset is larger than the amount of RAM on your computer, you can probably stop right here.
    • Set comment.char = "" if there are no commented lines in your file.
    • Use the *colClasses* argument. Specifying this option instead of using the default can make read.table run MUCH faster, often twice as fast. In order to use this option, you have to know the class of each column in your data frame. If all of the columns are “numeric”, for example, then you can just set *colClasses = "numeric"*. A quick and dirty way to figure out the classes of each column is the following:
    initial <- read.table("datatable.txt", nrows = 100)
    classes <- sapply(initial, class)
    tabAll <- read.table("datatable.txt",
                          colClasses = classes)
    • Set *nrows*. This doesn’t make R run faster but it helps with memory usage. A mild overestimate is okay. You can use the Unix tool *wc* to calculate the number of lines in a file.

    Know Thy System

    In general, when using R with larger datasets, it’s useful to know a few things about your system.

    • How much memory is available?
    • What other applications are in use?
    • Are there other users logged into the same system?
    • What operating system?
    • Is the OS 32 or 64 bit?

    Calculating Memory Requirements

    I have a data frame with 1,500,000 rows and 120 columns, all of which are numeric data. Roughly, how much memory is required to store this data frame?
    1,500,000 × 120 × 8 bytes/numeric

    = 1,440,000,000 bytes
    = 1,440,000,000 / 2^20 bytes/MB
    = 1,373.29 MB
    = 1,373.29 / 2^10 MB/GB
    = 1.34 GB
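The same estimate can be reproduced in R. This sketch assumes, as the calculation does, 8 bytes per numeric value and binary units (2^20 bytes per MB, 2^10 MB per GB); it ignores the overhead of the data frame itself.

```r
rows <- 1500000
cols <- 120
bytes_per_numeric <- 8

bytes <- rows * cols * bytes_per_numeric
mb <- bytes / 2^20   # bytes -> MB
gb <- mb / 2^10      # MB -> GB

print(bytes)
print(round(mb, 2))
print(round(gb, 2))
```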

    Textual Formats

    • *dumping* and *dput-ing* are useful because the resulting textual format is editable, and in the case of corruption, potentially recoverable.
    • *Unlike* writing out a table or csv file, *dump* and *dput* preserve the *metadata* (sacrificing some readability), so that another user doesn’t have to specify it all over again.
    • *Textual* formats can work much better with version control programs like subversion or git which can only track changes meaningfully in text files
    • Textual formats can be longer-lived; if there is corruption somewhere in the file, it can be easier to fix the problem
    • Textual formats adhere to the “Unix philosophy”
    • Downside: The format is not very space-efficient

    dput-ting R Objects

    Another way to pass data around is by deparsing the R object with dput and reading it back in using dget.

    > y <- data.frame(a = 1, b = "a")
    > dput(y)
    structure(list(a = 1,
                     b = structure(1L, .Label = "a",
                                            class = "factor")),
                .Names = c("a", "b"), row.names = c(NA, -1L),
                class = "data.frame")
    > dput(y, file = "y.R")
    > new.y <- dget("y.R")
    > new.y
         a    b
    1   1    a
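The round trip can be checked programmatically. This sketch writes to a temporary file rather than to y.R. It also passes stringsAsFactors = TRUE explicitly, since from R 4.0.0 the default is FALSE and column b would otherwise be a character vector instead of the factor shown in the output above.

```r
# stringsAsFactors = TRUE reproduces the factor column on any R version.
y <- data.frame(a = 1, b = "a", stringsAsFactors = TRUE)

# Deparse to a temporary file (hypothetical location) and read it back.
path <- tempfile(fileext = ".R")
dput(y, file = path)
new_y <- dget(path)

# The metadata survives: column classes and the data frame class are preserved.
print(sapply(new_y, class))
print(all.equal(y, new_y))

unlink(path)
```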
    

    Dumping R Objects

    Multiple objects can be deparsed using the dump function and read back in using source.

    > x <- "foo"
    > y <- data.frame(a = 1, b = "a")
    > dump(c("x", "y"), file = "data.R")
    > rm(x, y)
    > source("data.R")
    > y
        a  b
    1  1  a
    > x
    [1] "foo"
    

    Interfaces to the Outside World

    Data are read in using connection interfaces. Connections can be made to files (most common) or to other more exotic things.

    • *file*, opens a connection to a file
    • *gzfile*, opens a connection to a file compressed with gzip
    • *bzfile*, opens a connection to a file compressed with bzip2
    • *url*, opens a connection to a webpage

    File Connections

    > str(file)
    function (description = "", open = "", blocking = TRUE,
                encoding = getOption("encoding"))
    
     1. *description* is the name of the file
     2. *open* is a code indicating
        - “r” read only
        - “w” writing (and initializing a new file)
        - “a” appending
        - “rb”, “wb”, “ab” reading, writing, or appending in binary mode (Windows)
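The open codes can be tried out on a temporary file. A minimal sketch, in which the file location and its contents are made up:

```r
path <- tempfile(fileext = ".txt")

# "w": create (or overwrite) the file and write to it.
con <- file(path, "w")
writeLines(c("first", "second"), con)
close(con)

# "a": append to the existing contents.
con <- file(path, "a")
writeLines("third", con)
close(con)

# "r": read only.
con <- file(path, "r")
lines <- readLines(con)
close(con)

print(lines)
unlink(path)
```

Closing each connection before reopening the file in another mode keeps the writes flushed and avoids leaking open connections.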
    

    Connections

    In general, connections are powerful tools that let you navigate files or other external objects. In practice, we often don’t need to deal with the connection interface directly.

    con <- file("foo.txt", "r")
    data <- read.csv(con)
    close(con)

    is the same as

    data <- read.csv("foo.txt")

    Reading Lines of a Text File

    > con <- gzfile("words.gz")
    > x <- readLines(con, 10)
    > x
     [1] "1080"        "10-point"   "10th"         "11-point"
     [5] "12-point"  "16-point"   "18-point"  "1st"
     [9] "2"              "20-point"
    

    writeLines takes a character vector and writes each element one line at a time to a text file.
    readLines can be useful for reading in lines of webpages

    ## This might take time
    con <- url("http://www.jhsph.edu", "r")
    x <- readLines(con)
    > head(x)
    [1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">"
    [2] ""
    [3] "<html>"
    [4] "<head>"
    [5] "\t<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8

    Subsetting

    There are a number of operators that can be used to extract subsets of R objects.

    • [ always returns an object of the same class as the original; can be used to select more than one element (there is one exception)
    • [[ is used to extract elements of a list or a data frame; it can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame
    • $ is used to extract elements of a list or data frame by name; semantics are similar to that of [[.

      > x <- c("a", "b", "c", "c", "d", "a")
      > x[1]
      [1] "a"
      > x[2]
      [1] "b"
      > x[1:4]
      [1] "a" "b" "c" "c"
      > x[x > "a"]
      [1] "b" "c" "c" "d"
      > u <- x > "a"
      > u
      [1] FALSE TRUE TRUE TRUE TRUE FALSE
      > x[u]
      [1] "b" "c" "c" "d"

    Subsetting Lists

    > x <- list(foo = 1:4, bar = 0.6)
    > x[1]
    $foo
    [1] 1 2 3 4
    > x[[1]]
    [1] 1 2 3 4
    > x$bar
    [1] 0.6
    > x[["bar"]]
    [1] 0.6
    > x["bar"]
    $bar
    [1] 0.6
    
    > x <- list(foo = 1:4, bar = 0.6, baz = "hello")
    > x[c(1, 3)]
    $foo
    [1] 1 2 3 4
    $baz
    [1] "hello"
    

    The [[ operator can be used with computed indices; $ can only be used with literal names.

    > x <- list(foo = 1:4, bar = 0.6, baz = "hello")
    > name <- "foo"
    > x[[name]]     ## computed index for ‘foo’
    [1] 1 2 3 4
    > x$name       ## element ‘name’ doesn’t exist!
    NULL
    > x$foo
    [1] 1 2 3 4       ## element ‘foo’ does exist
    

    Subsetting Nested Elements of a List
    The [[ can take an integer sequence.

    > x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
    > x[[c(1, 3)]]
    [1] 14
    > x[[1]][[3]]
    [1] 14
    > x[[c(2, 1)]]
    [1] 3.14
    

    Subsetting a Matrix

    Matrices can be subsetted in the usual way with (i,j) type indices.

    > x <- matrix(1:6, 2, 3)
    > x[1, 2]
    [1] 3
    > x[2, 1]
    [1] 2
    

    Indices can also be missing.

    > x[1, ]
    [1] 1 3 5
    > x[, 2]
    [1] 3 4
    

    By default, when a single element of a matrix is retrieved, it is returned as a vector of length 1 rather than a 1 × 1 matrix. This behavior can be turned off by setting drop = FALSE.

    > x <- matrix(1:6, 2, 3)
    > x[1, 2]
    [1] 3
    > x[1, 2, drop = FALSE]
         [,1]
    [1,]    3
    

    Similarly, subsetting a single column or a single row will give you a vector, not a matrix (by default).

    > x <- matrix(1:6, 2, 3)
    > x[1, ]
    [1] 1 3 5
    > x[1, , drop = FALSE]
      [,1]    [,2]    [,3]
    [1,]   1       3       5
    

    Partial Matching

    Partial matching of names is allowed with [[ and $

    > x <- list(aardvark = 1:5)
    > x$a
    [1] 1 2 3 4 5
    > x[["a"]]
    NULL
    > x[["a", exact = FALSE]]
    [1] 1 2 3 4 5 
    

    Removing NA Values

    A common task is to remove missing values (NAs).

    > x <- c(1, 2, NA, 4, NA, 5)
    > bad <- is.na(x)
    > x[!bad]
    [1] 1 2 4 5
    

    What if there are multiple things and you want to take the subset with no missing values?

    > x <- c(1, 2, NA, 4, NA, 5)
    > y <- c("a", "b", NA, "d", NA, "f")
    > good <- complete.cases(x, y)
    > good
    [1] TRUE TRUE FALSE TRUE FALSE TRUE
    > x[good]
    [1] 1 2 4 5
    > y[good]
    [1] "a" "b" "d" "f"
    
    > airquality[1:6, ]
      Ozone Solar.R Wind Temp Month Day
    1    41     190  7.4   67     5   1
    2    36     118  8.0   72     5   2
    3    12     149 12.6   74     5   3
    4    18     313 11.5   62     5   4
    5    NA      NA 14.3   56     5   5
    6    28      NA 14.9   66     5   6
    > good <- complete.cases(airquality)
    > airquality[good, ][1:6, ]
      Ozone Solar.R Wind Temp Month Day
    1    41     190  7.4   67     5   1
    2    36     118  8.0   72     5   2
    3    12     149 12.6   74     5   3
    4    18     313 11.5   62     5   4
    7    23     299  8.6   65     5   7
    

    Vectorized Operations

    Many operations in R are vectorized making code more efficient, concise, and easier to read.

    > x <- 1:4; y <- 6:9
    > x + y
    [1] 7 9 11 13
    > x > 2
    [1] FALSE FALSE TRUE TRUE
    > x >= 2
    [1] FALSE TRUE TRUE TRUE
    > y == 8
    [1] FALSE FALSE TRUE FALSE
    > x * y
    [1] 6 14 24 36
    > x / y
    [1] 0.1666667 0.2857143 0.3750000 0.4444444
    

    Vectorized Matrix Operations

    > x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2)
    > x * y             ## element-wise multiplication
            [,1]    [,2]
    [1,]    10    30
    [2,]    20    40
    > x / y
         [,1]    [,2]
    [1,]    0.1    0.3
    [2,]    0.2    0.4
    > x %*% y       ## true matrix multiplication
              [,1]    [,2]
    [1,]      40    40
    [2,]      60    60
    

    Missing Value

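    is.na(mydata) and mydata == NA do not give the same result: is.na() returns TRUE or FALSE for each element, whereas any comparison with NA, including mydata == NA, returns NA for every element.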
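A quick check is a reliable way to see how the two expressions behave. In this sketch `mydata` is a made-up vector:

```r
mydata <- c(1, NA, 3)

# is.na() tests each element for missingness.
by_is_na <- is.na(mydata)

# Comparing anything with NA gives NA, so == cannot be used to find
# missing values.
by_equals <- mydata == NA

print(by_is_na)
print(by_equals)

# Counting missing values therefore relies on is.na().
print(sum(is.na(mydata)))
```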

    R uses ‘one-based indexing’, which (you guessed it!) means the first element of a vector is considered element 1.

    x[c(2, 10)]    ## the 2nd and 10th elements of x
    x[c(-2, -10)]  ## all elements except the 2nd and 10th
    x[-c(2, 10)]   ## same as above
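
    As a small runnable illustration, assume `x` is the ten-element vector `11:20` (chosen here only so that positions and values are easy to tell apart).

    ```r
    x <- 11:20                    # illustrative vector; x[1] is 11

    picked   <- x[c(2, 10)]       # elements at positions 2 and 10
    dropped1 <- x[c(-2, -10)]     # everything except positions 2 and 10
    dropped2 <- x[-c(2, 10)]      # same selection, negating the whole vector

    print(picked)
    print(identical(dropped1, dropped2))
    ```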

  • A CAPTCHA or a “Completely Automated Public Turing test to tell Computers and Humans Apart,” comes in several shapes, sizes and types. These all work quite well against spam, but some are ...
  • TYPES OF TESTING

    1,000+ views 2010-02-24 13:43:00
    TYPES OF TESTING 1. Black Box Testing. 1.1 FUNCTIONAL TESTING.. 1.2 STRESS TESTING.. 1.3 LOAD TESTING.. 1.4 AD-HOC TESTING.. 1.5 EXPLORATORY TESTING.. 1.6 USABILITY TESTING..
  • Liulishuo Level 4 Full Text

    10,000+ views, many likes 2019-05-22 10:52:40
    Mountains are formed by forces deep within the Earth, and are made of different types of rocks. Rivers are streams of water that usually begin in mountains and flow into the sea. Many early cities ...
  • Creating a scheduled job in Oracle fails with: PLS-00306: wrong number or types of arguments in call to 'JOB'. Original script: begin sys.dbms_job.submit(job => job, what => 'xxx;', next_...
  • Basic knowledge that is still worth getting straight. Reposted from: http://www.sap-img.com/abap/what-are-different-types-of-internal-tables-and-their-usage.htm Standard Internal Tables
  • Original: http://www.softwaretestinghelp.com/what-type-of-database-questions-are-asked-in-interview-for-testing-positions/...What Types of Database Questions are Asked in Interview for Testing Positions? –
  • Liulishuo Level 3 Full Text

    10,000+ views, many likes 2019-05-22 10:51:17
    Lesson 4 Types of Words Dialogue Lesson 5 Good News & Bad News 4/4 Listening Lesson 1 Leonardo da Vinci 1-2 Vocabulary Lesson 3 Sources of Pollution Lesson 4 Historical Figures...
  • Thinking with Types

    2019-07-04 11:05:24
    Thinking with Types started, as so many of my projects do, accidentally. I was unemployed, bored, and starting to get tired of answering the same questions over and over again in Haskell chat-rooms. ...
  • The 3 Types of Buyers, and How to Optimize for Each One [Guest post by Jeremy Smith.] I absolutely love buyer psychology and neuroeconomics. Want to know why? ● Because it’s like a secret ...
  • Java - The Basics of Java Generics

    10,000+ views 2019-10-15 11:41:26
    Java Generics were introduced in JDK 5.0 with the aim of reducing bugs and adding an extra layer of abstract...
  • Understand the implementation of basic data types and why using the correct type is so important Work with XML data through the XML data type Construct XML data from relational result sets Store ...
  • What is the location of the directory of C header files that match your running kernel? [/usr/src/linux/include] If you just press Enter, it displays: The path "/usr/src/linux/include" is not an existing ...
  • Proof of Stake FAQ

    10,000+ views 2019-03-26 09:52:08
    Contents ... What are the benefits of proof of stake as opposed to proof of work? How does proof of stake fit into traditional Byzantine fault tolerance research? What is the "n...
  • Two causes of the "conflicting types for" error

    10,000+ views 2018-12-11 14:52:25
    a.c:3:18: warning: its scope is only this definition or declaration, which is probably not what you want a.c:9:6: error: conflicting types for ‘func’ void func(struct A *A) ^ a.c:3:6: note: ...
  • GCC built-in function: __builtin_types_compatible_p

    1,000+ views 2016-07-06 11:35:54
    #if 0 — Built-in Function: int __builtin_types_compatible_p (type1, type2)...You can use the built-in function __builtin_types_compatible_p to determine whether two types are the same. This built-in fun
  • What is The Rule of Three?

    1,000+ views 2014-10-09 12:45:40
    C++ treats variables of user-defined types with value semantics. This means that objects are implicitly copied in various contexts, and we should understand what "copying an object
  • What is the Difference Between Proxy Types?

    1,000+ views 2013-03-05 20:15:16
    Original article: http://chris.olstrom.com/privacy/proxy-types/...What is the Difference Between Proxy Types? Written by Chris Olstrom in Explanation, Privacy I indicated in another article that p
  • error: conflicting types for xxx error: previous implicit declaration of xxx was here Causes and fixes: 1. The function is used before its declaration, or is used without a declared prototype; this usually happens in the function's implementation file...
  • Elements of Programming (PDF version)

    Popular discussion 2015-09-09 10:40:28
    The book presents a number of algorithms and requirements for types on which they are defined. The code for these descriptions—also available on the Web—is written in a small subset of C++ meant to ...
  • Discover how stronger types mean cleaner, more efficient, and optimized ... You'll see how to create a set of reusable tools that unify and ease the scalar types of PHP. ...
  • Unit 4: Sentence Types

    1,000+ views 2014-10-04 17:35:44
    SENTENCE TYPES   Simple Sentences   Compound Sentences   Complex Sentences   Compound-Complex Sentences When you start to put together all the clauses and phrase
  • Lecture 3 - Types of Learning: Learning with Different Output Space
  • This post assumes and requires that you have read the introductory post to this series which also includes a table of content... With that out of the way let’s look at restrictions around compound types
  • What happened when new an object in JVM ?

    1,000+ views 2019-10-29 12:07:20
    Original link: https://www.javaspring.net/java/what-happened-when-new-an-object-in-jvm I. Introduction As you know, Java is an object-oriented programming ... We usually use a variety of objects while wri...
  • machine learning definition and types
