  • 4-1 R function lapply

    2018-11-11 15:44:00
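    #lapply function
    #Loops over each element of a list
    #Usage: lapply(list, function/function name, other arguments)
    #Always returns a list
    
    #sapply: simplifies the result
    #If every element of the result list has length 1, returns a vector
    #If every element has the same length greater than 1, returns a matrix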
    
    > str(lapply)
    function (X, FUN, ...)  
    
    > ?str
    
    > x <- list(a=1:10,b=c(11,21,31,41,51))
    > x
    $`a`
     [1]  1  2  3  4  5  6  7  8  9 10
    
    $b
    [1] 11 21 31 41 51
    
    > lapply(x,mean)
    $`a`
    [1] 5.5
    
    $b
    [1] 31
    
    > x <- 1:4
    > lapply(x,runif)
    [[1]]
    [1] 0.5754994
    
    [[2]]
    [1] 0.3157821 0.7646459
    
    [[3]]
    [1] 0.2289793 0.1715219 0.6473963
    
    [[4]]
    [1] 0.634688171 0.326673566 0.007179751 0.687418686
    
    > lapply(x,runif,min=0,max=100)
    [[1]]
    [1] 40.30112
    
    [[2]]
    [1] 31.06171 64.75319
    
    [[3]]
    [1] 45.190536  8.243788 98.328863
    
    [[4]]
    [1] 22.22585 18.63806 57.53813 54.82982
    
    > x <- list(a=matrix(1:6,2,3),b=matrix(4:7,2,2))
    > lapply(x,function(m) m[1,])
    $`a`
    [1] 1 3 5
    
    $b
    [1] 4 6
    
    > x <- list(a=1:10,b=c(11,21,31,41,51))
    > lapply(x,mean)
    $`a`
    [1] 5.5
    
    $b
    [1] 31
    
    
    > sapply(x,mean)
       a    b 
     5.5 31.0 
    
    > class(sapply(x, mean))
    [1] "numeric"
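The three sapply() rules listed in the comments at the top can be checked with a short sketch. The inputs below are made-up toy lists, not data from the post:

```r
# Rule 1: every result has length 1 -> sapply() returns a vector
v <- sapply(list(a = 1:3, b = 4:6), sum)
print(v)            # a = 6, b = 15

# Rule 2: every result has the same length (> 1) -> sapply() returns a matrix
m <- sapply(list(a = 1:3, b = 4:6), range)
print(m)            # 2 x 2: column a holds 1 and 3, column b holds 4 and 6

# Otherwise (lengths differ) -> sapply() falls back to a list, like lapply()
l <- sapply(list(a = 1:3, b = 4:7), function(elt) elt[elt > 2])
print(class(l))     # "list"
```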
    

    Reposted from: https://www.cnblogs.com/hankleo/p/9942281.html

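  • R's apply, lapply, sapply and tapply functions. R offers countless options for descriptive statistics; this post covers the usage of and differences among apply, lapply, sapply and tapply. All four are used in very similar ways: they perform the same operation on the rows or columns of ...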
  • R notes: lapply and apply

    2021-05-10 21:37:56

    lapply and apply are looping constructs in R.

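    apply(a, 1, FUN) runs FUN on each row of the matrix a and returns the results;
    apply(a, 2, FUN) runs FUN on each column of a and returns the results;

    lapply
    lapply(a, FUN) runs FUN on each element (for example, each vector) of the list a and returns the results as a list. Unlike apply, lapply takes no MARGIN argument.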
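A minimal sketch of both calls on toy data (the matrix a and the list lst are invented for illustration). apply() takes a MARGIN argument (1 for rows, 2 for columns); lapply() has no MARGIN and simply walks the elements of a list:

```r
a <- matrix(1:6, nrow = 2)   # 2 x 3 matrix: rows are (1, 3, 5) and (2, 4, 6)

row_sums <- apply(a, 1, sum)   # one value per row
col_max  <- apply(a, 2, max)   # one value per column
print(row_sums)                # 9 12
print(col_max)                 # 2 4 6

# lapply() iterates over list elements; the result is always a list
lst   <- list(first = 1:3, second = 10:12)
means <- lapply(lst, function(x) mean(x))
print(means)                   # $first 2, $second 11
```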

  • R swirl tutorial 10: lapply and sapply

    R swirl tutorial (R Programming) 10: lapply and sapply

    | In this lesson, you’ll learn how to use lapply() and sapply(), the two most important members of R’s *apply family of functions, also known as loop functions.

    | These powerful functions, along with their close relatives (vapply() and tapply(), among others) offer a concise and convenient means of implementing the Split-Apply-Combine strategy for data analysis.

    | Each of the *apply functions will SPLIT up some data into smaller pieces, APPLY a function to each piece, then COMBINE the results. A more detailed discussion of this strategy is found in Hadley Wickham’s Journal of Statistical Software paper titled ‘The Split-Apply-Combine Strategy for Data Analysis’.
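In base R the three steps map naturally onto split(), lapply() and unlist(). The sketch below uses a hypothetical scores vector and grouping factor grp; it is one way to express the strategy, not the only one:

```r
scores <- c(4, 8, 6, 10, 3, 5)
grp    <- factor(c("x", "y", "x", "y", "x", "y"))

pieces   <- split(scores, grp)     # SPLIT: list(x = c(4, 6, 3), y = c(8, 10, 5))
means    <- lapply(pieces, mean)   # APPLY: one mean per group
combined <- unlist(means)          # COMBINE: named numeric vector

print(combined)                    # x 4.333333, y 7.666667
```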

    | Throughout this lesson, we’ll use the Flags dataset from the UCI Machine Learning Repository. This dataset contains details of various nations and their flags. More information may be found here: http://archive.ics.uci.edu/ml/datasets/Flags

    | Let’s jump right in so you can get a feel for how these special functions work!

    | I’ve stored the dataset in a variable called flags. Type head(flags) to preview the first six lines (i.e. the ‘head’) of the dataset.

    head(flags)
                name landmass zone area population language religion bars stripes colours red green blue gold white
    1    Afghanistan        5    1  648         16       10        2    0       3       5   1     1    0    1     1
    2        Albania        3    1   29          3        6        6    0       0       3   1     0    0    1     0
    3        Algeria        4    1 2388         20        8        2    2       0       3   1     1    0    0     1
    4 American-Samoa        6    3    0          0        1        1    0       0       5   1     0    1    1     1
    5        Andorra        3    1    0          0        6        0    3       0       3   1     0    1    1     0
    6         Angola        4    2 1247          7       10        5    0       2       3   1     0    0    1     0
      black orange mainhue circles crosses saltires quarters sunstars crescent triangle icon animate text topleft
    1     1      0   green       0       0        0        0        1        0        0    1       0    0   black
    2     1      0     red       0       0        0        0        1        0        0    0       1    0     red
    3     0      0   green       0       0        0        0        1        1        0    0       0    0   green
    4     0      1    blue       0       0        0        0        0        0        1    1       1    0    blue
    5     0      0    gold       0       0        0        0        0        0        0    0       0    0    blue
    6     1      0     red       0       0        0        0        1        0        0    1       0    0     red
      botright
    1    green
    2      red
    3    white
    4      red
    5      red
    6    black

    | You may need to scroll up to see all of the output. Now, let’s check out the dimensions of the dataset using dim(flags).

    dim(flags)
    [1] 194 30

    | This tells us that there are 194 rows, or observations, and 30 columns, or variables. Each observation is a country and each variable describes some characteristic of that country or its flag. To open a more complete description of the dataset in a separate text file, type viewinfo() when you are back at the prompt (>).

    | As with any dataset, we’d like to know in what format the variables have been stored. In other words, what is the ‘class’ of each variable? What happens if we do class(flags)? Try it out.

    class(flags)
    [1] "data.frame"

    | That just tells us that the entire dataset is stored as a ‘data.frame’, which doesn’t answer our question. What we really need is to call the class() function on each individual column. While we could do this manually (i.e. one column at a time) it’s much faster if we can automate the process. Sounds like a loop!

    | The lapply() function takes a list as input, applies a function to each element of the list, then returns a list of the same length as the original one. Since a data frame is really just a list of vectors (you can see this with as.list(flags)), we can use lapply() to apply the class() function to each column of the flags dataset. Let’s see it in action!
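The flags dataset isn't available in base R, so this sketch checks the same idea on a small hand-made data frame whose columns are invented for illustration. Because a data frame is a list of columns, lapply() calls class() once per column:

```r
df <- data.frame(
  country = c("A", "B", "C"),
  area    = c(648L, 29L, 2388L),
  red     = c(1L, 1L, 1L),
  stringsAsFactors = TRUE        # mimic the factor columns seen in flags
)

print(is.list(df))          # TRUE: a data frame is a list underneath
print(length(df))           # 3: one list element per column
cls <- lapply(df, class)    # class() is applied to each column in turn
print(unlist(cls))          # country "factor", area "integer", red "integer"
```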

    | Type cls_list <- lapply(flags, class) to apply the class() function to each column of the flags dataset and store the result in a variable called cls_list. Note that you just supply the name of the function you want to apply (i.e. class), without the usual parentheses after it.

    cls_list <- lapply(flags, class)

    | Type cls_list to view the result.

    cls_list
    $name
    [1] "factor"
    $landmass
    [1] "integer"
    $zone
    [1] "integer"
    $area
    [1] "integer"
    $population
    [1] "integer"
    $language
    [1] "integer"
    $religion
    [1] "integer"
    $bars
    [1] "integer"
    $stripes
    [1] "integer"
    $colours
    [1] "integer"
    $red
    [1] "integer"
    $green
    [1] "integer"
    $blue
    [1] "integer"
    $gold
    [1] "integer"
    $white
    [1] "integer"
    $black
    [1] "integer"
    $orange
    [1] "integer"
    $mainhue
    [1] "factor"
    $circles
    [1] "integer"
    $crosses
    [1] "integer"
    $saltires
    [1] "integer"
    $quarters
    [1] "integer"
    $sunstars
    [1] "integer"
    $crescent
    [1] "integer"
    $triangle
    [1] "integer"
    $icon
    [1] "integer"
    $animate
    [1] "integer"
    $text
    [1] "integer"
    $topleft
    [1] "factor"
    $botright
    [1] "factor"

    | The ‘l’ in ‘lapply’ stands for ‘list’. Type class(cls_list) to confirm that lapply() returned a list.

    class(cls_list)
    [1] "list"

    | As expected, we got a list of length 30 – one element for each variable/column. The output would be considerably more compact if we could represent it as a vector instead of a list.

    | You may remember from a previous lesson that lists are most helpful for storing multiple classes of data. In this case, since every element of the list returned by lapply() is a character vector of length one (i.e. “factor” or “integer”), cls_list can be simplified to a character vector. To do this manually, type as.character(cls_list).

    as.character(cls_list)
    [1] "factor" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
    [12] "integer" "integer" "integer" "integer" "integer" "integer" "factor" "integer" "integer" "integer" "integer"
    [23] "integer" "integer" "integer" "integer" "integer" "integer" "factor" "factor"

    | sapply() allows you to automate this process by calling lapply() behind the scenes, but then attempting to simplify (hence the ‘s’ in ‘sapply’) the result for you. Use sapply() the same way you used lapply() to get the class of each column of the flags dataset and store the result in cls_vect. If you need help, type ?sapply to bring up the documentation.

    cls_vect <- sapply(flags, class)

    | Use class(cls_vect) to confirm that sapply() simplified the result to a character vector.

    class(cls_vect)
    [1] "character"

    | In general, if the result is a list where every element is of length one, then sapply() returns a vector. If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix. If sapply() can’t figure things out, then it just returns a list, no different from what lapply() would give you.
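vapply(), one of the close relatives mentioned at the start of the lesson, is a stricter alternative: you declare the type and length of each result with FUN.VALUE, and a mismatch raises an error instead of quietly producing a list. A sketch on a made-up list:

```r
x <- list(a = 1:3, b = c(2.5, 3.5))

# Each result must be a single number -> named numeric vector
means <- vapply(x, mean, FUN.VALUE = numeric(1))
print(means)       # a 2, b 3

# Each result must be a length-2 numeric vector -> 2-row matrix
rng <- vapply(x, range, FUN.VALUE = numeric(2))
print(dim(rng))    # 2 2

# Declaring the wrong length makes vapply() stop with an error
res <- tryCatch(vapply(x, range, FUN.VALUE = numeric(1)),
                error = function(e) "length mismatch")
print(res)         # "length mismatch"
```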

    | Let’s practice using lapply() and sapply() some more!

    | Columns 11 through 17 of our dataset are indicator variables, each representing a different color. The value of the indicator variable is 1 if the color is present in a country’s flag and 0 otherwise.

    | Therefore, if we want to know the total number of countries (in our dataset) with, for example, the color orange on their flag, we can just add up all of the 1s and 0s in the ‘orange’ column. Try sum(flags$orange) to see this.

    sum(flags$orange)
    [1] 26

    | Now we want to repeat this operation for each of the colors recorded in the dataset.

    | First, use flag_colors <- flags[, 11:17] to extract the columns containing the color data and store them in a new data frame called flag_colors. (Note the comma before 11:17. This subsetting command tells R that we want all rows, but only columns 11 through 17.)

    flag_colors <- flags[, 11:17]

    | Use the head() function to look at the first 6 lines of flag_colors.

    head(flag_colors)
      red green blue gold white black orange
    1   1     1    0    1     1     1      0
    2   1     0    0    1     0     1      0
    3   1     1    0    0     1     0      0
    4   1     0    1    1     1     0      1
    5   1     0    1    1     0     0      0
    6   1     0    0    1     0     1      0

    | To get a list containing the sum of each column of flag_colors, call the lapply() function with two arguments. The first argument is the object over which we are looping (i.e. flag_colors) and the second argument is the name of the function we wish to apply to each column (i.e. sum). Remember that the second argument is just the name of the function with no parentheses, etc.

    lapply(flag_colors, sum)
    $red
    [1] 153
    $green
    [1] 91
    $blue
    [1] 99
    $gold
    [1] 91
    $white
    [1] 146
    $black
    [1] 52
    $orange
    [1] 26

    | This tells us that of the 194 flags in our dataset, 153 contain the color red, 91 contain green, 99 contain blue, and so on.

    | The result is a list, since lapply() always returns a list. Each element of this list is of length one, so the result can be simplified to a vector by calling sapply() instead of lapply(). Try it now.

    sapply(flag_colors, sum)
       red  green   blue   gold  white  black orange
       153     91     99     91    146     52     26

    | Perhaps it’s more informative to find the proportion of flags (out of 194) containing each color. Since each column is just a bunch of 1s and 0s, the arithmetic mean of each column will give us the proportion of 1s. (If it’s not clear why, think of a simpler situation where you have three 1s and two 0s – (1 + 1 + 1 + 0 + 0)/5 = 3/5 = 0.6).
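The arithmetic is easy to confirm, and the same idea carries over to sapply() on a data frame of 0/1 columns. The colors data frame below is invented, not taken from flags:

```r
# Three 1s and two 0s, as in the text
ind <- c(1, 1, 1, 0, 0)
print(mean(ind))                  # 0.6
print(sum(ind) / length(ind))     # same value: count of 1s over total

# Column-wise proportions on a made-up indicator data frame
colors <- data.frame(red = c(1, 0, 1, 1), blue = c(0, 0, 1, 0))
print(sapply(colors, mean))       # red 0.75, blue 0.25
```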

    | Use sapply() to apply the mean() function to each column of flag_colors. Remember that the second argument to sapply() should just specify the name of the function (i.e. mean) that you want to apply.

    sapply(flag_colors, mean)
          red     green      blue      gold     white     black    orange
    0.7886598 0.4690722 0.5103093 0.4690722 0.7525773 0.2680412 0.1340206

    | In the examples we’ve looked at so far, sapply() has been able to simplify the result to a vector. That’s because each element of the list returned by lapply() was a vector of length one. Recall that sapply() instead returns a matrix when each element of the list returned by lapply() is a vector of the same length (> 1).

    | To illustrate this, let’s extract columns 19 through 23 from the flags dataset and store the result in a new data frame called flag_shapes. flag_shapes <- flags[, 19:23] will do it.

    flag_shapes <- flags[, 19:23]

    | Each of these columns (i.e. variables) represents the number of times a particular shape or design appears on a country’s flag. We are interested in the minimum and maximum number of times each shape or design appears.

    | The range() function returns the minimum and maximum of its first argument, which should be a numeric vector. Use lapply() to apply the range function to each column of flag_shapes. Don’t worry about storing the result in a new variable. By now, we know that lapply() always returns a list.

    lapply(flag_shapes, range)
    $circles
    [1] 0 4
    $crosses
    [1] 0 2
    $saltires
    [1] 0 1
    $quarters
    [1] 0 4
    $sunstars
    [1] 0 50

    | Do the same operation, but using sapply() and store the result in a variable called shape_mat.

    shape_mat <- sapply(flag_shapes, range)

    | View the contents of shape_mat.

    shape_mat
         circles crosses saltires quarters sunstars
    [1,]       0       0        0        0        0
    [2,]       4       2        1        4       50

    | Each column of shape_mat gives the minimum (row 1) and maximum (row 2) number of times its respective shape appears in different flags.
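A toy reproduction with invented counts shows the same shape. Returning a named vector from an anonymous function is one way (an illustration, not part of the lesson) to label the rows min and max instead of [1,] and [2,]:

```r
shapes <- data.frame(circles = c(0, 4, 1), crosses = c(2, 0, 1))

m <- sapply(shapes, range)
print(m)              # row 1 = minimum, row 2 = maximum of each column
print(dim(m))         # 2 2

# A named return value labels the rows
m2 <- sapply(shapes, function(col) c(min = min(col), max = max(col)))
print(rownames(m2))   # "min" "max"
```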

    | Use the class() function to confirm that shape_mat is a matrix.

    class(shape_mat)
    [1] "matrix"

    | As we’ve seen, sapply() always attempts to simplify the result given by lapply(). It has been successful in doing so for each of the examples we’ve looked at so far. Let’s look at an example where sapply() can’t figure out how to simplify the result and thus returns a list, no different from lapply().

    | When given a vector, the unique() function returns a vector with all duplicate elements removed. In other words, unique() returns a vector of only the ‘unique’ elements. To see how it works, try unique(c(3, 4, 5, 5, 5, 6, 6)).

    unique(c(3,4,5,5,5,6,6))
    [1] 3 4 5 6

    | We want to know the unique values for each variable in the flags dataset. To accomplish this, use lapply() to apply the unique() function to each column in the flags dataset, storing the result in a variable called unique_vals.

    unique_vals <- lapply(flags, unique)

    | Print the value of unique_vals to the console.

    unique_vals
    $name
    [1] Afghanistan Albania Algeria American-Samoa
    [5] Andorra Angola Anguilla Antigua-Barbuda
    [9] Argentina Argentine Australia Austria
    [13] Bahamas Bahrain Bangladesh Barbados
    [17] Belgium Belize Benin Bermuda
    [21] Bhutan Bolivia Botswana Brazil
    [25] British-Virgin-Isles Brunei Bulgaria Burkina
    [29] Burma Burundi Cameroon Canada
    [33] Cape-Verde-Islands Cayman-Islands Central-African-Republic Chad
    [37] Chile China Colombia Comorro-Islands
    [41] Congo Cook-Islands Costa-Rica Cuba
    [45] Cyprus Czechoslovakia Denmark Djibouti
    [49] Dominica Dominican-Republic Ecuador Egypt
    [53] El-Salvador Equatorial-Guinea Ethiopia Faeroes
    [57] Falklands-Malvinas Fiji Finland France
    [61] French-Guiana French-Polynesia Gabon Gambia
    [65] Germany-DDR Germany-FRG Ghana Gibraltar
    [69] Greece Greenland Grenada Guam
    [73] Guatemala Guinea Guinea-Bissau Guyana
    [77] Haiti Honduras Hong-Kong Hungary
    [81] Iceland India Indonesia Iran
    [85] Iraq Ireland Israel Italy
    [89] Ivory-Coast Jamaica Japan Jordan
    [93] Kampuchea Kenya Kiribati Kuwait
    [97] Laos Lebanon Lesotho Liberia
    [101] Libya Liechtenstein Luxembourg Malagasy
    [105] Malawi Malaysia Maldive-Islands Mali
    [109] Malta Marianas Mauritania Mauritius
    [113] Mexico Micronesia Monaco Mongolia
    [117] Montserrat Morocco Mozambique Nauru
    [121] Nepal Netherlands Netherlands-Antilles New-Zealand
    [125] Nicaragua Niger Nigeria Niue
    [129] North-Korea North-Yemen Norway Oman
    [133] Pakistan Panama Papua-New-Guinea Parguay
    [137] Peru Philippines Poland Portugal
    [141] Puerto-Rico Qatar Romania Rwanda
    [145] San-Marino Sao-Tome Saudi-Arabia Senegal
    [149] Seychelles Sierra-Leone Singapore Soloman-Islands
    [153] Somalia South-Africa South-Korea South-Yemen
    [157] Spain Sri-Lanka St-Helena St-Kitts-Nevis
    [161] St-Lucia St-Vincent Sudan Surinam
    [165] Swaziland Sweden Switzerland Syria
    [169] Taiwan Tanzania Thailand Togo
    [173] Tonga Trinidad-Tobago Tunisia Turkey
    [177] Turks-Cocos-Islands Tuvalu UAE Uganda
    [181] UK Uruguay US-Virgin-Isles USA
    [185] USSR Vanuatu Vatican-City Venezuela
    [189] Vietnam Western-Samoa Yugoslavia Zaire
    [193] Zambia Zimbabwe
    194 Levels: Afghanistan Albania Algeria American-Samoa Andorra Angola Anguilla Antigua-Barbuda … Zimbabwe
    $landmass
    [1] 5 3 4 6 1 2
    $zone
    [1] 1 3 2 4
    $area
    [1] 648 29 2388 0 1247 2777 7690 84 19 1 143 31 23 113 47 1099 600 8512
    [19] 6 111 274 678 28 474 9976 4 623 1284 757 9561 1139 2 342 51 115 9
    [37] 128 43 22 49 284 1001 21 1222 12 18 337 547 91 268 10 108 249 239
    [55] 132 2176 109 246 36 215 112 93 103 3268 1904 1648 435 70 301 323 11 372
    [73] 98 181 583 236 30 1760 3 587 118 333 1240 1031 1973 1566 447 783 140 41
    [91] 1267 925 121 195 324 212 804 76 463 407 1285 300 313 92 237 26 2150 196
    [109] 72 637 1221 99 288 505 66 2506 63 17 450 185 945 514 57 5 164 781
    [127] 245 178 9363 22402 15 912 256 905 753 391
    $population
    [1] 16 3 20 0 7 28 15 8 90 10 1 6 119 9 35 4 24 2 11 1008 5 47
    [23] 31 54 17 61 14 684 157 39 57 118 13 77 12 56 18 84 48 36 22 29 38 49
    [45] 45 231 274 60
    $language
    [1] 10 6 8 1 2 4 3 5 7 9
    $religion
    [1] 2 6 1 0 5 3 4 7
    $bars
    [1] 0 2 3 1 5
    $stripes
    [1] 3 0 2 1 5 9 11 14 4 6 13 7
    $colours
    [1] 5 3 2 8 6 4 7 1
    $red
    [1] 1 0
    $green
    [1] 1 0
    $blue
    [1] 0 1
    $gold
    [1] 1 0
    $white
    [1] 1 0
    $black
    [1] 1 0
    $orange
    [1] 0 1
    $mainhue
    [1] green red blue gold white orange black brown
    Levels: black blue brown gold green orange red white
    $circles
    [1] 0 1 4 2
    $crosses
    [1] 0 1 2
    $saltires
    [1] 0 1
    $quarters
    [1] 0 1 4
    $sunstars
    [1] 1 0 6 22 14 3 4 5 15 10 7 2 9 50
    $crescent
    [1] 0 1
    $triangle
    [1] 0 1
    $icon
    [1] 1 0
    $animate
    [1] 0 1
    $text
    [1] 0 1
    $topleft
    [1] black red green blue white orange gold
    Levels: black blue gold green orange red white
    $botright
    [1] green red white black blue gold orange brown
    Levels: black blue brown gold green orange red white

    | Since unique_vals is a list, you can use what you’ve learned to determine the length of each element of unique_vals (i.e. the number of unique values for each variable). Simplify the result, if possible. Hint: Apply the length() function to each element of unique_vals.

    sapply(unique_vals, length)
          name   landmass       zone       area population   language   religion       bars    stripes    colours
           194          6          4        136         48         10          8          5         12          8
           red      green       blue       gold      white      black     orange    mainhue    circles    crosses
             2          2          2          2          2          2          2          8          4          3
      saltires   quarters   sunstars   crescent   triangle       icon    animate       text    topleft   botright
             2          3         14          2          2          2          2          2          7          8

    | The fact that the elements of the unique_vals list are all vectors of different length poses a problem for sapply(), since there’s no obvious way of simplifying the result.
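A small sketch with a made-up list shows both halves of this: sapply() hands back exactly what lapply() would, while the lengths of those elements simplify cleanly:

```r
vals <- list(a = c(1, 1, 2), b = c(5, 5, 5, 6, 7))

u <- sapply(vals, unique)                   # lengths 2 and 3 -> cannot simplify
print(is.list(u))                           # TRUE
print(identical(u, lapply(vals, unique)))   # TRUE: same as lapply()

print(sapply(u, length))                    # a 2, b 3: these DO simplify
```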

    | Use sapply() to apply the unique() function to each column of the flags dataset to see that you get the same unsimplified list that you got from lapply().

    sapply(flags, unique)
    (Output omitted: it is identical to the unique_vals list printed above, the same unsimplified list of 30 elements from $name to $botright.)

    | Occasionally, you may need to apply a function that is not yet defined, thus requiring you to write your own. Writing functions in R is beyond the scope of this lesson, but let’s look at a quick example of how you might do so in the context of loop functions.

    | Pretend you are interested in only the second item from each element of the unique_vals list that you just created. Since each element of the unique_vals list is a vector and we’re not aware of any built-in function in R that returns the second element of a vector, we will construct our own function.

    | lapply(unique_vals, function(elem) elem[2]) will return a list containing the second item from each element of the unique_vals list. Note that our function takes one argument, elem, which is just a ‘dummy variable’ that takes on the value of each element of unique_vals, in turn.

    lapply(unique_vals, function(elem) elem[2])
    $name
    [1] Albania
    194 Levels: Afghanistan Albania Algeria American-Samoa Andorra Angola Anguilla Antigua-Barbuda … Zimbabwe
    $landmass
    [1] 3
    $zone
    [1] 3
    $area
    [1] 29
    $population
    [1] 3
    $language
    [1] 6
    $religion
    [1] 6
    $bars
    [1] 2
    $stripes
    [1] 0
    $colours
    [1] 3
    $red
    [1] 0
    $green
    [1] 0
    $blue
    [1] 1
    $gold
    [1] 0
    $white
    [1] 0
    $black
    [1] 0
    $orange
    [1] 1
    $mainhue
    [1] red
    Levels: black blue brown gold green orange red white
    $circles
    [1] 1
    $crosses
    [1] 1
    $saltires
    [1] 1
    $quarters
    [1] 1
    $sunstars
    [1] 0
    $crescent
    [1] 1
    $triangle
    [1] 1
    $icon
    [1] 0
    $animate
    [1] 1
    $text
    [1] 1
    $topleft
    [1] red
    Levels: black blue gold green orange red white
    $botright
    [1] red
    Levels: black blue brown gold green orange red white

    | The only difference between previous examples and this one is that we are defining and using our own function right in the call to lapply(). Our function has no name and disappears as soon as lapply() is done using it. So-called ‘anonymous functions’ can be very useful when one of R’s built-in functions isn’t an option.
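A short sketch on an invented list, using sapply() so the result simplifies to a named vector. The named helper get_second is hypothetical and only there for comparison. R 4.1.0 and later also accept the shorthand \(elem) elem[2] for an anonymous function.

```r
vals <- list(a = c(10, 20, 30), b = c(5, 6))

# Anonymous function written inline; it has no name and vanishes after the call
second <- sapply(vals, function(elem) elem[2])
print(second)                            # a 20, b 6

# Equivalent call with a named helper (get_second is a hypothetical name)
get_second <- function(elem) elem[2]
second_named <- sapply(vals, get_second)
print(identical(second, second_named))   # TRUE
```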

    | In this lesson, you learned how to use the powerful lapply() and sapply() functions to apply an operation over the elements of a list. In the next lesson, we’ll take a look at some close relatives of lapply() and sapply().

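  • R basics series: Do you know the difference between the assignment arrow (<-) and the equals sign (=) in R? 1 Data types (vectors, arrays, matrices, lists and data frames) 2 The main functions for reading and writing data and interacting with the outside environment 3 Data filtering: extracting subsets of objects...

    R basics series: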

    Loop functions

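    Loop functions are among the most powerful tools in R. The idea behind them is that when you want to loop over an object or a set of objects, they let you do a lot of repetitive work in very little space, without as much typing as you would need at the command line.

    We covered while and for loops earlier. Beyond those, R has many more concise loop functions, which usually have "apply" in their names. The main ones are: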

    • apply

    • lapply

    • sapply

    • tapply

    • mapply

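    lapply is the workhorse. Its main use is to loop over a list object and apply a function to each element of the list.

    sapply is a variant of lapply that simplifies lapply's result.

    apply evaluates a function over the rows or columns of an array. It is very handy for summarizing a matrix or other higher-dimensional array, for example taking row or column sums.

    tapply is short for "table apply"; it applies a function to subsets of a vector.

    mapply is the multivariate version of lapply.

    In addition, there is a function called split(). It doesn't compute anything on the object itself, but it is often combined with lapply, sapply and similar functions to break an object into pieces.

    Let's start with the most commonly used function, lapply.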


    lapply()

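    lapply takes three arguments: the first is the input object, a list named X; the second is a function (or the name of a function); any remaining arguments go into ... and are passed on to that function. The lapply function looks like this: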

    lapply
    function (X, FUN, ...) 
    {
      FUN <- match.fun(FUN)
      if (!is.vector(X) || is.object(X)) 
        X <- as.list(X)
      .Internal(lapply(X, FUN))
    }

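    If X is not a list, lapply may coerce it to one with as.list(X); if it cannot be coerced, you get an error. The key thing to remember about lapply is that it operates on lists.

    PS. ... is commonly used to pass extra arguments to the function that is applied to each element of the list.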
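A sketch of both points with toy data (not from the post): extra arguments placed after FUN travel through ... to every call, and a plain numeric vector also works as input, with lapply() visiting its elements one at a time and still returning a list:

```r
x <- list(a = c(1, 2, NA), b = c(4, NA, 6))

with_na <- lapply(x, mean)                 # mean() returns NA when NAs are present
no_na   <- lapply(x, mean, na.rm = TRUE)   # each call becomes mean(elem, na.rm = TRUE)
print(no_na)                               # $a 1.5, $b 5

# A plain vector: each number is handed to FUN in turn
r <- lapply(1:3, function(n) n * 10)
print(length(r))                           # 3
```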

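    For example:

    Create a list x with two elements: a is the sequence 1 to 5, and b is a vector of 10 standard normal random numbers. Use lapply() to compute the mean of each element and store the results in a new list y: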

    x <- list(a = 1:5, b = rnorm(10))
    y <- lapply(x, mean)

    The new list y has the same element names as x, namely a and b, and each element holds the computed mean.

    > x
    $a
    [1] 1 2 3 4 5
    
    $b
     [1] -1.4445463  2.0582121 -0.7964703  0.8434979 -0.5139175  0.7922571
     [7]  1.4374180  0.1635100  0.5597397  0.4719439
    
    > y
    $a
    [1] 3
    
    $b
    [1] 0.3571645

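    Another example:

    Create a sequence and assign it to x, then use the random number generator runif to draw uniformly distributed random numbers. The first argument of runif() is how many numbers to generate; here it comes from x, i.e. 1 through 4.

    lapply() automatically collects the generated random numbers into a list: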

    > x <- 1:4
    > lapply(x, runif)
    [[1]]
    [1] 0.002001568
    
    [[2]]
    [1] 0.04863917 0.63242592
    
    [[3]]
    [1] 0.1522060 0.6011618 0.1055978
    
    [[4]]
    [1] 0.89643371 0.33845716 0.05518053 0.91004687

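    In both examples above, lapply() received only the X and FUN arguments; nothing was passed through ..., so FUN used its default arguments. To set them, simply add them after FUN.

    For example, runif() draws values between 0 and 1 by default. To change the interval to 5 through 10, pass min and max through ...: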

    > x <- 1:4
    > lapply(x, runif, min = 5, max = 10)
    [[1]]
    [1] 5.673941
    
    [[2]]
    [1] 8.368024 9.502657
    
    [[3]]
    [1] 8.126944 9.363609 5.335993
    
    [[4]]
    [1] 6.001419 7.023682 8.916751 9.871137

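    lapply and related functions make heavy use of so-called anonymous functions. An anonymous function has no name, so there is no need to assign it one; you can create it right where it is used.

    In this example, we create a list containing two matrices, a and b: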

    > x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
    > x
    $a
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4
    
    $b
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6

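    Now I want to extract the first column of each matrix, but there is no ready-made function for that.

    So let's write a function that extracts the first column of a matrix: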

    > lapply(x, function(elt) { elt[,1] })
    $a
    [1] 1 2
    
    $b
    [1] 1 2 3

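    In the function I created on the fly, I named the argument elt and used it to extract the first column. You can call it whatever you like, because it has no meaning once lapply() is done.

    So the rule with lapply() is: if a suitable function exists, use it; if not, write one on the spot.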

    sapply()

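    sapply() is a variant of lapply(). Its job is to simplify the result of lapply() as much as possible.

    lapply() always returns a list, even when every element has length 1 and a list isn't necessary.

    sapply() is more flexible: in that case it returns a single vector containing all the elements, simplifying the result.

    Compare:

    First, the "stubborn" lapply():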

    > x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
    > lapply(x, mean)
    $a
    [1] 2.5
    
    $b
    [1] 0.1983592
    
    $c
    [1] 0.8517163
    
    $d
    [1] 5.145514

    Now sapply():

    > x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
    > sapply(x, mean)
              a           b           c           d 
     2.50000000 -0.02155822  1.22162680  4.96242076

    If the result isn't suitable for simplification, sapply() still returns a list.

    apply

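    apply is another loop function. It applies a function over the margins of an array, most commonly the rows and columns of a matrix (a matrix is the most common kind of array: a two-dimensional one).

    At the command line, apply is somewhat more convenient than a for loop because it takes less typing.

    The apply function looks like this: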

    > str(apply)
    function (X, MARGIN, FUN, ...)

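    The first argument, X, is an array, i.e. a vector with a dim attribute;

    The second argument, MARGIN, is an integer vector indicating which dimensions to keep (1 for rows, 2 for columns);

    The third important argument, FUN, is the function to apply to each row or column; it can be an existing function or an anonymous function;

    The fourth argument, ..., holds any other arguments you want to pass to FUN.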

    Example

    First, create a matrix with 20 rows and 10 columns filled with normal random numbers:

    > x <- matrix(rnorm(200), 20, 10)

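    The first dimension of this matrix is the 20 rows, and the second is the 10 columns.

    To take the mean of each column, keep the second dimension by setting MARGIN to 2. The result is a numeric vector of length 10, each number being the mean of one column: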

    > apply(x, 2, mean)
     [1]  0.65874106 -0.32984283  0.02339061  0.07329736 -0.03052147
     [6] -0.07784829 -0.09898670 -0.02616803 -0.62234169  0.21103926

    Likewise, to sum each row, set MARGIN to 1: the rows (the first dimension) are kept and the columns are collapsed. The result is a vector of length 20:

    > apply(x, 1, sum)
     [1]  1.2437406  1.2477988 -0.8286921  5.8799087 -7.1345567
     [6] -6.0516633  1.1861021 -4.6233625  1.4116391  4.0984832
    [11]  0.6733807 -4.2918003  6.3707265  0.5751055 -4.1153285
    [16] -2.4731258 -1.5706162  2.1357355  4.5665875 -2.6848776

    For simple operations such as sums and means, there are dedicated, optimized functions that do the same job much faster:

    • rowSums() = apply(x, 1, sum)

    • rowMeans() = apply(x, 1, mean)

    • colSums() = apply(x, 2, sum)

    • colMeans() = apply(x, 2, mean)
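A quick check of these equivalences on a small deterministic matrix, used here in place of the random x so the numbers are reproducible:

```r
x <- matrix(1:6, nrow = 2)   # deterministic stand-in: rows (1, 3, 5) and (2, 4, 6)

# Each dedicated function should agree with the corresponding apply() call
stopifnot(
  isTRUE(all.equal(rowSums(x),  apply(x, 1, sum))),
  isTRUE(all.equal(rowMeans(x), apply(x, 1, mean))),
  isTRUE(all.equal(colSums(x),  apply(x, 2, sum))),
  isTRUE(all.equal(colMeans(x), apply(x, 2, mean)))
)
print(rowSums(x))    # 9 12
print(colMeans(x))   # 1.5 3.5 5.5
```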

    So the row sums of x computed above can be obtained directly with one of these functions:

    > rowSums(x)
     [1]  1.2437406  1.2477988 -0.8286921  5.8799087 -7.1345567
     [6] -6.0516633  1.1861021 -4.6233625  1.4116391  4.0984832
    [11]  0.6733807 -4.2918003  6.3707265  0.5751055 -4.1153285
    [16] -2.4731258 -1.5706162  2.1357355  4.5665875 -2.6848776

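    apply() also works with more complex functions.

    Using the same random matrix, suppose we want the 25% and 75% quantiles of each row. We use quantile() and pass the extra argument probs through the ... argument. The result has two values for each row: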

    > apply(x, 1, quantile, probs = c(0.25, 0.75))
              [,1]       [,2]       [,3]      [,4]      [,5]
    25% -0.4279914 -0.7373156 -0.9529599 0.2055977 -1.212583
    75%  0.7124124  0.6271725  0.4206316 1.1519559 -0.164044
              [,6]       [,7]        [,8]       [,9]     [,10]
    25% -1.5083933 -0.4973032 -1.40098143 -0.1734262 0.1533103
    75%  0.2944026  0.2808402 -0.03580914  0.7878205 1.1799618
             [,11]      [,12]       [,13]      [,14]     [,15]
    25% -0.7556971 -1.3682307 -0.09161202 -0.0714848 -1.170791
    75%  0.6127445  0.5407379  1.60144594  0.3402383  0.186875
             [,16]      [,17]      [,18]      [,19]     [,20]
    25% -0.4673410 -0.5190178 -0.4813278 -0.3519997 -1.083816
    75%  0.2163925  0.4672890  0.4078020  1.1469154 -0.117494
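The shape of that result can be confirmed with a short sketch like the one below, which also transposes it into one row per row of x. Because set.seed(1) is added for reproducibility, the numbers differ from the output above.

```r
set.seed(1)  # added for reproducibility; values differ from the output above
x <- matrix(rnorm(200), 20, 10)

q <- apply(x, 1, quantile, probs = c(0.25, 0.75))
dim(q)       # 2 20: one column per row of x
rownames(q)  # "25%" "75%", taken from the names quantile() returns

# Transpose so that each row of the result corresponds to a row of x
q_by_row <- t(q)
head(q_by_row, 3)
```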

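    Besides two-dimensional matrices, apply() also works on higher-dimensional arrays.

    For example, create a 2 × 2 × 10 array of normal random numbers, which can be viewed as ten 2 × 2 matrices, and average those matrices element by element.

    Setting MARGIN to c(1, 2) keeps the first two dimensions and collapses the third: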

    > a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))
    > apply(a, c(1, 2), mean)
              [,1]       [,2]
    [1,] 0.4376722  0.1558857
    [2,] 0.5024186 -0.2940285

    The same result can be obtained with rowMeans() by setting dims = 2:

    > rowMeans(a, dims = 2)
              [,1]       [,2]
    [1,] 0.4376722  0.1558857
    [2,] 0.5024186 -0.2940285
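The sketch below uses the same setup, with set.seed(1) added for reproducibility. It checks the two approaches against each other and against an explicit average of the ten slices. It also shows MARGIN = 3, which is what you would use to get the sum of each 2 × 2 matrix instead.

```r
set.seed(1)  # added for reproducibility
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))

m1 <- apply(a, c(1, 2), mean)   # keep dims 1 and 2, average over dim 3
m2 <- rowMeans(a, dims = 2)     # same thing with the optimized function
# Explicit version: add up the ten 2 x 2 slices, then divide by 10
m3 <- Reduce(`+`, lapply(1:10, function(k) a[, , k])) / 10

all.equal(m1, m2)  # TRUE
all.equal(m1, m3)  # TRUE

# MARGIN = 3 keeps only the third dimension: one total per 2 x 2 slice
slice_sums <- apply(a, 3, sum)
length(slice_sums)  # 10
```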

    mapply()

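    mapply() is a multivariate version of lapply() and sapply().

    In other words:

    lapply(), sapply() and similar functions apply a function over the elements of a single object. mapply() instead applies a function in parallel over several arguments: the first call uses the first element of each argument, the second call uses the second elements, and so on.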

    > str(mapply)
    function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

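    The first argument of mapply(), FUN, is the function to apply.

    The second argument, ..., holds the arguments to iterate over.

    The third argument, MoreArgs, is a list of extra arguments passed to FUN unchanged on every call. You only need it when FUN takes additional arguments that should not be iterated over.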
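This minimal sketch shows both ideas using only base R. The first call vectorizes rep() over two arguments in parallel, which is equivalent to writing the four rep() calls by hand. The second call iterates over x and y while holding a hypothetical scale argument fixed through MoreArgs.

```r
# Four calls written by hand: rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
by_hand <- list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

# mapply pairs the i-th element of 1:4 with the i-th element of 4:1
reps <- mapply(rep, 1:4, 4:1)
reps  # a list: 1 1 1 1 / 2 2 2 / 3 3 / 4 (lengths differ, so no simplification)

# MoreArgs: 'scale' (a made-up argument) is held fixed while x and y are iterated over
scaled <- mapply(function(x, y, scale) (x + y) * scale,
                 1:3, 4:6,
                 MoreArgs = list(scale = 10))
scaled  # 50 70 90 (every result has length 1, so SIMPLIFY gives a vector)
```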


    tapply

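    tapply() is a very useful function that applies a function over subsets of a vector.

    Use tapply() when you want summary statistics for parts of a vector.

    To compute a statistic for each group within a numeric vector, we need another variable or object that identifies which group each element of the vector belongs to.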

    > str(tapply)
    function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

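    The first argument, X, is a numeric (or other) vector.

    The second argument, INDEX, is another vector of the same length as X, indicating which group each element of X belongs to.

    The third argument, FUN, is the function you want to apply.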

    Example

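    x is a numeric vector containing 10 standard normal random numbers, 10 uniform random numbers, and 10 normal random numbers with mean 1. These numbers fall into three groups, so we use gl() to create a factor with three levels, each repeated 10 times: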

    > x <- c(rnorm(10), runif(10), rnorm(10, 1))
    > f <- gl(3, 10) 
    > x
     [1] -1.22192295  1.23396169 -0.74551533  0.10967076 -1.89038874
     [6]  1.51580225  0.44769556  0.64438076  0.33981352  1.14463578
    [11]  0.19257671  0.66332397  0.98498225  0.64714562  0.63170090
    [16]  0.20529137  0.05715295  0.85430414  0.19152730  0.72570007
    [21] -0.68530428  1.19698816  0.79573025  1.93891480  1.50044997
    [26]  2.72678274  0.44608262  0.96684451  1.59033947  0.19885301
    > f
     [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
    Levels: 1 2 3

    Apply tapply() to compute the mean of each group:

    > tapply(x, f, mean, simplify = FALSE)
    $`1`
    [1] 0.1578133
    
    $`2`
    [1] 0.5153705
    
    $`3`
    [1] 1.067568

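    Because simplify = FALSE is set here, the result is returned as a list. With the default simplify = TRUE, each group's mean is a single number, so tapply() returns a named numeric vector (a one-dimensional array) instead.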
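For comparison, here is a sketch of the default behaviour on the same kind of data, with set.seed(1) added for reproducibility. It also shows split() plus sapply(), a base-R alternative that should give the same group means.

```r
set.seed(1)  # added for reproducibility
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

# Default simplify = TRUE: one number per group, returned as a named array
means <- tapply(x, f, mean)
means

# The same computation in two explicit steps: split by group, then apply mean()
by_split <- sapply(split(x, f), mean)
all.equal(as.vector(means), as.vector(by_split))  # TRUE
```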

    To get the range of the observations in each group, use range(), which returns each group's minimum and maximum:

    > tapply(x, f, range)
    $`1`
    [1] -1.890389  1.515802
    
    $`2`
    [1] 0.05715295 0.98498225
    
    $`3`
    [1] -0.6853043  2.7267827
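Because range() returns two values per group, the result above is a list even with the default simplify = TRUE. The minimal sketch below stacks the pairs into a three-row matrix with do.call() and rbind(). It regenerates x and f with set.seed(1) added for reproducibility.

```r
set.seed(1)  # added for reproducibility
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

r <- tapply(x, f, range)
is.list(r)  # TRUE: each group returns two values, so there is no vector simplification

# Stack the per-group (min, max) pairs into a 3 x 2 matrix, one row per group
range_mat <- do.call(rbind, r)
colnames(range_mat) <- c("min", "max")
range_mat
```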

    References:

    1. Video course R Programming by Johns Hopkins University: https://www.coursera.org/learn/r-programming/home/welcome

    2. Lecture notes Programming for Data Science: https://bookdown.org/rdpeng/rprogdatascience/R

