
    Preface

    I recently used Kotlin to develop a new project. Some of Kotlin's features and its generous amount of syntactic sugar are very pleasant to use, and compared with Java the development efficiency is noticeably higher. But all that syntactic sugar also brings problems: a steeper learning curve, and confusion about when each construct should be used.
    For example, the first time I saw scope functions my reaction was: what is this? Which function should I use?

    So I looked into what scope functions are, how the individual functions differ, and when to use each one.

    Overview

    Official description: The Kotlin standard library contains several functions whose sole purpose is to execute a block of code within the context of an object. When you call such a function on an object with a lambda expression provided, it forms a temporary scope. In this scope, you can access the object without its name. Such functions are called scope functions. There are five of them: let, run, with, apply, and also.

    In other words: the purpose of a scope function is to execute a block of code within the context of an object. It gives the caller a temporary inner scope in which the object can be accessed without referring to it by name. There are five such scope functions: let, run, with, apply, and also.
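
    As a minimal sketch of my own (not from the original article), here is what "accessing the object without its name" looks like: calling let on a String opens a temporary scope in which the string is available as it:

    fun main() {
        // "Kotlin" becomes the context object of the lambda, available as it
        val length = "Kotlin".let {
            println(it)   // prints: Kotlin
            it.length     // the lambda result becomes the return value of let
        }
        println(length)   // prints: 6
    }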

    Functions

    run

    The run function best illustrates what a scope is for. In the example below, run is called inside the main function to create a standalone scope, in which a new word variable is defined; the two println calls each use the word variable of their own scope, without affecting each other. In addition, run returns the result of the lambda.

    Example
    
    fun main(args: Array<String>) {
        var word = "我是小明"
        val returnValue = run {
            var word = "我是小红"
            println("run:$word")
            word
        }
        println("main:$word")
        println("returnValue:$returnValue")
    }
    

    Output:
    
    run:我是小红
    main:我是小明
    returnValue:我是小红
    

    with

    The with function takes an arbitrary object as the context object this and lets you access it implicitly inside the block; it returns the lambda result. In the example below, with is used inside main to create a temporary scope; a new person variable defined in that scope does not affect the outer one, and the context object is accessed through this, so the outer person's age property is modified implicitly.

    Example
    
    data class Person (
        var name: String,
        var age: Int = 0
    )
    fun main(args: Array<String>) {
        var person = Person("小明",25)
        val returnValue = with(person) {
            println("with:this=$this")
            var person = Person("小红",23)
            println("with:person=$person")
            age = 26
            person
        }
        println("main:person=$person")
        println("main:returnValue=$returnValue")
    }
    

    Output:
    
    with:this=Person(name=小明, age=25)
    with:person=Person(name=小红, age=23)
    main:person=Person(name=小明, age=26)
    main:returnValue=Person(name=小红, age=23)
    

    T.run

    T.run uses the receiver T as the context object this of the scope, so T can be accessed implicitly inside the block, and the lambda result is returned.

    Example
    
    data class Person (
        var name: String,
        var age: Int = 0
    )
    fun main(args: Array<String>) {
        var person: Person? = null
        // With T?.run, the run block is not invoked when T is null
        person?.run {
            println("person?.run:person=$person")
        }
        person = Person("小明",25)
        val returnValue = person.run {
            println("person.run:this=$this")
            var person = Person("小红",23)
            println("person.run:person=$person")
            age = 26
            person
        }
        println("main:person=$person")
        println("main:returnValue=$returnValue")
    }
    

    Output:
    
    person.run:this=Person(name=小明, age=25)
    person.run:person=Person(name=小红, age=23)
    main:person=Person(name=小明, age=26)
    main:returnValue=Person(name=小红, age=23)
    

    T.let

    The only difference between T.let and T.run is the name under which T is exposed as the context object: it for let, this for run. Inside T.let you therefore have to access T explicitly through it.

    Example
    
    data class Person (
        var name: String,
        var age: Int = 0
    )
    fun main(args: Array<String>) {
        var person: Person? = null
        person?.let {
            println("person?.let:person=$person")
        }
        person = Person("小明",25)
        val returnValue = person.let {
            println("person.let:it=$it")
            var person = Person("小红",23)
            println("person.let:person=$person")
            it.age = 26
            person
        }
        println("main:person=$person")
        println("main:returnValue=$returnValue")
    }
    

    Output:
    
    person.let:it=Person(name=小明, age=25)
    person.let:person=Person(name=小红, age=23)
    main:person=Person(name=小明, age=26)
    main:returnValue=Person(name=小红, age=23)
    

    T.also

    As the example below shows, the only difference between T.also and T.let is the return value: also returns this (i.e. T itself), while let returns the lambda result.

    Example
    
    data class Person (
        var name: String,
        var age: Int = 0
    )
    fun main(args: Array<String>) {
        var person: Person? = null
        person?.also {
            println("person?.also:person=$person")
        }
        person = Person("小明",25)
        val returnValue = person.also {
            println("person.also:it=$it")
            var person = Person("小红",23)
            println("person.also:person=$person")
            it.age = 26
            person
        }
        println("main:person=$person")
        println("main:returnValue=$returnValue")
    }
    

    Output:
    
    person.also:it=Person(name=小明, age=25)
    person.also:person=Person(name=小红, age=23)
    main:person=Person(name=小明, age=26)
    main:returnValue=Person(name=小明, age=26)
    

    T.apply

    As the example below shows, the only difference between T.apply and T.also is the name of the context object: this for apply, it for also, so inside T.apply T can be accessed implicitly.

    Example
    
    data class Person (
        var name: String,
        var age: Int = 0
    )
    fun main(args: Array<String>) {
        var person: Person? = null
        person?.apply {
            println("person?.apply:person=$person")
        }
        person = Person("小明",25)
        val returnValue = person.apply {
            println("person.apply:this=$this")
            var person = Person("小红",23)
            println("person.apply:person=$person")
            age = 26
            person
        }
        println("main:person=$person")
        println("main:returnValue=$returnValue")
    }
    

    Output:
    
    person.apply:this=Person(name=小明, age=25)
    person.apply:person=Person(name=小红, age=23)
    main:person=Person(name=小明, age=26)
    main:returnValue=Person(name=小明, age=26)
    

    Special scope functions

    T.takeIf

    The context object T is exposed as it. If the lambda (the predicate) evaluates to true, takeIf returns this; otherwise it returns null.

    Source
    
    @kotlin.internal.InlineOnly
    @SinceKotlin("1.1")
    public inline fun <T> T.takeIf(predicate: (T) -> Boolean): T? {
        contract {
            callsInPlace(predicate, InvocationKind.EXACTLY_ONCE)
        }
        return if (predicate(this)) this else null
    }
    

    Example
    
    fun main(args: Array<String>) {
        var count = 0
        while (count <= 10) {
            val returnValue = count.takeIf {
                count++ % 2 == 0
            }
            println(returnValue)
        }
    }
    

    Output:
    
    0
    null
    2
    null
    4
    null
    6
    null
    8
    null
    10
    

    T.takeUnless

    The context object T is exposed as it. If the lambda evaluates to true, takeUnless returns null; otherwise it returns this. Compared with the implementation of takeIf, it simply negates the predicate result.

    Source
    
    @kotlin.internal.InlineOnly
    @SinceKotlin("1.1")
    public inline fun <T> T.takeUnless(predicate: (T) -> Boolean): T? {
        contract {
            callsInPlace(predicate, InvocationKind.EXACTLY_ONCE)
        }
        return if (!predicate(this)) this else null
    }
    

    Example
    
    fun main(args: Array<String>) {
        var count = 0
        while (count <= 10) {
            val returnValue = count.takeUnless {
                count++ % 2 == 0
            }
            println(returnValue)
        }
    }
    

    Output:
    
    null
    1
    null
    3
    null
    5
    null
    7
    null
    9
    null
    

    repeat

    repeat executes the given lambda the specified number of times, passing the current iteration index to the block as it. As the source and the example show, the index starts at 0.

    Source
    
    @kotlin.internal.InlineOnly
    public inline fun repeat(times: Int, action: (Int) -> Unit) {
        contract { callsInPlace(action) }
        for (index in 0 until times) {
            action(index)
        }
    }
    

    Example
    
    fun main(args: Array<String>) {
        repeat(5) {
            print("$it,")
        }
    }
    

    Output:
    
    0,1,2,3,4,
    

    Summary

    From the descriptions and examples above you can see that let, run, with, apply, and also complement one another. Looking at any two of them in isolation the difference may seem small, but together they cover the vast majority of use cases.

    To quickly decide which scope function fits a given situation, the main distinguishing factors are:

    1. Caller

      • Plain functions: run and with. Their main purpose is to open a scope that is isolated from the surrounding context; with additionally makes it convenient to access a context object inside the scope.
      • Extension functions: T?.fun() can be used to perform a null check before the call, e.g. null?.run { println("Kotlin") }, in which case the block is not executed.
    2. Context object

      • this: convenient for accessing the receiver directly inside the scope
      • it: makes it clearer which members belong to the scope and which do not
    3. Return value

      • The context object this: enables chained calls.
      • The lambda result: returns the value of the expression, which can be combined with other scope functions for extra flexibility (see the sketch after the apply example below).
        
        // Example: chained calls with apply
        class Person {
            var name = ""
            var age = 0
        }
        fun main(args: Array<String>) {
            val person = Person().apply { name = "小明" }.apply { age = 25 }
            println("${person.name},${person.age}")
        }
        // Output: 小明,25
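
        Because takeIf returns null when its predicate fails and let returns the lambda result, the two compose naturally through the safe-call operator. A minimal sketch of my own (not from the original article):

        fun main(args: Array<String>) {
            val input = "42"
            val result = input.toIntOrNull()
                ?.takeIf { it > 0 }           // keep only positive numbers
                ?.let { "positive: $it" }     // map to a lambda result
                ?: "not a positive number"
            println(result)
        }
        // Output: positive: 42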
        

    The brief comparison below makes it easier to tell the functions apart and to pick the right one for a given situation.

    Scope functions at a glance (a small contrast sketch follows the list):

    • run: returns the lambda result
    • with: this as context object, returns the lambda result
    • T.run: supports null checks, this as context object, returns the lambda result
    • T.let: supports null checks, it as context object, returns the lambda result
    • T.also: supports null checks, it as context object, returns this (i.e. T, the object referred to as it)
    • T.apply: supports null checks, this as context object, returns this (i.e. T, the object referred to as this)
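
    To make the return-value difference concrete, here is a small sketch of my own (not from the article) contrasting also and let on the same receiver:

    fun main() {
        val list = mutableListOf(1, 2, 3)
        val fromAlso = list.also { it.add(4) }   // also returns the receiver itself
        val fromLet = list.let { it.sum() }      // let returns the lambda result
        println(fromAlso)   // [1, 2, 3, 4]
        println(fromLet)    // 10
    }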

    Special scope functions at a glance (a combined sketch follows the list):

    • T.takeIf: supports null checks, it as context object, the lambda returns a Boolean; if the lambda returns true the function returns this, otherwise null
    • T.takeUnless: supports null checks, it as context object, the lambda returns a Boolean; if the lambda returns true the function returns null, otherwise this
    • repeat: executes the given function action the specified number of times (times), with the index running from 0 to times - 1
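
    As a combined sketch (my own, not from the article), repeat and takeIf can be used together; here only the even indices among 0..4 are printed:

    fun main() {
        repeat(5) { index ->
            index.takeIf { it % 2 == 0 }?.let { println(it) }
        }
    }
    // Output: 0, 2 and 4, each on its own line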

    References

    Official documentation: https://www.kotlincn.net/docs/reference/scope-functions.html
    Medium, Elye: https://medium.com/@elye.project/mastering-kotlin-standard-functions-run-with-let-also-and-apply-9cd334b0ef84
    CSDN, george_zyf: https://blog.csdn.net/android_zyf/article/details/82496983


    Object is used by Prototype as a namespace; that is, it just keeps a few new methods together, which are intended for namespaced access (i.e. starting with “Object.”).

    My understanding of "namespace" here is that it is roughly like a static class in C#, i.e. a holder for utility functions, and not the same concept as a C# namespace: in C# you never call a method directly on a namespace, you always go through a type first and then call the method. It is, however, somewhat similar to a namespace in C++.

     

    Methods Prototype adds to Object: clone, extend, inspect, isArray, isElement, isFunction, isHash, isNumber, isString, isUndefined, keys, toHTML, toJSON, toQueryString, values

     

     

    The inspect method:

    The toJSON method:

    Note the recursive call here, var value = toJSON(object[property]); in the end a JSON-formatted string representation is returned. An example follows:

    The toQueryString method:

    It builds a Hash object from the given object, then calls that Hash object's toQueryString method and returns the result; toQueryString is covered in more detail in the section on Hash.

    This method is frequently used when calling Ajax.Request. An example follows:

    The toHTML method:

    If the object argument passed in is undefined or null, an empty string is returned:

    alert(Object.toHTML())

    alert(Object.toHTML(null))

    If the object defines its own toHTML method, that method is called; otherwise String's static interpret method is called, which simply checks whether the object is null, returning '' if it is and otherwise calling the object's toString method and returning that result.

     

    An example follows:

     

    The keys and values methods:

    The example makes these clear, so not much more needs to be said:

     

    The clone method:

    {} is the empty object literal, equivalent to new Object().

    The isXXX methods need no further explanation.

     

    BERT model structure: a look at the source

    2020-06-18 10:05:02

    Yesterday a colleague suddenly asked me: "In the figure in the BERT paper several transformer blocks are drawn, but does one layer actually correspond to just a single one?" I was a bit thrown at first, but after reading the source code it became clear, so here is a write-up.

    The full BertModel code

    class BertModel(BertPreTrainedModel):
        """
    
        The model can behave as an encoder (with only self-attention) as well
        as a decoder, in which case a layer of cross-attention is added between
        the self-attention layers, following the architecture described in `Attention is all you need`_ by Ashish Vaswani,
        Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.
    
        To behave as an decoder the model needs to be initialized with the
        :obj:`is_decoder` argument of the configuration set to :obj:`True`; an
        :obj:`encoder_hidden_states` is expected as an input to the forward pass.
    
        .. _`Attention is all you need`:
            https://arxiv.org/abs/1706.03762
    
        """
    
        def __init__(self, config):
            super().__init__(config)
            self.config = config
    
            self.embeddings = BertEmbeddings(config)
            self.encoder = BertEncoder(config)
            self.pooler = BertPooler(config)
    
            self.init_weights()
    
        def get_input_embeddings(self):
            return self.embeddings.word_embeddings
    
        def set_input_embeddings(self, value):
            self.embeddings.word_embeddings = value
    
        def _prune_heads(self, heads_to_prune):
            """ Prunes heads of the model.
                heads_to_prune: dict of {layer_num: list of heads to prune in this layer}
                See base class PreTrainedModel
            """
            for layer, heads in heads_to_prune.items():
                self.encoder.layer[layer].attention.prune_heads(heads)
    
        @add_start_docstrings_to_callable(BERT_INPUTS_DOCSTRING)
        def forward(
            self,
            input_ids=None,
            attention_mask=None,
            token_type_ids=None,
            position_ids=None,
            head_mask=None,
            inputs_embeds=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
        ):
            r"""
        Return:
            :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
            last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
                Sequence of hidden-states at the output of the last layer of the model.
            pooler_output (:obj:`torch.FloatTensor`: of shape :obj:`(batch_size, hidden_size)`):
                Last layer hidden-state of the first token of the sequence (classification token)
                further processed by a Linear layer and a Tanh activation function. The Linear
                layer weights are trained from the next sentence prediction (classification)
                objective during pre-training.
    
                This output is usually *not* a good summary
                of the semantic content of the input, you're often better with averaging or pooling
                the sequence of hidden-states for the whole input sequence.
            hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
                Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
                of shape :obj:`(batch_size, sequence_length, hidden_size)`.
    
                Hidden-states of the model at the output of each layer plus the initial embedding outputs.
            attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
                Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
                :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
    
                Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
                heads.
    
        Examples::
    
            from transformers import BertModel, BertTokenizer
            import torch
    
            tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
            model = BertModel.from_pretrained('bert-base-uncased')
    
            input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
            outputs = model(input_ids)
    
            last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple
    
            """
    
            if input_ids is not None and inputs_embeds is not None:
                raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
            elif input_ids is not None:
                input_shape = input_ids.size()
            elif inputs_embeds is not None:
                input_shape = inputs_embeds.size()[:-1]
            else:
                raise ValueError("You have to specify either input_ids or inputs_embeds")
    
            device = input_ids.device if input_ids is not None else inputs_embeds.device
    
            if attention_mask is None:
                attention_mask = torch.ones(input_shape, device=device)
            if token_type_ids is None:
                token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
    
            # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
            # ourselves in which case we just need to make it broadcastable to all heads.
            if attention_mask.dim() == 3:
                extended_attention_mask = attention_mask[:, None, :, :]
            elif attention_mask.dim() == 2:
                # Provided a padding mask of dimensions [batch_size, seq_length]
                # - if the model is a decoder, apply a causal mask in addition to the padding mask
                # - if the model is an encoder, make the mask broadcastable to [batch_size, num_heads, seq_length, seq_length]
                if self.config.is_decoder:
                    batch_size, seq_length = input_shape
                    seq_ids = torch.arange(seq_length, device=device)
                    causal_mask = seq_ids[None, None, :].repeat(batch_size, seq_length, 1) <= seq_ids[None, :, None]
                    causal_mask = causal_mask.to(
                        attention_mask.dtype
                    )  # causal and attention masks must have same type with pytorch version < 1.3
                    extended_attention_mask = causal_mask[:, None, :, :] * attention_mask[:, None, None, :]
                else:
                    extended_attention_mask = attention_mask[:, None, None, :]
            else:
                raise ValueError(
                    "Wrong shape for input_ids (shape {}) or attention_mask (shape {})".format(
                        input_shape, attention_mask.shape
                    )
                )
    
            # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
            # masked positions, this operation will create a tensor which is 0.0 for
            # positions we want to attend and -10000.0 for masked positions.
            # Since we are adding it to the raw scores before the softmax, this is
            # effectively the same as removing these entirely.
            extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)  # fp16 compatibility
            extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
    
            # If a 2D ou 3D attention mask is provided for the cross-attention
            # we need to make broadcastabe to [batch_size, num_heads, seq_length, seq_length]
            if self.config.is_decoder and encoder_hidden_states is not None:
                encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size()
                encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)
                if encoder_attention_mask is None:
                    encoder_attention_mask = torch.ones(encoder_hidden_shape, device=device)
    
                if encoder_attention_mask.dim() == 3:
                    encoder_extended_attention_mask = encoder_attention_mask[:, None, :, :]
                elif encoder_attention_mask.dim() == 2:
                    encoder_extended_attention_mask = encoder_attention_mask[:, None, None, :]
                else:
                    raise ValueError(
                        "Wrong shape for encoder_hidden_shape (shape {}) or encoder_attention_mask (shape {})".format(
                            encoder_hidden_shape, encoder_attention_mask.shape
                        )
                    )
    
                encoder_extended_attention_mask = encoder_extended_attention_mask.to(
                    dtype=next(self.parameters()).dtype
                )  # fp16 compatibility
                encoder_extended_attention_mask = (1.0 - encoder_extended_attention_mask) * -10000.0
            else:
                encoder_extended_attention_mask = None
    
            # Prepare head mask if needed
            # 1.0 in head_mask indicate we keep the head
            # attention_probs has shape bsz x n_heads x N x N
            # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
            # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
            if head_mask is not None:
                if head_mask.dim() == 1:
                    head_mask = head_mask.unsqueeze(0).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)
                    head_mask = head_mask.expand(self.config.num_hidden_layers, -1, -1, -1, -1)
                elif head_mask.dim() == 2:
                    head_mask = (
                        head_mask.unsqueeze(1).unsqueeze(-1).unsqueeze(-1)
                    )  # We can specify head_mask for each layer
                head_mask = head_mask.to(
                    dtype=next(self.parameters()).dtype
                )  # switch to fload if need + fp16 compatibility
            else:
                head_mask = [None] * self.config.num_hidden_layers
    
            embedding_output = self.embeddings(
                input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
            )
            encoder_outputs = self.encoder(
                embedding_output,
                attention_mask=extended_attention_mask,
                head_mask=head_mask,
                encoder_hidden_states=encoder_hidden_states,
                encoder_attention_mask=encoder_extended_attention_mask,
            )
            sequence_output = encoder_outputs[0]
            pooled_output = self.pooler(sequence_output)
    
            outputs = (sequence_output, pooled_output,) + encoder_outputs[
                1:
            ]  # add hidden_states and attentions if they are here
            return outputs  # sequence_output, pooled_output, (hidden_states), (attentions).
    

    That is the entire BertModel code; let's look at one part of it.
    In __init__ there is only a single encoder (self.encoder = BertEncoder(config)), which means the whole BERT model contains exactly one encoder.

    The BertEncoder code

    class BertEncoder(nn.Module):
        def __init__(self, config):
            super().__init__()
            self.output_attentions = config.output_attentions
            self.output_hidden_states = config.output_hidden_states
            self.layer = nn.ModuleList([BertLayer(config) for _ in range(config.num_hidden_layers)])
    
        def forward(
            self,
            hidden_states,
            attention_mask=None,
            head_mask=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
        ):
            all_hidden_states = ()
            all_attentions = ()
            for i, layer_module in enumerate(self.layer):
                if self.output_hidden_states:
                    all_hidden_states = all_hidden_states + (hidden_states,)
    
                layer_outputs = layer_module(
                    hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
                )
                hidden_states = layer_outputs[0]
    
                if self.output_attentions:
                    all_attentions = all_attentions + (layer_outputs[1],)
    
            # Add last layer
            if self.output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)
    
            outputs = (hidden_states,)
            if self.output_hidden_states:
                outputs = outputs + (all_hidden_states,)
            if self.output_attentions:
                outputs = outputs + (all_attentions,)
            return outputs  # last-layer hidden state, (all hidden states), (all attentions)
    

    Clearly, the forward method iterates over self.layer with a for loop, which means multiple layer modules are stacked on top of each other. So let's first look at what self.layer is.

    self.layer

    self.layer is a list of BertLayer modules; with the standard config.num_hidden_layers (12 for BERT-base) that means 12 stacked BertLayer blocks.

    BertLayer

    class BertLayer(nn.Module):
        def __init__(self, config):
            super().__init__()
            self.attention = BertAttention(config)
            self.is_decoder = config.is_decoder
            if self.is_decoder:
                self.crossattention = BertAttention(config)
            self.intermediate = BertIntermediate(config)
            self.output = BertOutput(config)
    
        def forward(
            self,
            hidden_states,
            attention_mask=None,
            head_mask=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
        ):
            self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
            attention_output = self_attention_outputs[0]
            outputs = self_attention_outputs[1:]  # add self attentions if we output attention weights
    
            if self.is_decoder and encoder_hidden_states is not None:
                cross_attention_outputs = self.crossattention(
                    attention_output, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
                )
                attention_output = cross_attention_outputs[0]
                outputs = outputs + cross_attention_outputs[1:]  # add cross attentions if we output attention weights
    
            intermediate_output = self.intermediate(attention_output)
            layer_output = self.output(intermediate_output, attention_output)
            outputs = (layer_output,) + outputs
            return outputs
    

    Each of these is exactly a transformer block.

    Summary

    So BERT contains a single encoder, and that one encoder stacks 12 transformer blocks (BertLayer).
