  • Char2Wav: End-to-End Speech Synthesis(端到端语音合成,论文翻译)

    原论文地址:https://openreview.net/forum?id=B1VWyySKx

     

    CHAR2WAV: END-TO-END SPEECH SYNTHESIS

    Jose Sotelo, Soroush Mehri, Kundan Kumar, Joao Felipe Santos, Kyle Kastner,
    Aaron Courville, Yoshua Bengio
    Université de Montréal
    IIT Kanpur
    INRS-EMT
    CIFAR Fellow
    Senior CIFAR Fellow

     

    摘要

     

    我们提出一种端到端的用于语音合成的模型 Char2Wav,其有两个组成部分:一个读取器(reader)和一个神经声码器。该读取器是一个带有注意力(attention)的编码器-解码器模型。其中编码器是一个以文本或音素作为输入的双向循环神经网络(RNN),而解码器则是一个带有注意力的循环神经网络,它会生成声码器声学特征。神经声码器是 SampleRNN 的一种条件式扩展,其可以根据中间表征(intermediate representations)生成原始的声波样本。与用于语音合成的传统模型不同,Char2Wav 可以学习直接根据文本生成音频。

     

    1 介绍

    语音合成的主要任务是将文本映射为音频信号。语音合成有两个主要目标:可理解性和自然度。可理解性是指合成音频的清晰度,即听者能够在多大程度上提取出原始信息。自然度则描述了可理解性所无法涵盖的那部分信息,比如整体的聆听舒适度、全局的风格一致性、地域或语言层面的微妙差异等等。

    传统的语音合成方法是将这个任务分成两个阶段来完成的。第一个阶段被称为前端(frontend),负责将文本转换为语言特征,这些特征通常包括音素、音节、词、短语和句子层面的特征(Zen, 2006; Zen et al., 2013; van den Oord et al., 2016)。第二个阶段被称为后端(backend),以前端所生成的语言特征为输入来生成对应的声音。WaveNet(van den Oord et al., 2016)就是一种可实现高质量的「神经后端(neural backend)」的方法。要更加详细地了解传统的语音合成模型,我们推荐参阅 Taylor (2009)。

    定义良好的语言特征通常非常耗时,而且因语言而异。在本论文中,我们将前端和后端整合到了一起,以端到端的方式学习整个过程。这一流程消除了对专业语言学知识的需求,从而移除了为新语言创建合成器时所面临的一个主要瓶颈。我们使用一个强大的模型来从数据中学习这类信息。

     

    2 相关研究

    基于注意力的模型之前已经在机器翻译(Cho et al., 2014; Bahdanau et al., 2015)、语音识别(Chorowski et al., 2015; Chan et al., 2016)和计算机视觉(Xu et al., 2015)等领域得到了应用。我们的工作深受 Alex Graves(Graves, 2013; 2015)工作的影响。在一次客座讲座中,Graves 展示了一个使用注意机制的语音合成模型,这是他之前在手写生成方面研究成果的延伸。不幸的是,这项延伸工作没有发表,所以我们无法将我们的方法与他的成果进行直接比较。但是,他的结果给了我们关键的启发,我们也希望我们的成果能有助于端到端语音合成的进一步发展。

     

    3 模型描述

    3.1 读取器

    我们采用了 Chorowski et al. (2015) 的标记法。一个基于注意力的循环序列生成器(ARSG,attention-based recurrent sequence generator)是指一种基于输入序列 X 生成序列 Y = (y_1, ..., y_T) 的循环神经网络。X 会先被一个编码器预处理,输出序列 h = (h_1, ..., h_L)。在本研究中,输出 Y 是声学特征的序列,而 X 则是要生成语音所对应的文本或音素序列。此外,该编码器是一个双向循环网络。

    图 1:Char2Wav 架构

    在第 i 步,ARSG 重点关注 h 并生成 yi:

    公式 (1):

    α_i = Attend(s_{i−1}, α_{i−1}, h)
    g_i = Σ_{j=1}^{L} α_{i,j} h_j
    y_i ~ Generate(s_{i−1}, g_i)

    其中 s_{i−1} 是该生成器循环神经网络的第 i−1 个状态,而 α_i ∈ R^L 是注意权重(attention weight)或对齐(alignment)。

    在这项成果中,我们使用了由 Graves ( 2013 ) 开发的基于位置的注意机制(location-based attention mechanism)。我们有

    公式 (2):

    κ_i = κ_{i−1} + exp(κ̂_i),  β_i = exp(β̂_i),  ρ_i = exp(ρ̂_i)

    而给定一个调节序列 h 的长度 L,对 j = 1, …, L 我们有:

    公式 (3):

    α_{i,j} = Σ_{k=1}^{K} ρ_{i,k} · exp(−β_{i,k} (κ_{i,k} − j)²)

    其中 κ_i、β_i 和 ρ_i 分别表示注意窗口的位置、宽度和重要程度(κ̂_i、β̂_i、ρ̂_i 由生成器的状态计算得到),K 为高斯窗口的个数。
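    为便于理解上述注意机制的计算过程,下面给出一段示意性的 C++ 代码(并非论文的官方实现,结构体与函数名均为笔者为说明而拟):先按公式 (2) 的思路更新高斯窗口参数,再按公式 (3) 计算注意权重,并据此求出上下文向量 g_i。

    #include <cmath>
    #include <vector>

    // 一个高斯"窗口"的参数:位置 kappa、宽度 beta、重要程度 rho
    struct Window { double kappa, beta, rho; };

    // 根据 K 个窗口参数,对长度为 L 的编码序列 h 计算注意权重 alpha_{i,j},
    // 并返回上下文向量 g_i = sum_j alpha_{i,j} * h_j(h 的每个元素是维度为 dim 的向量)
    std::vector<double> attendStep(const std::vector<Window>& windows,
                                   const std::vector<std::vector<double>>& h) {
        const size_t L = h.size();
        const size_t dim = h.empty() ? 0 : h[0].size();
        std::vector<double> alpha(L, 0.0);
        for (size_t j = 0; j < L; ++j) {
            for (const Window& w : windows) {
                double diff = w.kappa - static_cast<double>(j + 1);      // 位置 j 从 1 记起
                alpha[j] += w.rho * std::exp(-w.beta * diff * diff);     // 对应公式 (3)
            }
        }
        std::vector<double> g(dim, 0.0);                                 // 上下文向量 g_i
        for (size_t j = 0; j < L; ++j)
            for (size_t d = 0; d < dim; ++d)
                g[d] += alpha[j] * h[j][d];
        return g;
    }

    // 每一步,窗口位置按 kappa_i = kappa_{i-1} + exp(kappa_hat) 单调前移(对应公式 (2));
    // kappa_hat、beta_hat、rho_hat 假定由生成器状态 s_{i-1} 经一层变换得到
    void advanceWindow(Window& w, double kappaHat, double betaHat, double rhoHat) {
        w.kappa += std::exp(kappaHat);
        w.beta = std::exp(betaHat);
        w.rho = std::exp(rhoHat);
    }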

     

    3.2 神经声码器

    使用声码器进行语音合成受到特定声码器重建质量的限制。为了获得高质量的输出,我们使用一个学习到的参数神经模块(parametric neural module)替代了该声码器。为了该目标,我们使用 SampleRNN ( Mehri et al., 2016)作为增强的函数逼近器(function approximator)。SampleRNN 最近被提出用于在音频信号这样的序列数据中建模极其长期的依存关系。SampleRNN 中的层级结构被设计来捕捉不同时间尺度中序列的动态。这对捕捉远距音频时间步骤(例如,语音信号中的词层面关系)之间的长距关联以及近距音频时间步骤的动态都是有必要的。

    我们使用同一模型的一个条件式版本,学习将声码器特征序列映射到相应的音频样本:每个声码器特征帧(feature frame)都被作为额外输入加到相应的状态中。这使得该模块能够利用过去的音频样本和声码器特征帧来生成当前的音频样本。

     

    4 训练细节

    首先,我们分别预训练读取器和神经声码器:以标准的 WORLD 声码器特征(Morise et al., 2016; Wu et al., 2016)作为读取器的训练目标和神经声码器的输入。最后,我们以端到端的方式微调整个模型。代码已在网上公开。

    GitHub 开源地址:http://github.com/sotelo/parrot

    合成语音样本地址:http://josesotelo.com/speechsynthesis

     

    5 结果

    此次我们并未提供对结果的综合的定量分析。相反,我们提供了来自模型生成的语音样本。在图 2 中,我们演示了模型生成的语音样本以及相应的文本对齐结果。

    图 2:模型生成的语音样本及其与文本的对齐情况

    REFERENCES

    Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.

     

    Jacob Benesty, M Mohan Sondhi, and Yiteng Huang. Springer handbook of speech processing. Springer Science & Business Media, 2007.

     

    Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems 28, pp. 1171–1179. Curran Associates, Inc., 2015.

     

    Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137–1155, March 2003. ISSN 1532-4435.

     

    William Chan, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964, March 2016.

     

    Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.

     

    Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems 28, pp. 577–585. Curran Associates, Inc., 2015.

     

    Yuchen Fan, Yao Qian, Frank K. Soong, and Lei He. Multi-speaker modeling and speaker adaptation for dnn-based tts synthesis. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4475–4479, April 2015.

     

    Alex Graves. Practical variational inference for neural networks. In J. Shawe-taylor, R.s. Zemel, P. Bartlett, F.c.n. Pereira, and K.q. Weinberger (eds.), Advances in Neural Information Processing Systems 24, pp. 2348–2356. 2011.

     

    Alex Graves. Generating sequences with recurrent neural networks. 08 2013. URL https://arxiv.org/abs/1308.0850.

     

    Alex Graves. Hallucination with recurrent neural networks, 2015. URL https://www.youtube.com/watch?v=-yX1SYeDHbg.

     

    Alex Graves and Navdeep Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In Tony Jebara and Eric P. Xing (eds.), Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764–1772. JMLR Workshop and Conference Proceedings,2014.

     

    Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. 10 2014. URL https://arxiv.org/abs/1410.5401.

     

    Andrew J Hunt and Alan W Black. Unit selection in a concatenative speech synthesis system using a large speech database. In Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on, volume 1, pp. 373–376. IEEE,1996.

     

    Zeyu Jin, Adam Finkelstein, Stephen DiVerdi, Jingwan Lu, and Gautham J Mysore. Cute: A concatenative method for voice conversion using exemplar-based unit selection. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pp. 5660–5664. IEEE,2016.

     

    Alexander Rosenberg Johansen, Jonas Meinertz Hansen, Elias Khazen Obeid, Casper Kaae Sønderby, and Ole Winther. Neural machine translation with characters and hierarchical encoding. 10 2016. URL https://arxiv.org/abs/1610.06550.

     

    Kaisheng Yao and Geoffrey Zweig. Sequence-to-sequence neural net models for grapheme-to-phoneme conversion. ISCA - International Speech Communication Association, May 2015.

     

    Simon King. Measuring a decade of progress in text-to-speech. Loquens, 1(1), 1 2014. ISSN 2386-2637.

     

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.

    Bo Li and Heiga Zen. Multi-language multi-speaker acoustic modeling for lstm-rnn based statistical parametric speech synthesis. 2016.

     

    Zhen-Hua Ling, Shiyin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen Meng, and Li Deng. Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends. IEEE Signal Processing Magazine, 32:35–52, 2015.

     

    Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. Samplernn: An unconditional end-to-end neural audio generation model. 12 2016. URL https://arxiv.org/abs/1612.07837.

     

    Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. Recurrent models of visual attention. In Z. Ghahramani, M. Welling, C. Cortes, N.d. Lawrence, and K.q. Weinberger (eds.), Advances in Neural Information Processing Systems 27, pp. 2204–2212. Curran Associates, Inc., 2014.

     

    Masanori Morise, Fumiya Yokomori, and Kenji Ozawa. World: A vocoder-based high-quality speech synthesis system for real-time applications. IEICE Transactions on Information and Systems, E99.D(7):1877–1884, 2016.

     

    Luis A. Pineda, Hayde Castellanos, Javier Cuétara, Lucian Galescu, Janet Juárez, Joaquim Llisterri, Patricia Pérez, and Luis Villaseñor. The corpus DIMEx100: Transcription and evaluation. Lang. Resour. Eval., 44(4):347–370, December 2010.

     

    Kanishka Rao, Fuchun Peng, Hasim Sak, and Francoise Beaufays. Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4225–4229, April 2015. doi: 10.1109/ICASSP.2015.7178767.

     

    Ilya Sutskever, Oriol Vinyals, and Quoc Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N.d. Lawrence, and K.q. Weinberger (eds.), Advances in Neural Information Processing Systems 27, pp. 3104–3112. Curran Associates, Inc., 2014.

     

    Paul Taylor. Text-to-Speech Synthesis. Cambridge University Press, Cambridge, 2009.

    Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016. URL http://arxiv.org/abs/1605.02688.

     

    Keiichi Tokuda and Heiga Zen. Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4215–4219, 2015.

     

    Keiichi Tokuda and Heiga Zen. Directly modeling voiced and unvoiced components in speech waveforms by neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 5640–5644, 2016.

     

    Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, and Keiichiro Oura. Speech synthesis based on hidden markov models. Proceedings of the IEEE, 101(5): 1234–1252, May 2013. ISSN 0018-9219.

     

    Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. 09 2016. URL https://arxiv.org/abs/1609.03499.

     

    Bart van Merrienboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, and Yoshua Bengio. Blocks and fuel: Frameworks for deep learning. CoRR, abs/1506.00619, 2015. URL http://arxiv.org/abs/1506.00619.

     

    Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, and Simon King. A study of speaker adaptation for dnn-based speech synthesis. In INTERSPEECH, pp. 879–883. ISCA, 2015.

     

    Zhizheng Wu, Oliver Watts, and Simon King. Merlin: An Open Source Neural Network Speech Synthesis System. 7 2016.

     

    Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In David Blei and Francis Bach (eds.), Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 2048–2057. JMLR Workshop and Conference Proceedings, 2015.

     

    Junichi Yamagishi. English multi-speaker corpus for cstr voice cloning toolkit, 2012. URL http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html.

     

    Heiga Zen. An example of context-dependent label format for hmm-based speech synthesis in english, 2006. URL http://hts.sp.nitech.ac.jp/?Download.

     

    Heiga Zen. Acoustic modeling in statistical parametric speech synthesis - from hmm to lstm-rnn. In Proc. MLSLP, 2015. Invited paper.

     

    Heiga Zen, Keiichi Tokuda, and Alan W Black. Statistical parametric speech synthesis. Speech Communication, 51(11):1039–1064, 2009.

     

    Heiga Zen, Andrew Senior, and Mike Schuster. Statistical parametric speech synthesis using deep neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7962–7966, 2013.

     

    Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, and Przemysław Szczepaniak. Fast, compact, and high quality lstm-rnn based statistical parametric speech synthesizers for mobile devices. In Proc. Interspeech, San Francisco, CA, USA, 2016.

    转载于:https://my.oschina.net/stephenyng/blog/1621123

  • 在一个 win Forms 应用中嵌入以及播放 WAV 音频文件(翻译)


    本文原文链接:http://www.c-sharpcorner.com/UploadFile/scottlysle/PlayWav01142007012501AM/PlayWav.aspx

    在一个win Forms应用中嵌入以及播放WAV音频文件

     

     

    背景

    本文描述如何将WAV音频文件嵌入到应用程序中,以及如何使用System.Media类库来播放这些文件。由于使用了System.Media类库,此示例不依赖于引用winmm.dll来播放音频文件,因此可以用更少的代码实现这个功能。

    音频文件将作为资源嵌入到应用程序中,安装包中不再需要单独打包外部WAV文件。同时,使用嵌入资源也可以避免应用程序安装之后,其所依赖的外部文件被移动或删除的潜在风险。



    图1:简单应用程序中的示例窗体

    开始

     

    在下载的解决方案中包含一个名叫“PlayWavFiles”的项目。项目如解决方案管理器中所示(图2)。项目中只是包含一个win forms 应用程序中默认的引用;resources文件夹所示的就是被嵌入到应用程序中的资源,这些文件不会随着应用程序被打包到安装文件中。



    图2:解决方案管理器


    示例是通过Visual Studio 2005 C#来写的;在IDE中打开解决方案检查代码。

    添加音频文件资源

    添加音频文件到解决方案中,打开在解决方案管理器中的Resources.resx文件。打开之后,你会看到一个列表(图3)。你可以通过这个列表来添加所选择类型的资源到项目中。在列表中选择音频文件选项,点击“添加资源”



    图3:添加音频到资源中

    一旦你点击“添加资源”按钮后,会打开一个文件浏览器。你可以用这个文件浏览器来找到你所需的音频文件并且添加到应用程序资源中。



     
    图4:浏览音频文件

    一旦音频文件已经被添加到应用程序资源中,选择所添加的项设置其“Persistence”属性为“Embedded in .resx”。




    图5:设置被添加的音频文件的Persistence属性

     

    现在,音频文件已经添加并设置完毕,它们就可以在项目中使用了。

     代码:Main Form

     

    在应用中只有一个窗体(frmMain),窗体中包含播放嵌入音频文件所需的代码。另外,除了默认引用的类库之外,还需要在窗体代码中引用System.Media类库。

    using System;

    using System.Collections.Generic;

    using System.ComponentModel;

    using System.Data;

    using System.Drawing;

    using System.Text;

    using System.Windows.Forms;

    using System.Media;

     

    namespace PlayWavFiles

    {

        public partial class frmMain : Form

        {

            public frmMain()

            {

                InitializeComponent();

            } 


    窗体中只有三个按钮的点击事件处理程序。Exit 按钮用来关闭应用程序:

    private void btnExit_Click(object sender, EventArgs e)

    {

        Application.Exit();

    }

     

    第二个按钮点击事件用于播放嵌入的音频文件

    private void btnDemo1_Click(object sender, EventArgs e)

    {

        try

        {

            SoundPlayer sndplayr = new

            SoundPlayer(PlayWavFiles.Properties.Resources.BuzzingBee);

            sndplayr.Play();

        }

        catch (Exception ex)

        {

            MessageBox.Show(ex.Message + ": " + ex.StackTrace.ToString(), "Error");

        }

    }

     
    点击事件中创建了一个System.Media的SoundPlayer实例,并让播放器从应用程序资源中加载一个音频文件。资源加载到播放器之后,调用播放器的Play函数,音频文件即被播放。

    下一个点击事件处理中,示例循环播放一个嵌入的音频文件。

    private void btnDemo2_Click(object sender, EventArgs e)

    {

        try

        {

            SoundPlayer sndplayr = new

            SoundPlayer(PlayWavFiles.Properties.Resources.LoopyMusic); 

            if (btnDemo2.Text == "Demo WAV 2")

            {

                sndplayr.PlayLooping();

                btnDemo2.Text = "STOP";

            }

            else

            {

                sndplayr.Stop();

                btnDemo2.Text = "Demo WAV 2";

            }

        }

        catch (Exception ex)

        {

            MessageBox.Show(ex.Message + ": " + ex.StackTrace.ToString(), "Error");

        }

    }


    示例以同样简单的方式运行。最后的点击事件处理中调用“PlayLooping”而不是"Play" 。通过这个函数以及player"Stop" 函数,音频文件将会被循环播放或停止播放。为了支持这个特性,当循环播放开始,按钮的文字将会变成"STOP",当用户点击"STOP"按钮,player "Stop"函数将会被调用,按钮的文字将会变成"Demo WAV 2"。点击按钮之后会循环播放或者停止播放并且更新按钮的文字。

     

    上面就是关于播放嵌入音频文件的全部代码了。

    摘要

    示例描述了如何将音频文件嵌入到应用程序的资源中,以及如何利用System.Media类库播放(含循环播放)嵌入资源中的音频文件。类似的功能也可以通过调用winmm.dll来实现,但本示例使用的代码更少,也无需引用非托管的winmm.dll。

    转载于:https://www.cnblogs.com/TtTiCk/archive/2007/04/14/712791.html

  • WAV文件格式解析

    本文通过翻译分析了WAV的文件格式。WAV为微软公司(Microsoft)开发的一种声音文件格式,它符合RIFF(Resource Interchange File Format)文件规范,用于保存Windows平台的音频信息资源,被Windows平台及其应用程序所广泛支持。

    来源:

    http://www.codeguru.com/cpp/g-m/multimedia/audio/article.php/c8935/PCM-Audio-and-Wave-Files.htm#page-1

    源程序下载地址:

    http://www.codeguru.com/dbfiles/get_file/WaveFun_Src.zip?id=8935&lbl=WAVEFUN_SRC_ZIP

    http://www.codeguru.com/dbfiles/get_file/WaveFun_Bin.zip?id=8935&lbl=WAVEFUN_BIN_ZIP


    WAVE文件支持很多不同的位深度(bit resolution)、采样率和多声道音频。WAVE是PC机上存储PCM音频最流行的文件格式,基本上可以等同于原始数字音频。

    WAVE文件为了与IFF保持一致,数据采用“chunk”来存储。因此,如果想要在WAVE文件中补充一些新的信息,只需要在新chunk中添加信息,而不需要改变整个文件格式。这也是设计IFF最初的目的。

    WAVE文件是很多不同的chunk的集合,但是对于一个基本的WAVE文件而言,以下三种chunk是必不可少的:

    1. 'RIFF'('WAVE')chunk
    2. "fmt" chunk
    3. 'data' chunk

    使用WAVE文件的应用程序必须具有读取以上三种chunk信息的能力(其余可选chunk可以忽略);但如果程序要复制WAVE文件,就必须拷贝文件中所有的chunk。

    文件中第一个chunk总是RIFF chunk,fmt chunk应出现在data chunk之前;除此之外,其他chunk的顺序没有严格的限制。

    以下是一个最基本的WAVE文件,包含三种必要chunk


    文件组织形式:

    1. 文件头

    RIFF/WAV文件标识段

     声音数据格式说明段

    2. 数据体:

    由 PCM(脉冲编码调制)格式表示的样本组成。



    描述WAVE文件的基本单元是“sample”,一个sample代表采样一次得到的数据。因此如果用44KHz采样,每秒将得到44000个sample。每个sample可以用8位、16位、24位,甚至32位表示(位数没有限制,只要是8的整数倍即可),位数越高,音频质量越好。

    此处有一个值得注意的细节,8位代表无符号的数值,而16位或16位以上代表有符号的数值。

    例如,如果有一个10bit的样本,由于sample位数要求是8的倍数,我们就需要把它填充到16位:原始的10bit数据左对齐,占据第6~15位,第0~5位补0。这就是“数据左对齐、低位补零”的原则。
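    用一小段 C++ 可以直观地说明这种左对齐补零(示意代码,函数名为笔者自拟):

    #include <cstdint>

    // 将一个 10 bit 的采样值装入 16 bit 容器:数据左对齐占第 6~15 位,第 0~5 位补 0
    uint16_t pack10BitSample(uint16_t sample10)
    {
        return static_cast<uint16_t>((sample10 & 0x03FF) << 6);
    }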

    上述只是单声道的情况。如果要处理多声道,就需要在任意给定时刻给出多个sample。例如对于立体声,在某一时刻,我们需要分辨出哪个sample属于左声道,哪个属于右声道,因此需要一次读写两个sample。

    假如以44KHz采样立体声音频,那么每秒需要读写44K × 2个sample。给出公式:

    每秒数据大小(字节)=采样率 * 声道数 * sample比特数 / 8
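    下面是一个按照该公式估算数据量和音频时长的小例子(示意代码,函数名与数值仅作演示):

    #include <cstdint>
    #include <cstdio>

    // 每秒数据大小(字节)= 采样率 * 声道数 * 每个sample的位数 / 8
    uint32_t bytesPerSecond(uint32_t sampleRate, uint16_t channels, uint16_t bitsPerSample)
    {
        return sampleRate * channels * bitsPerSample / 8;
    }

    int main()
    {
        // 44100 Hz、双声道、16 bit 的 PCM:每秒 176400 字节
        uint32_t rate = bytesPerSecond(44100, 2, 16);
        std::printf("bytes per second: %u\n", (unsigned)rate);
        // data chunk 的大小除以该速率,即得音频时长(秒)
        std::printf("duration of 1764000-byte data chunk: %u s\n", (unsigned)(1764000 / rate));
        return 0;
    }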

    处理多声道音频时,每个声道的样本是交叉存储的。我们把左右声道数据交叉存储在一起:先存储第一个sample的左声道数据,然后存储第一个sample的右声道数据。
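    下面几行 C++ 演示如何把左右声道的 sample 交叉排列成 L0 R0 L1 R1 … 的顺序(示意代码,函数名为笔者自拟):

    #include <cstdint>
    #include <vector>

    // 把左右声道的样本交叉排列:先存左声道,再存右声道,依次交替
    std::vector<int16_t> interleaveStereo(const std::vector<int16_t>& left,
                                          const std::vector<int16_t>& right)
    {
        std::vector<int16_t> frames;
        frames.reserve(left.size() * 2);
        for (size_t i = 0; i < left.size() && i < right.size(); ++i) {
            frames.push_back(left[i]);   // 左声道第 i 个 sample
            frames.push_back(right[i]);  // 右声道第 i 个 sample
        }
        return frames;
    }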

    当一个设备需要重现存储的立体声(或任意多声道)音频时,它需要同时处理所有声道;同一时刻各个声道的sample合在一起,称为一个样本帧(sample frame)。

    下面将介绍如何使用C++处理WAVE文件。

    RIFF头chunk表示如下:

    struct RIFF_HEADER
    {
        TCHAR szRiffID[4];        // 'R','I','F','F'
        DWORD dwRiffSize;

        TCHAR szRiffFormat[4];    // 'W','A','V','E'
    };

    第二个块是fmt chunk,它用来描述WAVE文件的特性,例如每个sample的位数、声道数等。可以使用结构体来描述fmt chunk:

    struct WAVE_FORMAT
    {
        WORD wFormatTag;
        WORD wChannels;
        DWORD dwSamplesPerSec;
        DWORD dwAvgBytesPerSec;
        WORD wBlockAlign;
        WORD wBitsPerSample;
    };

    struct FMT_BLOCK
    {
        TCHAR szFmtID[4];    // 'f','m','t',' ' please note the space character at the fourth location.
        DWORD dwFmtSize;
        WAVE_FORMAT wavFormat;
    };

    最后,描述包含实际声音数据的data chunk:

    struct DATA_BLOCK
    {
        TCHAR szDataID[4];    // 'd','a','t','a'
        DWORD dwDataSize;
    };

    以上就是一个WAV文件的三个最基本的chunk,也可以有很多可选chunk位于fmt block和data block之间,下面是一个可选chunk的例子(note chunk)。

    struct NOTE_CHUNK
    {
        TCHAR ID[4];    // 'note'
        long chunkSize;
        long dwIdentifier;
        TCHAR dwText[];
    };
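    在给出原文之前,这里补充一个简短的 C++ 读取示例(示意代码,并非原文的一部分):按上面介绍的 chunk 布局顺序扫描文件,找到 fmt chunk 后打印采样率、声道数和位数,其余 chunk 一律跳过。为简单起见,示例假定运行平台为小端字节序,且不做完整的容错处理。

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <fstream>

    int main(int argc, char* argv[])
    {
        if (argc < 2) { std::fprintf(stderr, "usage: %s file.wav\n", argv[0]); return 1; }
        std::ifstream in(argv[1], std::ios::binary);
        if (!in) { std::fprintf(stderr, "cannot open %s\n", argv[1]); return 1; }

        // RIFF 头:'R','I','F','F' + 大小 + 'W','A','V','E'
        char riff[4], wave[4];
        uint32_t riffSize = 0;
        in.read(riff, 4);
        in.read(reinterpret_cast<char*>(&riffSize), 4);
        in.read(wave, 4);
        if (!in || std::memcmp(riff, "RIFF", 4) != 0 || std::memcmp(wave, "WAVE", 4) != 0) {
            std::fprintf(stderr, "not a RIFF/WAVE file\n");
            return 1;
        }

        // 依次扫描各个 chunk:8 字节的 chunk 头(ID + 大小),后面跟 chunk 数据
        char id[4];
        uint32_t size = 0;
        while (in.read(id, 4) && in.read(reinterpret_cast<char*>(&size), 4)) {
            if (std::memcmp(id, "fmt ", 4) == 0) {
                uint16_t formatTag, channels, blockAlign, bitsPerSample;
                uint32_t samplesPerSec, avgBytesPerSec;
                in.read(reinterpret_cast<char*>(&formatTag), 2);
                in.read(reinterpret_cast<char*>(&channels), 2);
                in.read(reinterpret_cast<char*>(&samplesPerSec), 4);
                in.read(reinterpret_cast<char*>(&avgBytesPerSec), 4);
                in.read(reinterpret_cast<char*>(&blockAlign), 2);
                in.read(reinterpret_cast<char*>(&bitsPerSample), 2);
                std::printf("channels=%u  sampleRate=%u  bitsPerSample=%u\n",
                            (unsigned)channels, (unsigned)samplesPerSec, (unsigned)bitsPerSample);
                in.seekg(size > 16 ? size - 16 : 0, std::ios::cur);  // 跳过 fmt 的扩展部分
            } else if (std::memcmp(id, "data", 4) == 0) {
                std::printf("data chunk size=%u bytes\n", (unsigned)size);
                in.seekg(size, std::ios::cur);                       // 跳过音频数据
            } else {
                in.seekg(size, std::ios::cur);                       // 跳过其他可选 chunk
            }
            if (size % 2) in.seekg(1, std::ios::cur);                // chunk 按 2 字节对齐
        }
        return 0;
    }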
    

    原文:


    The WAVE File Format

    The WAVE File Format supports a variety of bit resolutions, sample rates, and channels of audio. I would say that this is the most popular format for storing PCM audio on the PC and has become synonymous with the term "raw digital audio."

    The WAVE file format is based on Microsoft's version of the Electronic Arts Interchange File Format method for storing data. In keeping with the dictums of IFF, data in a Wave file is stored in many different "chunks." So, if a vendor wants to store additional information in a Wave file, he just adds info to new chunks instead of trying to tweak the base file format or come up with his own proprietary file format. That is the primary goal of the IFF.

    As mentioned earlier, a WAVE file is a collection of a number of different types of chunks. But, there are three chunks that are required to be present in a valid wave file:

    1.   'RIFF', 'WAVE' chunk

    2.   "fmt" chunk

    3.   'data' chunk

    All other chunks are optional. The Riff wave chunk is the identifier chunk that tells us that this is a wave file. The "fmt" chunk contains important parameters describing the waveform, such as its sample rate, bits per sample, and so forth. The Data chunk contains the actual waveform data.

    An application that uses a WAVE file must be able to read the three required chunks, although it can ignore the optional chunks. But, all applications that perform a copy operation on wave files should copy all of the chunks in the WAVE.

    The Riff chunk is always the first chunk. The fmt chunk should be present before the data chunk. Apart from this, there are no restrictions upon the order of the chunks within a WAVE file.

    Here is an example of the layout for a minimal WAVE file. It consists of a single WAVE containing the three required chunks.

    While interpreting WAVE files, the unit of measurement used is a "sample." Literally, it is what it says. A sample represents data captured during a single sampling cycle. So, if you are sampling at 44 KHz, you will have 44 K samples. Each sample could be represented as 8 bits, 16 bits, 24 bits, or 32 bits. (There is no restriction on how many bits you use for a sample except that it has to be a multiple of 8.) To some extent, the more the number of bits in a sample, the better the quality of the audio.

    One annoying detail to note is that 8-bit samples are represented as "unsigned" values whereas 16-bit and higher are represented by "signed" values. I don't know why this discrepancy exists; that's just the way it is.

    The data bits for each sample should be left-justified and padded with 0s. For example, consider the case of a 10-bit sample (as samples must be multiples of 8, we need to represent it as 16 bits). The 10 bits should be left-justified so that they become bits 6 to 15 inclusive, and bits 0 to 5 should be set to zero.

    The analogy I have provided is for mono audio, meaning that you have just one "channel." When you deal with stereo audio, 3D audio, and so forth, you are in effect dealing with multiple channels, meaning you have multiple samples describing the audio in any given moment in time. For example, for stereo audio, at any given point in time you need to know what the audio signal was for the left channel as well as the right channel. So, you will have to read and write two samples at a time.

    Say you sample at 44 KHz for stereo audio; then effectively, you will have 44 K * 2 samples. If you are using 16 bits per sample, then given the duration of audio, you can calculate the total size of the wave file as:

    Size in bytes = sampling rate * number of channels * (bits per sample / 8) * duration in seconds

    When you are dealing with such multi-channel sounds, single sample points from each channel are interleaved. Instead of storing all of the sample points for the left channel first, and then storing all of the sample points for the right channel next, you "interleave" the two channels' samples together. You would store the first sample of the left channel. Then, you would store the first sample of the right channel, and so on.

    When a device needs to reproduce the stored stereo audio (or any multi-channel audio), it will process the left and right channels (or however many channels there are) simultaneously. This collective piece of information is called a sample frame.

    So far, you have covered the very basics of PCM audio and how it is represented in a wave file. It is time to take a look at some code and see how you can use C++ to manage wave files. Start by laying out the structures for the different chunks of a wave file.

    The first chunk is the riff header chunk and can be represented as follows. You use a TCHAR that is defined as a normal ASCII char or as a wide character depending upon whether the UNICODE directive has been set on your compiler.

    struct RIFF_HEADER
    {
        TCHAR szRiffID[4];        // 'R','I','F','F'
        DWORD dwRiffSize;

        TCHAR szRiffFormat[4];    // 'W','A','V','E'
    };

     

    I guess it is self explanatory. The second chunk is the fmt chunk. It describes the properties of the wave file, such as bits per sample, number of channels, and the like. You can use a helper structure to neatly represent the chunk as:

    struct WAVE_FORMAT
    {
        WORD wFormatTag;
        WORD wChannels;
        DWORD dwSamplesPerSec;
        DWORD dwAvgBytesPerSec;
        WORD wBlockAlign;
        WORD wBitsPerSample;
    };

    struct FMT_BLOCK
    {
        TCHAR szFmtID[4];    // 'f','m','t',' ' please note the space character at the fourth location.
        DWORD dwFmtSize;
        WAVE_FORMAT wavFormat;
    };

     

     

    struct DATA_BLOCK
    {
        TCHAR szDataID[4];    // 'd','a','t','a'
        DWORD dwDataSize;
    };

     

    That's it. That's all you need to describe a wave form. Of course, there are a lot of optional chunks that you can have (they should be before the data block and after the fmt block). Just as an example, here is an optional chunk that you could use:

    Note Chunk, used to store "comments" about the wave data:

    struct NOTE_CHUNK
    {
        TCHAR ID[4];    // 'note'
        long chunkSize;
        long dwIdentifier;
        TCHAR dwText[];
    };

  • wav文件提取数据:用 PHP 从 WAV 文件中提取音频片段(Audero Wav Extractor,翻译)

    wav文件提取数据

    Although PHP is well known for building web pages and applications, it can do more than that. I recently needed to extract a piece of audio from a WAV file on-the-fly and let the user download it through his browser. I tried to find a library that fit my needs but wasn’t successful and had to write the code myself. It was a good opportunity to study in depth how a WAV file is made. In this article I’ll give you a brief overview of the WAV file format and explain the library I developed, Audero Wav Extractor.

    尽管PHP以构建网页和应用程序而闻名,但它可以做的还不止这些。 最近,我需要即时从WAV文件中提取一段音频,然后让用户通过浏览器下载它。 我试图找到一个适合我需要的库,但没有成功,必须自己编写代码。 这是深入研究如何制作WAV文件的好机会。 在本文中,我将向您简要概述WAV文件格式,并说明我开发的库Audero Wav Extractor

    WAV格式概述 (Overview of the WAV Format)

    The Waveform Audio File Format, also known as WAVE or WAV, is a Microsoft file format standard for storing digital audio data. A WAV file is composed of a set of chunks of different types representing different sections of the audio file. You can envision the format as an HTML page: the first chunks are like the <head> section of a web page, so inside it you will find several pieces of information about the file itself, while the chunk having the audio data itself would be in the <body> section of the page. In this case, the word “chunk” refers to the data sections contained in the file.

    波形音频文件格式,也称为WAVE或WAV,是用于存储数字音频数据的Microsoft文件格式标准。 WAV文件由代表音频文件不同部分的一组不同类型的块组成。 您可以将格式设想为HTML页面:第一个块类似于网页的<head>部分,因此在其中可以找到有关文件本身的几条信息,而具有音频数据本身的块将是在页面的<body>部分中。 在这种情况下,“块”一词是指文件中包含的数据段。

    The most important format’s chunks are “RIFF”, which contains the number of bytes of the file, “Fmt”, which has vital information such as the sample rate and the number of channels, and “Data”, which actually has the audio stream data. Each chunk must have at least two field, the id and the size. Besides, every valid WAV must have at least 2 chunks: Fmt and Data. The first is usually at the beginning of the file but after the RIFF.

    最重要的格式块是“ RIFF”,其中包含文件的字节数;“ Fmt”,其具有重要的信息,例如采样率和通道数;“ Data”,其实际具有音频流数据。 每个块必须至少具有两个字段,即id和size。 此外,每个有效的WAV必须至少包含2个块:Fmt和Data。 第一个通常在文件的开头,但在RIFF之后。

    Each chunk has its own format and fields, and a field constitutes a sub-sections of the chunk. The WAV format has been underspecified in the past and this lead to files having headers that don’t follow the rule strictly. So, while you’re working with an audio, you may find one having one or more fields, or even the most important set to zero or to a wrong value.

    每个块都有其自己的格式和字段,字段构成块的子部分。WAV格式在过去没有被严格规范,这导致一些文件的头部并不严格遵循规则。因此,在处理音频时,您可能会发现某些文件的一个或多个字段,甚至是最重要的那些字段,被设置为零或错误的值。

    To give you an idea of what’s inside a chunk, the first one of each WAV file is RIFF. Its first 4 bytes contain the string “RIFF”, and the next 4 contain the file’s size minus the 8 bytes used for these two pieces of data. The final 4 bytes of the RIFF chunk contain the string “WAVE”. You might guess what’s the aim of this data. In this case, you could use them to identify if the file you’re parsing is actually a WAV file or not as I did in the setFilePath() method of the Wav class of my library.

    为了让您了解块中的内容,每个WAV文件的第一个是RIFF。 它的前4个字节包含字符串“ RIFF”,后4个字节包含文件的大小减去用于这两段数据的8个字节。 RIFF块的最后4个字节包含字符串“ WAVE”。 您可能会猜到这些数据的目的是什么。 在这种情况下,您可以使用它们来识别要解析的文件是否实际上是WAV文件,就像我在库的Wav类的setFilePath()方法中所做的setFilePath()

    Another interesting thing to explain is how the duration of a WAV file is calculated. All the information you need, can be retrieved from the two must-have chunks cited before and are: Data chunk size, sample rate, number of channels, and bits per sample. The formula to calculate the file time in seconds is the following:

    另一个有趣的解释是如何计算WAV文件的持续时间。 您需要的所有信息都可以从前面引用的两个必备数据块中检索到,分别是:数据块大小,采样率,通道数和每个采样的位数。 以秒为单位计算文件时间的公式如下:

    time = dataChunkSize / (sampleRate * channelsNumber * bitsPerSample / 8)
    

    Say we have:

    说我们有:

    dataChunkSize = 4498170
    sampleRate = 22050
    channelsNumber = 1
    bitsPerSample = 16
    
    

    Applying this values to the formula, we have:

    将此值应用于公式,我们有:

    time = 4498170 / (22050 * 1 * 16 / 8)
    

    And the result is 102 seconds (rounded).

    结果是102秒(四舍五入)。

    Explaining in depth how a WAV file is structured is outside the scope of this article. If you want to study it further, read these pages I came across when I worked on this:

    深入说明WAV文件的结构超出了本文的范围。 如果您想进一步研究它,请阅读我从事此工作时遇到的以下页面:

    什么是Audero Wav提取器 (What’s Audero Wav Extractor)

    Audero Wav Extractor is a PHP library that allows you to extract an excerpt from a WAV file. You can save the extracted excerpt to the local hard disk, download through the user's browser, or return it as a string for a later processing. The only special requirement the library has is PHP 5.3 or higher because it uses namespaces.

    Audero Wav Extractor是一个PHP库,可让您从WAV文件中提取摘录。 您可以将提取的摘录保存到本地硬盘,通过用户的浏览器下载,或将其作为字符串返回以供以后处理。 该库具有的唯一特殊要求是PHP 5.3或更高版本,因为它使用名称空间。

    All the classes of the library are inside the WavExtractor directory, but you’ll notice there is an additional directory Loader where you can find the library’s autoloader. The entry point for the developers is the AuderoWavExtractor class that has the three main methods of the project:

    库的所有类都在WavExtractor目录中,但是您会注意到还有一个附加目录Loader ,您可以在其中找到库的自动加载器。 开发人员的入口点是AuderoWavExtractor类,该类具有项目的三种主要方法:

    • downloadChunk(): To download the excerpt

      downloadChunk() :要下载摘录

    • saveChunk(): To save it on the hard disk

      saveChunk() :将其保存在硬盘上

    • getChunk(): To retrieve the excerpt as a string

      getChunk() :以字符串形式检索摘录

    All of these methods have the same first two parameters: $start and $end that represent the start and the end time, in milliseconds, of the portion to extract respectively. Moreover, both downloadChunk() and saveChunk() accept an optional third argument to set the name of the extracted snippet. If no name is provided, then the method generates one on its own in the format “InputFilename-Start-End.wav”.

    所有这些方法都具有相同的前两个参数: $start$end代表要提取的部分的开始时间和结束时间(以毫秒为单位)。 此外, downloadChunk()saveChunk()接受可选的第三个参数来设置提取的代码片段的名称。 如果未提供名称,则该方法将以“ InputFilename-Start-End.wav”格式自行生成一个名称。

    Inside the WavExtractor directory there are two sub-folders: Utility, containing the Converter class that has some utility methods, and Wav. The latter contains the Wav, Chunk, and ChunkField classes. The first, as you might expect, represents the WAV file and is composed by one or more chunks (of Chunk type). This class allows you to retrieve the WAV headers, the duration of the audio, and some other useful information. Its most pertinent method is getWavChunk(), the one that retrieve the specified audio portion by reading the bytes from the file.

    WavExtractor目录中,有两个子文件夹: Utility ,包含具有一些实用程序方法的Converter类,以及Wav 。 后者包含WavChunkChunkField类。 如您所料,第一个表示WAV文件,由一个或多个块( Chunk类型)组成。 此类允许您检索WAV标头,音频的持续时间以及其他一些有用的信息。 它最相关的方法是getWavChunk() ,该方法通过从文件中读取字节来检索指定的音频部分。

    The Chunk class represents a chunk of the WAV file and it’s extended by specialized classes contained in the Chunk folder. The latter doesn’t support all of the existing chunk types, just the most important ones. Unrecognized sections are managed by the generic class and simply ignored in the overall process.

    Chunk类代表WAV文件的一部分,并由Chunk文件夹中包含的专门类进行扩展。 后者不支持所有现有的块类型,仅支持最重要的类型。 无法识别的部分由通用类管理,并在整个过程中被忽略。

    The last class described is ChunkField. As I pointed out, each chunk has its own type and fields and each of them have a different length (in bytes) and format. It is very important information to know because you need to pass the right parameters to parse the bytes properly using PHP’s pack() and the unpack() functions or you’ll receive an error. To help manage the data, I decided to wrap them into a class that saves the format, the size, and the value of each field.

    描述的最后一个类是ChunkField 。 正如我所指出的,每个块都有其自己的类型和字段,并且每个块都有不同的长度(以字节为单位)和格式。 这是非常重要的信息,因为您需要传递正确的参数以使用PHP的pack()unpack()函数正确解析字节,否则会收到错误消息。 为了帮助管理数据,我决定将它们包装到一个类中,该类保存每个字段的格式,大小和值。

    如何使用Audero Wav Extractor (How to use Audero Wav Extractor)

    You can obtain “Audero Wav Extractor” via Composer, adding the following lines to your composer.json file and running its install command.

    您可以通过Composer获得“ Audero Wav Extractor”,将以下几行添加到composer.json文件中并运行其install命令。

    "require": {
        "audero/audero-wav-extractor": "2.1.*"
    }

    Composer will download and place the library in the project’s vendor/audero directory.

    Composer将下载该库并将其放置在项目的vendor/audero目录中。

    Alternatively, you can download the library directly from its repository.

    或者,您可以直接从其存储库下载该库。

    To extract an excerpt and force the download to the user's browser, you'll write code that resembles the following:

    要提取摘要并将其强制下载到用户的浏览器,您将编写类似于以下内容的代码:

    <?php
    //  include the Composer autoloader
    require_once "vendor/autoload.php";
    
    $inputFile = "sample1.wav";
    $outputFile = "excerpt.wav";
    $start = 0 * 1000; // from 0 seconds
    $end = 2 * 1000;  // to 2 seconds
    
    try {
        $extractor = new \Audero\WavExtractor\AuderoWavExtractor($inputFile);
        $extractor->downloadChunk($start, $end, $outputFile);
        echo "Chunk extraction completed. ";
    }
    catch (Exception $e) {
        echo "An error has occurred: " . $e->getMessage();
    }

    In the first lines I included the Composer autoloader and then set the values I’ll be working with. As you can see, I provided the source file, the output path including the filename and the time range I want to extract. Then I created an instance of AuderoWavExtractor, giving the source file as a parameter, and then called the downloadChunk() method. Please note that because the output path is passed by reference, you always need to set it into a variable.

    在第一行中,我包括了Composer自动加载器,然后设置将要使用的值。 如您所见,我提供了源文件,输出路径,包括文件名和要提取的时间范围。 然后,我创建了AuderoWavExtractor的实例,将源文件作为参数,然后调用了downloadChunk()方法。 请注意,由于输出路径是通过引用传递的,因此您始终需要将其设置为变量。

    Let’s look at another example. I’ll show you how to select a time range and save the file into the local hard disk. Moreover, I’ll use the autoloader included in the project.

    让我们看另一个例子。 我将向您展示如何选择时间范围并将文件保存到本地硬盘中。 此外,我将使用项目中包含的自动加载器。

    <?php
    // set include path
    set_include_path(get_include_path() . PATH_SEPARATOR . __DIR__ . "/../src/");
    
    // include the library autoloader
    require_once "AuderoLoaderAutoLoader.php";
    
    // Set the classes' loader method
    spl_autoload_register("Audero\Loader\AutoLoader::autoload");
    
    $inputFile = "sample2.wav";
    $start = 0 * 1000; // from 0 seconds
    $end = 2 * 1000;  // to 2 seconds
    
    try {
        $extractor = new \Audero\WavExtractor\AuderoWavExtractor($inputFile);
        $extractor->saveChunk($start, $end);
        echo "Chunk extraction completed.";
    }
    catch (Exception $e) {
        echo "An error has occurred: " . $e->getMessage();
    }

    Apart from the loader configuration, the snippet is very similar to the previous. In fact I only made two changes: the first one is the method called, saveChunk() instead of downloadChunk(), and the second is I haven’t set the output filename (which will use the default format explained earlier).

    除了加载程序配置之外,该代码段与之前的代码非常相似。 实际上,我仅作了两项更改:第一个是名为saveChunk()的方法,而不是downloadChunk() ,第二个是我没有设置输出文件名(它将使用前面解释的默认格式)。

    结论 (Conclusion)

    In this article I showed you “Audero Wav Extractor” and how you can use easily extract one or more snippets from a given WAV file. I wrote the library for a work project with requirements for working with a very narrow set of tiles, so if a WAV or its headers are heavily corrupted then the library will probably fail, but I wrote the code to try to recover from errors when possible. Feel free to play with the demo and the files included in the repository as I’ve released it under the CC BY-NC 3.0 license.

    在本文中,我向您展示了“ Audero Wav提取器”以及如何轻松地从给定的WAV文件中提取一个或多个片段。 我为一个工作项目编写了库,要求使用一组非常狭窄的图块,因此,如果WAV或其标题严重损坏,则该库可能会失败,但是我编写了代码,尝试在可能的情况下从错误中恢复。 我已经按照CC BY-NC 3.0许可发布了该演示和该存储库中的文件,请随意使用

    翻译自: https://www.sitepoint.com/extract-an-exceprt-from-a-wav-file/


  • mp3转换wav文件WAV audio files are a great way to preserve the complete and accurate quality of a recording in a truly lossless format on your computer. However, if you’re not an audiophile and are ...
  • mp3如何转换为wav Boy I would have loved this post a decade ago when I was ripping CDs from my local library. The memory is actually ... 翻译自: https://davidwalsh.name/convert-wav-mp3 mp3如何转换为wav
  • flac文件转wav 如果潜伏在与发烧友相关的站点上,您可能会偶然发现作家声称以FLAC或WAV格式播放的同一音乐之间的声音差异。 从“不相信这样的事情是可能的人”的角度以及那些认为存在差异的人的角度来看,这篇关于...
  • gstreamer播放wav文件

    在网上查了一下实现wav播放的方法,大多数都是这种直接操作/dev/dsp的,GTK也没有像QT那样的直接播放音频文件的类,于是我只好使用第三方的库了,网上很流行的gstreamer,今天简单读了一下它的手册,发现gstreamer...
  • 文本文件朗读生成语音WAV文件软件用于将文本文件中的文字自动朗读合成 WAV音频文件,仅支持WINDOWS7 和WINDOWS8 系统。   注:软件本身是绿色软件,单个文件,不用安装都可运行,但必须先安装 微软的 .net ...
  • 如何生成一个正弦波形WAV

    参考了很多资料,什么WAV头啊之类的,了解结构之后便写了出来。本来这个程序我是用Java实现的,但是没有C++快,于是翻译成了C++,不幸的是,翻译后各种错误,原因竟是那可恨的Char。 #include #include #include ...
  • Play a WAV file on an AudioTrack 译者注: 1. 由于这是技术文章,所以有些词句使用原文,表达更准确。 2. 由于水平有效,有些地方可能翻译的不够准确,如有不当之处,敬请批评指正. 3. 针对某些语句,适当补充了...
  • WAV2LETTER ++:最快的开源语音识别系统 Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve,Vitaliy Liptchinsky, R...
  • bmx2wav 是一个可以将 bms、bme等音游格式文件转换成 wav 格式文件的软件。←这个大部分人应该都知道 由于软件的原作者是日本的,软件用的字符编码不能在中文系统正常显示. 我看着一堆乱码实在不舒服,本来只是...
  • Play a WAV file on an AudioTrack 译者注: 1. 因为这是技术文章,所以有些词句使用原文,表达更准确。 2. 因为水平有效,有些地方可能翻译的不够准确,如有不当之处,敬请批评指正. 3. 针对某些语句。适当补充了...
  • 近日,Facebook 人工智能研究院 ( FAIR ) 宣布开源首个全卷积语音识别工具包 wav2letter++。系统基于全卷积方法进行语音识别,训练语音识别端到端神经网络的速度是其他框架的 2 倍多。他们在博客中对此次开源进行了...
  • Linux下查看wav文件的头信息-sox 安装sox apt install sox 查看一个wav文件的 采样率和声道 sox -V sa1.wav -n 输出结果如下:
  • 语音识别之Wav2Letter(译)

    Wav2Letter: an End-to-End ConvNet-based Speech Recognition SystemAbstract: 本文提出了一种简单的端到端语音识别模型,它结合了基于卷积网络的声学模型和图译码。 训练它输出字母和转录语音,不需要把音节强制...
  • mplayer配置文件(1) convert .avi, .wmv etc....mplayer -ao pcm:file=%7%out.wav MOVIE_FILES(2) then it is easy to convert .wav file to .mp3 file: (2)然后很容易将.wav文件转换为.mp3文件: l...
  • Qt-》QAudioOutput play 播放wav文件

    用Qt写音频比directxShow方便多了,很好用,初级文章,供大家学习路上少分困难,多份轻松。#include #include #include ... inputFile.setFileName("/home/alex/Music/noh.wav"); inputFile.open
  • wav2letter++全卷积语音识别框架

    最近,Facebook的AI研究中心(FAIR)发表的一个研究论文,提出了一种新的单纯基于卷积神经网络(Convolutional Neural Network)的语音识别技术,而且提供了开源的实现wav2letter++,一个完全基于卷积模型的高性能的...
  • 用python播放声音文件(mp3、wav、m4a等)

    用python播放声音文件(mp3、wav、m4a等) 前段时间在搞一个基于python的语音助手,其中需要用到python播放音频的功能,要在windows上和树莓派上运行,但是在网上找了好久,都没有找到合适的解决方案(pygame 和 ...
  • 翻译: ape压缩原理 数字音频: 声 音简单的说是一种波,而数字化音频是声波的数字化形式。这是通过对大量的模拟信号在每秒钟“采样”很多次而达到的。这个过程在概念上可以理解为在每秒钟内 ...
  • 最近,Facebook的AI研究中心(FAIR)发表的一个研究论文,提出了一种新的单纯基于卷积神经网络(Convolutional Neural Network)的语音识别技术,而且提供了开源的实现wav2letter++,一个完全基于卷积模型的高性能的...
  • 最近,Facebook AI Research(FAIR)宣布了第一个全收敛语音识别工具包wav2letter++。该系统基于完全卷积方法进行语音识别,训练语音识别端到端神经网络的速度是其他框架的两倍以上。他们在博客中详细介绍了这个开源...
  • 目录波形文件的基础知识波形文件的存储过程与声音有关的三个参数1、采样频率2、采样位数3、声道数WAV文件的编码文件整体结构RIFF区块fmt区块(FORMAT区块)DATA区块NAudio文件数据管理分析WaveFileReader类构造函数...
