2018-12-08 14:18:57 · xiaoxiaowenqiang · 829 reads

Robot speech: recognition, understanding, synthesis, and control (ASR, NLU, TTS)


I. Overall speech-processing framework

    1. Speech recognition (ASR, Automatic Speech Recognition)
    2. Natural language understanding (NLU, Natural Language Understanding)
    3. Speech synthesis (TTS, Text To Speech)

1. Speech recognition **ASR**, available packages:
    Abroad: CMU Sphinx  ——>  pocketsphinx
    Domestic: iFlytek (科大讯飞), etc.

2. Natural language understanding NLU: Turing Robot (图灵)

3. Speech synthesis TTS:
    Abroad: Festival    ——> sound_play, part of ros-indigo-audio-common
    Domestic: iFlytek (科大讯飞), etc.
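The three stages chain into a pipeline: audio goes into ASR, the recognized text into NLU, and the reply text into TTS. A minimal Python sketch with stand-in stub functions (none of these are real engine APIs; a real system would plug in pocketsphinx, a Turing NLU service, and Festival respectively):

```python
# Hypothetical sketch of the ASR -> NLU -> TTS pipeline described above.
# All three stage functions are illustrative stubs, not real engine calls.

def asr(audio_frames):
    """Speech recognition stub: audio in, text out (a real engine: pocketsphinx)."""
    return "move forward"

def nlu(text):
    """Understanding stub: text in, structured intent out."""
    return {"intent": "motion", "command": text}

def tts(text):
    """Synthesis stub: text in, audio bytes out (a real engine: Festival)."""
    return ("<audio:%s>" % text).encode("utf-8")

def voice_pipeline(audio_frames):
    """Chain the three stages; return the intent and the synthesized reply."""
    intent = nlu(asr(audio_frames))
    reply = "executing " + intent["command"]
    return intent, tts(reply)

intent, audio = voice_pipeline(b"\x00\x01")
print(intent["command"])   # -> move forward
```

The same shape holds however the stages are implemented; only the stubs change.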

II. Foreign libraries

1. Speech recognition: pocketsphinx

 1) Installation
  sudo apt-get install gstreamer0.10-pocketsphinx   # native (non-ROS) package
  sudo apt-get install ros-indigo-pocketsphinx      # ROS interface
  sudo apt-get install ros-indigo-audio-common      # includes sound_play (TTS)
  sudo apt-get install libasound2                   # audio driver
  sudo apt-get install gstreamer0.10-gconf          # GStreamer component
  
 2) Testing
   The pocketsphinx package provides a node, recognizer.py, which
   captures the audio input stream from the microphone, matches it against the words
   in the loaded vocabulary, and publishes the result on the /recognizer/output topic.
   A vocabulary suitable for RoboCup testing:
   roslaunch pocketsphinx robocup.launch
   Speak into the microphone to test.
   Show the topic messages:
     rostopic echo /recognizer/output
   View the vocabulary:
     roscd pocketsphinx/demo
     more robocup.corpus
   Only phrases from the vocabulary will be recognized with satisfactory accuracy.
  # robocup.launch
 <launch>
  <node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen"> <!-- recognizer node -->
    <param name="lm" value="$(find pocketsphinx)/demo/robocup.lm"/>                <!-- language model, generated online from the corpus -->
    <param name="dict" value="$(find pocketsphinx)/demo/robocup.dic"/>             <!-- pronunciation dictionary -->
  </node>
</launch>
# recognizer.py (the original file)
import roslib; roslib.load_manifest('pocketsphinx')
import rospy

import pygtk
pygtk.require('2.0')
import gtk

import gobject
import pygst
pygst.require('0.10')
gobject.threads_init()
import gst

from std_msgs.msg import String
from std_srvs.srv import *
import os
import commands

class recognizer(object):
    """ GStreamer based speech recognizer. """

    def __init__(self):
        # Start node
        rospy.init_node("recognizer")

        self._device_name_param = "~mic_name"  # Find the name of your microphone by typing pacmd list-sources in the terminal
        self._lm_param = "~lm"
        self._dic_param = "~dict"

        # Configure mics with gstreamer launch config
        if rospy.has_param(self._device_name_param):
            self.device_name = rospy.get_param(self._device_name_param)
            self.device_index = self.pulse_index_from_name(self.device_name)
            self.launch_config = "pulsesrc device=" + str(self.device_index)
            rospy.loginfo("Using: pulsesrc device=%s name=%s", self.device_index, self.device_name)
        elif rospy.has_param('~source'):
            # common sources: 'alsasrc'
            self.launch_config = rospy.get_param('~source')
        else:
            self.launch_config = 'gconfaudiosrc'

        rospy.loginfo("Launch config: %s", self.launch_config)

        self.launch_config += " ! audioconvert ! audioresample " \
                            + '! vader name=vad auto-threshold=true ' \
                            + '! pocketsphinx name=asr ! fakesink'

        # Configure ROS settings
        self.started = False
        rospy.on_shutdown(self.shutdown)
        self.pub = rospy.Publisher('~output', String)
        rospy.Service("~start", Empty, self.start)
        rospy.Service("~stop", Empty, self.stop)

        if rospy.has_param(self._lm_param) and rospy.has_param(self._dic_param):
            self.start_recognizer()
        else:
            rospy.logwarn("lm and dic parameters need to be set to start recognizer.")

    def start_recognizer(self):
        rospy.loginfo("Starting recognizer... ")

        self.pipeline = gst.parse_launch(self.launch_config)
        self.asr = self.pipeline.get_by_name('asr')
        self.asr.connect('partial_result', self.asr_partial_result)
        self.asr.connect('result', self.asr_result)
        self.asr.set_property('configured', True)
        self.asr.set_property('dsratio', 1)

        # Configure language model
        if rospy.has_param(self._lm_param):
            lm = rospy.get_param(self._lm_param)
        else:
            rospy.logerr('Recognizer not started. Please specify a language model file.')
            return

        if rospy.has_param(self._dic_param):
            dic = rospy.get_param(self._dic_param)
        else:
            rospy.logerr('Recognizer not started. Please specify a dictionary.')
            return

        self.asr.set_property('lm', lm)
        self.asr.set_property('dict', dic)

        self.bus = self.pipeline.get_bus()
        self.bus.add_signal_watch()
        self.bus_id = self.bus.connect('message::application', self.application_message)
        self.pipeline.set_state(gst.STATE_PLAYING)
        self.started = True

    def pulse_index_from_name(self, name):
        output = commands.getstatusoutput("pacmd list-sources | grep -B 1 'name: <" + name + ">' | grep -o -P '(?<=index: )[0-9]*'")

        if len(output) == 2:
            return output[1]
        else:
            raise Exception("Error. pulse index doesn't exist for name: " + name)

    def stop_recognizer(self):
        if self.started:
            self.pipeline.set_state(gst.STATE_NULL)
            self.pipeline.remove(self.asr)
            self.bus.disconnect(self.bus_id)
            self.started = False

    def shutdown(self):
        """ Delete any remaining parameters so they don't affect next launch """
        for param in [self._device_name_param, self._lm_param, self._dic_param]:
            if rospy.has_param(param):
                rospy.delete_param(param)

        """ Shutdown the GTK thread. """
        gtk.main_quit()

    def start(self, req):
        self.start_recognizer()
        rospy.loginfo("recognizer started")
        return EmptyResponse()

    def stop(self, req):
        self.stop_recognizer()
        rospy.loginfo("recognizer stopped")
        return EmptyResponse()

    def asr_partial_result(self, asr, text, uttid):
        """ Forward partial result signals on the bus to the main thread. """
        struct = gst.Structure('partial_result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))

    def asr_result(self, asr, text, uttid):
        """ Forward result signals on the bus to the main thread. """
        struct = gst.Structure('result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))

    def application_message(self, bus, msg):
        """ Receive application messages from the bus. """
        msgtype = msg.structure.get_name()
        if msgtype == 'partial_result':
            self.partial_result(msg.structure['hyp'], msg.structure['uttid'])
        if msgtype == 'result':
            self.final_result(msg.structure['hyp'], msg.structure['uttid'])

    def partial_result(self, hyp, uttid):
        """ Delete any previous selection, insert text and select it. """
        rospy.logdebug("Partial: " + hyp)

    def final_result(self, hyp, uttid):
        """ Insert the final result. """
        msg = String()
        msg.data = str(hyp.lower())
        rospy.loginfo(msg.data)
        self.pub.publish(msg)

if __name__ == "__main__":
    start = recognizer()
    gtk.main()
#!/usr/bin/python
# -*- coding:utf-8 -*-
### The modified file
import roslib
roslib.load_manifest('pocketsphinx')
import rospy
import pygtk # PyGTK: create programs with a graphical user interface in Python
pygtk.require('2.0')
import gtk   # GTK: the GIMP (GNU Image Manipulation Program) Toolkit

import gobject # GObject (the GLib object system): object-oriented programming in C
import pygst   # Python bindings for GStreamer
pygst.require('0.10')
gobject.threads_init() # initialize threading
import gst

from std_msgs.msg import String
from std_srvs.srv import *
import os
import commands

class recognizer(object):
    """GStreamer-based speech recognizer. GStreamer is a multimedia framework
    for easily creating, editing and playing multimedia streams."""
    # initialize the configuration
    def __init__(self):
        # create the node
        rospy.init_node("recognizer")
        # parameters
        self._device_name_param = "~mic_name"  # microphone name; list devices with: pacmd list-sources
        self._lm_param = "~lm"                 # language model
        self._dic_param = "~dict"              # pronunciation dictionary
        self._hmm_param = "~hmm"               # acoustic model (hidden Markov model); separate models for Chinese and English
        
        
        # configure the microphone via the GStreamer launch config
        if rospy.has_param(self._device_name_param): # use the specified microphone
            self.device_name = rospy.get_param(self._device_name_param)      # microphone name
            self.device_index = self.pulse_index_from_name(self.device_name) # microphone index (PulseAudio ID)
            self.launch_config = "pulsesrc device=" + str(self.device_index) # launch string
            rospy.loginfo("Using: pulsesrc device=%s name=%s", self.device_index, self.device_name)
        elif rospy.has_param('~source'):
            # common sources: 'alsasrc'
            self.launch_config = rospy.get_param('~source')
        else:
            self.launch_config = 'gconfaudiosrc'

        rospy.loginfo("Launch config: %s", self.launch_config)

        self.launch_config += " ! audioconvert ! audioresample " \
                            + '! vader name=vad auto-threshold=true ' \
                            + '! pocketsphinx name=asr ! fakesink'

        # configure ROS settings
        self.started = False
        rospy.on_shutdown(self.shutdown) # shut down cleanly
        self.pub = rospy.Publisher('~output', String) # publish String messages on ~output; note that no queue_size is specified here

        rospy.Service("~start", Empty, self.start)   # start service
        rospy.Service("~stop", Empty, self.stop)     # stop service
        # check that the model and dictionary are configured
        if rospy.has_param(self._lm_param) and rospy.has_param(self._dic_param):
            self.start_recognizer()
        else:
            rospy.logwarn("The lm and dict parameters must be set to start the recognizer.")
                    
    def start_recognizer(self):
        rospy.loginfo("Starting recognizer... ")

        self.pipeline = gst.parse_launch(self.launch_config)        # parse the microphone pipeline
        self.asr = self.pipeline.get_by_name('asr')                 # the pocketsphinx ASR element
        self.asr.connect('partial_result', self.asr_partial_result) # callbacks defined below
        self.asr.connect('result', self.asr_result)
        #self.asr.set_property('configured', True) # left disabled so the hmm model set below takes effect
        self.asr.set_property('dsratio', 1)

        # configure the language model
        if rospy.has_param(self._lm_param):
            lm = rospy.get_param(self._lm_param)
        else:
            rospy.logerr('Recognizer not started. Please specify a language model (lm).')
            return

        if rospy.has_param(self._dic_param):
            dic = rospy.get_param(self._dic_param)
        else:
            rospy.logerr('Recognizer not started. Please specify a dictionary (dict).')
            return

        if rospy.has_param(self._hmm_param):
            hmm = rospy.get_param(self._hmm_param)
        else:
            rospy.logerr('Recognizer not started. Please specify an acoustic model (hmm).')
            return

        self.asr.set_property('lm', lm)    # language model
        self.asr.set_property('dict', dic) # pronunciation dictionary
        self.asr.set_property('hmm', hmm)  # acoustic model
        

        self.bus = self.pipeline.get_bus()
        self.bus.add_signal_watch()
        self.bus_id = self.bus.connect('message::application', self.application_message)
        self.pipeline.set_state(gst.STATE_PLAYING)
        self.started = True

    # resolve a microphone name to its PulseAudio index
    def pulse_index_from_name(self, name):
        output = commands.getstatusoutput("pacmd list-sources | grep -B 1 'name: <" + name + ">' | grep -o -P '(?<=index: )[0-9]*'")

        if len(output) == 2:
            return output[1]
        else:
            raise Exception("Error. pulse index doesn't exist for name: " + name)
        
    # stop the recognizer
    def stop_recognizer(self):
        if self.started:
            self.pipeline.set_state(gst.STATE_NULL)
            self.pipeline.remove(self.asr)
            self.bus.disconnect(self.bus_id)
            self.started = False
    # shutdown hook
    def shutdown(self):
        """ Delete any remaining parameters so they don't affect the next launch. """
        for param in [self._device_name_param, self._lm_param, self._dic_param]:
            if rospy.has_param(param):
                rospy.delete_param(param)

        """ Shut down the GTK thread. """
        gtk.main_quit()
    # start service handler
    def start(self, req):
        self.start_recognizer()
        rospy.loginfo("recognizer started")
        return EmptyResponse()
    # stop service handler
    def stop(self, req):
        self.stop_recognizer()
        rospy.loginfo("recognizer stopped")
        return EmptyResponse()
    
    def asr_partial_result(self, asr, text, uttid):
        """ Forward partial result signals on the bus to the main thread. """
        struct = gst.Structure('partial_result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))

    def asr_result(self, asr, text, uttid):
        """ Forward result signals on the bus to the main thread. """
        struct = gst.Structure('result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))

    def application_message(self, bus, msg):
        """ Receive application messages from the bus. """
        msgtype = msg.structure.get_name()
        if msgtype == 'partial_result':
            self.partial_result(msg.structure['hyp'], msg.structure['uttid'])
        if msgtype == 'result':
            self.final_result(msg.structure['hyp'], msg.structure['uttid'])
    # partial result
    def partial_result(self, hyp, uttid):
        """ Log the partial hypothesis. """
        rospy.logdebug("Partial: " + hyp)

    # final result
    def final_result(self, hyp, uttid):
        """ Publish the final result. """
        msg = String()      # topic message type
        msg.data = str(hyp) # the text recognized from the speech
        rospy.loginfo(msg.data)
        self.pub.publish(msg)

if __name__ == "__main__":
    start = recognizer()
    gtk.main()
 3) Creating a new vocabulary
  a) Create a corpus file: a plain-text file with one phrase per line.
   For example:
    roscd rbx1_speech/config
    more nav_commands.txt
pause speech
continue speech
move forward
move backward
move back
move left
move right
...
## Chinese corpus
voice_ctr.txt
前进
后退
左转
右转
向左转
向右转
停止
加速
减速
   b) Compile the vocabulary
    Generate the language model (lm) with the online tool:
    http://www.speech.cs.cmu.edu/tools/lmtool-new.html
    Upload the corpus file:  "Upload a sentence corpus file: Browse"
    Compile online:          "COMPILE KNOWLEDGE BASE"
    Download the compiled files.

The .dic file is the pronunciation dictionary (word -> syllables/phonemes);
the .lm file is the language model (word-sequence probabilities).

Note: for Chinese the generated .dic file is empty,
so it has to be written by hand.

Look up the syllables of your own words in a more complete, pre-built .dic file.
For example:
cd ewenwan/catkin_ws/src/voice_system/model/lm/zh/zh_CN/
grep 停止 mandarin_notone.dic
>>>
停止	t ing zh ib
停止听写	t ing zh ib t ing x ie
停止录音	t ing zh ib l u y in
停止注水	t ing zh ib zh u sh ui
呼吸停止	h u x i t ing zh ib
自动停止	z if d ong t ing zh ib
# write the dictionary file
# voice_ctr.dic
前进	q ian j in
后退	h ou t ui
左转	z uo zh uan
右转	y uo zh uan
向左转	x iang z uo zh uan
向右转	x iang y uo zh uan
停止	t ing zh ib
加速	j ia s u
减速	j ian s u
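The grep lookup above can be automated: read a reference dictionary such as mandarin_notone.dic (one `word<TAB>phonemes` entry per line) and keep only the entries for the words in your corpus. A sketch under that file-format assumption; the helper names and the inline sample entries are illustrative:

```python
def load_reference_dic(lines):
    """Parse 'word<TAB>phonemes' lines into a word -> phonemes dict."""
    ref = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, _, phones = line.partition("\t")
        ref.setdefault(word, phones)   # keep the first pronunciation of each word
    return ref

def build_dic(corpus_words, ref):
    """Emit 'word<TAB>phonemes' entries for each corpus word found in ref."""
    entries, missing = [], []
    for w in corpus_words:
        if w in ref:
            entries.append(w + "\t" + ref[w])
        else:
            missing.append(w)          # has to be transcribed by hand
    return entries, missing

# illustrative reference entries, in the mandarin_notone.dic format
ref = load_reference_dic(["停止\tt ing zh ib", "后退\th ou t ui"])
entries, missing = build_dic(["停止", "后退", "加速"], ref)
print(entries)   # entries for 停止 and 后退
print(missing)   # 加速 needs a hand-written entry
```

In practice `lines` would come from `open("mandarin_notone.dic")` and the result would be written out as voice_ctr.dic.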
   c) Write your own launch file

voice_nav_commands.launch

<launch>
 <node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen"> <!-- recognizer -->
  <param name="lm" value="$(find rbx1_speech)/config/nav_commands.lm"/>           <!-- new language model -->
  <param name="dict" value="$(find rbx1_speech)/config/nav_commands.dic"/>        <!-- new dictionary -->
 </node>
</launch>

my_voice_asr.launch

<launch>
  <node name="recognizer" pkg="voice_system" type="my_recognizer.py" output="screen">
    <param name="lm" value="$(find voice_system)/model/lm/zh/zh_MY/voice_ctr.lm"/>
    <param name="dict" value="$(find voice_system)/model/lm/zh/zh_MY/voice_ctr.dic"/>
    <param name="hmm" value="$(find voice_system)/model/hmm/zh/tdt_sc_8k"/>
  </node>
</launch>
  d) Run and test

   roslaunch voice_system my_voice_asr.launch
   rostopic echo /recognizer/output

4) Writing the voice-control node

      A second node subscribes to the /recognizer/output topic to receive the recognized
      text commands and turns them into actual robot control commands:
   either simple Twist motion commands published on the cmd_vel topic,
   or navigation goals of type MoveBaseGoal from move_base_msgs.msg.
# voice_cmd_vel.py   publishes Twist commands on cmd_vel
import roslib; roslib.load_manifest('pocketsphinx')
import rospy
import math

from geometry_msgs.msg import Twist  # cmd_vel/Twist 
from std_msgs.msg import String

class voice_cmd_vel:

    def __init__(self):
        rospy.on_shutdown(self.cleanup) # zero the velocity on shutdown
        self.speed = 0.2                # default speed, also used as a speed flag
        self.msg = Twist()              # cmd_vel message

        # publish to cmd_vel, subscribe to speech output
        self.pub_ = rospy.Publisher('cmd_vel', Twist)  # command topic
        #                 topic name          type     callback
        rospy.Subscriber('recognizer/output', String, self.speechCb) # recognized text commands

        r = rospy.Rate(10.0)
        while not rospy.is_shutdown():
            self.pub_.publish(self.msg) # publish the current command
            r.sleep()
        
    def speechCb(self, msg):
        rospy.loginfo(msg.data) # echo the text command

        if msg.data.find("full speed") > -1: # full speed
            if self.speed == 0.2:
                self.msg.linear.x = self.msg.linear.x*2
                self.msg.angular.z = self.msg.angular.z*2
                self.speed = 0.4
        if msg.data.find("half speed") > -1: # half speed
            if self.speed == 0.4:
                self.msg.linear.x = self.msg.linear.x/2
                self.msg.angular.z = self.msg.angular.z/2
                self.speed = 0.2

        if msg.data.find("forward") > -1:    # move straight forward
            self.msg.linear.x = self.speed   # default speed 0.2 or 0.4
            self.msg.angular.z = 0
        elif msg.data.find("left") > -1:     # turn left
            if self.msg.linear.x != 0:
                if self.msg.angular.z < self.speed:
                    self.msg.angular.z += 0.05
            else:
                self.msg.angular.z = self.speed*2 # linear speed is zero: rotate in place; counterclockwise (left) is positive
        elif msg.data.find("right") > -1:   # turn right
            if self.msg.linear.x != 0:
                if self.msg.angular.z > -self.speed:
                    self.msg.angular.z -= 0.05
            else:
                self.msg.angular.z = -self.speed*2 # linear speed is zero: rotate in place; clockwise (right) is negative
        elif msg.data.find("back") > -1:   # move backward
            self.msg.linear.x = -self.speed # negate the linear speed
            self.msg.angular.z = 0
        elif msg.data.find("stop") > -1 or msg.data.find("halt") > -1:  # stop
            self.msg = Twist()              # zero all velocities

        self.pub_.publish(self.msg)         # publish the velocity command

    def cleanup(self):
        # stop the robot!
        twist = Twist()
        self.pub_.publish(twist)

if __name__=="__main__":
    rospy.init_node('voice_cmd_vel')
    try:
        voice_cmd_vel()
    except:
        pass
# voice_nav.py   voice navigation; still publishes cmd_vel/Twist messages
import rospy
from geometry_msgs.msg import Twist # cmd_vel/Twist message
from std_msgs.msg import String     # speech recognition output
from math import copysign           # copysign(x, y) gives x the sign of y, e.g. copysign(1, -5) = -1.0
class VoiceNav:
    # initialization
    def __init__(self):
        rospy.init_node('voice_nav')    # create the node
        rospy.on_shutdown(self.cleanup) # stop the robot on shutdown
        # Set a number of parameters affecting the robot's speed;
        # they can be changed with rosparam set or in the launch file
        self.max_speed = rospy.get_param("~max_speed", 0.4)
        self.max_angular_speed = rospy.get_param("~max_angular_speed", 1.5)
        self.speed = rospy.get_param("~start_speed", 0.1)
        self.angular_speed = rospy.get_param("~start_angular_speed", 0.5)
        self.linear_increment = rospy.get_param("~linear_increment", 0.05)
        self.angular_increment = rospy.get_param("~angular_increment", 0.4)
        
        # update rate
        self.rate = rospy.get_param("~rate", 5)
        r = rospy.Rate(self.rate)
        # flag for pausing voice control
        self.paused = False
        # Initialize the Twist message we will publish.
        self.cmd_vel = Twist()
        # Publish the Twist message to the cmd_vel topic
        self.cmd_vel_pub = rospy.Publisher('cmd_vel', Twist, queue_size=5)
        # Subscribe to the /recognizer/output topic to receive voice commands.
        #                      topic name          type     callback
        rospy.Subscriber('/recognizer/output', String, self.speech_callback)

        # A mapping from keywords or phrases to commands
        self.keywords_to_command = {'stop': ['stop', 'halt', 'abort', 'kill', 'panic', 'off', 'freeze', 'shut down', 'turn off', 'help', 'help me'],
                                    'slower': ['slow down', 'slower'],
                                    'faster': ['speed up', 'faster'],
                                    'forward': ['forward', 'ahead', 'straight'],
                                    'backward': ['back', 'backward', 'back up'],
                                    'rotate left': ['rotate left'],
                                    'rotate right': ['rotate right'],
                                    'turn left': ['turn left'],
                                    'turn right': ['turn right'],
                                    'quarter': ['quarter speed'],
                                    'half': ['half speed'],
                                    'full': ['full speed'],
                                    'pause': ['pause speech'],
                                    'continue': ['continue speech']}
        
        rospy.loginfo("Ready to receive voice commands")
        
        # We have to keep publishing the cmd_vel message if we want the robot to keep moving.
        while not rospy.is_shutdown():
            self.cmd_vel_pub.publish(self.cmd_vel) # publish the velocity command
            r.sleep()                       
            
    def get_command(self, data):
        # Attempt to match the recognized word or phrase to the 
        # keywords_to_command dictionary and return the appropriate
        # command
        for (command, keywords) in self.keywords_to_command.iteritems():
            for word in keywords:
                if data.find(word) > -1: # if the phrase contains a keyword, return its command
                    return command
        
    def speech_callback(self, msg):
        # Get the motion command from the recognized phrase
        command = self.get_command(msg.data)
        # Log the command to the screen
        rospy.loginfo("Command: " + str(command))
        # If the user has asked to pause/continue voice control,
        # set the flag accordingly
        if command == 'pause':
            self.paused = True
        elif command == 'continue':
            self.paused = False

        # If voice control is paused, simply return without
        # performing any action
        if self.paused:
            return
        
        # The list of if-then statements should be fairly
        # self-explanatory
        if command == 'forward':
            self.cmd_vel.linear.x = self.speed # move forward
            self.cmd_vel.angular.z = 0

        elif command == 'rotate left':
            self.cmd_vel.linear.x = 0
            self.cmd_vel.angular.z = self.angular_speed  # rotate left in place; counterclockwise is positive

        elif command == 'rotate right':
            self.cmd_vel.linear.x = 0
            self.cmd_vel.angular.z = -self.angular_speed # rotate right in place; negative

        elif command == 'turn left':                     # veer left
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.angular.z += self.angular_increment
            else:
                self.cmd_vel.angular.z = self.angular_speed

        elif command == 'turn right':                    # veer right
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.angular.z -= self.angular_increment
            else:
                self.cmd_vel.angular.z = -self.angular_speed

        elif command == 'backward':                      # move backward
            self.cmd_vel.linear.x = -self.speed          # negative value
            self.cmd_vel.angular.z = 0

        elif command == 'stop':                          # stop
            # Stop the robot!  Publish a Twist message consisting of all zeros.
            self.cmd_vel = Twist()                       # zero all velocities
        
        elif command == 'faster':                        # speed up
            self.speed += self.linear_increment
            self.angular_speed += self.angular_increment
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x += copysign(self.linear_increment, self.cmd_vel.linear.x) # keeps the sign: speeds up forward or backward motion
            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z += copysign(self.angular_increment, self.cmd_vel.angular.z) # likewise, speeds up the rotation in its current direction

        elif command == 'slower':                        # slow down
            self.speed -= self.linear_increment
            self.angular_speed -= self.angular_increment
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x -= copysign(self.linear_increment, self.cmd_vel.linear.x) # slows forward or backward motion, keeping the direction
            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z -= copysign(self.angular_increment, self.cmd_vel.angular.z) # slows the rotation, keeping the direction

        elif command in ['quarter', 'half', 'full']:
            if command == 'quarter':
                self.speed = copysign(self.max_speed / 4, self.speed) # quarter of max speed, keeping the current direction

            elif command == 'half':
                self.speed = copysign(self.max_speed / 2, self.speed) # half of max speed, keeping the current direction

            elif command == 'full':
                self.speed = copysign(self.max_speed, self.speed)     # max speed, keeping the current direction

            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x = copysign(self.speed, self.cmd_vel.linear.x)

            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z = copysign(self.angular_speed, self.cmd_vel.angular.z)

        else:
            return
        # clamp the velocities to the configured limits
        self.cmd_vel.linear.x = min(self.max_speed, max(-self.max_speed, self.cmd_vel.linear.x))
        self.cmd_vel.angular.z = min(self.max_angular_speed, max(-self.max_angular_speed, self.cmd_vel.angular.z))

    def cleanup(self):
        # When shutting down be sure to stop the robot!
        twist = Twist()
        self.cmd_vel_pub.publish(twist)
        rospy.sleep(1)

if __name__=="__main__":
    try:
        VoiceNav()
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Voice navigation terminated.")
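Note that `iteritems` in `get_command` is Python 2 only (Python 3 renamed it to `items`). The substring-matching idea can be exercised in isolation; a standalone sketch with a trimmed-down, hypothetical keyword table:

```python
# Standalone sketch of the keyword-to-command lookup used by get_command.
# The table below is an illustrative subset, not the full mapping.
KEYWORDS_TO_COMMAND = {
    'stop':     ['stop', 'halt', 'shut down'],
    'forward':  ['forward', 'ahead', 'straight'],
    'backward': ['back', 'backward', 'back up'],
}

def get_command(data):
    """Return the command whose keyword appears in the recognized phrase."""
    for command, keywords in KEYWORDS_TO_COMMAND.items():  # items() works on both Py2 and Py3
        for word in keywords:
            if data.find(word) > -1:
                return command
    return None   # no keyword matched

print(get_command("move forward please"))  # -> forward
print(get_command("halt now"))             # -> stop
print(get_command("hello robot"))          # -> None
```

Because matching is plain substring search, any phrase containing a keyword triggers that command, so keywords should not overlap across different commands.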
# Chinese control commands
#!/usr/bin/env python
# -*- coding:utf-8 -*-
"""
Subscribe to the control commands published on the speech-recognition topic and publish
velocity commands on the cmd_vel topic; the chassis receives them and drives the motors.
"""

import rospy
from geometry_msgs.msg import Twist # velocity command message type
from std_msgs.msg import String
from math import copysign

# voice-control class
class VoiceNav:
    def __init__(self):
        rospy.init_node('voice_nav') # node name

        rospy.on_shutdown(self.cleanup)

        # parameters; can be changed in the launch or config file
        self.max_speed = rospy.get_param("~max_speed", 0.8)                # max linear speed, m/s
        self.max_angular_speed = rospy.get_param("~max_angular_speed", 1.5)# max angular speed
        self.speed = rospy.get_param("~start_speed", 0.3)                  # initial linear speed
        self.angular_speed = rospy.get_param("~start_angular_speed", 0.5)  # initial angular speed
        self.linear_increment = rospy.get_param("~linear_increment", 0.05) # linear speed step
        self.angular_increment = rospy.get_param("~angular_increment", 0.4)# angular speed step

        # publishing rate
        self.rate = rospy.get_param("~rate", 5)
        r = rospy.Rate(self.rate)
        # voice-control pause flag; control is enabled by default
        self.paused = False
        # initialize the Twist message published on cmd_vel
        self.cmd_vel = Twist()
        # publisher:                        topic    type   queue size
        self.cmd_vel_pub = rospy.Publisher('cmd_vel', Twist, queue_size=5)
        # subscribe to the recognizer output     type     callback
        rospy.Subscriber('/recognizer/output', String, self.speech_callback)

        # keyword-to-command mapping
        self.keywords_to_command = {'停止': ['停止', '停下来', '停止 前进', '关闭', '睡觉', '停', '禁止', 'help me'],
                                    '减速': ['减慢', '慢', '减慢 速度', '变慢', '慢下来', '慢速', '减速'],
                                    '加速': ['加速', '快', '加快 速度', '加快', '加速上去', '加速 前进'],
                                    '前进': ['前进', '向前 移动', '向前 走', '过来', '全速 前进'],
                                    '后退': ['后退', '向后 移动', '向后 走', '倒退','全速 倒退'],
                                    '正转': ['正转', '正向 自转', '正向 旋转', '原地 自转', '转圈', '正向'],
                                    '反转': ['反转', '反向 自转', '反向 旋转', '原地 反转', '反向'],
                                    '左转': ['向左 转弯', '左转', '向左 移动', '向左 走', '左 转弯'],
                                    '右转': ['向右 转弯', '右转', '向右 移动', '向右 走', '右 转弯'],
                                    '当前位置': ['位置', '当前 位置', '报告 位置', '汇报 位置'],
                                    '当前任务': ['任务', '当前 任务', '报告 任务', '回报 任务'],
                                    '当前状态': ['当前状态', '状态'],
                                    '关语控': ['关闭 语音 控制', '禁用 语音 控制'],
                                    '开语控': ['打开 语音 控制', '启用 语音 控制']}
        
        rospy.loginfo("Ready to receive voice control commands")

        # publish the current command at the configured rate
        while not rospy.is_shutdown():
            self.cmd_vel_pub.publish(self.cmd_vel)
            r.sleep()
    # look up the command
    def get_command(self, data):
        # search the keyword table for a matching control command
        for (command, keywords) in self.keywords_to_command.iteritems():
            for word in keywords:
                if data.find(word) > -1:
                    return command

    # callback: act on the recognized text (publish velocities, etc.)
    def speech_callback(self, msg):
        # get the command from the recognized phrase
        command = self.get_command(msg.data) # message from the recognition topic
        # show the recognized command
        rospy.loginfo("Command: " + str(command))
        
        # voice-control on/off switch
        if command == '关语控':
            self.paused = True
        elif command == '开语控':
            self.paused = False
        
        if self.paused:
            return       
        
        if command == '前进':    
            self.cmd_vel.linear.x = self.speed
            self.cmd_vel.angular.z = 0
            
        elif command == '反转':
            self.cmd_vel.linear.x = 0
            self.cmd_vel.angular.z = self.angular_speed
                
        elif command == '正转':  
            self.cmd_vel.linear.x = 0      
            self.cmd_vel.angular.z = -self.angular_speed
            
        elif command == '左转':
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.angular.z += self.angular_increment
            else:        
                self.cmd_vel.angular.z = self.angular_speed
                
        elif command == '右转':    
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.angular.z -= self.angular_increment
            else:        
                self.cmd_vel.angular.z = -self.angular_speed
                
        elif command == '后退':
            self.cmd_vel.linear.x = -self.speed
            self.cmd_vel.angular.z = 0
            
        elif command == '停止': 
            # Stop the robot!  Publish a Twist message consisting of all zeros.         
            self.cmd_vel = Twist()
        
        elif command == '加速':
            self.speed += self.linear_increment
            self.angular_speed += self.angular_increment
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x += copysign(self.linear_increment, self.cmd_vel.linear.x)
            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z += copysign(self.angular_increment, self.cmd_vel.angular.z)
            
        elif command == '减速':
            self.speed -= self.linear_increment
            self.angular_speed -= self.angular_increment
            if self.cmd_vel.linear.x != 0:
                self.cmd_vel.linear.x -= copysign(self.linear_increment, self.cmd_vel.linear.x)
            if self.cmd_vel.angular.z != 0:
                self.cmd_vel.angular.z -= copysign(self.angular_increment, self.cmd_vel.angular.z)


        else:
            return

        self.cmd_vel.linear.x = min(self.max_speed, max(-self.max_speed, self.cmd_vel.linear.x))
        self.cmd_vel.angular.z = min(self.max_angular_speed, max(-self.max_angular_speed, self.cmd_vel.angular.z))

    def cleanup(self):
        # 停止速度
        twist = Twist()
        self.cmd_vel_pub.publish(twist)
        rospy.sleep(1)

if __name__=="__main__":
    try:
        VoiceNav()
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Voice navigation terminated.")
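The arithmetic behind the speed-up/slow-down branches and the final clamp in the handler above can be isolated in a few lines of plain Python. This is a standalone sketch with no ROS dependency; `linear_x`, `speed`, `increment`, and `max_speed` stand in for the node's attributes:

```python
from math import copysign

def speed_up(linear_x, speed, increment, max_speed):
    """Mimic the speed-up branch: raise the cruising speed and, if the
    robot is already moving, nudge the current velocity in its direction
    of travel, then clamp to [-max_speed, max_speed] as the final
    min/max lines of the handler do."""
    speed += increment
    if linear_x != 0:
        linear_x += copysign(increment, linear_x)
    linear_x = min(max_speed, max(-max_speed, linear_x))
    return linear_x, speed

# Moving forward at 0.2 m/s with a 0.05 increment and a 0.3 cap the
# velocity rises to 0.25; once at the cap, the clamp keeps it pinned.
print(speed_up(0.2, 0.2, 0.05, 0.3))
print(speed_up(0.3, 0.3, 0.05, 0.3))
```

Note that `copysign` is what lets the same branch work while reversing: the increment is applied in the direction of current travel, not blindly added.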
3) Voice-control experiments
a) Simulation
	1. Start the robot:           roslaunch rbx1_bringup fake_turtlebot.launch
	2. Start rviz:                rosrun rviz rviz -d `rospack find rbx1_nav`/sim.rviz
	3. Console:                   rqt_console &
	4. Run speech recognition:    roslaunch rbx1_speech voice_nav_commands.launch
	   or, for Chinese:           roslaunch voice_system my_voice_asr.launch
	5. Run the voice controller:  roslaunch rbx1_speech turtlebot_voice_nav.launch
	   or, for Chinese:           roslaunch voice_system my_sim_turtlebot_voice_nav.launch

b) Real robot
	1. Start the robot:           roslaunch rbx1_bringup turtlebot_minimal_create.launch
	2. Console:                   rqt_console &
	3. Run speech recognition:    roslaunch rbx1_speech voice_nav_commands.launch
	4. Run the voice controller:  roslaunch rbx1_speech turtlebot_voice_nav.launch
c) Video walkthrough
 http://www.youtube.com/watch?v=10ysYZUX_jA

#############################################################

2、Speech synthesis (TTS): Festival / sound_play, part of ros-indigo-audio-common

  1) Install
   sudo apt-get install ros-indigo-audio-common
   sudo apt-get install libasound2
  2) Test
     Run the server node:   rosrun sound_play soundplay_node.py
     Speak a text string:   rosrun sound_play say.py "Greetings Humans. Take me to your leader."
     Default voice: voice_kal_diphone

  3) Adding new voices
     a) List the voices already installed
       ls /usr/share/festival/voices/english    >>> only voice_kal_diphone
     b) List all available Festival voice packages
       sudo apt-cache search --names-only festvox-*
   
            >>>
festvox-don - minimal British English male speaker for festival
festvox-ellpc11k - Castilian Spanish male speaker for Festival
festvox-en1 - mbrola-en1 voice support for festival
festvox-rablpc16k - British English male speaker for festival, 16khz sample rate
festvox-rablpc8k - British English male speaker for festival, 8khz sample rate
festvox-us1 - mbrola-us1 voice support for festival
festvox-us2 - mbrola-us2 voice support for festival
festvox-us3 - mbrola-us3 voice support for festival
festvox-ca-ona-hts - Catalan female speaker for festival, 16kHz HTS
festvox-czech-dita - Czech adult female speaker "dita" for Festival
festvox-czech-krb - Czech child male speaker "krb" for Festival
festvox-czech-machac - Czech adult male speaker "machac" for Festival
festvox-czech-ph - Czech male speaker for Festival
festvox-hi-nsk - Hindi male speaker for festival
festvox-italp16k - Italian female speaker for Festival
festvox-itapc16k - Italian male speaker for Festival
festvox-kallpc16k - American English male speaker for festival, 16khz sample rate
festvox-kallpc8k - American English male speaker for festival, 8khz sample rate
festvox-kdlpc16k - American English male speaker for festival, 16khz sample rate
festvox-kdlpc8k - American English male speaker for festival, 8khz sample rate
festvox-mr-nsk - Marathi male speaker for festival
festvox-ru - Russian male speaker for Festival
festvox-suopuhe-common - Common files for Festival Finnish speakers
festvox-suopuhe-lj - Finnish female speaker for Festival
festvox-suopuhe-mv - Finnish male speaker for festival
festvox-te-nsk - Telugu (te) male speaker for festival
     c) Install one of the voice packages
        sudo apt-get install festvox-don
     d) Test the new voice by naming it explicitly
        rosrun sound_play say.py "Welcome to the future" voice_don_diphone
     e) Play a sound file
        rosrun sound_play play.py `rospack find rbx1_speech`/sounds/R2D2a.wav
     f) Play a built-in sound (numbered 1-5)
        rosrun sound_play playbuiltin.py 4

    4) Using it from a ROS node
# talkback.py
import rospy
from std_msgs.msg import String
from sound_play.libsoundplay import SoundClient # SoundClient 客户端  
import sys

class TalkBack:
    def __init__(self, script_path): #初始化 类
        rospy.init_node('talkback')  #新建节点
        rospy.on_shutdown(self.cleanup)#自尽
        
        # Set the default TTS voice to use  声音源
        self.voice = rospy.get_param("~voice", "voice_don_diphone")       
        # Set the wave file path if used    文件路径
        self.wavepath = rospy.get_param("~wavepath", script_path + "/../sounds")
        
        # Create the sound client object
        self.soundhandle = SoundClient() #客户端
        
        # Wait a moment to let the client connect to the
        # sound_play server
        rospy.sleep(1)
        
        # Make sure any lingering sound_play processes are stopped.
        self.soundhandle.stopAll()      # 终止发音
        
        # Announce that we are ready for input
        self.soundhandle.playWave(self.wavepath + "/R2D2a.wav") #播放声音文件
        rospy.sleep(1)
        self.soundhandle.say("Ready", self.voice)               #发音
        
        rospy.loginfo("Say one of the navigation commands...")

        # Subscribe to the recognizer output and set the callback function
        rospy.Subscriber('/recognizer/output', String, self.talkback) #订阅语音识别话题
        
    def talkback(self, msg):
        # Print the recognized words on the screen
        rospy.loginfo(msg.data)      #回显   
        # Speak the recognized words in the selected voice
        self.soundhandle.say(msg.data, self.voice)   #发音  
        # Uncomment to play one of the built-in sounds
        #rospy.sleep(2)
        #self.soundhandle.play(5)
        
        # Uncomment to play a wave file
        #rospy.sleep(2)
        #self.soundhandle.playWave(self.wavepath + "/R2D2a.wav")

    def cleanup(self):
        self.soundhandle.stopAll()
        rospy.loginfo("Shutting down talkback node...")

if __name__=="__main__":
    try:
        TalkBack(sys.path[0])
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Talkback node terminated.")

         
    5) Test
       roslaunch rbx1_speech talkback.launch

三、Chinese speech services

1、iFlytek (科大讯飞) TTS

 Go to the official site http://www.xfyun.cn/index.php/mycloud/app/appManage,
 register an account, download the SDK, and unpack it.

 The sample directory
 /voice/samples/tts_sample
 contains the .c sources
 and a Makefile:
 
 # common makefile header
DIR_INC = ../../include  # 包含文件 路径变量
DIR_BIN = ../../bin      # 二进制文件库 路径变量
DIR_LIB = ../../libs     # 依赖库 路径变量
# executive file name
TARGET	= tts_sample     # 目标文件 编译生成的可执行文件名
BIN_TARGET = $(DIR_BIN)/$(TARGET) # 可执行文件保存路径 在bin目录下

CROSS_COMPILE =                   # 交叉编译
CFLAGS = -g -Wall -I$(DIR_INC)

# 根据 系统设置想要的 可执行文件依赖 路径  可直接修改为(本系统64位)  LDFLAGS := -L$(DIR_LIB)/x64
ifdef LINUX64
LDFLAGS := -L$(DIR_LIB)/x64
else
LDFLAGS := -L$(DIR_LIB)/x86 
endif

# 编译系统依赖
LDFLAGS += -lmsc -lrt -ldl -lpthread 

OBJECTS := $(patsubst %.c,%.o,$(wildcard *.c))

$(BIN_TARGET) : $(OBJECTS)
	$(CROSS_COMPILE)gcc $(CFLAGS) $^ -o $@ $(LDFLAGS)
# executive file type
%.o : %.c
	$(CROSS_COMPILE)gcc -c $(CFLAGS) $< -o $@
clean:
	@rm -f *.o $(BIN_TARGET)

.PHONY:clean
#common makefile foot

64bit_make.sh — the build-environment setup script:

  #build the 64-bit executable
make clean;make LINUX64=1  #sets the 64-bit flag
  #add the libmsc.so directory to the library search path
export LD_LIBRARY_PATH=$(pwd)/../../libs/x64/

To avoid sourcing 64bit_make.sh in every shell, copy libs/x64/libmsc.so
into a system library directory instead:
    sudo cp libs/x64/libmsc.so /usr/lib

and adjust the Makefile accordingly.

2、Using the iFlytek TTS from a ROS node
 1) Create the package
   cd catkin_ws/src
   catkin_create_pkg voice_system roscpp rospy std_msgs
   Copy the headers from the SDK include directory into the new package's include directory:
    msp_cmn.h
    msp_errors.h
    msp_types.h
    qise.h
    qisr.h
    qtts.h
  2) Create the node source file
    Adapt tts_sample.c into xf_ttf.cpp.
    The node subscribes to a text topic, hands each message to the iFlytek
    TTS engine, and plays the resulting wav file:
        /*
* 语音合成(Text To Speech,TTS)技术能够自动将任意文字实时转换为连续的
* 自然语音,是一种能够在任何时间、任何地点,向任何人提供语音信息服务的
* 高效便捷手段,非常符合信息时代海量数据、动态更新和个性化查询的需求。
*/
#include "ros/ros.h"         //ros系统头文件
#include "std_msgs/String.h" 
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#include "qtts.h"
#include "msp_cmn.h"
#include "msp_errors.h"

const char* filename = "/home/ewenwan/ME/music/music.wav";
const char* playwavpath = "play /home/ewenwan/ME/music/music.wav";
/* wav音频头部格式 */
typedef struct _wave_pcm_hdr
{
	char            riff[4];                // = "RIFF" RIFF(Resource Interchange File Format) 资源交换 文件规范
	int		        size_8;                 // = FileSize - 8 从下个地址开始到文件尾的总字节数
	char            wave[4];                // = "WAVE"  WAV文件标志(WAVE)
	char            fmt[4];                 // = "fmt "  波形格式标志(fmt ),最后一位空格。
	int		        fmt_size;		        // = 下一个结构体的大小 : 16  过滤字节(一般为00000010H,16d),若为00000012H则说明数据头携带附加信息(见“附加信息”)。

	short int       format_tag;             // = PCM : 1       格式种类(值为1时,表示数据为线性PCM编码)
	short int       channels;               // = 通道数 : 1     通道数,单声道为1,双声道为2
	int		        samples_per_sec;        // = 采样率 : 8000 | 6000 | 11025 | 16000
	int		        avg_bytes_per_sec;      // = 每秒字节数 : samples_per_sec * bits_per_sample / 8
	short int       block_align;            // = 每采样点字节数 : wBitsPerSample / 8
	short int       bits_per_sample;        // = 量化比特数: 8 | 16

	char            data[4];                // = "data";
	int		        data_size;              // = 纯数据长度 : FileSize - 44 
} wave_pcm_hdr;

/* 默认wav音频头部数据 */
wave_pcm_hdr default_wav_hdr = 
{
	{ 'R', 'I', 'F', 'F' }, //文件规范
	0,                      //后面数据的大小(前面有4个字节)
	{'W', 'A', 'V', 'E'},   //文件格式
	{'f', 'm', 't', ' '},   //波形格式标志 最后一位 空
	16,                     //过滤字节数
	1,                      //格式种类(值为1时,表示数据为线性PCM编码)
	1,                      //通道数,单声道为1,双声道为2
	16000,                  //采样率 : 8000 | 6000 | 11025 | 16000
	32000,                  //每秒字节数 : samples_per_sec * bits_per_sample / 8
	2,                      //每采样点字节数 : wBitsPerSample / 8
	16,                     //量化比特数: 8 | 16
	{'d', 'a', 't', 'a'},   //"data";
	0                       //纯音频数据长度 数据长度 : FileSize - 44 (前面有40个字节)
};
/* 文本合成 函数  text, filename, session_begin_params */
int text_to_speech(const char* src_text, const char* des_path, const char* params)
{
	int          ret          = -1;    //返回参数
	FILE*        fp           = NULL;  //文件句柄 文件头
	const char*  sessionID    = NULL;  //
	unsigned int audio_len    = 0;
	wave_pcm_hdr wav_hdr      = default_wav_hdr; //wav文件头
	int          synth_status = MSP_TTS_FLAG_STILL_HAVE_DATA;

	if (NULL == src_text || NULL == des_path)  //原文本空或者 生成的文件名路径为空 返回
	{
		printf("params is error!\n");
		return ret;
	}
	fp = fopen(des_path, "wb");                //二进制格式写入
	if (NULL == fp)
	{
		printf("open %s error.\n", des_path);  //打开文件失败
		return ret;
	}
	/* 开始合成 合成语音参数 标志*/
	sessionID = QTTSSessionBegin(params, &ret);//开始一次语音合成,分配语音合成资源。
	if (MSP_SUCCESS != ret)                    //参数不合规范
	{
		printf("QTTSSessionBegin failed, error code: %d.\n", ret);
		fclose(fp);                           //关闭文件
		return ret;
	}
	  // int MSPAPI QTTSTextPut 	( const char * sessionID, const char * textString, unsigned int  textLen, const char *  params ) 	
	  // 由QTTSSessionBegin返回的句柄  字符串指针,指向待合成的文本字符串    合成文本长度,最大支持8192个字节(2730个汉字)   本次合成所用的参数,只对本次合成的文本有效。      
	ret = QTTSTextPut(sessionID, src_text, (unsigned int)strlen(src_text), NULL);//写入待合成的文本。
	if (MSP_SUCCESS != ret)
	{
		printf("QTTSTextPut failed, error code: %d.\n",ret);
		QTTSSessionEnd(sessionID, "TextPutError"); //发生错误的话 就结束本次语音合成。 
		fclose(fp);                                //关闭文件
		return ret;
	}
	printf("正在合成 ...\n");
	//写入文件  头信息     大小    写入次数   文件名
	fwrite(&wav_hdr, sizeof(wav_hdr) ,1, fp); //添加wav音频头,使用采样率为16000
	while (1) 
	{
		/* 获取合成音频 */ //    由QTTSSessionBegin返回的句柄   合成音频长度,单位字节   合成音频状态  成功与否 
		const void* data = QTTSAudioGet(sessionID, &audio_len, &synth_status, &ret); //获取合成音频。 
		if (MSP_SUCCESS != ret) //未成功
			break;
		if (NULL != data)       //合成的音频有内容
		{
			fwrite(data, audio_len, 1, fp);        //写入内容 长度 次数 文件名
		    wav_hdr.data_size += audio_len;        //总音频长度计算data_size大小  用于记录头文件 size_8 大小  
		}
		if (MSP_TTS_FLAG_DATA_END == synth_status) //合成的音频已经取完
			break;
		printf(">");
		usleep(150*1000);                          //防止频繁占用CPU
	}
	printf("\n");
	if (MSP_SUCCESS != ret)                        //获取音频未成功
	{
		printf("QTTSAudioGet failed, error code: %d.\n",ret);
		QTTSSessionEnd(sessionID, "AudioGetError");//发生错误的话 就结束本次语音合成。
		fclose(fp);
		return ret;
	}
	/* 修正wav文件头数据的大小 */
	wav_hdr.size_8 += wav_hdr.data_size + (sizeof(wav_hdr) - 8); //头文件长度+音频长度 -8 
	
	/* 将修正过的数据写回文件头部,音频文件为wav格式 */
	fseek(fp, 4, 0);  //偏移4个字节 从第4个字节开始写入 size_8
	fwrite(&wav_hdr.size_8,sizeof(wav_hdr.size_8), 1, fp);       //写入size_8的值
	fseek(fp, 40, 0); //偏移40个字节,将文件指针移到存储data_size值的位置
	fwrite(&wav_hdr.data_size,sizeof(wav_hdr.data_size), 1, fp); //写入data_size的值
	fclose(fp);                                                  //关闭文件
	fp = NULL;
	/* 合成完毕 */
	ret = QTTSSessionEnd(sessionID, "Normal");                   //结束本次语音合成。
	if (MSP_SUCCESS != ret)
	{
		printf("QTTSSessionEnd failed, error code: %d.\n",ret);
	}

	return ret;
}
/*
make topic Text To Wave file
*/
int makeTextToWave(const char* text, const char* filename)
{
	int         ret                  = MSP_SUCCESS;                     //默认返回参数
	/* appid 	应用ID。*/
	const char* login_params         = "appid = 58dbcf6e, work_dir = .";//登录参数,appid与msc库绑定,请勿随意改动
	/*
	* rdn:           合成音频数字发音方式 0:数值优先(车牌号报读),1:完全数值(1000 一千),2:完全字符串,3:字符串优先。 默认为0 
	* volume:        合成音频的音量     [0,100],数值越大音量越大。默认为50 
	* pitch:         合成音频的音调     [0,100],数值越大音调越高。默认为50 
	* speed:         合成音频对应的语速 [0,100],数值越大语速越快。默认为50 
	* voice_name:    合成发音人        xiaoyan yanping jinger yufeng donaldduck babyxu nannan xiaoqian(东北话)
	* sample_rate:   合成音频采样率     
	* text_encoding: 合成文本编码格式   GB2312;GBK;BIG5;UNICODE;GB18030;UTF8 
	* background_sound 	合成音频中的背景音 0:无背景音乐 1:有背景音乐。默认为0 
	* 详细参数说明请参阅《讯飞语音云MSC--API文档》
	*/
	const char* session_begin_params = "voice_name = donaldduck, text_encoding = utf8, sample_rate = 16000, speed = 50, volume = 50, pitch = 50, rdn = 0";
	//const char* filename             = "tts_sample.wav"; //合成的语音文件名称
	//const char* text                 = "大家好,我叫小明,车牌号沪A1005,两套房子,五百万存款"; //合成文本
	/* 用户登录 int MSPAPI MSPLogin ( const char *  usr,const char *  pwd,const char *  params )*/
	ret = MSPLogin(NULL, NULL, login_params);//第一个参数是用户名,第二个参数是密码,第三个参数是登录参数,用户名和密码可在http://www.xfyun.cn注册获取
	if (MSP_SUCCESS != ret)
	{
		printf("MSPLogin failed, error code: %d.\n", ret);
		goto exit ;//登录失败,退出登录
	}
	
		/* 文本合成 */
	  printf("开始合成 ...\n");
	  /*要合成语音的文本  合成后的语音文件wav名字  合成语音的参数*/
	  ret = text_to_speech(text, filename, session_begin_params);
	  if (MSP_SUCCESS != ret)
	   {
		printf("text_to_speech failed, error code: %d.\n", ret);
	   }
	   else{
	     printf("合成完毕\n");
	    }
exit:
	MSPLogout(); //退出登录
	
	return ret;
}
/*播放 wav文件*/
void playWav()
{
	system(playwavpath); // 需要安装 sudo apt-get install sox
}
  /*subcribe topic callbace function*/
void chatterCallback(const std_msgs::String::ConstPtr&  msg)
{
    //ROS_INFO("I heard: [%s]", msg->data.c_str());
	//const char* text=msg->data.c_str();
    std::cout << " I hear topic text:" << msg->data.c_str() << std::endl;
	makeTextToWave( msg->data.c_str() , filename );
	playWav();
	ROS_INFO("READY TO DO TTS");
}
   
int main(int argc, char** argv)
{
	 const char* start_voice ="科大讯飞在线语音合成启动成功";
	 makeTextToWave( start_voice , filename );
	 playWav();
	 ROS_INFO("READY TO TTS");
	 
	 ros::init(argc, argv, "tts"); //初始化ros系统 ,在roscore节点管理器注册节点
     ros::NodeHandle nhd;                   //节点句柄
	 //节点创建一个发布者
     //ros::Publisher chatter_pub = nh.advertise<std_msgs::String>("pub_hello_ros", 1000);
	 //创建一个订阅者sub     节点句柄         话题            缓存区    函数指针   &callbackfunc 得到
     ros::Subscriber sub = nhd.subscribe("voice/xf_tts_topic", 100, &chatterCallback);
	 
	 //ros::spin();
     ros::Rate  rate(5);              //发布频率
     while(ros::ok()){
       ros::spinOnce();               //给ROS控制权  可以调用一次回调函数
       rate.sleep();
     }	 
	return 0;
}
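The header bookkeeping in `text_to_speech` above (write a 44-byte placeholder header, append the PCM data, then seek back and patch `size_8` at offset 4 and `data_size` at offset 40) can be reproduced with Python's `struct` module. This is an illustrative sketch of the same WAV layout, not part of the node:

```python
import io
import struct

def write_wav(pcm_bytes, sample_rate=16000, channels=1, bits=16):
    """Build a minimal PCM WAV in memory, patching the two size fields
    afterwards exactly as the C node does with fseek/fwrite."""
    buf = io.BytesIO()
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    # 44-byte header; the two size fields start at 0 and are patched below
    buf.write(struct.pack('<4sI4s4sIHHIIHH4sI',
                          b'RIFF', 0, b'WAVE', b'fmt ', 16,
                          1, channels, sample_rate, byte_rate,
                          block_align, bits, b'data', 0))
    buf.write(pcm_bytes)
    data_size = len(pcm_bytes)
    buf.seek(4);  buf.write(struct.pack('<I', data_size + 36))  # size_8 = FileSize - 8
    buf.seek(40); buf.write(struct.pack('<I', data_size))       # pure audio length
    return buf.getvalue()

wav = write_wav(b'\x00\x00' * 100)           # 200 bytes of silence
assert len(wav) == 44 + 200
assert struct.unpack_from('<I', wav, 4)[0] == 236   # 244 - 8
```

The `<` prefix forces little-endian with no padding, matching the packed on-disk layout that the C struct relies on.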
3) Edit CMakeLists.txt
    Three additions are needed:
    a) Header search paths (.h files)

	include_directories(    # header paths
	  include               # package-local headers
	  ${catkin_INCLUDE_DIRS})

    b) The executable and its source files
       add_executable(xftts src/xf_ttf.cpp)
    c) Libraries to link against
       target_link_libraries(xftts ${catkin_LIBRARIES}  -lmsc -lrt -ldl -lpthread ) # system libs plus the SDK lib (taken from the SDK Makefile)
       If the linker cannot find -lmsc, replace it with the full path:
	/home/ewenwan/ewenwan/catkin_ws/src/voice/libs/x64/libmsc.so # change this to your own path

4) Build
   cd catkin_ws
   catkin_make
5) Dependency
   The node plays the wav file with the `play` command: sudo apt-get install sox
6) Test
   rosrun  voice_system  xftts
   Publish text on the topic:
   rostopic pub -1 voice/xf_tts_topic  std_msgs/String "你帅呆了"

3、Turing (图灵) NLU — semantic understanding; combined with the iFlytek TTS node it enables spoken dialogue

1) Install the dependencies
sudo apt-get install libcurl3 libcurl4-openssl-dev    // curl, for HTTP POST requests
sudo apt-get install libjsoncpp1 libjsoncpp-dev       // JSON parsing

2) Write the source file
cd catkin_ws/src/voice_system
Create tl_nlu.cpp:
    /*
* 图灵 NLU 在线语意理解
*/
#include "ros/ros.h"           //ros系统头文件
#include "std_msgs/String.h" 
#include <sstream>
#include <jsoncpp/json/json.h> //json字符串
#include <curl/curl.h>         //curl http访问
#include <string>
#include <exception>

using namespace std;
//全局变量
int flag = 0;
string result;

/*解析图灵服务器返回的json字符串*/
int parseJsonResonse(string input_str)
{
	Json::Value root;
	Json::Reader reader;
	bool parsingSuccessful = reader.parse(input_str, root);
	if( !parsingSuccessful )
	{
		std::cout << "Fail to parse the response data"  << std::endl;
		return -1;
	}
    const Json::Value code = root["code"];//返回状态码
    const Json::Value text = root["text"];//返回的结果
	result = text.asString();             //返回文本内容
	flag = 1;
	std::cout << "response code: " << code << std::endl;
	std::cout << "response text: " << result << std::endl;
	return 0;	
}

/*将接收到的返回数据写如内存*/
int writer(char *data, size_t size, size_t number, string *writerData)
{
   if ( writerData == NULL)
   {
	 return -1;
   }
	unsigned long len = size * number;
	writerData->append(data, len);
	return len;	
}

/*HTTP 请求*/
int HttpPostRequest(string input_str)
{
	string buffer;
	
	std::string strJson = "{" ;
	strJson += "\"key\" : \"fcabefe6a6ca48ff8c7d4f5dfccf0627\",";
	strJson += "\"info\" : ";
	strJson += "\"";
	strJson += input_str;
	strJson += "\"";
	strJson += "}";
	
	std::cout << "post json string: " << strJson << std::endl;
	try
	{
		CURL *pCurl = NULL;
		CURLcode res;                      //返回状态
		curl_global_init(CURL_GLOBAL_ALL); //初始化 pCurl
		pCurl = curl_easy_init();          //头
		
		if( NULL != pCurl)
		{
			curl_easy_setopt(pCurl, CURLOPT_TIMEOUT,5); //延迟时间
			curl_easy_setopt(pCurl, CURLOPT_URL,"http://www.tuling123.com/openapi/api");
			// http头 // 设置http发送的内容类型为JSON
			curl_slist *plist = curl_slist_append(NULL, "Content-Type:application/json;charset=UTF-8");			
			curl_easy_setopt(pCurl, CURLOPT_HTTPHEADER, plist);
			
			// 设置要POST的JSON数据
			curl_easy_setopt(pCurl, CURLOPT_POSTFIELDS,strJson.c_str());
			curl_easy_setopt(pCurl, CURLOPT_WRITEFUNCTION, &writer);
			curl_easy_setopt(pCurl, CURLOPT_WRITEDATA, &buffer);
			
			//执行http请求
			res = curl_easy_perform(pCurl);
			//检查错误
			if (res != CURLE_OK)
			{
				printf("curl_easy_perform failed:%s\n", curl_easy_strerror(res));
			}
			curl_easy_cleanup(pCurl);	//清除当前http请求		
		}
		curl_global_cleanup();          //全部清除   
	}
	catch (std::exception &ex)
	{
		printf("!!! curl exception: %s.\n", ex.what());
	}
	if(buffer.empty())
	{
		std::cout << "!!! ERROR The Tuling server response NULL" << std::endl;
	}
	else
	{
		parseJsonResonse(buffer);
	}
	return 0;
}
/*
*   当voice/tl_nlu 话题有消息时,调用HttpPostRequest向图灵服务器发送内容,返回结果。
*/
void nluCallback(const std_msgs::String::ConstPtr&  msg)
{

    std::cout << " Your question is :" << msg->data << std::endl;
    HttpPostRequest(msg->data);
	ROS_INFO("READY TO DO NLU");
}
   
int main(int argc, char** argv)
{
	 ros::init(argc, argv, "tl_nlu_node"); //初始化ros系统 ,在roscore节点管理器注册节点
     ros::NodeHandle nhd;                   //节点句柄
	 //创建一个订阅者sub     节点句柄         话题            缓存区    函数指针   &callbackfunc 得到
     ros::Subscriber sub = nhd.subscribe("voice/tl_nlu_topic", 20, &nluCallback);
	 //节点创建一个发布者
	 ros::Publisher pub = nhd.advertise<std_msgs::String>("voice/xf_tts_topic", 20);
	 ROS_INFO("READY TO DO NLU");
	 //ros::spin();
     ros::Rate  rate(10);              //频率
     while(ros::ok()){
		if(flag)                       //成功获取到返回数据
		{
			std_msgs::String msg;
			msg.data = result;
			pub.publish(msg); 
			flag = 0 ;
		}
       ros::spinOnce();               //给ROS控制权  可以调用一次回调函数
       rate.sleep();
     }	 
	return 0;
}
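The JSON round-trip the node performs (build `{"key": ..., "info": ...}` by string concatenation, then read `code` and `text` out of the reply) is easy to see in Python. The key and the reply below are placeholders, not real API values; `json.dumps` also handles the quote-escaping the hand-built C++ string would miss:

```python
import json

def build_request(api_key, question):
    # the C++ node concatenates this string by hand
    return json.dumps({"key": api_key, "info": question}, ensure_ascii=False)

def parse_response(raw):
    # mirrors the node's response parsing: extract the status code and answer text
    root = json.loads(raw)
    return root["code"], root["text"]

req = build_request("YOUR_TURING_KEY", "你好")
assert json.loads(req)["info"] == "你好"

code, text = parse_response('{"code": 100000, "text": "hi there"}')
assert code == 100000 and text == "hi there"
```

A real deployment would also need to handle replies whose `code` signals a non-text result, which the node currently ignores.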
3) Edit CMakeLists.txt

# NLU
add_executable(tlnlu src/tl_nlu.cpp)
target_link_libraries(tlnlu ${catkin_LIBRARIES} -lcurl -ljsoncpp)            # system libs plus curl and jsoncpp

4) Build
cd catkin_ws
catkin_make

5) Test
roscore
rosrun voice_system xftts    // text-to-speech
rosrun voice_system tlnlu    // semantic understanding

rostopic pub -1 /voice/tl_nlu_topic std_msgs/String "给我说一个绕口令吧"

4、iFlytek (科大讯飞) ASR

  Sample directory:
 /voice/samples/iat_record_sample
  Copy the .c sources into the package source directory:     catkin_ws/src/voice_system/src
  Copy the .h headers into the package include directory:    catkin_ws/src/voice_system/include

  Adapt iat_record_sample.c into xf_asr.cpp:
/*
* 语音听写(iFly Auto Transform)技术能够实时地将语音转换成对应的文字。
* voice Activity Detection 语音活动检测  一句话是否说完了
*/

#include<ros/ros.h>
#include<std_msgs/String.h>
#include<std_msgs/Int32.h>

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "qisr.h"
#include "msp_cmn.h"
#include "msp_errors.h"
#include "speech_recognizer.h"

#define FRAME_LEN	640 
#define	BUFFER_SIZE	4096
#define ASRFLAG     1      //控制命令

using namespace std;       //命名空间
//全局变量
bool flag = false;         //发布语音识别结果话题  标志
bool recorder_Flag = true; //可录音标志,开始麦克风录音并上传识别
string result = "";        //识别结果字符串

/* 上传用户字典*/
static int upload_userwords()
{
	char*			userwords	=	NULL; //
	size_t			len			=	0;    //
	size_t			read_len	=	0;    //
	FILE*			fp			=	NULL; //
	int				ret			=	-1;   //

	fp = fopen("userwords.txt", "rb");    //打开用户字典文本
	if (NULL == fp)										
	{
		printf("\nopen [userwords.txt] failed! \n"); //打开失败
		goto upload_exit;
	}

	fseek(fp, 0, SEEK_END);                          //到文件结尾
	len = ftell(fp);                                 //偏移长度 字节  相当于文件大小
	fseek(fp, 0, SEEK_SET);  	                     //到文件头				
	
	userwords = (char*)malloc(len + 1);              //申请文件字节+1 大小的缓存区
	if (NULL == userwords)
	{
		printf("\nout of memory! \n");               //缓存区申请失败
		goto upload_exit;
	}

	read_len = fread((void*)userwords, 1, len, fp);  //读取文件写入到 缓存区
	if (read_len != len)
	{
		printf("\nread [userwords.txt] failed!\n");  //文件应该相等
		goto upload_exit;
	}
	userwords[len] = '\0';                           //添加结束符号
	
	//用户数据上传   数据名称字符串 	待上传数据缓冲区的起始地址。  数据长度(如果是字符串,则不包含'\0')。
	//params[in]  "sub = uup,dtt = userword" 	上传用户词表 	iat业务 	UTF-8 编码
	MSPUploadData("userwords", userwords, len, "sub = uup, dtt = userword", &ret); //上传
	if (MSP_SUCCESS != ret)
	{
		printf("\nMSPUploadData failed ! errorCode: %d \n", ret);//上传失败
		goto upload_exit;
	}
	
	//退出
upload_exit:
	if (NULL != fp)
	{
		fclose(fp); //关闭文件
		fp = NULL;
	}	
	if (NULL != userwords)
	{
		free(userwords);//释放缓存区
		userwords = NULL;
	}	
	return ret;
}
/*打印结果*/
static void show_result(char *str, char is_over)
{
	printf("\rResult: [ %s ]", str);
	if(is_over)
		putchar('\n');
		
	string s(str);
	result = s;   //得到语音识别结果
	flag = true;  //设置发布话题为真
}
//全局变量
static char *g_result = NULL;
static unsigned int g_buffersize = BUFFER_SIZE;
//对结果采取的措施
void on_result(const char *result, char is_last)
{
	if (result) {
		size_t left = g_buffersize - 1 - strlen(g_result);
		size_t size = strlen(result);
		if (left < size) {
			g_result = (char*)realloc(g_result, g_buffersize + BUFFER_SIZE);
			if (g_result)
				g_buffersize += BUFFER_SIZE;
			else {
				printf("mem alloc failed\n");
				return;
			}
		}
		strncat(g_result, result, size);
		show_result(g_result, is_last);
	}
}
/*一段新的语句开始*/
void on_speech_begin()
{
	if (g_result)
	{
		free(g_result);
	}
	g_result = (char*)malloc(BUFFER_SIZE);
	g_buffersize = BUFFER_SIZE;
	memset(g_result, 0, g_buffersize);

	printf("Start Listening...\n");
}
/* 说话结束*/
void on_speech_end(int reason)
{
	if (reason == END_REASON_VAD_DETECT) //已经检测到 语句vad断点了  就是说了一句话了
	{
		printf("\nSpeaking done \n");
		recorder_Flag = false;         //录音标志  录音结束
	}
	else
		printf("\nRecognizer error %d\n", reason);
}

/* 从麦克风识别语音*/
static void demo_mic(const char* session_begin_params)
{
	int errcode;
	struct speech_rec iat;
	struct speech_rec_notifier recnotifier = {
		on_result,
		on_speech_begin,
		on_speech_end
	};

	errcode = sr_init(&iat, session_begin_params, SR_MIC, &recnotifier);
	if (errcode) {
		printf("speech recognizer init failed\n");
		return;
	}
	errcode = sr_start_listening(&iat);
	if (errcode) {
		printf("start listen failed %d\n", errcode);
	}
	
	while(recorder_Flag){ //当可录音标志为真时 开始录音并识别
		sleep(1);
	}
	errcode = sr_stop_listening(&iat);
	if (errcode) {
		printf("stop listening failed %d\n", errcode);
	}

	sr_uninit(&iat);
}

 int asrToText()
  {
	
	int ret = MSP_SUCCESS;
	int upload_on =	1; /* whether upload the user word */
	/* login params, please do keep the appid correct */
	const char* login_params = "appid = 58dbcf6e, work_dir = ."; //登陆参数
	int aud_src = 0; /* from mic or file */

	/*
	*  See "iFlytek MSC Reference Manual"
	*  识别参数
	*/
	const char* session_begin_params =
		"sub = iat, domain = iat, language = zh_cn, "
		"accent = mandarin, sample_rate = 16000, "
		"result_type = plain, result_encoding = utf8";

	/* Login first. the 1st arg is username, the 2nd arg is password
	 * just set them as NULL. the 3rd arg is login paramertes
	 * 登陆
	 * */
	ret = MSPLogin(NULL, NULL, login_params);
	if (MSP_SUCCESS != ret)	{
		printf("MSPLogin failed , Error code %d.\n",ret);
		goto exit; // login fail, exit the program
	}

    /*字典上传?*/
	/*	
	printf("Want to upload the user words ? \n0: No.\n1: Yes\n");
	scanf("%d", &upload_on);
	if (upload_on)
	{
		printf("Uploading the user words ...\n");
		ret = upload_userwords();
		if (MSP_SUCCESS != ret)
			goto exit;	
		printf("Uploaded successfully\n");
	}
	*/
	
   /*声音来源  麦克分或 wav文件*/
   /*
	printf("Where the audio comes from?\n0: From a audio file.\n1: From microphone.\n");
	scanf("%d", &aud_src);//键盘输入
	if(aud_src != 0) {
		printf("Demo recognizing the speech from microphone\n");
		printf("Speak in 15 seconds\n");

		demo_mic(session_begin_params);

		printf("15 sec passed\n");
	} else {
		printf("Demo recgonizing the speech from a recorded audio file\n");
		demo_file("wav/iflytek02.wav", session_begin_params); //选者wav文件
	}
	*/
   demo_mic(session_begin_params);
	
exit:
	MSPLogout(); // Logout...
	return ret;
}


/*
*   根据发布的话题来修改录音标志
*/


void asrCallBack(const std_msgs::Int32::ConstPtr &msg)
{ 
        ROS_INFO_STREAM("Topic is Subscriber, now starting voice recognition");
        if(msg->data == ASRFLAG)  //话题收到相应的 语音识别激活标志
        {
           asrToText();
        }
}

/* main thread: start/stop record ; query the result of recgonization.
 * record thread: record callback(data write)
 * helper thread: ui(keystroke detection)
 */
int main(int argc, char* argv[])
{
     ros::init(argc, argv, "xf_asr_node"); //初始化ros系统 ,在roscore节点管理器注册节点
     ros::NodeHandle nhd;                   //节点句柄
	 //创建一个订阅者sub     节点句柄         话题            缓存区    函数指针   &callbackfunc 得到
     ros::Subscriber sub = nhd.subscribe("voice/xf_asr_topic", 20, &asrCallBack);
	 //节点创建一个发布者
	 ros::Publisher pub = nhd.advertise<std_msgs::String>("voice/tl_nlu_topic", 20);
	 ROS_INFO("please publish the kinds of std_msgs/Int32  1 to the topic voice/xf_asr_topic");
	 ROS_INFO("waitting for the running sign of voice recognition");
	 //ros::spin();
     ros::Rate  rate(10);              //频率
     while(ros::ok()){
		if(flag)                       //成功获取到返回数据
		{
			std_msgs::String msg;
			msg.data = result;
			pub.publish(msg);
			recorder_Flag = true;     //可开始录音识别
			flag = false ;            //发布识别结果 话题 标志 为否
		}
       ros::spinOnce();               //给ROS控制权  可以调用一次回调函数
       rate.sleep();
     }	 
	return 0;
}
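The buffer-growing logic in `on_result` above (append each partial recognition result, enlarging the buffer by `BUFFER_SIZE` whenever the next chunk would not fit) is easier to follow as a small Python sketch. Python manages memory itself, so the `capacity` counter here only mirrors the C `realloc` bookkeeping; it is a simplified illustration, not the node:

```python
BUFFER_SIZE = 4096

class ResultBuffer:
    """Mimic on_result(): accumulate partial ASR results, growing the
    notional backing store by BUFFER_SIZE when a chunk would overflow."""
    def __init__(self):
        self.capacity = BUFFER_SIZE      # g_buffersize
        self.data = bytearray()          # g_result

    def append(self, chunk: bytes):
        left = self.capacity - 1 - len(self.data)   # room left (minus NUL)
        if left < len(chunk):                       # same test as the C code
            self.capacity += BUFFER_SIZE            # simulated realloc
        self.data += chunk                          # strncat
        return self.data.decode('utf-8', errors='replace')

buf = ResultBuffer()
buf.append("你好".encode('utf-8'))
assert buf.append(b", world") == "你好, world"
```

Keeping one byte of headroom for the terminating NUL is what the `- 1` in the C expression (and here) accounts for.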

### CMakeLists.txt (complete file) ###

cmake_minimum_required(VERSION 2.8.3)
project(voice_system)

## Add support for C++11, supported in ROS Kinetic and newer
# add_definitions(-std=c++11)

## Find catkin macros and libraries
## if COMPONENTS list like find_package(catkin REQUIRED COMPONENTS xyz)
## is used, also find other catkin packages
find_package(catkin REQUIRED COMPONENTS
  roscpp
  rospy
  std_msgs
)

## System dependencies are found with CMake's conventions
# find_package(Boost REQUIRED COMPONENTS system)


## Uncomment this if the package has a setup.py. This macro ensures
## modules and global scripts declared therein get installed
## See http://ros.org/doc/api/catkin/html/user_guide/setup_dot_py.html
# catkin_python_setup()

################################################
## Declare ROS messages, services and actions ##
################################################

## To declare and build messages, services or actions from within this
## package, follow these steps:
## * Let MSG_DEP_SET be the set of packages whose message types you use in
##   your messages/services/actions (e.g. std_msgs, actionlib_msgs, ...).
## * In the file package.xml:
##   * add a build_depend tag for "message_generation"
##   * add a build_depend and a run_depend tag for each package in MSG_DEP_SET
##   * If MSG_DEP_SET isn't empty the following dependency has been pulled in
##     but can be declared for certainty nonetheless:
##     * add a run_depend tag for "message_runtime"
## * In this file (CMakeLists.txt):
##   * add "message_generation" and every package in MSG_DEP_SET to
##     find_package(catkin REQUIRED COMPONENTS ...)
##   * add "message_runtime" and every package in MSG_DEP_SET to
##     catkin_package(CATKIN_DEPENDS ...)
##   * uncomment the add_*_files sections below as needed
##     and list every .msg/.srv/.action file to be processed
##   * uncomment the generate_messages entry below
##   * add every package in MSG_DEP_SET to generate_messages(DEPENDENCIES ...)

## Generate messages in the 'msg' folder
# add_message_files(
#   FILES
#   Message1.msg
#   Message2.msg
# )

## Generate services in the 'srv' folder
# add_service_files(
#   FILES
#   Service1.srv
#   Service2.srv
# )

## Generate actions in the 'action' folder
# add_action_files(
#   FILES
#   Action1.action
#   Action2.action
# )

## Generate added messages and services with any dependencies listed here
# generate_messages(
#   DEPENDENCIES
#   std_msgs
# )

################################################
## Declare ROS dynamic reconfigure parameters ##
################################################

## To declare and build dynamic reconfigure parameters within this
## package, follow these steps:
## * In the file package.xml:
##   * add a build_depend and a run_depend tag for "dynamic_reconfigure"
## * In this file (CMakeLists.txt):
##   * add "dynamic_reconfigure" to
##     find_package(catkin REQUIRED COMPONENTS ...)
##   * uncomment the "generate_dynamic_reconfigure_options" section below
##     and list every .cfg file to be processed

## Generate dynamic reconfigure parameters in the 'cfg' folder
# generate_dynamic_reconfigure_options(
#   cfg/DynReconf1.cfg
#   cfg/DynReconf2.cfg
# )

###################################
## catkin specific configuration ##
###################################
## The catkin_package macro generates cmake config files for your package
## Declare things to be passed to dependent projects
## INCLUDE_DIRS: uncomment this if you package contains header files
## LIBRARIES: libraries you create in this project that dependent projects also need
## CATKIN_DEPENDS: catkin_packages dependent projects also need
## DEPENDS: system dependencies of this project that dependent projects also need
catkin_package(
#  INCLUDE_DIRS include
#  LIBRARIES voice_system
#  CATKIN_DEPENDS roscpp rospy std_msgs
#  DEPENDS system_lib
)

###########
## Build ##
###########

## Specify additional locations of header files
## Your package locations should be listed before other locations
# include_directories(include)
include_directories(    # header search paths
  include               # headers written for our own cpp files
  ${catkin_INCLUDE_DIRS}
)

## Declare a C++ library
# add_library(${PROJECT_NAME}
#   src/${PROJECT_NAME}/voice_system.cpp
# )

## Add cmake target dependencies of the library
## as an example, code may need to be generated before libraries
## either from message generation or dynamic reconfigure
# add_dependencies(${PROJECT_NAME} ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS})

## Declare a C++ executable
## With catkin_make all packages are built within a single CMake context
## The recommended prefix ensures that target names across packages don't collide
# add_executable(${PROJECT_NAME}_node src/voice_system_node.cpp)

## Rename C++ executable without prefix
## The above recommended prefix causes long target names, the following renames the
## target back to the shorter version for ease of user use
## e.g. "rosrun someones_pkg node" instead of "rosrun someones_pkg someones_pkg_node"
# set_target_properties(${PROJECT_NAME}_node PROPERTIES OUTPUT_NAME node PREFIX "")

## Add cmake target dependencies of the executable
## same as for the library above
# add_dependencies(${PROJECT_NAME}_node ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS})

## Specify libraries to link a library or executable target against
# target_link_libraries(${PROJECT_NAME}_node
#   ${catkin_LIBRARIES}
# )

# TTS
add_executable(xftts src/xf_ttf.cpp)
target_link_libraries(xftts
  ${catkin_LIBRARIES}
  /home/ewenwan/ewenwan/catkin_ws/src/voice/libs/x64/libmsc.so # replace with your own path
  -lrt
  -ldl
  -lpthread)  # system libs rt, dl, pthread plus the external msc lib

# NLU
add_executable(tlnlu src/tl_nlu.cpp)
target_link_libraries(tlnlu
  ${catkin_LIBRARIES}
  -lcurl
  -ljsoncpp)  # system/external libs curl and jsoncpp

# ASR
add_executable(xfasr src/xf_asr.cpp src/speech_recognizer.c src/linuxrec.c)
target_link_libraries(xfasr
  ${catkin_LIBRARIES}
  /home/ewenwan/ewenwan/catkin_ws/src/voice/libs/x64/libmsc.so # replace with your own path
  -lrt
  -ldl
  -lpthread
  -lasound)  # system libs rt, dl, pthread, asound plus the external msc lib

#############
## Install ##
#############

# all install targets should use catkin DESTINATION variables
# See http://ros.org/doc/api/catkin/html/adv_user_guide/variables.html

## Mark executable scripts (Python etc.) for installation
## in contrast to setup.py, you can choose the destination
# install(PROGRAMS
#   scripts/my_python_script
#   DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
# )

## Mark executables and/or libraries for installation
# install(TARGETS ${PROJECT_NAME} ${PROJECT_NAME}_node
#   ARCHIVE DESTINATION ${CATKIN_PACKAGE_LIB_DESTINATION}
#   LIBRARY DESTINATION ${CATKIN_PACKAGE_LIB_DESTINATION}
#   RUNTIME DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
# )

## Mark cpp header files for installation
# install(DIRECTORY include/${PROJECT_NAME}/
#   DESTINATION ${CATKIN_PACKAGE_INCLUDE_DESTINATION}
#   FILES_MATCHING PATTERN "*.h"
#   PATTERN ".svn" EXCLUDE
# )

## Mark other files for installation (e.g. launch and bag files, etc.)
# install(FILES
#   # myfile1
#   # myfile2
#   DESTINATION ${CATKIN_PACKAGE_SHARE_DESTINATION}
# )

#############
## Testing ##
#############

## Add gtest based cpp test target and link libraries
# catkin_add_gtest(${PROJECT_NAME}-test test/test_voice_system.cpp)
# if(TARGET ${PROJECT_NAME}-test)
#   target_link_libraries(${PROJECT_NAME}-test ${PROJECT_NAME})
# endif()

## Add folders to be run by python nosetests
# catkin_add_nosetests(test)     


5) Experiment
roscore
rosrun voice_system xftts    // text-to-speech (TTS)
rosrun voice_system tlnlu    // natural language understanding (NLU)
rosrun voice_system xfasr    // speech recognition (ASR)
rostopic pub -1 voice/xf_asr_topic std_msgs/Int32 1


2015-12-05 19:16:43 yangzhaomuma 阅读数 5522

Android speech recognition is, simply put, turning speech into text.

In daily life speech recognition appears in car navigation, voice input, and so on; it is not always accurate, but it is widely used.

This post introduces Google's native speech recognition and Baidu's speech recognition.

Google speech recognition

The Google approach is very simple:

1. First check whether a speech recognition tool (for example Google Voice Search) exists locally; if not, stop.

2. Express the speech recognition request as an Intent.

3. Send the Intent and wait for the result.

4. Display the returned content.

The code is as follows:

package com.example.speak_csdn;

import java.util.ArrayList;
import java.util.List;

import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.Toast;
import android.app.Activity;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;

public class MainActivity extends Activity {

	final int RESPONCERESULT=99;
	@Override
	protected void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		setContentView(R.layout.activity_main);
		Button btnButton=(Button)findViewById(R.id.mybtn);
		btnButton.setOnClickListener(new OnClickListener() {
			
			@Override
			public void onClick(View v) {
				// TODO Auto-generated method stub
				speak();
			}
		});
	}

	public void speak()
	{
            try{  
             // launch speech recognition by passing the recognition mode via an Intent  
             Intent intent=new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);  
             // free-form language model  
             intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);  
             // prompt shown when recording starts  
             intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "开始语音");  
             
             // start speech recognition  
             startActivityForResult(intent, RESPONCERESULT);  
             }catch (Exception e) {  
                 e.printStackTrace();  
                 Toast.makeText(getApplicationContext(), "找不到语音设备", Toast.LENGTH_LONG).show();  
             }  
	}

	@Override
	protected void onActivityResult(int requestCode, int resultCode, Intent data) {
		// callback that receives the data returned by Google   
        if(requestCode==RESPONCERESULT && resultCode==RESULT_OK){  
            // get the recognized strings  
            ArrayList<String> results=data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);  
            // Google may return several similar candidates, best first; list them all and concatenate   
            String resultString="";  
            for(int i=0;i<results.size();i++){  
                resultString+=results.get(i);  
            }  
            Toast.makeText(this, resultString, Toast.LENGTH_LONG).show();  
        }  
		super.onActivityResult(requestCode, resultCode, data);
	}
	
	

}

The code is complete. Note that you must add the Internet permission, since this is online speech recognition. The key statements are the following:

 // launch speech recognition by passing the recognition mode via an Intent  
             Intent intent=new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);  
             // free-form language model  
             intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);  
             // prompt shown when recording starts  
             intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "开始语音");  
             
             // start speech recognition  
             startActivityForResult(intent, RESPONCERESULT);  


The matching layout is trivial: a single Button. Bind the click handler to it and run the app...


This is not what we expected. The network is fine; WiFi and GPRS both give the same result. The only explanation is that Google's backend service cannot be reached; you may be able to see the intended effect through a proxy.

Disappointing.

So now let's look at a local Chinese option: Baidu speech recognition.

Baidu speech recognition

Baidu speech recognition is implemented with the SDK that Baidu provides; the Baidu open platform documents it in detail.
The steps to use it are:
1. Download the jar package;
2. Add the permissions;
3. In code, call the provided API to do the recognition;
4. Display the returned content.

Downloading the jar package and .so files

The jar package and .so files below were downloaded from the Baidu open platform for use in the development that follows.

Adding permissions

Add the required permissions in AndroidManifest.xml:
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.WRITE_SETTINGS" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_PHONE_STATE" />
    <uses-permission android:name="android.permission.WAKE_LOCK" />
    <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
    <uses-permission android:name="android.permission.ACCESS_COARSE_LOCATION" />
    <uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
    <!-- Bluetooth recording -->
    <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
    <!-- some phones need this permission to start an SCO audio connection -->
    <uses-permission android:name="android.permission.BROADCAST_STICKY" />
    <!-- detect the headset state for Bluetooth recording -->
    <uses-permission android:name="android.permission.BLUETOOTH" />

Using the API in code

The API is used as follows:
package com.example.baiduspeak_csdn;

import java.util.ArrayList;

import com.baidu.voicerecognition.android.ui.BaiduASRDigitalDialog;
import com.baidu.voicerecognition.android.ui.DialogRecognitionListener;

import android.os.Bundle;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.Toast;
import android.app.Activity;

public class MainActivity extends Activity {

	// Baidu's ready-made recognition dialog
	 private BaiduASRDigitalDialog mDialog = null;
	 // dialog result listener
	 private DialogRecognitionListener mRecognitionListener;
	@Override
	protected void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		setContentView(R.layout.activity_main);
		
		Button btnButton=(Button)findViewById(R.id.mybtn);
		btnButton.setOnClickListener(new OnClickListener() {
			
			@Override
			public void onClick(View v) {
				// TODO Auto-generated method stub
				speak_Baidu();
			}
		});
		
		mRecognitionListener = new DialogRecognitionListener() {

            @Override
            public void onResults(Bundle results) {
                ArrayList<String> rs = results != null ? results.getStringArrayList(RESULTS_RECOGNITION) : null;
                if (rs != null && rs.size() > 0) {
                	Toast.makeText(MainActivity.this, rs.get(0), Toast.LENGTH_LONG).show();
                }

            }
        };
	}
    // Baidu speech recognition
	public void speak_Baidu()
	{
		if (mDialog != null) {
            mDialog.dismiss();
        }
        Bundle params = new Bundle();
        // API_KEY and SECRET_KEY obtained by registering on the Baidu open platform
        params.putString(BaiduASRDigitalDialog.PARAM_API_KEY, Constants.API_KEY);
        params.putString(BaiduASRDigitalDialog.PARAM_SECRET_KEY, Constants.SECRET_KEY);
        // dialog theme
        params.putInt(BaiduASRDigitalDialog.PARAM_DIALOG_THEME, Config.DIALOG_THEME);
        // create the dialog with these settings
        mDialog = new BaiduASRDigitalDialog(this, params);
        // attach the result listener to the dialog
        mDialog.setDialogRecognitionListener(mRecognitionListener);
        // dialog options
	    mDialog.getParams().putInt(BaiduASRDigitalDialog.PARAM_PROP, Config.CURRENT_PROP);
	    mDialog.getParams().putString(BaiduASRDigitalDialog.PARAM_LANGUAGE, Config.getCurrentLanguage());
	    mDialog.getParams().putBoolean(BaiduASRDigitalDialog.PARAM_START_TONE_ENABLE, Config.PLAY_START_SOUND);
	    mDialog.getParams().putBoolean(BaiduASRDigitalDialog.PARAM_END_TONE_ENABLE, Config.PLAY_END_SOUND);
	    mDialog.getParams().putBoolean(BaiduASRDigitalDialog.PARAM_TIPS_TONE_ENABLE, Config.DIALOG_TIPS_SOUND);
	    mDialog.show();
	}
	}
	 @Override
	    protected void onDestroy() {
	        if (mDialog != null) {
	            mDialog.dismiss();
	        }
	        super.onDestroy();
	    }

}
The code again just defines a button and binds an event to it.
When the event fires we set quite a few parameters; the important ones are the platform-assigned API_KEY, SECRET_KEY, PROP, and the language choice.
From these, Baidu matches the speech we produce on its servers and returns the result to the client.
The effect is as follows:


This effect we can actually see. That is the simple usage; like Google's, a single request does it.

Source code

The source for both the Google and Baidu speech recognition examples above is available for download:

2019-04-09 16:52:21 weixin_43346716 阅读数 654

Speech recognition technology lets a machine turn a speech signal into the corresponding text or command through a process of recognition and understanding. It mainly involves three aspects: feature extraction, pattern-matching criteria, and model training.
  Speech recognition is the foundation of human-machine interaction; it solves the problem of letting the machine hear clearly what a person says. It is currently the most successfully deployed branch of artificial intelligence.

Speech recognition is mainly applied in connected cars, machine translation, smart homes, and autonomous driving. The most representative company in China is iFLYTEK (科大讯飞); startups include 云知声, 普强信息, 声智科技, and GMEMS 通用微科技.

2019-05-22 00:45:34 zhinengxuexi 阅读数 813

With today's rapid progress in AI, speech recognition has become standard on many devices and is drawing more and more attention. Abroad, Microsoft, Apple, and Google, and at home 科大讯飞, 思必驰, and 云知声, are all developing new strategies and algorithms; natural voice interaction between humans and machines seems closer and closer.

Speech recognition takes speech as its object of study: through speech signal processing and pattern recognition, the machine automatically recognizes and understands spoken human language.

A speech recognition system is in essence a pattern recognition system, with three basic units: feature extraction, pattern matching, and a reference pattern library.

A complete speech recognition system works in 7 steps:

1. Analyze and process the speech signal, removing redundant information.

2. Extract the key information that affects recognition and the features that express the meaning.

3. Based on those features, recognize words at the smallest unit level.

4. Recognize words in order according to the grammar of each language.

5. Use the preceding and following meaning as auxiliary recognition evidence.

6. Segment the key information semantically, join the recognized words, and adjust the sentence structure according to its meaning.

7. Combine semantics and analyze the surrounding context to revise the sentence currently being processed.

Speech recognition and image recognition are both major applications of pattern recognition; learn more if you are interested.
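The pattern-matching unit above is classically implemented with dynamic time warping (DTW) against entries in the reference pattern library. A minimal pure-Python sketch (the 1-D feature sequences and templates are toy values I made up; real systems match frame-wise MFCC vectors):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: lists of numbers (toy 1-D features; a real recognizer would use
    MFCC frames and a vector distance instead of abs())."""
    INF = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # allow match, insertion, and deletion steps
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

def recognize(features, templates):
    """Return the label of the reference template closest to the input."""
    return min(templates, key=lambda w: dtw_distance(features, templates[w]))

templates = {"yes": [1, 3, 5, 3, 1], "no": [5, 5, 1, 1, 1]}
print(recognize([1, 2, 3, 5, 4, 3, 1], templates))  # prints "yes"
```

DTW tolerates the time-stretching visible in the input (7 frames against a 5-frame template), which is exactly why it was the matching criterion of choice in early template-based recognizers.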

The future of AI, big data, cloud computing, and the IoT is worth watching; all are frontier industries. 多智时代 focuses on introductory AI and big data material, and recommends a few articles:
AI and speech recognition technology
http://www.duozhishidai.com/article-3157-1.html
With AI developing rapidly, what stage has speech recognition reached?
http://www.duozhishidai.com/article-2278-1.html
What are the main steps of the recognition process, and what are the common methods?
http://www.duozhishidai.com/article-1739-1.html



2014-04-16 16:08:46 chenlei0630 阅读数 6385

Correcting speech recognition output is an important part of speech understanding. Limited by recognition accuracy, ASR output often contains errors, which obstruct the later stages of understanding and make them harder. Post-correction can fix some of these errors and thereby improve the accuracy of speech understanding.

At present many methods exist for correcting recognition output, but no single one is dominant. Most post-ASR correction research uses statistical methods based on word-level probability information from the recognizer.

Ringger and Allen (1996) proposed a post-processing method for recognition errors. It has two parts: a channel model that reflects the errors the recognizer makes, and a language model that expresses the likelihood of the word sequence the speaker uttered. Output of the TRAINS-95 dialogue system served as the training set.

Zhou and Meng (2004) proposed a two-level scheme for detecting recognition errors. The first level uses an SVM to judge whether a recognized sentence contains an error; if so, it is passed to the second level, where an SVM classifies which words in the sentence are wrong. In their experiments, recall of erroneous sentences at the first level was 16.5%, and recall of erroneous words within those sentences at the second level was 19.8%.

Some methods also use co-occurrence information of adjacent words. Zhou et al. (2006) proposed a three-stage error-checking method: first detect whether the recognized sentence contains errors; second detect which words are wrong; third detect the erroneous characters. Correction then generates a list of candidate replacements for each error and re-ranks the candidates by combining mutual information with a trigram language model.

Arup Sarma and David D. Palmer proposed a correction method based on contextual word co-occurrence. It requires a fairly large corpus. A window length w is fixed, and every word within w/2 positions of a center keyword counts as one of its neighbors. The text is traversed to collect each word's neighbors and their counts, and the neighbors are ranked. Based on the KL distance to the context between a suspect recognized word and its phonetically similar candidates, the method judges whether the word is an error and picks the correct candidate.
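The window-based neighbor statistics this method relies on can be sketched in a few lines (pure Python; the corpus, window size, and use of raw counts are illustrative assumptions, and the KL-distance scoring step is omitted):

```python
from collections import Counter, defaultdict

def neighbor_counts(tokens, w):
    """For each word, count the words appearing within w/2 positions of it."""
    half = w // 2
    neighbors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - half), min(len(tokens), i + half + 1)
        for j in range(lo, hi):
            if j != i:
                neighbors[word][tokens[j]] += 1
    return neighbors

# toy corpus; a real deployment would traverse a large text collection
corpus = "recognize speech with a speech recognizer not wreck a nice beach".split()
nb = neighbor_counts(corpus, 4)
print(nb["speech"].most_common(3))
```

The resulting per-word neighbor distributions are what the method compares (via KL distance) between the context of a suspect word and the contexts typical of each phonetically similar candidate.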

Minwoo Jeong proposed an error-detection method that combines lexical information with higher-level linguistic knowledge via a maximum-entropy language model (MELM). Jeong observes that N-best re-ranking of recognition output is a mainstream approach, but it assumes the correct result appears in the N-best list; if it does not, the correct result can never be recovered, so the approach is bounded by the recognizer's accuracy. He therefore proposed a method combining channel context with high-level linguistic knowledge. It needs three corpora: a large background corpus B (speech recognition output collected from related or other tasks), a small corpus A1 of ASR correction pairs, and a small corpus A2 of features specific to the target recognition task. A1 is used to train the channel model, and B and A2 train the language model. MAP inference then finds the most likely correct result.

For the channel model, IBM Model 4 from machine translation is adopted to handle insertion and deletion errors. When training the language model on B and A2, a separate language model is trained on each corpus and the two are linearly interpolated. The paper additionally uses a two-layer language model: an n-gram model and a whole-sentence maximum-entropy language model (WSME-LM). The n-gram model is the first layer, capturing local dependencies and processing quickly; the WSME-LM is the second layer, capturing long-distance dependencies and higher-level linguistic phenomena.
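The linear interpolation of the two language models (one trained on B, one on A2) can be sketched like this (illustrative unigram models and interpolation weight, not the paper's actual estimates):

```python
from collections import Counter

def unigram_lm(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / N."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return lambda w: counts[w] / total

def interpolate(lm_background, lm_domain, lam):
    """P(w) = lam * P_background(w) + (1 - lam) * P_domain(w)."""
    return lambda w: lam * lm_background(w) + (1 - lam) * lm_domain(w)

p_b = unigram_lm("the cat sat on the mat".split())       # background corpus B
p_a = unigram_lm("the robot heard the command".split())  # domain corpus A2
p = interpolate(p_b, p_a, 0.7)
print(p("the"))
```

The same interpolation formula applies unchanged to n-gram models; the weight lam is normally tuned on held-out data.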

Experiments showed that this method reduced the word error rate by about 8%.

李晶皎 et al., drawing on the characteristics of Chinese speech, summarized the patterns of errors appearing in character sequences from continuous Chinese speech recognition, wrote corresponding grammar and semantic rules for error detection and correction, and used a "lexical-semantics-driven" analysis to find and correct the errors in the character sequence, finally obtaining the correct one.

李蕾, 许晶, et al. proposed correcting recognized text using comprehensive-information natural language understanding, which applies the grammatical, semantic, and pragmatic information of natural language. A comprehensive-information text post-processing stage is added after recognition: by analyzing grammatical information (word position, recognition stability), semantic information (intended sentence meaning), and pragmatic information (contextual harmony), it evaluates, detects errors in, and corrects the recognized sentence, finally outputting an optimized sentence.

张全, 张倪, et al. proposed a correction method for Chinese speech recognition based on the Hierarchical Network of Concepts (HNC). The main idea: when post-processing finds that the recognizer produced an erroneous sentence, clues are first used to locate the error. Since the sentence-category analyzer can judge whether a word is plausible within the sentence, correction reduces to ambiguity resolution once a set of candidate replacements is supplied at the suspected position. Because the recognizer outputs only characters and no intermediate results, the method goes back to the pronunciation of the suspect characters to look for the correct word; if none can be found, it considers words formed from confusable pronunciations, which supplies more candidates and improves the chance of finding the word that fits the chain of conceptual associations.

王兴建 proposed an n-gram language-model-based correction method. By analyzing the probabilities between adjacent words in the recognition output, it locates likely error positions, then uses pinyin confusion rules, pinyin stability, pinyin similarity, and similar information to find the most likely replacement for the erroneous word.
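The n-gram step of such a method can be sketched as follows: estimate bigram probabilities from a corpus and flag positions where the probability of a word given its predecessor is suspiciously low (toy corpus and threshold; the pinyin confusion, stability, and similarity features are not shown):

```python
from collections import Counter

def bigram_probs(sentences):
    """MLE bigram probabilities P(w2 | w1) estimated from a training corpus."""
    pair, uni = Counter(), Counter()
    for s in sentences:
        toks = s.split()
        uni.update(toks[:-1])            # count each word as a bigram history
        pair.update(zip(toks, toks[1:])) # count adjacent word pairs
    return lambda w1, w2: pair[(w1, w2)] / uni[w1] if uni[w1] else 0.0

def flag_errors(tokens, prob, threshold=0.1):
    """Return indices i where P(tokens[i] | tokens[i-1]) falls below threshold."""
    return [i for i in range(1, len(tokens))
            if prob(tokens[i - 1], tokens[i]) < threshold]

train = ["turn on the light", "turn off the light", "turn on the fan"]
prob = bigram_probs(train)
print(flag_errors("turn on the lie".split(), prob))  # → [3]
```

In a full system the flagged position would then be re-filled from a candidate list built with the pinyin-based features.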

 

References:

Allen J F, Miller B W, Ringger E K, et al. A robust system for natural spoken dialogue[C]//Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1996: 62-70.

Zhou Z, Meng H M. A two-level schema for detecting recognition errors[C]//INTERSPEECH. 2004.

Jeong M, Kim B, Lee G. Using higher-level linguistic knowledge for speech recognition error correction in a spoken Q/A dialog[C]//Proceedings of the HLT-NAACL Special Workshop on Higher-Level Linguistic Information for Speech Processing. 2004: 48-55.

Jeong M, Eun J, Jung S, et al. An error-corrective language-model adaptation for automatic speech recognition[C]//INTERSPEECH. 2005: 729-732.

李晶皎, 张俐, 姚天顺. 汉语语音理解中自动纠错系统的研究[J]. 1999.

张全, 张倪, 韦向峰. 汉语语音识别的纠错处理[C]//自然语言理解与机器翻译——全国第六届计算语言学联合学术会议论文集. 2001.

Sarma A, Palmer D D. Context-based speech recognition error detection and correction[C]//Proceedings of HLT-NAACL 2004: Short Papers. Association for Computational Linguistics, 2004: 85-88.


Please credit the source when reposting: http://blog.csdn.net/chenlei0630/article/details/23852789
