Intelligent Phone Robot: Implementing an iFlytek ASR MRCP Server Based on UniMRCP

POST TIME:2021-08-22 21:18

By implementing a UniMRCP plugin, we can wrap the ASR interfaces of vendors such as iFlytek, Baidu, and Alibaba and build our own MRCP server.

What is MRCP

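The Media Resource Control Protocol (MRCP) is a communication protocol that lets a media resource server provide speech services to clients. The media resource services defined so far are speech recognition, speech synthesis, recording, and speaker verification and identification. MRCP does not define session establishment and does not care how the server and client connect; MRCP messages are carried over a control protocol such as RTSP or SIP. MRCPv2, the latest version, uses SIP as its control protocol, and MRCPv2 is what this article uses.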

Building and Installing UniMRCP from Source

All operations in this article are performed on CentOS 7.

Introduction to UniMRCP

UniMRCP is an open source cross-platform implementation of the MRCP client and server in the C/C++ language distributed under the terms of the Apache License 2.0. The implementation encapsulates SIP, RTSP, SDP, MRCPv2, RTP/RTCP stacks and provides integrators with an MRCP version consistent API.

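Build, Install, and Run

First, download "UniMRCP 1.5.0" and "UniMRCP Deps 1.5.0" from the official website.

Switch to the root account, then enter the Deps directory and install the dependencies:

./build-dep-libs.sh

For installing UniMRCP itself, refer to the official website: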

./bootstrap

The usual "configure", "make", "make install" sequence of commands should follow in order to build 
and install the project from source.

./configure
make
make install

As a result, the project will be installed in the directory "/usr/local/unimrcp" with the following
layout:

bin      binaries (unimrcpserver, unimrcpclient, ...)
conf     configuration files (unimrcpserver.xml, unimrcpclient.xml, ...)
data     data files
include  header files
lib      shared (convenience) libraries
log      log files
plugin   run-time loadable modules

After installation, go to the /usr/local/unimrcp/bin directory and run the server:

./unimrcpserver -o 3

On a successful start the server prints "MRCP Server Started". We can verify it with the bundled client:

./unimrcpclient

.
.
.
>help
usage:

- run [app_name] [profile_name] (run demo application)
       app_name is one of 'synth', 'recog', 'bypass', 'discover'
       profile_name is one of 'uni2', 'uni1', ...

       examples:
           run synth
           run recog
           run synth uni1
           run recog uni1

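As shown above, once the client is running you can enter commands such as run synth and watch the logs on both the server and the client. synth is speech synthesis; recog is speech recognition.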

MRCP plugin

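Diving straight into the source code is rather laborious, so we use the server-side log output to trace the corresponding call path in the source. The call path is fairly complex, so only the key parts are listed below.

Loading Flow

Start with the log. Here we filtered for the Demo Recog entries; other plugins work the same way in principle: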

[INFO]   Load Plugin [Demo-Recog-1] [/usr/local/unimrcp/plugin/demorecog.so]
[INFO]   Register MRCP Engine [Demo-Recog-1]
[INFO]   Open Engine [Recorder-1]
[INFO]   Start Task [Demo Recog Engine]

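With the messages above we can search the source code and follow how a plugin is loaded.

Below is the flow from parsing the configuration file to loading the plugin's .so: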

unimrcp_server.c
/** Load plugin */
static apt_bool_t unimrcp_server_plugin_load(unimrcp_server_loader_t *loader, const apr_xml_elem *root) {
...
	engine = mrcp_server_engine_load(loader->server,plugin_id,plugin_path,config);
...
}

mrcp_server.c
/** Load MRCP engine */
MRCP_DECLARE(mrcp_engine_t*) mrcp_server_engine_load(
								mrcp_server_t *server,
								const char *id,
								const char *path,
								mrcp_engine_config_t *config) {
...
	engine = mrcp_engine_loader_plugin_load(server->engine_loader,id,path,config);
...
}

mrcp_engine_loader.c
/** Load engine plugin */
MRCP_DECLARE(mrcp_engine_t*) mrcp_engine_loader_plugin_load(mrcp_engine_loader_t *loader, const char *id, const char *path, mrcp_engine_config_t *config) {
...
apr_dso_load(&plugin,path,loader->pool)
...
}

After a successful load, the engine is registered:

unimrcp_server.c
/** Load plugin */
static apt_bool_t unimrcp_server_plugin_load(unimrcp_server_loader_t *loader, const apr_xml_elem *root) {
...
	return mrcp_server_engine_register(loader->server,engine);
...
}

It is finally added to a hash table:

mrcp_engine_factory.c
/** Register new engine */
MRCP_DECLARE(apt_bool_t) mrcp_engine_factory_engine_register(mrcp_engine_factory_t *factory, mrcp_engine_t *engine)
{
...
	apr_hash_set(factory->engines,engine->id,APR_HASH_KEY_STRING,engine);
...
}
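To make the registration step concrete, here is a minimal sketch of an id-keyed engine registry in plain C. It is not UniMRCP code: the types and function names are hypothetical, and a small fixed array with a linear search stands in for the APR hash table. Unlike a hash-table set, this sketch chooses to reject a duplicate id rather than overwrite it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not UniMRCP types. */
#define MAX_ENGINES 16

typedef struct {
    const char *id;   /* e.g. "Demo-Recog-1" */
    void       *impl; /* opaque engine object */
} engine_entry_t;

typedef struct {
    engine_entry_t entries[MAX_ENGINES];
    size_t         count;
} engine_registry_t;

/* Register an engine under its id; returns 1 on success, 0 if the id is
 * missing, already taken, or the table is full. */
static int registry_register(engine_registry_t *reg, const char *id, void *impl)
{
    size_t i;
    if (!id || reg->count >= MAX_ENGINES) return 0;
    for (i = 0; i < reg->count; i++) {
        if (strcmp(reg->entries[i].id, id) == 0) return 0;
    }
    reg->entries[reg->count].id = id;
    reg->entries[reg->count].impl = impl;
    reg->count++;
    return 1;
}

/* Look an engine up by id; returns NULL when no such engine exists. */
static void *registry_find(const engine_registry_t *reg, const char *id)
{
    size_t i;
    for (i = 0; i < reg->count; i++) {
        if (strcmp(reg->entries[i].id, id) == 0) return reg->entries[i].impl;
    }
    return NULL;
}
```

The point of keying by id is that later stages can find an engine from the name given in the configuration file without knowing which .so it came from.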

The above is the chain of loading steps triggered by unimrcp_server_load. Once it succeeds, the server is started:

unimrcp_server.c
/** Start UniMRCP server */
MRCP_DECLARE(mrcp_server_t*) unimrcp_server_start(apt_dir_layout_t *dir_layout)
{
...
unimrcp_server_load(server,dir_layout,pool)
...
mrcp_server_start(server)
...
}

mrcp_engine_iface.c
/** Open engine */
apt_bool_t mrcp_engine_virtual_open(mrcp_engine_t *engine) {
...
engine->method_vtable->open(engine)
...
}

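method_vtable is where we get into how a plugin is actually invoked.

Call Flow

Walking through the concrete call flow and comparing it with the plugin implementation manual on the official website makes it easy to see what each interface the manual asks us to implement is for.

We won't go into the call details here. In the end, every operation on a plugin is triggered as a callback through the function pointers in the three virtual tables below.

First come the engine-level callbacks, which cover destroying, opening, and closing the engine, plus creating a channel: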

/** Table of MRCP engine virtual methods */
struct mrcp_engine_method_vtable_t {
       /** Virtual destroy */
       apt_bool_t (*destroy)(mrcp_engine_t *engine);
       /** Virtual open */
       apt_bool_t (*open)(mrcp_engine_t *engine);
       /** Virtual close */
       apt_bool_t (*close)(mrcp_engine_t *engine);
       /** Virtual channel create */
       mrcp_engine_channel_t* (*create_channel)(mrcp_engine_t *engine, apr_pool_t *pool);
};

When a client talks to the server plugin, a channel is created within a session and destroyed when the session ends. The channel callbacks are:

/** Table of channel virtual methods */
struct mrcp_engine_channel_method_vtable_t {
       /** Virtual destroy */
       apt_bool_t (*destroy)(mrcp_engine_channel_t *channel);
       /** Virtual open */
       apt_bool_t (*open)(mrcp_engine_channel_t *channel);
       /** Virtual close */
       apt_bool_t (*close)(mrcp_engine_channel_t *channel);
       /** Virtual process_request */
       apt_bool_t (*process_request)(mrcp_engine_channel_t *channel, mrcp_message_t *request);
};

ASR needs audio data flowing in and TTS needs audio data flowing out. The callbacks below handle that audio data:

/** Table of audio stream virtual methods */
struct mpf_audio_stream_vtable_t {
       /** Virtual destroy method */
       apt_bool_t (*destroy)(mpf_audio_stream_t *stream);
       /** Virtual open receiver method */
       apt_bool_t (*open_rx)(mpf_audio_stream_t *stream, mpf_codec_t *codec);
       /** Virtual close receiver method */
       apt_bool_t (*close_rx)(mpf_audio_stream_t *stream);
       /** Virtual read frame method */
       apt_bool_t (*read_frame)(mpf_audio_stream_t *stream, mpf_frame_t *frame);
       /** Virtual open transmitter method */
       apt_bool_t (*open_tx)(mpf_audio_stream_t *stream, mpf_codec_t *codec);
       /** Virtual close transmitter method */
       apt_bool_t (*close_tx)(mpf_audio_stream_t *stream);
       /** Virtual write frame method */
       apt_bool_t (*write_frame)(mpf_audio_stream_t *stream, const mpf_frame_t *frame);
       /** Virtual trace method */
       void (*trace)(mpf_audio_stream_t *stream, mpf_stream_direction_e direction, apt_text_stream_t *output);
};

By implementing the callbacks in these three virtual tables, the plugin can handle the corresponding requests sent by the client.
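The virtual-table pattern can be hard to picture from the struct definitions alone, so here is a stripped-down sketch in plain C. Every name in it (mini_engine_t, xf_open, and so on) is hypothetical and exists only for illustration; it shows the general idea of a server calling a plugin through function pointers and does not reproduce the UniMRCP types.

```c
#include <stddef.h>

/* Hypothetical miniature of the engine vtable pattern; not UniMRCP types. */
typedef struct mini_engine_t mini_engine_t;

typedef struct {
    int (*open)(mini_engine_t *engine);
    int (*close)(mini_engine_t *engine);
} mini_engine_vtable_t;

struct mini_engine_t {
    const mini_engine_vtable_t *vtable; /* filled in by the plugin */
    int is_open;                        /* plugin-private state */
};

/* Plugin side: concrete implementations of the virtual methods. */
static int xf_open(mini_engine_t *engine)  { engine->is_open = 1; return 1; }
static int xf_close(mini_engine_t *engine) { engine->is_open = 0; return 1; }

static const mini_engine_vtable_t xf_vtable = { xf_open, xf_close };

/* Plugin side: wire an engine to the plugin's vtable. */
static void mini_engine_init(mini_engine_t *engine)
{
    engine->vtable = &xf_vtable;
    engine->is_open = 0;
}

/* Server side: knows only the vtable, never the concrete functions. */
static int mini_engine_virtual_open(mini_engine_t *engine)
{
    if (!engine || !engine->vtable || !engine->vtable->open) return 0;
    return engine->vtable->open(engine);
}

static int mini_engine_virtual_close(mini_engine_t *engine)
{
    if (!engine || !engine->vtable || !engine->vtable->close) return 0;
    return engine->vtable->close(engine);
}
```

Because the server only ever goes through the table, swapping the demo recognizer for a vendor-backed one comes down to supplying a different set of function pointers.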

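Implementing an MRCP Plugin with iFlytek ASR

Creating the Plugin

Modifying configure.ac

UniMRCP uses automake to manage the source build, so besides adding source code we also need to add the matching build configuration.
First edit configure.ac and add the following. This defines a conditional that the Makefile uses later, and registers the new Makefile that we add below: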

dnl XFyun recognizer plugin.
UNI_PLUGIN_ENABLED(xfyunrecog)

AM_CONDITIONAL([XFYUNRECOG_PLUGIN],[test "${enable_xfyunrecog_plugin}" = "yes"])

...

plugins/xfyun-recog/Makefile

...

echo XFyun recognizer plugin....... : $enable_xfyunrecog_plugin

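Adding the Source Code and Directory

In the plugins directory, create an xfyun-recog directory with a src directory inside it. You can copy demo_recog_engine.c there, rename it to xfyun_recog_engine.c, and replace every "demo" in the source with "xfyun". Of course, you can also type it all out from scratch.

Create a Makefile.am file with the following content: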

AM_CPPFLAGS                = $(UNIMRCP_PLUGIN_INCLUDES)

plugin_LTLIBRARIES         = xfyunrecog.la

xfyunrecog_la_SOURCES       = src/xfyun_recog_engine.c
xfyunrecog_la_LDFLAGS       = $(UNIMRCP_PLUGIN_OPTS)

include $(top_srcdir)/build/rules/uniplugin.am

Edit Makefile.am in the plugins directory and add the following:

if XFYUNRECOG_PLUGIN
SUBDIRS               += xfyun-recog
endif

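XFYUNRECOG_PLUGIN is the conditional we added in configure.ac.

The final directory structure is shown in the figure below (ignore the files outside the red box):

[Figure: xfyun recog dir]

When this is done, rebuild UniMRCP starting again from the first step. The build should now produce xfyunrecog.so (the library name comes from plugin_LTLIBRARIES).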

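Importing the iFlytek SDK

First download the SDKs for speech dictation and online speech synthesis (the latter is used later for the TTS implementation) from the iFlytek Open Platform.

Create a third-party directory under the plugins directory and copy the iFlytek SDK into it:

[Figure: third party dir]

Edit the Makefile.am of xfyun-recog to link against the iFlytek library and copy it at install time: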

plugin_LTLIBRARIES         = xfyunrecog.la

xfyunrecog_la_SOURCES       = src/xfyun_recog_engine.c
xfyunrecog_la_LDFLAGS       = $(UNIMRCP_PLUGIN_OPTS) \
                              -L$(top_srcdir)/plugins/third-party/xfyun/libs/x64 \
                              -lmsc -ldl -lpthread -lrt
xfyunrecog_ladir            = $(libdir)
xfyunrecog_la_DATA          = $(top_srcdir)/plugins/third-party/xfyun/libs/x64/libmsc.so

include $(top_srcdir)/build/rules/uniplugin.am

UNIMRCP_PLUGIN_INCLUDES     += -I$(top_srcdir)/plugins/third-party/xfyun/include

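Implementing the Plugin with the iFlytek API

For the iFlytek side, refer to the official documentation and the asr_sample shipped with the SDK.

[Figure: xfyun asr]

Including the Headers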

#include <stdlib.h>
#include "qisr.h"
#include "msp_cmn.h"
#include "msp_errors.h"

New Channel Fields

struct xfyun_recog_channel_t {
	...
	const char				*session_id;	/* iFlytek session id */
	const char				*last_result;	/* accumulated recognition result */
	apt_bool_t				recog_started;	/* whether recognition has started */
};

iFlytek Login

static apt_bool_t xfyun_login(void)
{
	int			ret						=	MSP_SUCCESS;
	/* Login parameters; the appid is bound to the msc library, so do not change it arbitrarily. */
	const char* login_params			=	"appid = 5ac1c462, work_dir = .";

	/* User login: the first argument is the user name and the second the password
	 * (NULL is fine for both); the third is the login parameter string. */
	ret = MSPLogin(NULL, NULL, login_params);
	if (MSP_SUCCESS != ret)
	{
		apt_log(RECOG_LOG_MARK,APT_PRIO_ERROR,"[xfyun] MSPLogin failed , Error code %d.", ret);
		return FALSE; /* login failed */
	}
	apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] MSPLogin success");
	return TRUE;
}

We just call this function when creating the engine.

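iFlytek Session Creation and Termination

First we need to find the right moments to create and end a session. xfyun_recog_msg_process is the callback that handles requests on a channel, and RECOGNIZER_RECOGNIZE is the recognition request. So we create the session when that request arrives and end it when recognition finishes or on RECOGNIZER_STOP.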

/** Process RECOGNIZE request */
static apt_bool_t xfyun_recog_channel_recognize(mrcp_engine_channel_t *channel, mrcp_message_t *request, mrcp_message_t *response)
{
...
/* reset */
	int errcode = MSP_SUCCESS;
	const char*	session_begin_params = "sub = iat, domain = iat, language = zh_cn, accent = mandarin, sample_rate = 8000, result_type = plain, result_encoding = utf8";
	/* Dictation needs no grammar, so the first argument is NULL. */
	recog_channel->session_id = QISRSessionBegin(NULL, session_begin_params, &errcode);
	if (MSP_SUCCESS != errcode)
	{
		apt_log(RECOG_LOG_MARK,APT_PRIO_WARNING,"[xfyun] QISRSessionBegin failed! error code:%d\n", errcode);
		return FALSE;
	}
	apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] QISRSessionBegin success!");
	
	recog_channel->last_result = NULL;
	recog_channel->recog_started = FALSE;

	recog_channel->recog_request = request;
	return TRUE;
}

void xfyun_recog_end_session(xfyun_recog_channel_t *recog_channel){
	if(recog_channel->session_id) {
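		/* End the session first, then log what actually happened. */
		int ret = QISRSessionEnd(recog_channel->session_id, "mrcp channel closed");
		if (MSP_SUCCESS != ret) {
			apt_log(RECOG_LOG_MARK,APT_PRIO_WARNING,"[xfyun] QISRSessionEnd failed! error code:%d", ret);
		} else {
			apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] QISRSessionEnd success!");
		}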
		recog_channel->session_id = NULL;
	}
}
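The session lifecycle described above can be sketched independently of both SDKs. The following is a minimal illustration in plain C with hypothetical names throughout (req_method_e, sketch_channel_t, sketch_dispatch); the begin and end functions are stand-ins for the vendor calls. Ending any stale session before starting a new one is a choice made for this sketch, not something the original code does.

```c
#include <stddef.h>

/* Hypothetical method ids and channel state, for illustration only. */
typedef enum { REQ_RECOGNIZE, REQ_STOP, REQ_OTHER } req_method_e;

typedef struct {
    const char *session_id; /* non-NULL while a vendor session is open */
    int begun;              /* number of sessions started */
    int ended;              /* number of sessions ended */
} sketch_channel_t;

/* Stand-in for the vendor call that opens a session. */
static void sketch_session_begin(sketch_channel_t *ch)
{
    ch->session_id = "sketch-session";
    ch->begun++;
}

/* Idempotent end: safe to call from STOP, completion, and channel close. */
static void sketch_session_end(sketch_channel_t *ch)
{
    if (ch->session_id) {
        ch->session_id = NULL;
        ch->ended++;
    }
}

/* Dispatch one request: RECOGNIZE opens a session, STOP closes it. */
static int sketch_dispatch(sketch_channel_t *ch, req_method_e method)
{
    switch (method) {
    case REQ_RECOGNIZE:
        sketch_session_end(ch); /* drop any stale session first */
        sketch_session_begin(ch);
        return 1;
    case REQ_STOP:
        sketch_session_end(ch);
        return 1;
    default:
        return 0; /* not handled in this sketch */
    }
}
```

Clearing session_id inside the end function is what makes a second call harmless, which matters because the end path can be reached from more than one place.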

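Processing the Audio Stream

xfyun_recog_stream_write is the callback invoked when audio arrives, so the actual recognition work should clearly be called from there. The recognition function is: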

static apt_bool_t xfyun_recog_stream_recog(xfyun_recog_channel_t *recog_channel,
							   const void *voice_data,
							   unsigned int voice_len 
							   ) {
	// int MSPAPI QISRAudioWrite(const char* sessionID, const void* waveData, unsigned int waveLen, int audioStatus, int *epStatus, int *recogStatus);
	int aud_stat = MSP_AUDIO_SAMPLE_CONTINUE;		/* audio status */
	int ep_stat	= MSP_EP_LOOKING_FOR_SPEECH;		/* endpoint detection status */
	int rec_stat = MSP_REC_STATUS_SUCCESS;			/* recognition status */
	int ret = 0;
	if(FALSE == recog_channel->recog_started) {
		apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] start recog");
		recog_channel->recog_started = TRUE;
		aud_stat = MSP_AUDIO_SAMPLE_FIRST;
	} else if(0 == voice_len) {
		apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] finish recog");
		aud_stat = MSP_AUDIO_SAMPLE_LAST;
	}
	if(NULL == recog_channel->session_id) {
		return FALSE;
	}
	ret = QISRAudioWrite(recog_channel->session_id, voice_data, voice_len, aud_stat, &ep_stat, &rec_stat);
	if (MSP_SUCCESS != ret)
	{
		apt_log(RECOG_LOG_MARK,APT_PRIO_WARNING,"[xfyun] QISRAudioWrite failed! error code:%d", ret);
		return FALSE;
	}
	if(MSP_REC_STATUS_SUCCESS != rec_stat && MSP_AUDIO_SAMPLE_LAST != aud_stat) {
		// apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] no need recog,rec_stat=%d,aud_stat=%d",rec_stat,aud_stat);
		return TRUE;
	}
	while (1) 
	{
		const char *rslt = QISRGetResult(recog_channel->session_id, &rec_stat, 0, &ret);
		if (MSP_SUCCESS != ret)
		{
			apt_log(RECOG_LOG_MARK,APT_PRIO_WARNING,"[xfyun] QISRGetResult failed, error code: %d", ret);
			return FALSE;
		}
		if (NULL != rslt)
		{
			if(NULL == recog_channel->last_result) {
				recog_channel->last_result = apr_pstrdup(recog_channel->channel->pool,rslt);
			} else {
				/* apr_pstrcat takes a NULL-terminated argument list */
				recog_channel->last_result = apr_pstrcat(recog_channel->channel->pool, recog_channel->last_result,rslt,NULL);
			}
			/* log only when there is a result, to avoid passing NULL to %s */
			apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] Get recog result:%s",rslt);
		}

		if(MSP_AUDIO_SAMPLE_LAST == aud_stat && MSP_REC_STATUS_COMPLETE != rec_stat) {
			usleep(150*1000);
			continue;
		}
		break;
	}
	return TRUE;
}
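The first/continue/last decision at the top of the function above is easy to isolate. Here is a small sketch of just that decision in plain C; chunk_status_e and next_chunk_status are hypothetical names, and the enum only mirrors the idea of the SDK's audio status constants rather than their actual values.

```c
/* Hypothetical status values mirroring the first/continue/last idea. */
typedef enum { CHUNK_FIRST, CHUNK_CONTINUE, CHUNK_LAST } chunk_status_e;

/* Decide the status of the next chunk: the first write opens the stream,
 * an empty write closes it, anything else is a continuation. */
static chunk_status_e next_chunk_status(int *started, unsigned int len)
{
    if (!*started) {
        *started = 1;
        return CHUNK_FIRST;
    }
    return (len == 0) ? CHUNK_LAST : CHUNK_CONTINUE;
}
```

As in the function above, a zero-length write is the signal that the audio has ended, so the caller has to issue one once it decides the utterance is over.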

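Sending the Recognition Result

When xfyun_recog_stream_write detects the end of speech or no input at all, it calls xfyun_recog_recognition_complete to send the completion message. Inside that function we can read out the final recognition result and send it: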

/* Load xfyun recognition result */
static apt_bool_t xfyun_recog_result_load(xfyun_recog_channel_t *recog_channel, mrcp_message_t *message)
{
	apt_str_t *body = &message->body;
	if(!recog_channel->last_result) {
		return FALSE;
	}

	body->buf = apr_psprintf(message->pool,
		"<?xml version=\"1.0\"?>\n"
		"<result>\n"
		"  <interpretation confidence=\"%d\">\n"
		"    <instance>%s</instance>\n"
		"    <input mode=\"speech\">%s</input>\n"
		"  </interpretation>\n"
		"</result>\n",
		99,
		recog_channel->last_result,
		recog_channel->last_result);
	if(body->buf) {
		mrcp_generic_header_t *generic_header;
		generic_header = mrcp_generic_header_prepare(message);
		if(generic_header) {
			/* set content type */
			apt_string_assign(&generic_header->content_type,"application/x-nlsml",message->pool);
			mrcp_generic_header_property_add(message,GENERIC_HEADER_CONTENT_TYPE);
		}
		
		body->length = strlen(body->buf);
	}
	return TRUE;
}
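One thing the function above does not handle is recognized text that contains XML-special characters such as & or <, which would make the result body malformed. A possible remedy is to escape the text before formatting it. The helper below is a hypothetical sketch in plain C (xml_escape is not part of UniMRCP or the iFlytek SDK) and writes into a caller-supplied buffer rather than an APR pool.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write an XML-escaped copy of src into dst.
 * Returns the escaped length, or (size_t)-1 if dst is too small. */
static size_t xml_escape(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;
    for (; *src; src++) {
        const char *rep;
        char one[2];
        size_t n;
        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        default:   one[0] = *src; one[1] = '\0'; rep = one; break;
        }
        n = strlen(rep);
        /* keep one byte free for the terminating NUL */
        if (used + n + 1 > dst_size) return (size_t)-1;
        memcpy(dst + used, rep, n);
        used += n;
    }
    if (dst_size == 0) return (size_t)-1;
    dst[used] = '\0';
    return used;
}
```

Bytes outside the four special characters are copied unchanged, so UTF-8 encoded Chinese text passes through intact.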

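The Endpoint Detection Problem

The method below performs endpoint detection on the audio. During actual debugging, the lowest level observed on a call was always 8, which is above the default threshold of 2, so the end of speech was never detected. Raising the default threshold appropriately avoids this.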
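To see why a noise floor of 8 defeats a threshold of 2, consider the sketch below. It assumes the level of a frame is the mean absolute amplitude of its 16-bit PCM samples and that a frame counts as silence only when its level is below the threshold. Both the function names and that definition of level are assumptions made for illustration, not code taken from UniMRCP.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed level measure: mean absolute amplitude of 16-bit PCM samples. */
static size_t frame_level(const int16_t *samples, size_t count)
{
    size_t sum = 0;
    size_t i;
    if (count == 0) return 0;
    for (i = 0; i < count; i++) {
        int v = samples[i];
        sum += (size_t)(v < 0 ? -v : v);
    }
    return sum / count;
}

/* A frame counts as silence only when its level is below the threshold. */
static int frame_is_silence(const int16_t *samples, size_t count, size_t threshold)
{
    return frame_level(samples, count) < threshold;
}
```

With a constant background level of 8, a threshold of 2 never classifies a frame as silence, while a threshold a little above the observed floor does.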