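A few days ago I received a promotional email from Baidu: the basic language-processing (NLP) interfaces of Baidu AI are now permanently free.
After reading the page carefully, I suddenly felt that Baidu has finally figured out how to do the right thing.
Baidu AI developer site: http://ai.baidu.com/tech/nlp?hmsr=developeredm&hmpl=NLP_EDM
The interfaces that interest me most:
1. Extract keywords from an article's title and content
2. Get the section (topic) classification from an article's title and content
3. Judge the similarity of two short texts (usable for smart recommendations)
Here is a brief walkthrough of the usage flow:
1. Get the accessToken
The official documentation has a sample for this; I improved it slightly.
Environment 1: JSON serialization and deserialization use the NuGet package LitJson
Environment 2: the project needs a reference to System.Net.Http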
using LitJson;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Web;
namespace com.baidu.ai
{
public static class BaiduAIAccess
{
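// API Key of the application created in Baidu Cloud for the service; when creating the application, it is a good idea to enable several services at once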
private static String clientId = "2nXP88tZlViNQGiZddHPUC7W";
// Secret Key of the application created in Baidu Cloud for the service
private static String clientSecret = "3lM79E9uiuYiuhrTBpNWlTW92k1yX8PX";
public static BaiduAIAccessTokenResult GetAccessToken()
{
String authHost = "https://aip.baidubce.com/oauth/2.0/token";
HttpClient client = new HttpClient();
List<KeyValuePair<string, string>> paraList = new List<KeyValuePair<string, string>>();
paraList.Add(new KeyValuePair<string, string>("grant_type", "client_credentials"));
paraList.Add(new KeyValuePair<string, string>("client_id", clientId));
paraList.Add(new KeyValuePair<string, string>("client_secret", clientSecret));
HttpResponseMessage response = client.PostAsync(authHost, new FormUrlEncodedContent(paraList)).Result;
String result = response.Content.ReadAsStringAsync().Result;
BaiduAIAccessTokenResult accessTokenResult = JsonMapper.ToObject<BaiduAIAccessTokenResult>(result);
return accessTokenResult;
}
}
public class BaiduAIAccessTokenResult
{
public string access_token { get; set; }
public string session_key { get; set; }
public string scope { get; set; }
public string refresh_token { get; set; }
public string session_secret { get; set; }
public int expires_in { get; set; }
}
}
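Since the token result carries an expires_in field, one possible refinement is to reuse a token until it is close to expiring instead of requesting a new one for every call. The following is only a rough sketch: BaiduAITokenCache is a hypothetical helper that is not part of the official sample, it assumes expires_in is a lifetime in seconds, and the five-minute safety margin is an arbitrary choice.

```csharp
using System;
using com.baidu.ai;

// Hypothetical helper (not from the official sample): caches the access token
// returned by BaiduAIAccess.GetAccessToken() until shortly before it expires.
public static class BaiduAITokenCache
{
    private static readonly object sync = new object();
    private static string cachedToken;
    private static DateTime expiresAtUtc = DateTime.MinValue;

    public static string GetToken()
    {
        lock (sync)
        {
            // Request a new token only when none is cached or the cached one is (almost) expired.
            if (cachedToken == null || DateTime.UtcNow >= expiresAtUtc)
            {
                BaiduAIAccessTokenResult fresh = BaiduAIAccess.GetAccessToken();
                cachedToken = fresh.access_token;
                // Assumption: expires_in is in seconds; refresh five minutes early as a margin.
                expiresAtUtc = DateTime.UtcNow.AddSeconds(fresh.expires_in).AddMinutes(-5);
            }
            return cachedToken;
        }
    }
}
```

With something like this in place, callers would use BaiduAITokenCache.GetToken() wherever a token string is needed.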
2. Call the interface. Only the keyword one is shown here.
using cloud0.Models;
using LitJson;
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;
namespace com.baidu.ai
{
    public class BaiduAIUtilization
    {
        // Keyword endpoint
        public static string url_keyword = "https://aip.baidubce.com/rpc/2.0/nlp/v1/keyword";
        public static string url_topic = "https://aip.baidubce.com/rpc/2.0/nlp/v1/topic";
        // Get keywords
        public static KeywordsResult GetBlogTags(string title_para, string content_para, string token)
        {
            string hostUrl = url_keyword + "?charset=UTF-8&access_token=" + token;
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(hostUrl);
            request.Method = "POST";
            request.ContentType = "application/json;charset=UTF-8";
            // Write the request body as JSON: { "title": ..., "content": ... }
            using (var streamWriter = new StreamWriter(request.GetRequestStream()))
            {
                var jsonObject = new { title = title_para, content = content_para };
                string json = JsonMapper.ToJson(jsonObject);
                streamWriter.Write(json);
            }
            // Read the whole response body as UTF-8 text
            string jsonString;
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                Stream myResponseStream = response.GetResponseStream();
                using (StreamReader myStreamReader = new StreamReader(myResponseStream, Encoding.UTF8))
                {
                    jsonString = myStreamReader.ReadToEnd();
                }
            }
            return JsonMapper.ToObject<KeywordsResult>(jsonString);
        }
    }
}
Here, KeywordsResult is a class I wrote myself to match the JSON object that the interface returns:
//keyword
public class KeywordsResult
{
public long log_id { get; set; }
public Item[] items { get; set; }
public class Item
{
public double score { get; set; }
public string tag { get; set; }
}
}
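To use it, just call the static GetBlogTags() method; its parameters are the title, the content and the token. Before using it, register on the Baidu AI developer site and create an application to obtain the API Key and Secret Key needed to request the token. According to Baidu, the service is free now and will stay free.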
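Put together, the two steps might be wired up roughly as follows. This is only a sketch: it depends on the BaiduAIAccess, BaiduAIUtilization and KeywordsResult classes from this post, KeywordDemo and the sample title and content are made up for illustration, and it assumes KeywordsResult lives in the cloud0.Models namespace that the code above imports.

```csharp
using System;
using cloud0.Models;   // assumption: KeywordsResult is defined in this namespace
using com.baidu.ai;

// Hypothetical demo class, for illustration only.
public static class KeywordDemo
{
    public static void Main()
    {
        // Step 1: exchange the API Key / Secret Key for an access token.
        BaiduAIAccessTokenResult tokenResult = BaiduAIAccess.GetAccessToken();

        // Step 2: ask the keyword interface for tags (sample inputs are placeholders).
        KeywordsResult result = BaiduAIUtilization.GetBlogTags(
            "Sample article title",
            "Sample article content as plain text.",
            tokenResult.access_token);

        // Print each returned tag with its score, if any were returned.
        if (result != null && result.items != null)
        {
            foreach (var item in result.items)
            {
                Console.WriteLine(item.tag + " " + item.score);
            }
        }
    }
}
```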
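In a first test with manually entered title and content, the results were fairly satisfying. Kudos to Baidu for offering a free resource like this!
I have built two small interfaces so far that you can try out: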
http://www.songshizhao.com/blog/blogs.asmx?op=GetBlogKeywords
http://www.songshizhao.com/blog/blogs.asmx?op=GetBlogTopics
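After trying it out, I found that when the uploaded text contains \t (tab) or \r (carriage return) characters, the returned results are far from ideal. So besides reducing content to plain text, you should also strip \t and \r,
for example with Replace("\r", "").Replace("\t", ""), like this: puretext = HttpUtility.UrlDecode(puretext).Replace("\r", "").Replace("\t", "");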
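The cleanup can be wrapped in a small helper so that it is applied consistently before each call. This is a minimal sketch; TextCleaner and StripTabsAndReturns are hypothetical names, and the helper deliberately leaves out HttpUtility.UrlDecode, which is presumably only needed when the incoming text is URL-encoded.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class TextCleaner
{
    // Removes tab and carriage-return characters from text before it is sent
    // to the keyword interface. Returns an empty string for null or empty input.
    public static string StripTabsAndReturns(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }
        return text.Replace("\r", "").Replace("\t", "");
    }
}
```

The cleaned string would then be passed as the content argument of GetBlogTags().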
Of course, calling Baidu AI via ajax should be just as convenient. I will try that some other time.