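Office Document Recognition
Dedicated API
[Updated: 2024.04.12]
Analyzes the layout of office documents and outputs the positions of figures, tables, titles, text, tables of contents, seals, columns, headers, footers, page numbers, and footnotes, along with the OCR result for each layout block. Supports table recognition and seal recognition, Chinese and English, and documents that mix handwritten and printed text.
0.16 per call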


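What is Office Document Recognition?
Service details
Baidu AI Cloud's Office Document Recognition service analyzes the layout of office documents in detail. It outputs the positions of the figures, tables, titles, text, tables of contents, seals, columns, headers, footers, page numbers, and footnotes in a document, and provides the OCR result for the content of each layout block. The service supports table recognition and seal recognition, handles Chinese and English, and works on documents that mix handwritten and printed text.
Core features
- Document layout analysis: identifies the elements in a document, such as figures, tables, and titles, and locates each one within the document.
- Mixed-text recognition: supports Chinese and English, and handles purely handwritten, purely printed, and mixed handwritten and printed text.
- Table text recognition: recognizes the table contents in a document and returns each cell's text together with its row and column position; supports a variety of table styles.
- Seal detection and recognition: detects and recognizes seals in a document and outputs the text inside each seal and the seal's position; supports the common seal shapes.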
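The table recognition feature above is described as returning each cell's text together with its row and column position. A minimal sketch of rebuilding a grid from such cells follows. The `row`, `col`, and `text` keys and the `cells_to_grid` helper are hypothetical names chosen for illustration, not the service's actual response fields, so they would need to be mapped from whatever the real response provides.

```python
from typing import Dict, List, Union

Cell = Dict[str, Union[int, str]]


def cells_to_grid(cells: List[Cell]) -> List[List[str]]:
    """Arrange cells into a row-major grid, leaving missing cells empty.

    Each cell is assumed to carry zero-based "row" and "col" indices and a
    "text" string. These key names are illustrative, not the API's.
    """
    if not cells:
        return []
    n_rows = max(int(c["row"]) for c in cells) + 1
    n_cols = max(int(c["col"]) for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[int(c["row"])][int(c["col"])] = str(c["text"])
    return grid
```

Positions that no cell covers are left as empty strings, so the grid stays rectangular even when the recognized table has gaps.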
Use cases
What is the Office Document Recognition API?


Office Document Recognition service, Python sample code:
# encoding:utf-8
import base64

import requests

'''
Office Document Recognition
'''
request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office"
# Open the image file in binary mode and base64-encode its contents
with open('[local file]', 'rb') as f:
    img = base64.b64encode(f.read())
params = {"image": img}
access_token = '[token obtained from the authentication API]'
request_url = request_url + "?access_token=" + access_token
headers = {'content-type': 'application/x-www-form-urlencoded'}
response = requests.post(request_url, data=params, headers=headers)
# A requests.Response is truthy only when the HTTP status is below 400
if response:
    print(response.json())
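The Python sample above combines file access, encoding, and the network call in one script. As a sketch only, the request-building step can be isolated into a function that needs no network, which makes it easy to check the URL and the form body before sending anything. `build_request` is a hypothetical helper, not part of any Baidu SDK; the endpoint and header are the ones used in the sample.

```python
import base64
from typing import Dict, Tuple
from urllib.parse import urlencode

# Endpoint taken from the sample code on this page.
REQUEST_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office"


def build_request(image_bytes: bytes, access_token: str) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for a form-encoded POST.

    build_request is a hypothetical helper written for this sketch.
    """
    url = REQUEST_URL + "?" + urlencode({"access_token": access_token})
    headers = {"content-type": "application/x-www-form-urlencoded"}
    # Base64 output can contain '+', '/' and '=', so it has to be
    # URL-encoded before it is placed in a form body.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    body = urlencode({"image": image_b64})
    return url, headers, body
```

With the `requests` library, the returned values could then be sent as `requests.post(url, data=body, headers=headers)`.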
Office Document Recognition service, Java sample code:
package com.baidu.ai.aip;

import com.baidu.ai.aip.utils.Base64Util;
import com.baidu.ai.aip.utils.FileUtil;
import com.baidu.ai.aip.utils.HttpUtil;

import java.net.URLEncoder;

/**
 * Office Document Recognition
 */
public class AnalysisOffice {

    /**
     * Important: the utility classes this code depends on,
     * FileUtil, Base64Util, HttpUtil, and GsonUtils, can be downloaded from
     * https://ai.baidu.com/file/658A35ABAB2D404FBF903F64D47C1F72
     * https://ai.baidu.com/file/C8D81F3301E24D2892968F09AE1AD6E2
     * https://ai.baidu.com/file/544D677F5D4E4F17B4122FBD60DB82B3
     * https://ai.baidu.com/file/470B3ACCA3FE43788B5A963BF0B625F3
     */
    public static String analysisOffice() {
        // Request URL
        String url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office";
        try {
            // Local file path
            String filePath = "[local file path]";
            byte[] imgData = FileUtil.readFileByBytes(filePath);
            String imgStr = Base64Util.encode(imgData);
            String imgParam = URLEncoder.encode(imgStr, "UTF-8");

            String param = "image=" + imgParam;

            // Note: to keep the sample simple, the access_token is obtained on every request.
            // In production the access_token has an expiry time; the client can cache it
            // and fetch a new one after it expires.
            String accessToken = "[token obtained from the authentication API]";

            String result = HttpUtil.post(url, accessToken, param);
            System.out.println(result);
            return result;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) {
        AnalysisOffice.analysisOffice();
    }
}
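The comment in the Java sample notes that an access_token expires and that clients can cache it and fetch a new one after expiry. One way this might be done is sketched below. `TokenCache`, its `fetch` callback, and the `margin` parameter are all hypothetical names for this illustration; how the token is actually obtained and how long it lives are not specified on this page, so the caller supplies both.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache an access token and refresh it shortly before it expires.

    fetch is any caller-supplied function returning (token, lifetime_seconds);
    how the token is really obtained is outside this sketch.
    """

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 margin: float = 60.0) -> None:
        self._fetch = fetch
        self._clock = clock
        self._margin = margin  # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if it is missing or stale."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Passing the clock in as a parameter keeps the class testable without waiting for real time to pass. The safety margin avoids sending a token that would expire while the request is in flight.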
Office Document Recognition service, C++ sample code:
#include <cstdio>
#include <iostream>
#include <string>
#include <curl/curl.h>

// libcurl download: https://curl.haxx.se/download.html
// jsoncpp download: https://github.com/open-source-parsers/jsoncpp/
const static std::string request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office";
static std::string analysisOffice_result;

/**
 * Write callback that curl invokes as response data arrives. libcurl may call it
 * several times for one response, so each chunk is appended to the global static
 * string; the JSON body is stored unparsed.
 * @param see the libcurl documentation for the parameters
 * @return see the libcurl documentation for the return value
 */
static size_t callback(void *ptr, size_t size, size_t nmemb, void *stream) {
    // The received chunk of the body is in ptr; append it to the result string
    analysisOffice_result.append((char *) ptr, size * nmemb);
    return size * nmemb;
}

/**
 * Office Document Recognition
 * @return 0 on success, another error code on failure
 */
int analysisOffice(std::string &json_result, const std::string &access_token) {
    std::string url = request_url + "?access_token=" + access_token;
    CURL *curl = NULL;
    CURLcode result_code;
    int is_success;
    curl = curl_easy_init();
    if (curl) {
        // Discard any body left over from a previous call
        analysisOffice_result.clear();
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_POST, 1L);
        curl_httppost *post = NULL;
        curl_httppost *last = NULL;
        curl_formadd(&post, &last, CURLFORM_COPYNAME, "image", CURLFORM_COPYCONTENTS, "【base64_img】", CURLFORM_END);

        curl_easy_setopt(curl, CURLOPT_HTTPPOST, post);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback);
        result_code = curl_easy_perform(curl);
        if (result_code != CURLE_OK) {
            fprintf(stderr, "curl_easy_perform() failed: %s\n",
                    curl_easy_strerror(result_code));
            // Release the form and the handle on the error path as well
            curl_formfree(post);
            curl_easy_cleanup(curl);
            is_success = 1;
            return is_success;
        }
        json_result = analysisOffice_result;
        curl_formfree(post);
        curl_easy_cleanup(curl);
        is_success = 0;
    } else {
        fprintf(stderr, "curl_easy_init() failed.");
        is_success = 1;
    }
    return is_success;
}
Office Document Recognition service, PHP sample code:
<?php
/**
 * Send an HTTP POST request (REST API) and return the result of the request
 * @param string $url
 * @param string $param
 * @return - http response body if succeeds, else false.
 */
function request_post($url = '', $param = '')
{
    if (empty($url) || empty($param)) {
        return false;
    }

    $postUrl = $url;
    $curlPost = $param;
    // Initialize curl
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $postUrl);
    curl_setopt($curl, CURLOPT_HEADER, 0);
    // Return the result as a string instead of printing it directly
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    // Disables TLS certificate verification; enable it in production
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    // Submit as POST
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $curlPost);
    // Run curl
    $data = curl_exec($curl);
    curl_close($curl);

    return $data;
}

$token = '[token obtained from the authentication API]';
$url = 'https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office?access_token=' . $token;
$img = file_get_contents('[local file path]');
$img = base64_encode($img);
$bodys = array(
    'image' => $img
);
$res = request_post($url, $bodys);

var_dump($res);
Office Document Recognition service, C# sample code:
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Web;

namespace com.baidu.ai
{
    public class AnalysisOffice
    {
        // Office Document Recognition
        public static string analysisOffice()
        {
            string token = "[token obtained from the authentication API]";
            string host = "https://aip.baidubce.com/rest/2.0/ocr/v1/doc_analysis_office?access_token=" + token;
            Encoding encoding = Encoding.Default;
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host);
            request.Method = "post";
            request.KeepAlive = true;
            // Base64 encoding of the image
            string base64 = getFileBase64("[local image file]");
            String str = "image=" + HttpUtility.UrlEncode(base64);
            byte[] buffer = encoding.GetBytes(str);
            request.ContentLength = buffer.Length;
            request.GetRequestStream().Write(buffer, 0, buffer.Length);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.Default);
            string result = reader.ReadToEnd();
            Console.WriteLine("Office Document Recognition:");
            Console.WriteLine(result);
            return result;
        }

        // Read a file and return its contents as a base64 string
        public static String getFileBase64(String fileName) {
            FileStream filestream = new FileStream(fileName, FileMode.Open);
            byte[] arr = new byte[filestream.Length];
            filestream.Read(arr, 0, (int)filestream.Length);
            string baser64 = Convert.ToBase64String(arr);
            filestream.Close();
            return baser64;
        }
    }
}
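All of the samples above print the raw response body without checking whether the call succeeded. A hedged sketch of a small response check follows. It assumes that failures are reported as a JSON object carrying `error_code` and `error_msg` keys; this page does not document the response format, so the key names should be confirmed against the service's own documentation. `parse_response` and `RecognitionError` are hypothetical names.

```python
import json
from typing import Any, Dict


class RecognitionError(Exception):
    """Raised when the response body looks like an error payload."""


def parse_response(body: str) -> Dict[str, Any]:
    """Decode a JSON response body and raise if it carries an error code.

    Assumption: failures are reported as a JSON object with "error_code"
    and "error_msg" keys. Confirm the key names against the service docs.
    """
    data = json.loads(body)
    if not isinstance(data, dict):
        raise RecognitionError("unexpected response shape")
    if "error_code" in data:
        raise RecognitionError(
            "error %s: %s" % (data["error_code"], data.get("error_msg", ""))
        )
    return data
```

Raising an exception rather than returning the error payload keeps failed calls from being mistaken for recognition results further down the pipeline.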



