Detecting Paragraphs and Tables in Images

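When working with scanned documents or images, you often need to identify the paragraphs and tables they contain and recover their layout and bounding boxes. This is useful in many applications, such as processing official documents. This article shows how to do it with a deep learning model and OCR.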

Solution Overview

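We use a deep learning model for layout detection, together with OpenCV and an OCR (optical character recognition) API, to read the text and obtain the bounding boxes. The steps are described below.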

First, install Layout Parser and detectron2 for detection. Details on Detectron2 are available on its GitHub page. The install commands are:

!pip install layoutparser
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html

OpenCV uses the BGR channel order by default, so an image read with cv2.imread() is interpreted as BGR. Use cvtColor() or image[..., ::-1] to convert BGR to RGB, or the reverse.

import cv2

image = cv2.imread("/content/imagetemp.png")
image = image[..., ::-1]  # reverse the channel axis: BGR -> RGB

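In computing, configuration files hold the parameters and initial settings of a program. They are used for user applications, server processes, and operating system settings. The commands below download the configuration files used in this walkthrough: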

!wget https://www.dropbox.com/s/f3b12qc4hc0yh4m/config.yml?dl=1
!wget https://www.dropbox.com/s/nau5ut6zgthunil/config.yaml?dl=1

Here each block type in the scanned image is mapped to a color, and layout boxes are then drawn around the detected blocks.

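import layoutparser as lp

# One color per block type.
color_map = {
    'Text': 'red',
    'Title': 'blue',
    'List': 'green',
    'Table': 'purple',
    'Figure': 'pink',
}

# `layout` is the detection result of the layout model; it is not defined in this excerpt.
lp.draw_box(image, layout, box_width=3, color_map=color_map)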

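Optical character recognition (OCR) is the electronic conversion of typed, handwritten, or printed text in an image into machine-encoded text, whether the image is a scanned document, a photo of a document, or a photo of a scene. Here we use Python-tesseract as the OCR tool for Python. It recognizes and "reads" the text embedded in images. Python-tesseract is a wrapper for Google's Tesseract-OCR engine. It is also useful as a stand-alone invocation script for tesseract, because it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, BMP, tiff, and others. In addition, when used as a script, Python-tesseract prints the recognized text instead of writing it to a file.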

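# `text_blocks` (the detected text regions) and `ocr_agent` are not defined in this
# excerpt; `ocr_agent` is assumed to be an OCR agent such as lp.TesseractAgent(languages='eng').
for block in text_blocks:
    # Pad the block by 5 pixels on every side, then crop it out of the page image.
    segment_image = (block
                     .pad(left=5, right=5, top=5, bottom=5)
                     .crop_image(image))
    # Run OCR on the crop and store the text on the block.
    text = ocr_agent.detect(segment_image)
    block.set(text=text, inplace=True)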
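To make the padding step concrete, here is a minimal sketch of growing a bounding box by a margin while keeping it inside the image. The helper `pad_box` and its `(x1, y1, x2, y2)` box format are hypothetical and only illustrate the idea; this is not Layout Parser's implementation of `pad`.

```python
def pad_box(box, pad, image_width, image_height):
    """Grow an (x1, y1, x2, y2) box by `pad` pixels, clamped to the image.

    Hypothetical helper for illustration; not part of layoutparser.
    """
    x1, y1, x2, y2 = box
    return (
        max(0, x1 - pad),             # left edge cannot go below 0
        max(0, y1 - pad),             # top edge cannot go below 0
        min(image_width, x2 + pad),   # right edge cannot pass the image width
        min(image_height, y2 + pad),  # bottom edge cannot pass the image height
    )


if __name__ == "__main__":
    # A box that touches the left and right borders of a 52 x 100 image.
    print(pad_box((2, 10, 50, 40), 5, 52, 100))
```

A small margin like this tends to help OCR because characters that touch the crop border are often misread.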

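In digital image processing, thresholding is the simplest way to segment an image. Applied to a grayscale image, it produces a binary image. The code below reads the image in grayscale, thresholds it, and inverts the result: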

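import cv2
import matplotlib.pyplot as plt

file = r'/content/imagetemp.png'
img = cv2.imread(file, 0)  # flag 0 reads the image as grayscale
print(img.shape)

# Threshold the image into a binary image
thresh, img_bin = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)

# Invert the image so lines and text become white on black
img_bin = 255 - img_bin
cv2.imwrite('cv_inverted.png', img_bin)

# Plot the image to check the output
plotting = plt.imshow(img_bin, cmap='gray')
plt.show()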
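The effect of the threshold and inversion can be seen on a tiny example. The sketch below mimics the binary threshold rule (a pixel becomes the maximum value only if it is strictly greater than the threshold) on nested lists, without OpenCV. The function name `threshold_and_invert` is made up for this illustration.

```python
def threshold_and_invert(gray, thresh=128, max_value=255):
    """Binarize a grayscale image given as nested lists, then invert it.

    Illustrative helper: pixels strictly above `thresh` become `max_value`,
    all others become 0, and the result is flipped with `max_value - pixel`.
    """
    binary = [[max_value if px > thresh else 0 for px in row] for row in gray]
    return [[max_value - px for px in row] for row in binary]


if __name__ == "__main__":
    sample = [[0, 128, 129],
              [255, 200, 10]]
    for row in threshold_and_invert(sample):
        print(row)
```

After inversion, dark ink (low values in the scan) ends up white, which is what the morphological steps that follow operate on.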

Use a horizontal kernel to detect the horizontal lines and save them to a JPEG file.

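# `hor_kernel` is the horizontal structuring element; it is not defined in this excerpt.
# Erosion followed by dilation keeps only long horizontal strokes.
image_2 = cv2.erode(img_bin, hor_kernel, iterations=3)
horizontal_lines = cv2.dilate(image_2, hor_kernel, iterations=3)
cv2.imwrite("horizontal.jpg", horizontal_lines)

# Plot the generated image
plotting = plt.imshow(horizontal_lines, cmap='gray')
plt.show()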
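Eroding and then dilating with a 1-by-k horizontal kernel is a morphological opening, which in the ideal case keeps only horizontal runs of foreground pixels at least k pixels long. The sketch below reproduces that idealized behavior on nested lists. It is a simplification: the name `keep_horizontal_runs` is hypothetical, and OpenCV's border handling and the `iterations` argument are ignored.

```python
def keep_horizontal_runs(binary_rows, min_length):
    """Keep only horizontal runs of nonzero pixels of at least `min_length`.

    Idealized stand-in for erode + dilate with a 1 x min_length kernel.
    """
    result = []
    for row in binary_rows:
        out = [0] * len(row)
        start = None
        # A trailing 0 acts as a sentinel that closes a run ending at the border.
        for idx, px in enumerate(list(row) + [0]):
            if px and start is None:
                start = idx                      # a new run begins
            elif not px and start is not None:
                if idx - start >= min_length:    # run is long enough to survive
                    out[start:idx] = [255] * (idx - start)
                start = None
        result.append(out)
    return result


if __name__ == "__main__":
    row = [255, 255, 255, 0, 255, 0, 255, 255, 255, 255]
    print(keep_horizontal_runs([row], 3))
```

Short strokes such as the horizontal parts of letters disappear, while table rules, which span many pixels, remain.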

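Next, detect contours for the box detection that follows. A contour is a curve joining all the continuous points along a boundary that share the same color or intensity. Contours are a useful tool for shape analysis and for object detection and recognition.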

# `img_vh` is the image the contours are extracted from; it is not defined in this excerpt.
contours, hierarchy = cv2.findContours(img_vh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

The following function sorts the contours.

def sort_contours(cnts, method="left-to-right"):
    # Initialize the reverse flag and the sort index
    reverse = False
    i = 0

    # Handle sorting in reverse order
    if method == "right-to-left" or method == "bottom-to-top":
        reverse = True

    # Handle sorting by the y coordinate of the bounding box
    # rather than the x coordinate
    if method == "top-to-bottom" or method == "bottom-to-top":
        i = 1

    # Build the list of bounding boxes and sort contours and boxes together
    boundingBoxes = [cv2.boundingRect(c) for c in cnts]
    (cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
                                        key=lambda b: b[1][i], reverse=reverse))

    # Return the sorted contours and bounding boxes
    return (cnts, boundingBoxes)
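The sorting rule can be tried without OpenCV by applying it directly to bounding boxes. The sketch below assumes boxes in `(x, y, w, h)` form, the order returned by `cv2.boundingRect`. The function name `sort_boxes` is made up for this illustration.

```python
def sort_boxes(boxes, method="left-to-right"):
    """Sort (x, y, w, h) boxes by x or y, ascending or descending.

    Illustrative helper that mirrors the method names used above.
    """
    reverse = method in ("right-to-left", "bottom-to-top")
    # Index 1 is y (vertical order), index 0 is x (horizontal order).
    axis = 1 if method in ("top-to-bottom", "bottom-to-top") else 0
    return sorted(boxes, key=lambda b: b[axis], reverse=reverse)


if __name__ == "__main__":
    boxes = [(50, 10, 20, 20), (10, 40, 20, 20), (30, 5, 20, 20)]
    print(sort_boxes(boxes, "top-to-bottom"))
    print(sort_boxes(boxes, "right-to-left"))
```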

Sort all contours from top to bottom.

contours, boundingBoxes = sort_contours(contours, method="top-to-bottom")

Build a list of the heights of all detected boxes and take their mean. Then get the position (x, y), width, and height of each contour and draw it on the image.

import numpy as np

heights = [b[3] for b in boundingBoxes]

# Mean of the box heights
mean = np.mean(heights)

# List that stores all boxes
box = []

# Get the position (x, y), width, and height of each contour and draw it on the image
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    # Skip very large contours such as the outline of the whole table
    if w < 1100 and h < 600:
        image = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        box.append([x, y, w, h])

plotting = plt.imshow(image, cmap='gray')
plt.show()
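As a small worked example of this step, the sketch below computes the mean height over all boxes and keeps only the boxes under the size limits. The limits 1100 and 600 come from the code above; the function name `filter_boxes` and the sample data are hypothetical.

```python
def filter_boxes(boxes, max_width=1100, max_height=600):
    """Return (kept boxes, mean height of all boxes) for (x, y, w, h) boxes.

    Illustrative helper: the mean is taken over every box, as in the text,
    while the size filter drops oversized boxes such as the table outline.
    """
    kept = [list(b) for b in boxes if b[2] < max_width and b[3] < max_height]
    mean_height = sum(b[3] for b in boxes) / len(boxes) if boxes else 0.0
    return kept, mean_height


if __name__ == "__main__":
    sample = [(0, 0, 1200, 800), (10, 10, 100, 40), (120, 10, 100, 20)]
    kept, mean_height = filter_boxes(sample)
    print(kept)
    print(mean_height)
```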

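Create two lists that define the row and column each cell belongs to.

row = []
column = []

# Sort the boxes into their rows: a box joins the current row when its
# y coordinate is within half the mean height of the previous box
for i in range(len(box)):
    if i == 0:
        column.append(box[i])
        previous = box[i]
    elif box[i][1] <= previous[1] + mean / 2:
        column.append(box[i])
        previous = box[i]
    else:
        # The box starts a new row
        row.append(column)
        column = [box[i]]
        previous = box[i]

# Append the last row, including the case where the final box starts a new row
if column:
    row.append(column)

print(column)
print(row)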
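The grouping rule is easier to follow on toy data. The sketch below packages it as a function: a box stays in the current row while its y coordinate is at most half the mean height below the previous box. The name `group_into_rows` and the sample boxes are made up for this illustration, and the boxes are assumed to be sorted top to bottom already.

```python
def group_into_rows(boxes, mean_height):
    """Group (x, y, w, h) boxes, sorted top to bottom, into table rows."""
    rows = []
    current = []
    previous = None
    for box in boxes:
        if previous is None or box[1] <= previous[1] + mean_height / 2:
            current.append(box)      # same row as the previous box
        else:
            rows.append(current)     # close the finished row
            current = [box]          # and start a new one
        previous = box
    if current:
        rows.append(current)         # flush the final row
    return rows


if __name__ == "__main__":
    boxes = [[10, 10, 50, 20], [70, 12, 50, 20],
             [10, 40, 50, 20], [70, 41, 50, 20],
             [10, 80, 50, 20]]
    for r in group_into_rows(boxes, mean_height=20):
        print(r)
```

With a mean height of 20, boxes whose y values differ by 10 or less are treated as one row, so small vertical jitter between cells of the same row is tolerated.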

Compute the maximum number of cells in a row, which gives the number of columns.

countcol = 0
for i in range(len(row)):
    # Keep the largest row length seen so far
    if len(row[i]) > countcol:
        countcol = len(row[i])

Retrieve the center of each column, then assign every box to its nearest column, run OCR on each cell, and collect the text in a DataFrame.

import pandas as pd
import pytesseract

# Column centers, taken from the row with the most cells
widest = max(row, key=len)
center = np.array([int(b[0] + b[2] / 2) for b in widest])
center.sort()
print(center)

# Place each box of each row into the column with the nearest center
finalboxes = []
for i in range(len(row)):
    lis = [[] for _ in range(countcol)]
    for j in range(len(row[i])):
        diff = abs(center - (row[i][j][0] + row[i][j][2] / 4))
        indexing = list(diff).index(min(diff))
        lis[indexing].append(row[i][j])
    finalboxes.append(lis)

# OCR every cell; empty cells become a single space.
# `bitnot` is the image the cells are cropped from; it is not defined in this excerpt.
outer = []
for i in range(len(finalboxes)):
    for j in range(len(finalboxes[i])):
        inner = ''
        if len(finalboxes[i][j]) == 0:
            outer.append(' ')
        else:
            for k in range(len(finalboxes[i][j])):
                x, y, w, h = finalboxes[i][j][k]
                finalimg = bitnot[y:y + h, x:x + w]
                kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 1))
                # Add a white border and upscale the crop before OCR
                border = cv2.copyMakeBorder(finalimg, 2, 2, 2, 2,
                                            cv2.BORDER_CONSTANT, value=[255, 255])
                resizing = cv2.resize(border, None, fx=2, fy=2,
                                      interpolation=cv2.INTER_CUBIC)
                dilation = cv2.dilate(resizing, kernel, iterations=1)
                erosion = cv2.erode(dilation, kernel, iterations=2)
                out = pytesseract.image_to_string(erosion)
                if len(out) == 0:
                    # Retry with page segmentation mode 3
                    out = pytesseract.image_to_string(erosion, config='--psm 3')
                inner = inner + " " + out
            outer.append(inner)

# Reshape the flat list of cell texts into a table
arr = np.array(outer)
dataframe = pd.DataFrame(arr.reshape(len(row), countcol))
dataframe
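The column assignment can be isolated into a few lines. The sketch below places each box of one row into the column whose center is nearest, leaving columns without a box empty. It measures the distance from the box's own center (`x + w / 2`), which is an assumption of this sketch and differs slightly from the offset used in the code above. The name `assign_to_columns` is hypothetical.

```python
def assign_to_columns(row_boxes, centers):
    """Distribute the (x, y, w, h) boxes of one row over the column centers.

    Returns one list per column; a column with no box stays empty.
    """
    cells = [[] for _ in centers]
    for box in row_boxes:
        x, _, w, _ = box
        box_center = x + w / 2
        # Index of the column center closest to this box
        nearest = min(range(len(centers)), key=lambda k: abs(centers[k] - box_center))
        cells[nearest].append(box)
    return cells


if __name__ == "__main__":
    centers = [35, 95, 155]
    row_boxes = [[130, 40, 50, 20], [10, 40, 50, 20]]  # the middle cell is missing
    print(assign_to_columns(row_boxes, centers))
```

Because boxes are matched by position rather than by order, a row with a missing cell still lines up with the other rows, and the gap shows up as an empty cell.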


沪ICP备2024098111号-1

Shanghai Qiudan Network Technology Center (上海秋旦网络科技中心): Building 1, No. 8218 Jinda Highway, Fengxian District, Shanghai. Phone: 17898875485
