The YOLO model that still excels in document layout analysis

Document layout analysis helps people better understand and use the information in a document. However, the diversity of document layouts and the considerable variation in aspect ratios among document objects pose significant challenges. In this study, we design the Multi-Convolutional Deformable Separation (MCDS) module as the main structure of the network, using the YOLO model as a baseline. Integrating this module into the Backbone and Neck layers significantly enhances image feature extraction. Moreover, we incorporate ParNet-Attention to direct the network's focus toward document objects through parallel branches, enabling more thorough feature extraction. To improve the model's predictive capacity, the Decoupled Fusion Head (DFH) is employed in the Head layer; it leverages multi-scale features on top of the decoupled head to raise prediction accuracy. Our proposed model achieves strong performance on three public datasets with differing characteristics: ICDAR-POD, PubLayNet, and IIIT-AR-13K. Notably, on ICDAR-POD it achieves the best mean Average Precision (mAP) at both IoU$_{0.6}$ and IoU$_{0.8}$, 96.2 and 94.4, respectively.
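To make the parallel-branch idea behind ParNet-Attention concrete, the following is a minimal NumPy sketch, not the authors' implementation: it fuses a pointwise (1x1-conv-like) channel-mixing branch with a Skip-Squeeze-Excitation (SSE) gating branch by summation, as in ParNet. The 3x3 branch and batch normalization of the full ParNet block are omitted for brevity, and `w_point` is a hypothetical weight matrix standing in for learned 1x1-conv weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def parnet_attention_sketch(x, w_point):
    """Simplified ParNet-style attention over a feature map.

    x:       feature map of shape (C, H, W)
    w_point: (C, C) channel-mixing weights, standing in for a 1x1 conv
    """
    # Pointwise branch: mix channels independently at every spatial location.
    point = np.einsum('oc,chw->ohw', w_point, x)
    # SSE branch: global average pool -> per-channel sigmoid gate.
    gate = sigmoid(x.mean(axis=(1, 2)))          # shape (C,)
    gated = x * gate[:, None, None]
    # Parallel branches are fused by elementwise summation.
    return point + gated
```

The summation fusion keeps the block cheap at inference time while the SSE gate steers channel emphasis toward salient (here, document-object) features.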
