
Improving Object Detection Accuracy with Smarter Orientation Modeling

Published on May 12, 2026

Part of our Scientific Insight Series

This article is part of a series by Digital Sense exploring scientific papers authored by members of our team. We translate scientific research into clear, practical knowledge for engineering and product leaders. 

On this occasion, we summarize the paper Structure Tensor Representation for Robust Oriented Object Detection by Xavier Bou, Gabriele Facciolo, Rafael Grompone von Gioi, Jean-Michel Morel, and Thibaud Ehret. Dr. Gabriele Facciolo is a professor at ENS Paris-Saclay and a Senior Scientific Consultant at Digital Sense. For more information on computer vision services, don’t hesitate to contact us.

What is Oriented Object Detection?


Oriented Object Detection (OOD) is the task of identifying objects in images and predicting their orientation. While traditional object detectors predict horizontal bounding boxes (HBBs) aligned with the image axes, OOD models predict oriented bounding boxes (OBBs), typically defined by five parameters: (x, y, w, h, θ), i.e., position, dimensions, and rotation angle.

But estimating the angle θ is not trivial, due to two main challenges:

  1. Angular periodicity: Rotations of 0° and 360° represent the same orientation, but naively comparing them during training produces large loss penalties, leading to degraded performance.
  2. Symmetry ambiguity: Some objects (e.g., squares) look the same when rotated by 90°, making unique angle estimation ill-posed.

These issues, already known in computer vision, limit the precision and robustness of existing OOD systems.
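To make the periodicity problem concrete, here is a toy comparison (illustrative only, not the paper's loss) between a naive angular distance and a period-aware one:

```python
def naive_angle_error(pred, gt):
    """Absolute difference, ignoring that angles wrap around."""
    return abs(pred - gt)


def periodic_angle_error(pred, gt, period=360.0):
    """Shortest distance on the circle: 359 deg and 1 deg are only 2 deg apart."""
    d = abs(pred - gt) % period
    return min(d, period - d)
```

A naive loss would penalize a prediction of 359° against a ground truth of 1° by 358°, even though the true error is only 2°, which is exactly the kind of large, spurious penalty described above.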

Why Does It Matter?

Orientation is not just a geometric detail—it is often a critical component of the semantics or functionality of an object in an image. Standard object detectors that rely on horizontal bounding boxes (HBB) are sufficient in many cases, but they fall short when precise alignment or rotation is essential for interpretation, measurement, or action. In these scenarios, Oriented Bounding Boxes (OBB) offer a significant advantage.

Orientation Carries Meaning in Real-World Applications

Across several high-impact domains, the alignment of objects provides context that cannot be captured by simple axis-aligned boxes:

Remote Sensing: In aerial imagery, extracting the direction of objects such as aircraft, ships, or ground vehicles is essential, as they can appear in any orientation when seen from above. A horizontal box may localize an object, but without orientation information it can span a large, uninformative area (especially for narrow objects), leading to cluttered detection results.

Scene Text Analysis (e.g., OCR): In natural or scanned images, text often appears in varied orientations. Detecting slanted or rotated text lines with standard boxes introduces redundancy or imprecision. Oriented detection allows for cleaner extraction, better layout analysis, and more accurate recognition pipelines.

Manufacturing and Industrial Inspection: In production lines, component orientation often signals whether parts are assembled correctly or if defects are present. Orientation-aware detection enables better quality control by reducing the ambiguity in object state.

Handling Slanted and Densely Packed Objects

Beyond semantic alignment, orientation-aware detection is functionally superior when dealing with dense layouts and rotated objects. A common problem in horizontal box detection is the failure of Non-Maximum Suppression (NMS) when slanted objects have heavily overlapping axis-aligned boxes even though the objects themselves are clearly separate. This leads to suppressed true positives and degraded recall in cluttered scenes.

Oriented Bounding Boxes resolve this by tightly fitting each object’s contour, allowing the detector to distinguish adjacent instances that happen to be aligned diagonally or placed at non-canonical angles. This is especially critical in scenarios such as:

  • Densely parked cars in satellite imagery.
  • Skewed or angled text in documents or signboards.
  • Closely aligned components on printed circuit boards.
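As a rough illustration of this NMS failure mode, the toy example below (our own, with illustrative geometry) compares two disjoint, diagonally parked boxes via the IoU of their axis-aligned enclosing boxes:

```python
import numpy as np


def obb_corners(x, y, w, h, theta):
    """Corner points of an oriented box (x, y, w, h, theta)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    half = np.array([[w, h], [w, -h], [-w, -h], [-w, h]]) / 2.0
    return half @ R.T + np.array([x, y])


def aabb_iou(corners_a, corners_b):
    """IoU of the axis-aligned boxes enclosing each oriented box."""
    ax0, ay0 = corners_a.min(axis=0); ax1, ay1 = corners_a.max(axis=0)
    bx0, by0 = corners_b.min(axis=0); bx1, by1 = corners_b.max(axis=0)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union
```

For two 10×2 boxes rotated 45° and offset by 3 units perpendicular to their long axis, the oriented boxes do not overlap at all (the offset exceeds their height), yet the IoU of their enclosing horizontal boxes is about 0.39 in this configuration, enough to trigger suppression under aggressive NMS settings.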

Accurate orientation modeling does more than improve one prediction—it helps reduce downstream errors, improves post-processing reliability, and enables simpler logic in business rules that depend on object layout.

The Problem with Existing Approaches

Angle Coders:

Angle Coders transform orientation into a different space—e.g., circular smooth labels (CSL), phase coding, or classification-based encoding—to avoid the discontinuities in angle space. While modular, they often require heavy hyperparameter tuning, such as determining the number of angular bins or weight schemes.

Gaussian-Based Representations:

Some methods model each OBB as a 2D Gaussian distribution and compare boxes using statistical divergences (e.g., Kullback-Leibler, Wasserstein). These are theoretically elegant, continuous representations, but they are computationally heavier and can be unstable, particularly under weak supervision.

Despite their cleverness, both paradigms fall short in two areas:

  • Precision in angular prediction
  • Robustness to symmetry (square-like objects)

The Innovation: Structure Tensors as an Angle Representation

The paper proposes a novel method: represent orientation using a structure tensor, a classical concept from image processing. Structure tensors are 2×2 symmetric matrices that encode both the orientation and anisotropy (directionality) of local image structures.

Mathematically:

A structure tensor T is classically formed from image gradients and can be decomposed into eigenvalues and eigenvectors: \(T = \lambda_1 v_1 v_1^{\top} + \lambda_2 v_2 v_2^{\top}\).

The key idea in the paper is to encode an OBB as: \[T = R_{\theta}\,\Lambda\,R_{\theta}^{\top}\]

Where:

  • \(R_{\theta}\) is the 2D rotation matrix for angle θ.
  • \(\Lambda = \mathrm{diag}(w/2,\, h/2)\) encodes the half-width and half-height.
  • T is a 2×2 symmetric matrix whose 3 parameters uniquely describe the box's orientation and anisotropy.

This representation elegantly sidesteps angle discontinuities and naturally models symmetry via eigenvalue ratios.
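A minimal sketch of the encode/decode round trip, assuming Λ stores the half-sizes as above (function names are ours, not the paper's):

```python
import numpy as np


def encode_obb(w, h, theta):
    """Encode box size and angle as a 2x2 structure tensor T = R @ Lambda @ R.T."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    Lam = np.diag([w / 2.0, h / 2.0])
    return R @ Lam @ R.T


def decode_obb(T):
    """Recover (w, h, theta) via eigendecomposition; assumes w >= h."""
    eigvals, eigvecs = np.linalg.eigh(T)       # eigenvalues in ascending order
    w, h = 2.0 * eigvals[1], 2.0 * eigvals[0]  # largest eigenvalue = half-width
    v = eigvecs[:, 1]                          # principal direction
    theta = np.arctan2(v[1], v[0]) % np.pi     # orientation defined modulo pi
    return w, h, theta
```

Because T is symmetric, the decoded angle is naturally defined modulo 180°, which is precisely how the representation sidesteps the periodicity issue.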

The proposed angle representation. Comparison between the traditional oriented bounding box format (x, y, w, h, θ) (in blue) and the structure tensor representation T (in orange). Orientation and anisotropy are represented in T by its eigenvalues λ1 and λ2, and their corresponding eigenvectors v1, v2.

Handling Isotropic Objects

A key challenge with oriented detection is that isotropic objects (such as squares or circles) do not have a clearly defined principal orientation. In these cases, predicting a rotation angle becomes ambiguous and can destabilize training. The structure tensor framework addresses this by introducing controlled anisotropy: for objects where width and height are nearly equal, the method artificially sets a fixed ratio between the tensor’s eigenvalues (e.g., 2:1), thereby creating a dominant direction without significantly distorting the box shape. This subtle adjustment ensures that the loss function remains sensitive to angle deviations, allowing the network to learn meaningful orientations even for symmetric objects.
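A sketch of how such a controlled-anisotropy adjustment might look; the 2:1 ratio and the near-square tolerance below are illustrative choices, not the paper's exact values:

```python
import numpy as np


def enforce_anisotropy(w, h, ratio=2.0, tol=0.05):
    """For near-square boxes, impose a fixed width/height ratio so that a
    dominant direction exists (ratio and tolerance are illustrative)."""
    if abs(w - h) / max(w, h) < tol:
        mean = (w + h) / 2.0
        # Preserve the box area while fixing the aspect ratio.
        return mean * np.sqrt(ratio), mean / np.sqrt(ratio)
    return w, h
```

With a dominant eigenvalue guaranteed, the loss stays sensitive to angle deviations even for squares, so the network can still learn a consistent orientation.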

Integration Into Neural Networks

The authors adapt a standard one-stage detector (e.g., FCOS with a ResNet-50 backbone), inserting a regression head that predicts the structure tensor instead of the angle directly.

  • During training, ground truth OBBs are converted into structure tensors using the encoder E(w,h,θ).
  • During inference, the predicted tensor is decoded back to the standard OBB using the inverse transformation.

This modular design allows plug-and-play use in existing object detectors with minimal overhead.
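To see why tensor space is convenient for training, here is a toy angle loss (a simple Frobenius distance between tensors; the paper's actual loss function may differ) showing that orientations 180° apart incur zero penalty:

```python
import numpy as np


def obb_to_tensor(w, h, theta):
    """Encode (w, h, theta) as the structure tensor R @ diag(w/2, h/2) @ R.T."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([w / 2.0, h / 2.0]) @ R.T


def angle_loss(box_pred, box_gt):
    """Frobenius distance in tensor space: boxes whose angles differ by
    180 degrees map to the same tensor and give zero loss."""
    return np.linalg.norm(obb_to_tensor(*box_pred) - obb_to_tensor(*box_gt))
```

Since R rotated by an extra 180° only flips the sign of R, the product R Λ Rᵀ is unchanged, so no special wrap-around handling is needed in the loss.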

Structure tensor representation in a neural network. (a) During training, the backbone extracts image features f, which are used for classification and regression. The angle head predicts orientation as a structure tensor Tpred, and the ground truth OBBgt is encoded into Tgt for angle loss computation. (b) At inference, Tpred is decoded into the standard OBB format (x, y, w, h, θ). Blue denotes standard detector components, while green highlights the proposed method.

Empirical Results: Benchmarks and Comparisons

The authors evaluate their method across five datasets, covering aerial imagery, scene text, and images of COVID tests taken in the wild:

Dataset        Domain           Size (images)   Supervision Type
DOTA v1.0      Satellite        2806            OBB & HBB
HRSC2016       Ships / Aerial   1061            OBB & HBB
ICDAR2015      Scene text       1500            OBB & HBB
MSRA-TD500     Scene text       500             OBB & HBB
C19TD (new)    COVID tests      1002            OBB & HBB

Qualitative results of the proposed approach on several datasets. On the top, from left to right, detection examples from HRSC2016, DOTA, and ICDAR2015 are shown. On the bottom, the left image corresponds to the MSRA-TD500 dataset, while the one on the right belongs to the C19TD test dataset.

Key performance metrics include:

  • mAP50: Mean Average Precision at 0.5 IoU threshold

  • MAEθ: Mean Absolute Error in angle (radians)

  • RMSEθ: Root Mean Square Error in angle (radians)

Results on the remote sensing datasets DOTA v1.0 and HRSC2016, showing mAP50 and AP50:95 for the proposed structure tensor representation compared to SOTA methods. Both OBB-supervised and HBB-supervised approaches are reported. For each metric, the best score and the second best score are shown in green and blue, respectively.

These results show that the proposed structure tensor representation either matches or exceeds the best prior methods in both angular precision and detection accuracy.

Practical Impact for Computer Vision Applications

From a business and deployment perspective, this research introduces a practical innovation that has direct engineering advantages:

  • Modularity: Can be integrated into existing pipelines with minimal effort.

  • Precision: Higher angular accuracy, especially on symmetric or ambiguous objects.

  • Efficiency: Maintains low computational overhead.

Given the prevalence of orientation-sensitive applications—from satellite surveillance to industrial QA—this method provides a clean and efficient improvement path.

About Digital Sense

At Digital Sense, we specialize in taking complex projects and approaching them with a research mindset to solve them in the best way possible. Our team includes PhDs, master's graduates, and experienced engineers who have produced more than 200 scientific publications, and we have helped clients like ULTA Beauty, Orsted, and Tonal deploy award-winning, production-ready systems in machine learning, computer vision, and remote sensing.

🔍 Read the full paper: arXiv:2411.10497

Reference

Bou, X., Facciolo, G., von Gioi, R. G., Morel, J. M., & Ehret, T. (2024). Structure Tensor Representation for Robust Oriented Object Detection. arXiv preprint arXiv:2411.10497.