
Anomaly Detection: UFlow (Part II)

Published on November 26, 2025

This is the second part of "Anomaly Detection: UFlow", in which we introduced UFlow, our proposed method for anomaly detection in images based on Normalizing Flows.


In the first part of Anomaly Detection: UFlow, we went through the method and explained how to generate the first output, an anomaly map composed of an anomaly score for each image pixel.

In this part, we explain how to compute the second output of the method, an automatic segmentation of the anomalies, which can be seen as classifying each pixel as anomalous or normal. In this approach, an estimate of the Number of False Alarms (NFA) is associated with each anomaly candidate, and detections are obtained by thresholding the NFA [8]. The proposed method has no parameters to tune and uses level sets organized into a tree structure to segment the anomalies automatically.

Before discussing this method, we first introduce the a contrario framework and the Number of False Alarms.

NFA: Number of False Alarms

The a contrario framework [8] is a multiple-hypothesis testing methodology that has been successfully applied to derive unsupervised statistical detection thresholds in a wide variety of detection problems, such as alignments for line segment detection [44], clustering [45], image forgery detection [46], and even anomaly detection [47–49], to name a few.

It is based on the non-accidentalness principle [51]. Since we usually do not know what the anomalies look like, the framework focuses on modeling normality by defining a null hypothesis, also called the background model. Relevant structures are detected as large deviations from this model by evaluating how likely it is that an observed structure or event E would happen under the null hypothesis.

One of its most useful characteristics is that it provides an easily interpretable detection threshold based on controlling the NFA, defined as

\[\text{NFA}(E) = N_T \, \text{Pr}_{\mathcal{H}_0}(E)\]

Here, N_T denotes the number of events tested, and Pr_H0(E) is the probability of the event E being a realization of the background model H0. Besides providing a robust detection threshold, as will be shown, the NFA value itself has a clear statistical meaning: it is an estimate of the expected number of times, among the tests performed, that such an event could be generated by the background model [8]. In other words, an event E is ε-meaningful if its expected number of occurrences under the normality assumption is less than ε, that is, if NFA(E) < ε. Consequently, ε provides an upper bound on the expected number of false detections we can obtain on anomaly-free images. A low NFA value means that the observed pattern is too unlikely to be generated by the background model and, therefore, indicates a meaningful anomaly.
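As a minimal sketch of the definition above, the following Python snippet computes the NFA of an event and checks whether it is ε-meaningful. The function names and the numbers (10,000 tests and the two probabilities) are hypothetical, chosen only for illustration.

```python
def nfa(num_tests: int, prob_under_h0: float) -> float:
    """NFA(E) = N_T * Pr_H0(E)."""
    return num_tests * prob_under_h0


def is_meaningful(num_tests: int, prob_under_h0: float, eps: float = 1.0) -> bool:
    """An event is eps-meaningful when its NFA is below eps."""
    return nfa(num_tests, prob_under_h0) < eps


# Hypothetical setting: 10,000 tested events.
n_tests = 10_000
rare_event = nfa(n_tests, 1e-7)    # far below 1: too unlikely under H0
common_event = nfa(n_tests, 1e-2)  # far above 1: compatible with H0
```

With ε = 1, the first event would be reported as a detection and the second would not.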

As the events E corresponding to anomalies can have arbitrary sizes and shapes, we would have to test all possible configurations of connected sets of pixels within every image. This is computationally expensive and often infeasible in a reasonable amount of time. Therefore, we propose to use the anomaly map to select the regions to be tested from the set of connected components of its level sets. These components are naturally ordered by inclusion, providing a convenient hierarchical representation that can be organized in a tree structure.

Detecting anomalies from the level sets of the anomaly map

In image processing, level sets are the basis for a wide variety of morphological filters, and, as they are closely related to object boundaries, they have been used for edge detection [52, 53], registration [54, 55], image quantization [56], and segmentation [57], among others. All the information of a gray-valued image u(i,j) is contained in a set of binary images obtained by thresholding at different levels. Since no information is lost, level sets provide a complete representation: any image can be easily reconstructed from the whole family of its (upper) level sets [58]:

\[\mathcal{L}_\lambda = \{ (i,j)| u(i,j) \geq \lambda \}\]

Level sets naturally define a hierarchical representation, ordered by their geometrical inclusion. In the case of the upper level sets, each connected component defined by a certain level is included in another connected component defined by a lower level. Therefore, they can be naturally embedded in a tree representation.
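The snippet below is a small illustration, not the implementation used in the paper. It thresholds a toy image at each of its gray levels, extracts the connected components with `scipy.ndimage.label` (4-connectivity is assumed here), and checks both the inclusion property and the reconstruction of the image from its upper level sets. The toy image and the helper names are made up for this example.

```python
import numpy as np
from scipy import ndimage


def upper_level_set(u, lam):
    """Binary image of the pixels with u(i,j) >= lam."""
    return u >= lam


def level_set_components(u, lam):
    """Connected components (boolean masks) of the upper level set at level lam."""
    labels, count = ndimage.label(upper_level_set(u, lam))
    return [labels == k for k in range(1, count + 1)]


def is_nested(u, levels):
    """Check that every component at a level lies inside a component of the level below."""
    for low, high in zip(levels[:-1], levels[1:]):
        parents = level_set_components(u, low)
        for child in level_set_components(u, high):
            if not any(np.all(parent[child]) for parent in parents):
                return False
    return True


# Toy gray-valued image (hypothetical values).
u = np.array([
    [0, 0, 0, 0, 0],
    [0, 2, 0, 1, 0],
    [0, 3, 0, 1, 0],
    [0, 0, 0, 0, 0],
])
levels = np.unique(u)

# Reconstruction: u(i,j) is the highest level whose upper level set contains (i,j).
reconstructed = np.max(
    [np.where(upper_level_set(u, lam), lam, levels[0]) for lam in levels], axis=0
)
```

In this toy image, the level set at λ = 1 has two components, and the one on the left contains the single component found at λ = 2.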

Specifically, for each scale we propose to use the upper level sets of an image defined from the NF embeddings as

\[u(i,j) = \sum\limits_{k=1}^C(z_{ijk})^2\]

where C is the number of channels of the embedding.
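As a rough sketch, assuming the embedding at one scale is stored as a NumPy array of shape (C, H, W), the image u is obtained by summing the squared embedding over the channel axis. The array layout, the sizes, and the function name are assumptions made for this example.

```python
import numpy as np


def anomaly_energy(z):
    """Map an embedding z of shape (C, H, W) to u of shape (H, W), u = sum_k z_k^2."""
    return np.sum(z ** 2, axis=0)


# Hypothetical embedding drawn from the normality model, z ~ N(0, 1).
rng = np.random.default_rng(0)
C, H, W = 16, 8, 8
z = rng.standard_normal((C, H, W))
u = anomaly_energy(z)  # one non-negative value per pixel
```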

Figure 1 shows a toy example that illustrates the procedure. Figure 1a represents the image u, with the values u(i,j) written inside each pixel. Figure 1b shows the raveled indexes that serve as labels for the tree nodes shown in Figure 1c. The tree is built so that the nodes are the connected components of the upper level sets, denoted L_{λ,c}, and the edges represent inclusion, connecting level set components from two different levels.

The tree of connected components allows us to significantly reduce the number of image regions we effectively test. Instead of computing the NFA of all possible N_T connected regions of any shape and size, which would be intractable, we will limit this computation to the connected components of the upper level sets in the tree. By construction, these regions are expected to be good candidates for anomalies.
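One possible way to build such a tree is sketched below. This is a simplified illustration under our own assumptions (4-connectivity, one node per distinct region, pixels identified by their raveled indexes), and it leaves out the prune and merge steps of the actual method. The function name and the toy image are hypothetical.

```python
import numpy as np
from scipy import ndimage


def build_component_tree(u):
    """Build a tree of upper-level-set components.

    Each node is a dict with the minimum pixel value of the region ("level"),
    the raveled pixel indexes ("pixels"), and the index of its parent node.
    """
    nodes = []
    previous = []  # indexes of the nodes found at the level below
    for lam in np.unique(u):
        labels, count = ndimage.label(u >= lam)
        current = []
        for k in range(1, count + 1):
            mask = labels == k
            pixels = frozenset(np.flatnonzero(mask).tolist())
            # The parent is the component of the lower level that contains this one.
            parent = next((p for p in previous if pixels <= nodes[p]["pixels"]), None)
            if parent is not None and pixels == nodes[parent]["pixels"]:
                current.append(parent)  # same region as below: no new node
                continue
            nodes.append({"level": u[mask].min().item(), "pixels": pixels, "parent": parent})
            current.append(len(nodes) - 1)
        previous = current
    return nodes


u = np.array([
    [0, 0, 0, 0, 0],
    [0, 2, 0, 1, 0],
    [0, 3, 0, 1, 0],
    [0, 0, 0, 0, 0],
])
tree = build_component_tree(u)
parents = [node["parent"] for node in tree]
```

For this 4 × 5 image only four regions need to be tested, instead of every connected set of pixels.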

To compute the NFA of a connected component L_{λ,c} in the tree, we need to evaluate the Probability of False Alarm (PFA), i.e., the probability of occurrence of a region like the observed L_{λ,c} under the background model of normality. In the absence of anomaly, the embedding produced by the U-Flow model is

\[z_{ijk}^l \sim \mathcal{N}(0,1)\]

i.i.d. Therefore, each variable u(i,j) follows a Chi-Squared distribution with C degrees of freedom. Then, since the minimum pixel value in the connected component L_{λ,c} is λ, every one of its pixels has a value of at least λ and, treating the pixels as independent, its probability of false alarm is given by

\[\text{PFA}(\mathcal{L}_{\lambda,c}) = \text{Pr} \biggl(\min_{(i,j) \in \mathcal{L}_{\lambda,c}} u(i,j) \geq \lambda \biggr) = \biggl(1 - \text{CDF}_{\chi^2(C)}(\lambda)\biggr)^{|\mathcal{L}_{\lambda,c}|}\]
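The PFA of a component can be evaluated directly from the Chi-Squared survival function. The sketch below uses `scipy.stats.chi2.logsf` and works in the log domain, since raising a small tail probability to the power of the region size quickly underflows. The function name and the example values are ours, not part of the method's code.

```python
import numpy as np
from scipy.stats import chi2


def log10_pfa(lam, size, channels):
    """log10 of (1 - CDF_chi2(C)(lam)) ** size, computed in the log domain."""
    return size * chi2.logsf(lam, df=channels) / np.log(10)


# For C = 2 the tail is exp(-lam / 2), so lam = 2 ln(10) gives a tail of 0.1.
# A region of 5 pixels at that level then has PFA = 0.1 ** 5.
example = log10_pfa(lam=2 * np.log(10), size=5, channels=2)
```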

Once we have computed the log(PFA) values of all connected components in the tree, to obtain the final regions, we iteratively perform prune and merge procedures until convergence (until the tree does not change anymore). See the paper for more details.

log(NFA) computation

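At each scale l, where the embedding has C_l channels, the log(PFA) of a connected component is evaluated in closed form. The expression corresponds to the base-10 logarithm of the Chernoff bound [59] on the Chi-Squared tail probability, and applies for λ > C_l:

\[\log(\text{PFA}(\mathcal{L}_{\lambda,c})) = |\mathcal{L}_{\lambda,c}|\frac{C_l}{2\ln(10)}\biggl(1 + \ln \biggl(\frac{\lambda}{C_l}\biggr) - \frac{\lambda}{C_l}\biggr).\]

The scales are then merged by upsampling the resulting log(PFA) heatmaps at the different scales l up to the original input size (H, W), and keeping the minimum value at each pixel position (i, j).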
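The closed-form log(PFA) expression above can be compared numerically with the exact Chi-Squared tail. The sketch below, with function names of our own choosing, evaluates both. For λ > C_l, the closed form should never fall below the exact value, as expected from an upper bound on the tail probability.

```python
import numpy as np
from scipy.stats import chi2


def log10_pfa_closed_form(lam, size, channels):
    """Closed-form log10(PFA) of a region of `size` pixels at level lam."""
    ratio = lam / channels
    return size * channels / (2 * np.log(10)) * (1 + np.log(ratio) - ratio)


def log10_pfa_exact(lam, size, channels):
    """Exact log10(PFA) from the Chi-Squared survival function."""
    return size * chi2.logsf(lam, df=channels) / np.log(10)


# Hypothetical values: 64 channels, a 7-pixel region, levels above C_l.
channels, size = 64, 7
levels = np.array([70.0, 100.0, 150.0])
closed = log10_pfa_closed_form(levels, size, channels)
exact = log10_pfa_exact(levels, size, channels)
```

The closed form avoids evaluating the CDF for very large λ, where the tail probability becomes extremely small.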

Finally, the log(NFA) value is given by:

\[\log(\text{NFA}(i,j)) = \log(N_T) + \log(\text{PFA}(i,j)),\]

with

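\[N_T = \sum\limits_{l} H_lW_l\sum\limits_{r=1}^{H_lW_l}\alpha\frac{\beta^r}{r}\]

where H_l and W_l are the height and width of the map at scale l, and αβ^r/r approximates the number of polyominoes (connected regions) of r pixels [60]. See the paper for the values of the constants α and β.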
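The following sketch puts the last steps together: merging per-scale log(PFA) maps by upsampling and taking the pixel-wise minimum, and adding log(N_T). Several choices here are assumptions made for illustration: nearest-neighbor upsampling with integer factors, base-10 logarithms, and the values of α and β, which are not given in this article (we use values commonly quoted for polyomino counts). Because β^r overflows floating point for realistic map sizes, N_T is evaluated in the log domain with `scipy.special.logsumexp`.

```python
import numpy as np
from scipy.special import logsumexp

# Assumed polyomino constants; they are not stated in this article.
ALPHA, BETA = 0.316915, 4.062570


def log10_num_tests(scale_shapes, alpha=ALPHA, beta=BETA):
    """log10 of N_T = sum_l H_l W_l sum_{r=1}^{H_l W_l} alpha * beta**r / r."""
    per_scale = []
    for h, w in scale_shapes:
        r = np.arange(1, h * w + 1)
        log_terms = np.log(alpha) + r * np.log(beta) - np.log(r)
        per_scale.append(np.log(h * w) + logsumexp(log_terms))
    return logsumexp(per_scale) / np.log(10)


def merge_scales(log_pfa_maps, out_shape):
    """Upsample each map to out_shape (integer factors assumed) and keep the minimum."""
    upsampled = []
    for m in log_pfa_maps:
        fy, fx = out_shape[0] // m.shape[0], out_shape[1] // m.shape[1]
        upsampled.append(np.repeat(np.repeat(m, fy, axis=0), fx, axis=1))
    return np.min(upsampled, axis=0)


# Hypothetical log10(PFA) maps at two scales.
fine = np.full((2, 4), -3.0)
coarse = np.array([[-1.0, -5.0]])
log_pfa = merge_scales([fine, coarse], out_shape=(2, 4))
log_nfa = log10_num_tests([(2, 4), (1, 2)]) + log_pfa
```

Keeping the minimum means that a pixel inherits the most significant (least likely under normality) evidence found at any scale.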

Results

Providing an operation point is crucial in almost any industrial application. Most recent deep-learning industrial anomaly detection methods in the literature focus on generating anomaly maps and evaluating them using the AUROC metric; they do not provide detection thresholds or anomaly segmentation masks.

Here, we report the results of anomaly segmentation based on an unsupervised threshold obtained by setting NFA = 1, that is, log(NFA) = 0. As explained in the paper, this threshold means that, in theory, we allow, on average, one false anomaly detection per image.
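In code, applying this threshold and scoring the result against a ground-truth mask could look like the sketch below. The log(NFA) map and the ground truth are made-up toy arrays, and the helper names are ours.

```python
import numpy as np


def segment(log_nfa, threshold=0.0):
    """Mark as anomalous the pixels whose log(NFA) is below the threshold."""
    return log_nfa < threshold


def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)


# Hypothetical log(NFA) map and ground-truth mask.
log_nfa = np.array([
    [3.0, 2.0, -4.0],
    [1.0, -6.0, -2.0],
    [5.0, 4.0, 0.5],
])
gt = np.array([
    [False, False, False],
    [False, True, True],
    [False, False, True],
])
mask = segment(log_nfa)
score = iou(mask, gt)
```

Averaging this score over the images of a category gives the mIoU.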

As the state-of-the-art methods to which we compare do not provide detection thresholds, we adopt two strategies: (i) we compute an oracle-like threshold that maximizes the mIoU for the testing set, and (ii) we use a fair strategy that only uses training data to find the threshold. In the latter, the threshold is set to allow at most one false positive in each training image, as it would be analogous to setting NFA = 1 false alarm. As seen in the table below, our automatic thresholding strategy significantly outperforms all others, even when compared with their oracle-like threshold.

Table 1: Segmentation mIoU comparison for MVTec-AD with the best flow-based methods in the literature, FastFlow [19], CFlow [35], and CS-Flow [20], for the oracle-like and fair thresholds defined in Section 4.1.1 of the paper. Our method largely outperforms all others, even when its automatic threshold is compared with their oracle-like thresholds.

Visual results

Example results for all MVTec categories. The first row shows the example images with the ground truth superimposed in red. The results for FastFlow, CFlow, and CS-Flow are shown in the second, third, and fourth rows. The last two rows correspond to our method: the anomaly score defined in (2), and the segmentation obtained with the automatic threshold log(NFA) < 0. Although the other methods perform very well, in some cases they present artifacts and overestimated anomaly scores. Our anomaly score achieves very good visual and numerical results, spotting anomalies with high confidence. Finally, the segmentation with the automatic threshold on the NFA is also able to spot and segment the anomaly accurately.

Normal image examples for all MVTec-AD categories. As can be seen, we always predict low values in the anomaly maps, and no detections are made.

Robustness

Additionally, it is important to note that our results obtained with the oracle-like threshold and the automatic threshold are very close. This demonstrates the validity of the proposed statistical derivation and the inter-scale independence achieved by the proposed architecture. To further illustrate this point and to show the robustness of the anomaly detection method with respect to the threshold on the NFA, the figure below depicts the mIoU for all the MVTec-AD categories as a function of − log(NFA). These graphs show that the unsupervised threshold log(NFA) = 0 is always near the optimal point, which is the oracle threshold, corresponding to the maximum mIoU reported in Table 1. Not only are the oracle and the automatic thresholds close, but the variation in mIoU is very low in a wide range of possible thresholds, showing that this detection strategy is robust and parameter-free in practice.

References

[8] Desolneux, A., Moisan, L., Morel, J.-M.: From Gestalt theory to image analysis: A probabilistic approach. Interdisciplinary Applied Mathematics (2008)

[28] Tsai, C.-C., Wu, T.-H., Lai, S.-H.: Multi-scale patch-based representation learning for image anomaly detection and segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3992–4000 (2022)

[31] Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P.: Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14318–14328 (2022)

[35] Gudovskiy, D., Ishizaka, S., Kozuka, K.: Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 98–107 (2022)

[44] Von Gioi, R.G., Jakubowicz, J., Morel, J.-M., Randall, G.: On straight line segment detection. Journal of Mathematical Imaging and Vision 32(3), 313–347 (2008)

[45] Cao, F., Delon, J., Desolneux, A., Musé, P., Sur, F.: A unified framework for detecting groups and application to shape recognition. Journal of Mathematical Imaging and Vision 27, 91–119 (2007)

[46] Gardella, M., Musé, P., Morel, J.-M., Colom, M.: Noisesniffer: a fully automatic image forgery detector based on noise analysis. In: 2021 IEEE International Workshop on Biometrics and Forensics (IWBF), pp. 1–6 (2021). IEEE

[47] Grosjean, B., Moisan, L.: A-contrario detectability of spots in textured backgrounds. Journal of Mathematical Imaging and Vision 33, 313–337 (2009)

[48] Davy, A., Ehret, T., Morel, J.-M., Delbracio, M.: Reducing anomaly detection in images to detection in noise. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 1058–1062 (2018). IEEE

[49] Tailanian, M., Musé, P., Pardo, Á.: A contrario multi-scale anomaly detection method for industrial quality inspection. In: Deep Learning Applications, Volume 4, pp. 193–216. Springer Nature Singapore, Singapore (2022)

[51] Lowe, D.G.: Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, USA (1985)

[52] Desolneux, A., Moisan, L., Morel, J.-M.: Edge detection by Helmholtz principle. Journal of Mathematical Imaging and Vision 14(3), 271–284 (2001)

[53] Cao, F., Musé, P., Sur, F.: Extracting meaningful curves from images. Journal of Mathematical Imaging and Vision 22, 159–181 (2005)

[54] Monasse, P., Guichard, F.: Fast computation of a contrast-invariant image representation. IEEE Transactions on Image Processing 9(5), 860–872 (2000)

[55] Musé, P., Sur, F., Cao, F., Gousseau, Y., Morel, J.-M.: An a contrario decision method for shape element recognition. International Journal of Computer Vision 69, 295–315 (2006)

[56] Ballester, C., Caselles, V., Monasse, P.: The tree of shapes of an image. ESAIM: Control, Optimisation and Calculus of Variations 9, 1–18 (2003)

[57] Xu, Y., Géraud, T., Najman, L.: Context-based energy estimator: Application to object segmentation on the tree of shapes. In: 2012 19th IEEE International Conference on Image Processing, pp. 1577–1580 (2012)

[58] Serra, J.: Image Analysis and Mathematical Morphology. Academic Press, Inc., USA (1983)

[59] Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 493–507 (1952)

[60] Jensen, I., Guttmann, A.J.: Statistics of lattice animals (polyominoes) and polygons. Journal of Physics A: Mathematical and General 33(29), 257 (2000) https://doi.org/10.1088/0305-4470/33/29/102

[61] Gioi, R.G., Hessel, C., Dagobert, T., Morel, J.-M., Franchis, C.: Ground visibility in satellite optical time series based on a contrario local image matching. Image Processing On Line 11, 212–233 (2021)