<?xml version="1.0"?>
<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>IPOL</title>
<link>http://www.ipol.im/feed/</link>
<atom:link href="http://www.ipol.im/feed/preprints.rss" rel="self" type="application/rss+xml"/>
<description>IPOL Preprints — Latest public preprints from IPOL.</description>
<item>
	<title>Transcribing Lines of Handwritten Text Using TrOCR: An Encoder-Decoder Model Based on Pre-Trained Image and Text Transformers</title>
	
	<dc:creator>Natalia Bottaioli,
Daniel Parres,
Yung-Hsin Chen</dc:creator>
	
	
	  <guid>http://www.ipol.im/pub/pre/587/</guid>
	
	<link>http://www.ipol.im/pub/pre/587/</link>
	<pubDate>Thu, 23 Apr 2026 17:03:27 +0200</pubDate>
	<dcterms:modified>2026-04-23T15:03:27Z</dcterms:modified>
	<description>This article focuses on analyzing several aspects of the handwritten text recognition (HTR) models belonging to the TrOCR family introduced by Minghao Li et al. in [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models, AAAI Conference on Artificial Intelligence, 2023]. 
The TrOCR models are designed to recognize single lines of English text using a transformer-based encoder-decoder architecture. All models incorporate a pre-trained vision transformer as the encoder and a pre-trained text transformer as the decoder. The encoder is responsible for extracting key features from the image, while the decoder autoregressively transcribes the text, subword by subword, based on the extracted features. The authors report state-of-the-art performance across different text types, including handwritten, scene, and printed text. 
Our analysis has several objectives. The first one is to gain a better understanding of the training process and the data used for producing the handwritten models. The second one is to highlight and explore the functionality and limitations of the TrOCR model in the context of HTR, which poses unique challenges such as variations in individual writing styles. Additionally, we propose an architecture diagram that helped us better understand what the model actually does with the input text line image, which we hope will be useful for the research community using TrOCR. 

**This is an MLBriefs article, the source code has not been reviewed!**
**The original source code is available [[here|https://huggingface.co/microsoft/trocr-base-handwritten]] (last checked 2026/04/23).**</description>
</item>
<item>
	<title>QMSANet: A Quaternion Multi-Scale Attention Network for Color Image Denoising</title>
	
	<dc:creator>Yi Liu,
Qi Xie,
Yu Guo,
Guoqing Chen,
Boying Wu,
Deyu Meng,
Jean-Michel Morel,
Qiyu Jin,
Michael Kwok-Po Ng</dc:creator>
	
	
	  <guid>http://www.ipol.im/pub/pre/607/</guid>
	
	<link>http://www.ipol.im/pub/pre/607/</link>
	<pubDate>Tue, 04 Nov 2025 14:38:41 +0100</pubDate>
	<dcterms:modified>2025-11-04T13:38:41Z</dcterms:modified>
	<description>Color image denoising is a critical task in computer vision, often hindered by the underutilization of inter-channel correlations, resulting in color distortion and loss of fine details. We
propose QMSANet, a Quaternion Multi-Scale Attention Network, to address these challenges
by leveraging quaternion operations for color image denoising. Operating in the quaternion domain, QMSANet preserves channel dependencies across all processing stages, enhancing noise
suppression and detail retention. The network comprises three innovative modules: the Quaternion Multi-Scale Sparse Block (QMSB) for extracting multi-scale features with sparsity enforcement, the Quaternion Stacked Enhancement Block (QSEB) for refining deep features through
inter-channel interactions, and the Lightweight Quaternion Attention Block (LQAB) for adaptively focusing on salient features with minimal computational overhead. These modules collectively mitigate color deviation, detail loss, and edge artifacts. Extensive experiments on
benchmark datasets demonstrate that QMSANet outperforms state-of-the-art denoising models
in both synthetic and real-world noisy conditions. Typically, a blind denoiser exhibits diminished performance in comparison to a non-blind denoiser. However, QMSANet-B, a blind
denoiser constructed based on our model, also surpasses most of the comparison models. At
sigma = 15, QMSANet and QMSANet-B achieve PSNR improvements of 0.53 dB and 0.50 dB,
respectively, compared to the state-of-the-art method on CBSD68. Visual comparisons further
highlight its ability to preserve structural details. QMSANet offers a balanced, efficient solution
for high-quality color image denoising, with significant potential for real-world applications.</description>
</item>
<item>
	<title>SiamTE: Siamese Trace Erasing for Camera Trace Extraction</title>
	
	<dc:creator>Marina Gardella</dc:creator>
	
	
	  <guid>http://www.ipol.im/pub/pre/558/</guid>
	
	<link>http://www.ipol.im/pub/pre/558/</link>
	<pubDate>Wed, 17 Jul 2024 15:19:29 +0200</pubDate>
	<dcterms:modified>2024-10-05T22:16:56Z</dcterms:modified>
	<description>The camera trace is a signal embedded in the image during
the image formation process which implicitly encodes information about the camera processing chain. In the article 'Camera Trace Erasing', Chen et al. propose a Siamese Trace Erasing method aiming at extracting those traces. In order to do so, the authors design a novel hybrid loss for network training. This hybrid loss is defined as a combination of three different losses: the embedded similarity loss, the truncated fidelity loss and the cross-identity loss. In this article we briefly explore the method and its results.

**This is an MLBriefs article, the source code has not been reviewed!**
**The original source code is [[available here|https://github.com/ngchc/CameraTE]] (last checked 2024/10/06).**
 </description>
</item>
<item>
	<title>Termite Retinex</title>
	
	<dc:creator>Gabriele Simone,
Mauro Fiorentini,
Alessandro Rizzi</dc:creator>
	
	
	  <guid>http://www.ipol.im/pub/pre/531/</guid>
	
	<link>http://www.ipol.im/pub/pre/531/</link>
	<pubDate>Sun, 31 Mar 2024 20:13:30 +0200</pubDate>
	<dcterms:modified>2025-10-02T10:48:15Z</dcterms:modified>
	<description>The original presentation of Retinex, a spatial color correction and image enhancement 
algorithm modeling the Human Vision System, uses paths to explore the image in search of a local
reference white point. Here we present a spatial color algorithm, called Termite Retinex, 
with an alternative way to explore local properties of Retinex, replacing random paths with 
a colony of agents, which uses swarm intelligence to explore the image, determining in this way 
the locality of its filtering. We show the efficacy of Termite Retinex for unsupervised image enhancement,
dynamic range stretching and color correction for digital images.</description>
</item>
<item>
	<title>A Closed Form Solution to Natural Image Matting</title>
	
	<dc:creator>Mahdi Ranjbar,
Aissa Abdelaziz,
Mohammad Ali Jauhar</dc:creator>
	
	
	  <guid>http://www.ipol.im/pub/pre/532/</guid>
	
	<link>http://www.ipol.im/pub/pre/532/</link>
	<pubDate>Wed, 13 Mar 2024 19:46:46 +0100</pubDate>
	<dcterms:modified>2024-03-13T18:46:46Z</dcterms:modified>
	<description>Natural image matting refers to the process of estimating the foreground opacity, also known
as the alpha matte, of an input image and extracting the foreground layer. It is a fundamental
and challenging task in computer vision and finds applications in various fields related to image
processing such as film and image editing, advertising, and medical diagnosis. Numerous 
approaches have been proposed to address this challenge incorporating both traditional and deep
learning techniques. In this paper, we reproduce the paper "A Closed Form Solution to Natural
Image Matting", also referred to as "Closed Form Matting". The proposed algorithm is based
on local smoothness assumptions on foreground and background colors, the color-line model,
and provides a simple closed-form solution that requires sparse scribble constraints instead of
more cumbersome and hard-to-develop trimap constraints. We present extended derivations, a
Python-based implementation, and an online demo, along with a comparison against contemporary methods
and an ablation study on the effects of hyper-parameters.</description>
</item>
<item>
	<title>Experimental Improvements of Global Optimization Algorithms for Lipschitz Functions</title>
	
	<dc:creator>Perceval Beja-Battais,
Ga&#xEB;tan Serr&#xE9;,
Sophia Chirrane</dc:creator>
	
	
	  <guid>http://www.ipol.im/pub/pre/469/</guid>
	
	<link>http://www.ipol.im/pub/pre/469/</link>
	<pubDate>Mon, 05 Jun 2023 20:33:05 +0200</pubDate>
	<dcterms:modified>2023-06-05T18:33:05Z</dcterms:modified>
	<description>In this paper, we define an experimental context in which we test the performance of LIPO and AdaLIPO, two global optimization algorithms for Lipschitz functions introduced in [C. Malherbe and N. Vayatis, Global optimization of Lipschitz functions, 2017]. We provide experimental evidence of the efficiency of
these algorithms, carry out a numerical statistical analysis of our results, and suggest
two intuitive improvements over the vanilla versions of the algorithms, referred
to as LIPO-E and AdaLIPO-E. Within our test bench, these improvements allow
the algorithms to converge significantly faster and whenever they struggle to
find a better maximizer. Finally, we define the scope of application of LIPO
and AdaLIPO. We show that they are very prone to the curse of dimensionality
and quickly tend toward Pure Random Search as the dimension increases. We
provide source code for LIPO, AdaLIPO, and our enhanced versions.</description>
</item>
<item>
	<title>An implementation of 'Efficient Multi-Stage Video Denoising Method' and some variants</title>
	
	<dc:creator>Zhe Zheng,
Gabriele Facciolo,
Pablo Arias</dc:creator>
	
	
	  <guid>http://www.ipol.im/pub/pre/464/</guid>
	
	<link>http://www.ipol.im/pub/pre/464/</link>
	<pubDate>Tue, 28 Feb 2023 19:44:12 +0100</pubDate>
	<dcterms:modified>2023-03-14T12:42:14Z</dcterms:modified>
	<description>Recently, the field of image and video denoising has undergone a revolution
thanks to deep learning approaches. These methods outperform traditional 
model-based approaches in almost every image/video restoration problem. In this paper,
we propose an implementation of a recent approach proposed for video denoising,
namely the Efficient Multi-Stage Video Denoising (EMVD) method. The method
has a lightweight and interpretable architecture consisting of three stages: temporal
fusion, denoising, and refinement stages. We reproduce this method and propose
three modifications aimed at improving its performance: (1) we apply motion
compensation to make better use of temporal redundancy, (2) we apply variance
stabilization to help this lightweight network deal with signal-dependent noise and
(3) we decouple occlusion detection and fusion weights prediction. We evaluate the
original method and the proposed modifications on a task of raw video denoising.</description>
</item>
<item>
	<title>Fixed Pattern Noise Reduction: Temporal High Pass Filter</title>
	
	<dc:creator>Arnaud Barral</dc:creator>
	
	
	  <guid>http://www.ipol.im/pub/pre/436/</guid>
	
	<link>http://www.ipol.im/pub/pre/436/</link>
	<pubDate>Fri, 25 Nov 2022 09:21:14 +0100</pubDate>
	<dcterms:modified>2022-11-25T08:21:14Z</dcterms:modified>
	<description>Temporal high-pass filter methods are a family of methods for Fixed Pattern Noise (FPN) reduction. They are recursive, real-time methods that apply a temporal high-pass filter to remove the FPN. FPN is a temporally coherent noise present in video due to the non-uniform response of the sensors. It is a common problem in infrared video and can degrade the quality of the observation. In this work, we study and compare three classical temporal high-pass filter methods for FPN reduction.</description>
</item>

</channel>
</rss>
