Identifying the acquisition source of media data is one of the most widely studied problems in multimedia forensics. Progress in this field depends on the availability of representative, diverse and up-to-date data corpora, so that existing and newly proposed techniques can be assessed in a reliable and reproducible manner. In this light, we present a novel dataset, named DivNoise, which encompasses both image and video data acquired with a wide range of device cameras under different environmental conditions. In particular, unlike existing databases, the dataset also includes data acquired from the front-facing cameras of mobile devices (smartphones and tablets) and from webcams, which are increasingly used for remote video communication in many application scenarios. The dataset is made publicly available to the research community, with the goal of supporting the development of novel source identification techniques. We perform an experimental evaluation on the DivNoise dataset using state-of-the-art algorithms, which yields preliminary yet intriguing empirical insights.
DivNoise currently consists of 15,017 images and 338 videos captured by 31 different devices, covering 16 models produced by 8 different manufacturers. Some of the devices are equipped with two different cameras, for a total of 39 individual cameras in the dataset. To ensure a uniform approach throughout the data collection process, all cameras were left at their default settings during acquisition. For webcams, data were acquired using the native applications on such devices. Various scenes were captured in and around Media City Bergen (Norway), including images of the sky, grass, buildings, trees, flowers, motorcycles, offices, birds, traffic signs, streets, bikes, and scooters. Furthermore, all cameras in the dataset captured the same scenes, covering identical locations and objects with only minor variations. This intentional consistency in scene selection is valuable, as it encourages techniques to characterize camera-related artifacts rather than specific image content. The dataset includes a mixture of indoor and outdoor scenes, close-up and distant shots, and varied exposure levels, which broadens its applicability.
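The device and camera counts above imply how many devices contribute two cameras. The following is a minimal Python sketch of that arithmetic, assuming every device carries either one or two cameras (the description mentions no device with more). The function name `dual_camera_devices` is hypothetical and is not part of any DivNoise tooling.

```python
def dual_camera_devices(num_devices: int, num_cameras: int) -> int:
    """Return how many devices must have two cameras.

    Assumption (not stated explicitly in the dataset description):
    every device has exactly one or exactly two cameras.
    """
    extra = num_cameras - num_devices  # each dual-camera device adds one extra camera
    if extra < 0 or extra > num_devices:
        raise ValueError("counts are inconsistent with 1 or 2 cameras per device")
    return extra


# Figures taken from the dataset description.
NUM_DEVICES = 31
NUM_CAMERAS = 39

dual = dual_camera_devices(NUM_DEVICES, NUM_CAMERAS)
single = NUM_DEVICES - dual

# Sanity check: single-camera and dual-camera devices add up to the camera total.
assert single + 2 * dual == NUM_CAMERAS

print(f"dual-camera devices: {dual}, single-camera devices: {single}")
```

Under this assumption, 8 of the 31 devices provide two cameras each and the remaining 23 provide one.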
The dataset is divided into multiple parts for easier downloading. 'Part 1' includes the data from all smartphones, tablets, and webcams. The remaining parts ('Part 2' through 'Part 6') contain the data from the Canon cameras.
Check out which camera model took your picture here! (Only the DivNoise models are supported)
Demo