1. Introduction
Increasing threats to the ocean call for an urgent and comprehensive assessment of the status of deep-sea ecosystems and of changes therein (
Franke et al., 2020;
Roberts et al., 2023). In the deep sea, hydrothermal vents still constitute a relatively ‘pristine’ environment, but industries are increasingly interested in these metal-rich environments (
Boschen et al., 2013). At vents, seawater percolates through the ocean crust and is expelled as a hydrothermal fluid that precipitates in contact with cold seawater, forming hydrothermal chimneys and polymetallic sulphide edifices in which valuable chemical elements (e.g., gold, cobalt, manganese) accumulate. The mixing of hydrothermal fluid with cold seawater creates a steep centimetre-scale gradient of environmental conditions (e.g., pH, oxygen concentration, temperature, chemistry). This habitat is colonised by highly specialised endemic species, distributed according to their physiological tolerances and nutritional needs (
Tunnicliffe, 1991). Predicting biological responses to deep-sea mining and adapting mining regulations require a good understanding of deep-sea community responses to changes in environmental conditions, the role of biotic interactions in structuring communities as well as species biology (
Van Dover et al., 2020). Recently, the development of deep-sea observatories (
Favali and Beranzoli, 2006;
Juniper et al., 2007;
Matabos et al., 2022) and associated instrumentation (
Porter et al., 2009) provides unprecedented means to investigate and characterise ecosystems at increasing temporal resolutions (
Matabos et al., 2016). This is particularly true in heterogeneous and remote environments, where the poor accessibility and limited amount of on-site ship time impede the detailed characterisation of the environment and its associated faunal communities. Deep-sea observatories provide power and communication to instruments deployed on the seafloor, allowing for long-term time series of multidisciplinary data (e.g., geological, physical, chemical, ecological) with resolutions from seconds to decades (
Matabos et al., 2016). More specifically, the use of optical imagery deployed on these deep-sea platforms now makes it possible to directly monitor faunal communities (e.g.,
Aguzzi et al., 2015;
Lelièvre et al., 2017;
Robert and Juniper, 2012;
Van Audenhaege et al., 2022). In this context, the TEMPO(-mini) ecological modules, equipped with deep-sea lights and a camera called SMOOVE, were developed to monitor the dynamics of deep-sea hydrothermal vent communities on hourly to multi-decadal scales on the Mid-Atlantic (MAR) and Juan de Fuca (JdFR) ridges (
Auffret et al., 2009;
Sarrazin et al., 2007). These observatories also include an environmental module that measures temperature as well as oxygen and iron concentrations in the field of view (
Laës-Huon et al., 2016). The analyses of these two unique, high-frequency, long-term imagery time series offer information on species biology (e.g., growth, behaviour and potentially reproduction) and biotic interactions, the details of which remain largely unknown for vent species (
Van Dover et al., 2020), and provide data on species responses to changes in fluid flow. This 13-year high-resolution time series can help explore and determine the role of biological rhythms, natural cycles and stochastic changes in the evolution of species abundance and distribution at local scales (
Matabos et al., 2022). To date, analyses of subsamples of the images acquired with the modules have brought new insights on local community dynamics, such as the role of tides and inertial currents on species behaviour (
Cuvelier et al., 2014;
Cuvelier et al., 2017;
Lelièvre et al., 2017), or the role of local variations in hydrothermal venting and the high stability of mussel habitats along the slow-spreading MAR (
Sarrazin et al., 2014;
Van Audenhaege et al., 2022). However, since their deployment in 2010, the observatories have amassed an archive that now contains over 7000 h of video sequences, representing over 11 TB of imagery data, and is still growing.
Because the technology to acquire and process underwater marine imagery has significantly evolved in recent years, in situ imaging sensors are increasingly used in marine science (review in
Durden et al., 2016a) to quantify species abundance and distribution in the water column (
Biard and Ohman, 2020) and on the seafloor (
Devine et al., 2020), to study species biology (
Matabos et al., 2015;
Zweifler et al., 2017) and to map benthic communities and habitats (e.g.,
Macedo et al., 2022;
Marcon et al., 2014;
Van Audenhaege et al., 2021). Image analysis is non-invasive and allows monitoring animals in their natural environment over long periods of time. However, these advances have led to new challenges for the marine science community including the storage, management and annotation of ‘big data’ (
Schoening et al., 2018). In particular, multidisciplinary seafloor observatories generate data that accumulate faster than the processing power of research laboratories. Manual processing of these data is time-consuming, highly labour-intensive, and beyond the current human capacity. The effective exploitation of these data requires more human resources and additional computational solutions.
Automated detection was one of the first paths explored to help annotate such large datasets, but the initial approaches showed that the human eye still performed better than a machine for extracting data from complex imagery (
Aguzzi et al., 2009;
Aron et al., 2010;
Matabos et al., 2017;
Schoening et al., 2012). More recently, deep-learning approaches have offered new solutions for automatic classification. However, in the absence of a large training dataset, these solutions cannot yet be applied to our images, although they are increasingly used for underwater imagery (
Han et al., 2020;
Ortenzi et al., 2024;
Soto Vega et al., 2024;
Villon et al., 2018). The astronomy community was the first to use citizen science for data processing, asking volunteers to classify galaxies from space imagery (
https://www.zooniverse.org/projects, Galaxy Zoo;
Fortson et al., 2012;
Lintott et al., 2008). Since then, the Zooniverse platform has hosted a growing number of projects in various disciplines and is a great success, leading to a large number of scientific publications (e.g.,
Edney et al., 2024;
Westphal et al., 2022). Crowdsourcing, where a large number of citizens contribute to research projects through online classification/processing of data with little prerequisite knowledge, has now become a recognised and popular form of citizen science (
Silvertown, 2009). On land, the projects using citizen science image analysis to answer ecological questions include the monitoring of invasive species (e.g.,
Kim et al., 2024;
Parretti et al., 2023), the study of population biology (e.g.,
Edney et al., 2024;
Ra et al., 2022;
Swanson et al., 2016), the documentation of changes in the landscape (
Scott et al., 2021) or biodiversity censuses (
Di Cecco et al., 2021). Crowdsourcing data, relying on the ‘vote’ principle, can help bridge that gap, but requires a workflow for data validation, including the aggregation of multi-participant data. Several studies have proposed a range of validation protocols, including the use of an agreement rate among participants (
Kuminski et al., 2014;
Wick et al., 2020), machine learning algorithms based on participants and observation features (
Saoud et al., 2020), the weighted majority voting of mixed models (
Bird et al., 2014) or Bayesian approaches (
Mugford et al., 2021). However, when aggregating multi-participant data, all these methods consider the presence or absence of a biological species in an image without taking into account the pixel coordinates of the given individual organism. In the context of fixed-point observatories, especially in highly heterogeneous environments, where the abundance and/or distribution of organisms in a small area is paramount to addressing ecological questions, these pixel coordinates are key elements to consider for data validation. This information may also be paramount when studying the growth or behaviour of sessile species such as corals (e.g.,
Girard et al., 2022;
Osterloff et al., 2019).
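To illustrate the principle of coordinate-aware aggregation, the sketch below greedily clusters the marks placed by several participants on one image and validates each cluster by its agreement rate (AR). All names, the 30-pixel radius and the 0.5 AR threshold are purely illustrative assumptions, not the parameters of the DSS workflow:

```python
from math import hypot

def aggregate_annotations(annotations, n_participants, radius=30.0, ar_threshold=0.5):
    """Greedy coordinate-based clustering of multi-participant marks.

    annotations : list of (participant_id, x, y) marks on one image.
    Marks falling within `radius` pixels of a cluster's centroid are
    assumed to target the same individual organism.  A cluster is kept
    when its agreement rate (AR: fraction of participants who marked
    it) reaches `ar_threshold`.  Returns (validated_clusters, abundance).
    """
    clusters = []  # each cluster: {"points": [(x, y), ...], "participants": set}
    for pid, x, y in annotations:
        for c in clusters:
            # current centroid of the cluster
            cx = sum(px for px, _ in c["points"]) / len(c["points"])
            cy = sum(py for _, py in c["points"]) / len(c["points"])
            if hypot(x - cx, y - cy) <= radius:
                c["points"].append((x, y))
                c["participants"].add(pid)
                break
        else:
            clusters.append({"points": [(x, y)], "participants": {pid}})
    validated = [c for c in clusters
                 if len(c["participants"]) / n_participants >= ar_threshold]
    return validated, len(validated)
```

For example, three participants marking the same crab near (100, 100) and one stray mark elsewhere would yield a single validated individual at an AR threshold of 0.5.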
In this paper, we present the image annotation platform Deep Sea Spy (DSS) that was developed to help annotate the video images acquired by the SMOOVE cameras on the TEMPO and TEMPO-mini ecological modules deployed at deep-sea hydrothermal vents. The main objective of the DSS project was to build a web-based application for manual imagery processing to compile useful information for scientists, while also raising awareness among the general public about these remote ecosystems and the threats they face (
Boschen et al., 2013). Indeed, the deep sea represents more than half of the surface of the planet and plays a crucial role in climate regulation and global ecosystem functioning. Nevertheless, the deep sea and its role in sustaining life on Earth remain unknown to most people. At a time when decision-makers must take major actions involving the (non-)use of the deep sea (e.g., the Biodiversity Beyond National Jurisdiction treaty, the International Seabed Authority mining code, fishing), increasing deep ocean literacy is a sine qua non condition for making informed decisions (
Darr et al., 2020). In this paper, we aim in particular to (i) describe the DSS platform and its associated database as well as the tools and actions to involve citizens, (ii) establish a data validation workflow for imagery analyses carried out through citizen science actions by providing a method for multi-participant data aggregation with regard to the pixel coordinates of each individual target species in the image, and (iii) evaluate citizens’ behaviour and performance in annotating complex deep-sea hydrothermal images through expert cross-validation and statistical metrics. By involving citizens in the scientific process of imagery annotation, we tackled two important aspects: (i) offering new approaches to data collection and processing to handle the bottleneck due to big data generated by research infrastructures (RIs), and (ii) raising awareness on scientific research, environmental issues and the deep ocean. This paper presents a preliminary analysis of citizen data from the EMSO-Azores and Ocean Networks Canada observatories, and can be used as a reference guideline for future developments within other RIs aiming to help process complex data through public participation.
4. Discussion and conclusions
This paper presents the first results of the DSS project since its launch in 2017 and highlights the tremendous potential of citizens to support research and contribute to building large databases of annotated images (e.g.,
Anton et al., 2021;
Kuminski et al., 2014;
Lintott et al., 2008). Along with the design of the DSS platform, we developed a workflow to process, validate and analyse multi-participant citizen-derived data from image annotation with (i) an R package to process multiple annotations into individual abundance; (ii) a computational approach to identify AR thresholds and an ideal number of annotation repetitions; and (iii) repeatedly engaging with the public to increase citizen participation. This study demonstrates that citizen science is highly valuable for processing large observatory imagery databases, as previously observed in similar projects (e.g.,
Edney et al., 2024;
Westphal et al., 2022). Meanwhile, the database is still growing, and deep-sea benthic imagery will increase exponentially worldwide as other observatories develop and new technologies emerge (e.g., drones, autonomous underwater vehicles, cameras;
Aguzzi et al., 2019;
Danovaro et al., 2017). Our workflow completes the set of validation standards proposed in the last decade (e.g.,
Kosmala et al., 2016;
Mugford et al., 2021;
Saoud et al., 2020;
Swanson et al., 2016) and is particularly relevant to any citizen dataset generated from single-point, long-term video camera systems (e.g., observatories) or any image annotation of abundant species, where the pixel coordinates of single individuals are important to consider when aggregating multi-participant data. The workflow presented here is already being adapted to other scientific questions using other platforms derived from DSS (i.e., Ocean Spy), including the exploration of the feeding behaviour of deep-sea corals by counting open polyps, or shrimp annotation campaigns to support the development of deep-learning algorithms.
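The identification of an AR threshold mentioned above can be sketched as a simple sweep that minimises disagreement with expert counts. This is an illustrative Python analogue under assumed inputs, not the computational approach implemented in the R package:

```python
def optimal_ar_threshold(cluster_ars, expert_counts, thresholds=None):
    """Sweep candidate agreement-rate (AR) thresholds and keep the one
    whose resulting citizen abundances best match expert counts
    (lowest mean absolute error, MAE).

    cluster_ars[i]   : list of per-cluster ARs for image i.
    expert_counts[i] : expert abundance for image i.
    """
    if thresholds is None:
        thresholds = [t / 10 for t in range(1, 11)]  # 0.1 .. 1.0
    best_t, best_mae = None, float("inf")
    for t in thresholds:
        errors = []
        for ars, expert in zip(cluster_ars, expert_counts):
            # citizen abundance: clusters whose AR reaches the threshold
            citizen = sum(1 for ar in ars if ar >= t)
            errors.append(abs(citizen - expert))
        mae = sum(errors) / len(errors)
        if mae < best_mae:  # keep the first (lowest) threshold on ties
            best_t, best_mae = t, mae
    return best_t, best_mae
```

A design note: sweeping thresholds on a held-out, expert-annotated subset keeps the validation independent of the images the threshold will later be applied to.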
Comparing annotations by a marine biology expert with those carried out by citizen participants revealed some limitations in obtaining accurate abundances from our set of images. Discrepancies between expert and citizen data can be related to video quality, abundance of the species of interest, participant experience or task complexity, as observed in similar studies (
Langenkämper et al., 2019;
Wick et al., 2020), or error in expert judgement (e.g.,
Aceves-Bueno et al., 2017;
Crall et al., 2011). In this study, personal experience of participants did not lead to better performance. In fact, annotations performed by experienced people accounted for most of the false positives, but also for false negatives. This is partly expected, as the most active annotators contributed more than a third of the dataset. Alternatively, this observation can reflect contrasting behaviours related to the desire to perform well. This can translate into a fear of missing an individual, leading to over-annotation, or of wrongly identifying a species, leading to precautionary behaviour and under-annotation. Given the complexity of the images in terms of objects, texture and homogeneous colours, targeted species can indeed be hard to classify. Organisms can be partially hidden behind large engineer species (e.g., mussels, tubeworms), making it hard to detect them based only on their shapes. Therefore, the best clue to the presence of an organism is its location in the field of view (i.e., its habitat) combined with colour. At vents, many organisms, including microbial mats and filaments, display colours in a gradient from white to light brown. It is thus easy to misidentify patches in areas where organisms are expected. In addition, video quality can be affected by lighting, which can decrease in intensity over time, or by biofouling on the camera lens, which can mask part of the field of view, thus making it difficult to distinguish organisms (
Cowling et al., 1998). These issues are common at deep-sea observatories where sensors are deployed for long periods, from several months up to several years.
The occurrence of false negatives or positives can also result from participant behaviour. Independently of their annotation experience, some participants tend to underestimate the number of individuals in the image in fear of wrongly annotating objects, whereas others appear more concerned about missing an organism and tend to over-annotate. Hence, participant experience does not appear to be an important factor of accuracy. This ambiguity contradicts many studies that have highlighted higher performance in trained participants (e.g.,
Delaney et al., 2008;
Matabos et al., 2017;
Wick et al., 2020) and several studies that have also shown that citizens can perform almost as well as professionals (
Crall et al., 2011;
Holt et al., 2013). In our case, because the field of view remains constant over time, a participant can quickly learn how to recognise an organism, as long as a tutorial picturing the targeted species in its environment is available. In addition, an expert can also make mistakes because annotating thousands of images is repetitive and can lead to fatigue and a drop in attention (
Durden et al., 2016b;
Swanson et al., 2016). Quality issues in data collected by professionals have been reported in several studies and can strongly affect the accuracy assessment of citizen science data (
Aceves-Bueno et al., 2017;
Crall et al., 2011;
Kosmala et al., 2016). Expert misjudgement may thus explain part of the discrepancies in classification between the expert and the participants, and probably accounts for some of the false positives for ARs greater than 0.1 in buccinid snail counts.
Swanson et al. (2016) showed that the aggregated participant answers were more accurate (97.9 %) than those of individual experts (96.6 %) when compared with consensus expert assessments. Here, additional experts are needed to confirm this pattern. The annotation of 4000 images is extremely time-consuming and could not be carried out by another expert due to limited human resources and time. In the future, expert annotation efforts in the lab should focus on this dataset to enhance our validation protocol. ARs in this study were significantly lower than those reported in other studies (
Aceves-Bueno et al., 2017;
Kosmala et al., 2016;
Swanson et al., 2016), probably due to the difficulty of the task inherent to the nature of the images (see above), which increases the chances of incorrect expert judgement that, in turn, strongly affects citizen accuracy assessment. These strong discrepancies thus highlight the need for multiple cross-validation and controls when it comes to using citizen science in imagery analyses, calling for hybrid systems for environmental monitoring mixing citizen science and professional expertise (
Becken et al., 2019;
Saoud et al., 2020). Cross-checking with expert data, coupled with a relevant AR threshold and other analyses is a valuable combination to increase the accuracy of citizen science data.
Although some studies have shown that even for more complicated tasks, participants can perform as well as experts (
Butt et al., 2013;
Delaney et al., 2008), task complexity is clearly an important factor to take into account. Here, the optimal AR differed strongly between the two considered species, even though both were considered level 0 complexity on the DSS platform. Crabs are a territorial species that inhabits mussel beds (
Matabos et al., 2015). The relatively large-sized crabs are easily identifiable, but they are often partially hidden among mussels and can only be detected through the presence of a claw or piece of carapace among mussel shells. In addition, due to their territorial and aggressive behaviour, only a few individuals can occupy the field of view, leading to a high number of images with a true absence of crabs. This frequent absence may lead the most active participants to quickly validate an image ‘by habit’, whether it contains a crab or not. The validation procedure thus needs to be adapted to and reconsidered for each analysed species. The different targeted species vary greatly in terms of size, shape and number. Some species, such as polychaetes, shrimp or pycnogonids, are small and hard to distinguish, even for experts (
Lelièvre et al., 2017;
Matabos et al., 2015) and correspond to more advanced task complexity. We thus expect citizen performance to decrease for these species. To compensate, their annotation is only available to trained participants (higher levels), although our results suggest that trained and highly active participants do not necessarily perform better. Considering this finding, future missions should perhaps consider targeting only one species to make the task more manageable and facilitate the detection by non-trained citizens (
Langenkämper et al., 2019).
Finally, and interestingly, the application was less effective in a museum setting. Indeed, while the computer terminal ranked as the second-best participant in terms of the number of images analysed, visitors’ annotations displayed low accuracy and poor performance. This result has important implications for the accuracy of citizen science data collected in different settings, an aspect that should thus be considered in the participant recruitment process. This difference among settings may result from the context, where visitors tend to just ‘play’ distractedly with the interactive set-up and equipment, in contrast with citizens contributing more seriously from their own computers. Motivation factors of participants include an interest in science, or more specifically in the project’s topic, or the desire to learn (
De Vries et al., 2019;
Raddick et al., 2010), and we can expect that citizens perform better when they independently make the effort to participate. Many programmes attract involvement through direct contact with the public at community events and conferences, but increasingly through social media platforms (e.g.
Saoud et al., 2020). However, none have quantitatively measured their efficiency (reviewed in
De Vries et al., 2019 and
Golumbic et al., 2020). In addition, maintaining participant commitment, by providing access to the data they collected and sharing scientific findings through social media and popular science articles, is essential to ensure continued participation (
De Vries et al., 2019;
Scott et al., 2021). The novelty in our approach was the development of free learning and outreach materials for teachers and instructors, thus ensuring continuous recruitment, while maintaining long-term collaborations. Our collaborations with high-school classrooms involved providing data collected by the students that they could then use and explore, an approach that constitutes a great educational incentive (
Bonney et al., 2009a). The choice of the outreach method is of utmost importance, and we emphasise here the value of developing outreach resources in collaboration with formal educational settings and in accordance with national programmes. However, these efforts require a dedicated science outreach programme, human resources for data curation and preparation, and a communication plan (
Golumbic et al., 2020). Ensuring all of these aspects of outreach can be difficult for individual laboratories with limited human resources. The recruitment process and the ability to train and engage participants over the long term are also expected to affect the quality of data and should thus be carefully planned during the design phase of the project (
Golumbic et al., 2020).
A growing body of literature is now available on methods for citizen science data validation (
Bird et al., 2014;
Bonter and Cooper, 2012;
Kosmala et al., 2016;
Mugford et al., 2021;
Saoud et al., 2020) and will provide guidelines for future analyses. However, due to the wide variety of citizen science data, even in the specific case of imagery in terms of annotation types, it remains difficult to offer a common standardised validation approach. This large-scale crowdsourcing approach differs from other participatory species-monitoring programmes in that data are acquired by more than 1000 anonymous participants who each annotated very few pictures, making it hard to include other variables, such as the level of training or participant experience, in the data validation process (
Aceves-Bueno et al., 2017;
Saoud et al., 2020). However, the detection of temporal variation in abundance was intriguingly robust, independent of the AR. This result is of utmost importance in the context of environmental monitoring to distinguish natural rhythms from long-term trends, essential knowledge for predicting and detecting variations related to anthropogenic activities and global change. The robust performance of citizen science in detecting trends in time series thus holds great potential for processing observatory data and will help to unlock the bottleneck associated with the exponential growth of imagery databases.
Another approach that has developed exponentially over the last decade, is to use citizen data to train deep-learning algorithms to detect organisms in photos (e.g.,
Cardoso et al., 2024;
Kuminski et al., 2014;
Langenkämper et al., 2019). Reaching good detection rates and performance using machine learning requires large reference datasets that are not currently available in marine environments (
Durden et al., 2021). Citizens can produce these datasets, which, if properly validated, have great potential to advance machine-learning applications (
Anton et al., 2021;
Langenkämper et al., 2019;
Van den Bergh et al., 2021). Although a number of studies propose new methodologies for the validation of citizen data (e.g., this study;
Bird et al., 2014;
Wick et al., 2020), training algorithms require clean datasets to ensure a good and reliable learning process. One solution is to enlist a community of participants to review thumbnail images produced by cropping the photo based on annotation coordinates (
Sullivan et al., 2009). This initial step provides a library of thumbnails of the species of interest to be submitted to citizens and/or experts for validation. It requires the development of a platform where citizens can review and correct existing annotations, whether these originate from volunteer annotations or from machine-learning predictions (e.g., YOLO,
Ortenzi et al., 2024) to help clean reference databases. This validation task would help to reduce false positives and correct wrong classifications. Optimising the efficiency of such an approach and ensuring proper validation may require the selection of trained and committed participants (
Bonter and Cooper, 2012;
Sullivan et al., 2009). In the light of the behaviour analysis provided here, a deep-learning algorithm could also be developed to take into account the annotation history (time, date) and the participant statistics (recruitment mean, number of annotations, number of sessions) similar to the approach in
Saoud et al. (2020) to detect misidentification in citizen data.
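The thumbnail-cropping step described above can be sketched as follows. The helper below is hypothetical (not part of DSS): it derives a fixed-size crop box centred on an annotation's pixel coordinates, clamped to the image frame, using the (left, upper, right, lower) 4-tuple convention accepted by PIL's `Image.crop`:

```python
def thumbnail_box(x, y, img_w, img_h, half=32):
    """Return a (left, upper, right, lower) crop box of size 2*half
    pixels, centred on an annotated point and clamped so that the
    thumbnail never extends beyond the image frame."""
    left = min(max(x - half, 0), img_w - 2 * half)
    upper = min(max(y - half, 0), img_h - 2 * half)
    return (left, upper, left + 2 * half, upper + 2 * half)
```

Applied over all validated annotations of an image, such boxes yield a uniform thumbnail library that reviewers can accept or reject far faster than re-annotating full frames.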
In conclusion, DSS is more than a citizen science project: it constitutes a full scientific programme that not only allows for the processing of large volumes of imagery data (i.e., crowdsourcing;
Silvertown, 2009), but also provides a platform to raise awareness on the deep sea through media, public events and conferences, as well as educational resources for kids and teachers (
Bonney et al., 2009a,
Bonney et al., 2009b). The DSS platform has proven to be a valuable tool to increase ocean literacy, but its efficiency was not quantitatively measured or assessed. Literature reviews highlight the benefits of communication, data accessibility and easy-to-use platforms, but only provide a qualitative analysis of their efficiency (
De Vries et al., 2019;
Golumbic et al., 2020). We gave priority to preserving participant anonymity to maximise the number of participants. Thus, one possible avenue to quantify ocean literacy is to provide questionnaires to participants before and after their participation (
Sattler and Bogner, 2017). Assessing participants’ level of knowledge can help weight their annotations and improve the accuracy of detection. However, such an approach constitutes a specific transdisciplinary research project involving the social sciences and requires dedicated funding as well as student and researcher time, particularly considering the many pitfalls that persist when measuring ocean literacy (but see
Molloy et al., 2021).
The tools developed here helped facilitate, popularise and explain the scientific approach, demonstrating that everyone can contribute to research, thus removing barriers between science and society. The citizen science approach can help improve the quality, credibility and/or relevance of research projects, raise awareness on environmental issues and conservation, and contribute to citizen engagement and empowerment (
De Vries et al., 2019;
Winickoff et al., 2016). This new way of ‘doing science’ can benefit both citizens and researchers by accelerating the processing of large imagery datasets for researchers, and by learning about and engaging in science for participants (
Bonney et al., 2009b). Recently, this annotation platform was extended to other ecosystem compartments and merged into a single digital infrastructure, Ocean Spy (
https://ocean-spy.ifremer.fr), using a common web-based portal and a unique database hosted at IFREMER. Oceans are changing fast and are increasingly affected by human activities. Acquiring the necessary knowledge to properly inform environmental management requires technological developments to increase our observation and monitoring capacities, but also new means to accelerate data processing and analyses. Citizens represent a great reservoir of scientists, and citizen science has tremendous potential to enhance scientific knowledge in time and space, inform the management and conservation of ecosystems (
Bosso et al., 2024) and increase ocean literacy for the benefit of all (
Garcia-Soto et al., 2017). The DSS platform, and more broadly Ocean Spy, along with the developed validation protocol for data aggregation, can be applied to many imagery-based marine research projects, setting the foundation for future standards to support large-scale comparisons with the development of ocean observatories and large-scale seafloor optical mapping worldwide (
Aguzzi et al., 2019;
Levin et al., 2019).
CRediT authorship contribution statement
Marjolaine Matabos: Writing – review & editing, Writing – original draft, Validation, Supervision, Project administration, Methodology, Funding acquisition, Formal analysis, Conceptualization. Pierre Cottais: Writing – original draft, Visualization, Formal analysis, Data curation. Riwan Leroux: Writing – review & editing, Validation, Supervision, Formal analysis, Data curation. Yannick Cenatiempo: Writing – review & editing, Software, Conceptualization. Charlotte Gasne-Destaville: Writing – review & editing, Investigation, Formal analysis. Nicolas Roullet: Writing – review & editing, Software, Conceptualization. Jozée Sarrazin: Writing – review & editing, Supervision, Methodology, Conceptualization. Julie Tourolle: Writing – review & editing, Project administration, Conceptualization. Catherine Borremans: Writing – review & editing, Validation, Data curation, Conceptualization.