1. Introduction
The methodology we know as the scientific method can be considered to have emerged no later than the 17th century. It seeks to build knowledge from observations, using techniques such as the formulation of hypotheses and their validation or refutation according to the results of further experiments, along with several other logical formalisms (Skinner, 1956).
Since then, the scientific method has contributed to the intellectual and technical development of our societies in many fields, such as physics, applied mathematics, and biomedical science, to cite just a few.
This methodology has two important requirements: repeatability and reproducibility. The first refers to the capacity to perform the same experiment as many times as needed in order to obtain results. For example, the detection of a gravitational wave is a singular astrophysical event over which researchers have no control, and it is certainly not possible to repeat the experiment. Reproducibility, on the other hand, is the ability to obtain equivalent results by following the procedures detailed in a scientific publication, assuming that the experiments can be repeated. Repeatability does not guarantee reproducibility, but reproducibility requires repeatability (Mandel, 1972). We refer the reader to the precise definitions by the ACM1.
Many methods in the current scientific literature are not reproducible, and indeed articles have been retracted because, among other causes, the claimed results could not be reproduced by other researchers.
We can identify several reasons why an article might not be reproducible. One common cause is that the authors do not provide enough details on the procedure. For example, a biologist might fail to report the exact temperature at which a culture grew, so that other researchers are unable to obtain the same results even if they follow the given procedure strictly. Another common cause is not being able to access the complete source code of a method. The pseudo-code descriptions in published papers are not always enough to perform exactly the same experiment, given that they do not contain every implementation detail. Most of the time, the reason why authors do not disclose their source code is that, whereas they are experts in their respective scientific fields, they are not necessarily specialists in software engineering and therefore prefer not to release source code of disputable quality.
Failure to obtain all the required data, and not only the source code, also makes a publication non-reproducible. For example, an article showing classification performance results for an artificial neural network without providing the data on which it was trained makes the method non-reproducible by others. In the case of neural networks, others could obtain the same results given a pre-trained network, but even then the method should not be considered fully reproducible if the data that trained the network is not available.
Finally, the reason why a method is not reproducible might sadly be simple fraud (Crocker & Cooper, 2011). There is counterproductive pressure to publish research, sometimes at the cost of communicating results which do not correspond accurately to reality, or which are outright fraudulent. This has, of course, devastating effects on the credibility of scientific research.
In some scientific fields, as for example in Biology or Astrophysics, it might be very difficult (if not impossible) to find exactly the same conditions when performing an experiment. As pointed out before, not all the physical conditions of the environment may be known, and repeating the experiment can be impossible. For example, particle physics requires large accelerators (Evans, 1999), which are costly to build and operate, and out of reach for many scientists.
However, in computational sciences there is no excuse not to do reproducible research, given that in most cases the experiments are easy to repeat and verify. Moreover, the deterministic nature of many algorithms (given the same input, one obtains exactly the same output), which does not depend on external conditions, makes computational sciences a perfect candidate for reproducible research. If researchers are given the same inputs, they can verify the results by themselves and compare them with what is published.
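As a minimal illustration of why determinism makes verification easy (file names and the digest below are hypothetical placeholders, not taken from any actual publication): re-running a deterministic method on the same input must produce a byte-identical output, so a single checksum comparison suffices.

```python
# Sketch: verifying a deterministic result by comparing file fingerprints.
import hashlib

def sha256_of(path: str) -> str:
    """Return the SHA-256 digest of a file, a compact fingerprint of its content."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

claimed = "<digest reported in the publication>"   # placeholder, not real data
obtained = sha256_of("result.png")                 # hypothetical output file
print("reproduced" if obtained == claimed else "mismatch")
```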
A fundamental premise is that scientific results are only reliable when they can be verified by others. In 2009, David Donoho and others warned about the credibility crisis in scientific research, after observing that many publications in reputed journals failed to be reproduced by other researchers (Donoho et al., 2009). We can cite the case where the director of oncology at Amgen, an American biopharmaceutical company based in California, announced the failure to reproduce 47 out of 53 of the most important articles in oncology, despite the company's best efforts (Begley & Ellis, 2012). This certainly casts doubt on the work of the whole scientific community and its methodologies.
Since then the situation has improved, and indeed an increasing number of journals opt for the open-access model and encourage the publication of data and source code as supplementary material (Colom et al., 2019). However, this is not enough to reach fully reproducible research, since these materials should be part of the main publication body, not a supplementary or optional element.
A reproducible publication needs to include both a detailed body text (including pseudocode descriptions) and the actual source code and any data needed to reproduce the results. This implies that both the article and the source code need to be reviewed (Petre & Wilson, 2014) and, depending on the case, also the procedure which was followed to generate the data. However, finding reviewers who are competent in understanding both the source code and the field of the article is challenging.
In the following we discuss how IPOL publishes reproducible research, and how it deals with difficulties such as the one mentioned before, among others. The plan of the paper is the following: Section 2 presents the IPOL journal, its motivations, challenges, and the actions taken; Section 3 is an overview of the publication process from the point of view of an Editor in Chief; Section 4 presents three different applications of methods published in IPOL: biomedical and health public policy, explainable methods for image forgery detection, and IPOL as a tool for education. Section 5 follows the discussion by commenting on initiatives inspired by IPOL, such as the RRPR workshop within the ICPR conference. Section 6 concludes this article.
2. The Image Processing on Line (IPOL) journal
In 2009 a group of researchers from ENS-Cachan (France), Universitat de les Illes Balears (Spain), and Universidad de la República (Uruguay) founded the Image Processing On Line (IPOL) journal (ISSN: 2105-1232) as an attempt to formalize the state of the art in image processing, with a special focus on the mathematical details of the algorithms and on reproducible research. It is a modest contribution to the fight against the credibility crisis in scientific research and, in our case, in computational sciences (Arévalo et al., 2017). The journal follows the Diamond Open Access model, which offers the advantages of gold open access without any charge for the authors.
2.1. A journal focused on Reproducible Research
IPOL started as a journal focused exclusively on image processing, but it soon expanded to other disciplines, for example remote sensing (Colom et al., 2020), and to data types other than images, including audio, video, and even physiological data.
So far, the state of the art in classic image processing methods is well covered, and we have started to publish methods based on artificial neural networks. These methods, mainly based on convolutional neural networks, are indeed the new state of the art due to their superior performance, though they bring up new challenges: what about the training data? How should these new methods be published? How can reproducibility be ensured?
IPOL publishes detailed signal-processing methods, with a strong emphasis on the mathematical details. We redefine the concept of publication, which is no longer just the article (say, printed or downloadable as a PDF), but also the source code and the data needed to obtain exactly the same results as claimed by the authors. The source code and the data are not supplementary or optional material, but part of the publication itself. The source code is released under an open-source license (GPL or BSD, at the choice of the authors), the article under a free-documentation license (Creative Commons CC-BY-NC-SA), and the data mostly under the Creative Commons CC-BY license. In case an algorithm is patented2, the following text is added:
This file implements an algorithm possibly linked to the patent <REFERENCE OF THE PATENT>. This file is made available for the exclusive aim of serving as scientific tool to verify the soundness and completeness of the algorithm description. Compilation, execution and redistribution of this file may violate patents rights in certain countries. The situation being different for every country and changing over time, it is your responsibility to determine which patent rights restrictions apply to you before you compile, use, modify, or redistribute this file. A patent lawyer is qualified to make this determination. If and only if they don't conflict with any patent terms, you can benefit from the following license terms attached to this file.
Every article in IPOL comes along with an online demo which allows users to test the algorithms with the proposed data or their own, thus allowing for reproducible research. This has led to an interesting observation: the state of the art in the scientific literature does not necessarily coincide with actual use as measured by IPOL's execution statistics. Indeed, the size of the archives and the nature of the data uploaded by the users reveal the real state of the art in the discipline, that is, which methods are considered a reference and are therefore used by many researchers and by professional and amateur users around the world.
Many initiatives strongly foster reproducible research and open science nowadays: for example, the openAIRE program operated by CERN ("Shift scholarly communication towards openness and transparency and facilitate innovative ways to communicate and monitor research") (Manghi et al., 2010); infrastructure and computational services such as IEEE's Code Ocean3; the open-access repository Zenodo, also developed under the openAIRE program (Sicilia et al., 2017); platforms not focused mainly on reproducible research but which could indeed implement it, as in the case of the Galaxy project in genomics (Giardine, 2005); or the badging system proposed by Frery et al. to label works which comply with a list of reproducibility requirements (Frery et al., 2020). Finally, we must cite the Software Heritage initiative (Di Cosmo, 2020), a universal software repository for the preservation of software as part of our cultural heritage, as well as a pillar for science and industry. Software Heritage does not only store a fixed version of a source code, but tracks artifacts at different levels (contents, directories, revisions, releases, and snapshots) through unique and persistent identifiers (SWHIDs). Although not specifically designed to implement reproducible research, the capability to track source code at different levels and to link a publication with a specific version of the source code makes Software Heritage an important asset for reproducibility in scientific research.
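For illustration, a SWHID is a short structured string (the hash below is hypothetical): a fixed "swh" prefix, a schema version, an object type matching the artifact levels listed above, and the object's intrinsic 40-digit hexadecimal identifier.

```python
# Sketch of the SWHID scheme; the digest here is a made-up placeholder.
swhid = "swh:1:rev:0123456789abcdef0123456789abcdef01234567"
prefix, version, object_type, digest = swhid.split(":")
assert object_type in {"cnt", "dir", "rev", "rel", "snp"}  # contents, directories, revisions, releases, snapshots
```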
In the next section we discuss the challenges IPOL has faced since it was founded and the decisions we made to address several issues.
2.2. Challenges in IPOL and actions taken
The team of editors and scientific computing engineers behind IPOL has been, and still is, continuously improving IPOL based on the accumulated experience and the feedback of users, authors, and editors. In this section we comment on several of the difficulties we found and the decisions we took to address them.
Review of both the text of the article and the source code. Since the source code is part of the publication, the reviewers must ensure that the pseudocode descriptions given in the article match accurately what the submitted code does, without hidden tricks. This was a first difficulty which the IPOL editors had to address: reviewers who are experts both in the field of the article and in computer science, to the point of understanding all the details of the implementation, are scarce. Worse still, the importance of quality research code is not yet fully acknowledged (European Commission. Directorate General for Research and Innovation., 2020).
A solution to mitigate this problem in IPOL was to use at least two reviewers for each article. One of the reviewers is an expert in the scientific field of the article, and the other is more specialized in reading the source code in detail and checking it against the pseudocode. Of course, we expect both reviewers to be competent both in the field and in understanding source code, but their level of expertise may differ according to their role. This strategy has proved successful for IPOL and is now applied to the review of most submitted articles.
Several bottlenecks in the demo system. The first version of IPOL's demo system was written as part of the doctoral project of Nicolas Limare, a PhD student at ENS-Cachan (now ENS Paris-Saclay) (Limare, 2012), along with the responsibility of defining the software guidelines of the journal, the configuration of the platform to submit articles (OJS), the mailing services, the website of the journal, and the first demo system (see (Limare & Morel, 2011) for one of the first descriptions). That was indeed a titanic amount of work for a single person, and the first articles were published along with functional demos. That very valuable demo system can be considered a pioneering prototype, but it nevertheless had to be improved if we wanted IPOL to grow. A first problem was that it was a monolithic system, in the sense that it was a single Python program integrating all the required functionality, making it somewhat difficult to debug. Also, there was no possibility of distributing the computations across several servers; the only option was to execute everything on the single machine that ran the complete demo system. That approach was problematic, since soon one server would not be enough to execute several demos at the same time, and the number of demos can only, luckily, increase. Finally, a severe problem with that first prototype was that demo editors needed to write actual Python, HTML, and CSS code to build a demo. That was a major bottleneck which started to block the publication of articles, given that IPOL had to find demo editors who were willing to understand the internals of the demo system and to dedicate some of their time to coding and designing web pages. We could say that it was quite artisanal work at that moment.
To get rid of that bottleneck we completely rewrote the demo system as a service-oriented architecture (SOA) of microservices (Papazoglou & van den Heuvel, 2007), thus allowing concurrent executions across many servers (Arévalo et al., 2017). Moreover, instead of requiring demo editors to write Python and HTML code to build a demo, we made it possible to define a demo with a few lines of code (the "DDL": Demo Description Lines), which mainly specify the inputs, parameters, and outputs. This allows any editor to write a new demo in a few minutes, without requiring any special technical skills, thus removing the bottleneck.
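To give the flavor of the idea, here is a hypothetical sketch of such a declarative demo description (the field names and structure are illustrative, not the exact IPOL DDL syntax): a demo is declared as a small structured description of its inputs, parameters, and outputs, from which the demo system generates the web interface automatically.

```python
# Illustrative, made-up demo description; not IPOL's actual DDL format.
demo_description = {
    "general": {"demo_title": "Example denoising demo"},
    "inputs": [{"type": "image", "description": "noisy input image"}],
    "params": [{"id": "sigma", "type": "range",
                "values": {"default": 10, "min": 0, "max": 50}}],
    "run": "python main.py --sigma $sigma input.png output.png",
    "results": [{"type": "image", "contents": "output.png"}],
}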
The choice of the programming languages. Many different programming languages and libraries exist, and choosing among them was one of the first problems when launching IPOL. At first the only option was C/C++ with a limited list of libraries accepted by the editorial board. Soon it became apparent that this approach greatly limited the freedom of authors, and in 2013 the first discussions about Python were the first step towards its acceptance. Some of the reticence about Python concerned the instability of the function signatures (API) of packages across versions, but with the use of virtual environments this was no longer a problem. Many authors also had programs written in MATLAB or its free equivalent, Octave, and the IPOL editorial board accepted these frameworks as well.
Our current approach is to accept C/C++ code with a limited set of libraries, which can be expanded by the editorial board based on the feedback of authors; Python code using any package found in PyPI (the Python Package Index), provided that a virtual environment is used; and MATLAB/Octave code, if the authors are willing to use the version installed on our servers.
Maintenance of source codes and dependencies on third-party libraries. A computer program is usually generated from a high-level description (source code) which is transformed into an executable binary in a process known as building or compilation, which also links to external libraries providing extra functionality, thus avoiding reinventing the wheel. This introduces dependencies at several levels. One is the compiler which generates the binary form, which evolves over time. Indeed, there is no guarantee that a program written a decade ago can even be compiled with the latest version of a language. For example, on January 1, 2020 the support of Python 2 was dropped in favor of Python 3, thus making the former obsolete. This does not mean that it is not possible to run the old code, but that it requires maintenance. Also, third-party libraries might change their function signatures or the way the main program interacts with them, thus requiring support. One needs to assume that any computer program requires continuous maintenance to keep it running in the long term. In the case of a publication, the article is linked to a certain version of the source code, and any further modifications are not official and not peer-reviewed. If a demo stops working because its code has become too obsolete, IPOL asks the authors to optionally update their code and add the modifications to the article's page history. If the authors do not wish to do so, IPOL simply removes the demo and keeps the article, with the associated source code, available. Note that the source code of the demo system itself is not part of the published material, but only supplementary (although, by design, the demos ensure that the program they execute is the published peer-reviewed code). Regarding the reference operating system and library versions, the minimal requirement in IPOL is Debian Stable, and any code which is obsolete with respect to that version is not supported or maintained.
In the case of Python programs, only Python 3 is supported (given that Python 2 was superseded by Python 3 in 2020). We allow authors to use any package found in PyPI, pinned to a specific version number. This has certainly improved the long-term online execution of the codes.
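A minimal sketch of this workflow (an assumed illustration, not IPOL's internal tooling; "main.py" and the pinned versions are hypothetical) shows how a virtual environment with pinned package versions shields a demo from upstream API changes: the environment is created once, exact versions are installed, and the demo always runs inside it.

```python
# Sketch: isolate a demo in a virtual environment with pinned dependencies.
import subprocess
import venv

venv.EnvBuilder(with_pip=True).create("demo_env")
pip = "demo_env/bin/pip"  # "demo_env\\Scripts\\pip.exe" on Windows
subprocess.check_call([pip, "install", "numpy==1.24.4", "scipy==1.10.1"])
# "main.py" stands for a hypothetical entry point of the published method.
subprocess.check_call(["demo_env/bin/python", "main.py", "input.png"])
```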
Journal indexation. Researchers want to communicate the results of their investigations as a fundamental part of their activity, and to this end they look for the journals with the best impact, that is, the journals expected to reach the maximum number of readers. Moreover, research quality is mostly evaluated according to the number of articles published in high-impact-factor journals. Therefore, it has been important from the beginning that IPOL be an indexed journal with some impact factor.
At this moment IPOL is an indexed journal4 with ISSN 2105-1232 and DOI 10.5201/ipol, included in the Emerging Sources Citation Index (ESCI), the previous step before obtaining a definitive Impact Factor. IPOL has been indexed by SCOPUS5 since 2019 and by DBLP since 2015, and it is listed in the Directory of Open Access Journals (DOAJ) and in the SCImago Journal & Country Rank (SJR).
IPOL is more focused on the quality of its publications than on their quantity (about 20 articles per year), and this is certainly delaying an official Impact Factor. However, the fact that IPOL has been added to the ESCI indicates that we are moving in the right direction to obtain it, just as IPOL is already indexed by SCOPUS.
Many of the challenges discussed here are related to the specificity of IPOL publications, which are not just articles, but also their source code and associated data, these three elements forming a whole. This adds another level of complexity to the management of the journal, but it is the price of IPOL doing its best to ensure the quality of the research software it publishes.
3. IPOL from the perspective of an Editor in Chief
An IPOL submission goes through several steps before reaching publication. First, one of the editors in chief (EIC) does a quick check of the article and decides whether the proposed algorithm answers a real problem of interest in the field.
Additionally, the form of the submission is checked: the article should include pseudo-code, follow the formatting guidelines, and be accompanied by source code for the demo. If any of these criteria fail, the submission is rejected. Otherwise, a section editor is appointed, who manages the designation of reviewers and guides the article through one or several rounds of corrections and reviews. Most new contributors do not know how to create a demo; the section editor can do it or ask a demo editor to do it. This might require some feedback from the authors: extreme and default values of the different parameters and proposed input data. Once the demo is working properly, the editor can ask that a preprint of the article be prepared. This allows reviewers to test the demo directly. In case of acceptance of the paper, a copyeditor performs the mundane language and typo corrections and checks the layout of the article. Finally, an EIC has to approve the publication after a final reading of the article and some stress-testing of the demo (typically trying it with extreme values of the parameters and on new data). The preprint webpage becomes the final webpage of the article, and a DOI is attributed to it.
Experience shows that almost all submissions passing the preliminary EIC check reach the publication stage: the authors have already put in more effort than for a standard journal or conference submission, since clean source code had to be written (and checked to match the algorithmic descriptions of the main paper), and they are motivated to answer the reviewers' criticisms completely. In contrast, early rejections are due to a lack of understanding of the IPOL requirements (no pseudo-code, pseudo-code at too high a level for reproducibility, or no source code).
4. Discussion on the applicability of IPOL to several fields
IPOL started as a journal on image processing, but it then expanded to more general signal processing algorithms and lately to other fields. In this section we present a selection of three different application fields: biomedical and health public policy, explainable methods for image forgery detection, and IPOL as a tool for education.
4.1. Signal processing methods and datasets for biomedical applications
Human physiological signals coming from electrocardiograms, posturography, eye trackers, or inertial measurement units (IMUs) are frequently studied to infer information about underlying human biological processes and medical conditions. Such data are often complex to analyze, as they are generally non-stationary, contain randomness, and can lie in high-dimensional spaces. Consequently, many algorithmic methods have been used for their processing (Fulcher et al., 2013). However, the characterization of commonly recorded physiological signals is still the subject of debate (Duarte & Freitas, 2010; Liu et al., 2020). This can be explained by the complexity of biosignals, by the lack of reproducible and standardized methods of analysis, which has been reported in many fields (Föll et al., 2021; Quijoux et al., 2021), and by the lack of standardized datasets (Goldberger et al., 2000).
Furthermore, two crucial issues in the medical context are, first, the interpretability of the methods, which is important for the integration of the tools into clinical practice, and second, the reliability of the methods, which is necessary for practitioners to trust them.
IPOL tools can be of great interest for all the challenges previously mentioned. In the following, we describe several examples of IPOL applications for the study of human gait and postural control that highlight these benefits.
Human postural control Postural control is an essential component of sensorimotor control (Macmahon, 2005). It is a key aspect to assess in the elderly, especially in the prevention of falls and frailty (Kurz et al., 2013), which are associated with serious injury, loss of autonomy and death. This higher susceptibility to injury is caused by age-related physiological changes and clinical diseases (Sterling et al., 2001).
Precise quantification of balance quality is crucial in order to enable the deployment of personalized strategies, which are proving effective in reducing frailty and falls (Rubenstein, 2006).
Postural control quality is commonly assessed through the recording of the Center of Pressure (CoP) trajectory (the point of application of the resultant of the reaction forces of the platform under the feet) using a force platform. Many variables have been proposed to extract information from CoP trajectories, and no consensus has been established about their repeatability, their robustness with respect to changes in protocol or preprocessing methods, or their discriminative power to quantify the risk of falling (Quijoux et al., 2020, 2021). This results mainly from the lack of standardized definitions and algorithmic procedures, which makes comparisons between studies difficult.
To deal with these issues, a review of the main CoP variables used to quantify postural control in elderly people has been proposed (Quijoux et al., 2021), accompanied by open-access code. However, these variables are intended to be used by clinical practitioners, who may not necessarily have the knowledge and tools to run the code. For this reason, an additional IPOL demo has been set up (Nicolaï & Quijoux, 2021). This demo enables the user to provide any CoP trajectory in text format and outputs the list of CoP variables. In addition, graphical representations are displayed in order to ease the interpretation of the algorithms. Finally, IPOL demos provide access to the archive of signals already submitted by users, a feature of high interest for better understanding the behavior of the algorithms on varied data, potentially coming from very different sources.
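To make the notion of "CoP variables" concrete, here is a hedged sketch (not the published IPOL code; units and sampling rate are assumptions) of two classical ones: total sway path length and mean velocity, computed from a CoP trajectory sampled at a known frequency.

```python
# Sketch: two classical CoP variables from a 2D trajectory.
import numpy as np

def sway_path_length(cop_xy: np.ndarray) -> float:
    """Sum of Euclidean distances between consecutive CoP samples (cop_xy: N x 2, in mm)."""
    return float(np.sum(np.linalg.norm(np.diff(cop_xy, axis=0), axis=1)))

def mean_velocity(cop_xy: np.ndarray, fs: float) -> float:
    """Average CoP speed in mm/s, for a trajectory sampled at fs Hz."""
    duration = (len(cop_xy) - 1) / fs
    return sway_path_length(cop_xy) / duration

rng = np.random.default_rng(0)
cop = np.cumsum(rng.normal(scale=0.1, size=(1000, 2)), axis=0)  # toy trajectory
print(mean_velocity(cop, fs=100.0))
```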
The above plethora of standardized features, although beneficial, challenges the efforts to obtain valid statistics via standard univariate approaches. Given the need to investigate the different posturographic profiles of fallers and non-fallers, (Bargiotas et al., 2021) proposed a novel statistical learning approach to perform two-sample hypothesis testing in a highly multidimensional setting, offering two main advantages:
1. It explores non-parametrically the groups' significant differences (if any), avoiding p-value adjustments, which tend to become highly conservative as the dimension increases;
2. It informs about the level of contribution (importance) of every feature to the final result, which is extremely important for interpretation. The approach is completely generic (like traditional statistics), and any dataset with two groups can be used; a generic sketch of a learning-based two-sample test is given below.
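The sketch below illustrates the general idea of a learning-based two-sample test with feature importances; it is an assumption-laden stand-in, not the published method of (Bargiotas et al., 2021). If the two groups are indistinguishable, cross-validated accuracy stays near chance; a permutation test turns the accuracy into a p-value, and the fitted model's feature importances indicate each feature's contribution.

```python
# Sketch: classifier-based two-sample test on toy posturographic features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))   # 120 subjects, 30 features (toy data)
y = np.repeat([0, 1], 60)        # fallers vs non-fallers (toy labels)
X[y == 1, 0] += 0.8              # group difference carried by feature 0

clf = RandomForestClassifier(n_estimators=100, random_state=0)
obs = cross_val_score(clf, X, y, cv=5).mean()

# Null distribution: accuracy when group labels are randomly permuted.
null = [cross_val_score(clf, X, rng.permutation(y), cv=5).mean() for _ in range(50)]
p_value = (1 + sum(a >= obs for a in null)) / (1 + len(null))
importances = clf.fit(X, y).feature_importances_  # per-feature contribution
print(f"accuracy={obs:.2f}, p-value={p_value:.3f}, top feature={importances.argmax()}")
```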
An IPOL online demo is offered, enabling the user to perform the test from a simple spreadsheet file (Bargiotas, 2020).
IPOL has also published novel algorithmic approaches for statokinesigram analysis. A machine-learning-based approach was recently proposed (Bargiotas et al., 2018), not only to evaluate postural control, but also to detect and visualize interesting changes and transition phases of statokinesigrams through time. This work was based on the observation that homogeneity is not always a valid assumption for statokinesigram signals. Every individual may present periods of "bad postural control" (called Unquiet Blocks, or UB) as well as periods of "good postural control" (called Quiet Blocks, or QB). Consequently, the measure indicative of the general quality of postural control is the presence and intensity of UB periods compared to QB periods (in other words, the proportion between the UB and QB periods of a statokinesigram). Therefore, identifying the QB/UB parts of a signal is key to this method of quantifying static balance. Briefly, statokinesigrams are split into blocks of predefined duration; each block is then described by a multidimensional vector of established postural features from the literature (see (Quijoux et al., 2021) for relevant features); each block (through its multidimensional description) is then scored using the posterior probability of belonging to the QB or UB cluster. The technical details and the performance of the algorithm were presented in an IPOL article (Bargiotas et al., 2019), accompanied by open-access code as well as an IPOL online demo, which enables the user to provide any statokinesigram and outputs a score and a one-dimensional representation of the QB and UB periods.
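To make the block-scoring idea concrete, here is a simplified sketch. It is an illustration under stated assumptions, not the published algorithm: a two-component Gaussian mixture stands in for the QB/UB clustering, and the block features are illustrative.

```python
# Sketch: split a signal into blocks, describe each block by features,
# and score each block's posterior probability of being "quiet".
import numpy as np
from sklearn.mixture import GaussianMixture

def block_features(cop_xy: np.ndarray, fs: float, block_s: float = 5.0) -> np.ndarray:
    n = int(block_s * fs)
    blocks = [cop_xy[i:i + n] for i in range(0, len(cop_xy) - n + 1, n)]
    feats = []
    for b in blocks:
        step = np.linalg.norm(np.diff(b, axis=0), axis=1)
        feats.append([step.sum(), step.mean(), b.std(axis=0).mean()])
    return np.array(feats)

rng = np.random.default_rng(1)
cop = np.cumsum(rng.normal(scale=0.1, size=(6000, 2)), axis=0)  # toy 60 s at 100 Hz
X = block_features(cop, fs=100.0)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
quiet = np.argmin(gmm.means_[:, 0])            # assume the lower-sway cluster is QB
qb_posterior = gmm.predict_proba(X)[:, quiet]  # per-block QB score
print(np.round(qb_posterior, 2))
```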
Human gait Human gait is a complex mechanism which can be altered in a number of ways by a wide range of pathologies (e.g. Parkinson's disease or stroke). Gait analysis is therefore a central problem in medical research. Because deteriorated walking behaviour often results in a significant loss of autonomy and an increased risk of falls, this subject has profound consequences in the public health domain. To address these issues, researchers have sought to objectively quantify gait characteristics. A natural strategy is to measure a patient's movements while walking, compute a number of characteristics from the collected time series, and explore meaningful relationships between these characteristics and the patient's health status. Many studies have been proposed, using many different measurement devices: surface electromyography sensors (Bovi et al., 2011), instrumented treadmills (Moore et al., 2015), motion capture cameras (Santos et al., 2017), etc. In these works, gait is usually described as a succession of footsteps or gait cycles, which are the core atoms of human gait.
Strikingly, compared to the volume of published articles, very little of the collected data is freely available and documented well enough to be reusable. The objective of open and curated sets of gait data is twofold. First, clinicians would be able to test and fairly compare clinical hypotheses, such as the discriminative power of walking patterns in a faller/non-faller population (R. P. Barrois et al., 2016; R. P.-M. Barrois et al., 2017). Second, bio-engineers would be able to design algorithmic procedures and measure their accuracy. In gait analysis, such procedures include, for instance, methods to detect the temporal boundaries (i.e. the starts and ends) of footsteps in a signal (Laurent Oudre et al., 2018).
Several initiatives have emerged to achieve both objectives. A well-known example of freely available gait signals is the Daphnet data set (Bachlin et al., 2010), which consists of more than 8 h of time series recorded with Inertial Measurement Units (IMUs) from patients with Parkinson's disease. In the same way, the HuGaDB database (Chereshnev & Kertész-Farkas, 2018) contains 10 h of signals from IMUs and electromyography sensors, with activity annotations (e.g. walking or climbing stairs). Similarly, in (Brajdic & Harle, 2013; Harle & Brajdic, 2017), 27 subjects were monitored with the inertial sensor of a smartphone, resulting in about 3 h of signals. None of these data sets provides precise information about the subjects' footsteps.
To address this, we published (Truong et al., 2019) data collected with IMUs from a total of 230 subjects who were monitored while undergoing a given protocol during one or several clinical consultations. The protocol consisted of performing simple activities (standing still, walking 10 m, turning around, and walking back 10 m) in a fixed order. In contrast to existing data sets, ours includes the exact start and end time stamps of all footsteps recorded by the sensors (more than 40,000 in total), which is the most precise level of analysis for gait studies. Overall, the data set contains around 8.5 h of gait signals collected from 230 subjects. This is currently the largest freely available data set in terms of participants and footstep annotations. The work is published on IPOL, where the time series and metadata can be downloaded as well as explored online (without code) using the graphical interface of the journal's website. Illustrating the richness of this data set, several articles have already used part of it, in computer science and in clinical research (R. Barrois et al., 2015, 2016; R. P.-M. Barrois et al., 2017; L. Oudre et al., 2015). One article has also used this exact data set (Laurent Oudre et al., 2018). In addition, it is frequently used in data challenges, where new ways to tackle the step detection problem are often explored.
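As an illustrative baseline for the step detection problem (this is a generic sketch, not the published detection method and not the annotation procedure of the data set), candidate footsteps can be located as peaks of the demeaned acceleration norm separated by a minimum inter-step interval.

```python
# Sketch: naive peak-based step candidates from a 3-axis accelerometer.
import numpy as np
from scipy.signal import find_peaks

def candidate_step_times(acc_xyz: np.ndarray, fs: float) -> np.ndarray:
    """acc_xyz: N x 3 accelerometer samples; returns candidate step times (s)."""
    norm = np.linalg.norm(acc_xyz, axis=1)
    norm -= norm.mean()
    peaks, _ = find_peaks(norm, height=norm.std(), distance=int(0.4 * fs))
    return peaks / fs

# Toy signal: gravity plus a ~1.8 Hz vertical impact pattern and noise.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 0.01)
vertical = 9.81 + 2.0 * np.clip(np.sin(2 * np.pi * 1.8 * t), 0, None)
acc = np.stack([rng.normal(0, 0.05, t.size),
                rng.normal(0, 0.05, t.size),
                vertical], axis=1)
print(candidate_step_times(acc, fs=100.0)[:5])
```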
4.2. Explainable methods for medical and health public policy
In critical situations such as a pandemic, urgent political decisions have to be made. This is the case, for instance, with regard to non-pharmaceutical measures: lockdowns, indoor or outdoor mask mandates, mandatory vaccination (Boulant et al., 2020). These public policy decisions have to be made in a complex and evolving environment in which the medical dimension is only one among several others: civil order and the procurement of medicines or basic necessities, to name a few. It is thus of great importance that decision makers can understand and properly interpret the tools they use, so as to help reduce the entropy of the situation to be addressed. For some decisions, policy makers rely on simulations or models to assess potential outcomes under various hypotheses. The COVID-19 pandemic highlighted the importance of the simulations and models being used, as well as their potentially huge impact on public policy decisions affecting millions of people in numerous countries facing different situations.
Moreover, as we are referring to public policies related to public liberties, public trust in such simulations and models is a dimension of the public policy itself. Indeed, a decided lockdown has to be observed, people need to accept wearing masks, and mandatory vaccination is only useful if people comply with it.
In a context requiring urgent decision making, various expert groups were formed with no clear decision path regarding which model or simulation was used, under which hypotheses, and with what confidence intervals given the information available at decision time (Baron et al., 2021).
In this context, IPOL currently hosts two publications:
- One introduces a model that offers policy-makers the possibility to explore differentiated containment strategies, by varying the size of the low-risk segment and the dates of the 'progressive release' of the population, while exploring the discriminative capacity of the risk score, for instance through its AUC (Boulant et al., 2020).
- The other proposes a documented and transparent implementation of a model strongly inspired by the compartmental model developed by (Sofonea et al., 2020), which was used as a reference by several working groups in France during the COVID-19 crisis. The IPOL publication (Baron et al., 2021) and its online demo aim to make the model implementation fully transparent, to make the corresponding code available, and to give users control so that they can test the model in total transparency. The focus has been put on reproducibility and on the explanation of the various parameters. (A generic sketch of a compartmental model follows this list.)
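For readers unfamiliar with compartmental models, here is a minimal generic sketch of the model family (an SIR model with made-up parameter values; it is neither the Sofonea et al. model nor the published IPOL code): the population moves between Susceptible, Infectious, and Recovered compartments according to a transmission rate beta and a recovery rate gamma.

```python
# Sketch: forward-Euler simulation of a basic SIR compartmental model.
import numpy as np

def simulate_sir(beta=0.3, gamma=0.1, n=1_000_000, i0=100, days=180, dt=0.1):
    s, i, r = n - i0, i0, 0
    history = []
    for _ in range(int(days / dt)):
        new_inf = beta * s * i / n * dt   # new infections in this time step
        new_rec = gamma * i * dt          # new recoveries in this time step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        history.append((s, i, r))
    return np.array(history)

traj = simulate_sir()
print(f"epidemic peak: {traj[:, 1].max():.0f} simultaneously infectious")
```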
These two publications benefit from valuable IPOL features in the context of health applications for public policy:
- Transparency. Everything is available:
- the article describing the simulation or model used, including its theoretical foundations;
- the source code running the explicit model with user-defined parameter inputs.
- Implementation and reproducibility. Alongside the paper, an IPOL publication comes with an implementation of the proposed method. This implementation also comes with an online demo that packages it so that users can run the proposed method from the web interface of the publication, allowing them to see whether the approach fits their needs in terms of the hypotheses or information available at decision time.
- Mathematical explicitness and peer review. Papers are peer-reviewed, which guarantees that validations have been made to ensure that the proposed methods comply with the state of the art at publication time.
- Confidence intervals with regard to variations in parameter values. Users can make as many runs as they wish, according to the various hypotheses matching their context, and immediately see the impact on the outcome of the simulation. For a complicated use case, they can even download the source code and run it locally (off the platform) or modify it to suit their particular needs. This allows for incremental research.
4.3. Explainable methods for image forgery detection
Forged images can be used as fake evidence in various contexts such as news, journalistic articles, scientific publications, and legal proceedings. This raises the question of the trust that can be placed in images. The rise of image manipulation to create fake documentation has degraded public trust in images.
Digital image forgery detection has emerged as a research field to tackle this problem. Several public image verification platforms have been developed to bring these tools closer to society. Examples of such platforms include the InViD-WeVerify plug-in, the REVEAL Image Verification Assistant, the FotoForensics and Forensically online tools, the Ghiro open-source project and the Authenticate software by Amped.
These tools all share a similar approach: they apply a series of forensic algorithms to the suspect image in order to guide the user in its interpretation. However, they are unable to prove an image's forged or authentic status, and they cannot reliably estimate their confidence in a given detection. Furthermore, details of the algorithms used, as well as their limitations, are not properly documented or publicly visible. Further limiting usability in many scenarios, the source code is most often not provided.
Indeed, the aforementioned problems are crucial when the results of these platforms are used as scientific proof of forgery. End users need to perform their own final interpretation of the results, often without the expertise required to do so. Even with the necessary knowledge, the justification and exact details of the methods are not documented.
IPOL fills this gap by displaying, together with the demos, an article explaining the foundations of the methods, their limitations, and their use cases. Indeed, in order to be used as evidence in courtrooms, methods need to be reproducible and explainable.
We wish to emphasize that IPOL stands as a complement to image verification platforms rather than as a competitor, since IPOL online demos can be directly included in them. Indeed, IPOL is already connected to the InViD-WeVerify plug-in through an API call and provides the engine behind its CheckGIF feature as well as several forensic methods (Bammey et al., 2021).
4.4. IPOL as a tool for education
Another important topic regarding reproducible research is its large and positive impact on scientific education. Educating in scientific research means teaching students that they must follow a reflective and systematic process that explores, understands, and demonstrates their theories in an empirical, measurable, and truthful way. This is also known as the scientific method, and it is the key to considering a research publication as science. In fact, it could be said that "where there is no scientific method, there is no science" (Bunge, 1967).
The academic community, in the performance of research, education, and scientific dissemination, bases its activities on principles of rigor, responsibility, and honesty (Bunge, 1960). For this reason, we consider that students, and especially PhD candidates, must learn under the premise that their research has to be reproducible and their results based on perfectly verifiable evidence (Monzón López, 2018). A research work may be mistaken in its conclusions but, if it is reproducible, it is at least honest work (Peng, 2011).
Some of the most interesting teaching proposals highlight that the training process should not proceed from a theoretical perspective alone, without real application. The student's activity, guided by the teacher, must be supported by empirical references encompassed within experimental activity (Vergnaud, 1989; Hodson, 1994).
The creation of an experimental practice environment requires, according to (Pickering, 2013), three structural elements: a material procedure, an instrumental model, and a phenomenal model. The first two imply that the teacher must have the material and instruments necessary to create the work environment, in addition to the skills needed to manage and understand them. The third involves correctly understanding the phenomena under study in order to transmit them. The intersection of these three elements achieves our goal: an environment in which to study phenomena (and interpret them) while instilling scientific thinking in our students.
In this sense, research training using IPOL, thanks to its demo system and its reproducibility background, is quite a valuable scientific tool to prepare new scientists (Monzón, 2018), not only because it helps ensure the reliability of their papers, but also because it makes it easier for them to open new lines of collaboration.
In fact, as shown in previous sections, IPOL is a very useful journal for demonstrating the robustness and effectiveness of published scientific methods thanks to its demo system. Moreover, as pointed out in a recent survey on reproducibility, the educational role of reproducing projects is of main interest from both the AI researchers' and the students' points of view (Yildiz et al., 2021). It is an opportunity to enhance their scientific contributions and to facilitate their understanding and reuse, so that other members of the scientific community can increase their reach. This will likely translate into citations, acknowledgment, and possible co-authorship of publications with other research groups (Piwowar et al., 2007). In addition, the publication of proven code enhances its quality and helps to accelerate scientific progress.
5. Initiatives inspired by IPOL: the RRPR workshop
IPOL has inspired other initiatives related to reproducible research. In this section we present the Reproducible Research on Pattern Recognition (RRPR) workshop, which is organized biennially. This workshop is devoted to encouraging and highlighting reproducible contributions in the pattern recognition domain. To reach as many researchers of that community as possible, we decided to propose this event in association with a large international conference dedicated to pattern recognition, the International Conference on Pattern Recognition (ICPR). It is a major event that started in 1973 and gathers a large community of generally around 1500 attendees. In this context, we proposed the RRPR event as a satellite workshop6. These workshops are usually held one day just before the main conference. After the first edition in 2016, two further editions were organized in 20187 and 20208.
The call for papers was initially structured into two main tracks: RR frameworks and RR results. The first track was dedicated to platforms that support RR research. Different contributions were proposed, including, for instance, the presentation of new libraries useful for reproducibility, such as OpenMVG (Moulon et al., 2017), or, more recently, a new platform called ReproducedPapers focusing on reproducibility and open teaching in machine learning (Yildiz et al., 2021). The second track, RR results, was introduced to offer authors of the main ICPR conference (or of other past events) the possibility to highlight the reproducibility quality of their work, including sensitivity measures under parameter changes, implementation details and detailed algorithms, and links to online demos and source code. In addition to these two main tracks, a new call for ICPR companion papers was proposed to allow, exclusively, the authors of an already accepted paper to detail how to reproduce their work, with the implementation or technical details that are generally difficult to include in the initial contribution.
A reviewing process with the classical two or three single-blind reviews was organized. This step also included a special technical code review, in particular for contributions to the RR results track and for short companion papers. Moreover, such reviews were sometimes coordinated with the Reproducible Label in Pattern Recognition (RLPR), which was proposed during the workshop.
The motivation of the RLPR was to offer the main ICPR authors the possibility to demonstrate the reproducibility of the work they presented at the main conference. The process was based on a publicly visible online source code repository (hosted on GitHub). The submission includes all the instruction steps needed to reproduce the results presented in the paper (including the resulting figures and tables). The benefit for authors was to have proof that the results could be reproduced by others on potentially different platforms, and it was also an occasion to detect potential issues with specific configurations. Moreover, the authors could mention the label during the main conference presentations, which may advertise the quality of the research to the main conference audience. Note that such an initiative follows the spirit of other initiatives, such as the replicability stamp or the study of replicability in the domain of computer graphics (Bonneel et al., 2020).
This workshop is to be continued in further editions, and we hope that authors will maintain their manifest interest and participate actively in both tracks. For the companion papers, one could imagine giving them more visibility by handling their review at the same time as the main conference review process. That could help improve the quality of the selection by using reproducibility as a new evaluation criterion. Another strategy to extend the initiative in the future could be to propose a variant of this workshop in other domains, such as computer vision, inside major events (CVPR, ICCV).
6. Conclusions
IPOL has been our contribution to addressing the credibility crisis in scientific research pointed out by Donoho and many others. Whereas it is certainly difficult to achieve fully reproducible research in some disciplines (as in Biology), or even impossible to repeat experiments (such as the observation of gravitational waves and other singular events in Physics), in computational sciences there is most of the time no special impediment to implementing reproducible methods.
The editors and system designers behind IPOL have learned from the experience gained over the journal's years of existence. We identified weaknesses in the demo system architecture and in the review procedures, among others. Indeed, IPOL is a challenging endeavour, mainly because of the way we understand what a publication is: no longer just an article, but the combination of the article, its source code, and any associated data, all of them as an indivisible whole.
There are almost no journals equivalent to IPOL (although some initiatives sharing some of the same principles do exist), which makes it even more challenging to know which direction to follow, due to the lack of clear references. We could say that IPOL is one of the pioneers in publishing reproducible, strictly peer-reviewed articles with source code.
IPOL started as a journal focused only on image processing algorithms; it then expanded to video processing, later to more general signal-processing methods (including physiological signals), and has eventually widened to general artificial intelligence methods, including deep neural networks.
The benefit of IPOL is not only for the authors: the scientific community now has a large corpus of rigorously peer-reviewed methods available at no cost. The focus on reproducibility is fundamental for some methods, for example those providing detections of image falsification. Scientific police services and journalistic agencies have a special interest in reproducible and explainable detection methods such as the ones published in IPOL. We have also discussed in this article other important applications, such as the algorithms with biomedical applications already published in IPOL.
IPOL will keep publishing reproducible articles and offering a valuable service to society, in particular to the scientific and industrial communities. To do so, the demo system will be adapted to the needs of the demos, as we have been doing recently for neural network methods. There is a clear interest in reproducible and explainable artificial intelligence methods, and this is a direction which IPOL will most probably follow in the short term. Sister platforms such as OVD-SaaS (a project currently under development at Centre Borelli, not yet released) will complement IPOL for industrial artificial intelligence methods.
Reproducible research is an absolute necessity for performing reliable science and for ensuring that the conclusions that become part of the literature can be checked, disproven, and completed by other researchers. Nanos gigantium humeris insidentes.