1. Introduction
The European Water Framework Directive (WFD) instructed all member states to achieve good ecological status in all water bodies [
1]. The process required the implementation of monitoring programs to assess the ecological integrity of each water body in a continuous manner. Within this context, bioassessment became relevant in ecological monitoring, and substantial effort was put into developing multimetric indices that would best express human pressures on ecosystems [
2].
Bioassessment programs need qualified expertise to make key decisions in the field and to identify organisms in the laboratory. One way to compensate for the shortage of experts in ecological monitoring has been the development of citizen science (CS), that is, science done in conjunction with volunteers, under the direction of professionals and scientific institutions [
3]. In addition to the benefits for science, CS produces important societal benefits through innovative thinking and by improving: (a) societal conditions through scientific outreach; (b) education for the promotion of conservation [
4]; and (c) relationships among institutional actors, bringing together individuals from diverse backgrounds [
5].
CS activities that involve local people in volunteer-supported, sustainable monitoring programs are common practice [
6,
7,
8,
9,
10,
11,
12]. For example, in Canada, both federal and provincial governments have initiated aquatic biomonitoring networks such as the Ontario Benthic Biomonitoring Network (OBBN), accessible to both volunteers and scientists [
3]. In general, CS has been used for hydrological observations and/or for monitoring surface water quality around the globe. This is the case in Europe [
13], North America [
14], Central America [
15], South America [
11], Oceania [
16], Asia [
17], and Africa [
18], demonstrating that CS is a low-cost and crucial tool for raising awareness of the importance of good water quality [
13], and that it plays an important role in decision making [
19]. When educators are involved in CS, the benefits for school environmental education are multiple [
20,
21]. Two such benefits are the improvement of student learning skills and the professional development of science teachers through engagement in scientific inquiry and research. These ultimately improve literacy in Science, Technology, Engineering, and Mathematics (STEM). STEM literacy is defined as the ability to read and/or write science texts, tables, and graphics, and the derived skill to apply scientific knowledge [
22].
The integration of results from CS volunteer programs into the National Classification System (NCS) must be handled carefully, because conflicts may arise between environmental authorities and volunteers [
23]. To prevent such conflicts, certain compromises are required [
24], for example: (a) water authorities need valuable and precise results to support management actions, but volunteers prefer user-friendly tools that are not necessarily precise; (b) water authorities want results that meet legislative requirements, whilst volunteers are most likely driven by self-interest (hobbies, leisurely pursuits) [
20,
25]. Thus, a participative balance has to be reached in choosing the sites to be monitored [
24], and an equilibrium must be set between friendly tools and realistic results, quick results and valuable data, and inexpensive and sufficiently precise results. The challenge is to come up with the friendliest tool with which volunteers can still produce useful information for management purposes. For this, water authorities should know how volunteer results differ from expert results, because any metric of ecological quality is of little use without an understanding of the uncertainties in its estimation [
26,
27,
28,
29].
To develop a supported and sustainable volunteer monitoring program, it is necessary that volunteers have grasped the basic scientific concepts through adequate literature or through training courses [
30]. To prevent errors, sampling and laboratory protocols need to be simple and standardized. Finally, an expert audit should be implemented to assess the quality of the results, detecting main gaps and improvements to be made, thus increasing the efficiency of the process.
Macroinvertebrates are probably the most popular biological quality element evaluated by volunteers on lotic ecosystems [
29]. Macroinvertebrates are easy to collect with simple equipment and to identify with the naked eye in the field. They colonize all aquatic habitats and show a wide range of sensitivities to physical, chemical, organic, and morphological pressures. However, the general metrics and multimetric indices developed to assess ecological status based on macroinvertebrates require counting and identification of, at least, all sampled families or even the identification of some species [
2]. The amount of work and time required for these identifications can be tedious for volunteers, increasing the risk of low accuracy and dropout. Thus, to maintain the interest of the volunteers, it is necessary to simplify the metrics through a reduction of the taxonomic detail [
29] and to reduce the counting process, thus obtaining quick results while maintaining accuracy.
The starting point of the present study was a set of tools developed by the European project on Conservation and Sustainable Development of Freshwater Ecosystems (CONFRESH,
https://www.nhmc.uoc.gr/en/museum/programs/1814) to assess stream ecological quality based on macroinvertebrate communities. These tools were applied in a regional project conducted by the Regional Hydrographic Administration of Algarve of the Portuguese Environmental Agency (APA-ARHAlg). The objective was to evaluate the accuracy of the results obtained by students from 5th to 12th grades (volunteers), during extracurricular activities supervised by their teachers and following a participatory/contributory model [
31]. The following steps were used for this activity: (1) APA-ARHAlg selected study sites, (2) teachers were taught skills on stream ecology and monitoring during courses given by experts, (3) teachers and their students (volunteers) collected and examined the samples and determined on their own the ecological class of each sampling site, and (4) experts on macroinvertebrate taxonomy audited the results. During the audit procedure, experts reanalyzed samples and quantified the differences between the two analyses, expressing them as the accuracy of the results obtained by volunteers. It was hypothesized that the characteristics of the volunteers (age, number of students, and number of teachers) could influence the accuracy of the results and, conversely, that this accuracy would be independent of the ecological quality.
We intended to test whether simple tolerance metrics based on the reduction of the more than 120 macroinvertebrate families to just 18 surrogates (Class, Order, and Family), as proposed by the CONFRESH project, would significantly compromise the quality of the biological assessment when compared to the results of the National Classification System of Portugal. Thus, the aim of this paper is to propose a simple Metric of Aquatic Invertebrates for Volunteers (MAIV) that reduces the taxonomic detail and eliminates the counting process while maintaining accuracy. We propose MAIV as an upgrade of the metric calculated by the CONFRESH project, since its score is the sum of the tolerance scores of all surrogates (summative metric) and not only the tolerance score of the most sensitive surrogate (nonsummative metric), thus increasing accuracy.
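The difference between a summative and a nonsummative metric can be illustrated with a minimal Python sketch. The surrogate list, the tolerance scores, and the function names below are placeholders assumed for illustration only; they are not the CONFRESH or MAIV values.

```python
# Hypothetical tolerance scores (higher = more sensitive to pressure).
# These numbers are illustrative assumptions, not the published MAIV scores.
TOLERANCE = {
    "Plecoptera": 10,
    "Ephemeroptera": 8,
    "Trichoptera": 8,
    "Gastropoda": 4,
    "Diptera": 2,
    "Oligochaeta": 1,
}

def summative_score(present):
    """Sum the tolerance scores of every surrogate present in the sample."""
    return sum(TOLERANCE[s] for s in set(present))

def nonsummative_score(present):
    """Return only the score of the most sensitive surrogate present."""
    return max((TOLERANCE[s] for s in set(present)), default=0)

# Two invented samples that share the same most sensitive surrogate.
rich_site = ["Plecoptera", "Ephemeroptera", "Trichoptera", "Gastropoda", "Diptera"]
poor_site = ["Plecoptera", "Oligochaeta"]

for name, sample in [("rich", rich_site), ("poor", poor_site)]:
    print(name, summative_score(sample), nonsummative_score(sample))
```

In this invented case both samples receive the same nonsummative score because they share the most sensitive surrogate, whereas the summative score also reflects how many surrogates are present. Only presence is used, so no counting of individuals is needed.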
4. Discussion
Evaluating the accuracy with which volunteers determine the ecological quality class of a site using macroinvertebrates comprises four main components: (1) sampling, (2) sorting, (3) identification, and (4) acceptability of the final results (assignment to an ecological quality class). The sampling procedure, if well standardized, does not seem to be a significant source of error [
28]. Sorting seems to be the most interesting phase for volunteers. Curiosity stimulates the search for new (different) macroinvertebrates, and constant attention is devoted to this task, which may explain the absence of gains in the sorting audit. The most important source of error occurred during identification, leading to some gains and a few losses. Gains probably resulted from two different sources: (1) errors in the identification of morphologically similar surrogates (e.g., Ephemeroptera and Plecoptera, Diptera and some larvae of Coleoptera, or Trichoptera) and (2) confusion with grains of gravel (e.g., Gastropoda) or small plant pieces (e.g., Oligochaeta). In any case, mean gains per sample were not high. Thus, these results can be considered acceptable and can probably be improved with more experience and adequate materials to support the identification process [
39].
The results showed that gains and losses were independent of the characteristics of the volunteer groups, thereby rejecting the initial hypothesis. Therefore, the adopted sorting and identification procedure can be applied to a wide universe of possible volunteers. However, the numbers of gains and losses seem to depend on human pressures, thus rejecting the hypothesis of independence from ecological quality. The most degraded sites tend to be dominated by a small number of more tolerant taxa [
40], making it difficult to single out the few different ones; therefore, more gains were detected (positive significant correlation,
Table 5). In contrast, sites with better quality tend to have richer communities [
40], and individuals belonging to the same surrogate can be mistakenly assigned to several surrogates. However, those gains and losses did not affect the ecological quality assessment, because correct classifications (identical in the primary and audited analyses) outnumbered incorrect ones, which agrees with other studies conducted with experts [
37] and nonexperts [
29,
41].
Previous studies also assessed the reliability of stream monitoring by volunteers in relation to professionals (e.g., [
6,
16,
26,
27,
29,
42,
43]) and most of them concluded that, with appropriate resourcing and robust protocols, volunteer data closely agree with the professional data used in government reporting and decision making. High correlations were also obtained between primary and audit analyses in New Zealand for %EPT (Ephemeroptera, Plecoptera, and Trichoptera) [
16] and in the U.S.A. by the volunteer programs in Virginia [
43] and Seattle [
26].
Although significant correlations were obtained between the scores of the four tested metrics and the NCS, the three summative metrics gave better results when the quality classes were compared. This confirms the hypothesis of a better performance of the summative metrics. Generally, the NS tended to overestimate ecological quality because it reduces information to only the most sensitive taxa, independently of community composition. The summative metrics, which account for the tolerance scores of all the surrogates present, were found to balance taxa richness and tolerances. Reducing the five quality classes of the WFD to two classes, separated by the good–moderate boundary, increased the number of correct classifications, making these summative metrics more acceptable for identifying water bodies that are below the good quality status (the environmental target of the WFD).
The accuracy of the volunteer procedure (difference between primary and audited analyses) is very similar to that of the tested metrics (summative and nonsummative) when five ecological quality classes are established. However, if only two quality classes are established (good–moderate boundary), the accuracy of the summative metrics is higher than that of the nonsummative one.
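The comparison between five and two quality classes can be sketched in Python as follows. The site classifications are invented for illustration and are not data from this study; the function names and the labels of the two collapsed classes are likewise assumptions.

```python
# The five WFD ecological quality classes, ordered from best to worst.
CLASSES = ["high", "good", "moderate", "poor", "bad"]

def to_two_classes(quality_class):
    """Collapse five classes into two at the good-moderate boundary."""
    if quality_class not in CLASSES:
        raise ValueError(f"unknown class: {quality_class}")
    return "good or better" if quality_class in ("high", "good") else "below good"

def agreement(reference, candidate, collapse=False):
    """Fraction of sites where two classifications coincide."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("classification lists must be non-empty and equal in length")
    if collapse:
        reference = [to_two_classes(c) for c in reference]
        candidate = [to_two_classes(c) for c in candidate]
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / len(reference)

# Invented classifications for six sites (not data from the study).
ncs = ["high", "good", "moderate", "poor", "good", "bad"]
metric = ["good", "good", "poor", "poor", "moderate", "bad"]

print(agreement(ncs, metric))                 # agreement with five classes
print(agreement(ncs, metric, collapse=True))  # agreement with two classes
```

With these invented values, disagreements between adjacent classes on the same side of the boundary (high versus good, moderate versus poor) disappear once the classes are collapsed, so agreement rises from 3 of 6 sites to 5 of 6.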
Since the results obtained by S1, S2, and S3 were very similar and more accurate than those obtained by NS, S2 was selected to be the Metric of Aquatic Invertebrates for Volunteers (MAIV) for two main reasons: (1) the tolerance scores of the surrogates are based on published results [
38] and (2) the scores of all surrogates included in each of the four tolerance groups are equal, which makes the metric easier to use. A detailed calculation protocol of the MAIV is provided in
Tables S4 and S5.
The MAIV proved to be adequate for volunteers, because it is based on a small set of surrogates easily recognized by nonexperts, saving time when compared to other volunteer monitoring experiences in which identification is done to the family level [
26]. In addition, its application is independent of the characteristics of the volunteers, and the determination of the ecological quality class is simple and quick, keeping volunteers interested. However, the use of MAIV to complement the NCS raises some concerns that need to be pointed out. MAIV reduces the assessment to one tolerance metric, while the NCS [
35] is based on an index composed of metrics of tolerance, composition, and diversity. This is the reason for the lower accuracy of MAIV in arranging water bodies into five ecological quality classes. However, MAIV is still a good tool for the detection of water bodies that are below the good ecological quality status (the WFD environmental target).
The use of individual volunteer results (MAIV classifications) for management purposes can be risky, because of their lower precision. However, when integrated into a set of samples to detect geographical and/or temporal environmental patterns, such results can be an important complement for water authorities, as already pointed out by Deutsch et al. [
9], saving time and costs [
30,
44,
45]. Thus, to prevent incorrect management decisions based on MAIV, it is advisable to keep a 5% uncertainty interval around the good–moderate boundary (scores 41 to 49).
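A minimal sketch of how such an uncertainty band might be applied when screening MAIV scores is given below. The band limits (41 and 49) come from the text. Treating both limits as inclusive, assuming that higher scores indicate better quality, and the wording of the labels are assumptions made for this example.

```python
# Band limits from the text (scores 41 to 49). Inclusive limits and the
# "higher score = better quality" direction are assumptions of this sketch.
BAND_LOW, BAND_HIGH = 41, 49

def flag_score(score):
    """Label a MAIV score relative to the good-moderate uncertainty band."""
    if score < BAND_LOW:
        return "below good"
    if score <= BAND_HIGH:
        return "uncertain: confirm with detailed monitoring"
    return "good or better"

for s in (30, 41, 45, 49, 60):
    print(s, flag_score(s))
```

Scores falling inside the band are not assigned to either side of the boundary, which reflects the advice to avoid management decisions based on a single volunteer result close to the good–moderate limit.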
The results obtained in the present exercise show that MAIV can be applied with the same boundaries, and with acceptable results, to different Portuguese river types, such as the south limestone and south small rivers types. However, no definitive conclusions can be reached concerning the other two river types, because of the small number of sites used in the exercise. Moreover, if MAIV is to be used by volunteers as a first step, it has to be kept simple. Different boundaries for different river types could introduce a complexity that does not serve the objective of MAIV. The metric could provide extensive coverage of river assessments, complementing the NCS. Thus, the results obtained with MAIV could act as a warning of human pressures on river sites that are not regularly assessed by the NCS, indicating where more detailed monitoring or management actions could be needed.
The involvement of the educational community in the training courses proved to be quite effective. Teachers had the chance to develop their professional profile by improving their STEM literacy, through theoretical lectures related to practical aspects (i.e., field and laboratory practical lectures) of the conservation and sustainability of freshwater ecosystems. Other benefits of this study were student citizenship, connection of knowledge, awareness and behavior, and partnering student biomonitoring research with community needs, as well as those of stakeholders and policy makers. In general terms, volunteers are strongly encouraged by interaction with scientists and identify learning as one of the main rewards of taking part in CS projects, as documented by other studies [
46,
47,
48]. The training procedure confirmed that CS that provides only theoretical information is insufficient to elicit the required behavioral changes, with training in identification being a key factor in the success of volunteer monitoring programs [
27,
49]. But identification should be complemented with other tools [
11,
42,
43] such as simplified identification keys [
27]. Hence, if it is desired to elicit new perspectives on how the behavior of individuals connects them to the services provided by their ecosystem, and if the premise that “individuals act ethically as an integral part of an interconnected society and biosphere” is to be accomplished, it should be reached through personal belief rather than rational understanding [
50].
Although there has been an increase in the number of studies on environmental monitoring research, precise tools to measure data quality are still scarce [
25,
51], compared to other volunteer practices, such as those in the health sector, for example [
25]. More effort is required to evaluate data quality and to develop more specific and simple methodologies, in order to give credibility to volunteer data and enhance the effect on environmental community education [
25]. A compromise is needed between reducing the uncertainties in environmental data and promoting environmental community education, supporting volunteers in a balanced way.
The regional project reported herein, involving six teacher training centers, is too large to be repeated, mainly because of limited resources (time and budget). However, the skills acquired by citizens were the seed for the creation of several small groups of volunteers (environmental school clubs involving senior citizens) that are being trained specifically in sampling procedures and identification, which requires much less effort than organizing six entire training courses.