1. Introduction
According to data from the Brazilian Ministry of Health (MoH; Ministério da Saúde) and the National Cancer Institute (INCA; Instituto Nacional de Câncer), neoplasms are among the three leading causes of death in Brazil (Bray et al., 2018; INCA, 2019). Worldwide, according to the Union for International Cancer Control (UICC), there were about 15 million new cases of cancer in 2020, leading to 12 million deaths (Bray et al., 2018; INCA, 2019).
Currently available anticancer therapies include chemotherapeutic agents, biologics, molecularly targeted therapy, radiotherapy, surgery, and interventional oncology (Ramos et al., 2021; Sag et al., 2016). In addition to drugs that act directly against the various types of cancer, supportive measures are used to minimize the toxicities caused by anticancer drugs: antiemetics, uroprotective agents, corticosteroids and intravenous hydration, as well as, when indicated, red blood cell and platelet transfusions, antibiotics and growth factors (Matz & Hsieh, 2017; Oun et al., 2018).
The handling of injectable drugs is a highly important and complex activity carried out by the pharmacy sector, since these preparations are not commercially available and can be customized to meet the specific needs of the patient: individualizing the dose, adapting the formulation to the route of administration, adding components to the formulation and choosing the type and volume of diluent appropriate to the patient's clinical condition (ASHP, 2014; Mohiuddin, 2020; Pergolizzi Jr et al., 2013). One of the great challenges of chemotherapy is the selective delivery of drugs to tumour cells with minimal interaction with healthy tissues (Webster et al., 2014).
In recent decades, due to the enormous amount of information constantly generated in the most diverse contexts of society, information has come to be classified as a strategic resource with price, cost and value (Valentim et al., 2006). In the business environment, information is considered the most important input in the decision-making process, and it is necessary to analyse information from various aspects: administrative, economic, technical-scientific, marketing, legal, environmental and political (Contador, 2010).
Information has been gaining relevance in both public and private organizations, which is why the Information Management process must also be addressed in public health organizations, which over the years have sought to develop, innovate and improve their services in the face of the new society in which they find themselves. Thus, information should be considered a fundamental part of the work practices and decision-making processes of public health organizations (Santos & Damian, 2019).
To process the large volume of Big Data at high speed, the traditional methods previously used in the health area have shown low applicability, reinforcing the need for tools that facilitate the analysis of these data (Galvão & Valentim, 2019). An example of Big Data is the Brazilian scientific repository, the CNPq Lattes Platform, which integrates the academic curricula of professionals from all areas of knowledge, both Brazilian researchers and foreign researchers who at some point developed projects in partnership with Brazilian researchers (A. G. C. de Brito et al., 2016). This information has been essential for technical-scientific strategies in Brazil, as it makes it possible to study and reflect on scientific and technological advances in all areas of Science, Technology and Innovation.
Since CVs on the Lattes Platform are retrieved through individual searches by the name of the registered person, a computational tool called ScriptLattes was developed, which allows the simultaneous extraction and retrieval of data from several CVs (A. G. C. de Brito et al., 2016; Magalhães et al., 2020; Mena-Chalco & Junior, 2009). This open-source program compiles, organizes and presents information from the curricula (CVs) of registered researchers, such as scholarly production, supervisions, collaborations and geolocation, among others (Ferraz et al., 2018).
ScriptLattes has produced significant results in the management of Big Data information in different areas of knowledge. Several studies using ScriptLattes are found in the SciELO, Web of Science and Google Scholar databases, including: i) analyses of the scientific production of researchers in neglected tropical diseases, nanotechnology, pediatrics, philosophy, physical education, genetics and freshwater fish reproduction; ii) evaluations of the evolution of scientific production and collaboration networks in Brazilian regions; iii) analyses of the scientific production of Brazilian graduate programs, among others (Alves et al., 2016; Fernandes & Silva, 2018; Ferraz et al., 2014, 2019; Iriart & Trad, 2020; Klepa & Pedroso, 2019; Nigro et al., 2015; Pedroso et al., 2016; Silva & Viegas, 2017; Sobral et al., 2020).
ScriptLattes is a specialized open-source tool, and the references to it are exclusive to the scientific and technological environment. Its use, testing and academic discussion therefore contribute to its continuous and open development together with its developers. Projects have been successfully executed with it, highlighting opportunities and challenges both in the Lattes database and in the ongoing improvement of ScriptLattes.
In this sense, the scientific literature reports the use of ScriptLattes to analyse the curricula of researchers in the field of oncology (Nascimento, 2020; Nascimento & Gouveia, 2021). In the specific field of injectable oncology, the present study was carried out to evaluate the feasibility of using the Lattes Platform and the ScriptLattes program to manage Big Data in the area; to this end, the authors identified and retrieved the CVs of senior specialists registered on the platform in Brazilian territory. The knowledge generated by the ScriptLattes analyses makes it possible to examine different aspects of Brazilian science, from both a micro perspective (individual researchers and groups of researchers) and a macro perspective (collaboration networks or representativeness of the area), and thus to identify or validate patterns of academic activity, yielding bibliometric information about the group of researchers in question (A. G. C. de Brito et al., 2016; Nicholson, 2006).
2. Materials and Methods
The research was based on scientometrics and bibliometrics of shared open data in science. The free open-source program ScriptLattes in a Linux environment was used to extract the open data on the Brazilian Lattes platform. Among the filters available on the Lattes platform for data extraction, the ones listed below were selected in order to identify researchers with greater academic experience:
a) academic level (doctorate);
b) researcher’s nationality (Brazilian and foreign);
c) CNPq productivity fellows (all categories);
d) presence in the directories of Brazilian Research Groups (DGP, https://lattes.cnpq.br/web/dgp).
In order to preliminarily identify the number of CVs returned by each search strategy, searches were carried out in May 2022 in the “advanced” field of the Lattes Platform using terms in both English and Portuguese to assemble a search string: “(oncologic AND injectable), (oncology AND injectable), (cancer AND injectables), (cancer AND injectables), (antineoplastics AND injectables), (injectable AND neoplasm), (injectables AND neoplasia)”. Filters associated with the categories “CNPq Productivity Fellows” and “Presence in the Directory of Research Groups” were also applied. Identification and extraction took place in March 2022. Each specialist was identified by the Lattes ID, a 16-digit identification code for the CVs available on the platform (Mena-Chalco et al., 2014). To use ScriptLattes, it was necessary to create a list containing the Lattes ID of each researcher (Mena-Chalco & Junior, 2009). The data were grouped, organized, processed and later made available on the web in HTML format. The variables analyzed in this work were scientific production (articles published in journals), research projects, events, supervisions and geolocation data.
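The preparation of the ID list described above can be illustrated with a minimal Python sketch. It only checks that each entry is a 16-digit code and removes repeated entries; the function names and the example IDs are hypothetical, and the one-ID-per-line layout is an assumption, since the exact input format expected by ScriptLattes is not described here.

```python
import re

# A Lattes ID is described in the text as a 16-digit identification code.
LATTES_ID_PATTERN = re.compile(r"[0-9]{16}")


def clean_lattes_ids(raw_ids):
    """Return unique, well-formed 16-digit IDs in first-seen order."""
    seen = set()
    cleaned = []
    for raw in raw_ids:
        candidate = raw.strip()
        if LATTES_ID_PATTERN.fullmatch(candidate) and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


def format_id_list(ids):
    """Render the IDs one per line (assumed layout, not the documented one)."""
    return "".join(f"{lattes_id}\n" for lattes_id in ids)


# Made-up IDs for illustration only; they do not refer to real CVs.
example_ids = ["1234567890123456", " 1234567890123456 ", "12345", "0000111122223333"]
ids = clean_lattes_ids(example_ids)
id_list_text = format_id_list(ids)
```

A check of this kind may help catch truncated or repeated IDs before a long extraction run is started.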
Social network analysis (SNA) was performed with the aid of the free open-source software Gephi®, which allows the visualization and exploration of all types of graphs and networks. The software provides easy and broad access to network data, enabling the import, visualization, filtering, browsing and grouping (clustering) of data (Bastian et al., 2009). SNA focused on scientific collaboration allows the analysis of several indicators, such as co-authorship networks. These networks are formed when two or more researchers publish work together. SNA has been increasingly used in the scientific field, since it makes it possible to identify and understand skills, interests, demands and gaps present in the various fields of knowledge. These networks can be identified through co-written documents (Ferreira, 2009; Hayashi et al., 2012). In this type of graph, the size of the vertex represents the number of connections of each researcher, while the color represents one of the communities identified by their form of collaboration.
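As an illustration of how a co-authorship network arises from co-written documents, the following Python sketch turns a list of publications into weighted edges between researchers. It is a simplified sketch and not the procedure used by ScriptLattes or Gephi®; the function name and the author names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(publications):
    """Map each unordered pair of co-authors to their number of shared publications."""
    weights = Counter()
    for authors in publications:
        # Sorting gives every pair a single canonical orientation (a, b) with a < b.
        unique_authors = sorted(set(authors))
        for a, b in combinations(unique_authors, 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical author lists, one per publication.
publications = [
    ["Ana", "Bruno", "Carla"],
    ["Ana", "Bruno"],
    ["Carla", "Davi"],
    ["Eva"],  # a single-author paper creates no edge
]
edges = coauthorship_edges(publications)
```

In this toy example, Ana and Bruno share two publications, so their edge has weight 2, while Eva remains unconnected.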
Two metrics were analyzed with the help of Gephi®: modularity and degree centrality. Degree centrality indicates which authors collaborated the most, that is, published jointly with other authors, taking into account the number of co-authors who collaborated with a given author together with the number of publications that they produced in partnership (Bordin et al., 2014). In complex networks, modularity is one of the metrics used to detect communities, groups or clusters. Modularity is a measure of the network as a whole and is used to divide the network into communities according to the strength of the connections between the various vertices. Vertices more connected to each other than to the others are included in the same group. It can therefore be inferred that a complex network with a high degree of modularity has a strong community structure, that is, the vertices of a community are densely connected with each other and sparsely connected with other communities (Vincenzo, 2008).
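The two metrics can be made concrete with a small Python sketch. It assumes a simple undirected, unweighted network and the standard Newman formulation of modularity, Q = sum over communities of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its vertices. The sketch only scores a partition that is already given; it does not perform community detection and does not reproduce Gephi®'s implementation. All names and the toy network are hypothetical.

```python
from collections import defaultdict


def degrees(edges):
    """Degree of each vertex: the number of edges incident to it."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def modularity(edges, community_of):
    """Newman modularity of a given partition of a simple undirected graph."""
    m = len(edges)
    deg = degrees(edges)
    intra_edges = defaultdict(int)   # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)    # d_c: sum of degrees of the vertices in c
    for u, v in edges:
        if community_of[u] == community_of[v]:
            intra_edges[community_of[u]] += 1
    for node, d in deg.items():
        degree_sum[community_of[node]] += d
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy network: two triangles (A-B-C and D-E-F) joined by the single edge C-D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("D", "E"), ("D", "F"), ("E", "F"),
             ("C", "D")]
toy_partition = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}

deg = degrees(toy_edges)
q = modularity(toy_edges, toy_partition)
```

For this toy network the bridging vertices C and D have degree 3, and the partition into the two triangles gives Q = 5/14, about 0.357, reflecting dense connections inside each group and a single connection between them.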
3. Results and Discussion
Table 1 presents the results obtained on the Lattes Platform using the search string cited in the methodology. Comparing the results obtained with the same search terms in English and Portuguese, a greater number of CVs was retrieved in Portuguese in all cases. A possible explanation is that most of the CVs registered on the Lattes Platform belong to Brazilian researchers, who index their information on the platform in Portuguese.
As can be seen in the table, applying the filter “Brazilian and foreign doctors” reduced the number of CVs substantially, in some cases to about one quarter of those found initially. Adding the filter “CNPq Productivity Scholars” again produced a large decrease relative to the previous filter. Adding the filter “Presence in the Directory of Research Groups” left the number of CVs practically unchanged for all search terms. It was also observed that combining the search terms “cancer,” “oncology” and “antineoplastics” with the term “injectables” through the Boolean operator “AND” returned the same results in all cases.
Among the Boolean expressions used, “injectables AND neoplasia” was the one that returned the best results. It was therefore decided to analyse, with the help of the ScriptLattes program, the curricula of senior researchers retrieved with the search filters “Brazilian and foreign doctors with any level of productivity scholarship and presence in the Research Groups directory”, totaling 535 CVs. The complete results, with scientific and technological information on the competencies listed in the objective, are available at the following URL:
https://pesquisa.ufabc.edu.br/cientometria/oncologicos_injetaveis/
Figure 1 shows the main page generated by ScriptLattes in HTML format. The header of the page contains links to relevant information about the experts analyzed, such as bibliographic production, technical production, supervisions and projects, among others.
The results obtained show a large number of publications in the studied area. The bibliographic production of the specialists comprises papers, books and book chapters, among others. A total of 264,332 items were identified and extracted. It should be noted that, when the analyzed researchers have published in partnership, ScriptLattes extracts each publication of the analyzed group only once, that is, it does not duplicate the information. Therefore, some publications and/or technical products may be listed simultaneously under more than one of the senior specialists selected according to the methodology of this work. It is worth noting that ScriptLattes is constantly being updated, just like any other IT tool. This, however, does not invalidate the relevance of the core competences in the area of injectable oncology identified here.
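The idea of counting a co-authored publication only once can be sketched in Python as follows. The matching rule used here (same year and same title after removing accents, case and punctuation) is an assumption made for illustration; the actual matching procedure of ScriptLattes is not described in this work. Function names and records are hypothetical.

```python
import unicodedata


def normalize_title(title):
    """Lowercase a title and strip accents, punctuation and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    kept = "".join(ch if ch.isalnum() else " " for ch in without_accents.lower())
    return " ".join(kept.split())


def deduplicate(records):
    """Merge records with the same normalized title and year into one item."""
    unique = {}
    for researcher, title, year in records:
        key = (normalize_title(title), year)
        entry = unique.setdefault(
            key, {"title": title, "year": year, "listed_by": []}
        )
        if researcher not in entry["listed_by"]:
            entry["listed_by"].append(researcher)
    return list(unique.values())


# Hypothetical records: (researcher, title as typed in the CV, year).
records = [
    ("Researcher A", "Avaliação de fármacos injetáveis", 2019),
    ("Researcher B", "Avaliacao de Farmacos Injetaveis.", 2019),
    ("Researcher A", "A different study", 2020),
]
unique_items = deduplicate(records)
```

Here the first two records are treated as the same publication, which is counted once but remains associated with both researchers who listed it.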
Figure 2 shows the evolution of the number of papers published by experts in the field since 1981. The total number of published papers was 85,465. A substantial increase was observed in 2020, when a total of 5,234 manuscripts were published in that year alone. In addition, the production of each year can be accessed on this page, with the list of publications sorted alphabetically by article title. Further details of the scientific production of each specialist are available at: https://pesquisa.ufabc.edu.br/cientometria/oncologicos_injetaveis/PB-0.html
Regarding the technical production of the researchers analyzed with ScriptLattes, the numbers fluctuated over the years, adding up to a total of 24,068 items, comprising technological products (316), processes or techniques (421), technical works (11,186) and other types of technical production (12,145) (Figure 3). The first technical productions date back to 1982, with a total of 13 items. Again, the year 2020 stands out from the rest, with a total of 1,331 technical productions.
In the “Orientations” (supervisions) menu, it is possible to view the supervisions in progress and concluded by these researchers, including undergraduate and specialization final course projects (TCC), scientific initiation, master's, doctoral and postdoctoral supervisions, and supervisions of other kinds. When the searches were carried out on the Lattes Platform, there were a total of 4,999 supervisions in progress and 56,595 completed (Figure 4a). The number of completed supervisions increased over the years, reaching a maximum in 2016 (3,267), followed by a decrease in the following years. Among the supervisions in progress, doctoral students account for the largest number (2,152), while among the completed supervisions, scientific initiation students account for the largest number (15,446) (Figure 4b). The first completed supervisions among the analyzed researchers date back to 1982, totaling 33.
The results obtained show a large number of events held over the years, totaling 7,687 between 1984 and 2022. The number of events increased year after year, with emphasis on 2011, with a total of 175 events. From 2019 onwards, there was a significant drop in the number of events, from 168 to 116. This drop may be associated with the Covid-19 pandemic, first reported in December 2019 in China, during which many events were canceled or came to be held remotely through videoconferencing platforms such as Google Meet®, Microsoft Teams®, Zoom®, Cisco Webex® and GoToMeeting® (Agbehadji et al., 2020; Esakandari et al., 2020).
Regarding the projects implemented by the analyzed researchers, a total of 13,496 were found in the analyzed period. The number of projects implemented increased from 1983 (8) to 2010 (877), followed by a continuous drop in the following years. In 2021, only 221 projects had been implemented by these researchers. One possible explanation is the decrease in investment in education in Brazil, which fell by around 56% between 2014 and 2018 and has suffered even more significant cuts under the current government (Mazieiro, 2019; SENADO NOTÍCIAS, 2020). To give an idea of the size of the cuts, in October 2022 the current government blocked R$2.4 billion from the Ministry of Education (MEC) budget, affecting the activities of the ministry, which include federal institutes of education and universities (REVISTA GALILEU, 2022; Saldaña, 2022).
The geolocation map of senior injectable oncology specialists is shown in Figure 5. The green symbols on the map identify where each researcher works, that is, the professional address reported in their CV (Magalhães et al., 2020). Clicking on a symbol displays relevant information about that researcher, such as their name, the university or institute where they work and the address of their Lattes CV. As can be seen on the map, there is a greater concentration of these specialists in the southeastern region of the country, with emphasis on the state of São Paulo, which is the Brazilian state with the highest absolute number of public universities, present in at least 25 cities (FIA, 2019; Jornal da USP, 2019).
In addition to the analyses presented above, the ScriptLattes tool enables the graphical visualization of social networks through other computational tools such as Treecloud®, Gephi®, Cowo® and VOSviewer®, which are already integrated with ScriptLattes (Quoniam, Ferraz, & Alvares, 2014). The academic collaboration between the selected researchers was identified based on the publications co-authored by them (Figures 6 and 7).
Figure 6 shows the collaboration (co-authorship in articles) network, in which the vertices or nodes (535) represent the researchers and the edges or links (2,213) represent the co-authorship between the vertices. Both the size of the vertices and their color are associated with the number of connections of each researcher. The larger and darker the vertex, the more connected the researcher is in the network formed. To extract the information in Figure 6, the “degree centrality” metric was used, which corresponds to the number of edges incident to a vertex or, equivalently, the number of vertices adjacent to it (Giordano et al., 2015). The most connected vertex in the network has a degree centrality equal to 37, that is, it collaborates directly with 37 researchers identified in the context of this research. The researcher in question is Dr. Fernando de Queiroz Cunha, biologist and professor at the Faculdade de Medicina de Ribeirão Preto (Universidade de São Paulo - USP), CNPq Research Productivity Scholar (Level 1A), with a bibliographic production of 893 items. The second best-connected vertex has a degree centrality equal to 36 and corresponds to Dr. Geovanni Dantas Cassali, veterinarian and full professor at Universidade Federal de Minas Gerais (UFMG), CNPq Research Productivity Scholar (Level 1A), with a bibliographic production of 688 items.
For the analysis of Figure 7, the Modularity algorithm was used to determine communities or clusters based on the topology or configuration of the network. To facilitate the visualization of the communities formed in the collaboration network, it was decided to analyze only those researchers with a degree centrality greater than or equal to 13, resulting in 101 researchers present in 11 communities.
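The filtering step described above can be sketched in Python. The sketch keeps only the vertices whose degree in the full network reaches a threshold and then groups the remaining vertices by community label. It is an illustrative approximation and not Gephi®'s filter implementation; the community labels are assumed to have been computed beforehand, and the toy network uses a threshold of 3 rather than the value of 13 adopted in this study.

```python
from collections import Counter, defaultdict


def filter_by_degree(edges, community_of, min_degree):
    """Keep vertices with degree >= min_degree and group them by community."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Degree is measured on the full network, before any vertex is removed.
    kept = {node for node, d in degree.items() if d >= min_degree}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    communities = defaultdict(set)
    for node in kept:
        communities[community_of[node]].add(node)
    return kept, kept_edges, dict(communities)


# Toy network: two hubs (H1, H2), each with three collaborators, plus a link H1-H2.
toy_edges = [("H1", "a"), ("H1", "b"), ("H1", "c"),
             ("H2", "d"), ("H2", "e"), ("H2", "f"),
             ("H1", "H2")]
# Hypothetical community labels, assumed to come from a prior modularity run.
toy_communities = {"H1": 0, "a": 0, "b": 0, "c": 0,
                   "H2": 1, "d": 1, "e": 1, "f": 1}

kept, kept_edges, groups = filter_by_degree(toy_edges, toy_communities, min_degree=3)
```

In the toy network only the two hubs survive the filter, each in its own community, which mirrors how a high threshold leaves only the most connected researchers of each group visible.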
Thus, it can be inferred that 101 researchers each collaborate with at least 13 others (Figure 7). Other communities were detected in the network, but they are not represented in the graph owing to the chosen parameters. The largest community formed in the network is composed of 28 people, represented in orange and assigned to modularity class 1. The second largest group is composed of 21 people, represented in green and assigned to modularity class 11. The smallest group is composed of only 7 people, represented in light green and assigned to modularity class 3. Researchers with larger vertices in their communities are research leaders in their areas and carry out a large amount of collaborative research.
4. Conclusions
It is no longer news that humanity is living in the information age and faces the challenge of managing the knowledge generated every day. The exponentially growing volume of data generated both in academia and in organizations creates conditions never before seen or dealt with in history. Therefore, identifying, extracting and treating data, confidential or otherwise, has been the great challenge for companies, academia and decision makers in formulating strategies for their organizations.
In this sense, this work contributed to a reflection on this context for public health. Through the analysis carried out, it was possible to obtain essential and reliable information on specialists in injectable oncology in Brazil. The information made available in the form of graphs gives those interested in the subject a panoramic view of how research into injectable oncology has been carried out in the country over the years, and makes it possible to obtain indicators on some aspects of science in Brazil. Micro (individual researchers and groups of researchers) and macro (collaboration networks or representativeness of the area) analyses were carried out, showing what has been developed and published in Brazil in the areas of research, development and innovation (R,D&I).
With regard to the spatial location of specialists in injectable oncology, it is possible to state that they are mainly concentrated in the Southeast region (State of São Paulo). The contrast with the other states is clear. This finding suggests that the disparity in the location of specialists is linked to the larger number of universities in the Southeast region, as well as to the concentration of the country's economic strength in that region.
It is also worth highlighting the contribution of the analysis of co-authorship networks. A better understanding of these academic collaboration networks can help in formulating S,T&I (Science, Technology and Innovation) policies, since it supports policies for the regional deconcentration of research centers in the area and/or the granting of research funding to the most economically disadvantaged regions of Brazil.
As observed in other studies carried out with ScriptLattes, the presence of a specialist in the analyzed topic is essential, since they will have the expertise both to formulate the CV search strategy and to perform a critical analysis of the results provided by the crawler.
The problems faced by ScriptLattes in data extraction stem from the way researchers index information in the Lattes Platform, which makes it difficult for the program to retrieve it. Although the tool allows the extraction of a large volume of data, the process ends up being imprecise, since it depends on the specialists manually updating their intellectual production data on the Lattes Platform. As a result, the information is often out of date.
Another limitation arising from the Lattes database is that not all the information obtained can be regarded with full certainty as "core competencies" of the researchers in the present study. This occurs because the terms of the "Boolean expression" used to identify and extract the data may appear anywhere in the CV (loose and spread throughout it), and not necessarily in the same sentence, work and/or keyword. New versions of the ScriptLattes software should seek a solution to this limitation.
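The limitation can be illustrated with a short Python sketch that contrasts a document-level "AND" (all terms appear somewhere in the CV) with a stricter sentence-level "AND" (all terms appear in the same sentence). The substring matching and the naive sentence splitting are simplifying assumptions, the function names are hypothetical, and the sketch does not reproduce the search engine of the Lattes Platform.

```python
import re


def matches_anywhere(text, terms):
    """Document-level AND: every term occurs somewhere in the text."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def matches_in_same_sentence(text, terms):
    """Sentence-level AND: at least one sentence contains every term."""
    # Naive split on whitespace that follows ., !, ? or ; (an assumption).
    sentences = re.split(r"(?<=[.!?;])\s+", text)
    return any(matches_anywhere(sentence, terms) for sentence in sentences)


terms = ["injectables", "neoplasia"]

# Hypothetical CV excerpts for illustration only.
loose_cv = ("Published a study on injectables for veterinary use. "
            "Supervised a thesis on bone neoplasia.")
focused_cv = "Developed injectables for the treatment of neoplasia."

loose_anywhere = matches_anywhere(loose_cv, terms)
loose_same_sentence = matches_in_same_sentence(loose_cv, terms)
focused_same_sentence = matches_in_same_sentence(focused_cv, terms)
```

The first excerpt satisfies the document-level expression although the two terms refer to unrelated activities, whereas only the second excerpt satisfies the sentence-level test. A stricter criterion of this kind is one possible way to reduce false positives, at the cost of missing CVs that describe the topic across several sentences.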
Given this limitation, one proposal to solve the problem would be to normalize indexing in the Lattes Platform, that is, to standardize how specialists index terms and how institutions carry out their registrations in the Lattes database.
Therefore, it is recommended that administrators of the Lattes Platform, namely the National Council for Scientific and Technological Development (CNPq), of the Ministry of Science, Technology and Innovation, seek solutions for the standardization of indexing in the Lattes database. This action will increase strategic reliability for planning, management and policy formulation activities in the area of science, technology and innovation.
Thus, it can be inferred that the results obtained with ScriptLattes can support information management in the field of public health, particularly in injectable oncology, the area on which this study focused, facilitating and optimizing the use of information in daily actions and serving as a support tool in decision-making processes.
With regard to Health Management, the information generated using ScriptLattes will also enable funding agencies in the country to monitor the productivity of research in progress, as well as to anticipate its future results. Furthermore, based on the methodology used in this work, similar studies should be carried out, contributing to the theoretical framework of scientific and technological knowledge in the country.