GARDIAN, the first pan-CGIAR search engine for agricultural data, is intended as an important step toward bringing together valuable scientific knowledge generated across the CGIAR network and beyond. CGIAR's big data platform launched the latest version of GARDIAN in June 2019. The platform embraces the power of big data analytics, supporting CGIAR as it becomes a leader in generating actionable, data-driven insights. It aims to build capacity to generate and manage big data, and to comply with open access and open data principles in order to unlock important research and datasets. It also aims to empower researchers to strengthen their data analytical capacity by developing practical big data tools and services in a coordinated way, and to address critical gaps, both organisational and technical, expanding the horizon of CGIAR research. The platform is co-led by the International Center for Tropical Agriculture (CIAT) and the International Food Policy Research Institute (IFPRI).
“For data to be truly valuable, it has to be reusable. This is why we want to shift the conversation from discussing open data to why data also needs to be FAIR,” emphasises Medha Devare, author and architect of GARDIAN. “For example, you could have a PDF of an open dataset uploaded into a repository, but the question is what can you really do with that data? For that data to really power innovation, and to aggregate with other interesting or relevant datasets, you need it to be findable, accessible, interoperable and reusable. You need that data to be FAIR,” Devare explains.
For a resource to be findable, it needs persistent identifiers, rich metadata and good documentation. To be accessible, it has to have a clear usage license – ideally not restrictive – and provide access to both the metadata and the physical file. To be interoperable, it should follow industry standards for metadata and data, which involves using controlled vocabularies and semantic standards. Such standards enable the use of the same ‘language’ across different information resources, allowing for interpretation and aggregation at both a human and a machine level. Once a resource meets all three of these criteria, it can be considered reusable – and FAIR.
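The criteria above can be read as a simple checklist over a metadata record. The following is only a minimal sketch in Python, not how GARDIAN itself evaluates resources: the `Record` fields, the `fair_report` function and the tiny vocabulary are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""      # persistent identifier, e.g. a DOI or handle
    title: str = ""
    description: str = ""     # stands in for rich metadata and documentation
    license: str = ""         # usage license
    file_url: str = ""        # link to the actual data file
    keywords: list = field(default_factory=list)


def fair_report(record, vocabulary):
    """Check a record against the findable/accessible/interoperable criteria.

    `vocabulary` is an assumed controlled vocabulary (a set of lowercase terms).
    """
    report = {
        # Findable: persistent identifier plus descriptive metadata.
        "findable": bool(record.identifier and record.title and record.description),
        # Accessible: a stated license and access to the file itself.
        "accessible": bool(record.license and record.file_url),
        # Interoperable: every keyword comes from the controlled vocabulary.
        "interoperable": bool(record.keywords)
        and all(k.lower() in vocabulary for k in record.keywords),
    }
    # Reusable (FAIR) only when all three criteria hold together.
    report["reusable"] = all(report.values())
    return report
```

Under these assumptions, a dataset uploaded with only a title, a license and a file link would pass the accessibility check but fail the findable and interoperable checks, so it would not count as reusable.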
CGIAR Centers typically have two separate repositories: one for data and one for publications. Generally, the repositories are on different platforms that don’t speak to each other. GARDIAN allows users to quickly and easily find agricultural information across the 30-odd data and publications platforms of the 15 CGIAR Centers and 11 gene banks. It currently points to approximately 100,000 publications and 3,000 datasets. The tool will soon enable the discovery of resources beyond CGIAR as well.
“We needed a way for people to search across CGIAR Centers and beyond, using single or multiple keywords – such as gender, nutrition, or drought-tolerant maize – to identify resources that exist for that topic,” explains Devare.
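The kind of cross-repository keyword search Devare describes can be illustrated with a small Python sketch. It is a toy example rather than GARDIAN's actual implementation: the record layout, the sample entries and the `search` function are hypothetical, and the choice to require every keyword to match (rather than any of them) is an assumption made here for simplicity.

```python
def search(records, keywords):
    """Return records whose title or subject tags contain every keyword.

    Matching is a case-insensitive substring test; requiring all keywords
    to match is an assumption of this sketch.
    """
    terms = [k.lower() for k in keywords]
    hits = []
    for rec in records:
        text = " ".join([rec["title"], *rec["subjects"]]).lower()
        if all(term in text for term in terms):
            hits.append(rec)
    return hits


# Two made-up repositories that do not "speak to each other".
data_repo = [
    {"source": "data", "title": "Drought-tolerant maize field trials", "subjects": ["maize", "drought"]},
    {"source": "data", "title": "Household nutrition survey", "subjects": ["nutrition", "gender"]},
]
publications_repo = [
    {"source": "publications", "title": "Gender and maize adoption", "subjects": ["gender", "maize"]},
]

# Searching the combined list gives one view across both repositories.
combined = data_repo + publications_repo
maize_hits = search(combined, ["maize"])
gender_maize_hits = search(combined, ["gender", "maize"])
```

Here a single keyword such as `maize` returns records from both repositories, while adding `gender` narrows the result to the one publication that covers both topics.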