World Meteorological Organization

Date: 2022-06-15

Version: 0.1.0

Document location: TBD

Document status: DRAFT

Task Team on WIS Metadata (TT-WISMD)[1]

Expert Team on Metadata Standards (ET-Metadata)[2]

Standing Committee on Information Management and Technology (SC-IMT)[3]

Commission for Observation, Infrastructure and Information Systems (INFCOM)[4]

Copyright © 2021 World Meteorological Organization (WMO)

1. Subject

The subject of this document is the WMO WIS 2.0 Discovery Metadata exchange, harvesting and search pilot project. The document provides an overview of use cases, requirements, standards, architecture, testing and recommendations in support of modernizing WIS 2.0 metadata and search workflows.

2. Executive Summary

TODO

2.1. Contributors

All questions regarding this document should be directed to the editor or the contributors:

Name                            Affiliation
Tom Kralidis (editor)           Meteorological Service of Canada (MSC)
Jeremy Tandy                    UK Met Office
Steve Olson                     National Oceanic and Atmospheric Administration (NOAA)
Douglas Fils                    Consortium for Ocean Leadership
MetOcean Domain Working Group   Open Geospatial Consortium

3. References

  • OGC: OGC 20-004, OGC API - Records - Part 1: Core 1.0 (2021) [5]

  • IETF: RFC-7946 The GeoJSON Format (2016) [6]

4. Terms and definitions

4.1. Abbreviated terms

Table 1. Symbols and abbreviated terms
Abbreviation   Term
AMQP           Advanced Message Queuing Protocol
API            Application Programming Interface
DCAT           Data Catalog Vocabulary
DCPC           Data Collection and Production Centres
GIS            Geographic Information System
GISC           Global Information System Centre
GTS            Global Telecommunication System
HTML           Hypertext Markup Language
HTTP           Hypertext Transfer Protocol
HTTPS          Hypertext Transfer Protocol Secure
ISO            International Organization for Standardization
JSON           JavaScript Object Notation
MQTT           Message Queuing Telemetry Transport
NC             National Centre
NWP            Numerical Weather Prediction
OAI-PMH        Open Archives Initiative Protocol for Metadata Harvesting
OARec          OGC API - Records
OGC            Open Geospatial Consortium
REST           Representational State Transfer
ROA            Resource-oriented architecture
S3             Simple Storage Service
SEO            Search engine optimization
SOA            Service-oriented architecture
SRU            Search/Retrieval via URL
STAC           SpatioTemporal Asset Catalog
URL            Uniform Resource Locator
W3C            World Wide Web Consortium
WCMP           WMO Core Metadata Profile
WIS            WMO Information System
WMO            World Meteorological Organization
XML            eXtensible Markup Language

5. Introduction

WIS 1.0 discovery primarily comprises the WMO Core Metadata Profile for metadata, OAI-PMH for harvesting and SRU for search.

Current realities of the interfaces and encodings include:

  • use of XML for metadata description and utilization in web applications

  • based on an era of service-oriented architecture

  • overloading of web architecture principles

    • using HTTP as a tunnel

    • little to no use of HTTP status codes

    • large, monolithic standards and systems

    • not "of the web" or "webby"

    • challenging for web developers to implement

    • challenging for mass market integration (search engine optimization)

As a result, WIS and weather/climate/water data and services related to discovery and search should be improved to take advantage of current approaches and opportunities.

Weather/climate/water data is by nature geospatial and temporal. The W3C Data on the Web Best Practices [7] and Spatial Data on the Web Best Practices [8] provide guidelines on how best to publish spatiotemporal data in order to lower the barrier for users and to support search engine optimization and linked data.

The current evolution in data exchange standards, systems and architecture is grounded in the following:

  • Resource-oriented architecture (ROA)

  • Representational State Transfer (REST)

  • JSON and HTML as core web formats

Following this trend is the current evolution of OGC interface standards via OGC API [9], which is a clean break from legacy standards and implements APIs using core, broad industry approaches (W3C, OpenAPI, JSON, etc.).

OGC APIs are designed to be web developer friendly and are being developed with a minimal core and extension mechanism. Example:

  • Service-oriented: /api?request=GetFeature&typename=roads&featureid=5

  • Resource-oriented: /api/collections/roads/items/5
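
The contrast between the two styles can be sketched in code. The following is a minimal illustration only: the host `https://example.org` and both function names are assumptions made for this example and do not come from any specification.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org"  # hypothetical host, for illustration only


def service_oriented_url(typename: str, feature_id: int) -> str:
    """Build a URL where the operation and its arguments are query parameters."""
    query = urlencode(
        {"request": "GetFeature", "typename": typename, "featureid": feature_id}
    )
    return f"{BASE_URL}/api?{query}"


def resource_oriented_url(collection: str, item_id: int) -> str:
    """Build a URL where the path itself identifies the resource."""
    return f"{BASE_URL}/api/collections/{collection}/items/{item_id}"


print(service_oriented_url("roads", 5))
print(resource_oriented_url("roads", 5))
```

In the resource-oriented form the URL alone identifies the item, so it can be bookmarked, linked to and crawled without knowledge of a request vocabulary.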

5.1. Objectives

This project aims to experiment with implementing WMO discovery metadata using the OGC API - Records draft standard. It will also experiment with actionable linkages to demonstration project 1 (AMQP/MQTT), with search/access of collections of variables of NWP data, and with enabling search capability against WIS 2.0 topics.

6. WIS 2.0

6.1. Principles

WIS 2.0 puts forth the following principles (those applying to this pilot are in bold):

  • Principle 1: WIS 2.0 adopts Web technologies and leverages industry best practices and open standards

  • Principle 2: WIS 2.0 uses Uniform Resource Locators (URL) to identify resources

  • Principle 3: WIS 2.0 prioritizes use of public telecommunications networks (i.e. Internet) when publishing digital resources

  • Principle 4: WIS 2.0 requires provision of Web service(s) to access or interact with digital resources (e.g. data, information, products) published using WIS

  • Principle 5: WIS 2.0 encourages NCs and DCPCs to provide 'data reduction' services via WIS that process 'big data' to create results or products that are small enough to be conveniently downloaded and used by those with minimal technical infrastructure

  • Principle 6: WIS 2.0 will add open standard messaging protocols that use the publish-subscribe message pattern to the list of data exchange mechanisms approved for use within WIS and GTS

  • Principle 7: WIS 2.0 will require all services that provide real-time distribution of messages to cache/store the messages for a minimum of 24-hours, and allow users to request cached messages for download

  • Principle 8: WIS 2.0 will adopt direct data exchange between provider and consumer

  • Principle 9: WIS 2.0 will phase out the use of routing tables and bulletin headers

  • Principle 10: WIS 2.0 will provide a Catalogue containing metadata that describes both data and the service(s) provided to access that data

  • Principle 11: WIS 2.0 encourages data providers to publish metadata describing their data and Web services in a way that can be indexed by commercial search engines

6.2. User stories

As part of requirements gathering [10], the following user stories provide a description of features that are relevant to WIS 2.0 metadata and search, and are cast from a user perspective:

  • As an NWP center operator I want to quickly and easily publish information about the data that my centre provides and update it as needed in a (semi)automated way using the information that I already have in my vast databases so that I can concentrate on my core business

  • As the leader of a forecasting team of a national meteorological institution, I would like to be able to find more sources of data that might be relevant/useful for the work of my team, notably NWP and satellite imagery so that we could further improve our predictions. That should work for unprocessed outputs of a prediction model or a satellite as well as for services that offer more sophisticated access to the data, e.g. tailing

  • As an entrepreneur (start-up) that provides (wants to provide) tailored weather information I want to be able to find services (free or commercial) that provide meteorological data in a cloud or even better, provide customizable processing of such data - to be able to build my own service on top of it. And I want to be able to find out if a new such service appears or if an existing one changes its abilities so that my company can keep on advancing

  • As a software developer (working for a national met center or a private company), I would like to find a relevant technical description of the service (API) that my boss wants me to integrate with, so that the declared interoperability becomes reality

  • As a user I would like to search for real-time observations for a given time and geographical area of interest so that I can have up to date information on weather for my city

  • As a web developer I would like access to a search API that provides easy to read documentation, examples, and a simple, intuitive RESTful API with JSON so that I can integrate it into my web application quickly

  • As a GIS professional, I would like to search for weather/climate/water data from my GIS Desktop support tool so that I can integrate forecast data into my workflow

The following WIS 2.0 marketing video [11] adds the following user stories:

  • As an everyday user, I would like to find easy to understand and precise weather data so that I can plan to have people over for an outdoor BBQ on a nice day

  • As a smart home owner, I would like access to frequently updated data so that I can keep my smart home monitoring up to date

  • As a weather specialist, I would like to access weather data in native data formats and subscribe to product updates, so that I can provide tailor made weather services to my users

Given the above, WIS 2.0 serves a variety of users/actors, driving the need for low-barrier, ubiquitous and efficient discovery, visualization and access of weather/climate/water data (real-time, near real-time, archive, etc.).

7. Standards

The standards put forth in this pilot are a clean break from WIS 1.0 standards (OAI-PMH, SRU, ISO 19115/19139) in order to lower the barrier to entry/implementation for a vast range of users/actors. The WIS 1.0 standards for search and metadata, applying a Service-Oriented Architecture (SOA), while applicable for their purpose at the time, present the following challenges in current web architecture:

  • complex machinery and service provisioning is required for crawling/traversing resources

  • challenging for web developers to implement

  • challenging for mainstream Web integration

Current web architecture is primarily rooted in Representational State Transfer (REST) [12] which has the following features:

  • HTTP verbs (GET/PUT/POST/DELETE/OPTIONS/PATCH)

  • HTTP status codes (200, 201, 404, etc.)

  • URIs to identify resources

  • Content negotiation (media types)

  • Stateless

REST puts a focus on Resource Oriented Architecture (ROA). In addition, resources can have numerous representations (raw data, webpage, etc.).
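
To illustrate how one resource can have several representations, the following minimal sketch selects a representation from the `Accept` header and returns an HTTP status code with it. The resource content and the `render` function are hypothetical, and the matching is deliberately simplified (it ignores quality values such as `q=0.8`).

```python
import json

# Hypothetical resource used only for this illustration
RESOURCE = {"id": "obs-1", "title": "Surface weather observation"}


def render(resource: dict, accept: str) -> tuple[int, str, str]:
    """Return (status code, media type, body) for the requested media type."""
    if "application/json" in accept or "*/*" in accept:
        return 200, "application/json", json.dumps(resource)
    if "text/html" in accept:
        body = f"<html><body><h1>{resource['title']}</h1></body></html>"
        return 200, "text/html", body
    # 406 Not Acceptable: no available representation matches the Accept header
    return 406, "text/plain", "Not Acceptable"


status, media_type, body = render(RESOURCE, "text/html")
print(status, media_type)
```

The same URI thus serves JSON to machine clients and HTML to browsers, while the status code tells the client whether the request succeeded.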

The following standards are well positioned to meet WIS 2.0 principles, consistent with current web architecture and RESTful design patterns.

7.1. Data

Although this pilot is primarily focused on discovery metadata, it is important to outline core baseline standards that are used and / or extended throughout this pilot.

7.1.1. GeoJSON

GeoJSON (RFC 7946 [13]) is a format for encoding a variety of geographic data structures [14]. In the geospatial community, GeoJSON typically represents vector (or feature) data. GeoJSON is a dialect of JSON [15].

GeoJSON is an extremely popular and widely used format in GIS applications, web applications and mobile apps. GeoJSON also serves as a baseline for the development of downstream models and extensions, as described below.
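
As a brief illustration, a GeoJSON Feature pairs a geometry with free-form properties and can be produced with any JSON library. The station name, coordinates and property names below are made up for this example.

```python
import json

# A GeoJSON Feature: a geometry plus free-form properties.
# Coordinates are [longitude, latitude] per RFC 7946; the values are made up.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-75.7, 45.4]},
    "properties": {"name": "Example station", "air_temperature": 21.5},
}

# Features are commonly exchanged grouped in a FeatureCollection
feature_collection = {"type": "FeatureCollection", "features": [feature]}

# Round-trip through text, as would happen over HTTP
encoded = json.dumps(feature_collection)
decoded = json.loads(encoded)
print(decoded["features"][0]["properties"]["name"])
```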

7.2. Metadata

7.2.1. STAC (Catalog, Collection, Item)

The SpatioTemporal Asset Catalog (STAC) [16] specification provides a common language to describe a range of geospatial information, so it can more easily be indexed and discovered. A 'spatiotemporal asset' is any file that represents information about the earth captured in a certain space and time.

The core STAC specification provides the following metadata models:

  • Item: the lowest granularity of STAC metadata (we can imagine metadata about a single file)

  • Catalog: a list of Items or child Catalogs

  • Collection: collection level metadata
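
A STAC Item is itself a GeoJSON Feature with additional members. The sketch below builds a minimal Item for a single point-located file; the field names follow the STAC 1.0.0 Item structure as understood here, while the function name, identifier and asset URL are hypothetical. The STAC specification remains the authoritative reference for required fields.

```python
def make_stac_item(item_id: str, lon: float, lat: float,
                   datetime_utc: str, href: str) -> dict:
    """Build a minimal STAC Item (a GeoJSON Feature) describing one data file."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "bbox": [lon, lat, lon, lat],
        "properties": {"datetime": datetime_utc},
        "links": [],
        # The asset holds the link to the actual data file
        "assets": {"data": {"href": href}},
    }


item = make_stac_item(
    "example-granule-001",                   # hypothetical identifier
    -75.7, 45.4,
    "2022-06-15T00:00:00Z",
    "https://example.org/data/granule-001",  # hypothetical asset location
)
print(item["assets"]["data"]["href"])
```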

7.3. APIs

7.3.1. OpenAPI

The OpenAPI Specification (OAS) defines a standard, programming language-agnostic interface description for HTTP APIs, which allows both humans and computers to discover and understand the capabilities of a service without requiring access to source code, additional documentation, or inspection of network traffic. When properly defined via OpenAPI, a consumer can understand and interact with the remote service with a minimal amount of implementation logic. Similar to what interface descriptions have done for lower-level programming, the OpenAPI Specification removes guesswork in calling a service [17].
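
As a rough sketch of what "machine-readable interface description" means in practice, the snippet below holds a very small OpenAPI document as a Python dictionary and lists the operations it declares. The API title, path and parameters are assumptions for illustration and do not describe a real service.

```python
# A very small OpenAPI 3.0 document (hypothetical API, for illustration only)
openapi_doc = {
    "openapi": "3.0.3",
    "info": {"title": "Example discovery API", "version": "0.1.0"},
    "paths": {
        "/collections/{collectionId}/items": {
            "get": {
                "summary": "Search records in a collection",
                "parameters": [
                    {"name": "collectionId", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "q", "in": "query", "required": False,
                     "schema": {"type": "string"}},
                ],
                "responses": {"200": {"description": "Matching records"}},
            }
        }
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(doc: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, summary) for every operation the document declares."""
    operations = []
    for path, path_item in doc["paths"].items():
        for method, operation in path_item.items():
            if method in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("summary", "")))
    return operations


print(list_operations(openapi_doc))
```

A client can discover available operations and their parameters from the document alone, without access to the service's source code.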

7.3.2. OGC API - Records

OGC API - Records [18] (OARec) offers the capability to create, modify, and query metadata on the Web. The draft specification enables the discovery of geospatial resources by standardizing the way collections of descriptive information about the resources (metadata) are exposed. The draft specification also enables the discovery and sharing of related resources that may be referenced from geospatial resources or their metadata by standardizing the way all kinds of records are exposed and managed.

Consistent with all OGC API standards, OARec will provide core and extension functionality, to allow for modular specification development.

The primary unit of information in OARec is the Record, which is based on GeoJSON and provides a set of properties informed by catalogue schemas in earlier OGC CSW specifications and DCAT. The OARec Record is extensible and can be profiled.
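
The sketch below shows what a record and a search request might look like. It assumes the `q`, `bbox`, `datetime` and `limit` query parameters of the draft specification; the host, collection identifier, record property values and link relation are hypothetical, and the draft itself should be consulted for the normative record schema.

```python
from urllib.parse import urlencode

# Illustrative record: a GeoJSON Feature whose properties carry the metadata.
# Property names and values are assumptions for this sketch.
record = {
    "id": "example-nwp-air-temperature",
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]]],
    },
    "properties": {
        "type": "dataset",
        "title": "Example NWP air temperature",
        "description": "Global air temperature forecast (illustrative).",
    },
    "links": [{"rel": "data", "href": "https://example.org/data/air-temperature"}],
}


def build_search_url(base_url, collection_id, q=None, bbox=None,
                     datetime_range=None, limit=10):
    """Build a record search URL against a collection's items endpoint."""
    params = {"limit": limit}
    if q:
        params["q"] = q
    if bbox:
        # bbox is minx, miny, maxx, maxy, sent as a comma-separated string
        params["bbox"] = ",".join(str(v) for v in bbox)
    if datetime_range:
        params["datetime"] = datetime_range
    return f"{base_url}/collections/{collection_id}/items?{urlencode(params)}"


url = build_search_url("https://example.org", "discovery-metadata",
                       q="temperature", bbox=(-142, 42, -52, 84))
print(url)
```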

7.3.3. STAC API

STAC also provides an API specification that enables search of STAC items via OpenAPI, following OGC API - Features.

7.4. Considerations

  • WMO is conservative - it would be easier to adopt if STAC is endorsed as an OGC Community Standard

  • WMO likes stable standards; need to think about how we would manage evolution of the spec

  • WMO would likely snapshot a version of the STAC standard

Recommendation: mint versions of OARec and STAC and their relevant extensions for the WIS 2.0 architecture

8. Architecture

8.1. Considerations

Given the WIS 2.0 principles, requirements, standards and current design patterns, the following describes envisioned workflows of WIS 2.0 in the context of metadata search and harvesting.

We consider the following:

  • flexible metadata publishing mechanisms: providers need to be able to publish discovery metadata in the easiest and most efficient way possible

  • basic, HTTPS crawlable metadata files (filesystem, object storage). For example, publishing discovery metadata as JSON files to an S3 bucket, and then making that bucket available for harvesting and traversal to search engines and metadata harvesters

  • the browser as the catalogue: here, browsers utilize mass market search engines as the gateway to low barrier discovery. This pattern works without a dedicated WIS catalogue per se, and also means discovery metadata would not need to be duplicated/harvested across each GISC, with the idea that search engines will harvest from the closest point to the authoritative source

  • the canonical WIS catalogue: given WIS 2.0 will not (and should not!) control or enforce search engine behaviour, there is a need to provide a resource discovery experience without the various intricacies or "value add" typically provided by search engines. Hence a canonical WIS 2.0 Catalogue will be recognized by WMO and approved by permanent representatives (PRs). The canonical WIS 2.0 Catalogue would provide a search API and metadata encodings consistent with the standards described in the Standards section, and could therefore equally be harvested by search engines.

The main difference between the above options is whether providers publish descriptions of resources via basic methods or APIs.

In either option, providing HTML will facilitate search engine integration, while lower level encodings/formats will facilitate machine to machine workflow. In addition, either option will be required for NCs and DCPCs to publish their discovery metadata for GISCs to harvest into the canonical WIS Catalogue.

8.1.1. Resource-oriented architecture

In alignment with RESTful patterns, it is important to note that data is the "first class citizen". Operations on the data (e.g. services/APIs) are secondary. Following standards-based practices (such as the OGC API efforts), metadata records about data will reference services related to those data. In this approach, the user first finds the data they are interested in, then can bind to those services via the associated link relations.

8.1.2. Data granularity

In order to provide a WIS 2.0 catalogue of value, it is important to clarify the granularity levels at which providers are to provide metadata. Clarifying granularity will reduce catalogue "pollution" and bring the user closer to the data they are looking for, as well as clarify the appropriate metadata standard for the provider to implement.

  • discovery metadata: OARec record model / STAC

  • station metadata: WIGOS Metadata Standard

  • observation metadata: OGC O&M

The discovery metadata workflow below illustrates example metadata publication and discovery workflows for common meteorological data types:

Figure 1. Discovery metadata workflow

  • collection (model): NWP model (OARec record metadata). Example: Canada GDPS

    • collection (variable): NWP model output by forecast variable (including vertical levels) (OARec record metadata). Example: Canada GDPS air temperature

    • product options:

      • API endpoint to interrogate the data/variable

      • x/y/z/t (granule) (STAC Item with link to actual data asset)

  • collection (observations): surface weather observations (OARec record metadata)

    • station metadata as WIGOS metadata via WMO OSCAR/Surface

    • product options:

      • API endpoint to interrogate the data

      • x/y/z/t (granule) (STAC Item, with link to actual data asset)

  • collection (product): METAR

    • product options:

      • API endpoint to interrogate the data

      • product: single message (granule) (STAC Item with link to actual data asset)

GISC catalogues can harvest from NCs and/or DCPCs with the following options:

  • harvest all metadata with clear data type identification

  • harvest only discovery metadata

8.1.3. Publication

Discovery metadata publishing can be provided via two approaches: a static catalogue or API provisioning. In both cases, the metadata model and content are identical, but the means to access and interrogate the metadata vary as described below.

8.1.3.1. Static catalogue publication
  • provide discovery metadata as basic, HTTPS crawlable metadata files (filesystem, object storage). For example, publishing discovery metadata as JSON files to an S3 bucket, and then making that bucket available for harvesting and traversal to search engines and metadata harvesters

  • optionally provide lower granularity metadata (i.e. products) in JSON for traversal

  • optionally augment all metadata by providing alternate representations in HTML with SEO constructs such as schema.org [19] and/or JSON-LD [20]

  • register a basic catalogue landing page to GISC for harvest/ingest
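
The static approach can be sketched as follows: each record is written as a JSON file, and a landing document links to every file so that a crawler can traverse the catalogue. The file layout, the `catalogue.json` name, the catalogue identifier and the link relation are assumptions for this example; the resulting directory could then be served over HTTPS or synchronized to object storage.

```python
import json
from pathlib import Path


def publish_static_catalogue(records: list[dict], out_dir: str) -> Path:
    """Write one JSON file per record plus a landing file linking to them all."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    links = []
    for record in records:
        filename = f"{record['id']}.json"
        (out / filename).write_text(json.dumps(record, indent=2), encoding="utf-8")
        # Relative links keep the catalogue portable across hosts and buckets
        links.append({"rel": "item", "href": filename, "type": "application/geo+json"})

    # Hypothetical landing document; this is the resource registered with a GISC
    catalogue = {"id": "example-catalogue", "links": links}
    landing_page = out / "catalogue.json"
    landing_page.write_text(json.dumps(catalogue, indent=2), encoding="utf-8")
    return landing_page
```

Calling `publish_static_catalogue(records, "public/")` would leave a directory that needs no running service beyond a static file host.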

8.1.3.2. API provisioning
  • provide discovery metadata in JSON

  • make discovery metadata available via search API, providing predicates on which to query and filter metadata collections

  • register API landing page to GISC for harvest/ingest
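
What "predicates on which to query and filter" could mean is sketched below with an in-memory search over records, using a free-text predicate and a bounding box predicate. The record content and function names are hypothetical; a real deployment would delegate this to a database or search backend.

```python
def bbox_intersects(a, b) -> bool:
    """True if two [minx, miny, maxx, maxy] boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, q=None, bbox=None):
    """Return the records matching every predicate that was supplied."""
    results = []
    for record in records:
        props = record["properties"]
        text = (props["title"] + " " + props.get("description", "")).lower()
        if q and q.lower() not in text:
            continue
        if bbox and not bbox_intersects(record["bbox"], bbox):
            continue
        results.append(record)
    return results


# Hypothetical records for illustration
RECORDS = [
    {"id": "r1", "bbox": [-142, 42, -52, 84],
     "properties": {"title": "Surface weather observations"}},
    {"id": "r2", "bbox": [-180, -90, 180, 90],
     "properties": {"title": "Global air temperature forecast"}},
    {"id": "r3", "bbox": [5, 47, 15, 55],
     "properties": {"title": "Regional air temperature analysis"}},
]

print([r["id"] for r in search(RECORDS, q="air temperature", bbox=[0, 40, 20, 60])])
```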

8.1.4. GISC harvesting (aggregation)

In either case, GISCs could support push or pull mechanisms for metadata ingest/harvesting:

  1. pull: traversing the catalogue or API resources and ingesting metadata into the GISC local catalogue

  2. push: subscribing to a given NC/DCPC for notifications of resource updates

We also have to consider harvesting depth (hops) as an option for providers to specify when registering their resource catalogues to a GISC.
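
A pull harvester with a configurable depth can be sketched as a breadth-first traversal that stops following links after a given number of hops. The document layout (`records` and `children` keys), the sample catalogue and the injected `fetch` callable are assumptions for this example; in practice `fetch` would perform an HTTPS request.

```python
from collections import deque

# Hypothetical catalogue: URL -> document with records and links to children
SAMPLE_CATALOGUE = {
    "root": {"records": ["r0"], "children": ["a", "b"]},
    "a": {"records": ["r1"], "children": ["c"]},
    "b": {"records": ["r2"]},
    "c": {"records": ["r3"], "children": ["root"]},  # link back to the root
}


def harvest(start_url, fetch, max_hops):
    """Collect records breadth-first, following links at most max_hops deep."""
    harvested = []
    seen = {start_url}  # guards against cycles and duplicate visits
    queue = deque([(start_url, 0)])
    while queue:
        url, hops = queue.popleft()
        document = fetch(url)
        harvested.extend(document.get("records", []))
        if hops == max_hops:
            continue  # depth limit reached: do not follow this document's links
        for child in document.get("children", []):
            if child not in seen:
                seen.add(child)
                queue.append((child, hops + 1))
    return harvested


print(harvest("root", SAMPLE_CATALOGUE.__getitem__, max_hops=1))
```

A provider registering with `max_hops=0` would expose only the records on its landing resource, while larger values let the GISC traverse nested catalogues.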

The primary means of GISC harvesting would be via OARec (API) or basic catalogue crawling. A third option is possible subject to locally agreed upon arrangements between providers and GISCs.

GISCs will provide an OARec endpoint to enable users to search all content provided by the local GISC or content from other GISCs. The aggregation will facilitate a comprehensive search of all WIS 2.0 resources as harvested by/between given GISCs.

There is another option to consider: distributed search. Here, metadata stays with each relevant GISC and search requests perform real-time searches against remote GISCs. While this results in simplified metadata management, it also presents issues concerning network latency/failure, as well as providing meaningful sets of search results (each GISC would potentially have various relevance algorithms depending on their Catalogue tooling or backend database or document store). This requires further discussion.
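
The trade-offs of distributed search can be made concrete with a small sketch. Each GISC is represented by a hypothetical callable that returns scored hits or raises on network failure; the names, scores and result layout are assumptions for illustration.

```python
def distributed_search(query, gisc_clients):
    """Query every GISC, tolerate failures, and merge the hits."""
    merged, failed = [], []
    for name, client in gisc_clients.items():
        try:
            hits = client(query)
        except (ConnectionError, TimeoutError):
            failed.append(name)  # report partial results rather than failing outright
            continue
        for hit in hits:
            merged.append({**hit, "source": name})
    # Scores come from different engines and are not directly comparable;
    # sorting on them is shown only to make that problem visible.
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged, failed


# Hypothetical GISC clients
def gisc_a(query):
    return [{"id": "a1", "score": 0.9}, {"id": "a2", "score": 0.2}]


def gisc_b(query):
    return [{"id": "b1", "score": 12.0}]  # a different scoring scale


def gisc_down(query):
    raise TimeoutError("no response")


hits, failed = distributed_search(
    "temperature", {"gisc-a": gisc_a, "gisc-b": gisc_b, "gisc-down": gisc_down}
)
print([hit["id"] for hit in hits], failed)
```

Here the hit from `gisc-b` ranks first purely because its engine scores on a larger scale, and one GISC contributes nothing because it timed out, which illustrates both concerns raised above.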

8.1.6. Standards implementation approach


Annex A: Revision History

Date         Release    Editor         Primary clauses modified   Description
2021-06-23   Template   Tom Kralidis   all                        initial template



1. https://community.wmo.int/governance/commission-membership/commission-observation-infrastructures-and-information-systems-infcom/commission-infrastructure-officers/infcom-management-group/standing-committee-information-management-and-technology-sc-imt/expert-team-metadata-0
2. https://community.wmo.int/governance/commission-membership/commission-observation-infrastructures-and-information-systems-infcom/commission-infrastructure-national-representatives/infcom-management-group/standing-committee-information-management-and-technology-sc-imt/et-metadata
3. https://community.wmo.int/governance/commission-membership/commission-observation-infrastructures-and-information-systems-infcom/commission-infrastructure-officers/infcom-management-group/standing-committee-information-management-and-technology-sc-imt
4. https://community.wmo.int/governance/commission-membership/infcom
5. https://docs.ogc.org/DRAFTS/20-004.html
6. https://datatracker.ietf.org/doc/html/rfc7946
7. https://www.w3.org/TR/dwbp
8. https://www.w3.org/TR/sdw-bp
9. https://ogcapi.org
10. https://github.com/wmo-im/wcmp/issues/107
11. https://gisc.dwd.de/wis2.0/WIS_2.0_final.mp4
12. https://datatracker.ietf.org/doc/html/rfc7231
13. https://datatracker.ietf.org/doc/html/rfc7946
14. https://geojson.org
15. https://www.json.org
16. https://stacspec.org
17. https://www.openapis.org
18. https://ogcapi.ogc.org/records
19. https://schema.org
20. https://json-ld.org