Table of Contents
WIS2 in a box training
WIS2 in a box (wis2box) is a Free and Open Source (FOSS) Reference Implementation of a WMO WIS2 Node. The project provides a plug and play toolset to ingest, process, and publish weather/climate/water data using standards-based approaches in alignment with the WIS2 principles. wis2box also provides access to all data in the WIS2 network. wis2box is designed to have a low barrier to entry for data providers, providing enabling infrastructure and services for data discovery, access, and visualization.
This training provides step-by-step explanations of various aspects of the wis2box project as well as a number of exercises to help you publish and download data from WIS2. The training is provided in the form of overview presentations as well as hands-on practical exercises.
Participants will be able to work with sample test data and metadata, as well as integrate their own data and metadata.
This training covers a wide range of topics (install/setup/configuration, publishing/downloading data, etc.).
Goals and learning outcomes
The goals of this training are to become familiar with the following:
- WIS2 architecture core concepts and components
- data and metadata formats used in WIS2 for discovery and access
- wis2box architecture and environment
- wis2box core functions:
    - metadata management
    - data ingest and transformation to BUFR format
    - MQTT broker for WIS2 message publishing
    - HTTP endpoint for data download
    - API endpoint for programmatic access to data
Navigation
The left hand navigation provides a table of contents for the entire training.
The right hand navigation provides a table of contents for a specific page.
Prerequisites
Knowledge
- Basic Linux commands (see the cheatsheet)
- Basic knowledge of networking and Internet protocols
Software
This training requires the following tools:
- An instance running Ubuntu OS (provided by WMO trainers during local training sessions); see Accessing your student VM
- SSH client to access your instance
- MQTT Explorer on your local machine
- An SCP/FTP client (such as WinSCP) to copy files between your local machine and your instance
Conventions
Question
A section marked like this invites you to answer a question.
Also you will notice tips and notes sections within the text:
Tip
Tips share help on how to best achieve tasks.
Note
Notes provide additional information on the topic covered by the practical session, as well as how to best achieve tasks.
Examples are indicated as follows:
Configuration
Snippets which need to be typed on a terminal/console are indicated as:
echo 'Hello world'
Container names (running images) are denoted in bold.
Training location and materials
This training is always available online at https://training.wis2box.wis.wmo.int.
The training contents, wiki and issue tracker are managed on GitHub at https://github.com/wmo-im/wis2box-training.
Printing the material
This training can be exported to PDF. To save or print this training material, go to the print page, and select File > Print > Save as PDF.
Exercise materials
Exercise materials can be downloaded from the exercise-materials.zip zipfile.
Support
For issues/bugs/suggestions or improvements/contributions to this training, please use the GitHub issue tracker.
All wis2box bugs, enhancements and issues can be reported on GitHub.
For additional support or questions, please contact wis@wmo.int.
The wis2box core documentation can always be found at https://docs.wis2box.wis.wmo.int.
Contributions are always encouraged and welcome!
Practical sessions
Connecting to WIS2 over MQTT
Learning outcomes
By the end of this practical session, you will be able to:
- connect to the WIS2 Global Broker using MQTT Explorer
- review the WIS2 topic structure
- review the WIS2 notification message structure
Introduction
WIS2 uses the MQTT protocol to advertise the availability of weather/climate/water data. The WIS2 Global Broker subscribes to all WIS2 Nodes in the network and republishes the messages it receives. The Global Cache subscribes to the Global Broker, downloads the data in the message and then republishes the message on the cache topic with a new URL. The Global Discovery Catalogue publishes discovery metadata from the Broker and provides a search API.
As part of the WIS2 Pilot Phase in 2023, two Global Brokers are available (hosted by China Meteorological Administration and Météo-France).
This is an example of the WIS2 notification message structure for a message received on the topic origin/a/wis2/arg/sabm/data/core/weather/surface-based-observations/synop:
{
  "id": "fa587559-b02e-40a2-9fd5-2c141c39b130",
  "type": "Feature",
  "version": "v04",
  "geometry": {
    "type": "Point",
    "coordinates": [-56.625, -64.24139, 208]
  },
  "properties": {
    "data_id": "wis2/arg/sabm/data/core/weather/surface-based-observations/synop/WIGOS_0-20000-0-89055_20230926T190000",
    "datetime": "2023-09-26T19:00:00Z",
    "pubtime": "2023-09-26T18:48:51Z",
    "integrity": {
      "method": "sha512",
      "value": "5a239b5d2ba7a04bd3b2fa44a73a9fb98167d0d4424d7fd19c0a83c75a7715212e2b97b5db6581bb3e1895b92232614d4cc4841ee9164baf47183c8403668dbd"
    },
    "wigos_station_identifier": "0-20000-0-89055"
  },
  "links": [
    {
      "rel": "canonical",
      "type": "application/x-bufr",
      "href": "http://w2b.smn.gov.ar/data/2023-09-26/wis/arg/sabm/data/core/weather/surface-based-observations/synop/WIGOS_0-20000-0-89055_20230926T190000.bufr4",
      "length": 254
    },
    {
      "rel": "via",
      "type": "text/html",
      "href": "https://oscar.wmo.int/surface/#/search/station/stationReportDetails/0-20000-0-89055"
    }
  ]
}
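The fields of this message can be inspected programmatically. A minimal Python sketch, using a trimmed copy of the example message above, that pulls out the publication time and the canonical download link:

```python
# Trimmed copy of the example WIS2 notification message above
message = {
    "id": "fa587559-b02e-40a2-9fd5-2c141c39b130",
    "properties": {
        "data_id": "wis2/arg/sabm/data/core/weather/surface-based-observations/synop/WIGOS_0-20000-0-89055_20230926T190000",
        "datetime": "2023-09-26T19:00:00Z",
        "pubtime": "2023-09-26T18:48:51Z"
    },
    "links": [
        {"rel": "canonical", "type": "application/x-bufr",
         "href": "http://w2b.smn.gov.ar/data/2023-09-26/wis/arg/sabm/data/core/weather/surface-based-observations/synop/WIGOS_0-20000-0-89055_20230926T190000.bufr4"},
        {"rel": "via", "type": "text/html",
         "href": "https://oscar.wmo.int/surface/#/search/station/stationReportDetails/0-20000-0-89055"}
    ]
}

# The canonical link points to the actual data file
canonical = next(link for link in message["links"] if link["rel"] == "canonical")
print(message["properties"]["pubtime"])  # 2023-09-26T18:48:51Z
print(canonical["href"])
```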
In this practical session you will learn how to use the MQTT Explorer tool to review the topics available on a Global Broker and to display WIS2 notification messages.
MQTT Explorer is a helpful tool to browse and review the topic structure of a given MQTT broker and to interact visually with the MQTT protocol. Numerous other MQTT client and server implementations exist, depending on your requirements and technical environment.
To work with MQTT programmatically (for example, in Python), you can use MQTT client libraries such as paho-mqtt to connect to an MQTT broker and process incoming messages.
Using MQTT Explorer to connect to the Global Broker
One way to view messages published by the Global Broker is to use MQTT Explorer, which can be downloaded from the MQTT Explorer website.
Open MQTT Explorer and add a new connection to the Global Broker hosted by China Meteorological Administration using the following details:
- host: gb.wis.cma.cn
- port: 8883
- username: everyone
- password: everyone
Click on the 'ADVANCED' button and add the following topics to subscribe to:
origin/#
cache/#
Note
When setting up MQTT subscriptions you can use the following wildcards:
- Single-level (+): a single-level wildcard replaces one topic level
- Multi-level (#): a multi-level wildcard replaces multiple topic levels
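The matching rules can be illustrated with a small sketch. This helper is not part of any MQTT library; it is written here just to mimic how a broker matches a topic against a subscription filter:

```python
def topic_matches(filter_, topic):
    """Return True if topic matches the subscription filter (+/# wildcards)."""
    f_levels = filter_.split('/')
    t_levels = topic.split('/')
    for i, level in enumerate(f_levels):
        if level == '#':          # multi-level: matches everything from here on
            return True
        if i >= len(t_levels):    # filter is longer than the topic
            return False
        if level not in ('+', t_levels[i]):  # '+' matches any single level
            return False
    return len(f_levels) == len(t_levels)

print(topic_matches('origin/#', 'origin/a/wis2/zmb'))            # True
print(topic_matches('origin/a/wis2/+', 'origin/a/wis2/zmb'))     # True
print(topic_matches('origin/a/wis2/+', 'origin/a/wis2/zmb/xyz')) # False
```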
Click 'BACK', then 'SAVE' to save your connection and subscription details. Then click 'CONNECT':
Messages should start appearing in your MQTT Explorer session as follows:
You are now ready to start exploring the WIS2 topics and message structure.
Exercise 1: Review the WIS2 topic structure
Use MQTT Explorer to browse the topic structure under the origin and cache topics.
Question
How can we distinguish the originating country providing the data?
Click to reveal answer
We can distinguish the originating country by looking at the fourth level of the topic structure. For example, the following topic:
origin/a/wis2/zmb/zambia_met_service/data/core/weather/surface-based-observations/synop
tells us that the data was published by Zambia (zmb).
The fifth level of the topic structure provides the id of the centre where the data originated from (in this case, the Zambia Meteorological Service).
Question
How can we distinguish the data type?
Click to reveal answer
We can distinguish the data type by looking at the ninth level of the topic structure. For example, the following topic:
origin/a/wis2/zmb/zambia_met_service/data/core/weather/surface-based-observations/synop
tells us that the data type is surface-based observations (synop).
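Both answers come down to splitting the topic on /. A minimal sketch:

```python
topic = 'origin/a/wis2/zmb/zambia_met_service/data/core/weather/surface-based-observations/synop'
levels = topic.split('/')

country = levels[3]     # fourth level: originating country
centre_id = levels[4]   # fifth level: originating centre
data_type = levels[8]   # ninth level: data type
print(country, centre_id, data_type)
# zmb zambia_met_service surface-based-observations
```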
Exercise 2: Review the WIS2 message structure
Disconnect from MQTT Explorer and update the 'Advanced' section to change the subscription to the following:
origin/a/wis2/+/+/data/core/weather/surface-based-observations/synop
cache/a/wis2/+/+/data/core/weather/surface-based-observations/synop
Note
The + wildcard is used to subscribe to all countries (fourth level) and centres (fifth level), while the remaining topic structure is fixed to ensure we subscribe to subtopics data/core/weather/surface-based-observations/synop.
Messages should start to appear again.
You can view the content of the WIS2 message in the "Value" section on the right hand side.
Question
How can we identify the timestamp that the data was published? And how can we identify the timestamp that the data was collected?
Click to reveal answer
The timestamp that the data was published is contained in the properties section of the message, with a key of pubtime.
The timestamp that the data was collected is contained in the properties section of the message, with a key of datetime.
Question
How can we download the data from the URL provided in the message?
Click to reveal answer
The URL is contained in the links section with rel="canonical" and defined by the href key.
You can copy the URL and paste it into a web browser to download the data.
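After downloading (for example with urllib.request.urlretrieve in Python), you can check the file against the integrity section of the notification message. A hedged sketch: the example message above carries a hex-encoded sha512 digest, although notifications may also use base64 encoding, so both are accepted here:

```python
import base64
import hashlib

def verify_integrity(data: bytes, method: str, value: str) -> bool:
    """Check downloaded bytes against the integrity block of a notification."""
    digest = hashlib.new(method, data).digest()
    return value in (digest.hex(), base64.b64encode(digest).decode())

# Illustration with local bytes instead of a real download:
payload = b'Hello world'
checksum = hashlib.sha512(payload).hexdigest()
print(verify_integrity(payload, 'sha512', checksum))      # True
print(verify_integrity(b'corrupted', 'sha512', checksum)) # False
```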
Exercise 3: cache vs origin topics
The same message is published on both the origin and cache topics. Find a message that has been published on both topics and compare the content.
Question
What is the difference between the messages published on the origin and cache topics?
Click to reveal answer
The message published on the origin topic contains a URL to the original data. The message published on the cache topic contains a URL to the data cached by the Global Cache.
Conclusion
Congratulations!
In this practical session, you learned:
- how to subscribe to WIS2 Global Broker services using MQTT Explorer
- the WIS2 topic structure
- the WIS2 notification message structure
- the difference between Global Broker messages published on the origin and cache topics
Accessing your student VM
Learning outcomes
By the end of this practical session, you will be able to:
- access your student VM over SSH and WinSCP
- verify the required software for the practical exercises is installed
- verify you have access to exercise materials for this training on your local student VM
Introduction
As part of locally run wis2box training sessions, you can access your personal student VM on the local training network named "WIS2-training".
Your student VM has the following software pre-installed:
- Ubuntu 22.04.3 LTS (ubuntu-22.04.3-live-server-amd64.iso)
- Python 3.10.12
- Docker 24.0.6
- Docker Compose 2.21.0
- Text editors: vim, nano
Note
If you want to run this training outside of a local training session, you can provide your own instance using any cloud provider, for example:
- GCP (Google Cloud Platform): VM instance e2-medium
- AWS (Amazon Web Services): EC2 instance t3a.medium
- Azure (Microsoft): Azure Virtual Machine standard_b2s
Select Ubuntu Server 22.04 LTS as the OS.
After creating your VM, ensure you have installed Python, Docker and Docker Compose, as described at wis2box-software-dependencies.
The release archive for wis2box used in this training can be downloaded as follows:
wget https://github.com/wmo-im/wis2box/releases/download/1.0b5/wis2box-setup-1.0b5.zip
unzip wis2box-setup-1.0b5.zip
You can always find the latest 'wis2box-setup' archive at https://github.com/wmo-im/wis2box/releases.
The exercise material used in this training can be downloaded as follows:
wget https://training.wis2box.wis.wmo.int/exercise-materials.zip
unzip exercise-materials.zip
The following additional Python packages are required to run the exercise materials:
pip3 install minio
pip3 install pywiscat
pip3 install pywis-pubsub
If you are using the student VM provided during local WIS2 training sessions, the required software will already be installed.
Connect to your student VM on the local training network
Use the following configuration to connect your PC to the local Wi-Fi network broadcast in the room during WIS2 training:
- SSID: WIS2-training
- password: dataismagic!
Use an SSH client to connect to your student VM using the following:
- Host: (provided during in-person training)
- Port: 22
- Username: (provided during in-person training)
- Password: wis2training (default password to be changed after logging in)
Tip
Contact a trainer if you are unsure about the hostname/username or have issues connecting.
Once connected, please change your password to ensure others cannot access your VM:
limper@student-vm:~$ passwd
Changing password for limper.
Current password:
New password:
Retype new password:
passwd: password updated successfully
Verify software versions
To be able to run wis2box, the student VM should have Python, Docker and Docker Compose pre-installed.
Check Python version:
python3 --version
Python 3.10.12
Check docker version:
docker --version
Docker version 24.0.6, build ed223bc
Check Docker Compose version:
docker compose version
Docker Compose version v2.21.0
To ensure your user can run Docker commands, your user has been added to the docker group.
To verify that your user can run Docker commands, run the 'hello-world' container:
docker run hello-world
returns:
Hello from Docker!
This message shows that your installation appears to be working correctly.
...
Inspect the exercise materials
Inspect the contents of your home directory; these are the materials used as part of the training and practical sessions.
ls ~/
exercise-materials wis2box-1.0b5
You can use WinSCP to connect to your instance and inspect the contents of your home directory and download or upload files between your VM and your local PC.
Open WinSCP and click on the "New Site". You can create a new SCP connection to your VM as follows:
Click 'Save' and then 'Login' to connect to your VM.
And you should be able to see the following content:
Conclusion
Congratulations!
In this practical session, you learned how to:
- access your student VM over SSH and WinSCP
- verify the required software for the practical exercises is installed
- verify you have access to exercise materials for this training on your local student VM
Initializing wis2box
Learning outcomes
By the end of this practical session, you will be able to:
- run the wis2box-create-config.py script to create the initial configuration
- start wis2box and check the status of its components
- access the wis2box-webapp, API, MinIO UI and Grafana dashboard in a browser
- connect to the local wis2box-broker using MQTT Explorer
Note
The current training materials are using wis2box-1.0b5.
See accessing-your-student-vm for instructions on how to download and install the wis2box software stack if you are running this training outside of a local training session.
Preparation
Login to your designated VM with your username and password and ensure you are in the wis2box-1.0b5 directory:
cd ~/wis2box-1.0b5
wis2box-create-config.py
The wis2box-create-config.py script can be used to create the initial configuration of your wis2box.
It will ask you a set of questions to help set up your configuration.
You will be able to review and update the configuration files after the script has completed.
Run the script as follows:
python3 wis2box-create-config.py
wis2box-data directory
The script will ask you to enter the directory where your configuration and data will be stored.
We recommend you use the directory wis2box-data in your home directory to store your configuration and data.
Note that you need to define the full path to this directory.
For example, if your username is mlimper, the full path to the directory is /home/mlimper/wis2box-data:
mlimper@student-vm-mlimper:~/wis2box-1.0b5$ python3 wis2box-create-config.py
Please enter the directory on the host where wis2box-configuration-files are to be stored:
/home/mlimper/wis2box-data
Configuration-files will be stored in the following directory:
/home/mlimper/wis2box-data
Is this correct? (y/n/exit)
y
The directory /home/mlimper/wis2box-data has been created.
wis2box URL
Next, you will be asked to enter the URL for your wis2box. This is the URL that will be used to access the wis2box web application, API and UI.
Please use http://<your-hostname> as the URL. Remember that your hostname is <your-username>.wis2.training.
Please enter the URL of the wis2box:
For local testing the URL is http://localhost
To enable remote access, the URL should point to the public IP address or domain name of the server hosting the wis2box.
http://mlimper.wis2.training
The URL of the wis2box will be set to:
http://mlimper.wis2.training
Is this correct? (y/n/exit)
STORAGE and BROKER passwords
You can use the option of random password generation when prompted for WIS2BOX_STORAGE_PASSWORD and WIS2BOX_BROKER_PASSWORD, or define your own.
Don't worry about remembering these passwords; they will be stored in the wis2box.env file in your wis2box-1.0b5 directory.
Country code and centre-id
Next you will be asked for the 3-letter ISO code for your country and centre-id for your wis2box. The centre-id can be a string of your choosing for the purpose of this training.
Please enter your 3-letter ISO country code:
nld
Please enter the centre-id for your wis2box:
maaike_test
The country-code will be set to:
nld
The centre-id will be set to:
maaike_test
Is this correct? (y/n/exit)
Discovery metadata
Next you will answer a set of questions to generate discovery metadata templates for your wis2box. The answers do not need to be correct for the purpose of this training.
********************************************************************************
Creating initial configuration for surface and upper-air data.
********************************************************************************
Please enter the email address of the wis2box administrator:
mlimper@wmo.int
The email address of the wis2box administrator will be set to:
mlimper@wmo.int
Is this correct? (y/n/exit)
n
Please enter the email address of the wis2box administrator:
me@gmail.com
The email address of the wis2box administrator will be set to:
me@gmail.com
Is this correct? (y/n/exit)
y
Please enter the name of your organization:
Maaike-TEST
Your organization name will be set to:
Maaike-TEST
Is this correct? (y/n/exit)
y
Getting bounding box for "nld".
bounding box: -68.6255319,11.825,7.2274985,53.744395.
Do you want to use this bounding box? (y/n/exit)
y
Created new metadata file: /home/mlimper/wis2box-data/metadata/discovery/metadata-synop.yml
Created new metadata file: /home/mlimper/wis2box-data/metadata/discovery/metadata-temp.yml
We will review the discovery metadata templates in a later session.
Review configuration
Once the script has completed, check the contents of the wis2box.env file in your current directory:
cat ~/wis2box-1.0b5/wis2box.env
Or check the content of the file via WinSCP.
Question
What is the value of the WIS2BOX_STORAGE_DATA_RETENTION_DAYS environment variable in the wis2box.env file?
Click to reveal answer
The default value for WIS2BOX_STORAGE_DATA_RETENTION_DAYS is 30 days. You can change this value to a different number of days if you wish.
The wis2box-management container runs a cronjob on a daily basis to remove data older than the number of days defined by WIS2BOX_STORAGE_DATA_RETENTION_DAYS from the wis2box-public bucket and the API backend:
0 0 * * * su wis2box -c "wis2box data clean --days=$WIS2BOX_STORAGE_DATA_RETENTION_DAYS"
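Conceptually, the clean job keeps only objects newer than the retention window. A rough sketch of that cutoff logic, not the actual wis2box implementation:

```python
from datetime import datetime, timedelta, timezone

retention_days = 30  # value of WIS2BOX_STORAGE_DATA_RETENTION_DAYS

def is_expired(object_time: datetime, now: datetime) -> bool:
    """True if an object is older than the retention window and may be removed."""
    return object_time < now - timedelta(days=retention_days)

now = datetime.now(timezone.utc)
print(is_expired(now - timedelta(days=31), now))  # True: would be cleaned
print(is_expired(now - timedelta(days=1), now))   # False: kept
```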
Note
The wis2box.env file contains environment variables defining the configuration of your wis2box. For more information, consult the wis2box-documentation.
Next, check the contents of the data-mappings.yml file in your wis2box data directory:
cat ~/wis2box-data/data-mappings.yml
Or check the content of data-mappings.yml via WinSCP by browsing to the new directory 'wis2box-data' (click refresh if you don't see it yet).
Question
How many different keys are defined for data in the data-mappings.yml file?
Click to reveal answer
There are 2 different 'data'-keys defined in the data-mappings.yml file, one for surface-based observations and one for upper-air observations:
- nld.maaike_test.data.core.weather.surface-based-observations.synop
- nld.maaike_test.data.core.weather.upper-air-observations.temp
The country-code and centre-id will be different from the example above; they will be set to the values you entered during the wis2box-create-config.py script.
You will also note that different 'plugins' are defined for the different data types. The use of these plugins in the wis2box data pipeline architecture will be discussed in a later session.
Note
The data-mappings.yml file defines the plugins used to transform your data. For more information, see data pipeline plugins in the wis2box-documentation.
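Represented as a Python dict, the structure of data-mappings.yml looks roughly like this. The plugin names below are placeholders, not the real plugin classes; check your own file for the actual values:

```python
# Rough shape of data-mappings.yml; plugin names are illustrative placeholders
data_mappings = {
    'data': {
        'nld.maaike_test.data.core.weather.surface-based-observations.synop': {
            'plugins': {'txt': [{'plugin': 'some.synop.Plugin'}]}  # placeholder
        },
        'nld.maaike_test.data.core.weather.upper-air-observations.temp': {
            'plugins': {'txt': [{'plugin': 'some.temp.Plugin'}]}   # placeholder
        }
    }
}

# Each key maps a dataset to the plugins that process matching data
for key, config in data_mappings['data'].items():
    print(key, '->', list(config['plugins']))
```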
wis2box start and status
Ensure you are in the directory containing the wis2box software stack:
cd ~/wis2box-1.0b5
Start wis2box with the following command:
python3 wis2box-ctl.py start
Inspect the status with the following command:
python3 wis2box-ctl.py status
Repeat this command until all services are up and running.
wis2box and Docker
wis2box runs as a set of Docker containers managed by Docker Compose.
The services are defined in the various docker-compose*.yml files, which can be found in the ~/wis2box-1.0b5/ directory.
The Python script wis2box-ctl.py is used to run the underlying Docker Compose commands that control the wis2box services.
wis2box API
Open a new tab and navigate to the page http://<your-host>/oapi.
This is the wis2box API (running via the wis2box-api container).
Question
What collections are currently available?
Click to reveal answer
To view collections currently available through the API, click "View the collections in this service":
The following collections are currently available:
- Discovery metadata
- Station metadata
- Data notifications
Question
How many data notifications have been published?
Click to reveal answer
Click on "Data notifications", then click on "Browse through the items of Data notifications".
You will note that the page says "No items", as no data notifications have been published yet.
wis2box webapp
Open a web browser and visit the page http://<your-host>/wis2box-webapp:
This is the (new) wis2box web application to enable you to interact with your wis2box:
- ingest ASCII and CSV data
- update/review your station metadata
- monitor notifications published on your wis2box-broker
We will use this web application in a later session.
wis2box UI
Open a web browser and visit the page http://<your-host>:
The wis2box UI will display your configured datasets. The UI is currently empty, as datasets have not yet been configured.
wis2box-broker
Open the MQTT Explorer on your computer and prepare a new connection to connect to your broker (running via the wis2box-broker container).
Click + to add a new connection:
Click on the 'ADVANCED' button and make sure you have subscriptions to the following topics:
$SYS/#
origin/#
wis2box/#
Note
The messages published under the $SYS topic are system messages published by the mosquitto service itself.
The messages published under topics starting with origin/a/wis2/ are the WIS2 data notifications published by the wis2box-broker.
The messages published under topics starting with wis2box/ are internal messages between the various components of the wis2box software stack.
Use the following connection details, making sure to replace the value of <your-host> with your hostname and <WIS2BOX_BROKER_PASSWORD> with the value from your wis2box.env file:
- Protocol: mqtt://
- Host: <your-host>
- Port: 1883
- Username: wis2box
- Password: <WIS2BOX_BROKER_PASSWORD>
Note
Check your wis2box.env file for the value of WIS2BOX_BROKER_PASSWORD.
Make sure to click "SAVE" to store your connection details.
Then click "CONNECT" to connect to your wis2box-broker.
Once you are connected, you should see statistics being published by your broker on the $SYS/# topic.
Later during the training you will use the MQTT connection you saved to view notifications published by your wis2box-broker.
MinIO UI
Open a web browser and visit the page http://<your-host>:9001:
This is the MinIO UI (running via the wis2box-storage container).
The username and password are defined in the wis2box.env file in your wis2box-1.0b5 directory by the environment variables WIS2BOX_STORAGE_USERNAME and WIS2BOX_STORAGE_PASSWORD.
Use the command below to check the values of these environment variables from the command line in your SSH session:
cat ~/wis2box-1.0b5/wis2box.env
Or check the content of the file via WinSCP.
Try to login to your MinIO UI. You will see that there are 3 buckets already defined:
- wis2box-incoming: used to receive incoming data
- wis2box-public: used to store data that is made available via WIS2 notifications; the content of this bucket is proxied as /data on your WIS2BOX_URL via the nginx container
- wis2box-archive: used to archive data from wis2box-incoming on a daily basis
Note
The wis2box-storage container will send a notification on the wis2box-broker when data is received. The wis2box-management container is subscribed to all messages on wis2box/# and will receive these notifications, triggering the data pipelines defined in your data-mappings.yml.
Grafana UI
Open a web browser and visit the page http://<your-host>:3000:
This is the Grafana UI, where you can view the wis2box workflow monitoring dashboard. You can also access the logs of the various containers in the wis2box software stack via the 'Explore' option in the menu.
Conclusion
Congratulations!
In this practical session, you learned how to:
- run the wis2box-create-config.py script to create the initial configuration
- start wis2box and check the status of its components
- access the wis2box-webapp, API, MinIO UI and Grafana dashboard in a browser
- connect to the wis2box-broker using MQTT Explorer
Configuring WIS2 discovery metadata
Learning outcomes
By the end of this practical session, you will be able to:
- create discovery metadata
- publish discovery metadata
- update and re-publish discovery metadata
Introduction
WIS2 requires discovery metadata describing the data you share with the WIS2 Global Services. This session will walk you through creating and publishing discovery metadata from a configuration file in wis2box.
Note
While this practical session uses discovery metadata to publish SYNOP, feel free to use another dataset (such as TEMP).
The discovery metadata configuration file for TEMP can be found in ~/wis2box-data/metadata/discovery/metadata-temp.yml.
Preparation
Ensure you are running MQTT Explorer and you are connected to the broker on your student VM before continuing.
Login to your student VM using your SSH-client.
Creating discovery metadata
Let's start by using the file generated during the Initializing wis2box practical session.
Inspect the sample SYNOP discovery metadata:
more ~/wis2box-data/metadata/discovery/metadata-synop.yml
Note
All values in the discovery metadata configuration are required and should be included.
Question
How does line 3 of your discovery metadata file relate to the new data mapping in the previous session?
Click to reveal answer
Line 3 of the discovery metadata configuration file should be equal to one of the data mappings defined in data-mappings.yml.
Update the following values in the discovery metadata configuration:
- identification.title: a human readable title describing your data
- identification.abstract: a human readable description of your data
- identification.extents.temporal: the begin (BEGIN_DATE) (YYYY-MM-DD) and end time of your data (keeping the end time at null is suitable for ongoing observations)
- contact.pointOfContact: your organization's point of contact information
Note
Many values have been pre-populated as part of the Initializing wis2box practical session, however you will want to ensure they are correct / accurate for your dataset.
Tip
The configuration is based on the YAML format. Consult the YAML cheatsheet for more information.
Tip
As the identification.extents.spatial.bbox coordinates are pre-populated, you may want to update them if the dataset coordinates are not based on your country's administrative boundaries. Feel free to update this value accordingly, ensuring that values are correctly signed (for example, use the minus sign [-] for the southern or western hemisphere).
The following tools can be valuable for deriving your identification.extents.spatial.bbox:
Publishing discovery metadata
First login to the wis2box-management container:
python3 wis2box-ctl.py login
Run the following command to publish your discovery metadata:
wis2box metadata discovery publish /data/wis2box/metadata/discovery/metadata-synop.yml
Ensure that your discovery metadata was published to the API, by navigating to http://<your-host>/oapi/collections/discovery-metadata.
Ensure that your discovery metadata was also published to the broker, by looking for a new metadata message in MQTT Explorer.
Question
Do you see your new discovery metadata in the API?
Click to reveal answer
You should see your published discovery metadata in the API, under the http://<your-host>/oapi/collections/discovery-metadata/items link.
Click on your discovery metadata record and inspect the content, noting how it relates to the discovery metadata configuration created earlier in this session.
Update the title of your discovery metadata:
vi /data/wis2box/metadata/discovery/metadata-synop.yml
Tip
You can also use WinSCP to connect to your instance and edit this file.
Now re-publish (from inside the wis2box-management container):
wis2box metadata discovery publish /data/wis2box/metadata/discovery/metadata-synop.yml
Ensure that your discovery metadata updates were published to the API by refreshing your discovery metadata page.
Question
Are you able to see the updates you made in the configuration?
Click to reveal answer
You should see your published discovery metadata update in the API, under the http://<your-host>/oapi/collections/discovery-metadata/items link.
Feel free to update additional values and re-publish your discovery metadata to get a better idea of how and where discovery metadata content is updated.
Publishing your dataset to the API
Run the following command to add the data to the API:
wis2box data add-collection /data/wis2box/metadata/discovery/metadata-synop.yml
Ensure that your dataset was published to the API, by navigating to http://<your-host>/oapi/collections/<metadata.identifier>.
Question
Do you see your new dataset in the API?
Click to reveal answer
You should see the new dataset published to the API.
Question
Do you see any data coming from your new dataset in the API? If not, why not?
Click to reveal answer
If you do not see new data, this means that data has not been ingested yet against your new dataset.
Conclusion
Congratulations!
In this practical session, you learned how to:
- create discovery metadata
- publish discovery metadata
- update and re-publish discovery metadata
Configuring station metadata
Learning outcomes
By the end of this practical session, you will be able to:
- create an authorization token for the collections/stations endpoint
- add station metadata to wis2box
- review stations associated to datasets in the wis2box-ui
- update/delete station metadata using the wis2box-webapp
Introduction
wis2box has a collection of station metadata that is used to publish data on WIS2. Only data for stations configured in the wis2box station list will be published on your wis2box broker. The WIGOS Station Identifier (WSI) is used as the unique reference of the station which produced a specific set of observation data.
Create an authorization token for collections/stations
To edit stations via the wis2box-webapp you will first need to create an authorization token.
Login to your student VM and ensure you are in the wis2box-1.0b5 directory:
cd ~/wis2box-1.0b5
Then log in to the wis2box-management container with the following command:
python3 wis2box-ctl.py login
Within the wis2box-management container you can create an authorization token for a specific endpoint using the command: wis2box auth add-token --path <my-endpoint>.
For example, to use a random automatically generated token for the collections/stations endpoint:
wis2box auth add-token --path collections/stations
The output will look like this:
Continue with token: 7ca20386a131f0de384e6ffa288eb1ae385364b3694e47e3b451598c82e899d1 [y/N]? y
Token successfully created
Or, if you want to define your own token for the collections/stations endpoint, you can use the following example:
wis2box auth add-token --path collections/stations DataIsMagic
Output:
Continue with token: DataIsMagic [y/N]? y
Token successfully created
Exercise 1: Create an authorization token for collections/stations
Please create an authorization token for the collections/stations endpoint using the instructions above.
Add station metadata using the wis2box-webapp
The wis2box-webapp provides a graphical user interface to edit station metadata.
Open the wis2box-webapp in your browser by navigating to http://<your-host>/wis2box-webapp:
And select stations:
When you click 'add new station' you are asked to provide the WIGOS station identifier for the station you want to add:
When you click search, the station data is retrieved from OSCAR/Surface; please note that this can take a few seconds.
Review the data returned by OSCAR/Surface and add missing data where required. Select a topic for the station, provide your authorization token for the collections/stations endpoint and click 'save':
Go back to the station list and you will see the station you added:
Exercise 2: Add station metadata
Please add three or more stations to the wis2box station metadata collection of your wis2box.
Please use stations from your country if possible, especially if you brought your own data.
Otherwise, you can use the following WIGOS station identifiers for testing purposes:
- 0-20000-0-91334
- 0-20000-0-96323 (note missing station elevation in OSCAR)
- 0-20000-0-96749 (note missing station elevation in OSCAR)
Deriving missing elevation information
If your station elevation is missing, there are online services to help lookup the elevation using open elevation data. One such example is the Open Topo Data API.
For example, to get the elevation of the BMKG auditorium, one would query the Open Topo Data API as follows:
wget -q -O - "https://api.opentopodata.org/v1/aster30m?locations=-6.15558,106.84204"
Output:
{
"results": [
{
"dataset": "aster30m",
"elevation": 7.0,
"location": {
"lat": -6.15558,
"lng": 106.84204
}
}
],
"status": "OK"
}
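If you need to extract the elevation from such a response in a script, the JSON can be parsed directly; a minimal sketch in Python, using the example response shown above as the payload:

```python
import json

# Example Open Topo Data response (the output shown above)
response_text = '''{
  "results": [
    {"dataset": "aster30m",
     "elevation": 7.0,
     "location": {"lat": -6.15558, "lng": 106.84204}}
  ],
  "status": "OK"
}'''

def extract_elevation(payload: str) -> float:
    """Return the elevation in metres from an Open Topo Data response."""
    data = json.loads(payload)
    if data.get("status") != "OK":
        raise ValueError(f"unexpected status: {data.get('status')}")
    return data["results"][0]["elevation"]

print(extract_elevation(response_text))  # 7.0
```

The value can then be entered as the station elevation when completing the station record in the wis2box-webapp.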
Review your station metadata
After saving your station metadata, you can review the content of your station metadata in the wis2box-webapp:
You can verify the updated stations are available in the wis2box-api:
You can also visit the wis2box-ui at http://<your-host> and select "EXPLORE" on your dataset; you will see the stations you added:
Exercise 3: Review your station metadata
Verify the stations you added are associated to your dataset by visiting the wis2box-api and wis2box-ui endpoints for your host in your browser.
You also have the option to view/update/delete the station in the wis2box-webapp. Note that you are required to provide your authorization token for the collections/stations endpoint to update/delete the station.
Exercise 4: Update/delete station metadata
Please update/delete the station metadata for one of the stations you added using the wis2box-webapp.
Bulk station metadata upload
wis2box also has the ability to perform "bulk" loading of station metadata from a CSV file. See the official wis2box documentation for more information.
Conclusion
Congratulations!
In this practical session, you learned how to:
- create an authorization token for the collections/stations endpoint
- add station metadata to wis2box
- review stations associated to datasets in the wis2box-ui
- update/delete station metadata using the wis2box-webapp
Converting SYNOP data to BUFR
Learning outcomes
By the end of this practical session, you will be able to:
- submit valid FM-12 SYNOP bulletins via the wis2box web application for conversion to BUFR and exchange over WIS2
- validate, diagnose and fix simple coding errors in an FM-12 SYNOP bulletin prior to format conversion and exchange
- ensure that the required station metadata is available in the wis2box
- confirm and inspect successfully converted bulletins
Introduction
Surface weather reports from land surface stations have historically been reported hourly or at the main (00, 06, 12 and 18 UTC) and intermediate (03, 09, 15, 21 UTC) synoptic hours. Prior to the migration to BUFR, these reports were encoded in the plain text FM-12 SYNOP code form. Whilst the migration to BUFR was scheduled to be complete by 2012, a large number of reports are still exchanged in the legacy FM-12 SYNOP format. Further information on the FM-12 SYNOP format can be found in the WMO Manual on Codes, Volume I.1 (WMO-No. 306, Volume I.1).
To aid with completing the migration to BUFR, tools have been developed for encoding FM-12 SYNOP reports as BUFR. In this session you will learn how to use these tools, as well as the relationship between the information contained in FM-12 SYNOP reports and BUFR messages.
Preparation
Prerequisites
- Ensure that your wis2box has been configured and started.
- Confirm the status by visiting the wis2box API (http://<your-host-name>/oapi) and verifying that the API is running.
- Make sure that you have MQTT Explorer open and connected to your broker.
In order to submit data to be processed in the wis2box-webapp you will need an auth token for "processes/wis2box".
You can generate this token by logging in to the wis2box management container and using the wis2box auth add-token command:
cd ~/wis2box-1.0b5
python3 wis2box-ctl.py login
wis2box auth add-token --path processes/wis2box
Or if you want to define your own token:
wis2box auth add-token --path processes/wis2box DataIsMagic
Please record the token that is generated; you will need it later in the session.
For practical purposes, the exercises in this session use data from Romania. Import the station 0-20000-0-15015 into your station list and associate it with the topic for your "Surface weather observations" collection. You can remove this station at the end of the session.
If you have forgotten your auth token for "collections/stations" you can create a new one with:
wis2box auth add-token --path collections/stations
Inspecting SYNOP data and BUFR conversion
Exercise 1 - the basics
Review the FM-12 SYNOP message below:
AAXX 21121
15015 02999 02501 10103 21090 39765 42952 57020 60001=
Identify the key components of the FM-12 SYNOP message and confirm that the station(s) are registered within the wis2box. The station(s) should be discoverable via the station list page (http://<your-host-name>/wis2box-webapp/station) or via the API directly (http://<your-host-name>/oapi/collections/stations).
Hint
The traditional station identifier (2 digit block number and 3 digit station number) is included in the FM-12 SYNOP report. This can be used to find the station.
Question
What is the traditional station identifier of the station included in the message and where is the station located?
Click to reveal answer
The five digit group 15015 gives the traditional station identifier; in this case it identifies the station "OCNA SUGATAG", located in Romania.
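The split described in the hint above (two-digit block number followed by three-digit station number) can be sketched as a small helper; a minimal example in Python:

```python
def split_traditional_id(identifier: str) -> tuple[int, int]:
    """Split a five-digit traditional station identifier (IIiii) into the
    two-digit block number (II) and three-digit station number (iii)."""
    if len(identifier) != 5 or not identifier.isdigit():
        raise ValueError("expected a 5-digit identifier")
    return int(identifier[:2]), int(identifier[2:])

print(split_traditional_id("15015"))  # (15, 15)
```

For "15015" the block number is 15 and the station number is 015 (the leading zero is dropped once converted to an integer).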
Identify the number of weather reports in the message.
Question
How many weather reports are in the message?
Click to reveal answer
One: the message contains a single weather report.
The first line, AAXX 21121, indicates that this is an FM-12 SYNOP message (AAXX); 2112 indicates that the weather observation was made on the 21st day of the month at 12 UTC. The final digit of the group, 1, indicates the source and units of the wind speed. The second line contains a single weather report, beginning with the 5 digit group 15015 and ending with the = symbol.
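The decoding of the header group described above (day of month, hour, wind indicator) can be sketched as a small helper; a minimal example in Python (the group name YYGGiw follows the FM-12 code form):

```python
def parse_yyggiw(group: str) -> dict:
    """Decode the YYGGiw group from an FM-12 SYNOP header:
    YY = day of the month, GG = hour (UTC),
    iw = wind speed source/units indicator."""
    if len(group) != 5 or not group.isdigit():
        raise ValueError("expected a 5-digit YYGGiw group")
    return {"day": int(group[:2]),
            "hour": int(group[2:4]),
            "wind_indicator": int(group[4])}

print(parse_yyggiw("21121"))
# {'day': 21, 'hour': 12, 'wind_indicator': 1}
```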
Exercise 2 - converting your first message
Now that you have reviewed the message, you are ready to convert the data to BUFR.
Copy the message you have just reviewed:
AAXX 21121
15015 02999 02501 10103 21090 39765 42952 57020 60001=
Open the wis2box web application and navigate to the synop2bufr page using the left navigation drawer and proceed as follows:
- Paste the content you have copied in the text entry box.
- Select the month and year using the date picker, assume the current month for this exercise.
- Select a topic from the drop down menu (the options are based on the metadata configured in the wis2box)
- Enter the "processes/wis2box" auth token you generated earlier
- Ensure "Publish on WIS2" is toggled ON
- Click "SUBMIT"
The data will now be converted to BUFR and the result returned to the web application.
Click on "Output BUFR files" to see a list of the files that have been generated.
Tip
The result section of the page should show a single converted BUFR message with zero warnings or errors. The download button allows the BUFR data to be downloaded directly to your computer. Click the down arrow / chevron to reveal download and inspect buttons. The inspect button runs a process to convert and extract the data from BUFR.
Question
The FM-12 SYNOP format does not include the station location, elevation or barometer height. Confirm that these are present in the output BUFR data. Where do they come from?
Click to reveal answer
Clicking the inspect button should bring up a dialog like that shown below.
This includes the station location shown on a map and basic metadata, as well as the observations in the message.
As part of the transformation from FM-12 SYNOP to BUFR, additional metadata was added to the BUFR file.
The BUFR file can also be inspected by downloading the file and validating it using a tool such as the ECMWF ecCodes BUFR validator.
As a final step navigate to the monitoring page from the left menu and confirm that the data have been published.
You should also be able to see these notifications in MQTT Explorer.
Exercise 3 - understanding the station list
For this next exercise you will convert a file containing multiple reports, see the data below:
AAXX 21121
15015 02999 02501 10103 21090 39765 42952 57020 60001=
15020 02997 23104 10130 21075 30177 40377 58020 60001 81041=
15090 02997 53102 10139 21075 30271 40364 58031 60001 82046=
Question
Based on the prior exercise, look at the FM-12 SYNOP message and predict how many output BUFR messages will be generated.
Now copy and paste this message into the SYNOP form and submit the data.
Did the number of messages generated match your expectation and if not, why not?
Click to reveal answer
You might have expected three BUFR messages to be generated, one for each weather report. However, you instead got two warnings and only one BUFR file.
In order for a weather report to be converted to BUFR the basic metadata contained in the station list is required. Whilst the above example includes three weather reports, two of the three stations reporting were not registered in your wis2box.
As a result, only one of the three weather reports resulted in a BUFR file being generated and a WIS2 notification being published. The other two weather reports were ignored and warnings were generated.
Hint
Take note of the relationship between the WIGOS Identifier and the traditional station
identifier included in the BUFR output. In many cases, for stations listed in WMO-No. 9
Volume A at the time of migrating to WIGOS station identifiers, the WIGOS station
identifier is given by the traditional station identifier with 0-20000-0
prepended,
e.g. 15015
has become 0-20000-0-15015
.
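The prepending rule described in the hint can be expressed as a one-line helper; a minimal sketch in Python:

```python
def wigos_from_traditional(tsi: str) -> str:
    """Form the WIGOS station identifier for a station that was listed in
    WMO-No. 9 Volume A, by prepending the 0-20000-0 prefix."""
    if len(tsi) != 5 or not tsi.isdigit():
        raise ValueError("expected a 5-digit traditional identifier")
    return f"0-20000-0-{tsi}"

print(wigos_from_traditional("15015"))  # 0-20000-0-15015
```

Note that this rule only applies to stations migrated from WMO-No. 9 Volume A; stations registered directly in OSCAR/Surface may use other issuer values.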
Using the station list page from the web application import the missing stations from OSCAR/Surface into the wis2box and repeat the exercise.
Three BUFR files should be generated and there should be no warnings or errors listed in the web application.
In addition to the basic station information, additional metadata, such as the station elevation above sea level and the barometer height above sea level, are required for encoding to BUFR. These fields are included in the station list and station editor pages.
Exercise 4 - debugging
In this final exercise you will identify and correct two of the most common problems encountered when using this tool to convert FM-12 SYNOP to BUFR.
Example data is shown in the box below. Examine the data and try to resolve any issues before submitting the data through the web application.
Hint
You can edit the data in the entry box on the web application page. If you miss any issues these should be detected and highlighted as a warning or error once the submit button has been clicked.
AAXX 21121
15015 02999 02501 10103 21090 39765 42952 57020 60001
15020 02997 23104 10130 21075 30177 40377 58020 60001 81041=
15090 02997 53102 10139 21075 30271 40364 58031 60001 82046=
Question
What issues did you expect to encounter when converting the data to BUFR and how did you overcome them? Were there any issues you were not expecting?
Click to reveal answer
In this first example the "end of text" symbol (=), or record delimiter, is missing between the first and second weather reports. Consequently, lines 2 and 3 are treated as a single report, leading to errors in the parsing of the message.
The second example below contains several common issues found in FM-12 SYNOP reports. Examine the data and try to identify the issues and then submit the corrected data through the web application.
AAXX 21121
15020 02997 23104 10/30 21075 30177 40377 580200 60001 81041=
Question
What issues did you find and how did you resolve these?
Click to reveal answer
There are two issues in the weather report.
The first issue is in the signed air temperature group (10/30), where one digit has been set to missing (/), making the group invalid. In this example we know from the earlier data that the temperature is 13.0 degrees Celsius, so the group can be corrected. Operationally, the correct value would need to be confirmed with the observer.
The second issue occurs in the group beginning with 5, which contains an additional character: the final character has been duplicated. This issue can be fixed by removing the extra character.
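The air temperature group mentioned above follows the 1snTTT pattern (sn: sign indicator, 0 for positive, 1 for negative; TTT: temperature in tenths of a degree Celsius). A minimal decoding sketch in Python, which also rejects groups containing the missing-digit character:

```python
def decode_temperature_group(group: str) -> float:
    """Decode an FM-12 SYNOP air temperature group (1snTTT).

    sn: sign indicator (0 = positive, 1 = negative)
    TTT: temperature in tenths of a degree Celsius
    A '/' marks a missing digit and makes the group undecodable.
    """
    if len(group) != 5 or not group.startswith("1"):
        raise ValueError("expected a 5-character group starting with '1'")
    if "/" in group:
        raise ValueError("group contains a missing digit ('/')")
    sign = -1.0 if group[1] == "1" else 1.0
    return sign * int(group[2:]) / 10.0

print(decode_temperature_group("10130"))  # 13.0 (the corrected group)
```

Passing the invalid group "10/30" raises an error, mirroring the validation failure seen in the web application.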
Housekeeping
During the exercises in this session you will have imported several stations into your station list. Navigate to the station list page and click the trash can icons to delete the stations. You may need to refresh the page after deleting for the stations to be removed from the list.
You can also delete the file used in the final exercise as this will no longer be required.
Conclusion
Congratulations!
In this practical session, you learned:
- how to submit an FM-12 SYNOP report through the web-app;
- how to diagnose and correct simple errors in an FM-12 SYNOP report;
- the importance of registering stations in the wis2box (and OSCAR/Surface);
- and the use of the inspect button to view the content of BUFR data.
Converting CSV data to BUFR
Learning outcomes
By the end of this practical session, you will have:
- learnt to format CSV data for use with the default automatic weather station BUFR template
- learnt to use wis2box webapp to validate and convert sample data
- learnt about some of the common issues that can occur, such as incorrect units
Introduction
Comma-separated values (CSV) data files are often used for recording observational and other data in a tabular format. Most data loggers used to record sensor output can export observations in delimited files, including CSV. Similarly, when data are ingested into a database it is easy to export the required data as CSV formatted files. To aid the exchange of data originally stored in tabular formats, a CSV to BUFR converter has been implemented in the wis2box using the same software as for SYNOP to BUFR. In this session you will learn about using this tool through the wis2box web-application. Command line usage and customisation will be covered in a later session.
Preparation
Prerequisites
- Ensure that your wis2box has been configured and started, including setting the authorization tokens for the processes/wis2box and collections/stations paths. Confirm the status by visiting the wis2box API (http://<your-host-name>/oapi) and verifying that the API is running.
- The tokens can be checked by logging in to the wis2box management container and entering the command: wis2box auth has-access-path --path processes/wis2box <your-token>, where <your-token> is the token you entered.
- If the tokens are missing they can be generated with the following commands: wis2box auth add-token --path processes/wis2box <token> and wis2box auth add-token --path collections/stations <token>, where <token> is the value of the token. This can be left blank to automatically generate a random token (recommended).
- Make sure that you have MQTT Explorer open and connected to your broker.
Inspecting CSV data and BUFR conversion
Exercise 1 - the basics
The csv2bufr converter used in the wis2box can be configured to map between various tabulated input files, including CSV. However, to make the tool easier to use, a standardised CSV format has been developed, targeted at data from AWS stations and in support of the GBON reporting requirements. The table below lists the parameters included in the format:
Column | Units | Data type | Description |
---|---|---|---|
wsi_series | - | Integer | WIGOS identifier series |
wsi_issuer | - | Integer | WIGOS issuer of identifier |
wsi_issue_number | - | Integer | WIGOS issue number |
wsi_local | - | Character | WIGOS local identifier |
wmo_block_number | - | Integer | WMO block number |
wmo_station_number | - | Integer | WMO station number |
station_type | - | Integer | Type of observing station, encoded using code table 0 02 001 (set to 0, automatic) |
year | - | Integer | Year (UTC), the time of observation (based on the actual time the barometer is read) |
month | - | Integer | Month (UTC), the time of observation (based on the actual time the barometer is read) |
day | - | Integer | Day (UTC), the time of observation (based on the actual time the barometer is read) |
hour | - | Integer | Hour (UTC), the time of observation (based on the actual time the barometer is read) |
minute | - | Integer | Minute (UTC), the time of observation (based on the actual time the barometer is read) |
latitude | degrees | Decimal | Latitude of the station (to 5 decimal places) |
longitude | degrees | Decimal | Longitude of the station (to 5 decimal places) |
station_height_above_msl | meters | Decimal | Height of the station ground above mean sea level (to 1 decimal place) |
barometer_height_above_msl | meters | Decimal | Height of the barometer above mean sea level (to 1 decimal place), typically height of station ground plus the height of the sensor above local ground |
station_pressure | Pa | Decimal | Pressure observed at the station level to the nearest 10 pascals |
msl_pressure | Pa | Decimal | Pressure reduced to mean sea level to the nearest 10 pascals |
geopotential_height | gpm | Integer | Geopotential height expressed in geopotential meters (gpm) to 0 decimal places |
thermometer_height | meters | Decimal | Height of thermometer or temperature sensor above the local ground to 2 decimal places |
air_temperature | Kelvin | Decimal | Instantaneous air temperature to 2 decimal places |
dewpoint_temperature | Kelvin | Decimal | Instantaneous dewpoint temperature to 2 decimal places |
relative_humidity | % | Integer | Instantaneous relative humidity to zero decimal places |
method_of_ground_state_measurement | code table | Integer | Method of observing the state of the ground, encoded using code table 0 02 176 |
ground_state | code table | Integer | State of the ground encoded using code table 0 20 062 |
method_of_snow_depth_measurement | code table | Integer | Method of observing the snow depth encoded using code table 0 02 177 |
snow_depth | meters | Decimal | Snow depth at time of observation to 2 decimal places |
precipitation_intensity | kg m-2 h-1 | Decimal | Intensity of precipitation at time of observation to 5 decimal places |
anemometer_height | meters | Decimal | Height of the anemometer above local ground to 2 decimal places |
time_period_of_wind | minutes | Integer | Time period over which the wind speed and direction have been averaged; 10 minutes in normal cases, or the number of minutes since a significant change occurring in the preceding 10 minutes. |
wind_direction | degrees | Integer | Wind direction (at anemometer height) averaged from the cartesian components over the indicated time period, 0 decimal places |
wind_speed | ms-1 | Decimal | Wind speed (at anemometer height) averaged from the cartesian components over the indicated time period, 1 decimal place |
maximum_wind_gust_direction_10_minutes | degrees | Integer | Highest 3 second average over the preceding 10 minutes, 0 decimal places |
maximum_wind_gust_speed_10_minutes | ms-1 | Decimal | Highest 3 second average over the preceding 10 minutes, 1 decimal place |
maximum_wind_gust_direction_1_hour | degrees | Integer | Highest 3 second average over the preceding hour, 0 decimal places |
maximum_wind_gust_speed_1_hour | ms-1 | Decimal | Highest 3 second average over the preceding hour, 1 decimal place |
maximum_wind_gust_direction_3_hours | degrees | Integer | Highest 3 second average over the preceding 3 hours, 0 decimal places |
maximum_wind_gust_speed_3_hours | ms-1 | Decimal | Highest 3 second average over the preceding 3 hours, 1 decimal place |
rain_sensor_height | meters | Decimal | Height of the rain gauge above local ground to 2 decimal places |
total_precipitation_1_hour | kg m-2 | Decimal | Total precipitation over the past hour, 1 decimal place |
total_precipitation_3_hours | kg m-2 | Decimal | Total precipitation over the past 3 hours, 1 decimal place |
total_precipitation_6_hours | kg m-2 | Decimal | Total precipitation over the past 6 hours, 1 decimal place |
total_precipitation_12_hours | kg m-2 | Decimal | Total precipitation over the past 12 hours, 1 decimal place |
total_precipitation_24_hours | kg m-2 | Decimal | Total precipitation over the past 24 hours, 1 decimal place |
The full file can be downloaded from: aws-full.csv
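As an illustration of the format, the sketch below writes a single row containing a subset of the columns from the table above; the station and observation values are invented for the example (note the units: pressure in pascals, temperature in kelvin):

```python
import csv
import io

# Columns are a subset of the standardised format above; values are
# invented for illustration only.
columns = ["wsi_series", "wsi_issuer", "wsi_issue_number", "wsi_local",
           "year", "month", "day", "hour", "minute",
           "latitude", "longitude", "station_pressure", "air_temperature"]
row = {"wsi_series": 0, "wsi_issuer": 20000, "wsi_issue_number": 0,
       "wsi_local": "15015",
       "year": 2024, "month": 6, "day": 21, "hour": 12, "minute": 0,
       "latitude": 47.77706, "longitude": 23.94012,
       "station_pressure": 97650,   # pascals, not hPa
       "air_temperature": 283.45}   # kelvin, not Celsius

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Note how the WIGOS identifier and the observation time are each spread across several columns rather than stored as single strings.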
Now download the example for this exercise from the link below:
Inspect the expected columns from the table above and compare to the example data file.
Question
Examining the date, time and identifier fields (WIGOS and traditional identifiers), what do you notice? How would today's date be represented?
Click to reveal answer
Each column contains a single piece of information. For example, the date is split into year, month and day, mirroring how the data are stored in BUFR. Today's date would be split across the columns "year", "month" and "day". Similarly, the time needs to be split into "hour" and "minute", and the WIGOS station identifier into its respective components.
Question
Looking at the data file how are missing data encoded?
Click to reveal answer
Missing data within the file are represented by empty cells. In a CSV file this is encoded as ,, (two adjacent delimiters). Note that this is an empty cell, not a zero-length string such as ,"",.
Missing data
It is recognised that data may be missing for a variety of reasons, whether due to sensor failure or the parameter not being observed. In these cases missing data can be encoded as per the above answer; the other data in the report remain valid.
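Using Python's csv module, a missing value can be left as None so that it is written as an empty field, producing the ,, pattern described above rather than a quoted empty string; the column names below are taken from the standardised format:

```python
import csv
import io

# Write a row where snow_depth was not observed: None is written by the
# csv module as an empty field (",,"), not as a quoted empty string.
fields = ["air_temperature", "snow_depth", "relative_humidity"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerow({"air_temperature": 283.45,
                 "snow_depth": None,        # missing: empty cell
                 "relative_humidity": 61})

print(buffer.getvalue().splitlines()[1])  # 283.45,,61
```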
Station metadata
The CSV to BUFR conversion tool has been developed as a standalone tool separate from the wis2box. As such, all the data to be encoded in BUFR must be present in the CSV file, including the station metadata like the location and sensor heights. In future versions the web-application will be updated to load the metadata from the wis2box API and to insert the values into the CSV, similar to the process used in FM-12 SYNOP to BUFR.
Even though all the metadata is contained within the CSV file, the station metadata must still be registered in the wis2box for the message to be processed.
Exercise 2 - converting your first message
Now that you have familiarised yourself with the input data file and the specified CSV format, you will convert this example file to BUFR using the wis2box web-application. First, the station needs to be registered in the wis2box; in this case it is a fictional station located on the island of Sicily in Italy. Navigate to the station list page, click add station and then search (leave the WIGOS Station Identifier box empty). Click create new station and enter the fictional station details; suggested details are below:
Field | Value |
---|---|
Station name | Monte Genuardo |
WIGOS station identifier | 0-20000-0-99100 |
Traditional identifier | 99100 |
Longitude | 13.12425 |
Latitude | 37.7 |
Station elevation | 552 |
Facility type | Land (fixed) |
Barometer height above sea level | 553 |
WMO Region | VI |
Territory or member operating the station | Italy |
Operating status | operational |
Select the appropriate topic, enter the collections/stations token previously created and click save. You are now ready to process data from this station. Navigate to the CSV to BUFR submission page on the wis2box web-application (http://<your-host-name>/wis2box-webapp/csv2bufr_form).
Click the entry box or drag and drop the test file you have downloaded to the entry box.
You should now be able to click next to preview and validate the file.
Clicking the next button loads the file into the browser and validates the contents against a predefined schema. No data has yet been converted or published. On the preview / validate tab you should be presented with a list of warnings about missing data but in this exercise these can be ignored.
The next button will take you to the topic selection page; as with the FM-12 SYNOP to BUFR page, select the topic configured for "surface weather observations" on your wis2box and click next. You should now be on an authorisation page where you will be asked to enter the processes/wis2box token you previously created. Enter this token and click the "Publish on WIS2" toggle to ensure "Publish to WIS2" is selected (see screenshot below).
Click next to transform to BUFR and publish, you should then see the following screen:
Clicking the down arrow to the right of "Output BUFR files" should reveal the "Download" and "Inspect" buttons.
Click inspect to view the data and confirm the values are as expected.
As a final step navigate to the monitoring page from the left menu.
Select the topic you have been publishing on from the dropdown menu and click update; you should see the message you have just published (and possibly notifications from the synop2bufr session). An example screenshot is shown below:
Success
Congratulations, you have published your first CSV data converted to BUFR via the wis2box.
You should also be able to see these notifications in MQTT Explorer.
Exercise 3 - debugging the input data
In this exercise we will examine what happens with invalid input data. Download the next example file by clicking the link below. It contains the same data as the first file, but with the empty columns removed. Examine the file and confirm which columns have been removed, then follow the same process to convert the data to BUFR.
Question
With the columns missing from the file, were you able to convert the data to BUFR? Did you notice any change in the warnings on the validation page?
Click to reveal answer
You should have still been able to convert the data to BUFR but the warning messages will have been updated to indicate that the columns were missing completely rather than containing a missing value.
In this next example an additional column has been added to the CSV file.
Question
Without uploading or submitting the file can you predict what will happen when you do?
Now upload and confirm whether your prediction was correct.
Click to reveal answer
When the file is validated you should now receive a warning that the column index is not found in the schema and that the data will be skipped. You should be able to click through and convert to BUFR as with the previous example.
In the final example in this exercise the data has been modified. Examine the contents of the CSV file.
Question
What has changed in the file and what do you think will happen?
Now upload the file and confirm whether you were correct.
Click to reveal answer
The pressure fields have been converted from Pa to hPa in the input data. However, the CSV to BUFR converter expects the same units as BUFR (Pa) and, as a result, these fields fail the validation due to being out of range. You should be able to edit the CSV to correct the issue and to resubmit the data by returning to the first screen and re-uploading.
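The unit fix described above is a simple multiplication by 100; a small sketch (the example values are illustrative only):

```python
def hpa_to_pa(value_hpa: float) -> float:
    """Convert a pressure from hectopascals to pascals,
    the unit expected by the standardised CSV format."""
    return value_hpa * 100.0

print(hpa_to_pa(976.5))    # 97650.0
print(hpa_to_pa(1013.25))  # 101325.0
```

Applying the inverse mistake (dividing by 100) would make the pressure fall far below any plausible value, which is why the out-of-range check in the converter catches the error.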
Hint
The wis2box web-application can be used to test and validate sample data for the automated workflow. This will identify some common issues, such as the incorrect units (hPa vs Pa and C vs K) and missing columns. Care should be taken that the units in the CSV data match those indicated above.
Bug
Please note, due to a bug in the current version of the web-application you may need to reload the page before resubmitting the data.
Exercise 4 - potential web-application and API issues
Before the web-application can be used to submit data, the topic hierarchy on which to publish, the authorisation token and the stations need to be configured. This can be demonstrated with several simple exercises.
For the first example in this exercise download the example file via the link below and publish using the web-application.
Question
What result do you receive on the "Review page"?
Click to reveal answer
The second row in the file contains data from a station (again fictional) that has not been registered in the wis2box. As a result, a warning is given that the station has not been found and that the data have not been published.
For the next example, try submitting the data without selecting a topic hierarchy on which to publish.
Question
What happens when you try to click next?
Click to reveal answer
You should find the next button greyed out until a valid topic is selected. The options for the topic hierarchy are set based on the discovery metadata registered within the wis2box.
For the final example, try entering a token that has not been registered and observe what happens when you click next.
Question
What do you expect will happen if you enter an invalid token?
Click to reveal answer
When you click next you should be taken to the Review page but receive a message that the token is invalid. Navigate to the previous step, enter a valid token and try again.
Housekeeping
As with the FM-12 SYNOP to BUFR exercise you will have registered some new stations within the wis2box. Navigate to the station list and delete those stations as they are no longer required.
Conclusion
Congratulations
In this practical session you have learned:
- how to format a CSV file for use with wis2box web-application
- and how to validate a sample CSV file and to correct for common issues.
Next steps
The csv2bufr converter used in the wis2box has been designed to be configurable for use with any row based tabular data. The column names, delimiters, quotation style and limited quality control can all be configured according to user needs. In this page only the basics have been covered, with a standardised CSV template developed based on user feedback. Advanced configuration will be covered as part of the BUFR command line tools session.
BUFR command line tools
Learning outcomes
In this practical session you will be introduced to some of the BUFR command line tools included in the wis2box-management container. You will learn:
- how to inspect the headers in a BUFR file using the bufr_ls command
- how to extract and inspect the data within a BUFR file using bufr_dump
- the basic structure of the BUFR templates used in csv2bufr and how to use the csv2bufr command line tool
- and how to make basic changes to the BUFR templates and how to update the wis2box to use the revised version
Introduction
The wis2box-management container contains some tools for working with BUFR files from the command line.
These include the tools developed by ECMWF and included in the ecCodes software, more information on these can be found on the ecCodes website.
Other tools include those developed as part of the wis2box development, including csv2bufr and synop2bufr, which you have previously used via the wis2box web-application.
In this session you will be introduced to the bufr_ls and bufr_dump commands from the ecCodes software package, as well as advanced configuration of the csv2bufr tool.
Preparation
In order to use the BUFR command line tools you will need to be logged in to the wis2box management container and unless specified otherwise all commands should be run on this container. You will also need to have MQTT Explorer open and connected to your broker.
First connect to your student VM via your SSH client and then log in to the management container:
cd ~/wis2box-1.0b5
python3 wis2box-ctl.py login
Confirm that the tools are available, starting with ecCodes:
bufr_dump -V
ecCodes Version 2.28.0
Next check csv2bufr:
csv2bufr --version
You should get the following response:
csv2bufr, version 0.7.4
Finally, create a working directory to work in:
cd /data/wis2box
mkdir -p working/bufr-cli
cd working/bufr-cli
You are now ready to start using the BUFR tools.
Using the BUFR command line tools
Exercise 1 - bufr_ls
In this first exercise you will use the bufr_ls
command to inspect the headers of a BUFR file and to determine the
contents of the file. The following headers are included in a BUFR file:
header | ecCodes key | description |
---|---|---|
originating/generating centre | centre | The originating / generating centre for the data |
originating/generating sub-centre | bufrHeaderSubCentre | The originating / generating sub centre for the data |
Update sequence number | updateSequenceNumber | Whether this is the first version of the data (0) or an update (>0) |
Data category | dataCategory | The type of data contained in the BUFR message, e.g. surface data. See BUFR Table A |
International data sub-category | internationalDataSubCategory | The sub-type of data contained in the BUFR message, e.g. surface data. See Common Code Table C-13 |
Year | typicalYear (typicalDate) | Most typical time for the BUFR message contents |
Month | typicalMonth (typicalDate) | Most typical time for the BUFR message contents |
Day | typicalDay (typicalDate) | Most typical time for the BUFR message contents |
Hour | typicalHour (typicalTime) | Most typical time for the BUFR message contents |
Minute | typicalMinute (typicalTime) | Most typical time for the BUFR message contents |
BUFR descriptors | unexpandedDescriptors | List of one, or more, BUFR descriptors defining the data contained in the file |
Download the example file directly into the wis2box-management container using the following command:
curl https://training.wis2box.wis.wmo.int/sample-data/bufr-cli-ex1.bufr4 --output bufr-cli-ex1.bufr4
Now use the following command to run bufr_ls
on this file:
bufr_ls bufr-cli-ex1.bufr4
You should see the following output:
bufr-cli-ex1.bufr4
centre masterTablesVersionNumber localTablesVersionNumber typicalDate typicalTime numberOfSubsets
cnmc 29 0 20231002 000000 1
1 of 1 messages in bufr-cli-ex1.bufr4
1 of 1 total messages in 1 files
On its own this information is not very informative, providing only limited detail on the file contents. The default output does not indicate the observation or data type, and is in a format that is not very easy to read. However, various options can be passed to bufr_ls to change both the format and the header fields printed.
Use bufr_ls without any arguments to view the options:
bufr_ls
You should see the following output:
NAME bufr_ls
DESCRIPTION
List content of BUFR files printing values of some header keys.
Only scalar keys can be printed.
It does not fail when a key is not found.
USAGE
bufr_ls [options] bufr_file bufr_file ...
OPTIONS
-p key[:{s|d|i}],key[:{s|d|i}],...
Declaration of keys to print.
For each key a string (key:s), a double (key:d) or an integer (key:i)
type can be requested. Default type is string.
-F format
C style format for floating-point values.
-P key[:{s|d|i}],key[:{s|d|i}],...
As -p adding the declared keys to the default list.
-w key[:{s|d|i}]{=|!=}value,key[:{s|d|i}]{=|!=}value,...
Where clause.
Messages are processed only if they match all the key/value constraints.
A valid constraint is of type key=value or key!=value.
For each key a string (key:s), a double (key:d) or an integer (key:i)
type can be specified. Default type is string.
In the value you can also use the forward-slash character '/' to specify an OR condition (i.e. a logical disjunction)
Note: only one -w clause is allowed.
-j JSON output
-s key[:{s|d|i}]=value,key[:{s|d|i}]=value,...
Key/values to set.
For each key a string (key:s), a double (key:d) or an integer (key:i)
type can be defined. By default the native type is set.
-n namespace
All the keys belonging to the given namespace are printed.
-m Mars keys are printed.
-V Version.
-W width
Minimum width of each column in output. Default is 10.
-g Copy GTS header.
-7 Does not fail when the message has wrong length
SEE ALSO
Full documentation and examples at:
<https://confluence.ecmwf.int/display/ECC/bufr_ls>
Now run the same command on the example file but output the information in JSON.
Question
What flag do you pass to the bufr_ls command to view the output in JSON format?
Click to reveal answer
You can change the output format to JSON using the -j flag, i.e. bufr_ls -j <input-file>. This can be more readable than the default output format. See the example output below:
{ "messages" : [
{
"centre": "cnmc",
"masterTablesVersionNumber": 29,
"localTablesVersionNumber": 0,
"typicalDate": 20231002,
"typicalTime": "000000",
"numberOfSubsets": 1
}
]}
When examining a BUFR file we often want to determine the type of data contained in the file and the typical date / time of that data. This information can be listed using the -p flag to select the headers to output. Multiple headers can be included as a comma separated list.
Using the bufr_ls command, inspect the test file and identify the type of data contained in the file and the typical date and time for that data.
Hint
The ecCodes keys are given in the table above. We can use the following to list the dataCategory and internationalDataSubCategory of the BUFR data:
bufr_ls -p dataCategory,internationalDataSubCategory bufr-cli-ex1.bufr4
Additional keys can be added as required.
Question
What type of data (data category and sub-category) are contained in the file? What is the typical date and time for the data?
Click to reveal answer
The command you need to run should have been similar to:
bufr_ls -p dataCategory,internationalDataSubCategory,typicalDate,typicalTime -j bufr-cli-ex1.bufr4
You may have included additional keys, or listed the year, month, day, etc. individually. The output should be similar to the example below, depending on whether you selected JSON or the default output.
{ "messages" : [
{
"dataCategory": 2,
"internationalDataSubCategory": 4,
"typicalDate": 20231002,
"typicalTime": "000000"
}
]}
From this we see that:
- The data category is 2, from BUFR Table A we can see that this file contains "Vertical soundings (other than satellite)" data.
- The international sub category is 4, indicating "Upper-level temperature/humidity/wind reports from fixed-land stations (TEMP)" data. This information can be looked up in Common Code Table C-13 (row 33). Note the combination of category and sub category.
- The typical date and time are 2023/10/02 and 00:00:00z respectively.
Exercise 2 - bufr_dump
The bufr_dump command can be used to list and examine the contents of a BUFR file, including the data itself.
In this exercise we will use a BUFR file identical to the one you created during the initial csv2bufr practical session using the wis2box-webapp.
Download the sample-file to the wis2box management container directly with the following command:
curl https://training.wis2box.wis.wmo.int/sample-data/bufr-cli-ex2.bufr4 --output bufr-cli-ex2.bufr4
Now run the bufr_dump command on the file, using the -p flag to output the data in plain text (key=value format):
bufr_dump -p bufr-cli-ex2.bufr4
You should see around 240 keys in the output, many of which are flagged as missing. This is typical of real-world data, as not all of the ecCodes keys are populated with reported data.
Hint
The missing values can be filtered out using tools such as grep:
bufr_dump -p bufr-cli-ex2.bufr4 | grep -v MISSING
The example BUFR file for this exercise comes from the csv2bufr practical session. Please download the original CSV file into your current location as follows:
curl https://training.wis2box.wis.wmo.int/sample-data/csv2bufr-ex1.csv --output csv2bufr-ex1.csv
And display the content of the file with:
more csv2bufr-ex1.csv
Question
Use the following command to display column 18 in the CSV file and you will find the reported mean sea level pressure (msl_pressure):
more csv2bufr-ex1.csv | cut -d ',' -f 18
Which key in the BUFR output corresponds to the mean sea level pressure?
Hint
Tools such as grep can be used in combination with bufr_dump. For example:
bufr_dump -p bufr-cli-ex2.bufr4 | grep -i "pressure"
would filter the output of bufr_dump to only those lines containing the word pressure. Alternatively, the output could be filtered on a value.
Click to reveal answer
The key "pressureReducedToMeanSeaLevel" corresponds to the msl_pressure column in the input CSV file.
Spend a few minutes examining the rest of the output, comparing to the input CSV file before moving on to the next exercise. For example, you can try to find the keys in the BUFR output that correspond to relative humidity (column 23 in the CSV file) and air temperature (column 21 in the CSV file).
Exercise 3 - csv2bufr mapping files
The csv2bufr tool can be configured to process tabular data with different columns and BUFR sequences.
This is done by way of a configuration file written in JSON format.
Like BUFR data itself, the JSON file contains a header section and a data section, broadly corresponding to the same sections in BUFR.
Additionally, some formatting options are specified within the JSON file.
The JSON file for the default mapping can be viewed via the link below (right-click and open in new tab):
Examine the header section of the mapping file (shown below) and compare it to the table from exercise 1 (ecCodes key column):
"header":[
{"eccodes_key": "edition", "value": "const:4"},
{"eccodes_key": "masterTableNumber", "value": "const:0"},
{"eccodes_key": "bufrHeaderCentre", "value": "const:0"},
{"eccodes_key": "bufrHeaderSubCentre", "value": "const:0"},
{"eccodes_key": "updateSequenceNumber", "value": "const:0"},
{"eccodes_key": "dataCategory", "value": "const:0"},
{"eccodes_key": "internationalDataSubCategory", "value": "const:2"},
{"eccodes_key": "masterTablesVersionNumber", "value": "const:30"},
{"eccodes_key": "numberOfSubsets", "value": "const:1"},
{"eccodes_key": "observedData", "value": "const:1"},
{"eccodes_key": "compressedData", "value": "const:0"},
{"eccodes_key": "typicalYear", "value": "data:year"},
{"eccodes_key": "typicalMonth", "value": "data:month"},
{"eccodes_key": "typicalDay", "value": "data:day"},
{"eccodes_key": "typicalHour", "value": "data:hour"},
{"eccodes_key": "typicalMinute", "value": "data:minute"},
{"eccodes_key": "unexpandedDescriptors", "value":"array:301150, 307096"}
],
Here you can see the same headers as those available in the output from the bufr_ls command. In the mapping file, the value for each of these can be set to a column from the CSV input file, a constant value, or an array of constant values.
Question
Look at the example header section above. How would you specify a constant value, a value from the input CSV data file and an array of constants?
Click to reveal answer
- Constant values can be set by setting the value property to "value": "const:<your-constant>".
- Values can be taken from a column in the input CSV by setting the value property to "value": "data:<your-column>".
- Arrays can be set by setting the value property to "value": "array:<comma-separated-values>".
Now examine the data section of the JSON mapping file (note, the output has been truncated):
"data": [
{"eccodes_key": "#1#wigosIdentifierSeries", "value":"data:wsi_series", "valid_min": "const:0", "valid_max": "const:0"},
{"eccodes_key": "#1#wigosIssuerOfIdentifier", "value":"data:wsi_issuer", "valid_min": "const:0", "valid_max": "const:65534"},
{"eccodes_key": "#1#wigosIssueNumber", "value":"data:wsi_issue_number", "valid_min": "const:0", "valid_max": "const:65534"},
{"eccodes_key": "#1#wigosLocalIdentifierCharacter", "value":"data:wsi_local"},
{"eccodes_key": "#1#latitude", "value": "data:latitude", "valid_min": "const:-90.0", "valid_max": "const:90.0"},
{"eccodes_key": "#1#longitude", "value": "data:longitude", "valid_min": "const:-180.0", "valid_max": "const:180.0"},
{"eccodes_key": "#1#heightOfStationGroundAboveMeanSeaLevel", "value":"data:station_height_above_msl", "valid_min": "const:-400", "valid_max": "const:9000"},
{"eccodes_key": "#1#heightOfBarometerAboveMeanSeaLevel", "value":"data:barometer_height_above_msl", "valid_min": "const:-400", "valid_max": "const:9000"},
{"eccodes_key": "#1#blockNumber", "value": "data:wmo_block_number", "valid_min": "const:0", "valid_max": "const:99"},
{"eccodes_key": "#1#stationNumber", "value": "data:wmo_station_number", "valid_min": "const:0", "valid_max": "const:999"},
{"eccodes_key": "#1#stationType", "value": "data:station_type", "valid_min": "const:0", "valid_max": "const:3"},
{"eccodes_key": "#1#year", "value": "data:year", "valid_min": "const:1600", "valid_max": "const:2200"},
{"eccodes_key": "#1#month", "value": "data:month", "valid_min": "const:1", "valid_max": "const:12"},
{"eccodes_key": "#1#day", "value": "data:day", "valid_min": "const:1", "valid_max": "const:31"},
{"eccodes_key": "#1#hour", "value": "data:hour", "valid_min": "const:0", "valid_max": "const:23"},
{"eccodes_key": "#1#minute", "value": "data:minute", "valid_min": "const:0", "valid_max": "const:59"},
{"eccodes_key": "#1#nonCoordinatePressure", "value": "data:station_pressure", "valid_min": "const:50000", "valid_max": "const:150000"},
{"eccodes_key": "#1#pressureReducedToMeanSeaLevel", "value": "data:msl_pressure", "valid_min": "const:50000", "valid_max": "const:150000"},
{"eccodes_key": "#1#nonCoordinateGeopotentialHeight", "value": "data:geopotential_height", "valid_min": "const:-1000", "valid_max": "const:130071"},
...
]
Refer back to exercise 2 in this practical session and the task of identifying the ecCodes key corresponding to the mean sea level pressure.
Question
Can you identify the row in the data section that performs the mapping between the CSV input file and ecCodes key used to encode the mean sea level pressure data?
Click to reveal answer
You will have hopefully identified the line below:
{"eccodes_key": "#1#pressureReducedToMeanSeaLevel", "value": "data:msl_pressure", "valid_min": "const:50000", "valid_max": "const:150000"},
This instructs the csv2bufr tool to map the data from the msl_pressure column to the #1#pressureReducedToMeanSeaLevel element in BUFR.
Bonus question
Can you guess what the valid_min and valid_max properties do?
Click to reveal answer
These specify the valid minimum and maximum values for the different elements. Values falling outside the specified range will result in a warning message, and the data will be set to missing in the output BUFR file.
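This behaviour can be illustrated with a short sketch (illustrative only, not the actual csv2bufr implementation):

```python
def range_check(value, valid_min, valid_max):
    """Return the value unchanged if it is in range, otherwise None (BUFR 'missing')."""
    if value is None or valid_min <= value <= valid_max:
        return value
    print(f"Value ({value}) out of valid range ({valid_min} - {valid_max}); element set to missing")
    return None

# a Kelvin temperature checked against Celsius limits is rejected
print(range_check(301.25, -60, 60))  # None
# an in-range value passes through unchanged
print(range_check(28.1, -60, 60))    # 28.1
```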
Download the csv2bufr mapping file to the wis2box management container:
curl https://raw.githubusercontent.com/wmo-im/csv2bufr/main/csv2bufr/templates/resources/aws-template.json --output aws-template.json
Now modify the file to change the limits for air temperature, for example setting realistic limits but in degrees Celsius (rather than the Kelvin values BUFR uses):
{"eccodes_key": "#1#airTemperature", "value": "data:air_temperature", "valid_min": "const:-60", "valid_max": "const:60"},
Now use csv2bufr to transform the example CSV data file to BUFR using the modified mapping file:
csv2bufr data transform --bufr-template aws-template.json csv2bufr-ex1.csv
You should see the following output:
CLI: ... Transforming ex1.csv to BUFR ...
CLI: ... Processing subsets:
#1#airTemperature: Value (301.25) out of valid range (-60 - 60).; Element set to missing
CLI: ..... 384 bytes written to ./WIGOS_0-20000-0-99100_20230929T090000.bufr4
CLI: End of processing, exiting.
This suggests a mismatch between the units in the input data and those expected by the mapping file.
Use the bufr_dump command to confirm that the first air temperature value has been set to missing in the file that was produced:
bufr_dump -p WIGOS_0-20000-0-99100_20230929T090000.bufr4 | grep airTemperature
Note
Note that the units used in BUFR are fixed: Kelvin for temperature, Pascals for pressure, etc. However, csv2bufr can perform simple unit conversions by scaling and adding an offset.
As the final part of this exercise, edit the mapping file again and add a scale and offset to the airTemperature line:
{"eccodes_key": "#1#airTemperature", "value": "data:air_temperature", "valid_min": "const:-60", "valid_max": "const:60", "scale": "const:0", "offset":"const:273.15"},
And edit the air_temperature column in 'csv2bufr-ex1.csv' to be in degrees Celsius rather than Kelvin: 301.25 K = 28.1 degrees C.
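The conversion can be checked with a little arithmetic. This sketch assumes (my reading of the csv2bufr documentation) that the encoded value is computed as value × 10^scale + offset:

```python
# values from the modified airTemperature mapping line
csv_value = 28.1            # air temperature in degrees Celsius, from the edited CSV
scale, offset = 0, 273.15

# assumed csv2bufr conversion: value * 10**scale + offset
encoded = csv_value * 10**scale + offset
print(round(encoded, 2))    # 301.25, the Kelvin value BUFR expects
```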
Rerun the conversion:
csv2bufr data transform --bufr-template aws-template.json csv2bufr-ex1.csv
Run bufr_dump and confirm that the airTemperature is correctly encoded:
bufr_dump -p WIGOS_0-20000-0-99100_20230929T090000.bufr4 | grep airTemperature
You should see the following output:
#1#airTemperature=301.25
#2#airTemperature=MISSING
Exercise 4 - installing new csv2bufr templates
The bufr-template used by csv2bufr can be specified by providing a filename to the command, as in the example above, or by setting a search path with an environment variable and specifying the template name. The template name must match the filename without the extension, with the file located on the search path.
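As a sketch of the lookup described above (illustrative only, not the actual csv2bufr resolution code):

```python
import os

# the template name plus a '.json' extension, looked up on the
# CSV2BUFR_TEMPLATES search path (default path assumed for illustration)
search_path = os.environ.get('CSV2BUFR_TEMPLATES', '/data/wis2box/bufr-templates')
template_name = 'aws-template-custom'
template_file = os.path.join(search_path, template_name + '.json')
print(template_file)
```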
Bug
Currently, the default mapping file used in the csv2bufr automated workflow sets the originating centre header to indicate that the data originated with the WMO Secretariat; see the JSON snippet from the default bufr-template below:
{"eccodes_key": "bufrHeaderCentre", "value": "const:0"},
In this final exercise you will download the template used, correct it with the appropriate centre code, and install it in the wis2box management container. Suggested originating centre codes are listed below; these will need to be confirmed with your WIS focal points prior to data exchange.
Centre | Code |
---|---|
Melbourne (RSMC), Australia | 67 |
Brasilia (RSMC), Brazil | 43 |
Beijing (RSMC), China | 38 |
New Delhi (RSMC), India | 28 |
Indonesia (NMC) | 195 |
Malaysia (NMC) | 73 |
Federated States of Micronesia | 197 |
Wellington (RSMC), New Zealand | 69 |
Oman (NMC) | 127 |
Philippines (NMC) | 201 |
Seoul, Republic of Korea | 40 |
Dili (NMC), Timor-Leste | TBC |
For this exercise, first create a directory in which to store the mappings:
cd /data/wis2box
mkdir bufr-templates
Navigate to the directory and download the default bufr-template:
cd bufr-templates
curl https://raw.githubusercontent.com/wmo-im/csv2bufr/main/csv2bufr/templates/resources/aws-template.json \
--output aws-template-custom.json
Make sure the file 'aws-template-custom.json' is in the directory '/data/wis2box/bufr-templates':
ls /data/wis2box/bufr-templates
Edit the file using vi or your favourite text editor and update the value of bufrHeaderCentre.
Now try the commands from the block below:
curl https://training.wis2box.wis.wmo.int/sample-data/csv2bufr-ex1.csv --output csv2bufr-ex1.csv
export CSV2BUFR_TEMPLATES=/data/wis2box/bufr-templates
csv2bufr data transform --bufr-template aws-template-custom csv2bufr-ex1.csv
Inspect the output file using bufr_ls and confirm that the originating centre in the headers has been updated.
Note
You can use the custom mappings in your automated workflow by updating your data-mappings.yml and updating the environment variable for CSV2BUFR_TEMPLATES.
Edit your data-mappings.yml
file to use this new file by updating the template name:
csv:
- plugin: wis2box.data.csv2bufr.ObservationDataCSV2BUFR
template: aws-template-custom
notify: true
file-pattern: '^.*\.csv$'
Add the environment variable to your wis2box.env
file:
cd ~/wis2box-1.0b5/
echo "export CSV2BUFR_TEMPLATES=/data/wis2box/bufr-templates" >> wis2box.env
And restart the wis2box stack:
python3 wis2box-ctl.py stop
python3 wis2box-ctl.py start
The new template should now be used in the automated workflow.
Housekeeping
Clean up your working directory by removing files you no longer need and clean up your station list to remove any example stations you may have created and that are no longer needed.
Conclusion
Congratulations
In this practical session you have learned:
- how to use some of the BUFR command line tools available in the wis2box management container;
- how to make basic changes to the mapping file used in the csv2bufr tool;
- and how to install new csv2bufr mapping templates in wis2box.
Next steps
In this session we have covered only a small part of the functionality of the csv2bufr and ecCodes BUFR tools.
For example, the csv2bufr tool includes the ability to create new 'bufr-template' files for different BUFR sequences. As an example, the template for BUFR sequence (301150, 307096), used by the default aws-template, can be generated with the command:
csv2bufr mappings create 301150 307096 > bufr-template.json
This file can be modified to add and map additional variables from the CSV data file, use different quality control values (valid_min and valid_max), and perform some basic (linear) unit conversions. Further information is available via the csv2bufr documentation pages:
https://csv2bufr.readthedocs.io/en/latest/.
Further information on the ecCodes BUFR tools can be found on the ecCodes website.
Data ingest and monitoring
Learning outcomes
By the end of this practical session, you will be able to:
- trigger different wis2box data pipelines using different file types
- monitor the status of your data ingest and publishing
Introduction
The wis2box-management container listens to events from the MinIO storage service to trigger data ingestion based on the configuration of data-mappings.yml. This allows you to upload data into MinIO and have wis2box automatically ingest and publish the data in real time.
For the purpose of the next few exercises we will use the MinIO admin interface to upload data into MinIO.
The same steps can be done programmatically by using any MinIO or S3 client software, allowing you to automate your data ingestion as part of your operational workflows.
Preparation
Log in to your student VM using your SSH client (PuTTY or other).
Make sure wis2box is up and running:
cd ~/wis2box-1.0b5/
python3 wis2box-ctl.py start
python3 wis2box-ctl.py status
Make sure you have MQTT Explorer running and connected to your instance. If you are still connected from the previous session, clear any previous messages you may have received from the queue. This can be done either by disconnecting and reconnecting or by clicking the trash-can icon for the topic.
Make sure you have a web browser open with the Grafana dashboard for your instance at http://<your-host>:3000, and a second tab open with the MinIO user interface at http://<your-host>:9001. Remember you need to log in with the WIS2BOX_STORAGE_USER and WIS2BOX_STORAGE_PASSWORD defined in your wis2box.env file:
Ingesting CSV data
First we will use a test sample in CSV format using the AWS template. The data-mappings.yml on your wis2box is configured to use the wis2box.data.csv2bufr.ObservationDataCSV2BUFR plugin for files with the extension .csv:
csv:
- plugin: wis2box.data.csv2bufr.ObservationDataCSV2BUFR
template: aws-template
notify: true
buckets:
- ${WIS2BOX_STORAGE_INCOMING}
file-pattern: '^.*\.csv$'
Download the following sample data files to your local machine:
Now go back to MinIO in your web browser, navigate to the wis2box-incoming bucket and click 'Create new path' to create the following directory:
xyz/test/data/core/weather/surface-based-observations/synop
Upload the file aws-example.csv
to the directory you just created:
Exercise 1: check for errors
Do you see any errors reported on the Grafana dashboard?
Click to reveal answer
The 'wis2box ERRORs' displayed at the bottom of the Grafana home dashboard should report the following error:
ERROR - handle() error: Topic Hierarchy validation error: No plugins for http://minio:9000/wis2box-incoming/xyz/test/data/core/weather/surface-based-observations/synop/aws-example.csv in data mappings. Did not match any of the following: ...
Note: the data definitions in data-mappings.yml use . instead of / to separate the path elements. However, a path written with . in data-mappings.yml and the same path written with / in MinIO are equivalent.
If there are no data mappings defined for the directory that received the data, wis2box will not initiate any workflow.
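The equivalence between the two notations can be illustrated with a one-line conversion (the topic shown is the example path from this exercise):

```python
# the data-mappings.yml key uses '.', the MinIO path uses '/': the two are equivalent
mapping_key = 'xyz.test.data.core.weather.surface-based-observations.synop'
minio_path = mapping_key.replace('.', '/')
print(minio_path)
# xyz/test/data/core/weather/surface-based-observations/synop
```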
Exercise 2: correct your input path and repeat the data ingest
Go back to the root of the wis2box-incoming bucket in MinIO. Then click 'Create new path' and define the correct path for wis2box. For example, if your country code is idn and your centre-id is bmkg, you should create the following path:
idn/bmkg/data/core/weather/surface-based-observations/synop
Now upload the sample data file aws-example.csv to the new path. Do you see any errors reported on the Grafana dashboard?
Click to reveal answer
The Grafana dashboard should report the following errors:
- ... {/app/wis2box/data/csv2bufr.py:98} ERROR - Station 0-20000-0-60360 not in station list; skipping
- ... {/app/wis2box/data/csv2bufr.py:98} ERROR - Station 0-20000-0-60355 not in station list; skipping
- ... {/app/wis2box/data/csv2bufr.py:98} ERROR - Station 0-20000-0-60351 not in station list; skipping
As the stations in the test data are not defined in your wis2box metadata, the data ingest workflow will not be triggered.
If instead you again see the error Topic Hierarchy validation error: No plugins for ... in data mappings, check that you have defined the correct path in MinIO and repeat the data ingest.
Exercise 3: add the test stations and repeat the data ingest
Add the following stations to your wis2box using the station editor in wis2box-webapp:
- 0-20000-0-60351
- 0-20000-0-60355
- 0-20000-0-60360
Now re-upload the sample data file aws-example.csv to the same path in MinIO that you used in the previous exercise.
Check the Grafana dashboard: are there any new errors? How can you see that the test data was successfully ingested and published?
Click to reveal answer
If you were subscribed with MQTT Explorer to your wis2box-broker, you should have received notifications for the test data when the data was successfully published.
You can also check the charts on the Grafana home dashboard to see if the test data was successfully ingested and published.
The chart "Number of WIS2.0 notifications published by wis2box" indicates that notifications were successfully published on the MQTT broker in wis2box.
Ingesting binary data
wis2box can ingest binary data in BUFR format using the wis2box.data.bufr4.ObservationDataBUFR plugin included in wis2box.
You can verify that the plugin is configured in your wis2box by checking the contents of data-mappings.yml from the SSH command line:
cat ~/wis2box-data/data-mappings.yml
You should see that it contains an entry specifying that files with the extension .bin should be processed by the wis2box.data.bufr4.ObservationDataBUFR plugin:
bin:
- plugin: wis2box.data.bufr4.ObservationDataBUFR
notify: true
buckets:
- ${WIS2BOX_STORAGE_INCOMING}
file-pattern: '^.*\.bin$'
This plugin will split the BUFR file into individual BUFR messages and publish each message to the MQTT broker. If the station for the corresponding BUFR message is not defined in the wis2box station metadata, the message will not be published.
Please download the following sample data file to your local machine:
Exercise 4: ingest binary data in BUFR format
Upload the sample data file 'bufr-example.bin' to the same path in MinIO you used in the previous exercise:
Check the Grafana dashboard and MQTT Explorer to see if the test-data was successfully ingested and published.
How many messages were published to the MQTT broker for this data sample?
Click to reveal answer
If you successfully ingested and published the last data sample, you should have received 10 new notifications on the wis2box MQTT broker. Each notification corresponds to data for one station at one observation timestamp.
Ingesting SYNOP data in ASCII format
In the previous session we used the SYNOP form in the wis2box-webapp to ingest SYNOP data in ASCII format. You can also ingest SYNOP data in ASCII format by uploading the data into MinIO. The data-mappings.yml on your wis2box is configured to use the wis2box.data.synop2bufr.ObservationDataSYNOP2BUFR plugin for files with the extension .txt:
txt:
- plugin: wis2box.data.synop2bufr.ObservationDataSYNOP2BUFR
notify: true
buckets:
- ${WIS2BOX_STORAGE_INCOMING}
file-pattern: '^.*-(\d{4})(\d{2}).*\.txt$'
Download the following two sample data files to your local machine:
(click 'save as' in your browser to download the files)
Note that the two files contain the same content, but the filenames are different. The filename is used to determine the date of the data sample.
The file-pattern in data-mappings.yml specifies the regular expression ^.*-(\d{4})(\d{2}).*\.txt$, which is used to extract the date from the filename. The first group in the regular expression extracts the year and the second group extracts the month.
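A quick way to convince yourself of how the pattern works is to try it in Python (the filename here is hypothetical; substitute the names of your downloaded sample files):

```python
import re

# the file-pattern from data-mappings.yml: group 1 is the year, group 2 the month
pattern = r'^.*-(\d{4})(\d{2}).*\.txt$'

# a hypothetical filename following the convention described above
m = re.match(pattern, 'synop_bufr-20230703.txt')
print(m.group(1), m.group(2))  # 2023 07
```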
Exercise 5: ingest SYNOP data in ASCII format
Go back to the MinIO interface in your browser and navigate to the wis2box-incoming bucket, then into the path where you uploaded the test data in the previous exercise.
Upload the new files in the correct path in the wis2box-incoming
bucket in MinIO to trigger the data ingest workflow.
Check the Grafana dashboard and MQTT Explorer to see if the test data was successfully ingested and published.
What is the difference in properties.datetime between the two messages published to the MQTT broker?
Click to reveal answer
Check the properties of the last two notifications in MQTT Explorer; you will note that one notification has:
"properties": {
"data_id": "wis2/rou/test/data/core/weather/surface-based-observations/synop/WIGOS_0-20000-0-60355_20230703T090000",
"datetime": "2023-07-03T09:00:00Z",
...
and the other notification has:
"properties": {
"data_id": "wis2/rou/test/data/core/weather/surface-based-observations/synop/WIGOS_0-20000-0-60355_20230803T090000",
"datetime": "2023-08-03T09:00:00Z",
...
The filename was used to determine the year and month of the data sample.
MinIO Python client (optional)
In this exercise we will use the MinIO Python client to copy data into MinIO.
MinIO provides a Python client which can be installed as follows:
pip3 install minio
On your student VM the 'minio' package for Python will already be installed.
Go to the directory exercise-materials/data-ingest and run the example script using the following command:
cd ~/exercise-materials/data-ingest
python3 copy_data_to_incoming.py
Note
You will get an error as the script is not configured to access the MinIO endpoint on your wis2box yet.
The script needs to know the correct endpoint for accessing MinIO on your wis2box. If wis2box is running on your host, the MinIO endpoint is available at http://<your-host>:9000.
The sample script provides the basic structure for copying a file into MinIO.
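As an illustration, a minimal upload helper might look like the following. This is a sketch, not the provided script: the function names are invented for this example, and the endpoint, credentials, and topic path must be replaced with your own values.

```python
from pathlib import Path


def build_object_name(topic_path: str, local_file: str) -> str:
    # The object name inside the bucket is the topic path plus the filename
    return f"{topic_path}/{Path(local_file).name}"


def upload_to_incoming(endpoint: str, access_key: str, secret_key: str,
                       topic_path: str, local_file: str) -> None:
    # Requires the 'minio' package (pip3 install minio)
    from minio import Minio
    client = Minio(endpoint, access_key=access_key,
                   secret_key=secret_key, secure=False)
    client.fput_object("wis2box-incoming",
                       build_object_name(topic_path, local_file),
                       local_file)
```

For example, `upload_to_incoming("<your-host>:9000", "<user>", "<password>", "<your-topic-path>", "/path/to/file.txt")` would place the file under that topic path in the wis2box-incoming bucket. Note that the MinIO Python client takes the endpoint as `host:port`, without the `http://` scheme.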
ingest data using Python
Use the Python example provided to create a script that ingests data into your wis2box.
Ensure that you:
- define the correct MinIO endpoint for your host
- define the correct path in MinIO for the topics defined in your
data-mappings.yml
- provide the correct storage credentials for your MinIO instance
You can verify that the data was uploaded correctly by checking the MinIO user interface and seeing if the sample data is available in the correct directory in the wis2box-incoming
bucket.
You can use the Grafana dashboard to check the status of the data ingest workflow.
Finally you can use MQTT Explorer to check if notifications were published for the data you ingested.
Cleaning up
You can now delete the following stations you have created in your wis2box using the station-editor in wis2box-webapp:
- 0-20000-0-60351
- 0-20000-0-60355
- 0-20000-0-60360
Conclusion
Congratulations!
In this practical session, you learned how to:
- trigger wis2box workflow using different data ingest methods
- monitor the status of your data ingest and publishing
Downloading and finding data from WIS2
Learning outcomes!
By the end of this practical session, you will be able to:
- use pywis-pubsub to subscribe to a Global Broker and download data to your local system
- use pywiscat to discover datasets from the Global Discovery Catalogue
Introduction
In this session you will learn how to discover data from the WIS2 Global Discovery Catalogue (GDC) and download data from a WIS2 Global Broker (GB).
Preparation
Note
Before starting please login to your student VM.
Downloading data with pywis-pubsub
The first practical session used MQTT Explorer to connect to the Météo-France Global Broker.
Let's use pywis-pubsub to subscribe from the command line.
pywis-pubsub subscribe --help
Note
pywis-pubsub is pre-installed on the local training environment, but can be installed from anywhere with pip3 install pywis-pubsub
Update the sample configuration (see the sections marked TBD) to connect to the Météo-France Global Broker:
vi ~/exercise-materials/pywis-pubsub-exercises/config.yml
Update the following values in the configuration:
- broker: mqtts://everyone:everyone@globalbroker.meteo.fr:8883
- subscribe_topics: fill in one or more topics (on separate lines), e.g. cache/#
- storage.options.path: the directory to which data should be downloaded
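Once updated, the relevant parts of config.yml should look roughly like this (a sketch: the key layout follows pywis-pubsub's sample configuration, and the download path is only an example):

```yaml
broker: mqtts://everyone:everyone@globalbroker.meteo.fr:8883
subscribe_topics:
    - cache/#
storage:
    type: fs
    options:
        path: /tmp/wis2-data
```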
Run the pywis-pubsub
command:
pywis-pubsub subscribe -c ~/exercise-materials/pywis-pubsub-exercises/config.yml --verbosity DEBUG
At this point you should see a continuous stream of messages on your screen. Hit Ctrl-C
to stop the messages from arriving, which makes it easier to analyze their content.
Note
The above command will download data to your local system for demo purposes. For operational environments you will need to consider and manage diskspace requirements as part of your workflow.
Question
What is the format of the data notifications that are displayed on the screen?
Click to reveal answer
The format is GeoJSON
Question
Is there data being downloaded? How can we run the pywis-pubsub
command to be able to download the data (hint: review the options when running the pywis-pubsub subscribe --help
command)?
Click to reveal answer
Add the -d
or --download
flag to the pywis-pubsub
command
Stop the pywis-pubsub
command (CTRL-C) and update the configuration to be able to download the data
to /tmp/wis2-data
.
Try spatial filtering with a bounding box:
pywis-pubsub subscribe -c ~/exercise-materials/pywis-pubsub-exercises/config.yml --verbosity INFO -d -b -142,42,-52,84
Note
Try using your own bounding box (format is west,south,east,north
, in decimal degrees).
Finding data with pywiscat
Let's use pywiscat to query the GDC.
At the moment, the available GDCs are as follows:
- Environment and Climate Change Canada, Meteorological Service of Canada: https://api.weather.gc.ca/collections/wis2-discovery-metadata
- China Meteorological Administration: https://api.weather.gc.ca/api/collections/wis2-discovery-metadata
pywiscat search --help
Note
pywiscat is pre-installed on the local training environment, but can be installed from anywhere with pip3 install pywiscat
Note
By default, pywiscat connects to Canada's Global Discovery Catalogue. To use a different catalogue, set the PYWISCAT_GDC_URL
environment variable.
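For example, to point pywiscat at Canada's GDC explicitly (the URL is the one listed above):

```shell
# Select the GDC that pywiscat queries, for the current shell session
export PYWISCAT_GDC_URL=https://api.weather.gc.ca/collections/wis2-discovery-metadata
```

Subsequent pywiscat commands run in the same shell will query this catalogue.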
pywiscat search
Question
How many records are returned from the search?
Click to reveal answer
There should be 40 records returned
Querying WIS2 GDC 🗃️ ...
Results: 40 records
...
Let's try querying the GDC with a keyword:
pywiscat search -q observations
Question
What is the data policy of the results?
Click to reveal answer
All data returned should specify "core" data
+---------------------------------------------------------------------------+------------------------------+-------------------------------------------------------+-------------+
| id | centre | title | data policy |
+---------------------------------------------------------------------------+------------------------------+-------------------------------------------------------+-------------+
| urn:x-wmo:md:dma:dominica_met_wis2node:surface-weather-observations | dominica_met_wis2node | Surface weather observations from Dominica Meteoro... | core |
| urn:x-wmo:md:gin:conakry_met_centre:surface-weather-observations | conakry_met_centre | Surface weather observations from gin.conakry_met_... | core |
| urn:x-wmo:md:atg:antigua_met_wis2node:surface-weather-observations | antigua_met_wis2node | Surface weather observations from Antigua and Barb... | core |
| urn:x-wmo:md:bfa:ouagadougou_met_centre:surface-weather-observations | ouagadougou_met_centre | Surface weather observations from bfa.ouagadougou_... | core |
...
Try additional queries with -q
Tip
The -q
flag allows for the following syntax:
-q synop
: find all records with the word "synop"-q temp
: find all records with the word "temp"-q "observations AND malawi"
: find all records with the words "observations" and "malawi"-q "observations NOT malawi"
: find all records that contain the word "observations" but not the word "malawi"-q "synop OR temp"
: find all records with either "synop" or "temp"-q "obs~"
: fuzzy search
When searching for terms with spaces, enclose in double quotes.
Let's get more details on a specific search result that we are interested in:
pywiscat get <id>
Tip
Use the id
value from the previous search.
Conclusion
Congratulations!
In this practical session, you learned how to:
- use pywis-pubsub to subscribe to a Global Broker and download data to your local system
- use pywiscat to discover datasets from the Global Discovery Catalogue
Querying data using the wis2box API
Learning outcomes
By the end of this practical session, you will be able to:
- use the wis2box API to query and filter your stations
- use the wis2box API to query and filter your data
Introduction
The wis2box API provides discovery and query access to data in a machine-readable manner, and also drives the wis2box UI.
In this practical session you will learn how to use the data API to discover, browse, and query data that has been ingested into wis2box.
Preparation
Note
Navigate to the wis2box API landing page in your web browser:
http://<your-host>/oapi
Inspecting collections
From the landing page, click on the 'Collections' link.
Question
How many dataset collections do you see on the resulting page? What do you think each collection represents?
Click to reveal answer
There should be 4 collections displayed, including "Stations", "Discovery metadata", and "Data notifications"
Inspecting stations
From the landing page, click on the 'Collections' link, then click on the 'Stations' link.
Click on the 'Browse' link, then click on the 'json' link.
Question
How many stations are returned? Compare this number to the station list in http://<your-host>/wis2box-webapp/station
Click to reveal answer
The number of stations from the API should be equal to the number of stations you see in the wis2box webapp.
Question
How can we query for a single station (e.g. Balaka
)?
Click to reveal answer
Query the API with http://<your-host>/oapi/collections/stations/items?q=Balaka
.
Note
The above example is based on the Malawi test data. Try testing against the stations you have ingested as part of the previous exercises.
Inspecting observations
Note
The following examples are based on the Malawi test data. Try testing against the observations you have ingested as part of the exercises.
From the landing page, click on the 'Collections' link, then click on the 'Surface weather observations from Malawi' link.
Click on the 'Queryables' link.
Question
Which queryable would be used to filter by station identifier?
Click to reveal answer
The wigos_station_identifier
is the correct queryable.
Navigate to the previous page (i.e. http://<your-host>/oapi/collections/urn:x-wmo:md:mwi:mwi_met_centre:surface-weather-observations
)
Click on the 'Browse' link.
Question
How can we visualize the JSON response?
Click to reveal answer
By clicking on the 'JSON' link at the top right of the page, or by adding f=json
to the API request in the web browser.
Inspect the JSON response of the observations.
Question
How many records are returned?
Question
How can we limit the response to 3 observations?
Click to reveal answer
Add limit=3
to the API request.
Question
How can we sort the response by the latest observations?
Click to reveal answer
Add sortby=-resultTime
to the API request (notice the -
sign to denote descending sort order). For sorting by the earliest observations, update the request to include sortby=resultTime
.
Question
How can we filter the observations by a single station?
Click to reveal answer
Add wigos_station_identifier=<WSI>
to the API request.
Question
How can we receive the observations as a CSV?
Click to reveal answer
Add f=csv
to the API request.
Question
How can we show a single observation (id)?
Click to reveal answer
Using the feature identifier from an API request against the observations, query the API for http://<your-host>/oapi/collections/{collectionId}/items/{featureId}
, where {collectionId}
is the name of your observations collection and {featureId}
is the identifier of the single observation of interest.
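The query parameters above can also be combined in a single request; for example (with a hypothetical host and station identifier):

```
http://<your-host>/oapi/collections/urn:x-wmo:md:mwi:mwi_met_centre:surface-weather-observations/items?wigos_station_identifier=<WSI>&sortby=-resultTime&limit=3&f=json
```

This returns the 3 most recent observations for the given station as JSON.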
Conclusion
Congratulations!
In this practical session, you learned how to:
- use the wis2box API to query and filter your stations
- use the wis2box API to query and filter your data
Ended: Practical sessions
Cheatsheets ↵
Linux cheatsheet
Overview
The basic concepts of working in a Linux operating system are files and directories (folders) organized in a tree structure within an environment.
Once you login to a Linux system, you are working in a shell in which you can work on files and directories, by executing commands which are installed on the system. The Bash shell is a common and popular shell which is typically found on Linux systems.
Bash
Directory Navigation
- Entering an absolute directory:
cd /dir1/dir2
- Entering a relative directory:
cd ./somedir
- Move one directory up:
cd ..
- Move two directories up:
cd ../..
- Move to your "home" directory:
cd ~
File Management
- Listing files in the current directory:
ls
- Listing files in the current directory with more detail:
ls -l
- List the root of the filesystem:
ls -l /
- Create an empty file:
touch foo.txt
- Create a file from an
echo
command:
echo "hi there" > test-file.txt
- Copy a file:
cp file1 file2
- Wildcards: operate on file patterns:
ls -l fil* # matches file1 and file2
- Concatenate two files into a new file called
newfile
:
cat file1 file2 > newfile
- Append another file into
newfile
cat file3 >> newfile
- Delete a file:
rm newfile
- Delete all files with the same file extension:
rm *.dat
- Create a directory
mkdir dir1
Chaining commands together with pipes
Pipes allow a user to send the output of one command to another using the pipe |
symbol:
echo "hi" | sed 's/hi/bye/'
- Filtering command outputs using grep:
echo "id,title" > test-file.txt
echo "1,birds" >> test-file.txt
echo "2,fish" >> test-file.txt
echo "3,cats" >> test-file.txt
cat test-file.txt | grep fish
- Ignoring case:
grep -i FISH test-file.txt
- Count matching lines:
grep -c fish test-file.txt
- Return outputs not containing keyword:
grep -v birds test-file.txt
- Count the number of lines in
test-file.txt
:
wc -l test-file.txt
- Display output one screen at a time:
more test-file.txt
...with controls:
- Scroll down line by line: enter
- Go to next page: space bar
- Go back one page: b
- Display the first 3 lines of the file:
head -3 test-file.txt
- Display the last 2 lines of the file:
tail -2 test-file.txt
Docker cheatsheet
Overview
Docker allows for creating isolated virtual environments in support of virtualizing computing resources. The basic concept behind Docker is containerization, where software runs as services, interacting with other software containers.
The typical Docker workflow involves creating and building images, which are then run as live containers.
Image management
- List available images
docker images
- Build an image from a Dockerfile:
cat << EOF > Dockerfile
FROM ubuntu:latest
RUN apt-get update
RUN apt-get install -y nginx
CMD ["echo", "Hello from my first Docker setup!"]
EOF
- Building the image:
docker build -t my-image:local .
- Removing an image:
docker rmi my-image:local
Volume Management
- List all created volumes:
docker volume ls
- Create a volume:
docker volume create my-volume
- Display detailed information on a volume:
docker volume inspect my-volume
- Remove a volume:
docker volume rm my-volume
- Remove all unused volumes:
docker volume prune
Container Management
- Create a container from an image, with an interactive terminal (
-it
) and a mounted volume (-v
):
docker run -it -v $(pwd):/app my-image:local
- Display a list of currently running containers:
docker ps
- List of all containers:
docker ps -a
- Start a previously stopped container:
docker start my-container
- Enter the interactive terminal of a running container:
Tip
use docker ps
to use the container id in the command below
docker exec -it my-container /bin/bash
- Remove a container
docker rm my-container
- Remove a running container:
docker rm -f my-container
YAML cheatsheet
Overview
Many wis2box configuration files are managed in YAML (YAML Ain't Markup Language) format. YAML allows for flexible configurations using indentation to do grouping and nesting.
Indentation
YAML is characterized by indentation. It is important to ensure consistent indentation in a YAML file.
It is recommended to use 4 spaces for a default indentation level.
It is also strongly recommended to NOT use tabs for indentation.
A simple YAML file:
my-configuration: # <- this is a section
    # this is a comment
    # you can add comments in this manner as you wish
    section: # <- this is a nested section
        subsection: # <- this is another nested section
            value1: 123 # an integer
            value2: 1.23 # a float
            value3: '123' # a number forced into a string
            value4: my value # a string (does not need quoting)
            value5: [1, 2, 3, 4] # an array/list
            # lists can contain any data type, including more lists or sections
            my-list: # another list, same structure as value5 above
                - item 1
                - item 2
                - item 3
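To check that a YAML file parses the way you expect, you can load it with Python and the PyYAML package (an illustration; install with pip3 install pyyaml if it is not already available):

```python
import yaml  # PyYAML

document = """
my-configuration:
    section:
        value1: 123
        value3: '123'
"""

config = yaml.safe_load(document)
# value1 is parsed as an integer; the quotes force value3 to stay a string
print(type(config["my-configuration"]["section"]["value1"]).__name__)  # int
print(type(config["my-configuration"]["section"]["value3"]).__name__)  # str
```

If the file contains inconsistent indentation or tabs, yaml.safe_load will raise an error pointing at the offending line, which makes it a quick way to validate wis2box configuration files.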
WIS2 in a box cheatsheet
Overview
wis2box runs as a suite of Docker Compose commands. The wis2box-ctl.py
command is a utility
(written in Python) to run Docker Compose commands easily.
wis2box command essentials
Building
- Build all of wis2box:
python3 wis2box-ctl.py build
- Build a specific wis2box Docker image:
python3 wis2box-ctl.py build wis2box-management
- Update wis2box:
python3 wis2box-ctl.py update
Starting and stopping
- Start wis2box:
python3 wis2box-ctl.py start
- Stop wis2box:
python3 wis2box-ctl.py stop
- Verify all wis2box containers are running:
python3 wis2box-ctl.py status
- Login to a wis2box container (wis2box-management by default):
python3 wis2box-ctl.py login
- Login to a specific wis2box container:
python3 wis2box-ctl.py login wis2box-api
Design time commands (metadata management and publishing)
Note
You must be logged into the wis2box-management container to run the below commands
- Publish discovery metadata:
wis2box metadata discovery publish /path/to/discovery-metadata-file.yml
- Publish station metadata:
wis2box metadata station publish-collection
- Add a dataset of observation collections from discovery metadata:
wis2box data add-collection /path/to/discovery-metadata-file.yml
- Ingest data into the wis2box-incoming bucket to trigger processing and publishing:
wis2box data ingest --topic-hierarchy topic.hierarchy.path --path /path/to/directory/of/data/files