BUFR command line tools

Learning outcomes

In this practical session you will be introduced to some of the BUFR command line tools included in the wis2box-management container. You will learn:

  • how to inspect the headers of a BUFR file using the bufr_ls command
  • how to extract and inspect the data within a BUFR file using bufr_dump
  • the basic structure of the BUFR templates used in csv2bufr and how to use the command line tool
  • how to make basic changes to the BUFR templates and how to update wis2box to use the revised version

Introduction

The wis2box-management container contains some tools for working with BUFR files from the command line.

These include the tools developed by ECMWF and distributed with the ecCodes software; more information on these can be found on the ecCodes website.

Other tools include those developed as part of the wis2box project, such as csv2bufr and synop2bufr, which you have previously used via the wis2box web application.

In this session you will be introduced to the bufr_ls and bufr_dump commands from the ecCodes software package, as well as to advanced configuration of the csv2bufr tool.

Preparation

In order to use the BUFR command line tools you will need to be logged in to the wis2box management container; unless specified otherwise, all commands should be run in this container. You will also need to have MQTT Explorer open and connected to your broker.

First connect to your student VM via your ssh client and then log in to the management container:

cd ~/wis2box-1.0b5
python3 wis2box-ctl.py login

Confirm that the tools are available, starting with ecCodes:

bufr_dump -V

You should get the following response:

ecCodes Version 2.28.0

Next check csv2bufr:

csv2bufr --version

You should get the following response:

csv2bufr, version 0.7.4

Finally, create a working directory for this session and change into it:

cd /data/wis2box
mkdir -p working/bufr-cli
cd working/bufr-cli

You are now ready to start using the BUFR tools.

Using the BUFR command line tools

Exercise 1 - bufr_ls

In this first exercise you will use the bufr_ls command to inspect the headers of a BUFR file and to determine the contents of the file. The following headers are included in a BUFR file:

Header                            | ecCodes key                  | Description
----------------------------------|------------------------------|------------
Originating/generating centre     | centre                       | The originating / generating centre for the data
Originating/generating sub-centre | bufrHeaderSubCentre          | The originating / generating sub-centre for the data
Update sequence number            | updateSequenceNumber         | Whether this is the first version of the data (0) or an update (>0)
Data category                     | dataCategory                 | The type of data contained in the BUFR message, e.g. surface data. See BUFR Table A
International data sub-category   | internationalDataSubCategory | The sub-type of data contained in the BUFR message, e.g. surface data. See Common Code Table C-13
Year                              | typicalYear (typicalDate)    | Most typical time for the BUFR message contents
Month                             | typicalMonth (typicalDate)   | Most typical time for the BUFR message contents
Day                               | typicalDay (typicalDate)     | Most typical time for the BUFR message contents
Hour                              | typicalHour (typicalTime)    | Most typical time for the BUFR message contents
Minute                            | typicalMinute (typicalTime)  | Most typical time for the BUFR message contents
BUFR descriptors                  | unexpandedDescriptors        | List of one, or more, BUFR descriptors defining the data contained in the file

Download the example file directly into the wis2box-management container using the following command:

curl https://training.wis2box.wis.wmo.int/sample-data/bufr-cli-ex1.bufr4 --output bufr-cli-ex1.bufr4

Now use the following command to run bufr_ls on this file:

bufr_ls bufr-cli-ex1.bufr4

You should see the following output:

bufr-cli-ex1.bufr4
centre                     masterTablesVersionNumber  localTablesVersionNumber   typicalDate                typicalTime                numberOfSubsets
cnmc                       29                         0                          20231002                   000000                     1
1 of 1 messages in bufr-cli-ex1.bufr4

1 of 1 total messages in 1 files

On its own this output is not very informative, providing only limited detail about the file contents.

The default output does not indicate the type of observations, or data, in the file, and the layout is not easy to read. However, various options can be passed to bufr_ls to change both the format and the header fields printed.

Use bufr_ls without any arguments to view the options:

bufr_ls

You should see the following output:

NAME    bufr_ls

DESCRIPTION
        List content of BUFR files printing values of some header keys.
        Only scalar keys can be printed.
        It does not fail when a key is not found.

USAGE
        bufr_ls [options] bufr_file bufr_file ...

OPTIONS
        -p key[:{s|d|i}],key[:{s|d|i}],...
                Declaration of keys to print.
                For each key a string (key:s), a double (key:d) or an integer (key:i)
                type can be requested. Default type is string.
        -F format
                C style format for floating-point values.
        -P key[:{s|d|i}],key[:{s|d|i}],...
                As -p adding the declared keys to the default list.
        -w key[:{s|d|i}]{=|!=}value,key[:{s|d|i}]{=|!=}value,...
                Where clause.
                Messages are processed only if they match all the key/value constraints.
                A valid constraint is of type key=value or key!=value.
                For each key a string (key:s), a double (key:d) or an integer (key:i)
                type can be specified. Default type is string.
                In the value you can also use the forward-slash character '/' to specify an OR condition (i.e. a logical disjunction)
                Note: only one -w clause is allowed.
        -j      JSON output
        -s key[:{s|d|i}]=value,key[:{s|d|i}]=value,...
                Key/values to set.
                For each key a string (key:s), a double (key:d) or an integer (key:i)
                type can be defined. By default the native type is set.
        -n namespace
                All the keys belonging to the given namespace are printed.
        -m      Mars keys are printed.
        -V      Version.
        -W width
                Minimum width of each column in output. Default is 10.
        -g      Copy GTS header.
        -7      Does not fail when the message has wrong length

SEE ALSO
        Full documentation and examples at:
        <https://confluence.ecmwf.int/display/ECC/bufr_ls>

Now run the same command on the example file but output the information in JSON.

Question

What flag do you pass to the bufr_ls command to view the output in JSON format?

Click to reveal answer

You can change the output format to JSON using the -j flag, i.e. bufr_ls -j <input-file>. This can be more readable than the default output format. See the example output below:

{ "messages" : [
  {
    "centre": "cnmc",
    "masterTablesVersionNumber": 29,
    "localTablesVersionNumber": 0,
    "typicalDate": 20231002,
    "typicalTime": "000000",
    "numberOfSubsets": 1
  }
]}

When examining a BUFR file we often want to determine the type of data contained in the file and the typical date / time of the data in the file. This information can be listed using the -p flag to select the headers to output. Multiple headers can be included using a comma separated list.

Using the bufr_ls command, inspect the example file and identify the type of data contained in the file and the typical date and time for that data.

Hint

The ecCodes keys are given in the table above. We can use the following to list the dataCategory and internationalDataSubCategory of the BUFR data:

bufr_ls -p dataCategory,internationalDataSubCategory bufr-cli-ex1.bufr4

Additional keys can be added as required.

Question

What type of data (data category and sub-category) are contained in the file? What is the typical date and time for the data?

Click to reveal answer

The command you need to run should have been similar to:

bufr_ls -p dataCategory,internationalDataSubCategory,typicalDate,typicalTime -j bufr-cli-ex1.bufr4

You may have included additional keys, or listed the year, month, day, etc. individually. The output should be similar to that shown below, depending on whether you selected JSON or the default output format.

{ "messages" : [
  {
    "dataCategory": 2,
    "internationalDataSubCategory": 4,
    "typicalDate": 20231002,
    "typicalTime": "000000"
  }
]}

From this we see that:

  • The data category is 2; from BUFR Table A we can see that this file contains "Vertical soundings (other than satellite)" data.
  • The international sub-category is 4, indicating "Upper-level temperature/humidity/wind reports from fixed-land stations (TEMP)" data. This information can be looked up in Common Code Table C-13 (row 33). Note that the data type is defined by the combination of category and sub-category.
  • The typical date and time are 2023-10-02 and 00:00:00 UTC respectively.
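
Note

The -w (where) option shown in the bufr_ls help output can be used to filter which messages are listed, which is useful for files containing many messages. A minimal sketch based on the options documented above, comparing dataCategory as an integer:

bufr_ls -j -w dataCategory:i=2 bufr-cli-ex1.bufr4

Only messages matching the constraint are listed; for this single-message file the output is the same as the JSON output above.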

Exercise 2 - bufr_dump

The bufr_dump command can be used to list and examine the contents of a BUFR file, including the data itself.

In this exercise we will use a BUFR file identical to the one you created during the initial csv2bufr practical session using the wis2box-webapp.

Download the sample file directly to the wis2box management container with the following command:

curl https://training.wis2box.wis.wmo.int/sample-data/bufr-cli-ex2.bufr4 --output bufr-cli-ex2.bufr4

Now run the bufr_dump command on the file, using the -p flag to output the data in plain text (key=value format):

bufr_dump -p bufr-cli-ex2.bufr4

You should see around 240 keys in the output, many of which are reported as missing. This is typical of real-world data, as not all of the ecCodes keys are populated with reported data.

Hint

The missing values can be filtered using tools such as grep:

bufr_dump -p bufr-cli-ex2.bufr4 | grep -v MISSING

The example BUFR file for this exercise comes from the csv2bufr practical session. Please download the original CSV file into your current location as follows:

curl https://training.wis2box.wis.wmo.int/sample-data/csv2bufr-ex1.csv --output csv2bufr-ex1.csv

And display the content of the file with:

more csv2bufr-ex1.csv

Question

Use the following command to display column 18 of the CSV file, which contains the reported mean sea level pressure (msl_pressure):

more csv2bufr-ex1.csv | cut -d ',' -f 18

Which key in the BUFR output corresponds to the mean sea level pressure?

Hint

Tools such as grep can be used in combination with bufr_dump. For example:

bufr_dump -p bufr-cli-ex2.bufr4 | grep -i "pressure"

would filter the output of bufr_dump to only those lines containing the word "pressure". Alternatively, the output could be filtered on a value.

Click to reveal answer

The key "pressureReducedToMeanSeaLevel" corresponds to the msl_pressure column in the input CSV file.

Spend a few minutes examining the rest of the output, comparing it to the input CSV file, before moving on to the next exercise. For example, try to find the keys in the BUFR output that correspond to relative humidity (column 23 in the CSV file) and air temperature (column 21 in the CSV file).
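
Hint

A quick way to search the dump for both variables at once is to pass an alternation pattern to grep (the -E flag enables extended regular expressions):

bufr_dump -p bufr-cli-ex2.bufr4 | grep -iE "humidity|temperature"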

Exercise 3 - csv2bufr mapping files

The csv2bufr tool can be configured to process tabular data with different columns and BUFR sequences.

This is done by the way of a configuration file written in the JSON format.

Like BUFR data itself, the JSON file contains a header section and a data section, with these broadly corresponding to the same sections in BUFR.

Additionally, some formatting options are specified within the JSON file.

The JSON file for the default mapping can be viewed via the link below (right-click and open in a new tab):

aws-template.json

Examine the header section of the mapping file (shown below) and compare it with the table from exercise 1 (ecCodes key column):

"header":[
    {"eccodes_key": "edition", "value": "const:4"},
    {"eccodes_key": "masterTableNumber", "value": "const:0"},
    {"eccodes_key": "bufrHeaderCentre", "value": "const:0"},
    {"eccodes_key": "bufrHeaderSubCentre", "value": "const:0"},
    {"eccodes_key": "updateSequenceNumber", "value": "const:0"},
    {"eccodes_key": "dataCategory", "value": "const:0"},
    {"eccodes_key": "internationalDataSubCategory", "value": "const:2"},
    {"eccodes_key": "masterTablesVersionNumber", "value": "const:30"},
    {"eccodes_key": "numberOfSubsets", "value": "const:1"},
    {"eccodes_key": "observedData", "value": "const:1"},
    {"eccodes_key": "compressedData", "value": "const:0"},
    {"eccodes_key": "typicalYear", "value": "data:year"},
    {"eccodes_key": "typicalMonth", "value": "data:month"},
    {"eccodes_key": "typicalDay", "value": "data:day"},
    {"eccodes_key": "typicalHour", "value": "data:hour"},
    {"eccodes_key": "typicalMinute", "value": "data:minute"},
    {"eccodes_key": "unexpandedDescriptors", "value":"array:301150, 307096"}
],

Here you can see the same headers as were shown in the output of the bufr_ls command. In the mapping file, the value for each of these can be set to a column from the CSV input file, a constant value, or an array of constant values.

Question

Look at the example header section above. How would you specify a constant value, a value from the input CSV data file and an array of constants?

Click to reveal answer
  • Constant values can be set by setting the value property to "value": "const:<your-constant>".
  • Values can be set to a column from the input CSV by setting the value property to "value": "data:<your-column>".
  • Arrays can be set by setting the value property to "value": "array:<comma-separated-values>".

Now examine the data section of the JSON mapping file (note, the output has been truncated):

    "data": [
        {"eccodes_key": "#1#wigosIdentifierSeries", "value":"data:wsi_series", "valid_min": "const:0", "valid_max": "const:0"},
        {"eccodes_key": "#1#wigosIssuerOfIdentifier", "value":"data:wsi_issuer", "valid_min": "const:0", "valid_max": "const:65534"},
        {"eccodes_key": "#1#wigosIssueNumber", "value":"data:wsi_issue_number", "valid_min": "const:0", "valid_max": "const:65534"},
        {"eccodes_key": "#1#wigosLocalIdentifierCharacter", "value":"data:wsi_local"},
        {"eccodes_key": "#1#latitude", "value": "data:latitude", "valid_min": "const:-90.0", "valid_max": "const:90.0"},
        {"eccodes_key": "#1#longitude", "value": "data:longitude", "valid_min": "const:-180.0", "valid_max": "const:180.0"},
        {"eccodes_key": "#1#heightOfStationGroundAboveMeanSeaLevel", "value":"data:station_height_above_msl", "valid_min": "const:-400", "valid_max": "const:9000"},
        {"eccodes_key": "#1#heightOfBarometerAboveMeanSeaLevel", "value":"data:barometer_height_above_msl", "valid_min": "const:-400", "valid_max": "const:9000"},
        {"eccodes_key": "#1#blockNumber", "value": "data:wmo_block_number", "valid_min": "const:0", "valid_max": "const:99"},
        {"eccodes_key": "#1#stationNumber", "value": "data:wmo_station_number", "valid_min": "const:0", "valid_max": "const:999"},
        {"eccodes_key": "#1#stationType", "value": "data:station_type", "valid_min": "const:0", "valid_max": "const:3"},
        {"eccodes_key": "#1#year", "value": "data:year", "valid_min": "const:1600", "valid_max": "const:2200"},
        {"eccodes_key": "#1#month", "value": "data:month", "valid_min": "const:1", "valid_max": "const:12"},
        {"eccodes_key": "#1#day", "value": "data:day", "valid_min": "const:1", "valid_max": "const:31"},
        {"eccodes_key": "#1#hour", "value": "data:hour", "valid_min": "const:0", "valid_max": "const:23"},
        {"eccodes_key": "#1#minute", "value": "data:minute", "valid_min": "const:0", "valid_max": "const:59"},
        {"eccodes_key": "#1#nonCoordinatePressure", "value": "data:station_pressure", "valid_min": "const:50000", "valid_max": "const:150000"},
        {"eccodes_key": "#1#pressureReducedToMeanSeaLevel", "value": "data:msl_pressure", "valid_min": "const:50000", "valid_max": "const:150000"},
        {"eccodes_key": "#1#nonCoordinateGeopotentialHeight", "value": "data:geopotential_height", "valid_min": "const:-1000", "valid_max": "const:130071"},
        ...
    ]

Refer back to exercise 2 in this practical session and the task of identifying the ecCodes key corresponding to the mean sea level pressure.

Question

Can you identify the row in the data section that performs the mapping between the CSV input file and the ecCodes key used to encode the mean sea level pressure data?

Click to reveal answer

You should have identified the line below:

{"eccodes_key": "#1#pressureReducedToMeanSeaLevel", "value": "data:msl_pressure", "valid_min": "const:50000", "valid_max": "const:150000"},

This instructs the csv2bufr tool to map the data from the msl_pressure column to the #1#pressureReducedToMeanSeaLevel element in BUFR.

Bonus question

Can you guess what the valid_min and valid_max properties do?

Click to reveal answer

These specify the valid minimum and maximum values for the different elements. Values falling outside the specified range will result in a warning message, and the corresponding data will be set to missing in the output BUFR file.

Download the csv2bufr mapping file to the wis2box management container:

curl https://raw.githubusercontent.com/wmo-im/csv2bufr/main/csv2bufr/templates/resources/aws-template.json --output aws-template.json 

Now modify the file to change the limits for air temperature, for example setting realistic limits but in degrees Celsius:

        {"eccodes_key": "#1#airTemperature", "value": "data:air_temperature", "valid_min": "const:-60", "valid_max": "const:60"},

Now use csv2bufr to transform the example CSV data file to BUFR using the modified mapping file:

csv2bufr data transform --bufr-template aws-template.json csv2bufr-ex1.csv

You should see the following output:

CLI:    ... Transforming ex1.csv to BUFR ...
CLI:    ... Processing subsets:
#1#airTemperature: Value (301.25) out of valid range (-60 - 60).; Element set to missing
CLI:    ..... 384 bytes written to ./WIGOS_0-20000-0-99100_20230929T090000.bufr4
CLI:    End of processing, exiting.

This suggests a mismatch between the units in the input data and those expected by the mapping file.

Use the bufr_dump command to confirm that the first air temperature value has been set to missing in the file that was produced:

bufr_dump -p WIGOS_0-20000-0-99100_20230929T090000.bufr4 | grep airTemperature

Note

Note that the units used in BUFR are fixed: Kelvin for temperature, Pascals for pressure, etc. However, csv2bufr can perform simple unit conversions by scaling and adding an offset.

As the final part of this exercise, edit the mapping file again and add a scale and offset to the airTemperature line:

        {"eccodes_key": "#1#airTemperature", "value": "data:air_temperature", "valid_min": "const:-60", "valid_max": "const:60", "scale": "const:0", "offset":"const:273.15"},

Then edit the air_temperature column in 'csv2bufr-ex1.csv' to be in degrees Celsius rather than Kelvin: 301.25 K = 28.1 °C.
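
The conversion csv2bufr applies is linear, of the form encoded value = reported value × 10^scale + offset (per the csv2bufr documentation). With scale 0 and offset 273.15, the reported 28.1 °C becomes 28.1 × 1 + 273.15 = 301.25 K, the value expected in BUFR.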

Rerun the conversion:

csv2bufr data transform --bufr-template aws-template.json csv2bufr-ex1.csv

Run bufr_dump and confirm that the airTemperature is correctly encoded:

bufr_dump -p WIGOS_0-20000-0-99100_20230929T090000.bufr4 | grep airTemperature

You should see the following output:

#1#airTemperature=301.25
#2#airTemperature=MISSING

Exercise 4 - installing new csv2bufr templates

The bufr-template used by csv2bufr can be specified by providing a filename to the command, as in the example above, or by setting a search path with an environment variable and specifying the template name. The template name must match the filename without the extension, and the file must be located on the search path.

Bug

Currently, the default mapping file used in the csv2bufr automated workflow sets the originating centre header to indicate that the data originated with the WMO Secretariat; see the JSON snippet from the default bufr-template below:

    {"eccodes_key": "bufrHeaderCentre", "value": "const:0"}, 

In this final exercise you will download the template used, correct it with the appropriate centre code, and install it in the wis2box management container. Suggested originating centre codes are listed below; these will need to be confirmed with your WIS focal points prior to data exchange.

Centre                         | Code
-------------------------------|-----
Melbourne (RSMC), Australia    | 67
Brasilia (RSMC), Brazil        | 43
Beijing (RSMC), China          | 38
New Delhi (RSMC), India        | 28
Indonesia (NMC)                | 195
Malaysia (NMC)                 | 73
Federated States of Micronesia | 197
Wellington (RSMC), New Zealand | 69
Oman (NMC)                     | 127
Philippines (NMC)              | 201
Seoul, Republic of Korea       | 40
Dili (NMC), Timor-Leste        | TBC

For this exercise, first create a directory in which to store the mappings:

cd /data/wis2box
mkdir bufr-templates

Navigate to the directory and download the default bufr-template:

cd bufr-templates
curl https://raw.githubusercontent.com/wmo-im/csv2bufr/main/csv2bufr/templates/resources/aws-template.json \
    --output aws-template-custom.json

Make sure the file 'aws-template-custom.json' is in the directory '/data/wis2box/bufr-templates':

ls /data/wis2box/bufr-templates

Edit the file using vi or your favourite text editor and update the value of the bufrHeaderCentre.
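
Alternatively, sed can make the change in one line. The sketch below uses centre code 69 (Wellington, New Zealand) purely as an example, and assumes the whitespace in the downloaded file matches the header snippet shown earlier; substitute your own centre code:

sed -i 's/"bufrHeaderCentre", "value": "const:0"/"bufrHeaderCentre", "value": "const:69"/' aws-template-custom.json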

Now try the commands from the block below:

curl https://training.wis2box.wis.wmo.int/sample-data/csv2bufr-ex1.csv --output csv2bufr-ex1.csv
export CSV2BUFR_TEMPLATES=/data/wis2box/bufr-templates
csv2bufr data transform --bufr-template aws-template-custom csv2bufr-ex1.csv

Inspect the output file using bufr_ls and confirm that the originating centre in the headers has been updated.
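
For example, assuming the output filename matches the one produced in exercise 3 (adjust if yours differs):

bufr_ls -j -p bufrHeaderCentre WIGOS_0-20000-0-99100_20230929T090000.bufr4

The bufrHeaderCentre value should now show your centre code rather than 0.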

Note

You can use the custom mappings in your automated workflow by updating your data-mappings.yml and setting the CSV2BUFR_TEMPLATES environment variable.

Edit your data-mappings.yml file to use this new file by updating the template name:

            csv:
                - plugin: wis2box.data.csv2bufr.ObservationDataCSV2BUFR
                  template: aws-template-custom
                  notify: true
                  file-pattern: '^.*\.csv$'

Add the environment variable to your wis2box.env file:

cd ~/wis2box-1.0b5/
echo "export CSV2BUFR_TEMPLATES=/data/wis2box/bufr-templates" >> wis2box.env

And restart the wis2box stack:

python3 wis2box-ctl.py stop
python3 wis2box-ctl.py start
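
Optionally, confirm that all services have come back up (assuming the status command is available in your version of wis2box-ctl.py):

python3 wis2box-ctl.py status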

The new template should now be used in the automated workflow.

Housekeeping

Clean up your working directory by removing files you no longer need, and clean up your station list by removing any example stations you may have created that are no longer required.
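
For example, assuming you no longer need the files downloaded during this session (keep /data/wis2box/bufr-templates if you configured the automated workflow to use it):

cd /data/wis2box
rm -rf working/bufr-cli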

Conclusion

Congratulations

In this practical session you have learned:

  • how to use some of the BUFR command line tools available in the wis2box management container;
  • how to make basic changes to the mapping file used in the csv2bufr tool;
  • and how to install new csv2bufr mapping templates in wis2box.

Next steps

In this session we have covered only a small part of the functionality of the csv2bufr and ecCodes BUFR tools. For example, the csv2bufr tool includes the ability to create new 'bufr-template' files for different BUFR sequences. As an example, the template for the BUFR sequence (301150, 307096), used by the default aws-template, can be generated with the command:

csv2bufr mappings create 301150 307096 > bufr-template.json

This file can be modified to add and map additional variables from the CSV data file, to use different quality control limits (valid_min and valid_max), and to perform some basic (linear) unit conversions. Further information is available via the csv2bufr documentation pages:

https://csv2bufr.readthedocs.io/en/latest/.

Further information on the ecCodes BUFR tools can be found at:

https://confluence.ecmwf.int/display/ECC/BUFR+tools.