Stitching the Australian 1-km AVHRR Archive
Edward King
CSIRO Earth Observation Centre
May 2000
Introduction
This task component seeks to exploit the very substantial data holding that has accrued as a result of another part of the task, the Australian participation in the Global 1km Land Data Project. The primary goals are to migrate the entire archive to a modern high-density medium in a consistent format, to catalogue all the data thoroughly, and to exploit the redundancy inherent in passes for which data from more than one reception station is available, in order to produce the highest-quality, most complete data set.
The Dataset
The primary data holdings comprise HRPT data from reception stations located in Darwin, Hobart, Perth and Townsville. From 1993 to the present these stations have contributed essentially all passes (both day and night) of the operative AVHRR instruments, yielding a dataset comprising approximately 100000 scenes. Limited afternoon pass HRPT data from Alice Springs and Hobart is also held in the archive, extending the 1km resolution component back to 1988, though with a restricted frequency of coverage.
We have arranged to incorporate the archive from the CSIRO Atmospheric Research reception station at Aspendale, providing HRPT from 1992 to 1997. This archive is held in DISIMP format and so requires an additional stage of reformatting compared with most of the other data. At the time of writing, ingest of the Aspendale data is nearing completion.
As the most complete archive of continental coverage data, the value of this dataset will grow as the time span covered increases. However, to realise that value, it is essential that the archive is properly handled and maintained so that the data is useful, accessible and reliable.
Curation of the Archive
The archive is held in a variety of different formats on several thousand Exabyte and DAT magnetic tapes, typically holding between 2 and 5 GB of data each. This alone makes utilisation of the data difficult, since the sheer scale of tape handling is substantial. Many of the tapes are now between five and ten years old and need to be exercised. Moreover, we find that many of the older Exabyte tapes can only be read satisfactorily on the older drives, leading us to regard the drives themselves as an essential component of the archive! In response to this situation we are migrating the entire archive to DLT 7000 media with a capacity of 35 GB per tape. This will reduce the tape handling problem by a factor of between five and ten, and will also exercise all the tapes in the present archive (which will not be discarded but will be held as a secondary backup). DLT tapes have a maximum file seek time of around 90 seconds, so this migration will also permit much faster retrieval of particular files, opening the way for possible near-online access to the entire archive at some time in the future.
During the course of the migration, all the files are being consistently catalogued in a comprehensive new database. The metadata will include:
- orbital parameters
- pass information (time)
- channel information
- geographic location of the scene
- file identity and location within the archive
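As a rough illustration only, a catalogue entry covering the metadata listed above might be sketched in Python as follows. Every field name and type here is hypothetical and is not taken from the actual database schema; `find_scenes` is likewise an invented helper that stands in for the search capability.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class SceneRecord:
    """Hypothetical catalogue entry; every field name here is illustrative."""
    satellite: str          # e.g. "NOAA-14"
    station: str            # reception station name
    orbit_number: int       # stands in for the orbital parameters
    start_time: datetime    # pass start time
    end_time: datetime      # pass end time
    channels: List[int]     # AVHRR channels present in the file
    bounding_box: Tuple[float, float, float, float]  # west, south, east, north (degrees)
    tape_id: str            # which archive tape holds the file
    file_index: int         # position of the file on that tape


def find_scenes(records, satellite, when):
    """Return the records for `satellite` whose time span contains `when`."""
    return [
        r for r in records
        if r.satellite == satellite and r.start_time <= when <= r.end_time
    ]
```

Keeping the tape identity and file position in the same record as the pass time and location means that a search result is enough to locate the file for retrieval.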
Complete browse imagery is also being generated. A web interface to the metadata database and the browse imagery, providing a search capability, is currently being prototyped.
During the migration to DLT media, all scenes will be converted to the archive file format used by the USGS LAS/ADAPS package. This format was chosen for convenience, to support the processing of the archive described below. However, it does not capture the HRPT data in its entirety, so an auxiliary file containing the remainder, together with numerous other derived quantities, is maintained in association with each image file.
Consolidation (stitching)
More than half the HRPT data in the archive consists of contemporaneous acquisitions from two or more reception stations, amounting to significant redundancy in the archive. While this increases the total data volume, it also allows the compilation of a new "best" dataset, by selecting only the highest quality data where alternatives are available. In particular, the following possibilities are all advantageous:
- lines dropped at one station can be replaced by those from another.
- when three or more stations are available, bitwise comparison can be used to detect and correct errors.
- when only two stations are available, noise detection (based on certain assumptions about the behaviour of "good" data) can be used to choose the more reliable pixels.
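The first two possibilities can be illustrated with a minimal Python sketch. It is not the stitching program itself: the function names are hypothetical, each scan line is treated simply as a string of raw bytes (an assumption about the data layout), and the noise-based choice between two stations is deliberately left out.

```python
def bitwise_majority(copies):
    """Bitwise majority vote over three or more equal-length byte strings."""
    if len(copies) < 3:
        raise ValueError("majority voting needs at least three copies")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    threshold = len(copies) // 2  # a bit is set only if more than half agree
    out = bytearray(length)
    for i in range(length):
        value = 0
        for bit in range(8):
            mask = 1 << bit
            ones = sum(1 for c in copies if c[i] & mask)
            # With an even number of copies a tie leaves the bit clear;
            # that is an arbitrary choice made for this sketch.
            if ones > threshold:
                value |= mask
        out[i] = value
    return bytes(out)


def stitch_line(copies):
    """Combine one scan line as received at several stations.

    `copies` holds one entry per station: the line as bytes, or None if
    that station dropped the line.
    """
    present = [c for c in copies if c is not None]
    if not present:
        return None                      # no station received this line
    if len(present) >= 3:
        return bitwise_majority(present)
    # One or two copies: this sketch just keeps the first one. The
    # noise-detection step described in the text is not implemented here.
    return present[0]
```

With three copies, an error in any one copy at a given bit position is outvoted by the other two, which is why voting can correct errors rather than merely detect them.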
In the process of loading and comparing the scenes, it is efficient to produce a new single image file providing the maximum amount of best data for the given satellite pass (Figure 1). Thus the input scenes are "stitched" together to create (in most cases) a full continent-spanning pass, as if the data were obtained using a "super ground-station" with over-the-horizon capabilities and less noisy acquisition systems than the individual stations. This output pass will contain the best available data, but will be smaller than the sum of the input data files. Since all the individual station tapes must be read anyway, the whole process is most efficiently conducted in the course of migrating the archive to DLT media.
Figure 1. The procedure of stitching two (or more) scenes from a satellite pass may produce a single longer scene, part of which may overlap both of the input scenes. In the overlap region it is possible to maximise the output data quality.
While conceptually simple, the whole archive migration and stitching process is in practice a complex task, since each scene to be stitched must be read from its tape, reformatted, catalogued, and then held on disk until all the other scenes from that pass are similarly ingested. Once this is achieved, the stitching of the pass can take place. Even though there are some passes for which there is only one input scene, all passes are processed, since the stitching program adds further metadata to the auxiliary files and leaving these passes unprocessed would result in an internally inconsistent output archive. In addition to archiving the stitched output files, the input files are archived separately to DLT, grouped by pass. It is this action which actually constitutes migration of the archive, and it has the key advantage that the files are organised not by reception station but by pass, so that should it ever be necessary to re-stitch, all the necessary station files can be restored together and are guaranteed to be in a common format. An overview of the process is shown schematically in Figure 2.
Figure 2. Steps involved in the migration of the AVHRR archive to DLT media, and stitching to create continent-spanning passes.
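The requirement that scenes be held on disk until every scene from the same pass has been ingested can be sketched as below. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, scenes are represented only by file names, and it is assumed that the set of stations contributing to each pass is known in advance (for example from the catalogue).

```python
from collections import defaultdict


class PassStaging:
    """Hold ingested scenes until every scene of a pass has arrived."""

    def __init__(self, expected):
        # expected: pass identifier -> stations known to have recorded it
        self.expected = {p: set(s) for p, s in expected.items()}
        self.staged = defaultdict(dict)

    def ingest(self, pass_id, station, filename):
        """Register one reformatted scene.

        Returns a {station: filename} dict once the pass is complete and
        ready for stitching, otherwise None.
        """
        if station not in self.expected.get(pass_id, set()):
            raise KeyError(f"unexpected scene {station!r} for pass {pass_id!r}")
        self.staged[pass_id][station] = filename
        if set(self.staged[pass_id]) == self.expected[pass_id]:
            # All scenes are on disk: release the pass to the stitcher.
            return self.staged.pop(pass_id)
        return None
```

A pass with a single contributing station is released as soon as its one scene arrives, so it still goes through the stitching step like every other pass.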
The stitching is now operational, having been through an extensive testing process over the past six months. All data from August 1995 to April 1997 has been processed and is held on DLT tapes in LAS/ADAPS archive format at the EOC in Canberra. The major cause of delay in running the process is the difficulty of reading the input station tapes, many of which are old and some of which can only be read on a few particular tape drives. Typical throughput achieved is around 60 GB per day, leading to a rate of progress of one year of data every three to four weeks. An example stitched pass is shown in Figure 3.
Figure 3 panels: Hobart, Perth, Townsville (NASIS), Townsville (AIMS), Darwin, Stitched Pass.
Figure 3. An example stitched pass for NOAA-14 on 21 Sept 1995 at 0500 GMT. The output pass comprises data from Hobart, Perth, Townsville, AIMS and Darwin and is approximately 8100 km long.
Access
The stitched archive will become the new Level-0 base archive for future work that generates higher level products. Initially this will be available only to a small number of CSIRO users, partly because its reliability will need to be demonstrated, and partly because distribution will only be via magnetic tape. It is intended that the catalogue be web accessible and searchable. The ultimate data delivery mechanism to end-users is not yet determined. In the interim, we will be exporting a copy of the whole stitched archive to colleagues at CSIRO Atmospheric Research in Melbourne and Marine Research in Hobart.
The export data format will be ASDA (Turner et al. 1996). This is a mostly benign translation from the LAS/ADAPS archive format, the most significant difference being that a descriptive header is prepended to the file. The ASDA header is essentially a self-documenting, human-readable text description of the file content and format, and includes details of the instrument (AVHRR) and other ancillary data such as TBUS navigation information. Thus sufficient information will be integral to each image file to allow sensible and valid basic processing by a competent user, without requiring access to other sources of information. A further advantage of adopting the ASDA format is that several reception stations already use it to deliver their data, so there is a body of existing software (including CAPS) that can already read it.
We are currently trialling a number of catalogue search and data access tools that have been developed within the CEOS community, with a view to facilitating wider access to the data. These include the International Directory Network (IDN), the Information Management System (IMS), and the Data Information and Access Link (DIAL). Of these, IMS and DIAL appear the most likely to provide the features needed in an online data search and access system, and we are experimenting with making subsets of our data available through them.
A recent development in this area is that we have gained access to a 25 TB robotic store located at the ACSys CRC at the ANU. We have populated the store with 2 TB of our own tapes and established a dedicated 100 Mbit network link connecting our site. There are some teething problems to be overcome in making the link run at full speed, but we expect to begin loading our archived data into the store before the end of May 2000. It is our intention to experiment with the data in the store to test both how we might make the archive available near-line and what obstacles must be overcome to achieve batch generation of higher level products.
Copyright CSIRO 2000
Last updated 22/5/00