Files in HLSP collections must follow the naming convention described here. This convention facilitates ingestion of the collection into MAST, and helps to enable automated searches.
HLSP File Names
The names for science products and ancillary files in HLSP collections must comply with the guidance offered here to enable the files to be ingested into MAST databases, and to enable MAST services to locate them among the billions of files in MAST. File names work best when users can infer the semantic content of a file from its name with some certainty. For these reasons HLSP filenames:
- must be unique within a collection
- must use a restricted character set
- must be limited in overall length
- must contain certain metadata within the name, in a structured way
To support these goals, HLSP filenames (with the exception of README files) are composed of 9 sub-strings called fields, separated by underscores. Certain fields may be further sub-divided into sub-strings called elements, separated by hyphens. Specific guidance is offered in the subsections below.
Character set
File names must be composed of a limited character set, and are restricted in length.
- All filenames must be composed only of lower-case ASCII alphanumeric characters (a-z, 0-9), plus the following special characters:
  - underscore (_), but only to separate fields within filenames
  - hyphen (-), to separate elements within fields, and as a part of target names
  - period (.), which is permitted within the target field, the version ID and as a field separator for the file extension
  - plus sign (+), which is only permitted within the target field
- Most fields must begin with a lower-case letter (a-z), except as noted in the table below
- Each field has a maximum length, however overall filenames should be limited in length to no more than 90 characters
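The character-set and length rules above can be checked mechanically. The following is a minimal Python sketch, not a MAST tool: the function name `check_characters` is hypothetical, and it tests only the overall character set and the 90-character limit, not where each special character may appear.

```python
import string

# Characters permitted anywhere in a filename, per the rules above:
# lower-case ASCII letters, digits, underscore, hyphen, period, plus sign.
ALLOWED_CHARS = set(string.ascii_lowercase + string.digits + "_-.+")
MAX_FILENAME_LENGTH = 90  # overall limit stated above


def check_characters(filename: str) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    # Collect every distinct character that is outside the allowed set.
    bad = sorted(set(filename) - ALLOWED_CHARS)
    if bad:
        problems.append("disallowed characters: " + "".join(bad))
    if len(filename) > MAX_FILENAME_LENGTH:
        problems.append(
            f"length {len(filename)} exceeds {MAX_FILENAME_LENGTH} characters"
        )
    return problems
```

A check of this kind catches upper-case letters and stray punctuation early, but a name that passes it may still violate the per-field rules.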
Directory structure
For collections of more than a few dozen files, it would help for them to be organized into folders. The specific organization is up to the contributing team, but the sub-folder names must:
- consist of only lower-case alphanumeric characters (a-z, 0-9); underscores and hyphens are also allowed, except as the first or last character
- be limited in length to 20 characters
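The sub-folder naming rules above translate directly into a pattern match. The sketch below is illustrative only; the function name `is_valid_subfolder_name` is hypothetical, and it assumes the name is a single path component rather than a full path.

```python
import re

# First and last characters must be alphanumeric; underscores and
# hyphens are allowed only in between.
SUBFOLDER_NAME = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")
MAX_SUBFOLDER_LENGTH = 20


def is_valid_subfolder_name(name: str) -> bool:
    """Check one sub-folder name against the rules listed above."""
    return (
        len(name) <= MAX_SUBFOLDER_LENGTH
        and SUBFOLDER_NAME.fullmatch(name) is not None
    )
```

For example, `is_valid_subfolder_name("m101-ep1")` is true, while `"_raw"`, `"Spectra"`, and any name longer than 20 characters are rejected.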
The structure supplied by the team is primarily used to facilitate review and characterization of collection files prior to ingest. Note that MAST may or may not use the directory structure provided by the team to organize files in the archive mass storage system, or to represent files in any user interface.
Where applicable, ingest of particularly complex collections is often faster if the files corresponding to a single observation (a target observed in space and time with a particular instrument and optical element) are grouped together underneath a higher-level directory representing that observation.
File name fields
File names are composed of 9 fields, separated by underscores (the exception being the collection README file, which need only have fields 1, 2, 8, and 9 listed below). Certain fields may be sub-divided into elements, separated by hyphens. The name template is shown below: the leading hlsp, the underscores, and the period are literal text; the remaining fields are symbolic, and are explained in the table below.
hlsp_proj-id_observatory_instrument_target_opt-elem_version_product-type.extension
Where the components, in order, are:
No. | Component | Max Chars | Description | Examples
---|---|---|---|---
1 | hlsp | 4 | A literal string that identifies the file as a community-contributed data product. |
2 | proj-id | 20 | An agreed-upon acronym or initialism for the HLSP collection. This name is also used in MAST as a directory name and as a database keyword. | wide
3 | observatory | 20 | Observatory or mission used to acquire the data, or for which the data were simulated. | hst-iue, galex, jwst
4 | instrument | 20 | Name of the instrument used to obtain the data, or for which the data were simulated. |
5 | target | 30 | Field name or target as designated by the team, or a general identifier where a specific target designation is not relevant. Parts, counter numbers, and epochs are allowed in this field and should be separated by hyphens. Counters can be used, e.g., when the same field is observed multiple times with the same observing parameters. Please describe your usage of target/general parts and counters within the collection README file. | m57, m101-ep1, m101-ep2
6 | opt-elem | 20 | Names of optical element(s) (i.e., filter or disperser) used to obtain the data. |
7 | version | 9 | Version designation used by the team for the HLSP delivery. Versions in the filename may relate in some way to data release or software versions, but ultimately they must represent the version of a file, and must be incremented with any delivery that replaces that file. The value must begin with the literal "v" and contain an alphanumeric value. Within the version syntax, X and Y are numeric values with up to 2 digits, X cannot have a leading zero, and Z is alphanumeric. |
8 | product-type | 16 | Type of data as designated by the team (models/simulations can be indicated here). Use a widely recognized type. Be sure to distinguish products of similar type, possibly by using a simple compound type, e.g., for a photometric catalog. |
9 | extension | 8 | Standard extension name for the file format, which must include standard notation for compression if applicable. |
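The template and the per-field length limits can be sketched in code as a split followed by a length check. This is a hedged illustration rather than the procedure MAST uses: the names `FIELD_LIMITS`, `split_hlsp_filename`, and `overlong_fields` are hypothetical, README files (which carry only fields 1, 2, 8, and 9) are not handled, and the version syntax is not validated.

```python
# Maximum characters per field, in template order, taken from the table above.
FIELD_LIMITS = {
    "hlsp": 4,
    "proj-id": 20,
    "observatory": 20,
    "instrument": 20,
    "target": 30,
    "opt-elem": 20,
    "version": 9,
    "product-type": 16,
    "extension": 8,
}


def split_hlsp_filename(filename: str) -> dict[str, str]:
    """Split a non-README HLSP filename into its 9 named fields."""
    parts = filename.split("_")
    if len(parts) != 8:
        raise ValueError(
            f"expected 8 underscore-separated parts, found {len(parts)}"
        )
    if parts[0] != "hlsp":
        raise ValueError("filename must begin with the literal 'hlsp'")
    # The last part holds product-type and extension, separated by the
    # first period; the extension itself may contain further periods.
    product_type, sep, extension = parts[-1].partition(".")
    if not sep or not extension:
        raise ValueError("missing file extension")
    values = parts[:-1] + [product_type, extension]
    return dict(zip(FIELD_LIMITS, values))


def overlong_fields(fields: dict[str, str]) -> list[str]:
    """Return the names of fields that exceed their maximum length."""
    return [
        name for name, value in fields.items()
        if len(value) > FIELD_LIMITS[name]
    ]
```

Splitting on underscores is safe only because underscores are reserved as field separators; hyphens and periods inside the target or version fields are left untouched.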
Recommendations
- Version numbers can be specific to the project. Teams should use increasing version numbers to make it easy to tell which data are superseded; MAST will not keep older versions of datasets unless the team demonstrates a need for it.
- If your collection involves a time-domain component, this should be reflected in the target field (rather than in the version field), using a suffix of the form ep-NN as noted above. Contributors should take special care to explain time-dependent aspects of their collections in the README file.
- Re-delivered data should contain both the re-processed data and the single-epoch data associated with these products, if applicable.
- All images identified using the same target name should cover approximately the same area of the sky. If there are multiple images covering different parts of a region or source, their target field values should have appended elements indicating the different parts.
Example File Names
A drizzle-combined image of the GOODS field, obtained with the HST/ACS camera WFC channel using the F435W filter:
hlsp_goods_hst_acs-wfc_north-sect13_f435w_v2.0_drz.fits
An image of the Hubble ultra-deep field obtained with the HST/NICMOS NIC3 camera using the F110W filter:
hlsp_udf_hst_nicmos-nic3_treasury_f110w_v2_img.fits
A text catalog of 47 Tuc from the DEEP47TUC survey obtained with HST/ACS, containing stellar magnitudes in the F606W and F814W filters:
hlsp_deep47tuc_hst_acs_47tuc_f606w-f814w_v1_cat.txt
A combined spectrum of the target AV-16 from the ULLYSES survey, obtained with HST/STIS and FUSE, with different observatories, instruments, gratings and cenwave positions all abutted. Note that the opt-elem field is shortened to uv-opt, rather than listing the more than 8 individual elements:
hlsp_ullyses_hst-fuse_fuse-stis_av-16_uv-opt_dr7_preview-spec.fits
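The example names above can be reproduced by joining elements with hyphens and fields with underscores. The sketch below is one way a team might generate names consistently; the function `compose_hlsp_filename` and its parameters are hypothetical, and the lower-casing step is a convenience of this sketch, not a substitute for checking the other rules.

```python
from typing import Sequence


def compose_hlsp_filename(
    proj_id: str,
    observatories: Sequence[str],
    instruments: Sequence[str],
    target: str,
    opt_elems: Sequence[str],
    version: str,
    product_type: str,
    extension: str,
) -> str:
    """Assemble an HLSP filename from its field values."""
    fields = [
        "hlsp",                    # field 1: literal string
        proj_id,                   # field 2
        "-".join(observatories),   # field 3: elements joined by hyphens
        "-".join(instruments),     # field 4
        target,                    # field 5: may already contain hyphens
        "-".join(opt_elems),       # field 6
        version,                   # field 7
        product_type,              # field 8
    ]
    # Fields are joined by underscores; a period precedes the extension.
    return ("_".join(fields) + "." + extension).lower()
```

Calling `compose_hlsp_filename("DEEP47TUC", ["HST"], ["ACS"], "47tuc", ["F606W", "F814W"], "v1", "cat", "txt")` yields the catalog name shown in the third example.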