When downloading large files, or a large number of them, it is often advisable to use a bash script to download via the API. These scripts use the cURL utility to retrieve the selected files. cURL is robust, reliable, and supports many features including authentication. It is also the preferred tool for downloading large volumes of data because of the data volume and rate limits imposed by the Portal Web interface.
Example Bash Script
When retrieving a selection of files from the MAST Portal, one of the download options is an auto-generated bash (shell) script. Note that this option downloads only the script; the actual files are downloaded when the script is run.
If you would like to customize the example script provided here to access data files in MAST, you will first need to determine the URIs for the files of interest using another API, e.g., astroquery.mast. See the 'Interactive Python Example' section below to try it yourself.
To download the files, you must run the script in a Unix terminal (for example, bash MAST_2022-04-30T2153.sh).
A look at the example script MAST_2022-04-30T2153.sh shows that, apart from some housekeeping, it does a few main things:
- For data protected by an Exclusive Access Period (EAP), fetch the MAST API Token
- Create a folder for the payload
- Create a MANIFEST.HTML file to report the status of the requested vs. retrieved data files
- Retrieve each data file with a cURL command
The first and last of these steps are described in more detail below.
The API Token
If the requested files include at least one with EAP protection, a MAST API token is required. As the following code snippet shows, the bash script attempts to obtain the MAST token in several ways.
The following code snippet shows how each file is retrieved with cURL. For readability, the cURL command is reformatted here across multiple lines, using Linux/macOS line-continuation characters.
It is worth calling out two important cURL command-line options:
- -H: pass a custom header to the server; in this case, the MAST auth token
- --location-trusted: follow redirects, and send the auth credentials to the hosts redirected to
Interactive Python Example
In addition to requesting a cURL script from the MAST Portal, you can get one from the astroquery.mast Python API. We offer an interactive large-download Jupyter Notebook that serves as an example of how to construct such a query. Jupyter Notebooks are interactive, make it easy to follow along with the code, and are easy to customize; we highly recommend trying this tutorial in our Notebook.
For completeness, we also reproduce that code on this webpage. To begin, you must import the necessary packages. You'll need astroquery.mast to access the API, and astropy.table to handle the results, which are returned as a Table object. Then you can create a query, searching on any of the fields listed on our API page. Here we'll query for JWST NIRCam observations and restrict the search to a single proposal ID.
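The query described above might be sketched as follows. This is a minimal sketch, not necessarily the Notebook's exact code: the helper names build_criteria and query_observations are hypothetical, and the proposal ID 1073 and the instrument value NIRCAM/IMAGE are illustrative assumptions that you should replace with your own search terms.

```python
def build_criteria(proposal_id, instrument="NIRCAM/IMAGE"):
    """Return keyword criteria for a JWST NIRCam observation search.

    The default instrument value is an assumption made for illustration.
    """
    return {
        "obs_collection": "JWST",
        "instrument_name": instrument,
        "proposal_id": str(proposal_id),
    }


def query_observations(criteria):
    """Run the search against MAST (needs astroquery and network access)."""
    # Imported here so the criteria can be built without astroquery installed.
    from astroquery.mast import Observations

    # query_criteria returns an astropy Table with one row per observation.
    return Observations.query_criteria(**criteria)
```

Calling query_observations(build_criteria(1073)) would then return the table of matched observations used in the following steps.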
The above code will return only matched observations. Each observation has a set of files associated with it; these may be guide-star images, uncalibrated exposures, or the final calibrated science product. In any case, we will need to retrieve all of these products before we can filter the results.
Requesting the products for many observations at once increases the risk of timeout errors. However, requesting products one at a time is often slow. The best balance is achieved by requesting products in groups of five.
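One way to request products in groups of five is sketched below. The helper names chunked and collect_products are hypothetical, and the sketch assumes that matched_obs is the table of observations returned by the earlier query and that duplicate files can be identified by the productFilename column.

```python
def chunked(rows, size=5):
    """Split a sliceable sequence (a list or an astropy Table) into groups."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def collect_products(matched_obs, size=5):
    """Request product lists group by group and return the unique files."""
    # Imported here so that chunked() can be used without astroquery installed.
    from astropy.table import unique, vstack
    from astroquery.mast import Observations

    # One get_product_list call per group keeps each request small.
    tables = [
        Observations.get_product_list(group)
        for group in chunked(matched_obs, size)
    ]

    # Stack the per-group tables and drop files that are listed more than once.
    return unique(vstack(tables), keys="productFilename")
```

The group size is left as a parameter so that it can be lowered if timeouts still occur.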
With our set of unique files, all that remains is to pass the table to the download_products function. We also include the 'extension' filter to limit our download to .fits files. Any criteria listed on the products fields page may also be used.
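A minimal sketch of this final step is shown below. The wrapper request_fits_download and its observations parameter are hypothetical conveniences (the parameter allows a stand-in object to be passed for testing); by default the wrapper calls Observations.download_products from astroquery.mast. Setting curl_flag=True asks for a cURL script rather than the data files themselves.

```python
def request_fits_download(products, observations=None, curl_flag=True):
    """Pass a product table to download_products, keeping only .fits files."""
    if observations is None:
        # Imported lazily; the default is the real astroquery interface.
        from astroquery.mast import Observations as observations

    # extension="fits" filters the products by file extension.
    # curl_flag=True downloads a cURL script instead of the data files.
    return observations.download_products(
        products, curl_flag=curl_flag, extension="fits"
    )
```

With curl_flag=False, the same call would download the selected files directly instead of producing a script.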