# Large Queries in `astroquery.mast`

For some programs in the MAST archive, queries for data products or downloads through the MAST Portal can time out or fail because the program contains a very large number of files. This is especially common for JWST programs that use Wide-Field Slitless Spectroscopy. For these programs it is preferable, and often necessary, to retrieve the data through an API instead.

To that end, this notebook will demonstrate:
* Searching the MAST Portal for observations using the `astroquery.mast` API
* Retrieving associated data products without causing a timeout error
* Downloading the desired subset of data products

## Table of Contents
* [Imports](#Imports)
* [Search the MAST Archives](#Search-the-MAST-Archives)
* [Retrieve Associated Products](#Retreive-Associated-Products)
* [Filter and Download Products](#Filter-and-Download-Products)
* [Further Reading](#Further-Reading)

## Imports
In order to run this notebook, we need: 
* [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) to access the MAST archives
* [astropy.table](https://docs.astropy.org/en/stable/table/index.html) to hold the results of our queries, combine them, and then filter them for unique products

In [None]:
from astroquery.mast import Observations
from astropy.table import unique, vstack, Table

## Search the MAST Archives
The first step in downloading the data is to find the observations we're interested in. We will use the `query_criteria()` method, which lets us specify criteria such as RA/Dec, filters, exposure time, and any of the other [observation fields supported by the MAST API](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html).

We will search for NIRCam observations from JWST Program 1073.

In [None]:
matched_obs = Observations.query_criteria(
    obs_collection='JWST',
    proposal_id='1073',
    instrument_name='Nircam',
)


In [None]:
# This displays selected columns from the observation table, as a sanity check
columns = ['dataproduct_type', 'calib_level', 't_exptime', 'proposal_pi', 'intentType', 'obsid']
matched_obs[columns].show_in_notebook(display_length=5)

<a id="Retreive-Associated-Products"></a>
## Retrieve Associated Products
Each observation has associated data products. Which products you need depends on how you intend to use the data; more on this in the next section. For now, retrieve all of the products by requesting them one observation at a time.

<div class="alert alert-block alert-warning">

<span style="color:black">
    <b>Note: It is wise not to query for all of the products simultaneously.</b> If there are a large number of associated products, it is extremely likely to take an enormous amount of time, fail, or worse, do both. This particular observation search will return only 15 observations, but the following call will return 35,3000 associated products. 
</span>
</div>

In [None]:
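# Request the product list for each observation separately, to keep each query small
t = [Observations.get_product_list(obs) for obs in matched_obs]
# Stack the per-observation tables and drop products that appear more than once
files = unique(vstack(t), keys='productFilename')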
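
Querying one observation at a time keeps each request small, but a long loop can still be interrupted by a transient network error. The sketch below is one way to harden it, under our own assumptions: it queries a few observations per request (`Observations.get_product_list` accepts a list of obsids as well as a single one) and retries a failed request after a short pause. The helper names, the batch size of 5, and the retry settings are hypothetical choices for illustration, not part of astroquery, and catching every `Exception` is deliberately broad.

```python
import time


def chunked(items, size):
    """Split a sequence into consecutive lists of at most `size` items."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def with_retries(func, *args, attempts=3, delay=5.0, **kwargs):
    """Call func(*args, **kwargs), retrying with a growing pause if it raises."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # linear back-off between attempts


def fetch_products(obsids, batch_size=5, attempts=3, delay=5.0):
    """Query product lists a few observations at a time and merge the results."""
    # Imported here so the helpers above can be used without astroquery installed.
    from astroquery.mast import Observations
    from astropy.table import unique, vstack

    tables = [
        with_retries(Observations.get_product_list, batch,
                     attempts=attempts, delay=delay)
        for batch in chunked(obsids, batch_size)
    ]
    # Same de-duplication step as in the cell above
    return unique(vstack(tables), keys='productFilename')
```

For example, `files = fetch_products(list(matched_obs['obsid']), batch_size=5)` should produce the same kind of de-duplicated table as the cell above. A smaller batch size means more requests, but less risk that any single request times out.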

## Filter and Download Products
If you are downloading proprietary data (data still in its exclusive access period), you will need to log in. This requires a MAST token, which you can create on the [auth.mast](https://auth.mast.stsci.edu/tokens) website.
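
One way to avoid pasting a token into the notebook itself is to keep it in an environment variable and pass it to `Observations.login(token=...)`. This is a minimal sketch: the variable name `MAST_API_TOKEN` and both helper functions are assumptions made for this example.

```python
import os


def read_mast_token(var_name="MAST_API_TOKEN"):
    """Return the token stored in an environment variable, or None if unset or empty."""
    token = os.environ.get(var_name, "").strip()
    return token or None


def login_if_token_available(var_name="MAST_API_TOKEN"):
    """Log in to MAST only when a token is available; return True if a login was attempted."""
    token = read_mast_token(var_name)
    if token is None:
        return False  # public data can still be downloaded without logging in
    from astroquery.mast import Observations
    Observations.login(token=token)
    return True
```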

In this example, we want to download the uncalibrated products, which we select below using the `productSubGroupDescription` field. The other product fields you can filter on, including product type and file size, are listed in the [MAST products field documentation](https://mast.stsci.edu/api/v0/_productsfields.html).
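
`download_products` applies its keyword filters in the same way as `Observations.filter_products`: filters on different columns are combined with AND, while a list of values for a single column is combined with OR. The pure-Python sketch below mimics that rule on a few hypothetical product rows, so the logic can be checked without querying MAST; the `matches` helper and the rows are illustrative only.

```python
def matches(row, **filters):
    """True if `row` passes every filter (AND); a list of values means any of them (OR)."""
    for column, allowed in filters.items():
        if isinstance(allowed, str):
            allowed = [allowed]  # a single value behaves like a one-item list
        if row.get(column) not in allowed:
            return False
    return True


# Hypothetical product rows, for illustration only.
products = [
    {"productFilename": "a_uncal.fits", "productSubGroupDescription": "UNCAL", "productType": "SCIENCE"},
    {"productFilename": "a_cal.fits", "productSubGroupDescription": "CAL", "productType": "SCIENCE"},
    {"productFilename": "a_uncal.jpg", "productSubGroupDescription": "UNCAL", "productType": "PREVIEW"},
]

selected = [
    row["productFilename"]
    for row in products
    if matches(row, productSubGroupDescription="UNCAL", productType=["SCIENCE", "INFO"])
]
```

Here only `a_uncal.fits` passes both filters. With the real table, `Observations.filter_products(files, productSubGroupDescription='UNCAL', productType=['SCIENCE', 'INFO'])` returns the subset that would be downloaded, which is a cheap way to check how many files you are about to request.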

We also set the `curl_flag` option. Rather than downloading the data immediately, this option downloads a curl script that you run afterwards to fetch the files. It is off by default, but may provide a more robust connection than a direct download.

In [None]:
# Un-comment below if downloading data during its exclusive access period.
#Observations.login()

manifest = Observations.download_products(
    files,
    productSubGroupDescription='UNCAL',
    productType=['SCIENCE', 'INFO'],
    curl_flag=True,
)

The download script will have a name like mastDownload_**YYYYMMDDhhmmss**.sh, where the suffix is a numeric timestamp. All that remains is to run this **bash** script on your machine to retrieve the files.
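
If you run several downloads, the timestamp in the file name can be used to pick out the most recent script. The sketch below, with hypothetical helper names, parses that timestamp, finds the newest `mastDownload_*.sh` in a directory, and runs it with `bash` from the script's own directory so the files land next to it; that working-directory choice is our assumption, not a MAST requirement. We also assume that, with `curl_flag=True`, the returned `manifest` table records the script's path in a `Local Path` column; check `manifest.colnames` if in doubt.

```python
import glob
import os
import re
import subprocess
from datetime import datetime

# Matches names such as mastDownload_20220815123045.sh and captures the timestamp
SCRIPT_PATTERN = re.compile(r"mastDownload_(\d{14})\.sh$")


def script_timestamp(path):
    """Parse the YYYYMMDDhhmmss stamp from a mastDownload_*.sh file name."""
    match = SCRIPT_PATTERN.search(os.path.basename(path))
    if match is None:
        raise ValueError(f"not a MAST download script: {path}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def newest_script(directory="."):
    """Return the most recent mastDownload_*.sh in `directory`, or None if there is none."""
    scripts = [
        p for p in glob.glob(os.path.join(directory, "mastDownload_*.sh"))
        if SCRIPT_PATTERN.search(os.path.basename(p))
    ]
    return max(scripts, key=script_timestamp, default=None)


def run_script(path):
    """Run the script with bash from its own directory; raise if it exits with an error."""
    directory = os.path.dirname(os.path.abspath(path))
    subprocess.run(["bash", os.path.basename(path)], cwd=directory, check=True)
```

For example, `run_script(newest_script())` runs the latest script in the current directory, assuming one exists. Passing `check=True` makes a failed download raise `subprocess.CalledProcessError` instead of failing silently.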

All of the code in this notebook is also available as a companion script, for convenience.

## Further Reading
* For a full explanation of product levels and the processing pipeline, see [Science Data Products](https://outerspace.stsci.edu/display/MASTDOCS/Science+Data+Products)

## About this Notebook

**Authors:** Thomas Dutkiewicz, Dick Shaw, and Tom Donaldson <br>
**Keywords:** Downloads, astroquery, MAST <br>
**Last Updated:** Aug 2022 <br>

[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/> 