# JWST SI Keyword Search for Observations
## Introduction

This tutorial illustrates how to use the MAST API to search for JWST science data by values of [FITS](https://fits.gsfc.nasa.gov/fits_standard.html) header keywords, and then retrieve all products for the corresponding Observations. 
Searching by SI Keyword values and accessing all data products is not supported in the [MAST Portal](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html), nor with the [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) `Observations` class by itself. 

Specifically, this tutorial will show you how to:
* Use the `Mast` class of [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) to search for JWST science files by values of [FITS](https://fits.gsfc.nasa.gov/fits_standard.html) header keywords
* Construct a unique set of Observation IDs to perform a search with the astroquery.mast `Observations` class
* Fetch the unique data products associated with the Observations
* Filter the results for science products
* Download a bash script that retrieves the filtered products

<div class="alert alert-block alert-info">

<span style="color:black">
Here are key distinctions between the two search methods with <a href="https://astroquery.readthedocs.io/en/latest/mast/mast.html">astroquery.mast</a>:
    <ul>
        <li> <b>SI Keyword Search:</b> Uses the <code>Mast</code> class to search for FITS products that match values of user-specified keywords, where the set of possible keywords is very large. Returns only FITS products, and only finds the highest level of calibrated products (generally, L-2b and L-3). </li>
        <li> <b>Advanced Search for Observations:</b> Uses the <code>Observations</code> class to search for data products that match certain metadata values. The <a href="https://mast.stsci.edu/api/v0/_productsfields.html">available metadata</a> upon which to conduct such a search is limited to coordinates, timestamps, and a modest set of instrument configuration information. Returns MAST <code>Observations</code> objects, which are collections of all levels of products (all formats) and all ancillary data products. </li>
    </ul>
</span>
</div>

The job of connecting files that match keyword values to Observations is not hard, but it is a little convoluted. It can be done because product file names contain the MAST Observation ID as a sub-string. In effect, one uses the API to perform an Instrument Keyword search, followed by an advanced Observation search once the Observation IDs are known. Here are the steps in the process:
<ul>
    <li><a href="#Imports">Imports</a></li>
    <li><a href="#Example">Example: Search for Exoplanet Spectra</a></li>
    <ul>
        <li><a href="#Criteria">Specify Search Criteria</a></li>
        <li><a href="#Timestamp">Timestamp</a></li>
    </ul>
    <li><a href="#KW Search">The Keyword Search</a></li>
    <li><a href="#Obs IDs">The Observation Search</a></li>
    <ul>
        <li><a href="#Obs Query">Execute the Observation Search</a></li>
    </ul>
    <li><a href="#Data Products">Query for Data Products</a></li>
    <ul>
        <li><a href="#Product Filters">Filter the Data Products</a></li>
        <li><a href="#Login">MAST Login</a></li>
        <li><a href="#Retrieve Files">Retrieve Files</a></li>
    </ul>
    <li><a href="#Resources">Additional Resources</a></li>
</ul>


## Imports
<a id="Imports"></a>

There are not many packages needed for this tutorial. 
* [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) constructs the queries, retrieves tables of results, and retrieves data products
* `astropy.table` holds the results of our product query and finds the unique files
* [astropy.time](https://docs.astropy.org/en/stable/time/index.html) creates Time objects and converts between time representations

In [None]:
from astroquery.mast import Mast, Observations
from astropy.table import Table, unique, vstack
from astropy.time import Time

# Example: Search for Exoplanet Spectra
<a id="Example"></a>

This example shows how to search for [NIRISS spectral time-series observations (TSO)](https://jwst-docs.stsci.edu/jwst-near-infrared-imager-and-slitless-spectrograph/niriss-observing-modes/niriss-single-object-slitless-spectroscopy) of transiting exoplanets. The data are from Commissioning or Early Release Science programs, and are therefore public. 

## Specify Search Criteria
<a id="Criteria"></a>

The criteria for SI Keyword searches consist of FITS header keyword name/value pairs. (Learn more about SI keywords from the [JWST Keyword Dictionary](https://mast.stsci.edu/portal/Mashup/Clients/jwkeywords/index.html), and about the supported set of [keyword values](https://mast.stsci.edu/api/v0/_jwst_inst_keywd.html) that can be queried.) With this kind of query it is necessary to construct a specific structure to hold the query parameters. 

Of the following helper routines, the first translates a simple dictionary (one that is easy to customize in Python) to the required [JSON](https://www.w3schools.com/js/js_json_intro.asp)-style syntax, and the second creates a min/max pair of parameters for date-time stamps. As with all parameters that vary continuously, timestamps must be expressed as a range of values in a dictionary. 

In [None]:
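def set_params(parameters):
    '''Translate a {name: values} dictionary into the list-of-dicts filter syntax'''
    return [{"paramName": p, "values": v} for p, v in parameters.items()]

def set_mjd_range(start, end):
    '''Set time range in MJD given limits expressed as ISO-8601 dates'''
    # Arguments are named start/end to avoid shadowing the built-ins min and max
    return {
        "min": Time(start, format='isot').mjd, 
        "max": Time(end, format='isot').mjd
        }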
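
As a cross-check on `set_mjd_range`, the ISO-8601 to MJD conversion can be reproduced with the standard library alone. The sketch below is not how `astropy.time` works internally: it assumes UTC calendar dates and ignores leap seconds, which is adequate for whole-day boundaries like the ones used in this tutorial. The helper name `iso_to_mjd` is hypothetical.

```python
from datetime import datetime

# MJD 0 corresponds to 1858-11-17T00:00:00 by definition
MJD_EPOCH = datetime(1858, 11, 17)

def iso_to_mjd(iso_date):
    """Convert an ISO-8601 date or date-time string to MJD (leap seconds ignored)."""
    delta = datetime.fromisoformat(iso_date) - MJD_EPOCH
    return delta.total_seconds() / 86400.0

# The same date limits that are used in the search criteria of this tutorial
mjd_range = {"min": iso_to_mjd("2022-06-01"), "max": iso_to_mjd("2022-08-04")}
print(mjd_range)  # {'min': 59731.0, 'max': 59795.0}
```

If the values printed here disagree with those from `set_mjd_range` for the same dates, the input strings are the first thing to check.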

### Timestamp
<a id="Timestamp"></a>

A date range is specified here (though it is not strictly needed) to illustrate how to express these common parameters. For historical reasons the `astroquery.mast` parameter names for timestamps come in pairs: one with a similar name to the corresponding FITS keyword, and another with the string <code>_mjd</code> appended. The values are equivalent, but are expressed in ISO-8601 and MJD representations, respectively. 

Change or add keywords and values to the <code>keywords</code> dictionary below to customize your criteria. Note that multiple, discrete-valued parameters are given in a list. 

In [None]:
keywords = {
    'category': ['COM','ERS']
    ,'exp_type': ['NIS_SOSS']
    ,'tsovisit': ['T']
    #,'productLevel': [3]
    ,'date_obs_mjd': [set_mjd_range('2022-06-01','2022-08-04')]
}

params = {
    'columns': '*',
    'filters': set_params(keywords)
    }

The following cell displays the constructed parameter object to illustrate the syntax for the query, which is described formally [here](https://mast.stsci.edu/api/v0/_services.html#MastScienceInstrumentKeywordsNircam). 

In [None]:
params
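
For readers without a running session, the following stand-alone sketch rebuilds the same structure and prints it as JSON. It assumes the MJD limits 59731.0 and 59795.0, which correspond to 2022-06-01 and 2022-08-04, in place of the `astropy.time` conversion; everything else follows the cells above.

```python
import json

def set_params(parameters):
    """Translate {name: values} into the list-of-dicts filter syntax."""
    return [{"paramName": p, "values": v} for p, v in parameters.items()]

keywords = {
    "category": ["COM", "ERS"],
    "exp_type": ["NIS_SOSS"],
    "tsovisit": ["T"],
    # Assumed MJD equivalents of 2022-06-01 and 2022-08-04
    "date_obs_mjd": [{"min": 59731.0, "max": 59795.0}],
}

params = {"columns": "*", "filters": set_params(keywords)}

# Each filter is one {"paramName": ..., "values": [...]} entry
print(json.dumps(params, indent=2))
```

Discrete-valued parameters appear as lists of strings, while the continuous `date_obs_mjd` parameter appears as a list holding one `min`/`max` dictionary.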

The full selection of keywords upon which to build search criteria is described in the [Field Descriptions for JWST Instrument Keywords](https://mast.stsci.edu/api/v0/_jwst_inst_keywd.html). Note that [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) parameter names do not always match the FITS keyword names. 

## Execute the SI Keyword Search
<a id="KW Search"></a>

This type of query is a little more primitive in [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) than that for the `Observations` class. Begin by specifying the web service for the query, which for this case is the [SI keyword search for NIRISS](https://mast.stsci.edu/api/v0/_services.html#MastScienceInstrumentKeywordsNiriss). Then execute the query with arguments for the service and the search parameters that were created above.

In [None]:
service = 'Mast.Jwst.Filtered.Niriss'
t = Mast.service_request(service, params)

## Construct the Observation Search
<a id="Obs IDs"></a>

The keyword search returns an astropy table of *files* that match the query criteria. We need to construct MAST Observation IDs from the file names in order to query for all JWST *Observations* that match our criteria. Each ID can be derived from a file name by removing the final underscore character and everything after it. Here we make a list of unique Observation IDs for the subsequent query. Note that we limit the list to *unique* IDs, as many file names have common roots.

In [None]:
# Unique file names:
fn = list(set(t['filename']))
# Set of derived Observation IDs:
ids = list(set(['_'.join(x.split('_')[:-1]) for x in fn]))
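
To see the derivation in isolation, here is a small sketch that uses made-up file names shaped like JWST products (they are illustrative, not real query results). It uses `str.rsplit` with `maxsplit=1`, which gives the same result as the split/join above for any name that contains at least one underscore.

```python
# Hypothetical file names; not actual results from the keyword search
filenames = [
    "jw01234001001_04101_00001-seg001_nis_rateints.fits",
    "jw01234001001_04101_00001-seg001_nis_calints.fits",
    "jw01234001001_04101_00001-seg002_nis_rateints.fits",
]

# Drop the final underscore and everything after it, then de-duplicate
obs_ids = sorted({name.rsplit("_", 1)[0] for name in filenames})
print(obs_ids)
# ['jw01234001001_04101_00001-seg001_nis', 'jw01234001001_04101_00001-seg002_nis']
```

Three files collapse to two IDs because the first two differ only in their product suffix.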

Print the list of unique ids if you like.

In [None]:
ids

### Execute the Query for Observations
<a id="Obs Query"></a>

Now search for Observations that match the list of Observation IDs constructed above. This search uses the [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) `Observations` class, where the available search criteria are described [here](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html). Note that we specify the MAST Mission (i.e., the `obs_collection` field) as <code>JWST</code> to limit the scope of the query (which also greatly speeds up the search). 

In [None]:
matched_obs = Observations.query_criteria(
    obs_collection='JWST',
    instrument_name='Niriss', 
    obs_id=ids
)

Verify that your query matched at least one observation, or the remaining steps will fail.

In [None]:
print('Found {} matching Observations'.format(len(matched_obs)))

## Query for Data Products
<a id="Data Products"></a>

Next fetch the data products that are connected to each Observation. Here we take care to fetch the products from Observations a few at a time (in chunks) to avoid server timeouts, which can occur if there are a large number of files in one or more of the matched Observations. A larger chunk size will execute faster, but increases the risk of a server timeout.

The following list comprehension splits a single long list into a list of smaller lists, each of which has a size no larger than `sz_chunk`.

In [None]:
sz_chunk = 4
chunks = [matched_obs[i:i+sz_chunk] for i in range(0,len(matched_obs), sz_chunk)]
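
The same slicing idiom can be tried on a plain list to confirm how the last, shorter chunk is handled. In this sketch the function name `chunked` is hypothetical and a list of integers stands in for the table of matched Observations.

```python
def chunked(items, size):
    """Split a sequence into consecutive slices of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

obs = list(range(10))  # stand-in for 10 matched Observations
print(chunked(obs, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Slicing past the end of a sequence is safe in Python, so no special case is needed for the final chunk.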

Now fetch the constituent products in a list of tables.

In [None]:
t = [Observations.get_product_list(obs) for obs in chunks]

We need to stack the individual tables and extract a unique set of file names. This avoids redundancy because Observations often have many files in common (e.g., guide-star files). 

In [None]:
products = unique(vstack(t), keys='productFilename')
print('  Number of unique products: {}'.format(len(products)))
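
The idea behind the stack-then-deduplicate step can be sketched in plain Python. This is an illustration only, not the `astropy.table` implementation: it keeps the first row seen for each file name and preserves input order, whereas `astropy.table.unique` groups by the key and so returns rows ordered by it. The function `unique_by` and the file names are hypothetical.

```python
def unique_by(rows, key):
    """Keep the first row seen for each distinct value of `key`."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return list(seen.values())

# Hypothetical product rows from two chunks that share a guide-star file
chunk_a = [{"productFilename": "a_cal.fits"}, {"productFilename": "gs_acq.fits"}]
chunk_b = [{"productFilename": "b_cal.fits"}, {"productFilename": "gs_acq.fits"}]

products = unique_by(chunk_a + chunk_b, "productFilename")
print(len(products))  # 3
```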

Display the resulting list of files if you like. 

In [None]:
products.show_in_notebook(display_length=10)

### Filter the Data Products
<a id="Product Filters"></a>

If only a subset of products is of interest (or there is a set of products you would like to exclude), there are a number of ways to select it. The cell below applies a filter to select only products classified as `SCIENCE` plus the files that define product associations (classified as `INFO`); it also excludes guide-star products. See the full set of [Products Field Descriptions](https://mast.stsci.edu/api/v0/_productsfields.html).

In [None]:
filtered_products = Observations.filter_products(
                    products
                   ,productType=['SCIENCE','INFO']
                    )
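
The filter logic can be pictured with a plain-Python sketch: a row is kept when, for every named field, its value is one of the allowed values. This reflects our reading of how `filter_products` combines criteria and is not the library's code; `filter_rows`, the file names, and the `AUXILIARY` label on the guide-star row are assumptions made for illustration.

```python
def filter_rows(rows, **criteria):
    """Keep rows matching every criterion; a list of values matches any of them."""
    def matches(row):
        return all(
            row[field] in (allowed if isinstance(allowed, list) else [allowed])
            for field, allowed in criteria.items()
        )
    return [row for row in rows if matches(row)]

# Hypothetical product rows
rows = [
    {"productFilename": "x_calints.fits", "productType": "SCIENCE"},
    {"productFilename": "x_asn.json", "productType": "INFO"},
    {"productFilename": "x_gs-track.fits", "productType": "AUXILIARY"},
]

kept = filter_rows(rows, productType=["SCIENCE", "INFO"])
print([r["productFilename"] for r in kept])  # ['x_calints.fits', 'x_asn.json']
```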

Display the filtered product table if you like.

In [None]:
filtered_products.show_in_notebook(display_length=10)

### MAST Login
<a id="Login"></a>

If you intend to retrieve data that are protected by an Exclusive Access Period (EAP), you will need to be both *authorized* and *authenticated*. You can authenticate by presenting a valid [Auth.MAST](https://auth.mast.stsci.edu/info) token with the login function. (See [MAST User Accounts](https://outerspace.stsci.edu/display/MASTDOCS/MAST+User+Accounts) for more information about whether you need to log in.) Note: this step is unnecessary if you are only retrieving public data. 

<div class="alert alert-block alert-warning">

<span style="color:black">
    If you have arrived at this point, wish to retrieve EAP products, and have <b>not</b> established a token, you need to do the following:
    <ul>
        <li> Create a token here: <a href="https://auth.mast.stsci.edu/info">Auth.MAST</a> </li>
        <li> Copy and paste the token string in response to the prompt that will appear when downloading the script. </li>
    </ul>
    Defining the token string as an environment variable <b>will not work</b> for an already-running notebook.
</span>
</div>

### Retrieve Files
<a id="Retrieve Files"></a>

Now fetch the products. The example below shows how to retrieve a bash script (rather than direct file download) which enables the file retrievals at a later time. Scripts are a much better choice if the number of files in the download manifest is large (>100).

In [None]:
manifest = Observations.download_products(
           filtered_products,
           curl_flag=True
           )

The name of the download script will be something like: mastDownload_**YYYYMMDDhhmmss**.sh, where the latter part of the name is a numeric timestamp. What remains is to invoke the downloaded **bash** script on your machine to retrieve the files. 
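
If you prefer to stay in Python, the sketch below locates the most recent download script and builds the command to run it. It assumes the script was written to the current working directory and that `bash` is available on your system; the helper names `latest_download_script` and `bash_command` are hypothetical.

```python
import glob
import os

def latest_download_script(directory="."):
    """Return the newest mastDownload_*.sh in `directory`, or None if absent."""
    # The timestamp in the name is fixed-width, so lexical order is time order
    scripts = sorted(glob.glob(os.path.join(directory, "mastDownload_*.sh")))
    return scripts[-1] if scripts else None

def bash_command(script_path):
    """Build the argument list for running the script with bash."""
    return ["bash", script_path]

script = latest_download_script()
if script is not None:
    # Pass this list to subprocess.run(..., check=True) to start the download
    print(bash_command(script))
```

The download itself is deliberately left out of the sketch, so that nothing is retrieved until you choose to run the command.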

# Additional Resources
<a id="Resources"></a>

* [astropy](https://docs.astropy.org/en/stable/index.html) documentation
* [astroquery.mast](https://astroquery.readthedocs.io/en/latest/mast/mast.html) documentation for querying MAST
* [Field Descriptions for JWST Instrument Keywords](https://mast.stsci.edu/api/v0/_jwst_inst_keywd.html)
* [Queryable fields](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html) in the MAST/CAOM database


## About this notebook

This notebook was developed by Archive Sciences Branch staff, chiefly Dick Shaw. For support, please contact the Archive HelpDesk at archive@stsci.edu, or through the [JWST HelpDesk Portal](https://jwsthelp.stsci.edu). 
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/>