This is a page to record decisions on what is in or out for a Minimum Viable Project.


VOTING

Please provide your assessment of importance or difficulty of items on the MVP list. Steps for doing this:

  1. Go to the MVP voting link provided in email or slack. Email Harry Ferguson for the link if you can't find it.
  2. Watch the video or follow the steps below.  
    1. Duplicate the sheet entitled ScoringTemplate
    2. Rename it to your name
    3. Expand each category using the + sign on the left-hand side
    4. Enter your ratings for Importance and Difficulty using the dropdown menus (stars for importance, gears for difficulty)
    5. Blanks are interpreted as the lowest priority. Delete your entry if you wish to leave a blank.
    6. That's it.

mvp_instructions.mov


The purpose of a Minimum Viable Product is to get a practical, extensible, maintainable, well-documented product into the hands of users as fast as possible. There is a delicate balance between what is too minimal to be particularly useful and what is too ambitious to be viable. As a worksheet, we can begin to construct a table of features/capabilities and categorize them in terms of importance to the user and viability for an early release based largely on existing code, calibrations, and/or reference files.

MVP worksheet

Each entry below gives the feature/capability, its importance for the MVP (must, should, nice), the difficulty to deliver it in the MVP (high, medium, low), and comments.
user-friendly APIs (importance: should; difficulty: high)

This is needed at least for a quick-and-dirty run, and may be similar to what Nimish Hathi mentioned. It also concerns how data and processed data are encapsulated in variables inside the working environment (e.g., Jupyter). For example, consider the aXe outputs (see ASTROGRISM-45 in Jira for details): many files are produced and it is not obvious how to access these saved outputs. It would be nice to have a wrapper for reading the outputs back into the working environment, designed so a user can easily access the most commonly used information, such as x.trace, x.wavelength, and x.flux.
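
As a concrete illustration of the kind of wrapper envisioned here, the sketch below shows how extraction outputs might be collected into a single object with exactly those attributes. This is hypothetical: the class name, attributes, and on-disk layout are assumptions, not an existing API.

    # Hypothetical sketch of a user-facing container for extraction results.
    # Names and the on-disk layout are illustrative assumptions.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class ExtractedSpectrum:
        trace: np.ndarray       # pixel positions of the spectral trace
        wavelength: np.ndarray  # wavelength at each trace position [Angstrom]
        flux: np.ndarray        # extracted flux at each wavelength
        flux_err: np.ndarray    # 1-sigma uncertainty on the flux

        @classmethod
        def from_output_dir(cls, path):
            """Read previously saved extraction products back into memory."""
            data = np.load(f"{path}/extraction.npz")  # assumed output format
            return cls(data["trace"], data["wavelength"],
                       data["flux"], data["flux_err"])

    # usage: x = ExtractedSpectrum.from_output_dir("OUTPUT/obj_042")
    #        then plot x.wavelength against x.flux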

modularity (importance: must; difficulty: medium)

This would be especially useful when we think about extensibility, whether adding new grism definitions from different facilities or user-customized objects derived from base classes.
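
A minimal sketch of what such base classes might look like; the class and method names are assumptions for illustration only, not an agreed design.

    # Illustrative abstract base class that a new facility's grism definition
    # could subclass; method names are assumptions.
    from abc import ABC, abstractmethod

    class GrismDefinition(ABC):
        """Encapsulates the trace and dispersion solution of one grism."""

        @abstractmethod
        def trace(self, x0, y0):
            """Return trace pixel positions for a source at direct-image (x0, y0)."""

        @abstractmethod
        def wavelength(self, x0, y0):
            """Return the wavelength at each trace pixel."""

    class WFC3IRG141(GrismDefinition):
        ...  # would implement trace() and wavelength() from calibration files
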
Interactive GUI (importance: nice; difficulty: high)

An interactive graphical user interface would help users quickly examine their data visually.
documentation (importance: must; difficulty: medium)

This is important, as we all can agree (smile)

Mitchell Revalski : "...detailed documentation, and reference information for what choices are best for tunable parameters (e.g. if you allow the user to fit a Gaussian, Voigt, or Lorentz profile to something, suggest common choices and pros/cons for different scenarios). Finally, pointing out common pit-falls and sanity checks that should be performed along the way are helpful."

minimal code comment standard (importance: should; difficulty: low)

We should discuss what the minimal requirements for code commenting would be, write them down, and make sure that all code complies with the minimal standard before accepting any push to the main code body. We might assign someone specifically to go through all the code and check for compliance.
Tutorial / cookbook / template (importance: must; difficulty: low)

The importance of this should be self-explanatory.

Mitchell Revalski : A Jupyter notebook is suggested.

Megan Sosey :  Create notebooks, using a common example, for all the code bases that we have

  • To understand differences
  • To understand commonalities
  • To flesh out a better interface design for a common library product
minimal functionalities in prototype (importance: must; difficulty: medium)

Kornpob Bhirombhakdi : At minimum, the prototype should be able to take inputs = {a grism image or a set of dithered images, plus other associated files such as the direct image, background image, etc.} and produce outputs. There should be APIs that let users easily re-run with different parameters (compare aXe, where all outputs go into an OUTPUT folder, so a user has to save that folder separately under a different name and re-run almost the whole code just to produce outputs with different parameters; a minimal API sketch follows these comments). From my aXe experience, extracting grism spectra consists of i) locating a spectrum, ii) computing the trace and wavelength solution for that spectrum, iii) extracting (in the aXe sense, this is the SPC file), and iv) post-extraction calibration (e.g., aperture correction, combining spectra with an outlier detection and rejection algorithm if the user avoids extracting from a drizzled image, or scaling the flux to photometric points; aXe does not perform these steps).

Mitchell Revalski : "...a working end-to-end reduction pipeline example designed for a straightforward set of observations (e.g. image-spectrum-image). functionalities in a prototype might include: 1) initialize code directories and packages, 2) identify and download grism observations in an Astroquery style, 3) perform minimal data-quality checks with automated modules and print a user report, 4) proceed to spectral extraction by identifying a spectrum (or taking user-input locations from direct imaging), calculate required trace, wavelength, and so on, 5) extract the spectrum and perform calibrations, 6) run basic sanity checks on the extracted spectra and provide user report, 7) plot the extracted data products and report on extraction parameters such as aperture, contamination estimates, etc."

Mitchell Revalski : "...emission (and absorption) line finding, possibly using user-input spectral templates, and integration with a line-fitting routine to produce emission line maps, kinematic maps, and an output format that can easily be manipulated to produce user-desired diagnostics such as using the measured line properties to calculate densities, temperatures, abundances, reddening, general line ratios, and so forth." 

Mitchell Revalski : "One aspect that I've found helpful integrating into my own codes: allow for all tabular data (including list of figures) to be output in LaTeX ready format for tabular or aastex style deluxetables (and figure calls)."

Identify associated data sets



e.g. Find and download direct and dispersed images that overlap on the sky via an archive query
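
One possible approach, sketched with astroquery.mast; the filter names and table columns used for the selection are assumptions to check against the MAST documentation.

    # Sketch: find WFC3/IR grism exposures and matching direct images near a
    # position, then download the FLT products. Column/filter names assumed.
    import astropy.units as u
    from astropy.coordinates import SkyCoord
    from astroquery.mast import Observations

    pos = SkyCoord(189.2282, 62.2161, unit="deg")        # example coordinates
    obs = Observations.query_region(pos, radius=3 * u.arcmin)

    hst_ir = (obs["obs_collection"] == "HST") & (obs["instrument_name"] == "WFC3/IR")
    grism = obs[hst_ir & (obs["filters"] == "G141")]     # dispersed exposures
    direct = obs[hst_ir & (obs["filters"] == "F140W")]   # associated direct images

    products = Observations.get_product_list(grism)
    Observations.download_products(products, productSubGroupDescription="FLT")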

Data model for grism data in general

Megan Sosey : Create a data model for grism data in general, show its use
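
A rough sketch of what such a data model might hold; the field names are assumptions for illustration, not an agreed design.

    # Illustrative container for one grism exposure plus its metadata.
    from dataclasses import dataclass, field
    import numpy as np
    from astropy.wcs import WCS

    @dataclass
    class GrismExposure:
        sci: np.ndarray        # 2D dispersed science image
        err: np.ndarray        # per-pixel uncertainties
        dq: np.ndarray         # data-quality flags
        wcs: WCS               # astrometric solution of the exposure
        grism: str             # e.g. "G141"
        direct_image: str      # identifier of the matching direct image
        meta: dict = field(default_factory=dict)   # remaining header keywords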

Organization and bookkeeping



Conventions for file formats (in and out), metadata in files, file names, directory structure, and output files (e.g., column names and units)
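
For example, output tables could carry explicit column names and units so the conventions are machine-readable. A sketch with astropy (column names chosen for illustration):

    # Sketch: write a 1D spectrum with named, unit-tagged columns.
    import astropy.units as u
    from astropy.table import Table

    spec = Table({
        "wavelength": [11000.0, 11046.5] * u.AA,
        "flux": [1.2e-17, 1.3e-17] * (u.erg / u.s / u.cm**2 / u.AA),
        "flux_err": [2.0e-18, 2.1e-18] * (u.erg / u.s / u.cm**2 / u.AA)})
    spec.write("obj_042_g141_1d.fits", overwrite=True)  # units end up in TUNITn keywords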

Geometric transformations



Outline all of the variants and what the use cases are (e.g. elaborate from Nor's presentation)
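
As one common variant, the aXe-style description parameterizes the trace offset and the wavelength as polynomials in the offset along the dispersion direction, with coefficients that can themselves depend on the source position in the direct image. A schematic sketch with dummy coefficients (not real calibration values):

    # Schematic aXe-style trace/dispersion evaluation; coefficients are dummies.
    import numpy as np

    def trace_and_wavelength(x0, y0, dx, dydx=(0.0, 0.01), dldp=(8950.0, 46.5)):
        """For a source at (x0, y0) in the direct image and offsets dx along the
        dispersion axis, return trace pixel coordinates and wavelengths [A]."""
        dy = np.polyval(dydx[::-1], dx)    # trace:      dy     = a0 + a1*dx
        wav = np.polyval(dldp[::-1], dx)   # dispersion: lambda = b0 + b1*dx
        return x0 + dx, y0 + dy, wav

    x, y, wav = trace_and_wavelength(512.0, 512.0, np.arange(0.0, 150.0))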

Astrometric registration



Align dithered observations
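
One lightweight approach (a sketch only, not a recommendation over existing tools such as drizzlepac's TweakReg) is to cross-match source catalogs from the exposures and measure the median sky offset:

    # Sketch: estimate the median sky offset between two detection catalogs.
    import numpy as np
    import astropy.units as u
    from astropy.coordinates import SkyCoord

    def catalog_offset(cat_ref, cat_exp, max_sep=0.5 * u.arcsec):
        """cat_ref, cat_exp: SkyCoord arrays of detections from two exposures."""
        idx, sep, _ = cat_exp.match_to_catalog_sky(cat_ref)
        good = sep < max_sep
        dra, ddec = cat_exp[good].spherical_offsets_to(cat_ref[idx[good]])
        return np.median(dra.to(u.arcsec)), np.median(ddec.to(u.arcsec))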

Simulations



Create a simulated 2D dispersed spectrum from a 1D spectrum and image morphology
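
The core of such a simulation can be sketched in a few lines: every direct-image pixel belonging to the source is dispersed along the trace and weighted by the input 1D spectrum resampled onto the wavelength solution. This toy sketch ignores the PSF, flat field, and sensitivity, and uses dummy linear trace/dispersion coefficients:

    # Toy forward dispersion of one source; trace/dispersion values are dummies.
    import numpy as np

    def simulate_dispersed(direct_cutout, x0, y0, spec_wave, spec_flux,
                           dx=np.arange(0.0, 150.0), shape=(1014, 1014)):
        """direct_cutout: 2D stamp of the source; (x0, y0): its detector position;
        spec_wave/spec_flux: input 1D spectrum. Returns a simulated grism frame."""
        sim = np.zeros(shape)
        wav = 8950.0 + 46.5 * dx                           # dummy dispersion [A]
        for j, i in zip(*np.nonzero(direct_cutout)):
            xt = np.round(x0 + i + dx).astype(int)         # trace x pixels
            yt = np.round(y0 + j + 0.01 * dx).astype(int)  # slightly tilted trace
            flux = direct_cutout[j, i] * np.interp(wav, spec_wave, spec_flux)
            ok = (xt >= 0) & (xt < shape[1]) & (yt >= 0) & (yt < shape[0])
            np.add.at(sim, (yt[ok], xt[ok]), flux[ok])
        return sim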

Background subtraction



What are the different background components & approaches to estimating/subtracting for HST instruments?

Flatfielding



This can be subtle; the same approach can't be used in all circumstances. Maybe multiple user stories are needed? 
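
For reference, one approach used for HST grisms is a wavelength-dependent "flat cube", where each pixel's flat value is a polynomial in the (normalized) wavelength falling on that pixel. A schematic evaluation (how the coefficient planes are stored on disk is an assumption here):

    # Schematic wavelength-dependent flat: flat(x, y) = sum_i a_i(x, y) * t**i,
    # with t = (lam - lam_min) / (lam_max - lam_min) clipped to [0, 1].
    import numpy as np

    def grism_flat(coeff_planes, lam_pixel, lam_min, lam_max):
        """coeff_planes: list of 2D coefficient images a_i from a calibration file;
        lam_pixel: 2D image of the wavelength falling on each pixel."""
        t = np.clip((lam_pixel - lam_min) / (lam_max - lam_min), 0.0, 1.0)
        flat = np.zeros_like(lam_pixel, dtype=float)
        for i, a_i in enumerate(coeff_planes):
            flat += a_i * t**i
        return flat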

1D extraction with no model assumptions

i) locating a spectrum, ii) computing the trace and wavelength solution for that spectrum, iii) extracting (in the aXe sense, this is the SPC file)
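
A minimal model-free extraction is a boxcar sum over the cross-dispersion direction at each trace position. A sketch, assuming the trace and wavelength arrays come from step ii and the image is already background-subtracted:

    # Sketch of a boxcar (unweighted) extraction along a known trace.
    import numpy as np

    def boxcar_extract(image, err, xtrace, ytrace, half_width=3):
        flux = np.empty(len(xtrace))
        var = np.empty(len(xtrace))
        for k, (x, y) in enumerate(zip(xtrace, ytrace)):
            xi, yi = int(round(x)), int(round(y))
            cut = slice(yi - half_width, yi + half_width + 1)
            flux[k] = image[cut, xi].sum()
            var[k] = (err[cut, xi] ** 2).sum()
        return flux, np.sqrt(var)
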
1D extraction relying on SED models

At least for the contaminants: Forward-model a "reasonable" assumption for the spectrum (flat, polynomial, or SED from a template library; informed by the direct image photometry). Lots of variants here. Is it sufficient for the MVP to enable this without providing a rich suite of models or templates?
1D "optimal" extraction

Weighting data by the cross-dispersion profile
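
This is essentially Horne (1986) optimal extraction: with a normalized cross-dispersion profile P, per-pixel variance V, and background-subtracted counts C, each wavelength column is estimated as sum(P*C/V) / sum(P**2/V). A sketch for a single column:

    # Horne (1986)-style optimal extraction of one wavelength column.
    import numpy as np

    def optimal_column(counts, variance, profile):
        """counts: background-subtracted cross-dispersion cut; profile: normalized
        cross-dispersion profile (sums to 1); variance: per-pixel variance."""
        w = profile / variance
        flux = np.sum(w * counts) / np.sum(w * profile)
        err = np.sqrt(1.0 / np.sum(w * profile))
        return flux, err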

2D spectral extraction



Maybe multiple stories with different approaches to getting a 2D extracted dispersed spectrum?

Co-adding 



Maybe several stories with different approaches to co-adding spectra taken at different orientations?

Converting counts to flux

Applying instrument throughput calibrations.
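
In the simplest case this means dividing the extracted count rate by the sensitivity curve and the pixel's wavelength width. A sketch (the sensitivity units assumed here are e-/s per erg/s/cm^2/A, as in aXe-style calibration files):

    # Sketch: convert extracted count rates to flux densities.
    import numpy as np

    def counts_to_flux(rate, wave, sens_wave, sens_value, dlam_per_pixel):
        """rate: extracted count rate per pixel [e-/s]; wave: wavelength per pixel [A];
        returns f_lambda [erg/s/cm^2/A]."""
        sens = np.interp(wave, sens_wave, sens_value)
        return rate / (sens * dlam_per_pixel)
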
Aperture corrections

  1. point sources based on PSF
  2. morphology based on direct image
Outlier detection & rejection from multiple exposures


Find an isolated emission line



EM2D use case

Mitchell Revalski : "...emission (and absorption) line finding, possibly using user-input spectral templates, and integration with a line-fitting routine to produce emission line maps, kinematic maps, and an output format that can easily be manipulated to produce user-desired diagnostics such as using the measured line properties to calculate densities, temperatures, abundances, reddening, general line ratios, and so forth." 

Create an emission-line map



Create a 2D emission-line map from spectra taken at different orientations

Fit a set of templates



Varying flux and redshift
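
A minimal version is a chi-square grid over redshift with the template amplitude solved analytically at each trial redshift. A sketch assuming a single template and Gaussian errors:

    # Sketch: fit one template to a 1D spectrum over a redshift grid.
    import numpy as np

    def fit_template(wave, flux, err, tmpl_wave, tmpl_flux, z_grid):
        chi2 = np.empty(len(z_grid))
        ivar = 1.0 / err**2
        for k, z in enumerate(z_grid):
            model = np.interp(wave, tmpl_wave * (1.0 + z), tmpl_flux)
            amp = np.sum(ivar * flux * model) / np.sum(ivar * model**2)  # best scale
            chi2[k] = np.sum(ivar * (flux - amp * model) ** 2)
        return z_grid[np.argmin(chi2)], chi2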

Visual outputs

Simple plots, etc.

Mitchell Revalski : ...."plot the extracted data products and report on extraction parameters such as aperture, contamination estimates, etc."

Integration to LaTeX

Mitchell Revalski : "One aspect that I've found helpful integrating into my own codes: allow for all tabular data (including list of figures) to be output in LaTeX ready format for tabular or aastex style deluxetables (and figure calls)."

Mindmap (work in progress) 

Email Harry Ferguson for editable link.

Style & Standards

Coding and documentation will follow the STScI style guide


11 Comments

  1. A few Notes from 5/13 meeting (please add more...):

    • Megan – a data model that encodes the geometries and notebook showing people how to use it
    • Russell – simulations (especially to help with assessing contamination in observation planning)
    • Iva – replace grizli (essentially allowing model-based extraction)
    • Swara – Can we make use of MIRAGE for simulations 
      • discussion (Steve) – perhaps; but code itself may not be ideal for extending or maintaining for multiple missions 
  2. Can we generate basic notebooks using different codes as a starting point? (a good way to compare these tools and then take best parts from them)

    Also, users might want to look at their data in a quick-and-dirty way, so can we generate a visualization tool?

    ** I also like the idea of a 'basic' simulation package.

    -- Here we could focus on JWST as (a) it has some framework in the form of MIRAGE and (b) if done on a short timescale, it could be useful for Cycle 1 proposals later this year

  3. Considering that our goal is what users need (or what they would like or what they think is useful), I think we should get input from new users like Kornpob, Mitchell and others. 

      1. I am not sure if this is how we are doing it here. I put some ideas in the table above; will the team discuss these ideas?

        1. Those are all good! 

          So far, they say nothing about what the code is supposed to accomplish though. You could write the same for any software. We're also looking for input on what functionalities people are looking for. (Some examples are in the – probably incomplete – list of user stories, which I will probably tweak and transfer over here.)

          1. I was about to ask if we should summarize the User Stories and take some input from those stories. If that sounds good, I will add more.

      2. Hi Harry and Nimish, thanks for soliciting input. I believe Kornpob is much more experienced with the reduction details than I am so I'll mostly focus on the items in the table above. As a new user, my hope for a MVP would be a working end-to-end reduction pipeline example designed for a straightforward set of observations (e.g. image-spectrum-image). With that in mind, my initial thoughts would be for a Jupyter Notebook style framework because it is highly visual and the documentation can be created simultaneously in a "readthedocs" style. If modular enough, this could easily accommodate different datasets, such as transitioning from HST to JWST.

        Please keep in mind that I'm not very familiar with the current software sets, workflows, and user difficulties, so please disregard any irrelevant suggestions. In examining the above table, a Jupyter Notebook style framework would seem to allow for the most features to be incorporated and maintained in a unified location. I agree with the suggestions above that the functionalities in a prototype might include: 1) initialize code directories and packages, 2) identify and download grism observations in an Astroquery style, 3) perform minimal data-quality checks with automated modules and print a user report, 4) proceed to spectral extraction by identifying a spectrum (or taking user-input locations from direct imaging), calculate required trace, wavelength, and so on, 5) extract the spectrum and perform calibrations, 6) run basic sanity checks on the extracted spectra and provide user report, 7) plot the extracted data products and report on extraction parameters such as aperture, contamination estimates, etc.

        In the most general terms, the barriers to entrance for new users of any software usually include lack of detailed documentation, and/or a lack of reference information for what choices are best for tunable parameters (e.g. if you allow the user to fit a Gaussian, Voigt, or Lorentz profile to something, suggest common choices and pros/cons for different scenarios). Finally, pointing out common pit-falls and sanity checks that should be performed along the way are helpful.

        The "wish-list" beyond this would then include emission (and absorption) line finding, possibly using user-input spectral templates, and integration with a line-fitting routine to produce emission line maps, kinematic maps, and an output format that can easily be manipulated to produce user-desired diagnostics such as using the measured line properties to calculate densities, temperatures, abundances, reddening, general line ratios, and so forth. One aspect that I've found helpful integrating into my own codes: allow for all tabular data (including list of figures) to be output in LaTeX ready format for tabular or aastex style deluxetables (and figure calls).

        Again, please consider these from the viewpoint of a beginner who is not (yet) familiar with common user difficulties and data challenges, but I hope this is helpful!

  4. Further from Megan Sosey:

    But my suggestion was twofold:

    1. Create a data model for grism data in general, show its use
    2. Create notebooks, using a common example, for all the code bases that we have
      • To understand differences
      • To understand commonalities
      • To flesh out a better interface design for a common library product
  5. I'd like us to think about the concept of "eating our own dog food". Currently, the experts in the organization do not use, contribute to or develop the officially supported grism software. What does this new software need to be in order for us internal people to use it in our own science? Because if we don't use it, there will continue to be a shadow grism industry.

    Copying this from the roadmap with some editing to account for how my thinking has changed now that forward modeling has become more ubiquitous, these are my requirements:

    1. Preprocessing steps specific for grism data (flat fielding, sky subtraction, alignment of direct image)

    2. Extracting a spectrum for any RA, Dec projected into a grism frame

    3. Accurate, iterative contamination model which takes into account the 2D light distribution of the source, the SED/slope of the spectrum and emission lines

    4. Tools for forward modeling to allow for redshift fitting (including photometry), measurements of emission line fluxes and fitting models to spectra. These should be able to handle multiple available grisms, e.g., G102 and G141. I think this is very important because without these tools the community reverts to methods developed for slit spectra that disregard key differences.
    5. Create 2D emission-line maps that correctly combine spectra taken at different angles.
    6. Can be used to simulate a grism image based on a direct image and some input SEDs or spectra and determine contamination at the expected position of the object of interest.

    7. QUICK LOOK coaddition of spectra from different exposures (interlacing/drizzling) and spectra from different roll angles (fine, 1D extraction too) with the caveat that this is not recommended for actual scientific analysis.