This page archives Slack comments from Day 2, Session 3 of the Improving JWST Data Products Workshop (IJDPW).



Michael Regan - @Anna Pusack, there are pixels that are marked bad due to their neighbors spilling charge. This only shows up in Flat images.

Anna Pusack - We did see that some of the time, but most of the time, it didn’t seem that the early data was doing that correctly. More recent releases have been better about masking correctly.


Timothy Brandt - @Anna Pusack, for the saturation where the pipeline picks out an odd slope, there was a change in the count rate at that pixel at an intermediate read.  Does this correspond with a neighboring pixel reaching saturation?

Anna Pusack - I will double check and get back to you. There were other examples different from this one. Quite a variety of ramp shapes.

Timothy Brandt - See Figure 3 of "Data reduction pipeline for the CHARIS integral-field spectrograph I: detector readout calibration and data cube extraction" for an example of how charge spilling can happen when pixels saturate; it's from an H2RG on a ground-based instrument.

Anna Pusack - Thank you for the resource!


Jeff Valenti - @Anna Pusack, the ramp example with a low slope for the first half of the ramp and a high slope for the second half may have a neighboring pixel that became saturated half way through the integration and began spilling charge into the pixel you plotted.

Anna Pusack - I’ll remind myself about that pixel in particular and get back. There were also a variety of other shapes that were isolated. The bigger issue is that if a neighbor is spilling over and the pixel is partially saturated, we are still getting a slope from it that makes it through the pipeline. Should it really have been masked, or is that a reliable slope?

Anna Pusack - That pixel doesn’t seem to have any saturated pixels around it. Some of the pixels have the jump detected tag, but not the saturated tag. Several of the pixels around it look similar, too.

Jeff Valenti - @Anna Pusack, thanks for checking. If you have it handy, please share the name of the data file and the pixel. This is worth further study.


David Law - @Anna Pusack, algorithmically the EMSM method is very nearly what you should get if you use 3d drizzle and convolve with a light smoothing kernel (which would smooth out resampling artifacts).  For your data do you get comparable results if you smooth the drizzled cube?

Anna Pusack - @Inbal Mizrahi worked a lot on comparing the two algorithms, and we settled on EMSM mostly to reduce the oscillations as much as possible. Maybe she can speak more to this question.

Inbal Mizrahi - @David Law, we haven’t tried smoothing the drizzle cube, but that’ll definitely be interesting to explore and compare. Thanks for the suggestion, I’m happy to follow up when I get some results!
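For readers who want to try David Law's suggestion, here is a minimal sketch (assuming the drizzled cube is a plain numpy array in (wavelength, y, x) order; the function name and default kernel width are ours, not a pipeline API):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_drizzled_cube(cube, fwhm_pix=1.5):
    """Convolve each wavelength slice of a (nwave, ny, nx) cube with a
    light Gaussian kernel to damp drizzle resampling artifacts."""
    sigma = fwhm_pix / 2.355  # convert FWHM to Gaussian sigma
    # Smooth only the two spatial axes; leave the spectral axis untouched.
    return gaussian_filter(cube, sigma=(0.0, sigma, sigma))
```

A per-slice kernel whose FWHM is a small fraction of the PSF should suppress the oscillations while costing little spatial resolution; comparing this against the EMSM cube would make the trade-off concrete.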


Anil Seth - @Anna Pusack, exciting work! Is the source extraction all being done with aperture photometry, or are you also exploring PSF fitting?

Anna Pusack - All that I showed today was done with aperture photometry, but we’re thinking about PSF fitting.

Howard Bushouse - PSF fitting may work for some of the stars near Sgr A*, but Sgr A* itself is very difficult to see/fit in a "normal" image (or IFU slice), because of all the contamination by PSF wings of the nearby stars. The only way to see it in an obvious way (and hence have something to fit) is to get time-resolved data and find two exposures where Sgr A* is quiescent and one where it's flaring, and subtract the 1st from the 2nd. That removes the stars and leaves the net flaring signal from Sgr A*.

Anil Seth - Given the very high quality of the input catalog and precise knowledge of the individual stars’ locations, I think something like PampelMUSE could work quite well for extracting spectra…

Anna Pusack - @Howard Bushouse, Sgr A* would be a different beast. I think there is someone at UCLA looking into that and there is more to come along those lines. @Anil Seth, I have recently heard of PampelMUSE, but haven’t tried it yet. Thank you.

Anil Seth - @Anna Pusack, I led a proposal this year that would yield very similar NIRSpec IFU data to your galactic center data.  If it makes it through, we will definitely be working on developing PSF fitting tools and/or adapting PampelMUSE for JWST.  There is definitely a lot of power in knowing the spatial distribution of the sources from higher quality imaging data like you have; there is a really nice example of this in "A Stellar Census in NGC 6397 with MUSE": the two stars shown have a separation of 0.266 arcsec, or 1.3 MUSE spaxels (and well below the resolution).

Anna Pusack - Oh, great. Thanks!


Michael Regan - @Anna Pusack, using a new bad pixel mask with old data is tricky since we’re getting more bad pixels over time.

Anna Pusack - The bad pixel mask from the first part of my talk was created from the up-to-date dark ref file. So it’s not one and done; we would have to create a new one each time we reduced the data. We have since abandoned that, since the newest ref files and pmap configuration have really remedied that over-masking issue.

Marshall Perrin - Related to the growth of more bad pixels over time, @Michael Regan, is there in general a recommended/advised best way to get the bad pixel map that’s most appropriate for a given observation date? Does the pipeline + CRDS do this optimally right now, or is there room to improve?

Michael Regan - @Marshall Perrin, there is clearly room to improve. FGS is updating bad pixel masks around once a month. The other instruments are doing it at slower cadence.


Taylor Bell - @Anna Pusack, have you considered using PSF/PRF extraction? From my experience with stellar companions around exoplanet hosts, PSF/PRF modelling can be somewhat less biased by unresolved/partially-resolved companions.

Inbal Mizrahi - Our NIRCam team is actively working on PSF fitting, which we hope to use to complement our work.

James Davies - Anyone have suggestions on how to model the NIRSpec IFU PSF?

Anna Pusack - I think that’s the part that’s slowed us down from implementing, because the IFU PSF will be different from NIRCam. The work that @Jean-Baptiste Ruffio talked about might complement what we’re doing, but we would keep what he subtracts!

James Davies - Agree  @Anna Pusack. @Michele Perna’s talk (and paper) is the first indications I’ve seen of the shape and wavelength dependence of the NIRSpec IFU PSF.

Anna Pusack - A lot of what he talked about was exciting to explore.

Marshall Perrin - FYI, @Jean-Baptiste Ruffio and I are co-leading a cycle 2 GO CAL program to measure the NIRSpec IFU PSF. Data to be taken this spring, and will be public immediately. Program 3399 if you want to set up a MAST notification :-)

Néstor Espinoza - Hurray for public, community programs @Marshall Perrin! Again, thanks for crafting that.

James Davies - Fantastic @Marshall Perrin!

Marshall Perrin - I should advertise, since this is an open GO calibration effort, we’re definitely happy to have more collaborators join in analyzing those IFU PSF data once they arrive. That’s true both for totally-independent separate efforts and for folks who might want to join in and lend a hand collaborating with us. (We’re already a joint exoplanets + solar system + quasar host galaxies collaboration, so the more science cases the merrier.) Please message me or JB if interested.

Jean-Baptiste Ruffio - Sorry, catching up on Youtube presentations/slack. You do have a challenging dataset @Anna Pusack! I do agree that PSF fitting (possibly from the point cloud/detector images) could help. We will be working on trying to package some simple PSF fitting examples from the point cloud eventually if you are interested; happy to chat offline if you want.

Anna Pusack - @Marshall Perrin, @Jean-Baptiste Ruffio, definitely interested. Let me know what I can contribute with our use case!


Savannah Gramze - @Anna Pusack, I wouldn't be surprised if there were stars in every pixel of your data, as seems to be the case with other Galactic Center observations with NIRCam. How many stars do you resolve in your data?

Anna Pusack - In our NIRCam or NIRSpec data? The NIRCam count is also still under consideration, with some issues of the code choosing diffraction patterns as separate stars, so we don’t have a definite count yet. For NIRSpec, we are source matching using previous catalogs and get ~50 matches per mosaic pointing, so ~500 stars with spectra potentially.


Jeff Valenti - @Jane Morrison, @Ivana Barisic, question that is off-topic for this workshop, but I will ask anyway. What extra steps did you have to do to plan and specify the slit-stepping observations?


Michael Regan - @Ivana Barisic, we have a solution for the even/odd row effect. It’s a simple bug in IRS2.

Ivana Barisic - Excellent! would be great to chat about this.

Melanie Clarke - @Michael Regan, Is there a ticket or a PR for this? I haven’t seen or heard of the fix yet.


David Law - @Jane Morrison, @Ivana Barisic, super exciting to see these great slit-stepped IFU cubes!


Dries Van De Putte - @Jane Morrison, is the stepped-slit -> cube algorithm specific to the MSA data, or more generally applicable to other stepped-slit data (e.g. multiple HST STIS slit pointings)?


Jeff Valenti - @Ell Bogat, really a question for NIRCam folks, have we seen other examples of dark subtraction making images worse?

Everett Schlawin - Yes, especially for the short wavelength detector. I thought that Karl Misselt ended up delivering dark current reference files that were all 0.0 for the short wavelength SCA but can't remember the status of what is used currently (I also skip the dark current step).

Bryan Hilbert - That's correct. The current SW dark files have all non-hot pixels set to 0.0 in the darks. This was due to low-level but measurable persistence contamination in the commissioning and cycle 1 darks.


Jeff Valenti - @Ell Bogat, from your perspective, what parts of SpaceKLIP should be added to the standard pipeline, particularly for hands-off generation of products in MAST?

Marshall Perrin - @Jeff Valenti, I agree with Ell’s answer re prioritizing the stage 1 improvements in outlier rejection and 1/f noise. FWIW I’ve copied this question over into our high contrast team slack channel for spaceKLIP development, to get opinions on the above from a wider cross section of the team.

Marshall Perrin - Related to the topic of coronagraph mask center location, let me point towards this open/unresolved pipeline issue from this past summer.  The short version is, the onboard OSS scripts use the SIAF as implemented onboard at the observing date for target acquisition to place the star onto the coronagraph mask. Any subsequent changes to the SIAF information of course don’t change where OSS tried to put the star in the past. But the pipeline uses the current SIAF when generating astrometry… so these become out of sync for reprocessing of older coronagraphic data.

Marshall Perrin - @Jeff Valenti, following up here to your question earlier today. After separate discussion with Jarron Leisenring and Julien Girard, they concur the priority for upstreaming into the standard pipeline is the improvements in stage 1 and 2 for coronagraphic subarray data.  Specifically that includes:

  • Improved iterative outlier pixel detection
  • Use of the edge pixels of the subarray as pseudo-refpix
  • 1/f noise subtraction at the group level, optimized for coronagraphic data (i.e. with a hugely bright star right in the middle of it)
  • Turning off the subarray darks since those have insufficient SNR to be helpful

Jarron points out the spaceKLIP implementation of a Coron1Pipeline variant of the Detector1Pipeline could in principle be picked up wholesale and moved up into the main pipeline, with some cleanup. Of course that would then require additional pipeline rules to invoke that variant stage 1 pipeline for appropriate datasets...
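The edge-pixel pseudo-refpix idea in the list above can be sketched roughly as follows (an illustration of the concept only, not the spaceKLIP implementation; array shapes, the function name, and the row/column orientation are assumptions, and the orientation in practice depends on the readout direction):

```python
import numpy as np

def pseudo_refpix_1f(groups, n_edge=4):
    """Estimate 1/f noise per row from the subarray's left/right edge
    columns (far from the bright star in the middle) and subtract it at
    the group level. groups has shape (ngroups, ny, nx)."""
    corrected = groups.astype(float).copy()
    for g in range(groups.shape[0]):
        # Gather the edge columns, which see (mostly) noise, not starlight.
        edges = np.concatenate(
            [corrected[g, :, :n_edge], corrected[g, :, -n_edge:]], axis=1)
        row_offset = np.median(edges, axis=1)  # one 1/f estimate per row
        corrected[g] -= row_offset[:, None]
    return corrected
```

The median over a handful of edge pixels keeps the estimate robust against the occasional outlier or PSF-wing contamination reaching the subarray edge.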


Michael Regan - @Ell Bogat, why does the contrast curve for F1140C look like it is barely detectable but the image looks very clear?


James Davies - @Ell Bogat, would be really interested to hear more about the 1/f noise mitigation in the KLIP pipeline.

Marshall Perrin - The right person to talk to for that is Jarron Leisenring at U of A. Code is available on GitHub, with some plots in the PRs. See here and here.


Jo Taylor - @Ell Bogat, excellent snake logo.

Ell Bogat - Credit to Jason Wang's pyKLIP team for that one! pyklip


Jeff Valenti - @Taylor Bell mentioned Eureka!


Jeff Valenti - @Taylor Bell, In what way does column-by-column BG subtraction not work well when # of groups is large?

Taylor Bell - You know, that's a good question — Kevin Stevenson wrote that snippet and I'm not entirely sure why he felt that group-level background subtraction (GLBS) didn't work well for a large number of groups. I'll follow-up with him.

Taylor Bell - Just checked with Kevin, and he said it’s not really that it breaks or fails, but that there isn’t much benefit with many (e.g., 70) groups.
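For context, here is a bare-bones illustration of column-by-column background subtraction applied at the group level (conceptual only, not the actual Eureka! GLBS code; the array shapes and function name are assumptions):

```python
import numpy as np

def group_level_bg(groups, trace_rows):
    """Column-by-column background subtraction at the group level:
    for each group, mask the rows containing the spectral trace and
    subtract each column's median background value.
    groups: (nints, ngroups, ny, nx); trace_rows: slice of rows to mask."""
    out = groups.astype(float).copy()
    bg = out.copy()
    bg[:, :, trace_rows, :] = np.nan          # exclude the source trace
    col_bg = np.nanmedian(bg, axis=2)         # (nints, ngroups, nx)
    return out - col_bg[:, :, None, :]
```

Doing this before ramp fitting removes odd/even and column-dependent offsets that would otherwise bias the fitted slopes; with many groups the slope estimate is already well constrained, which matches Kevin's point that the benefit shrinks.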


Jeff Valenti - @Taylor Bell, Should the cross-dispersion extraction height be larger by default?

Taylor Bell - Compared to the jwst pipeline's Stage 3 default? I frankly don't know; I've never looked that closely at jwst's Stage 3 pipeline, but Tom Greene has sent me default outputs that look horrific enough that we worried we'd need to file a WOPR, while the Eureka! S3 outputs look absolutely beautiful. Things have been too fast-paced so far for me to have the time to play with jwst's Stage 3 to understand what it is doing wrong right now.

Speaking generally, the ideal aperture size does seem to vary between objects (as there's a trade-off between capturing more starlight vs capturing more background). But I've found that MIRI/LRS SLITLESS typically benefits from quite small aperture sizes, NIRCam aperture sizes don't vary too much, and I've hardly worked with NIRSpec myself.


Thomas Vandal - @Taylor Bell, Can you give more detail on the machine-learning techniques you are looking into for jump detection?

Taylor Bell - They're still very early-stage investigations, but the general idea is to use machine-learning and/or deep-learning methods (especially convolutional neural networks) to take advantage of the fact that pixels don't live in isolation as cosmic rays tend to hit at an angle and impact several pixels at the same time. The current method used by jwst (as far as I'm aware) does not take advantage of this effect and only uses two-point differences for each pixel. We're currently working on building a training and/or testing dataset so we can evaluate performance of different algorithms, and we'll get a proof-of-concept going soon-ish to be able to request sufficient funding from NASA to build a full-fledged algorithm that's generalized to various different instruments and observing modes.
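For comparison, the per-pixel two-point-difference test Taylor describes as the current approach can be sketched as follows (a simplified single-pixel illustration of the idea, not the actual jwst jump-step implementation):

```python
import numpy as np

def flag_jumps_two_point(ramp, nsigma=4.0):
    """Simplified two-point-difference jump test for one pixel's ramp
    (counts per group). Flags the group following an outlier first
    difference. Uses a robust MAD-based scale estimate."""
    diffs = np.diff(ramp.astype(float))
    med = np.median(diffs)
    mad = np.median(np.abs(diffs - med))
    sigma = 1.4826 * mad + 1e-12  # MAD -> sigma for Gaussian noise
    jumps = np.zeros(ramp.shape, dtype=bool)
    jumps[1:] = np.abs(diffs - med) > nsigma * sigma
    return jumps
```

Because this test treats each pixel in isolation, it ignores exactly the cross-pixel correlation (cosmic rays hitting several pixels at an angle) that a convolutional approach could exploit.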


Michael Regan - @Taylor Bell, be careful with converting MIRI DN to electrons. The Poisson noise variance is correct but using the gain value does not yield electrons due to the multiple carriers per photon at shorter wavelengths.

Taylor Bell - I'm not entirely sure I follow what you mean, but so long as the Poisson noise variance remains correct then that's all we need. Still curious to learn more about what you're saying though if there's a webpage or document you can point me to.

Michael Regan - @Taylor Bell, the variance from Poisson noise is correct but using the gain to determine the number of electrons is incorrect. This is because at the shorter wavelengths we get multiple electrons per detected photon. These extra electrons when used to create a gain measurement are noisier and imply a lower gain. But there are actually more electrons than the gain implies. So, we can use the gain to get the Poisson noise but if, for some reason, someone needed to get the number of electrons, there would be significantly more electrons in the pixel than what the gain implies.

Taylor Bell - Ah okay, I get what you mean. So with the current gain file, the units are basically DN/photon and not DN/electron? And I noticed the gain file for LRS SLITLESS has been updated on CRDS now, but I was surprised to see it was something like ~4.3 (a vague wavelength average, since I know it varies with wavelength) instead of the ~3.1 that I'd heard before (and been told by Tom Greene from other MIRI instrument folks) and had estimated myself based on the scatter in lightcurves. Is the gain really that high and the time-series noise that much worse than the photon limit, is this related to the electrons-vs-photons issue you're mentioning, or are there still known issues with the CRDS gain file?

Michael Regan - When the correct values are used, the converted DN is the number that gives the correct Poisson variance. That’s the key point. It’s neither photons nor electrons. It’s not photons because the extra electrons have a Poisson distribution that increases the Poisson noise. So the gain value captures all the Poisson terms so that we can calculate the Poisson noise. The “gain” varies from 4.9 above 15 microns to around 3.9 at the shorter wavelengths. This means the reference file should show a variation in the gain per pixel along the trace. I don’t clearly remember the gain at the shortest wavelength.

Taylor Bell - Okay, understood. Thanks a lot for this discussion!

Michael Regan - You’re welcome.

Jane Rigby - Is this helpful information documented in JDox?

Michael Regan - No, it’s not in JDOX. It is in some MIRI ESA Team internal docs. I did do the initial analysis that showed the varying QE but I have not documented it well.
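The practical takeaway of the exchange above can be captured in a short numeric sketch (the function name is ours and the gain value is illustrative, not a CRDS number): use the gain to get the Poisson variance in DN, but do not read it as a true DN-to-electron conversion.

```python
import numpy as np

def poisson_variance_dn(signal_dn, gain):
    """Per Michael Regan's point: the MIRI 'gain' is calibrated so that
    the Poisson variance comes out right, i.e. var(signal) in DN^2 is
    signal_dn / gain. It does NOT give the true electron count when
    there are multiple carriers per detected photon."""
    return signal_dn / gain

def photon_limited_snr_dn(signal_dn, gain):
    """Photon-limited SNR implied by that variance."""
    return signal_dn / np.sqrt(poisson_variance_dn(signal_dn, gain))
```

So for Taylor's question, a higher effective "gain" at long wavelengths directly implies a noisier-than-naively-expected time series for a given DN level, independent of how many physical electrons are in the pixel.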


Thomas Vandal - @Taylor Bell, can you give more detail on what the ramp fitting code in Eureka does differently from the pipeline?

Taylor Bell - I'll cycle back to answering this later — I myself haven't used this feature so I'll need to remind myself when I'm not splitting my attention

Taylor Bell - So I ran out of time to get back to you about this, but if you want to learn more I recommend you check out the following pages:

Thomas Vandal - Thanks!


Jeff Valenti - @Taylor Bell, regarding the spectral trace location, are there real high-frequency variations in addition to the slow drift?

Taylor Bell - I myself have only seen this in a couple of datasets, but yes, in some datasets there are high-frequency variations visible that impact the final lightcurves. I've seen this in NIRCam at least, and those figures in the slides were NIRSpec G395H. I can't remember off the top of my head whether I've seen that in MIRI/LRS.


Jeff Valenti - @Taylor Bell, I missed it. Do you perform optimal extraction on the original curved spectrum or the straightened spectrum?

David Law - I don't think it should matter if the shift was integer pixels?

Jeff Valenti - I think Horne-style optimal extraction fits along the spectral axis. I'd have to think about whether integer shifts have an impact.

Taylor Bell - We do this on the integer-shifted pixels which indeed doesn't really matter since the profile is (roughly) independent for each wavelength. We do allow for smoothing in the spectral direction which is useful for some datasets, but likely doesn't work so well for the highly curved NIRSpec data.


David Grant - @Taylor Bell, does optimal extraction need a different profile at different times for transients? E.g., for in- versus out-of-transit for exoplanets?

Taylor Bell - We've not tried this, but so far there's been no obvious need for it. In early reductions of LRS data, I saw that the PSF width changed a lot during transit (~3% depth) which would suggest that changing the profile in transit vs out would be better. However, a semi-recent pmap update significantly improved the impact of BFE in my LRS observations and I no longer see an impact of the transit on the PSF-width, so I don't think there's a need to have time-variable profile based on the data I have looked at myself.


Michael Regan - We do know what causes the bad ramps in the LRS.

Sarah Kendrew - I don’t think we do?

Michael Regan - I’m pretty sure it’s related to trap capture in the masked region. We should move the LRS slitless subarray. It would increase the frame time by a little bit.

David Grant - @Sarah Kendrew, @Michael Regan, on a similar topic: could the LRS subarray be smaller to decrease the frame time? In my experience there could be half the columns and still have good background subtraction.

Michael Regan - Yes, we can make it smaller and remove the 390 Hz at the same time. Eddie Bergeron has a list of possible subarrays.

Sarah Kendrew - Yes, we’ve had various discussions about that and it’s an option on the table. But changing subarray definitions is not trivial and affects a lot of systems.

Taylor Bell - I'd say we do have some ideas (it's very clearly coincident with the gap between the Lyot and the 4QPM), but last I heard it's not clear why some observations are impacted but not other observations. I believe Sarah Kendrew and/or Achrène Dyrek have been trying to look into whether there's any correlation between the last-used filter, last time LRS or MIRI was used, etc.

Sarah Kendrew - Yes and happy to talk more about that.


Tyler Pauly - @Taylor Bell, do you find that the wrapping of jwst stages 1-3 is fragile to updates in the jwst packages? Are you able to run Eureka on the current pipeline versions? Have there been any features in Eureka that have gotten a PR to the jwst package?

Taylor Bell - This is a definite headache, so we freeze the jwst version with each Eureka! version to avoid conflicts as jwst changes. This is honestly one of the highest maintenance costs at present. jwst version 1.12 also has a hard conflict right now since it strictly requires a more-recent version of numpy than is compatible with theano which strictly requires a lower version of numpy. We've yet to figure out how we're going to resolve that.

Taylor Bell - At present though, none of our code has been pushed into jwst. As is appropriate, I definitely think it'd be worth moving as much of our Stage 1-2 code to jwst as possible to reduce maintenance costs for our small dev team. However, I also recognize that the jwst pipeline is already quite complicated and there is a limit to what they can offer while also trying to support every observing mode. Some mode-specific algorithms may be best left to independent or semi-independent pipelines, while mode-independent algorithms (or at least broadly applicable or absolutely critical algorithms) definitely belong in jwst in my mind though.


Sarah Kendrew - For context, the “bad ramps” we see in some, but not all, slitless LRS data (that @Taylor Bell mentioned in his talk) are likely related to some aspect of prior illumination of the detector. But, having looked at filter movements/positions prior to the TSO, and idle time, for all our MIRI LRS TSOs, we have not yet been able to really pinpoint what it is (in order to avoid it).


Thomas Vandal - @Taylor Bell, you mentioned transit spectroscopy papers show reductions from multiple pipelines. To what extent do we understand where differences between them come from, and has this helped with the development of Eureka? Is this still common practice and still useful after > 1 year of analyses?

Taylor Bell - I can't speak for everyone, but within the MANATEE GTO team we regularly have 2-3 independent reductions of every NIRCam dataset. I'm the only one on that team currently setup to do MIRI analyses though. Talking with Nestor during the TSO session, his suggestion is that we should all establish a set of data on which we can routinely compare different analysis methods. E.g., we can first use it to understand differences between currently existing pipelines, and others can use the dataset to validate their new pipelines in the future.

As for what drives differences between different reductions, I'd say that the following have seemed to be the biggest sticking points from my personal experience from working primarily with NIRCam and MIRI:

  • Binning schemes: Binning then fitting seems most appropriate to me, but others disagree. The extent to which you bin your spectra also matters some — e.g., completely unbinned spectra typically don't make sense as that's smaller than a resolution element.
  • How you handle 1/f subtraction in NIRCam data can result in constant offsets.
  • How you handle background subtraction (which pixels, linear or constant per column, etc.) can result in some differences (especially around amplifier boundaries in NIRCam).
  • How you model the initial exponential ramp in MIRI data can lead to different results.
  • Whether you use a linear in time or quadratic in time model when fitting the lightcurves for NIRCam. Some NIRCam datasets show clear curvature in their out-of-transit/eclipse baselines, while others don't show as obvious a curvature; what you choose to do when there isn't obvious curvature can result in differences.
  • What you do about stellar limb darkening can give you different results. Options include: freely fitting limb darkening as a function of wavelength (likely to give unphysical limb-darkening spectra and over-estimate transmission-spectrum uncertainties); somehow scaling a stellar model to the observations, either by fitting all channels simultaneously or by offsetting the coefficients based on a broadband fit (gives intermediate results, and you have the choice of fixing the LD coefficients after scaling the model or using a narrow-ish prior); or just fixing the stellar limb darkening to a model (assumes the stellar model is perfect, which isn't going to be true, but might be fine depending on the dataset). I'm a bit biased toward the middle option, but regardless it's a bit of a tricky choice that can impact your final results.

As for helping to develop Eureka!, the overall mindset we've had on the dev team is to recommend what we think works well but try to avoid forcing anyone to do things the way we think they should be done. Nearly all of the different options described above are currently available to users, and we typically put what we think works best in the template ECFs and on the readthedocs page.
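The "bin then fit" step from the first bullet above can be sketched as follows (an illustration only; the array shapes and function name are assumptions, not Eureka! code):

```python
import numpy as np

def bin_spectra(wave, flux, bin_edges):
    """Average spectroscopic-lightcurve columns into wavelength bins
    before lightcurve fitting ('bin then fit').
    wave: (nwave,); flux: (ntime, nwave); returns (ntime, nbins)."""
    idx = np.digitize(wave, bin_edges) - 1   # bin index per wavelength
    nbins = len(bin_edges) - 1
    binned = np.full((flux.shape[0], nbins), np.nan)
    for b in range(nbins):
        sel = idx == b
        if sel.any():
            binned[:, b] = flux[:, sel].mean(axis=1)
    return binned
```

Fitting the lightcurve of each binned column (rather than binning already-fitted per-column spectra) is the "bin then fit" ordering favored above; keeping bins at least a resolution element wide avoids fitting correlated sub-resolution channels.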

Thomas Vandal - Thank you for the detailed answer! 


Michael Regan - We really only need a few DQ values. Most of them carry no information.


Michael Regan - Have you looked at the picture frame over the time variation? If there is a temperature variation, you’ll see the picture frame.


Anna de Graaff - @Jane Morrison, @Ivana Barisic, catching up with the talks on youtube this morning, it's very cool to see the slit stepping in action! you briefly mention that the dispersion direction is trivial, but I'm wondering how you handle the fact that the MOS traces contain both spatial + wavelength information in the dispersion direction? I.e. how do you interpret the velocity fields of the reconstructed cubes (especially the velocity dispersion)? For the example of the clumpy galaxy that you showed, I can imagine that a compact SF clump will have a narrower LSF than the diffuse ISM that it's embedded in (?).
