This page archives Slack comments from Day 1, Session 1 of the Improving JWST Data Products Workshop (IJDPW).

...

Excerpt

Jeff Valenti - Ask questions and discuss first session here. Ideally, tag the relevant speaker(s) and "reply in thread". Don't be shy, but do be polite.


Jeff Valenti - @Howard Bushouse - You described validation against the schema for a data model. If users want to add something to a data model, is the best approach to define a whole new data model?

Tyler Pauly - Is the intent to add values/arrays to an instance of a datamodel, e.g. add a new array to an ImageModel? In general, datamodel schemas check requirements, i.e. missing parameters are a problem, while additional FITS extensions, new arrays assigned to the datamodel, etc. will not cause errors and in fact will be preserved. Example:

Code Block
languagepy
from jwst import datamodels

# open an image model and attach a new, non-schema array attribute
start_model = datamodels.open('my_image_file.fits')
start_model.squared_data = start_model.data ** 2

# the extra array is preserved through a save/reopen round trip
start_model.save('image_new_array.fits')
start_model.close()
new_model = datamodels.open('image_new_array.fits')

print(new_model.squared_data)

>>> <array (unloaded) shape: [2048, 2048] dtype: float32>

Jeff Valenti - @Howard Bushouse - In the step example in your presentation, PowerPoint "smartened" your triple quotes.


Jeff Valenti - @Howard Bushouse - If a user step needs a new type of reference file, what is the best approach? Particularly if the enhancement may eventually be incorporated into the STScI pipeline.

Howard Bushouse - If you're writing a custom step that needs a custom/new type of reference file, during the development and testing phase you should be able to define the new ref file type in step or pipeline code using the reference_file_types definition that I showed in my slides, and then when running the step always use the --override_<myreffiletype> parameter to feed your custom ref file to the step. Then when/if the step gets incorporated into the official pipeline it will be necessary for the CRDS folks here at STScI to get the new reference file type defined in the CRDS system and ingest the ref file.
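
A minimal sketch of that pattern, using hypothetical placeholder names (MyCustomRefStep and my_custom_ref are not real CRDS types):

Code Block
languagepy
from jwst.stpipe import Step
from jwst import datamodels


class MyCustomRefStep(Step):
    """Hypothetical custom step that relies on a new reference file type."""

    # declare the custom type so stpipe knows to ask CRDS (or an override) for it
    reference_file_types = ['my_custom_ref']

    def process(self, input_data):
        with datamodels.open(input_data) as model:
            # CRDS does not know about 'my_custom_ref' yet, so supply it with
            # --override_my_custom_ref=/path/to/ref.fits on the command line
            # (or override_my_custom_ref='...' when calling from Python)
            ref_file = self.get_reference_file(model, 'my_custom_ref')
            result = model.copy()
            # ... apply the correction using ref_file ...
        return result

From the command line the override would then look something like strun mymodule.MyCustomRefStep jw001234_blah_cal.fits --override_my_custom_ref=my_ref.fits (module and file names are placeholders).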


Loic Albert - @Howard Bushouse Could you point me to an example of using association files, please?

Sarah Kendrew - Not sure it’s useful but I just posted a demo of running the stage 2 & 3 pipeline for MIRI LRS (fixed slit) that illustrates how to create and use association files, here: miri_lrs_slit_end2end.ipynb

Howard Bushouse - A fairly simple example is in the calwebb_image3 pipeline, which just loads all the "science" type members from an ASN and combines them. Some example code starts at

Howard Bushouse - A somewhat more complicated example is in the calwebb_coron3 pipeline, which loads "science" and "psf" members from a CORON3 type ASN, as I showed in my slides. Example code starts at calwebb_image3.py#L71 which just loads the members from the ASN and then further down it sorts those into 2 lists of science and psf members, and then uses those sci_files and psf_files lists to send things that are needed to each of the pipeline steps.

Howard Bushouse - By the way, the reason why both of these examples explicitly limits the loading of the ASN members to just "science" and/or "psf" types, is because Stage 3 ASN files also list target acquisition exposures associated with the observation, but we don't need/want the targ-acqs in the pipeline processing (they're listed for information only).
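
For illustration, a rough sketch of pulling members out of a Stage 3 ASN from Python in the same way; my_asn.json is a placeholder file name and the exptype filtering mirrors what the pipelines above do:

Code Block
languagepy
from jwst.associations import load_asn

# read the association file and pull out member exposures by type,
# skipping target-acquisition entries just as the pipelines do
with open('my_asn.json') as fh:
    asn = load_asn(fh)

members = asn['products'][0]['members']
sci_files = [m['expname'] for m in members if m['exptype'].lower() == 'science']
psf_files = [m['expname'] for m in members if m['exptype'].lower() == 'psf']

print(len(sci_files), 'science and', len(psf_files), 'psf members')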


Jeff Valenti - @James Davies - Does making a copy of the data model in a new step double memory usage for large exposures? Or is the data model light(er) weight than the full size of the data?

Paul Goudfrooij - FWIW, one can close the input image after copying it.

Howard Bushouse - Yes, if you were to leave both the input and copy open, you would double the memory usage. But as @Paul Goudfrooij said, you can immediately close the input model once the copy has been made.

Howard Bushouse - But speaking of memory usage, 2 of the biggest memory hogs are outlier_detection and resample, but both of those have an in_memory parameter that can be turned off, in which case input and intermediate files are not kept in memory all at once; each file is just opened as it's needed and closed when it's done. So that's a way to help reduce memory usage.
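
A minimal sketch of turning those off from Python; the association file name is a placeholder:

Code Block
languagepy
from jwst.pipeline import Image3Pipeline

# keep the memory-hungry steps from holding everything in memory at once;
# 'my_image3_asn.json' is a placeholder association file name
Image3Pipeline.call(
    "my_image3_asn.json",
    steps={
        "outlier_detection": {"in_memory": False},
        "resample": {"in_memory": False},
    },
    save_results=True,
)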


Jeff Valenti Just putting link here: snowblind.


Macarena Garcia Marin - @James Davies the example custom pipeline you showed would it also call the JWST pipeline or only the ones you specified? And you just answered my question!


Harry Ferguson - Datamodel documentation at datamodels/index.html is generally quite useful. However, one thing we could perhaps crowdsource at this workshop is suggestions for improving or clarifying that documentation, either in general or just in the description of a specific keyword.


Jeff Valenti - @Nathan Adams - Can you comment on which version(s) of the STScI pipeline substantially reduced memory usage for large mosaics?

Nathan Adams - allowed_memory was a parameter that was added for some of the Stage 3 steps such as outlier detection. I do not recall when exactly, but it can be used to limit how much RAM the pipeline tries to use, at the cost of being a bit slower to run.
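
For example, something like the following on the command line, assuming a jwst version whose outlier_detection step exposes allowed_memory; the association file name is a placeholder:

Code Block
languagebash
# cap the memory fraction the step is allowed to use (placeholder value of 0.5)
strun calwebb_image3 my_image3_asn.json --steps.outlier_detection.allowed_memory=0.5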


Loic Albert - Howard Bushouse - Say I do a custom pipeline to replace the refpix step with my own 1/f step. Furthermore, say all exposures need to be run through the detector1 steps prior to entering my custom 1/f step. How should I proceed?

  1. Loop over all exposures to run steps dqinit, saturation, dark, etc then run my own step on the files saved on disk, then resume the loop over all exposures?
  2. Some better way?

Howard Bushouse - That's obviously one (brute force) way to do it, but I'm guessing you could do it more smoothly and efficiently using the pre_hooks and post_hooks that @James Davies was talking about. That allows you to insert your custom step into the middle of the Detector1 pipeline on the fly, so that you don't have to run the first half and second half separately. I'm hoping James will add details.

James Davies - If your step is something like mymodule.My1fNoiseStep, then:

Code Block
languagepy
strun calwebb_detector1 jw001234_blah_uncal.fits --steps.refpix.skip=True --steps.linearity.pre_hooks="mymodule.My1fNoiseStep"

if linearity is on and runs after refpix.

James Davies

  1. skip refpix
  2. add a pre_hooks to the following step

Loic Albert - Thanks. I guess looping is inevitable before entering my custom 1/f step, because my step needs all previous steps to be applied before building a deep stack, which is used to subtract the astrophysical scene before subtracting the 1/f. I guess my question was whether there was a way to have recursive calls, like in Makefiles, such that strun custom1f would know to process all the exposures before attempting to create a deep stack. Sorry for the long question.

Sarah Kendrew - @James Davies what would that look like in a notebook? can pre- and post-hooks be included in the Detector1Pipeline.call(file, steps={….}) syntax somehow? Or can they be added to an asdf parameters file?

James Davies - Yes, from slide 16 in my talk, I insert 2 steps after jump

Code Block
languagepy
from jwst.pipeline import Detector1Pipeline


steps = {
    "jump": {
        "post_hooks": ["snowblind.SnowblindStep", "snowblind.JumpPlusStep"],
    },
}
input_data = "jw01345002001_14201_00001_nrcalong_uncal.fits"

Detector1Pipeline.call(input_data, steps=steps, save_results=True)

Sarah Kendrew - Great thanks!


Jeff Valenti - @Nathan Adams - Can you give us an example of how to use a bash script to run the standard pipeline in parallel? You gave an example for your pipeline in the slides.

Nathan Adams -

Code Block
languagebash
find uncals/ -name '*uncal.fits' | xargs -n1 -P16 -I {} strun STAGE1_params.asdf {}

That is the main line for running multiple instances of the same code. Note it is launching -P separate instances of the pipeline, which isn't the most efficient, but I've not experienced any bottlenecking doing it that way.

Zhiyuan Ma - Just to chime in, speaking from my own experience of doing this (I use the GNU Parallel utility to fire the multiprocessing jobs): Before doing this, make sure the calibration data products (the CRDS cache) are pre-populated. The CRDS package will run into conflicts and race conditions if all the subprocesses are trying to download files from the CRDS server.

Nathan Adams - Yup we have this all saved locally first.
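
One way to pre-populate the cache, sketched with placeholder paths; CRDS_PATH and CRDS_SERVER_URL are the standard CRDS environment variables:

Code Block
languagebash
# point CRDS at a local cache and warm it up serially before going parallel
export CRDS_PATH=$HOME/crds_cache
export CRDS_SERVER_URL=https://jwst-crds.stsci.edu

# run a single exposure first so all reference files are downloaded once,
# then launch the parallel jobs against the now-populated cache
strun STAGE1_params.asdf uncals/jw_example_uncal.fits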


David Grant - @Nathan Adams would there be any advantage to 1/f correction at the group level for your data?

Paul Goudfrooij - Well, that's indeed where the 1/f noise is created (at each read), and hence it's best in principle to remove it at that level. But (especially when the source counts are low per group) the bottleneck for a proper correction is the background not being flat across the field (I have tried this but without good results for datasets with faint sources), and that aspect is much better in the flatfielded (_cal) images. Now, what could work quite well is to first apply the flatfield at the group level, then run the 1/f correction, then apply the inverse flatfield, then ramp fitting, etc. I have not tried that yet but I'm planning to do so.


Macarena Garcia Marin - @Howard Bushouse or @Tyler Pauly, do you have a sense whether parallelization using -PNumber_of_cores is well known or broadly used? Are there any other options to speed the process up?

Howard Bushouse - No, I personally don't know how well known the use of -PNumber_of_cores is. Currently there are only 2 steps in the whole pipeline ecosystem that explicitly allow for parallel processing and that's the jump and ramp_fit steps in the Detector1 pipeline. Using the maximum_cores param for each step, you can process a single exposure in parallel, in which case the single exposure is broken into blocks/slices for parallel processing. The use of -PNumber_of_cores is more applicable to running multiple instances of something like the Detector1 pipeline in parallel, e.g. when you have a lot of uncal files that need reprocessing.
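
A minimal sketch of that per-exposure parallelism; the file name is a placeholder, and maximum_cores also accepts values such as 'quarter' or 'all':

Code Block
languagepy
from jwst.pipeline import Detector1Pipeline

# parallelize the two steps that support it within a single exposure
Detector1Pipeline.call(
    "jw001234_blah_uncal.fits",
    steps={
        "jump": {"maximum_cores": "half"},
        "ramp_fit": {"maximum_cores": "half"},
    },
    save_results=True,
)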

Dries Van De Putte - There are also other general command line tools to run the same program in parallel for many arguments on the command line (I’m partial to GNU parallel). I’m currently using tools from the standard “multiprocessing” module in python, to just run multiple “pipeline.call()” calls in parallel from within python.
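
A rough sketch of that approach, assuming the CRDS cache is already populated (see above) and the uncal files live in a placeholder uncals/ directory:

Code Block
languagepy
from concurrent.futures import ProcessPoolExecutor
from glob import glob

from jwst.pipeline import Detector1Pipeline


def run_one(uncal):
    # each worker runs one exposure through Detector1 and writes the results
    Detector1Pipeline.call(uncal, save_results=True)
    return uncal


if __name__ == "__main__":
    uncal_files = sorted(glob("uncals/*_uncal.fits"))
    with ProcessPoolExecutor(max_workers=4) as pool:
        for finished in pool.map(run_one, uncal_files):
            print("done:", finished)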

Michael Regan - -PNumber_of_cores is not the same as multiprocessing.

Macarena Garcia Marin - @Michael Regan are there options to multiprocessing for these large mosaics?

Michael Regan - Does -PNumber_of_cores run a set of detector1 pipelines, or one pipeline in multiprocessing?

Howard Bushouse - What was the answer to this? I get the impression it's used to run multiple instances of a step or pipeline in parallel, rather than doing multiprocessing within a given step. Is that correct?

Nathan Adams - Indeed, it isn't multiprocessing within one pipeline but running multiple instances of it. Not the most efficient, but we found it to be a simple way of running non-intensive components of the pipeline quickly on many uncal (or subsequent) files.


Mario Gennaro - I'd be happy to take offline questions on claws/wisps origin and why sometimes a static template does not work 100% well. It's definitely related to the complex optical mechanism that is the root cause of claws/wisps.

Michael Regan - Do we know the cause of the Wisps?

Mario Gennaro - Yes. Creating a wisps/claws channel. Let's take it there, for those interested.

Mario Gennaro - Join the claws and wisps channel if you are interested in the topic.

Jo Taylor - claws-and-wisps for those looking.


Jeff Valenti - @Nathan Adams - Please provide a link for the Robotham reference, if one is available.

Sarah Kendrew - Dynamic Wisp Removal in JWST NIRCam Images

Sarah Kendrew - (typo in the last name on Nathan’s slide)

Nathan Adams - Whoops my bad on the typo!


Michael Regan - It is best to correct [1/f noise] where it occurs in the groups.


Harry Ferguson - @Nathan Adams A few key steps for CEERS and NGDEEP for background subtraction that have helped a lot:

  • Start with a big ring-median filter across each image
  • Do several steps of source masking using different kernel widths and mask-dilation factors to help get both compact faint sources and diffuse wings of brighter sources
  • Combine the masks from all the different bands (in our case including HST) before doing the background fitting using the unmasked pixels individually for each band.
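
A very rough sketch of that kind of masking flow; the kernel widths, ring radii, and thresholds below are illustrative placeholders, not the actual CEERS/NGDEEP values:

Code Block
languagepy
import numpy as np
from scipy.ndimage import binary_dilation, median_filter
from astropy.convolution import Gaussian2DKernel, convolve
from photutils.segmentation import detect_sources, detect_threshold


def ring_median(image, r_in=20, r_out=30):
    """First-pass background estimate from a large ring-median filter."""
    y, x = np.mgrid[-r_out:r_out + 1, -r_out:r_out + 1]
    footprint = (np.hypot(x, y) >= r_in) & (np.hypot(x, y) <= r_out)
    return median_filter(image, footprint=footprint)


def source_mask(image, fwhm_pix=3.0, nsigma=1.5, n_dilate=5):
    """One round of source masking; repeat with different kernel widths and
    dilation factors, then combine (OR) the masks across bands before fitting
    the background in the unmasked pixels of each band."""
    smoothed = convolve(image, Gaussian2DKernel(fwhm_pix / 2.355))
    segm = detect_sources(smoothed, detect_threshold(image, nsigma=nsigma), npixels=5)
    if segm is None:
        return np.zeros(image.shape, dtype=bool)
    return binary_dilation(segm.data > 0, iterations=n_dilate)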

Nathan Adams - Thanks for this. Currently we are using photutils to identify objects in the image after stage 2. We then do a flat, uniform background subtraction followed by a 2D background subtraction, and then a second 2D background subtraction on the final mosaic. The ring-median filter is something we didn't know of and I can investigate. Mask dilation is something we do at various points, but I agree it should be done before the background estimation to make sure we aren't oversubtracting real objects through the uptick in flux near them.


Alicia Canipe - Jump and ramp-fitting multiprocessing (maximum_cores): jump/arguments and ramp_fitting/arguments.


Michael Regan - I don’t understand the column subtraction. That is the slow read direction.

Thomas Williams - As I understand it, it's an artifact of the odd/even column subtraction in the lv1 step; it causes some column-to-column variation.

Michael Regan - The odd/even correction is one value per amp. I don’t see how it creates the observed pattern.


Everett Schlawin - @Mic Bagley, what is the magnitude of the vertical column correction compared to the horizontal 1/f correction?

Ryan Endsley - My recollection from my own investigation is that the horizontal contribution is substantially greater than vertical.

Mic Bagley - I agree with @Ryan Endsley! I mentioned briefly that we've only explored the amplitude of the pattern after we've already subtracted the horizontal pattern, so I don't currently have numbers for the vertical pattern by itself, but I do expect it to be significantly lower than the horizontal correction.


Taylor Bell - @Mic Bagley when you said the 1/f noise was semi-consistent between groups, were you looking at differenced groups or just the groups? Since groups aren't differenced, noise in group 1 will be carried forward into group 2, so that could be a part of why 1/f noise in different groups could look the same.

Loic Albert - Also, did you subtract a superbias before doing the 1/f correction?

Michael Regan - The superbias drops out of the differences.

Loic Albert - Unless you use (groupn - superbias) as the image you are correcting. And repeat for all n.

Mic Bagley - @Taylor Bell Thanks, that's very interesting. I've looked at both groups and differenced groups. The image I showed was from just the groups, and I can't remember what I found with the differenced groups but I'll dig that back up. It's helpful to know that pairs of groups are correlated (@Michael Regan was describing that during the break), though we seem to be seeing this in >2 groups. I'm wondering if there's perhaps 1/f in one of the reference files that's being imprinted in the groups. But all this is very early days for our work performing this correction on groups instead of rate images.

Mic Bagley - @Loic Albert I've been running this correction immediately before the ramp fitting step, so the superbias has been subtracted.


Paul Goudfrooij - @Mic Bagley Have you tried playing with non-default parameter values in the outlier_detection step?

Ryan Endsley - @Paul Goudfrooij, I can't speak for Mic but I've played around with the parameters. I've converged on adopting image3.outlier_detection.snr = '3.0 2.0' but that doesn't catch everything for programs (e.g., CEERS) with only 3 exposures per point on the sky.

Paul Goudfrooij - Yes there is no “best choice for all cases” really, and I found the sensitivity to parameter values is quite high for datasets with few dithers.

Mic Bagley - Yes, we've explored a large grid of parameters and haven't found a set that catches all the cases of outliers we're worried about (i.e. the ones that keep showing up in galaxy samples). There are trade-offs, and I haven't had a chance to establish a way to decide on a winner. We're really limited by the 3 dithers.
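
For reference, the command-line form of the setting Ryan mentions; the association file name is a placeholder:

Code Block
languagebash
# lower the outlier-detection SNR thresholds (primary and secondary)
strun calwebb_image3 my_image3_asn.json --steps.outlier_detection.snr="3.0 2.0"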


Michael Regan - Slow mode should not change the flat.

Jeff Valenti - Reformulating as a question for @Mic Bagley - Do you see a difference in flat field between fast and slow mode?

Mic Bagley - This is a question I'll forward to Guang Yang, who has been leading our MIRI reduction. I believe he's had a few conversations with Karl Gordon about this, and it was recommended to use in-flight FAST flats instead of ground-based SLOW flats for the CEERS F560W and F770W observations, which use slow mode. I don't think we have any comparison of SLOW and FAST mode flats (mostly because the in-flight vs ground would be sort of an apples and oranges comparison) but I'll check with Guang.


James Davies - @Mic Bagley Are the outliers that you’re seeing the persistence of the saturated cores of snowballs?  Or perhaps RC-type pixels or “flowers” that one sees in the NIRCam long detectors?  We’ve found snowball core persistence images look like little fuzzy galaxies.

Mic Bagley - The majority of them do not seem to be the saturated cores of snowballs (though we definitely see some of those). They tend to have fairly sharp edges.

Mic Bagley - @James Davies what are the "flowers"? I'm not familiar with that term

James Davies - Over in Euclid-land, we are calling these “flowers”. Dark center with bright petals.

[Two images attached showing examples of "flowers"]

James Davies - The above is NIRCam long.

Mic Bagley - Thanks, yeah some that are making their way through are 'flowers' and some are more like rectangles (flowers with more petals?). We see these with and without flag_4_neighbors set in the jump step

[Image attached]


Abigail Lee - @Mic Bagley How much do the 1/f, snowball, wisps, etc. corrections you've applied to CEERS NIRcam imaging improve the photometry ( + how could you quantify this)?

Mic Bagley - I haven't checked this in a few reduction versions, and I definitely should recheck, thank you for the question! Late last year we were finding (very loosely) ~3-8% difference in photometry with and without the wisps+1/f corrections (we didn't look at them individually, but instead compared our reduction to a version without any custom steps).

The way we were checking our flux calibration for CEERS is by matching isolated sources and comparing photometry in similar HST and IRAC bands. We PSF match the NIRCam filter to the WFC3/IRAC filter, do source detection in dual image mode for identical apertures, and compare the fluxes. The bands aren't exactly the same so we're really only using this for relative photometric comparisons (comparing calibrations across NIRCam detectors, or with and without some corrections). The filter comparisons were:

  • F115W - F125W
  • F150W - F160W
  • F356W - 3.6 µm
  • F444W - 4.5 µm

We did a different test for F200W, F277W and F410M, which was a comparison to the NIRCam fluxes of the best-fitting SEDs from the Stefanon+2017 catalog. Again, good for relative comparisons and not absolute flux calibration. These tests were led by Pablo G. Pérez-González.

We try to describe this in CEERS Epoch 1 NIRCam Imaging: Reduction Methods and Simulations Enabling Early JWST Science Results.

Abigail Lee - Wow 3-8%!! That's awesome. Thank you @Mic Bagley.

Mic Bagley - That was a while ago! So please don't quote me on that now. I'm working on updating our CEERS reduction and then I'll retest this.


Michael Regan - What was the magnitude of the non-flagged pixels?

David Law - Following up on this, it may be worth testing if the IFU outlier detection algorithm does a decent job for the MOS data too.  (Not sure whether this is yet being done by default).


Michael Regan - Can you provide the uncal with the large artifact that was flagged by hand?

Mic Bagley - The NIRCam data set was: jw01345069001_05201_00001. At the time, I believe we were using pipeline 1.8.5 and I haven't gone back with newer versions to see if the jump step would automatically detect it yet. Were you referring here to the jump across the full detector, or the persistence from scattered light?

Michael Regan - The jump across the full detector not getting flagged is confusing to me. Thanks for the file.


Jeff Valenti - @Mic Bagley - Regarding custom master background, why aren't empty shutters in same slit also impacted by artifacts?

Mic Bagley - Many are, and for now I believe we're using a similar masking approach for artifacts in all microshutters. So, very time-intensive and not a long-term solution but doable at the moment for sources we know are extended and/or when there's another source in one of the slits.


Abigail Lee - @Mic Bagley why would 1/f corrections on individual groups be advantageous over just one 1/f correction to the final calibrated image?

Paul Goudfrooij - In principle, the 1/f noise is inserted at the group level (during individual reads).

Jeff Valenti - Stay tuned for talks specifically about 1/f.

Mic Bagley - To echo @Paul Goudfrooij’s comment, the 1/f in the countrate images is a combination of 1/f that's introduced in each group (at least how I understand it), so removing it at the group level would maybe provide a cleaner correction.


Michael Regan - Can you provide the uncal for the Corrupted Jumps?

Mic Bagley - I might have replied to the wrong thread above, but the NIRCam data set was: jw01345069001_05201_00001.

Michael Regan - @Mic Bagley Do you remember which detector it was?

Mic Bagley - It affected all detectors in the exposure, LW and SW.


Jeff Valenti - @Mic Bagley - Please explain again the symptoms of "corrupted jumps" - I didn't quite get it during the talk.

Martha Boyer - Just for another data point:  I saw something very similar in the absolute flux data.  I’m not sure if it’s the exact same problem, but it manifests similarly.  I was able to mitigate by adjusting the threshold in the outlier detection step in Stage 1.

Mic Bagley - We're finding that in one of the groups, suddenly half of the detector experiences a jump all together. In the subsequent group, the other half of the detector does. This seems to indicate that something happened maybe in the middle of the readout.

Mic Bagley - From another thread: At the time, I believe we were using pipeline 1.8.5 and I haven't gone back with newer versions to see if the jump step would automatically detect it yet. I believe Guang had an extensive conversation about the MIRI case with the Help Desk last year digging into why the jump step wasn't catching it.


Varun Bajaj - I'm glad it wasn't me just furiously beating my head against reducing data with Jupiter persistence.


Michael Regan - The ETC is only an estimate.


Howard Bushouse - @Mic Bagley You mentioned a change in CR detection between jwst versions 1.10 and 1.11. Was that for imaging data?  I believe it was 1.11 that included a major change to IFU outlier detection, that eliminated previous problems with lots of false positives, but perhaps went too far the other way, such that some fraction of real hits were no longer detected and flagged (which is being worked on).

Alicia Canipe - Following, I was also wondering this.

Mic Bagley - It was for NIRCam imaging. I don't have a robust comparison yet, we were just blinking mosaics between our reductions and noting the pixels that were making it through in the newer reduction. I hope to dig more into this in the next ~month.


Michael Regan - The deep problem could be related to missing Dark Current Poisson noise that I opened a ticket on.


Jeff Valenti - @Mic Bagley - For the NGDEEP-NRC is too shallow issue, can you tell the difference between the observatory itself having a noise source we've missed vs. data processing not yet getting the most out of the data?

Mic Bagley - That's one of our leading questions! We've been trying to understand the cause so we can separate out problems that may be "recoverable" down the line with updated reference files/pipeline steps from those that may be innately in the images because of our readout mode. To be safe, we've changed the readout of the upcoming NGDEEP observations (DEEP8, 4 groups --> MEDIUM8, 7 groups) but we're not confident yet that will solve all issues, and we'd also like to regain some of that sensitivity from the first half of the observations.


Benjamin Johnson - @Mic Bagley are you treating the 7 different integrations separately, or averaging them during stage1/2?

Mic Bagley - We're trying both approaches. But for the depths I quoted, those are from stage1 unaltered, meaning we used the rate files from the default ramp fitting step. (Instead of the rateints for example)

Benjamin Johnson - Thanks, I've been curious about the effect of treating them entirely separately, but have not explored.  IIRC the rate is a straight average of the rateints, but I wonder if there's something to optimize in that combination.

Mic Bagley - Yes, we're working now on exploring ways to fit the ramps and/or combine integrations. Definitely open to any ideas/suggestions you may have.


Nathan Adams - @Mic Bagley Our NGDEEP data is currently 30.3 in F277W, 30.2 in F356W and F444W if I recall. F150W and F200W are still not great. (This is for 0.16 arcsec radius apertures.) We are currently debating whether it is too few groups; everywhere else going mag 29+ is mostly using NGROUPS of 7+, I think. We are curious whether NGDEEP has considered changing the plan for the 2nd epoch as an experiment to see if it is one of these potential issues?

Mic Bagley - Thanks @Nathan Adams, this is a useful comparison! Yes, our two leading theories are too few groups and too many integrations. The former could be causing poorly-fit ramps (which we're digging into now) and the latter could increase any fixed-pattern noise that may be being introduced by reference files. For NGDEEP Epoch 2, we've converted all images to MEDIUM8 with at least 7 groups. We're still trying to determine if we need to change our dithers as well (difficult because of the constraints with NIRISS primary obs).


Mario Gennaro - When you compare the "exposure times" between the 2 surveys, what TIME are you using, i.e., where exactly are the 55k vs 90k coming from? @Mic Bagley I am asking because the TIME that is relevant for SNR measurements in NIRCam is Nints*Tgroup*(Ngroups-1), which the ETC calls "time between the first and last group". This time is different from DURATION and/or EFFEXPTM. The pipeline effectively uses the first group as the first "measurement" and not the "reset value". In other words, due to bias drifts, we do not start the up-the-ramp fit at t=0 by using a superbias, but rather "skip" the time between the reset and the first group. This effectively shortens what one would naively think of as the exposure time. The shortening is larger (in proportion) if you have fewer groups. So if you are comparing 2 surveys with the same DEEP8 pattern but very different NGROUPS, you may actually not be properly accounting for this effect, depending on what TIME you are using.

Mario Gennaro - @Bryan Hilbert or @Mic Bagley can you point me to the 2 programs that are being compared? I can look at the numbers myself.

Bryan Hilbert - NGDEEP is 2079. MDS is 1283.

Mic Bagley - I was loosely quoting the total time, but the too-few groups has been a leading candidate for us. It's good to know about this shortening!

"you may actually not properly accounting for this effect" --> are there ways to account for this in the ramp fitting that would allow us to "recover" some of that time? Or were you referring to accounting for the effect in our comparison with the MIDIS images?

Bryan Hilbert - I think he was referring to the latter. The time calculation that Mario mentioned above will "subtract" a larger fraction of time from NGDEEP, with its few groups/more ints, compared to MDS with its more groups/fewer ints.
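
To make the proportion concrete, a purely illustrative calculation; the group time and the (nints, ngroups) pairs below are placeholders, not the actual NGDEEP or MDS setups:

Code Block
languagepy
# effective time per Mario's formula vs. a naive reset-to-last-group time
tgroup = 200.0  # placeholder group time in seconds

for nints, ngroups in [(7, 4), (4, 7)]:
    effective = nints * tgroup * (ngroups - 1)   # time between first and last group
    naive = nints * tgroup * ngroups             # naive "exposure time"
    print(f"nints={nints}, ngroups={ngroups}: keeps {effective / naive:.0%} of the naive time")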