GSoC 2015 blog

Tuesday, August 25, 2015

Final project report part 2: overview

Overview

I am very happy with the outcome of my work at GSoC. On the one hand it is true that the complete list of goals in the original application has not been accomplished. On the other hand, the focus of contributions changed once I started working. I implemented a lot of observation handling code that was not already available as planned, instead of background model application to source observations. Indeed, the majority of the items in the API proposal have been worked out and added as functionality to Gammapy. In addition, I participated in the necessary code cleanup for the release of Gammapy version 0.3.

List of main pull-requests during my GSoC (all merged in the trunk):

Document observation tables and improve gammapy.obs [#278]
Observation table subset selection [#295]
Add cube background model class [#299]
Make background cube models [#319]

Other relevant pull-requests (also merged):

Add function to fill acceptance image from curve [#248]
Consistent random number handling and improve sample_sphere [#283]

(There are more (smaller) pull-request dealing with cleanups, fixes or small additions that are not listed here.)

All in all, a lot of new functionality has been successfully added to Gammapy, as demonstrated by the example in the example section in the final project report part 1 post, and the examples in the documentation links listed below, making this a fruitful project.

As summary of my work produced during the GSoC, I am repeating the list of links to the documentation I produced during the GSoC, that I already posted in the progress section in the final project report part 1.
This documentation explains the most important contributions produced during my GSoC project:

I would like to take the opportunity to thank the mentors of the project for their useful input. There was at all times at least one of them there to answer my questions and give positive feedback. I learnt a lot during this summer!

For instance I deepened my knowledge of the scientific python stack: numpy, scipy, matplotlib. I learnt of course the basic tools needed for my project: git, GitHub, Astropy, Gammapy. I also learnt how to work collaboratively in a pull-request system with code reviews.

These recently acquired knowledge, together with my previous skills in programming with object-oriented languages granted me access to the list of Gammpy core developers and maintainers with write access to the repository.

To conclude, the completion of this works has taken many hours of hard work (~ 500 h), much satisfaction, some frustration and 0 drops of coffee! :-)

Monday, August 24, 2015

Final project report part 1: progress

The last 2 weeks of Google Summer of Code (GSoC) have been a bit intense and exciting.

Gammapy 0.3

First of all Gammapy version 0.3 has been released. It is still an alpha version of Gammapy. But it contains already some of the functionality I developed for the GSoC. For instance:

dummy observation table generator.
container class for cube data.
dummy background cube model generator.
observation selection tools.
many small fixes.

Progress

As for my work on the last 2 weeks, I spent much of the time refactoring the code of the background cube model production script mentioned in my previous report and integrating the functionality into Gammapy. In addition, I added some tests
to assure that the recently added code works, and documented the new classes, methods and scripts.

New functionality added to Gammapy:

Observation grouping classes: classes to define groups of observations with similar properties (like observation altitude/azimuth angle or number of telescopes participating in the observations) in order to be analyzed in groups instead of individually
Format converters: methods to convert the observations list formats of different experiment into the Gammapy observation list format.
Dummy dataset generator: methods to generate a dummy dataset with event lists (data) and effective area tables (instrument response functions) in order to test some of the Gammapy functionality. The tools work as follows:

generate a dummy observation table, generated with the tool mentioned above.
Simulate background data (using a very simple model) according to the produced observation table
Store the data emulating the data structure of an existing experiment, like H.E.S.S..

Class to handle the creation of background cube models: the methods acting on background cube models have been divided into 2 classes:

Cube: the basic container class for cube data mentioned above. This class is kept as generic as possible, allowing other classes to use it to contain other kinds of cube data in the future. It contains basic I/O functionality, plotting methods, and methods operating on the cubes.
CubeBackgroundModel: class with methods specific for the background cube model productions, such as binning, histogramming, and smoothing. It also defines 3 cubes necessary for the model production:

counts cube: contains the statistic participating in the model.
livetime cube: contains the livetime correction to apply.
background cube: contains the model.

Command line tool to run the production of background cube models: this tool takes as input a dataset either from an existing experiment like H.E.S.S., or simulated data produced with the tools mentioned above and produces the models in several steps:

Produce a global observation list, filtering observations taken close to known sources. The tool selects all observations far from the sources listed in the TeVCAT catalog.
Group the observations according to similar observation properties, in this case: altitude and azimuth angles.
For each group produce the background cube models by:

define binning.
fill events and livetime correction in cubes.
fill events in bg cube.
smooth the bg cube to reduce fluctuations due to (low) Poisson statistics.
correct for livetime and bin volume.
set 0 level to something very small.

Example script to make a few plots comparing 2 sets of background cube models.
Added a new file to gammapy-extra with a test CubeBackgroundModel object

More details on the background cube model production can be found online in the documentation I produced during the GSoC, especially in the last week:

Example

As an example of the new capabilities of Gammapy in order to produce background models, I produced 2 different models using the tools I developed and documented here and compared them:

A simulated background cube model (a.k.a. true model).
A reconstructed model (a.k.a. reco model) using simulated data following the model used for the true model.

The models very roughly represent the background seen by a H.E.S.S.-like experiment at an altitude close to the zenith and South azimuth.

Using the example script to plot 2 models together for comparison I produced the following plot (click for an enlarged view):

The plot shows that the reconstructed (reco) model agrees quite well with the simulated (true) model.

The following animated image shows the data of the reco model (click for an enlarged view):

Each frame represents one energy bin of the model, ranging from 0.1 TeV to 80 TeV. The pictures represent the (X, Y) view of the model in detector coordinates (a.k.a. nominal system), covering a 8 x 8 square degree area.

It is shown that Gammpy can be now used to produce accurate background cube models.

Friday, August 7, 2015

Progress report

My work on the last 2 weeks has been mainly focused in the Gammapy tools to select observations from an observation list that were introduced in the mid-term summary report, and on the script to produce cube background models presented in the last progress report. The progress on these 2 topics is presented in more detail in the following sections.

In addition, I also contributed to some of the clean-up tasks in order to prepare for the release of the Gammapy 0.3 stable version in the coming weeks.

Observation selection

I restructured the code in that I produced a few weeks ago to make it clearer and defined 3 main observation selection criteria:

Sky regions: these methods select observations on a certain region of the sky, defined as either a square (sky_box), or a circle (sky_circle).
Time intervals: this method selects observations in a specific time range, defined by its minimum and maximum values.
Generic parameter intervals: this method selects observations in a (min, max) range on a user-specified variable present in the input observation list. The only requirement for the variable is that it should be castable into an Astropy Quantity object: the variable should represent either a dimensionless quantity (like the observation ID), or a simple quantity with units (like the altitude angle or the livetime of the observations).

More details are given in the documentation I wrote for the select_observations function.

In addition I produced a working inline-command tool to act on an input observation list file and output a selected observations file. This tool can perform the same kind of selections mentioned above in a recursive way. More details are given in the documentation I wrote for the gammapy-find-obs tool, that uses the find-obs script.

In order to test the gammapy-find-obs tool, a dummy observation list file produced with the make_test_observation_table tool presented in the first report has been placed in the gammapy-extra repository here.

Cube background model production

I produced a working version of the script that produces cubes similar to the one presented in the animation on the last post with many improvements.

The new version of the script uses a finer binning both in altitude/azimuth angles for the observation grouping, and in (X, Y, energy) coordinates for the cube itself. The binning on (X, Y, energy) also depends on the amount of statistics (coarser coordinate binning for observation groups with less statistics)

In addition, a smoothing to avoid Poisson noise due to low statistics is applied to the background cube model. The smoothing also depends on the available statistics: cubes with more statistics are smoothed less than cubes with less statistics. The smoothing applied is quite simple. It is performed by convoluting the 2D image of each energy bin of the cube independently with a certain kernel, and scale the smoothed image to preserve the original integral of the image. The kernel chosen is the default kernel used by the ROOT TH2::Smooth function: it is named k5a and acts on 2 layers of surrounding pixels (i.e. 2 next neighbors). Interpreting images as 2D arrays, the kernel is represented by the following matrix:

0 0 1 0 0

0 2 2 2 0

1 2 5 2 1

0 2 2 2 0

0 0 1 0 0

I applied the script to the H.E.S.S. data to produce background models for a full set of altitude/azimuth angle bins with satisfactory results. Unfortunately, since the data is not public, I am not allowed to post any figures or animations showing them.

The current version of the script is functional but still a bit chaotic, because it consists of only a few long functions. Therefore, the next steps, to be accomplished in the remaining time of the Google Summer of Code, is to integrate the script into Gammapy, refactorize the code and move most of its functionality into methods and classes within Gammapy.

Friday, July 24, 2015

Progress report

My work on the last 3 weeks has been mainly in the container class for the cube background models (X, Y, energy). The class is called CubeBackgroundModel and the code has recently been merged to the master branch of Gammapy.

The class has remodeled after a few code reviews from its first draft as in the post on Friday, June 19, 2015. For instance it can read/write 2 different kind of FITS formats:

FITS binary tables: more convenient for storing and data analysis.
FITS images: more convenient for the visualization, using for instance DS9.

For the records, FITS is a standard data format largely used in astronomy.

In addition, the plotting methods have been also simplified to allow a more customizable API for the user. Now only one plot is returned by the methods, and the user can easily combine the plots as desired with only a few lines of code using matplotlib.

A new function has been added to the repository as well for creating dummy background cube models called make_test_bg_cube_model. This function creates a background following a 2D symmetric Gaussian model for the spatial coordinates (X, Y) and a power-law in energy. The Gaussian width varies in energy from sigma/2 to sigma. An option is also available to mask 1/4th of the Gaussian images. This option will be useful in the future, when testing the still-to-come reprojection methods, necessary for applying the background model to the analysis data to subtract the background. Since the models are produced in the detector coordinate system (a.k.a. nominal system), the models need to be projected to sky coordinates (i.e. Galactic, or RA/Dec) in order to apply them to the data.

The work on the CubeBackgroundModel class has also triggered the development of other utility functions, for instance to create WCS coordinate objects for describing detector coordinates in FITS format or a converter of Astropy Table objects to FITS binary table ones.

Moreover, a test file with a dummy background cube produced with the make_test_bg_cube_model tool has been placed in the gammapy-extra repository here for testing the input/output (read/write) methods of the class.

This work has also triggered some discussions about some methods and classes in both the Astropy and Gammapy repositories. As a matter of fact, I am currently solving some of them, especially for the preparation of the release of the Gammapy 0.3 stable version in the coming weeks.

In parallel I am also currently working on a script that should become a command-line program to produce background models using the data of a given gamma-ray astronomy experiment. The script is still on a first draft version, but the idea is to have a program that:

looks for the data (all observations of a given experiment)
filters out the observations taken on known sources
divides the data into groups of similar observation conditions
creates the background models and stores them to file

In order to create the model, the following steps are necessary:

stack events and bin then (fill a histogram)
apply livetime correction
apply bin volume correction
smooth histogram (not yet implemented)

A first glimpse on such a background model is shown in the following animated image (please click on the animation for an enlarged view):

The movie shows a sequence of 4 images (X, Y), one for each energy bin slice of the cube. The image spans 10 deg on each direction, and the energy binning is defined between 0.01 TeV and 100 TeV, equidistant in logarithmic scale. The model is performed for a zenith angle range between 0 deg and 20 deg.

There is still much work to do in order to polish the script and move most of the functionality into Gammapy classes and functions, until the script is only a few high-level calls to the necessary methods in the correct order.

Thursday, July 2, 2015

Mid-term summary

Mid-term has arrived and quite some work has been done for Gammapy, especially in the observation, dataset and background modules. At the same time I have learnt a lot about Gammapy, Astropy (especially tables, quantities, angles, times and fits files handling), and python (especially numpy and matplotlib.pyplot). But the most useful thing I'm learning is to produce good code via code reviews. The code review process is sometimes hard and frustrating, but very necessary in order to produce clear code that can be read and used by others.

The last week I have been working on a method to filter observations tables as the one presented in the figure on the first report. The method is intended to be used to select observations according to different criteria (for instance data quality, or within a certain region in the sky) that should be used for a particular analysis.

In the case of background modeling this is important to separate observations taken on or close to known sources or far from them. In addition, the observations can be grouped according to similar observation conditions. For instance observations taken under a similar zenith angle. This parameter is very important in gamma-ray observations.

The zenith angle of the telescopes is defined as the angle between the vertical (zenith) and the direction where the telescopes are pointing. The smaller the zenith angle is, the more vertical the telescopes are pointing, and the thinner is the atmosphere layer. This has large consequences in the amount and properties of the gamma-rays detected by the telescopes. Gamma-rays interact in the upper atmosphere and produce Cherenkov light, which is detected by the telescopes. The amount of light produced is directly proportional to the energy of the gamma-ray. In addition, the light is emitted in a narrow cone along the direction of the gamma-ray.

At lower zenith angles the Cherenkov light has to travel a smaller distance through the atmosphere, so there is less absorption. This means that lower energy gamma-rays can be detected.

At higher zenith angles the Cherenkov light of low-energy gamma-rays is totally absorbed, but the Cherenkov light cones of the high-energy ones are longer, and hence the section of ground covered is larger, so particles that fall further away from the telescopes can be detected, increasing the amount of detected high-energy gamma-rays.

The zenith angle is maybe the most important parameter, when grouping the observations in order to produce models of the background.

The method implemented can filter the observations according to this (and other) parameters. An example using a dummy observation table generated with the tool presented on the first report is presented here (please click on the picture for an enlarged view):

Please notice that instead of the mentioned zenith angle, altitude as the zenith's complementary angle (altitude_angle = 90 deg - zenith_angle) is used.
In this case, the first table was generated with random altitude angles between 45 deg and 90 deg (or 0 deg to 45 deg in zenith), while the second table is filtered to keep only zenith angles in the range of 20 deg to 30 deg (or 60 deg to 70 deg in altitude).

The tool can be used to apply selections in any variable present in the observation table. In addition, an 'inverted' flag has been programmed in order o be able to apply the filter to keep the values outside the selection range, instead of inside.

Recapitulating the progress done until now, the next steps will be to finish the tools that I am implementing now: the filter observations method described before and the background cube model class on the previous report. In both cases there is still some work to do: an inline application for filtering observations and more methods to create cube background models.

The big milestone is to have a working chain to produce cube background models from existing event lists within a couple of weeks.

Friday, June 19, 2015

Progress report

The last two weeks have been a bit tough. After finishing the observation table generator mentioned in the previous post, I started working on the background module of Gammapy.

I am currently working on a container class for 3D background models (a.k.a. cube background models). The three dimensions of the model are detector coordinates (X and Y) and energy. These kind of background models are largely in use by Fermi. The development of tools for creating such background models is an important milestone in my project and so implementing a class to handle them is crucial. For now, the class can read cube models from fits files and slice the 3D models to produce 2D plots of the background rate.

Two kinds of plots are produced:

Color maps of the background rate for each energy slice defined in the model.
1D curves of the spectrum of the background rate for each detector bin (X, Y) defined in the model.

The figures attached have been performed using a sample background model fits file from the Gammalib repository. (Please click for an enlarged view).

More functionality should come soon into this class, for instance, methods to create the bg models using event lists and the smoothing of the models to attenuate the effects of the statistical nature of the event detection. As I mentioned, I had some trouble developing a more complex class in python and this task is taking more time than expected.I am working hard to keep on track.

Friday, June 5, 2015

First report

Wow! Two weeks are already gone?! Time goes fast, when working on an interesting project!

After a few iterations of the API with my mentors, I started working on the observation and datataset modules of Gammapy. These modules are essential for observation bookkeeping and data access. Since the background subtraction methods should act on the data, a good implementation of these modules is important for developing the toolbox mentioned in my proposal.

Before explaining what I am working on exactly, I will try to clarify how TeV gamma-ray observations are carried and organized.

A typical observation would be to point the telescope system (for instance H.E.S.S. or CTA) to a particular position in the sky (i.e. a source like the Crab nebula) for a certain amount of time (typically ~30 min.). Since most sources in gamma-ray need deep exposures (typically several hours to ~100 h) to obtain a significant emission, and hence actually "see something", an object is observed during multiple observations of ~30 min each.

Each of these observations delivers a large amount of data (raw data): telescope images of interesting events (plus also a lot of noise), atmosphere monitoring information, telescope calibration information, etc. Each experiment or observatory has a team of experts that filter the data and derive the properties of the primary particles (ideally gamma-rays, but also a lot of gamma-like background events) and produce the so called event list files: one file per observation.

These files contain ordered list of events with their properties (arrival direction, energy, etc.) and a header with the information common to all of them, for being in the same observation (observation time, telescope pointing direction, etc.). These event list files are the data, that is delivered to the observer/analyzer and are the input for the analysis tools.

In order to keep track of these observations and organize the event lists needed for a specific analysis, observation tables are used. These are but a list of observations IDs that should identify the corresponding event lists, together with interesting properties that help the analyzer in the selection of the observations needed for a specific analysis. These properties are for instance the observation time, the pointing position, the observation duration, the quality of the data obtained, etc.

My work on the last 2 weeks has been especially in the development of a tool that generates dummy observation tables that I will be able to use for testing some of the tools that I need to develop for the background module of Gammapy. In addition, I am working together with my mentors on defining the data format that Gammapy observation tables will support. A link to the pull request can be found here.

The work is not yet finished, but the tool generates already test observation tables with a few columns. Here is a screenshot with a 10 observation example table (please click on the picture for an enlarged view):

The algorithm of the tool generates random pairs of azimuth and altitude evenly distributed in spherical coordinates. Azimuth values are generated in the interval (0 deg, 360 deg), and altitude values in the interval (45 deg, 90 deg). It also generates random time values between the start of 2010 and the end of 2014. The az, alt pairs are then translated into celestial coordinates (RA, dec) for a specific observatory. More on coordinates and its transformations can be found in the Astropy docs here, or in a less technical way in Wikipedia here.

This tool could also be used in the future to generate example observation lists.

Items that need to be improved:

Write the doc specifying the supported format for observation tables in Gammapy.
Add additional columns to the dummy observation table generator.
Add a header to the observation table with for instance the name of the observatory/experiment (i.e. H.E.S.S. or CTA).
The times in the observation table generator should be written in a more convenient format.
Optional: since TeV gammas-ray observations are carried at night, it would be nice if the randomly generated observation times could be restricted to night hours (i.e. from 22h to 4h).
Optional: develop a tool to filter observation tables depending on a few parameters like:

Time interval: define t_min, t_max
Pointing position: define a circle/box in the sky.