The Scientific Case
for a Gemini Data Archive




DAVID SCHADE and DANIEL DURAND
Herzberg Institute of Astrophysics, Victoria, B.C., Canada



JEAN-RENE ROY
Université Laval, Québec Qc, Canada


and


MARIA-TERESA RUIZ
Universidad de Chile, Santiago, Chile




Summary

A well-designed and properly-implemented Science Data Archive, as part of the capabilities of the Gemini observatories, would be a major contribution toward the full exploitation of the unique characteristics of the Gemini telescopes. An effective archive would boost scientific productivity and would ensure that maximum value was extracted from the expensive-to-obtain observational data. In the short term, an archive would contribute to the efficiency and effectiveness of observatory operations, scientific planning, and preparation of observing proposals. Archiving requirements are consistent with, and would help to optimise, queue mode observing requirements. In the long term, a science archive would enable many of the high level science goals outlined in the Gemini Science Requirements document. Some specific goals (for example, understanding the relationship between quasars and their host galaxies, probing variable phenomena in stars and galaxies, and studying the dynamics of galactic nuclei) would be unattainable without archival access to the complete set of Gemini observations. The collective impact of the full Gemini observation database would far exceed what could be produced by the programs of single observers or teams, spanning a wider range in physically important parameters such as redshift and luminosity of active galactic nuclei or galaxies. Furthermore, many high-priority Gemini projects would use the Gemini data in conjunction with Hubble Space Telescope, HIPPARCOS, ROSAT, IRAS, NGST and other archival data.

A science data archive, in partnership with a well-designed engineering archive, would play an important role in characterising and monitoring the behavior of the science instruments. Optimising their performance would be made much easier with the direct modes of data access provided by the archive.

Access to Gemini data by the wider community, after the proprietary period, would ensure that the data are fully exploited and that maximum scientific value is returned to the astronomical communities and, ultimately, the citizens who support Gemini. The true legacy of the Gemini observatories would be the collection of excellent observations produced by their innovative instruments and this legacy, in the form of the first scientifically effective archive of ground-based observations, would play an important role in astrophysical research long after Gemini itself ceases operations.

Science Requirements for a Gemini
Data Archive




The ultimate legacy of Gemini is the scientific data that it produces and the full scientific exploitation of those data is the most important goal of Gemini. The science requirements of the Gemini Data Archive are defined to further the pursuit of that goal.



1) The scientific communities of the partner countries must have online access to a complete catalogue of the Gemini observations and to the data in a recognised and readily usable astronomical format.

2) All descriptors necessary to qualify and to quantify the data must exist, be accessible, and be accurate. These descriptors include a record of the complete optical path, observatory systems status, and environmental conditions (e.g., logs of weather and instrument temperatures). This requirement implies strong linkage between the engineering archive and the scientific archive.

3) A mechanism must be in place to control access to the data in order to enforce proprietary rights.

4) Security of the data must be guaranteed against loss (for example, data should exist in duplicate in different locations).

5) The necessary elements must be in place to enable calibration by an automated processing pipeline. These elements include the existence of reliable calibration material for ALL observations (whether queue-mode or classical) and pipeline processing software. Calibration of data may be performed at the time of observation but the capability must exist to re-calibrate all data at any future time to take advantage of increased knowledge of the instruments and improved calibration material.

6) The Gemini archive facility must be considered to be an integral part of the Gemini operations environment. Archive requirements need to be considered alongside other important requirements by all members of the team: instrument scientists, engineers, observatory staff scientists, observing assistants, visiting astronomers, and archive staff.

7) The Gemini archive must maintain compatibility with evolving requirements for effective inter-archive access because many science projects will use a combination of archives as data sources.

8) A science archive is an evolving facility, and at all stages of its growth, it will require resources for continued development as well as operations. For instance, Gemini data must be transferred to new media as storage systems evolve. These data must be legible a century from now.
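Requirement 3 (enforcing proprietary rights) can be illustrated with a minimal sketch. The length of the proprietary period is not specified in this document, so the value of PROPRIETARY_DAYS below is purely a placeholder, and the function name and arguments are hypothetical rather than part of any Gemini system.

```python
from datetime import date, timedelta

# Placeholder only: this document does not state the proprietary period.
PROPRIETARY_DAYS = 365


def may_retrieve(obs_date, requester, pi, today, proprietary_days=PROPRIETARY_DAYS):
    """Return True if `requester` may retrieve data taken on `obs_date`.

    The principal investigator (`pi`) always has access; everyone else
    must wait until the proprietary period has elapsed.
    """
    if requester == pi:
        return True
    release_date = obs_date + timedelta(days=proprietary_days)
    return today >= release_date
```

A real archive would attach the release date to each catalogue entry so that searches can show all observations while retrieval remains restricted.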




MISSION STATEMENT


The Gemini Science Archive should provide the scientific community of the partner countries online access to all Gemini science data and supporting information in order to allow full scientific exploitation of those data. The Gemini Science Archive should guarantee that the valuable datasets obtained with the Gemini Telescopes are saved and preserved for use by future generations for research and education.

INTRODUCTION

Space-based observatories have produced scientifically effective archives for over two decades. Data from IUE, IRAS, Einstein and other missions have made clearly important contributions to progress in astronomy. Hubble Space Telescope (HST) has broken new ground in the development of archives of optical data. The observations are all saved, pipeline processed and calibrated, catalogued and distributed. The HST archive has only recently begun to be heavily exploited and will be a valuable resource for decades to come. Hanisch (1998, SPIE, vol. 3349) recently reported that the data retrieval rate from the HST archive is now higher than the rate at which new data are being ingested; he also pointed out that, up to the present, ten times more International Ultraviolet Explorer (IUE) data have been extracted from the IUE archive than were originally put into it.

Some large astronomical projects like the Palomar Sky Surveys, the Sloan Digital Sky Survey, or the 2MASS survey are themselves archive projects. For example, the Sloan Digital Sky Survey (SDSS), using a telescope and instruments dedicated to that project, will contain photometric, spectroscopic, and morphological parameters for several hundred million objects. Archiving is taking its place as one of the most important resources that serve the astronomical research community.

It is important to appreciate the difference between a safe store for observatory data and a useful science archive. The science archive requires careful cataloging and effective search and retrieval tools as well as the capability of reliable calibration; see for example the AstroBrowse Web site at sol.stsci.edu/~hanisch/astrobrowse_form.html. These are the features which allow an archive to produce science and they are absent from basic data storage systems.

Archive Research Opportunities

There are at least 3 classes of archive research project. The first consists of cases where the data are used for an entirely different scientific project than they were obtained for. The second is the case where new, improved, or otherwise different and more effective methods of analysis are brought to bear on the data. The third, and perhaps most important, class exploits the collective effect of the archive where a larger and more comprehensive dataset (consisting of all of the archive observations taken to date) spanning a wider range in some important parameter is available to the archive researcher than could ever be available to an individual proposer. The whole of the archive dataset is worth far more than the sum of the parts, and the linkages across archives and across wavelength regimes add still more value to archive data.

An excellent illustration of the effective use of archive resources is "The Demography of Massive Dark Objects in Galaxy Centres" (Magorrian et al. 1997, astro-ph/9708072). Nearly all of the leading workers in this field have collaborated to produce a study which uses imaging data from at least 6 HST programs and incorporates kinematic information from more than 10 separate ground-based observational programs. Clearly, this approach of combining many years' worth of observational effort into a large homogeneous dataset is extremely effective. The existence of good archive facilities makes this type of substantial scientific progress possible.

As a second illustration, the CFHT archive was searched for observations of NGC 1068, an AGN which displays spectroscopic and photometric variability. The search took only a few minutes of effort and returned 189 exposures from 8 separate programs spanning 7 years. Spectra and images in the optical were obtained in 6 programs and infrared observations were made in 2 additional programs. The long time baseline makes this a very valuable archival dataset. A search of the JCMT archive revealed 613 observations of this object in the sub-millimeter regime from numerous programs. These could be combined with the 283 HST observations taken over a period of 6 years with 6 different instruments in 22 separate programs. All of these data are available from a single archive site at CADC. Over 50 observations are available from 6 different X-ray and gamma ray missions through the HEASARC archive. This is a very well-observed object, but many sources have been observed at multiple wavelengths at multiple observatories over long time baselines, and archiving preserves the value of these data for future research.

The range of published archival research is impressive. Koesterke et al. (1998 A&A 330,1041) combine HST and IUE archival spectra to study mass loss in four PG1159 stars. Cagnoni et al. (1998 ApJ 494,54) use archival ASCA observations to evaluate the contribution to the X-ray background of discrete sources in the 2-10 keV energy band. Sodemann & Thomsen (1998 A&A Suppl. 127,327) perform an extension and re-analysis of earlier crowded-field photometry in M32 from archival HST imaging. Archival data from the Burst and Transient Source Experiment (BATSE) are reprocessed by Kommers et al. (1997 ApJ 491,704) to achieve higher sensitivity and are searched for low-significance transient signals in the 50-300 keV range. Ciliegi & Maccacaro (1997 MNRAS 292,338) study time and spectral variability of Einstein Extended Medium Sensitivity Survey (EMSS) active galactic nuclei. Serendipitous asteroid trails are detected by Evans et al. (1997 AAS 29.0701) by examining 30,568 frames from the HST Wide-Field Camera, and 96 moving objects are found. Rigopoulou, Lawrence, & Rowan-Robinson (1996 MNRAS 278,1049) combine archival and other data from the sub-millimetre (JCMT) to the X-ray (ROSAT) for a set of ultra-luminous IRAS galaxies. IRAS archival data are used by Noriega-Crespo et al. (1997 AJ 113,780) to produce a survey for bow shock structures around OB runaway stars, and follow-up work uses re-processed IRAS high-resolution maps.

It is evident from these examples, all involving space-based instrumentation in a key role, that extremely valuable science over a wide range of subject areas can be done with a properly-implemented Science Data Archive and that much of this science would not be possible in its absence.

Ground-based astronomy archives

There is no fundamental reason why a Science Data Archive from a ground-based facility should be more difficult to implement or less valuable than an archive of data from a space-based observatory. There have historically been differences in facility design motivated by the fact that a higher level of planning and automation is required in space, where real-time human intervention in observing procedures is much more difficult than at a ground-based telescope. The Gemini observatories are being designed to operate in a mode that very much parallels that of space-based observatories and these design requirements will allow an effective archive to be created. The motivation behind Gemini's design decisions is not the impossibility of human control (although the effect of Mauna Kea's altitude argues for a minimum of real-time decision-making), but the desire for optimum observatory performance which requires detailed planning of observations and queue-mode observing.

What are the unique difficulties of ground-based observing and archiving? The salient feature is varying weather conditions. The solution is to monitor and log transparency and seeing and maintain links between this information and the data. Queue-mode observing allows the observatory to respond to changing conditions by executing programs that are best-suited to those conditions thus optimising the scientific productivity of the Gemini facilities.
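The idea of matching queued programs to current conditions, and of keeping the logged conditions linked to each exposure, can be sketched as follows. This is only an illustration under stated assumptions: the class, field names, and selection rule are hypothetical and are not taken from the Gemini queue-scheduling design.

```python
from dataclasses import dataclass


@dataclass
class QueuedProgram:
    name: str
    max_seeing: float        # worst acceptable seeing, arcseconds
    min_transparency: float  # lowest acceptable transparency, 0 to 1
    priority: int            # lower number means higher priority


def select_program(queue, seeing, transparency):
    """Pick the program best suited to the current conditions, or None."""
    eligible = [p for p in queue
                if seeing <= p.max_seeing and transparency >= p.min_transparency]
    if not eligible:
        return None
    # Prefer higher priority; break ties in favour of the program with the
    # most demanding seeing requirement so good conditions are not wasted.
    return min(eligible, key=lambda p: (p.priority, p.max_seeing))


def log_exposure(exposure_id, program, seeing, transparency):
    """Build a log record that links the conditions to the data taken."""
    return {"exposure_id": exposure_id, "program": program.name,
            "seeing_arcsec": seeing, "transparency": transparency}
```

Storing a record like the one returned by log_exposure alongside every frame is what lets a later archive user judge the conditions under which the data were taken.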

A number of other challenges faced by existing archives of ground-based observations are related to instrument and facility design as well as deficiencies in observing and logging procedures. Extensive experience with the archive of the Canada-France-Hawaii Telescope (CFHT) has brought these problems into sharp focus and has shown how to solve them. There have been four basic deficiencies. First, at CFHT there is no guarantee that adequate calibration material is obtained. Second, there is no requirement for adequate or uniform logging of observations and weather conditions. Third, there is no guarantee that data headers include ALL of the instrument, telescope, and other system configuration information that is essential to understand and reliably calibrate the data. Fourth, there is no guarantee that some key components (e.g. filters) are in the right place, because they are not all encoded and monitored. These are the reasons that the CFHT archive, currently the best archive of ground-based data in existence, has not realized its full potential scientific productivity. Archive users, in general, simply lack sufficient information and confidence about what occurred during the execution of the observations to produce reliable science-quality data from the archive.

Gemini has been designed along the lines of a space-based observatory and this guarantees that most of the problems cited in the case of the CFHT archive are automatically resolved. An archive has been envisioned as an integral part of the Gemini facility. The success of the archive requires, above all, this element of integration with day-to-day operations of the observatory scientists and engineers, effective interactions with the instrument teams, and the contributions of the user community. In the ways it has designed its telescopes and instruments, the manner in which it has constructed its engineering archive, and the plans it is setting for its operational modes, Gemini has already laid the foundations that are needed for the effective operation of a Science Data Archive.

Motivation for a Gemini Science Archive

The main argument in favor of allocating resources to a Science Data Archive is that it would increase the quality and the quantity of the science that is produced by the Gemini observatories. It would also preserve the scientific value of the Gemini data far into the future, not as part of an historical record but, rather, as data that would continue to contribute actively to scientific progress for decades to come.

An archive of science data would also play an important role in characterising, monitoring, and optimising the performance of the instruments. Comparison of newly-obtained data with those in the archive is the most reliable way to monitor performance.

A successful Gemini Science Data Archive would ensure that all scientists in all countries of the partnership would have access (following an appropriate proprietary period) to all of the data produced by the observatory. This would represent important added value to the Gemini partnership. The archive would ensure that the Gemini observations are fully exploited and that opportunities for doing Gemini research would be as widely distributed as possible. An archive would also be extremely valuable both for educational purposes and for public outreach activities (where HST has excelled). Furthermore, there is an issue of public accountability. The allocation of resources to an archive would demonstrate to the taxpaying public that every effort was being made to ensure that maximum value was being extracted from the Gemini facility and that these astronomical data were highly valued and needed to be protected to preserve future research opportunities.

SCIENCE WITH THE GEMINI ARCHIVE

Gemini's first and subsequent generations of instrumentation will provide unprecedented observational capabilities and will open up new opportunities for study in fields as diverse as planetary searches and high-redshift clusters of galaxies. The uniqueness of Gemini science makes the provision of an archive of these data a compelling priority.

In the following sections, we discuss how a Science Archive would help to realize several specific scientific goals of the Gemini observatories. In some cases the main benefit is derived from the larger and more comprehensive database represented by the Gemini observations accumulated over a period of time. Sometimes the increased time resolution is important, for example, for the study of variable phenomena and proper motions. In some cases, Gemini observations will be combined with those from other facilities to provide a wider baseline in time or in wavelength. In some examples observations will be used for a completely different scientific purpose from that for which they were obtained and sometimes they will be used for the same purpose, but the new results will be due to fresh viewpoints and more effective methods of analysis used by archival researchers. Gemini archival observations will be used as the basis for new proposals to Gemini and other facilities, and also will be employed in conjunction with data from sister archives. In all of these cases, the Gemini archive would help ensure that the best is made of the available information content.

Star Formation

Star formation occurs in molecular clouds in regions with large amounts of extinction and the spatial scales involved are small (1-10^4 AU which translates to subarcsecond even for nearby regions). High spatial resolution and infrared capabilities are most important. Imaging and moderate to high spectral resolution (requiring good light gathering power) are needed to answer questions about environmental effects on the initial mass function, the physical state of molecular clouds and grain chemistry. Accretion and outflow processes around Young Stellar Objects can be investigated, and the basic parameters of circumstellar disks in nearby regions will be measurable using a combination of adaptive optics, coronagraphy, and polarimetry.

The Gemini archive could be useful in several aspects of the study of star formation. For example:

AGN Unification and Quasar studies

One of the thornier astrophysical problems is AGN unification. By this we mean deciding which classes of AGN objects are in fact identical, but simply viewed from different orientations, and also which classes differ in only minor respects and should be considered part of a spectrum with minor changes in some parameter. Opinions on this topic have swung wildly from the days when the (innumerable) classes of AGN were developed, to recent times when many people argue that all AGN are explicable by a single model with only minor changes.

In order to arrive at the true answer (most probably somewhere between the above extremes), one needs data of comparable quality on a wide range of AGN types, covering a wide range in redshift and luminosity. Such a large dataset would never be produced by a single observer because of time allocation limitations, but would inevitably accumulate in a Gemini archive. Two examples of pieces of the unification puzzle that illustrate the potential of an archive are:

4 views of an AGN
Figure 2: An example of multi-wavelength archive research combining Adaptive Optics with other existing data. This is an example of an X-ray selected AGN at z=0.037 observed at 4 different wavelengths from near-infrared to B-band (Schade & Crampton 1998 in preparation). All of these data are or will be available through archives. At left is an Adaptive Optics image in the H-band (pixel size 0.0375 arcseconds) taken at CFHT and next to it is an HST PC I-band image of the same object (scale 0.0455 arcseconds/pixel). These are displayed so as to show the diffraction rings in both images. The next two images are R-band and B-band images taken in seeing of 1.5 arcseconds (with pixel size 0.31 arcseconds) at La Palma with the JKT 1 meter telescope. The 1-meter integrations show the low surface brightness outer regions of the galaxies while the high-resolution images probe the inner structures with a good wavelength baseline. The boxes are each 101 pixels on a side. These data were simultaneously analysed to obtain structural parameters of the inner region of the AGN and of the host galaxy.

1. Radio loud QSOs and luminous radio galaxies:

Both AO and non-AO imaging with NIRI will allow comparison of the host galaxies of the two classes. IFU spectroscopy with GMOS and NIRS will show gas motions, both inflow and outflow over the galaxy nuclei, while polarimetry with NIRI and GMOS will show locations where scattered nuclear light may be found. To be successful, a unification model would have to explain any observed difference in the properties of the host galaxies or nuclear gas. It would also be supported by the discovery of strong scattered nuclear light in objects where it is not seen directly.

2. Seyfert 1 and 2 galaxies:

Several classic cases are known of Seyfert 2 galaxies which show broad lines when studied in polarised light. This has led some to claim that all Seyfert 2 galaxies are in fact Seyfert 1 galaxies, where the nucleus is obscured. There are however a number of problems with this simple hypothesis. Large, well-defined samples of Seyferts are needed, with high resolution imaging and IFU spectroscopy (NIRI, GMOS), to test this hypothesis.

There are many other promising unification candidates, including broad absorption line and normal radio quiet QSOs, or BL Lac objects and low luminosity radio galaxies. Only by combining data from many different Gemini proposals will a large enough sample be obtainable. One can then start to assemble the entire picture, and finally, determine how many physical parameters are really needed to determine how an AGN will appear to us. Being able to analyse uniformly NIRI imaging for all AGN, for example, would be a powerful use of a Gemini archive.

Quasars and AGN often attain new interest as a result of discoveries in varied wavelength regimes, for example, the X-ray and radio. Figure 2 shows a set of data from various telescopes at various wavelengths that can be combined to derive information about the AGN and its host galaxy. Variability in some band, lensing, membership in large scale structure, or the presence of foreground absorbers all represent auxiliary information that might not be available when original Gemini observations are made, but which would renew interest in those observations. An archive thus plays a crucial role in maintaining the value of Gemini data for AGN science.

Dynamics of Galaxy Nuclei

High-resolution spectroscopy of nearby galaxy nuclei frequently provides evidence for the presence of massive black holes. Light gathering power and high spatial resolution give Gemini a unique capability to extend these studies in terms of data quality and in terms of target distance. Gemini will produce much larger samples of observations of galaxy nuclei, and permit the reliable assessment of the true frequency of the black hole phenomenon and its relation to galaxy properties. Multiple groups will be involved in galaxy nuclei investigations, and one can expect many more studies like that of Magorrian et al. (1997) mentioned in section 1.1 to use archive data in the future.

Evolution of Cluster and field galaxies

Gemini will be heavily involved in surveys for high-redshift clusters and field galaxies and in spectroscopic and imaging follow-up for these surveys. Many images of faint galaxies will be obtained. Adaptive Optics will provide superb spatial resolution for both imaging and spectroscopy.

Figure 6b) Density of arclets in cluster A2218 (with B less than or equal to 24.5) for a given model of mass distribution.
Figure 6c) Mean redshift of arclets in cluster A2218; more distant "arc" galaxies appear further away from the cluster center than less distant ones. (Bezecourt et al. 1998).

ADDITIONAL BENEFITS OF THE GEMINI SCIENCE ARCHIVE

Summary

The efficient and effective operation of a modern observatory requires (for both technical and scientific reasons):

These considerations suggest that a facility with all of the features of a Science Data Archive is required as an integral part of the efficient functioning and optimal science output of the Gemini telescopes.

Feasibility checks, proposal planning and optimisation

The Science Data Archive would allow proposers to evaluate quickly the feasibility of their program by study of actual data retrieved from the archive. There is no substitute for seeing results from the same instrument and configuration that an observer is considering using. Access to real data would allow proposers to estimate reliably the required integration times, and allow them to evaluate the effect of different atmospheric emissivity and image quality on the data needed for their science.
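As a simple illustration of how archival data could feed an integration-time estimate, the sketch below scales an archival exposure to a desired signal-to-noise ratio. It assumes, purely for illustration, that the noise is photon- or background-limited so that signal-to-noise grows as the square root of exposure time, and that the archival exposure used the same instrument configuration on a source of similar brightness in comparable conditions. The function name is hypothetical.

```python
def required_exposure(t_archival, snr_archival, snr_target):
    """Scale an archival exposure time to reach a target signal-to-noise.

    Assumes S/N is proportional to sqrt(exposure time), which holds only
    in the photon- or background-limited regime.
    """
    if t_archival <= 0 or snr_archival <= 0 or snr_target <= 0:
        raise ValueError("times and signal-to-noise values must be positive")
    return t_archival * (snr_target / snr_archival) ** 2
```

For example, if an archival 600-second exposure reached a signal-to-noise of 10, doubling the signal-to-noise under these assumptions would call for four times the exposure time.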

Nominal performance of the telescope and instruments could be verified by comparing newly acquired data with archival data; instrumental setup, verification, and performance monitoring would be facilitated. The observatory and its instruments could be monitored to study trends in behavior and fine-tuned to ensure optimal performance.

Quick-look tools

The provision of quick-look tools requires the existence of good calibration material and automated processing in real time. Thus a processing pipeline needs to be defined for each instrument where quick-look tools are implemented. Minimal calibration steps should include bias correction, flat-fielding and wavelength calibration. Additional effort would provide image distortion correction, atmospheric refraction correction and fluxing.
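The first two minimal calibration steps named above, bias correction and flat-fielding, can be sketched in a few lines. This is a toy illustration on small nested lists, not a description of any Gemini pipeline; the function name is hypothetical, and normalising the flat to unit mean is one common convention assumed here.

```python
def quick_look(raw, bias, flat):
    """Bias-subtract and flat-field a 2-D frame given as nested lists.

    `raw`, `bias`, and `flat` must have the same shape. The flat is
    bias-subtracted and normalised to unit mean so that the overall flux
    scale of the image is preserved. Pixels where the bias-subtracted
    flat is zero would cause a division by zero and are not handled here.
    """
    flat_debiased = [[f - b for f, b in zip(frow, brow)]
                     for frow, brow in zip(flat, bias)]
    n_pixels = sum(len(row) for row in flat_debiased)
    mean_flat = sum(sum(row) for row in flat_debiased) / n_pixels
    return [[(r - b) / (f / mean_flat) for r, b, f in zip(rrow, brow, frow)]
            for rrow, brow, frow in zip(raw, bias, flat_debiased)]
```

Wavelength calibration and the further corrections listed above depend on the instrument and are deliberately left out of this sketch.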

Queue scheduling

Simulations (Mountain, Simons, & Boroson, 1995 RPT-PS-G0053) provide evidence for substantial scientific gain from queue scheduling; MSB discuss models where between 70% and 100% of observing time is dedicated to this mode of operation. The most compelling science is often the most challenging technically, and the observations that fully exploit the best atmospheric conditions are likely to be obtained in queue mode.

Queue-mode observing automatically satisfies the requirements of a Science Data Archive. The data are obtained and accompanied by a) sufficient calibration material, b) electronic logs of the events that took place during data acquisition, and c) weather monitoring information. These products delivered to the proposers are sufficiently complete and reliable to allow them to carry out the desired analysis with confidence. There is no difference between delivering these products to the original proposer several hours after they were obtained and delivering them, following the proprietary period, to an archive user.

Calibration

In order that data be "archive-able", observations must always be accompanied by sufficient calibration material and logs of weather and other events. The definition of specific calibration requirements, which largely determine the value of the Gemini science archive, requires ongoing consultation with the user community. The information should be complete enough to redo the observations in exactly the same way. This requirement applies equally to queue-mode and classical-mode observing. There are several benefits to the classical-mode observer of obtaining the minimum calibration data. First, instrument teams and observing staff who use the instrument frequently possess a great depth of expertise and would advise which calibration procedures are necessary and would implement those procedures correctly and efficiently so that the minimum necessary amount of time is expended on calibration. Secondly, calibration data should be treated as "shared" data. This will often result in further reductions in calibration effort, since each observer is not required to obtain a complete calibration dataset where redundant material already exists. An additional benefit is that the quality of the calibration provided to the classical (as well as the queue-mode) observer would be as good as or better than they would otherwise have obtained. Finally, the calibration material would be tuned for integration into both the quick-look evaluation tools and into the processing pipeline that would exist for each instrument.

Pipeline Processing

The observer in either queue or classical mode would benefit from pipeline processing whether they use it to produce their final data products or whether they use it as a benchmark to evaluate the results of their own processing software. The production of pipeline software is an integral part of the archive process, but it is also an integral part of the implementation of quick-look tools for real-time evaluation of data quality. As is the case for calibration material, the instrument teams and Gemini staff astronomers will have considerable depth of expertise on processing of data from each instrument and would develop processing algorithms in consultation with users. The standard pipeline processing software will not satisfy all telescope users but, on balance, it would save most observers a considerable amount of time normally spent on routine data reduction and software development.

CONCLUSIONS

A Science Data Archive would represent a major contribution to the scientific productivity of the Gemini observatories in a number of ways. First, we have given a number of examples where it would enable first-rate scientific research that would never be done in the absence of an archive. Second, a data archive carries benefits for proposal preparation, instrument performance verification and optimisation, queue-mode observing, and Gemini operations in general. This is because it is more efficient to operate in an environment where information is managed and processed with the best existing technological tools. An archive is a major component of such an environment. Third, it would help to keep us competitive. The VLT and Subaru projects realize the power of effective archiving and are investing large resources in archive development. We know how to develop and run a highly productive science archive with a fraction of the resources these projects are expending. Finally, the Gemini Science Archive would distribute scientific opportunities among astronomers in the partner countries, would help to inform the public about the excitement and importance of astrophysical research, and would demonstrate to the taxpayers who support Gemini that we are acting responsibly by handling these valuable data with the greatest care, so that they are fully exploited and their value is preserved for future generations of scientists.






Acknowledgements

We are grateful to Andy Woodsworth, Jim Hesser, Séverin Gaudet, David Bohlender, Tim Davidge, Laurent Drissen, Phil Puxley and Gordon Walker for their comments and help in preparing this paper.



30 March 1998
