http://www.astro.ku.dk/~erik/gaia/76.compression

Sampling and compression of astrometric data including double stars
===================================================================

E. Hoeg, V.V. Makarov
9 May 2000
SAG_CUO_76

ABSTRACT: An efficient compression by co-addition of data from the 16 CCDs in the astrometric field (AF) of GAIA is described. The method is a very simple, weakly lossy compression that is not mentioned in the GAIA Study Report (GSR). It does not depend on any hypothesis about the object being measured. A compression factor of 8.0 (in theory) is obtained for the raw astrometric data. In view of this large potential gain, it is proposed to transmit more samples per patch for the benefit of bright stars and double stars. The method assumes an ideal linear response of the CCDs; the performance with realistic, non-ideal CCDs cannot yet be estimated.

===== Introduction

The current sampling assumes that each sample is obtained as an electronic addition of 8 pixels perpendicular to the scan during readout. Along scan, a patch of 6 such samples is taken on each of the 16 astrometric CCDs, which has been shown to be sufficient for single stars (see GSR Section 3.3.3, Fig. 3.16). These 16*6 = 96 samples per star at 16 bits/sample contribute a raw data rate of 845 kbits/s per Astro instrument (Table 3.11). This means 1690 kbits/s for both instruments, out of a total raw data rate of 6858 kbits/s from GAIA.

The PSF of a blue star is narrower than that of a red star, so a blue star will suffer relatively more loss in accuracy from compression. According to Fig. 3.16a it has FWHM = 2.0 pixels. Along scan, 1 pixel = 1 sample = 37 mas.

The peak of a given star image will not be precisely centred on the 6 samples because of: (1) uncertainty in the detection process, (2) the actual adjustment of the CCDs along scan, (3) variations of the scan velocity, and (4) proper motion of the object during the transit of the AF.
The latter effect is only relevant for solar system objects, which are neglected in the following discussion.

--------------------------------------------------------------------
|_|_|_|_|_|_|    Patch of 6 samples
     | |         Interval C containing the predicted position
     ||          Interval A
      ||         Interval B

Figure 1. Patch of 6 samples in the astrometric field.
--------------------------------------------------------------------

We now consider the actual position in the field obtained from the detection process. This detected position includes effect (1) above, which gives a constant (systematic) shift on all CCDs of the AF of the order of 3 mas at V = 20 mag, according to Table 3.5 of GSR. Since this shift is systematic, it is not relevant for the compression.

We assume that the detected position on each CCD can be predicted onboard with a precision better than 0.1 sample. This requires that the uncertainty due to effects (2) and (3) above is less than 0.1 sample. In other words, the relative position of all samples along scan must be known to better than 0.1 sample = 4 mas from onboard astrometric calibrations of the CCDs, and the scan velocity must be well known. The patch of 6 samples will always be extracted so that the predicted position is as close as possible to the centre of the patch, i.e. within +-0.5 sample of it, inside interval C of Fig. 1.

===== Compression by coaddition

A possible compression scheme would be to add all 16 patches, resulting in 6 coadded samples. This would introduce a smearing corresponding to a rectangular function of width about 1.0 sample, i.e. FWHM = 1.0 sample. The 0.1 sample uncertainty mentioned above is negligible in this connection. The global PSF then has the width

    FWHMg = sqrt(2.0^2 + 1.0^2) = 2.24 samples        (1)

This is an increase of about 12 per cent (2.24/2.0 = 1.12) and accordingly an increase of the astrometric standard errors by about 12 per cent, which is not acceptable for GAIA.

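Eq. (1) is a quadrature sum of the PSF width and the smearing width. A minimal Python sketch of that arithmetic follows; the helper name `global_fwhm` is hypothetical, and, like the text, it treats the rectangular smearing as if its width combined with the PSF FWHM in quadrature, which is an approximation rather than an exact convolution.

```python
import math


def global_fwhm(psf_fwhm, smear_width):
    """Width of the smeared PSF as used in Eq. (1).

    Hypothetical helper: combines the PSF FWHM and the smearing width
    in quadrature, i.e. sqrt(psf_fwhm**2 + smear_width**2).
    """
    return math.hypot(psf_fwhm, smear_width)


PSF_FWHM = 2.0   # blue-star PSF width in samples (Fig. 3.16a)
SMEAR_ALL = 1.0  # all 16 patches coadded: predicted offsets span +-0.5 sample

w = global_fwhm(PSF_FWHM, SMEAR_ALL)
increase = w / PSF_FWHM - 1.0
print(f"FWHMg = {w:.2f} samples, increase = {100 * increase:.1f} per cent")
# prints: FWHMg = 2.24 samples, increase = 11.8 per cent
```

The relative increase is the quantity the text carries over to the astrometric standard errors, on the assumption that those errors scale in proportion to the image width.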
The proposed scheme is to separate the 16 patches into two groups, according to whether the predicted position lies in interval A or in interval B. The number of patches in each interval may vary between 0 and 16, and the decompression on the ground requires knowing which patches were placed in each interval. The compression consists of adding the patches belonging to each interval separately, resulting in two coadded patches. The global PSF belonging to each interval has the width

    FWHMg = sqrt(2.0^2 + 0.5^2) = 2.06 samples        (2)

This is an increase of 3 per cent and accordingly an increase of the astrometric standard errors by 3 per cent, which seems acceptable for GAIA. Since 16 patches are reduced to 2, the compression amounts to a factor 8.0 for the astrometric data. The vast majority of observations should be compressed in order to obtain this benefit, but uncompressed data are required for bright stars for calibration purposes.

===== Bright stars

For very bright stars we propose to take 1 or perhaps 2 pixels/sample across scan, in order to sample double stars and the bright spikes. The vertical (cross-scan) spikes, but *not* the along-scan spikes, contain useful astrometric and photometric information, especially when the central part of the star image is saturated or non-linear. As may be seen from Fig. 6.6, the vertical spike is about 5 mag fainter at a distance of 1 arcsec from the centre, so it is hardly useful to go further out.

We propose to take 20 adjacent samples of 1 pixel each, i.e. 20 samples across scan covering 2.22 arcsec, for very bright stars, V < 12.0 mag. This will introduce more readout noise per star, but the effect will be negligible because of the large number of photons. It is proposed below to take 12 samples/patch along scan instead of only 6, in order to cope better with double stars: faint components of close double stars would not be well covered by only 6 samples.

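The A/B grouping and coaddition described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any onboard implementation: the helper `coadd_two_groups` is hypothetical, predicted offsets are taken relative to the patch centre in samples and assumed to lie inside interval C, interval A is assumed to be the left half of C (offset < 0), and random integers stand in for real CCD samples.

```python
import math
import random

N_CCD = 16  # astrometric CCDs AF01-16
N_SMP = 6   # samples per patch along scan (GSR value)


def coadd_two_groups(patches, offsets):
    """Hypothetical sketch of the A/B coaddition.

    patches: N_CCD lists of N_SMP integer samples.
    offsets: predicted position on each CCD relative to the patch centre,
             in samples, assumed to lie in [-0.5, +0.5] (interval C).
    Returns the two coadded patches and one group flag per CCD, which the
    ground decompression needs in order to know which patches were added.
    """
    sum_a = [0] * N_SMP
    sum_b = [0] * N_SMP
    flags = []
    for patch, off in zip(patches, offsets):
        in_a = off < 0.0  # assumption: interval A is the left half of C
        flags.append(in_a)
        target = sum_a if in_a else sum_b
        for i, value in enumerate(patch):
            target[i] += value
    return sum_a, sum_b, flags


random.seed(1)
patches = [[random.randint(0, 1000) for _ in range(N_SMP)] for _ in range(N_CCD)]
offsets = [random.uniform(-0.5, 0.5) for _ in range(N_CCD)]
sum_a, sum_b, flags = coadd_two_groups(patches, offsets)

raw = N_CCD * N_SMP                   # 96 samples transmitted without coaddition
compressed = len(sum_a) + len(sum_b)  # 12 samples after coaddition
print(raw / compressed)               # 8.0

# Within each group the offsets span only 0.5 sample, giving Eq. (2).
print(round(math.hypot(2.0, 0.5), 2))  # 2.06
```

The factor computed here counts samples only; the 16 group flags that must accompany the two coadded patches are ignored in it.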
For stars between V = 12 and 16 no compression by coaddition is applied, since the photometric and astrometric calibration could be affected, and since these stars do not contribute significantly to the telemetry rate, according to Table 1. We define R, given in the second-last column of Table 1, as the total number of samples obtained from one field crossing of every star on the sky. For the first 4 lines, with no compression, we have

    R = Nst*Nsmp*16

For the last line, with compression by coaddition,

    R = Nst*Nsmp*2

------------------------------------------------------------------
Table 1. Samples and patches for the astrometric field AF01-16.
R is the relative amount of raw data. The last line assumes the new
compression by coaddition. Sample (pixels per sample) and Nsmp
(samples per patch) are both given as along-scan*across-scan.
The total value of R for the proposed sampling and compression is
(480+600)*16 + 12000*2 = 17280 + 24000 = 41280, to be compared with
the present 6000*16 = 96000 in the first line.

V            Nst  Sample  Nsmp                   R  Notes
mag         10^6  pixels  samples/patch       10^6
< 20.0      1000  1*8     6*1              6000*16  GSR, Fig. 3.7
< 12.0         2  1*1     12*20             480*16  Proposed
12 - 16.0     50  1*8     12*1              600*16  Proposed
16 - 20.0   1000  1*8     12*1            12000*16  Compression required
16 - 20.0   1000  1*8     12*1             12000*2  Compr. by coadd.
------------------------------------------------------------------

===== Double stars

Given the large possible compression factor of 8.0, the sampling of double stars should be reconsidered. We currently assume that stellar duplicity is detected as a widening of the image along scan, by analysis of the patch of 5*5 samples in ASM3 (see Fig. 3.7 in RR). In the case of a significantly widened image, a larger patch of, e.g., 12 samples should be extracted in the AF. But the detection of widening is far from trivial and has not been studied in any detail. Given the significant number of double stars and their great scientific interest, the issue is quite critical, since the impact on the data rate could be very significant.

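The R column of Table 1 and the totals quoted in its caption follow directly from the two formulas for R. The short Python check below recomputes them; the row labels are abbreviations of the table entries, and the tuple layout and the helper `r_value` are arbitrary choices for illustration.

```python
# Rows of Table 1: (label, Nst in 10^6 stars, Nsmp = along*across samples per
# patch, number of patches transmitted: 16 uncompressed, 2 after coaddition)
ROWS = [
    ("V < 20.0, GSR", 1000, 6 * 1, 16),
    ("V < 12.0, proposed", 2, 12 * 20, 16),
    ("V 12-16.0, proposed", 50, 12 * 1, 16),
    ("V 16-20.0, uncompressed", 1000, 12 * 1, 16),
    ("V 16-20.0, coadded", 1000, 12 * 1, 2),
]


def r_value(nst, nsmp, npatch):
    """R = Nst*Nsmp*16 without compression, Nst*Nsmp*2 with coaddition."""
    return nst * nsmp * npatch


for label, nst, nsmp, npatch in ROWS:
    print(f"{label:24s} R = {nst * nsmp}*{npatch} = {r_value(nst, nsmp, npatch)}")

current = r_value(1000, 6, 16)  # first line of the table
proposed = r_value(2, 240, 16) + r_value(50, 12, 16) + r_value(1000, 12, 2)
print(current, proposed, round(current / proposed, 2))  # 96000 41280 2.33
```

R is in units of 10^6 samples, because Nst is given in units of 10^6 stars.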
Let us tentatively assume that 12 samples instead of 6 are extracted per patch for every detected star. With the proposed compression we would still gain a factor 4.0 on the current astrometric data rate if the coaddition were applied at all magnitudes, but we will not apply it to bright stars. A stellar component at a distance of 4.0 samples (about 150 mas) would still obtain a nearly perfect sampling for astrometry (see Fig. 3.16b). This figure also shows that a double star with a separation of 150 mas along scan is very well resolved. A patch of 12 samples is therefore sufficient even for a double star with a large magnitude difference. This is the most difficult case, because the bright component will be located near the centre of the patch and the faint component at a distance of 150 mas. The requirement for the detection of image widening in ASM3 is therefore a component separation of about 150 mas along scan. At separations larger than 150 mas, more than 12 samples should be taken according to a strategy yet to be defined.

The resolution along scan in ASM1 or ASM3 appears to be about 150 mas, according to Table 2 in LL-29. This implies that a detection of image widening in ASM3 is perhaps superfluous! This assumes that 12 samples per patch are taken for all detections, as we propose here. Further discussion should be based on simulations of the detection in ASM1 and ASM3 and of the subsequent measurement in the AF.

===== Comments by Lindegren to a draft of this report

*** On 18 April:

...However, it is quite possible that the strongly non-linear CTI effects will make the calibration of the faintest stars the most difficult, and the presence of localized effects (charge traps) could make it essential to retain the individual CCD transits also for the faint stars. A more general argument against co-adding is that it increases the sensitivity to cosmic ray events, CCD defects etc. by a significant factor.
In general, the co-adding is suboptimal in the presence of any non-linear effect. Although it is a good idea to make use of the similarity of the 16 successive AF transits, my first choice would be to use this in a lossless compression. The gain will certainly be much less than a factor 8, but probably better than a "blind" (non-predictive) lossless compression. On the double stars, I agree that it is a fairly critical problem that needs a lot of attention.

*** On 19 April:

Subject: Data compression

Erik, Michael,

I just found an incredibly elegant solution to the data compression problem for GAIA. US patent #5,533,051 describes a process whereby any string of n bits (n > 1) can be compressed by at least one bit. Recursively applying this to our 20 TB data sets reduces it to one bit. See http://gailly.net/05533051.html for details.

Lennart

===== Conclusions

The resulting impact on the telemetry rate for stars in the various magnitude ranges is shown in Table 1, as R in the second-last column. It appears that a better sampling than in the GSR can be recommended for bright and very bright stars, i.e. for all stars with V < 16.0, without serious impact on the telemetry. We propose here 12 samples along scan per patch instead of the 6 samples in the GSR. A better sampling of the fainter stars can also be recommended if the compression by coaddition is acceptable for these stars. Altogether, a better performance for double stars and bright stars is obtained, together with a reduction of the transmitted raw astrometric data by a factor 96000/41280 = 2.3.

===================================================================