Concentration of Starlight from Large Apertures into a Single Spatial Mode for Long-Baseline Interferometry
Jeff Meisner

This is a detailed paper written to accompany a much shorter paper (link through main page) submitted to the ASP Conference Proceedings, based on a poster presented at the ASP annual meeting symposium on interferometry and adaptive optics, June 28 - July 1, 1998.

Notes regarding this HTML version:
All of the figures have been placed in a separate file. Access all figures through the hyperlinks.
(Depending on your browser and installed fonts, you might have some problems with certain characters in the text! However the display equations are in graphic format and will come through fine.)

(Unpublished version)


A long-baseline optical stellar interferometer requires a minimum level of optical power available from each arm in order to operate in the fringe-tracking mode which enables coherent integration of fringe visibility. That optical power must be concentrated in a single spatial mode in order to interfere coherently. However atmospheric seeing places a limit on the amount of optical power that will be accepted into a single spatial mode for apertures much larger than the Fried parameter, thus placing a magnitude limit on the coherent operation of the interferometer.

However the use of an adaptive optics system may enable larger apertures to concentrate greater amounts of optical power in a single mode, thus extending the magnitude limit of the interferometer. Aside from systems using laser guide stars, an adaptive optics system requires a feedback signal derived from the detection of a portion of the collected starlight in order to co-phase the sub-apertures. Increasing the portion of light directed to the feedback system will therefore allow the adaptive optics system to operate on dimmer objects.

On the other hand, the optical power which is sacrificed for the production of the feedback signal becomes unavailable for the ultimate use by the interferometer. However any light which would not successfully be concentrated into the output mode may be obtained "cost free." This observation leads one to different design criteria for an adaptive optics system used for concentrating light into a single mode, as opposed to one designed for high resolution imaging by a single large aperture.

The sensitivity limits and light concentrating power of any such adaptive optics system can be found by the analysis of a hypothetical guided wave optical circuit forming a binary tree structure. Optical power is concentrated from subapertures feeding the branches of the tree toward the root, yielding starlight concentrated in a single mode which may supply one arm of a long-baseline interferometer. Concentration of light along the tree structure occurs at 2-input modules, all of similar design, each optimized for maximum light concentration at its stage. The control system for each module operates independently of the others, and is optimized on the basis of the power spectrum of phase noise expected for a given level of atmospheric turbulence. Performance limits are obtained for the resultant optical concentrating power as a function of incident flux, the Fried parameter, and the atmospheric coherence time parameter.


A long-baseline stellar interferometer can accurately determine underlying fringe visibilities only if interference over the entire pupil plane occurs in a single phase, or if the rms phase error over that area can be accurately determined. One simple and effective means of ensuring this condition is to limit the collecting aperture relative to the Fried parameter r0. This ensures that both beams have almost all of their power in a single spatial mode. Interference can only occur between waves launched into the same mode. Interference occurring in other spatial modes will occur in random relative phases and not substantially contribute to the signal, yet the detected photons will still contribute shot noise. Not only will the signal-to-noise ratio not be augmented by additional light, but calibration of visibility measurements will suffer as varying atmospheric conditions alter the amount of optical power in the intended spatial mode. For dim or weakly correlated sources, lack of a sufficient signal-to-noise ratio will prevent fringe-tracking and coherent integration of fringe visibility, greatly reducing the performance of the instrument when observing such objects.

Aside from limiting the primary apertures, spatial mode filtering can be obtained using pinholes or single-mode fibers. In each case the amount of light accepted into a single mode is limited by the Fried parameter. However adaptive optics and other active systems may be capable of measuring and partially compensating for random phase functions over an aperture larger than r0. The simplest such system, tip-tilt correction, can appreciably augment the power obtained in a single mode. Higher order adaptive optics systems have the potential to further concentrate power into a single mode, according to the factor by which the Strehl ratio is improved.


Consider the collection of light from a virtual point source by a square or hexagonal array of 2^M subapertures as depicted in the left side of Figure 1. Each subaperture collects a sample of the same wavefront subject to a random phase shift dependent on the column density of the atmosphere above that subaperture (intensity variations will be ignored in the present analysis). The light from each subaperture is collected into an optical fiber or similar single-mode guided wave structure which enters an optical circuit for the purpose of concentrating the light received from the 2^M subapertures into a single mode.

Now suppose that we have designed a "concentrator module" which accepts two optical signals derived from the same source but with a random phase shift between them, and outputs most of the light from the two inputs into a single output mode. Then using 2^M - 1 such concentrator modules in an M-tier tree structure, the light gathered from the 2^M subapertures may be concentrated into a single spatial mode as shown in Figure 1 for the case of 16 subapertures (M=4) using 15 such concentrator modules.

Each of the 8 first-tier concentrators accepts inputs received from adjacent subapertures (0 and 1; 2 and 3; etc.) since these present the shortest baselines and thus have the smallest rms atmospherically induced relative phase shifts. The phase of the light input to a first tier concentrator module will be approximately the same as the phase of the wavefront as observed at the position of the center of the subaperture from which it is collected. The output of a first tier concentrator which combines two adjacent subapertures will have a phase approximately the same as that of the wavefront at the midpoint position between the centers of the adjacent subapertures. In Figure 1 these positions are marked with small dots. Likewise the outputs of the second tier concentrators have phases which are approximately the same as the phases of the wavefront present at positions (indicated by larger dots) midpoint between the virtual positions defined by the first tier concentrators, etc. In general, concentration at tier m will involve correcting for a random phase function f(t) whose statistics correspond to the phase difference between virtual points defined by the two outputs from concentrator modules at tier m-1, producing a concentrated optical signal whose phase is approximately that of a virtual point midway between the virtual points defined by the two previous concentrators.
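The bookkeeping of the tree can be checked with a short sketch. The function names below are illustrative only and do not come from the paper; the sketch simply counts modules per tier and lists the adjacent-subaperture pairing used at the first tier.

```python
def concentrator_tree(num_tiers):
    """Count subapertures and concentrator modules in an M-tier binary tree."""
    num_subapertures = 2 ** num_tiers
    # Each tier halves the number of optical signals left by the previous tier.
    modules_per_tier = [num_subapertures // 2 ** m
                        for m in range(1, num_tiers + 1)]
    return num_subapertures, modules_per_tier


def first_tier_pairs(num_subapertures):
    """Pair adjacent subapertures (0, 1), (2, 3), ... for the first tier."""
    return [(i, i + 1) for i in range(0, num_subapertures, 2)]


subapertures, per_tier = concentrator_tree(4)
print(subapertures, per_tier, sum(per_tier))  # 16 [8, 4, 2, 1] 15
print(first_tier_pairs(subapertures)[:2])     # [(0, 1), (2, 3)]
```

For M=4 this reproduces the 16 subapertures, 8 first-tier modules and 15 modules in total quoted in the text.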

For circular subapertures, each of diameter D, closely packed in a rectangular or hexagonal pattern, the length of the effective baseline for concentrating light at the mth tier is given by Table I.

The design of a concentrator module at tier m should be tailored to the temporal statistics of the random process f which denotes the relative phase between its two optical inputs. The (2-sided) power spectrum of f over a baseline of length B is approximately given by:

where fb, the low frequency cutoff due to the baseline B, is given by:

V0 denotes the effective wind velocity of the turbulent atmospheric layers, and k is a constant related to the direction of that wind with respect to the baseline. In theory, if the wind were parallel to the baseline, k would be 2; for the wind perpendicular to the baseline k takes the value of 3. For simplicity, an intermediate value is employed. T0 in (2) is the atmospheric coherence time parameter defined as r0/V0. We have assumed an infinite outer scale of turbulence, which is essentially a worst-case assumption.

In a practical system these parameters would be estimated empirically, for instance, by observing the power spectrum of phase noise. The amplitude of the high-frequency asymptote is governed solely by T0, while the amplitudes of the low-frequency curves (and the cutoff frequencies fb) also depend on the baseline, as given by Table I for the various tiers of the concentrator circuit. The integral of Sff over all frequencies will equal the well known result for the mean-square value of f over the baseline B:
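The display equation for this mean-square value is not reproduced in this version. As a rough numerical guide, the sketch below assumes the standard Kolmogorov phase structure function for an infinite outer scale, 6.88 (B/r0)^(5/3) rad^2; that formula and the function name are assumptions made for illustration, not text taken from the paper.

```python
import math


def mean_square_phase_difference(baseline, r0):
    """Assumed Kolmogorov structure function, in rad^2 (infinite outer scale)."""
    return 6.88 * (baseline / r0) ** (5.0 / 3.0)


# Baselines expressed in units of r0; the values are examples only.
for b_over_r0 in (1.0, 2.0, 4.0):
    ms = mean_square_phase_difference(b_over_r0, 1.0)
    print(f"B = {b_over_r0:g} r0: rms phase difference = {math.sqrt(ms):.2f} rad")
```

Under this assumption the uncorrected rms phase difference already exceeds 2 radians at B = r0, which illustrates why correction becomes harder at the longer baselines of the higher tiers.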

Clearly the increased phase noise at longer baselines (higher tiers) will make the task of correcting that phase more difficult. On the other hand that increased difficulty at higher tiers will be ameliorated by the stronger optical signals present at their inputs, due to the previous stages of optical concentration. To evaluate the net performance of the tree structure shown in Figure 1, we must first design and analyze the operation of a single concentrator module.


Consider two optical signals E1 and E2 which differ only by the phase shift f:

If f is known, then the power from E1 and E2 can be easily combined using the optical circuit depicted in Figure 2. E2 is first run through a delay compensator which shifts its phase by y. Knowing f, we set y to the same value, so that X1 and X2, the inputs to the passive network, are now identical signals, both equal to E1. The network produces outputs which are the sum and difference of the inputs, each scaled by 1/√2 as a lossless network requires. Since X1=X2, the output Y2 is zero, but the output Y1 has an amplitude of √2 E1, or double the power of a single input. Of course this perfect concentration of the input power was only possible because we knew the relative phase difference f. In general, for a setting of the delay compensator y not equal to f, the resulting outputs of the network in Figure 2 would be given by:

If the phase f were unknown (or, equivalently, if E1 and E2 were incoherent relative to each other) then concentration of the input light would not be possible. Instead, the expected intensities of the outputs would be found by averaging the cosine over all angles to yield zero, so that the expected output powers would each be |E1|^2, the same as the power of each input, in accordance with the brightness theorem.

In the important practical case in which an estimator for f is used in which y is modelled as a Gaussian random variable with a mean equal to f and a variance of e^2, the expected outputs would be:

In (8) we have defined the concentrating power C as the factor by which the expected output power exceeds the individual input powers. Clearly C cannot exceed 2, and a concentrating power which is not significantly greater than unity is obviously of no value.
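The behaviour of the two-output network can be illustrated numerically. The sketch below is not the paper's equations, which are not reproduced in this version; it assumes a lossless sum-and-difference network with 1/√2 scaling, a sign convention in which the residual relative phase at the network inputs is f - y, and the standard result that the mean of the cosine of a zero-mean Gaussian variable of variance e^2 is exp(-e^2/2). Function names are illustrative.

```python
import cmath
import math


def combiner_output_powers(phi, psi, input_power=1.0):
    """Output powers (|Y1|^2, |Y2|^2) of a lossless sum/difference network."""
    e1 = math.sqrt(input_power)
    x1 = e1                                  # reference input
    x2 = e1 * cmath.exp(1j * (phi - psi))    # input left with phase error phi - psi
    y1 = (x1 + x2) / math.sqrt(2.0)          # summing (useful) output
    y2 = (x1 - x2) / math.sqrt(2.0)          # difference (unused) output
    return abs(y1) ** 2, abs(y2) ** 2


def expected_concentrating_power(rms_phase_error):
    """C = 1 + E[cos(error)] for a zero-mean Gaussian phase error."""
    return 1.0 + math.exp(-0.5 * rms_phase_error ** 2)


print(combiner_output_powers(0.3, 0.3))       # perfect compensation: about (2, 0)
for rms in (0.0, 0.5, 1.0, 1.5):
    print(rms, round(expected_concentrating_power(rms), 3))
```

In this model C falls from 2 with no phase error toward 1 as the rms error grows, matching the statements that C cannot exceed 2 and that an unknown phase yields no concentration.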

Now let us look at schemes for controlling y in order to maximize the concentrating power C. Consider the network shown in Figure 2. The light output at Y1 is the useful output, but the light output at Y2 is unused and can be observed. In fact, the amount of light observed at Y2 will be an indication of the concentrating power achieved, according to (9). Unfortunately the power level observed at Y2 is an even function of the control error f-y, so it is uncorrelated with that error and carries no information about its sign; it thus cannot be used in a linear system to servo the delay compensator. Instead we must substitute a network which has at least three outputs.

A lossless network implies a unitary scattering matrix. For the application of concentrating light when y=f so that the two optical inputs to the network are identical, we desire an output which sums those inputs in phase and sends relatively little power to the other outputs. The class of possible photo-mixing networks which satisfy this property and produce two suitable outputs for feedback, is described by the following scattering matrix, in which q remains to be chosen.


The resulting system using such a photomixing network is shown in Figure 3. The concentrated optical power is available at the #1 output of the network, and again is maximized when y=f. Outputs 2 and 3 are each incident on optical detectors in order to generate a feedback signal for controlling y. The electrical signals from these two detectors are subtracted and input into a causal linear filter whose transfer function is denoted H(s), the output of which, y, controls the delay compensator. The filter transfer function H(s) and the network design parameter q are to be chosen in order to maximize the concentrating power C (see note 1).

Let us call the intensity of either input in photons per second I0, and let the field amplitudes (E1, E2, X1, and X2) be in units such that the conversion between |E|^2 and I is unity:

For a given phase error f-y, the intensities of the feedback outputs from the photomixing network, |Y2|^2 and |Y3|^2, are found to be:

So it can be seen that to first order, the gain of the system in detecting the phase error f-y is proportional to sin(q). Meanwhile, the concentrating power C suffers with increasing q according to:

Again, e^2 denotes the mean-square value of the phase error f-y. The achieved concentrating power as a function of q is plotted in Figure 4 for rms phase errors of 0, 0.5, 1.0, and 1.5 radians. But to determine the ultimate concentrating power, we must determine the phase error, which itself generally will decrease as q increases since increasing q increases the effective gain of the feedback network, according to (14).

The optical feedback signals are each detected by a photon-counting detector whose quantum efficiency is denoted η (distinct from the network design parameter q). We will assume a low rate of dark counts, so that quantum limited detection is achieved. These two electrical signals are subtracted to form the feedback signal z, which we will take to be in units of photons per second. Then the expected value of z is simply:

Since we hope for the phase error f-y to be reasonably small, we have approximated the sine as the error angle itself, thus obtaining a linear model. This assumption however will be substantially violated in cases of lower light levels, calling into question the accuracy of such results.

z also contains a white noise component n, whose power spectral level is the sum of the shot noise contributions of the two detectors. Since the (two-sided) power spectral level of shot noise is equal to the count rate, we sum the count rate for the two detectors due to the optical signals Y2 and Y3 and obtain:

Thus modelling z as the mean value given by (16) plus a white noise term n whose power spectral level is given by (17), we can describe the resulting system with the equivalent circuit shown in Figure 5. G, the effective gain of the photomixer and detector combination is given by:

The feedback loop filter H(s) has yet to be selected among all possible causal filter functions. Once specified, it follows from analysis of Figure 5 that the power spectrum of the phase error e=f-y will be given by:

The power spectrum Sff of atmospheric phase noise f over the baseline B is given by (2), and the white noise level Snn is given by (17). The mean-square phase error e^2 can then be found by integrating (19) over all frequencies.
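The loop analysis can be sketched numerically. The code below assumes the linear model described for Figure 5 (z = G(f - y) + n, with y obtained by filtering z through H), from which the error is e = f/(1 + GH) - H n/(1 + GH). It is a reconstruction under that assumption, not a copy of equation (19), and it takes the integrator H(s) = g/s, which the text considers next, as its example. All function names and numerical values are illustrative.

```python
import math


def error_transfer(omega, loop_gain, integrator_gain):
    """Transfer function from atmospheric phase f to error e for H(s) = g/s."""
    s = 1j * omega
    return s / (s + loop_gain * integrator_gain)


def noise_transfer(omega, loop_gain, integrator_gain):
    """Transfer function from detector noise n to error e for H(s) = g/s."""
    s = 1j * omega
    return -integrator_gain / (s + loop_gain * integrator_gain)


def integrate_all_frequencies(spectrum_of_omega, scale, steps=20000):
    """Integrate spectrum(omega) d(omega)/(2 pi) over all omega.

    Uses the midpoint rule after substituting omega = scale * tan(u).
    """
    du = math.pi / steps
    total = 0.0
    for k in range(steps):
        u = -0.5 * math.pi + (k + 0.5) * du
        omega = scale * math.tan(u)
        total += spectrum_of_omega(omega) * scale / math.cos(u) ** 2 * du
    return total / (2.0 * math.pi)


G, g, Snn = 3.0, 2.0, 0.5   # example values only
noise_variance = Snn * integrate_all_frequencies(
    lambda w: abs(noise_transfer(w, G, g)) ** 2, scale=G * g)
print(noise_variance, Snn * g / (2.0 * G))  # numerical vs. closed form
```

In this model the shot-noise contribution to the mean-square error grows in proportion to g, while the atmospheric contribution, which would be obtained by passing a spectrum Sff through `error_transfer`, shrinks as g increases. That trade-off is why the integrator gain has to be optimized.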

We now wish to find a filter transfer function H(s) and value of q which will maximize the concentrating power C. One suboptimum but simple choice for H is to employ an integrator of gain g:

Though suboptimum, we will find that the resulting performance is only somewhat poorer than the maximum obtainable. However the value of g must be optimized. For a given integrator gain g, the resulting mean-square error e^2 can be found by solving:

where A is given by:

wb is the radian frequency equivalent of the low-frequency cutoff fb previously given by (3), thus:

Q is a function given by the following integral which is evaluated numerically.

A computer program is used to optimize the integrator gain g in order to minimize the error e^2 given by (21). With g optimized, the concentrating power is plotted as a function of q for various input power levels in Figure 6. The assumed baseline is r0 corresponding, for instance, to first or second tier concentrators using subaperture diameters equal to r0. Similar curves for a baseline equal to 4r0 are plotted in Figure 8. The rms phase errors corresponding to the curves of Figure 6 are plotted in Figure 7.

The input power levels indicated on these graphs are normalized as follows:

In other words the curve labelled 1.0 corresponds to 1 photon per coherence time T0 for the case of a detector with 100% quantum efficiency (or, in general, 1/η photons per T0 for a detector quantum efficiency of η). It can be seen that normalized input powers as low as 1 or 2 are able to produce significant power concentrations, although these results may be called into question as they entail uncorrected phase errors of more than 1 radian rms, which violates the small signal sine approximation used in (16). Figure 8 indicates that somewhat stronger input powers are necessary in the case of a longer baseline to achieve the same concentrating powers.

Instead of using a simple integrator for H(s), an optimum control system can be derived on the basis of a Kalman filter for the estimation of f using the model of Figure 5 with the feedback path removed. The design of an optimum estimation filter for a random process whose power spectrum is specified by a rational function with additive white noise, is specified in [M. C. Yovits, J. L. Jackson, "Linear filter optimization with game theory considerations," IRE Nat. Conv. Rec., part 4, pp 193-199, 1955] and an expression for the mean-square estimation error is supplied. Unfortunately f does not have a power spectrum given by a rational function, however a rational function of some order can be obtained to approximate the actual function to any specified accuracy. Therefore it is in principle possible to design a loop filter which will approach the residual phase correction error given by the following expression:

Substituting in previously determined values for the shot noise level Snn, the power spectrum of atmospheric phase noise Sff, and the effective gain G, we obtain the following result for the mean-square phase error:


The integral as a function of A has been evaluated numerically. Unfortunately A is itself a function of e2, the result we are seeking. Thus (27) is solved iteratively.

The concentrating power C versus q for various input powers, for a baseline B=r0, is plotted in Figure 9. The dashed lines in Figure 9 correspond to the comparable values obtained using a simple integrator for H(s), as already plotted in Figure 6. As can be seen, the simple integrator delivers performance approaching the ideal filter, especially at larger q where the reduction in a1 (11) accounts for a larger part of the loss in concentrating power, rather than the phase error. In any case, the peak values of the suboptimum curves are not greatly reduced below the peak values obtained using the ideal filter, indicating that even a first order filter would be a reasonable (although unnecessarily simple) choice.


The expected concentration of optical power based on (27) multiplies the assumed input power at each tier to obtain the input power for the following tier. Baselines for each tier are based on the hexagonal aperture pattern values shown in Table I. Figure 10 shows the result of power concentration along a 10-tier tree, for subapertures of diameter r0 collecting light whose normalized fluxes are 1.0, 1.4, 2.0, 2.8, 4.0, 5.6, and 8.0 photons per r0^2 per T0 (with ideal detectors used for phase detection). At each tier the computer has found the optimum q for the concentrator, and that value in degrees is printed on the graph. It can be seen that incident fluxes of 1.0 and 1.4 are wholly insufficient to allow optical concentration, whereas the net concentrating power over 10 tiers (found as the decibel difference between a curve at tier 0 and tier 10), increases from 5 dB at a flux of 2.0, to 22 dB for an incident flux of 8.0. Perfect concentration (which can be almost obtained with a large incident flux) would be 30 dB for 10 tiers.
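The tier-by-tier multiplication and its decibel bookkeeping can be sketched as follows. The per-tier concentrating powers passed in are placeholders: in the paper they come from solving (27) at each tier, which this sketch does not attempt. Function names are illustrative.

```python
import math


def propagate_power(input_power, concentrating_powers):
    """Power at tier 0, 1, ..., M given the concentrating power C of each tier."""
    powers = [input_power]
    for c in concentrating_powers:
        powers.append(powers[-1] * c)   # output of one tier feeds the next
    return powers


def net_concentration_db(concentrating_powers):
    """Net concentration over all tiers, in decibels."""
    return 10.0 * math.log10(math.prod(concentrating_powers))


# Perfect concentration (C = 2 at every tier) over 10 tiers.
print(round(net_concentration_db([2.0] * 10), 1))   # 30.1
# No concentration at all (C = 1 at every tier).
print(net_concentration_db([1.0] * 10))             # 0.0
```

The first case reproduces the roughly 30 dB ceiling quoted for a 10-tier tree, since 2^10 = 1024.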

The thresholding behavior relative to optical input power is clearly depicted in Figure 11 in which the output of a concentrator tree with 6, 8, or 10 tiers is plotted versus the incident flux, for aperture diameters equal to 2r0 [Note that the figure's caption is in error - J.M.]. The information content of photon streams with normalized fluxes below about 2 is simply insufficient to permit optical concentration, whereas much larger brightness levels rapidly take advantage of the capability of the concentrator system. Changing the diameter of the subapertures from 0.5r0 to 3r0 causes almost no change in the threshold point. For a fixed number of tiers, 2r0 apertures, which individually collect about 50% more light than r0 apertures, will produce substantially larger concentrated outputs. However for a given total collecting area, the r0-sized subapertures are able to concentrate more light (but require four times the concentration hardware).


If one were to build a guided wave light concentrating tree using phase shifters and photomixing networks, it is questionable whether the three-output photomixing network shown in Figure 3, described by (10) - (12), would be a practical choice. However the two-output balanced photomixer shown in Figure 2 is routinely implemented with a 50% transmissive mirror, or in guided wave technology, as a directional coupler. In fact, the net response corresponding to the three-output photomixer shown in Figure 3, yielding intensities described in (14) and (15), can be implemented using the balanced two-output photomixer with temporal multiplexing, as depicted in Figure 12. That is accomplished by phase modulating one input to the network with a square wave of amplitude ±q. The mean power concentrated in the summing output is identical to the concentrated output |Y1|^2 obtained from the three-output device evaluated in (15). The electrical output of a detector observing the light of the difference output is treated alternately as either the |Y2|^2 signal or the |Y3|^2 signal, depending on the polarity of the dithered phase. Again, the average signals for |Y2|^2 and |Y3|^2 are identical to those found in (14) for the three-output photomixer used in Figure 3.

In the construction of a practical concentrator module, the phase dithering required to implement this scheme would simply employ the delay correction compensator already required, with the ±q square wave added to the desired phase correction feedback signal y, as shown in Figure 12. In addition to the greatly simplified fabrication of such a device, two other advantages stand out. First, only a single low-noise photon-counting detector is required (rather than two), and the net noise power due to dark counts is cut in half. Secondly, with such an apparatus, the value of q may be easily varied "on the fly" simply by altering the amplitude of the square wave applied to the phase shifter. q could thus be adapted to different atmospheric conditions and input light levels, rather than being hard-wired to some "compromise" value.
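The time-multiplexed scheme can be checked against the balanced two-output mixer model. The sketch below assumes a lossless sum/difference mixer, equal time spent at each dither polarity, and ideal detection (quantum efficiency and shot noise are ignored). Which half-cycle is labelled Y2 or Y3, and hence the sign of the feedback signal, is an arbitrary choice made here. The closed forms in the comments follow from that model rather than from equations (14) and (15), which are not reproduced in this version.

```python
import math


def dithered_mixer(phase_error, dither, input_power=1.0):
    """Time-averaged outputs of a balanced mixer with a +/- dither square wave.

    Returns (mean concentrated power, difference of the two half-cycle
    detector readings used as the feedback signal).
    """
    def sum_output(x):          # summing port for relative input phase x
        return input_power * (1.0 + math.cos(x))

    def diff_output(x):         # difference port for relative input phase x
        return input_power * (1.0 - math.cos(x))

    # Each dither polarity occupies half of the time.
    concentrated = 0.5 * (sum_output(phase_error + dither)
                          + sum_output(phase_error - dither))
    y_plus = 0.5 * diff_output(phase_error + dither)
    y_minus = 0.5 * diff_output(phase_error - dither)
    return concentrated, y_plus - y_minus


err, theta = 0.2, math.radians(40.0)   # example values only
power, feedback = dithered_mixer(err, theta)
# Model predictions: I0 * (1 + cos(theta) cos(err)) and I0 * sin(theta) sin(err)
print(power, 1.0 + math.cos(theta) * math.cos(err))
print(feedback, math.sin(theta) * math.sin(err))
```

In this model the feedback gain scales with sin(q) while the concentrated power is reduced by a factor involving cos(q), which is the trade-off the text describes. Varying the dither amplitude therefore moves along that trade-off without any change of hardware.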

It should also be noted that a practical concentrator module would not require two individual delay compensators as depicted in Figure 3. In that depiction, balanced phase shifts ensured that correction of the relative phase shift between the two inputs did not alter the phase of the output signal, which we intend to be the average phase of the inputs. However the +y/2 and -y/2 phase shifts of Figure 3 are equivalent to the single -y phase shift in Figure 12 plus a phase shift of +y/2 in the concentrated light output. That phase shift in the output, however, can just as well be absorbed into the phase shift of the next tier concentrator module, making the additional hardware superfluous. Of course, in such an implementation, the concentrator modules are no longer technically "independent" as originally stipulated.

Although a concentrator tree as described could be designed to collect light from a lenslet array using the light gathered from a large aperture, an equally effective implementation would result from 2^M r0-sized objectives placed adjacently in a dense array as depicted in Figure 1. This would generally be more cost effective, since a large number of small objectives is less expensive than an equivalent large objective of the same total area.


While the concentrator tree analyzed above may be a practical choice for feeding an interferometer designed around fiber optic sources, the quantitative results may loosely apply to more conventional adaptive optics systems when used for the purpose of concentrating starlight into a single spatial mode. Although the details of the phase sensor (using the three-output photomixer in Figure 3) or the correction optics (using a controlled single-mode delay compensator) may appear unlike an imaging adaptive optics system, close parallels may be drawn.

For instance, consider a conventional Shack-Hartmann sensor detecting the phase gradient applying to a subaperture in the case of viewing a single point source. Viewing the sensor in a single transverse dimension, we can loosely view the two sides of the lenslet as two independent apertures of half the size, each producing an image with twice the width, and interfering at the detector plane. If we were to divide the potentially illuminated area of the detector plane into three regions, then with the two apertures exactly in phase (no wavefront tilt), most of the power would fall in the center region. With a substantial phase difference between the two half apertures (corresponding to a wavefront tilt), constructive interference would occur on one of the side regions and destructive interference on the other. The three regions are thus similar to the three output modes of the three-output photomixer in Figure 3. Likewise, the delay actuators in Figure 3 are similar to piston mode actuators in a wavefront correction system. Thus one would expect sensitivity limits for the system analyzed above to roughly apply to a well designed conventional adaptive optics system observing a point source.

Thus the results of Figure 11 in which the concentrated output power is plotted as a function of incident flux would be expected to be roughly applicable to the performance of a well designed conventional system. The incident flux in Figure 11 has been normalized to photons per r0^2 per T0 (for an ideal detector in the wavefront sensor). The plot indicates that a threshold of approximately two photons per r0^2 per T0 is required for concentration to become possible. Thus it could be concluded that a different physical design would be subject to the same approximate limitations.


1. Note that in Figure 3, instead of using a single delay compensator, we have split the required delay compensation into two balanced delays affecting both inputs, in order that the absolute phase of the output signal will be the mean phase of the inputs. While having no effect on the power concentration, this will prevent adjustments to y in tier m from affecting the phase of the light input to tier m+1 and thus altering the statistics of the random phase affecting that tier.