Jeff Meisner

*This is a detailed paper written to accompany
a much shorter paper (link through main page)
submitted to the ASP Conference Proceedings,
based on a poster presented at the ASP annual meeting symposium on interferometry
and adaptive optics, June 28 - July 1, 1998.*

**Notes regarding this HTML version:**
**All of the figures have been placed in a separate file. Access all figures through the hyperlinks. (Depending on your browser and installed fonts, you might have some problems with certain characters in the text! However, the display equations are in graphic format and will come through fine.)**

**Concentration of Starlight from Large Apertures into a Single
Spatial Mode for Long-Baseline Interferometry**

(Unpublished version)

**Jeff Meisner**

__ABSTRACT__

A long-baseline optical stellar interferometer requires a minimum level of optical
power available from each arm in order to operate in the fringe-tracking mode which
enables coherent integration of fringe visibility. That optical power must be
concentrated in a single spatial mode in order to interfere coherently. However
atmospheric seeing places a limit on the amount of optical power that will be
accepted into a single spatial mode for apertures much larger than the Fried
parameter, thus placing a magnitude limit on the coherent operation of the
interferometer.

However the use of an adaptive optics system may enable larger apertures to
concentrate greater amounts of optical power in a single mode, thus extending the
magnitude limit of the interferometer. Aside from systems using laser guide stars,
an adaptive optics system requires a feedback signal derived from the detection of
a portion of the collected starlight in order to co-phase the sub-apertures.
Increasing the portion of light directed to the feedback system will therefore allow
the adaptive optics system to operate on dimmer objects.

On the other hand, the optical power which is sacrificed for the production of the
feedback signal becomes unavailable for the ultimate use by the interferometer.
However any light which would not successfully be concentrated into the output mode
may be obtained "cost free." This observation leads one to different design criteria
for an adaptive optics system used for concentrating light into a single mode, as
opposed to one designed for high resolution imaging by a single large aperture.

The sensitivity limits and light concentrating power of any such adaptive optics
system can be found by the analysis of a hypothetical guided wave optical circuit
forming a binary tree structure. Optical power is concentrated from subapertures
feeding the branches of the tree toward the root yielding starlight concentrated in
a single mode which may supply one arm of a long-baseline interferometer.
Concentration of light along the tree structure occurs at 2-input modules, all of
which are similar and each of which is optimized for maximum light concentration at its stage.
The control system for each module operates independently of the others, and is
optimized on the basis of the power spectrum of phase noise expected for a given
level of atmospheric turbulence. Performance limits are obtained for the resultant
optical concentrating power as a function of incident flux, the Fried parameter, and
the atmospheric coherence time parameter.

**INTRODUCTION**

A long-baseline stellar interferometer can accurately determine
underlying fringe visibilities only if interference over the entire
pupil plane occurs in a single phase, or if the rms phase error
over that area can be accurately determined. One simple and
effective means of ensuring this condition is to limit the
collecting aperture relative to the Fried parameter r_{0}. This
ensures that both beams have almost all of their power in a single
spatial mode. Interference can only occur between waves launched
into the same mode. Interference occurring in other spatial modes
will occur in random relative phases and not substantially
contribute to the signal, yet the detected photons will still
contribute shot noise. Not only will the signal-to-noise ratio not
be augmented by additional light, but calibration of visibility
measurements will suffer as varying atmospheric conditions alter
the amount of optical power in the intended spatial mode. For dim
or weakly correlated sources, lack of a sufficient signal-to-noise
ratio will prevent fringe-tracking and coherent integration of
fringe visibility, greatly reducing the performance of the
instrument when observing such objects.

Aside from limiting the primary apertures, spatial mode filtering
can be obtained using pinholes or single-mode fibers. In each case
the amount of light accepted into a single mode is limited by the
Fried parameter. However adaptive optics and other active systems
may be capable of measuring and partially compensating for random
phase functions over an aperture larger than r_{0}. The simplest such
system, tip-tilt correction, can appreciably augment the power
obtained in a single mode. Higher order adaptive optics systems
have the potential to further concentrate power into a single mode,
according to the factor by which the Strehl ratio is improved.

**MODEL**

Consider the collection of light from a virtual point source by a
square or hexagonal array of 2^{M} subapertures as depicted in the
left side of
**Figure 1**. Each subaperture collects a sample of the
same wavefront subject to a random phase shift dependent on the
column density of the atmosphere above that subaperture (intensity
variations will be ignored in the present analysis). The light
from each subaperture is collected into an optical fiber or similar
single-mode guided wave structure which enters an optical circuit
for the purpose of concentrating the light received from the 2^{M}
subapertures into a single mode.

Now suppose that we have designed a "concentrator module" which
accepts two optical signals derived from the same source but with
a random phase shift between them, and outputs most of the light
from the two inputs into a single output mode. Then using 2^{M}-1
such concentrator modules in an M-tier tree structure, the light
gathered from the 2^{M} subapertures may be concentrated into a single
spatial mode as shown in **Figure 1** for the case of 16 subapertures
(M=4) using 15 such concentrator modules.
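The counting argument above can be checked with a short sketch. It is only an illustration: the function name `build_concentrator_tree` and the representation of each optical signal as a tuple of subaperture indices are assumptions made here, not part of the original analysis.

```python
def build_concentrator_tree(num_tiers):
    """Return, for each tier, the list of input pairs fed to its modules.

    The tree starts from 2**num_tiers subaperture signals; each tier pairs
    adjacent signals (0 with 1, 2 with 3, ...) from the stage before it.
    """
    # Each signal is represented by the tuple of subaperture indices it contains.
    signals = [(i,) for i in range(2 ** num_tiers)]
    tiers = []
    for _ in range(num_tiers):
        pairs = [(signals[i], signals[i + 1]) for i in range(0, len(signals), 2)]
        tiers.append(pairs)
        # A module's output carries the light of both of its inputs.
        signals = [a + b for a, b in pairs]
    return tiers


tiers = build_concentrator_tree(4)           # M = 4, i.e. 16 subapertures
modules_per_tier = [len(t) for t in tiers]   # modules needed at tiers 1..M
total_modules = sum(modules_per_tier)        # should equal 2**M - 1
print(modules_per_tier, total_modules)
```

For M=4 this gives eight first-tier modules and 15 modules in total, in agreement with the text.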

Each of the 8 first-tier concentrators accepts inputs received from
adjacent subapertures (0 and 1; 2 and 3; etc.) since these present
the shortest baselines and thus have the smallest rms
atmospherically induced relative phase shifts. The phase of the
light input to a first tier concentrator module will be
approximately the same as the phase of the wavefront as observed at
the position of the center of the subaperture from which it is
collected. The output of a first tier concentrator which combines
two adjacent subapertures will have a phase approximately the same
as that of the wavefront at the midpoint position between the
centers of the adjacent subapertures. In **Figure 1** these positions
are marked with small dots. Likewise the outputs of the second
tier concentrators have phases which are approximately the same as
the phases of the wavefront present at positions (indicated by
larger dots) midpoint between the virtual positions defined by the
first tier concentrators, etc. In general, concentration at tier
m will involve correcting for a random phase function f(t) whose
statistics correspond to the phase difference between virtual
points defined by the two outputs from concentrator modules at tier
m-1, producing a concentrated optical signal whose phase is
approximately that of a virtual point midway between the virtual
points defined by the two previous concentrators.

For circular subapertures each of diameter D closely packed in a
rectangular or hexagonal pattern, the length of the effective
baseline for concentrating light at the m^{th} tier is given by Table I.

The design of a concentrator module at tier m should be tailored to the temporal statistics of the random process f which denotes the relative phase between its two optical inputs. The (2-sided) power spectrum of f over a baseline of length B is approximately given by:

where f_{b}, the low frequency cutoff due to the baseline B, is given
by:

V_{0} denotes the effective wind velocity of the turbulent atmospheric
layers, and k is a constant related to the direction of that wind
with respect to the baseline. In theory, if the wind were parallel
to the baseline, k would be 2; for the wind perpendicular to the
baseline k takes the value of 3. For simplicity, an intermediate
value is employed. T_{0} in **(2)** is the atmospheric coherence time
parameter defined as r_{0}/V_{0}. We have assumed an infinite outer scale
of turbulence, which is essentially a worst-case assumption.

In a practical system these parameters would be estimated
empirically, for instance, by observing the power spectrum of phase
noise. The amplitude of the high-frequency asymptote is governed
solely by T_{0}, while the amplitude of the low-frequency curves (and
the cutoff frequencies f_{b}) are also a function of the baseline, as
given by Table I for the various tiers of the concentrator circuit.
The integral of S_{ff} over all frequencies will equal the well-known
result for the mean-square value of f over the baseline B:

Clearly the increased phase noise at longer baselines (higher
tiers) will make the task of correcting that phase more difficult.
On the other hand that increased difficulty at higher tiers will be
ameliorated by the stronger optical signals present at their
inputs, due to the previous stages of optical concentration. To
evaluate the net performance of the tree structure shown in
**Figure 1**, we must first design and analyze the operation of a single
concentrator module.
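To give a feeling for how quickly the phase noise grows with baseline, the sketch below evaluates the standard Kolmogorov phase structure function, 6.88 (B/r_{0})^{5/3} rad^{2}, which holds for an infinite outer scale. Identifying this with the "well known result" referred to above is an assumption made here, since the display equation is not reproduced in this version; the function names are likewise illustrative.

```python
import math


def mean_square_phase_difference(baseline, r0):
    """Kolmogorov phase structure function in rad^2 (assumed form, infinite outer scale)."""
    return 6.88 * (baseline / r0) ** (5.0 / 3.0)


def rms_phase_difference(baseline, r0):
    """Root-mean-square phase difference in radians over the given baseline."""
    return math.sqrt(mean_square_phase_difference(baseline, r0))


# Baselines expressed in units of r0.
for b in (1.0, 2.0, 4.0):
    print(b, round(rms_phase_difference(b, 1.0), 2))
```

Under this assumption the uncorrected rms phase difference already exceeds 2 radians at a baseline of r_{0}, which is why active correction is needed even at the first tier.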

__THE CONCENTRATOR MODULE__

Consider two optical signals E_{1} and E_{2} which differ only by the phase shift f:

If f is known, then the power from E_{1} and E_{2} can be easily combined
using the optical circuit depicted in
**Figure 2**. E_{2} is first run through a delay compensator which shifts
its phase by y. Knowing f, we set y to the same value, so
that X_{1} and X_{2}, the inputs to the passive network, are now identical
signals, both equal to E_{1}. The network produces outputs which are
the sum and difference of the inputs. Since X_{1}=X_{2}, the output Y_{2} is
zero, while the output Y_{1} has an amplitude of √2·E_{1} (the network being lossless), and thus double the
power of a single input. Of course this perfect concentration of
the input power was only possible because we knew the relative
phase difference f. In general, for a setting of the delay
compensator y not equal to f, the resulting outputs of the
network in **Figure 2** would be given by:

If the phase f were unknown (or, equivalently, if E_{1} and E_{2} were
incoherent relative to each other) then concentration of the input
light would not be possible. Instead, the expected intensities of
the outputs would be found by averaging the cosine over all angles
to yield zero, so that the expected output powers would each be
|E_{1}|^{2}, the same as the power of each input, in accordance with the
brightness theorem.
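The behaviour of the two-output network can be illustrated numerically. The sketch below assumes a lossless sum/difference network normalized by 1/√2 and a compensator that multiplies E_{2} by exp(-jy); these conventions, and all function names, are assumptions made for illustration.

```python
import cmath
import math


def two_output_network(x1, x2):
    """Lossless sum/difference network (assumed 1/sqrt(2) normalization)."""
    return (x1 + x2) / math.sqrt(2), (x1 - x2) / math.sqrt(2)


def output_powers(phase_f, setting_y, e1=1.0 + 0j):
    """Return (|Y1|^2, |Y2|^2) for atmospheric phase f and compensator setting y."""
    e2 = e1 * cmath.exp(1j * phase_f)        # second input, shifted by f
    x1 = e1
    x2 = e2 * cmath.exp(-1j * setting_y)     # delay compensator removes y
    y1, y2 = two_output_network(x1, x2)
    return abs(y1) ** 2, abs(y2) ** 2


# Perfect compensation: all of the light appears at the summing output.
p1, p2 = output_powers(0.7, 0.7)

# Unknown phase: average the outputs over a uniformly distributed phase.
n = 3600
avg1 = sum(output_powers(2 * math.pi * k / n, 0.0)[0] for k in range(n)) / n
avg2 = sum(output_powers(2 * math.pi * k / n, 0.0)[1] for k in range(n)) / n
print(p1, p2, avg1, avg2)
```

With y=f the summing output carries twice the power of one input; averaged over an unknown phase, each output carries only the power of a single input.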

In the important practical case in which f is estimated, so that y
can be modelled as a gaussian random variable with a
mean equal to f and a variance of e^{2}, the expected outputs
would be:

In **(8)** we have defined the *concentrating power* C as the factor by
which the expected output power exceeds the individual input
powers. Clearly C cannot exceed 2, and a concentrating power which
is not significantly greater than unity is obviously of no value.
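For the balanced two-output network, a gaussian phase error of variance e^{2} leads to a concentrating power of 1 + exp(-e^{2}/2), because the expected value of the cosine of a zero-mean gaussian variable is exp(-e^{2}/2). The sketch below compares that closed form with a Monte Carlo average. It assumes the lossless sum/difference network with unit input powers; the function names are illustrative.

```python
import math
import random


def concentrating_power(rms_error):
    """Closed form C = 1 + exp(-e^2/2) for a zero-mean gaussian phase error."""
    return 1.0 + math.exp(-rms_error ** 2 / 2.0)


def concentrating_power_monte_carlo(rms_error, samples=200_000, seed=0):
    """Estimate C by averaging 1 + cos(error) over random gaussian errors."""
    rng = random.Random(seed)
    total = sum(1.0 + math.cos(rng.gauss(0.0, rms_error)) for _ in range(samples))
    return total / samples


for rms in (0.5, 1.0, 1.5):
    print(rms, concentrating_power(rms), concentrating_power_monte_carlo(rms))
```

The closed form runs from C=2 at zero error down towards C=1 for large errors, consistent with the bounds stated above.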

Now let us look at schemes for controlling y in order to maximize
the concentrating power C. Consider the network shown in **Figure 2**.
The light output at Y_{1} is the useful output, but the light output
at Y_{2} is unused and can be observed. In fact, the amount of light
observed at Y_{2} will be an indication of the concentrating power
achieved, according to **(9)**. Unfortunately the power level observed
at Y_{2} is uncorrelated with the control error f-y, and thus
cannot be used in a linear system to servo the delay compensator.
Instead we must substitute a network which has at least three
outputs.
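The reason the Y_{2} output alone cannot drive a linear servo can be seen numerically: for the balanced network its power varies as 1 - cos(f-y), which is an even function of the error with zero slope at the operating point. The sketch below assumes that form (lossless network, unit input power); the names are illustrative.

```python
import math


def difference_output_power(error, input_power=1.0):
    """|Y2|^2 of the balanced two-output network for a phase error f - y."""
    return input_power * (1.0 - math.cos(error))


def slope_at(error, step=1e-6):
    """Central-difference estimate of d|Y2|^2 / d(error)."""
    return (difference_output_power(error + step)
            - difference_output_power(error - step)) / (2.0 * step)


# Same reading for errors of either sign, and no first-order response at zero.
print(difference_output_power(0.3), difference_output_power(-0.3), slope_at(0.0))
```

Because the reading is the same for errors of either sign, it indicates how large the error is but not which way to move the delay compensator.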

A lossless network implies a unitary scattering matrix. For the application of concentrating light when y=f so that the two optical inputs to the network are identical, we desire an output which sums those inputs in phase and sends relatively little power to the other outputs. The class of possible photo-mixing networks which satisfy this property and produce two suitable outputs for feedback, is described by the following scattering matrix, in which q remains to be chosen.

where:

The resulting system using such a photomixing network is shown in
**Figure 3**. The concentrated optical power is available at the #1
output of the network, and again is maximized when y=f.
Outputs 2 and 3 are each incident on optical detectors in order to generate a feedback
signal for controlling y. The electrical signals from these two detectors are
subtracted and input into a causal linear filter whose transfer function is denoted
H(s), the output of which, y, controls the delay compensator.
The filter transfer function H(s) and the network design parameter q are to be
chosen in order to maximize the concentrating power C.^{(1)}

Let us call the intensity of
either input in photons per second I_{0}, and let the field amplitudes
(E_{1}, E_{2}, X_{1}, and X_{2}) be in units such that the conversion between
|E|^{2} and I is unity:

For a given phase error f-y, the intensity of the feedback
outputs from the photomixing network, |Y_{2}|^{2} and |Y_{3}|^{2}, are found to
be:

So it can be seen that to first order, the gain of the system in detecting the phase error f-y is proportional to sin(q). Meanwhile, the concentrating power C suffers with increasing q according to:

Again, e^{2} denotes the mean-square value of the phase error
f-y. The achieved concentrating power as a function of q
is plotted in **Figure 4** for rms phase errors of 0, 0.5, 1.0, and 1.5
radians. But to determine the ultimate concentrating power, we
must determine the phase error, which itself generally will
decrease as q increases since increasing q increases the
effective gain of the feedback network, according to **(14)**.

The optical feedback signals are each detected by a photon-counting detector whose quantum efficiency is denoted q. We will assume a low rate of dark counts, so that quantum limited detection is achieved. These two electrical signals are subtracted to form the feedback signal z, which we will take to be in units of photons per second. Then the expected value of z is simply:

Since we hope for the phase error f-y to be reasonably small, we have
approximated the sine as the error angle itself, thus obtaining a linear
model. This assumption, however, will be substantially violated in cases
of lower light levels, calling into question the accuracy of such
results.

z also contains a white noise component n, whose power spectral
level is the sum of the shot noise contributions of the two
detectors. Since the (two-sided) power spectral level of shot
noise is equal to the count rate, we sum the count rate for the two
detectors due to the optical signals Y_{2} and Y_{3} and obtain:

Thus modelling z as the mean value given by **(16)** plus a white noise
term n whose power spectral level is given by **(17)**, we can describe
the resulting system with the equivalent circuit shown in **Figure 5**.
G, the effective gain of the photomixer and detector combination is
given by:

The feedback loop filter H(s) has yet to be selected among all
possible causal filter functions. Once specified, it follows from
analysis of **Figure 5** that the power spectrum of the phase error
e=f-y will be given by:

The power spectrum S_{ff} of atmospheric phase noise f over the
baseline B is given by **(2)**, and the white noise level S_{nn} is given
by **(17)**. The mean-square phase error e^{2} can then be found by
integrating **(19)** over all frequencies.

We now wish to find a filter transfer function H(s) and value of q which will maximize the concentrating power C. One suboptimum but simple choice for H is to employ an integrator of gain g:

Though suboptimum, we will find that the resulting performance is
only somewhat poorer than the maximum obtainable. However the
value of g must be optimized. For a given integrator gain g, the
resulting mean-square error e^{2} can be found by solving:

where A is given by:

w_{b} is the radian frequency equivalent of the low-frequency
cutoff f_{b} previously given by **(3)**, thus:

Q is a function given by the following integral which is evaluated numerically.

A computer program is
used to optimize the
integrator gain g in
order to minimize the
error e^{2} given by
**(21)**. With g optimized,
the concentrating power
is plotted as a function
of q for various
input power levels in
**Figure 6**. The assumed
baseline is r_{0}
corresponding, for
instance, to first or
second tier
concentrators using
subaperture diameters
equal to r_{0}. Similar
curves for a baseline
equal to 4r_{0} are plotted
in **Figure 8**. The rms
phase errors
corresponding to the
curves of **Figure 6** are
plotted in **Figure 7**.
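The kind of optimization described above can be sketched as follows. The sketch assumes the usual linear loop model (detector output z = G(f-y) + n, with y obtained by filtering z through H(s) = g/s), for which the error spectrum is S_{ff}ω²/(ω²+G²g²) + S_{nn}g²/(ω²+G²g²). The phase spectrum used here is only a placeholder with a flat low-frequency part and an f^{-8/3} high-frequency asymptote; it is not the expression **(2)** of this paper, and the noise level is treated as a fixed constant. All names and numerical values are illustrative.

```python
import math


def placeholder_phase_spectrum(freq, cutoff=0.2, level=0.05):
    """Illustrative two-sided spectrum: flat below `cutoff`, f^(-8/3) above."""
    return level / (cutoff ** 2 + freq ** 2) ** (4.0 / 3.0)


def mean_square_error(g, gain, noise_level, spectrum=placeholder_phase_spectrum,
                      f_min=1e-4, f_max=1e4, points=4000):
    """Mean-square phase error of the loop with H(s) = g/s."""
    loop_bw = gain * g  # closed-loop corner frequency G*g in rad/s
    # White-noise term in closed form: S_nn * g^2 / (w^2 + (G g)^2),
    # integrated over all frequencies, equals S_nn * g / (2 G).
    noise_part = noise_level * g / (2.0 * gain)

    def integrand(f):
        w = 2.0 * math.pi * f
        return spectrum(f) * w * w / (w * w + loop_bw ** 2)

    # Atmospheric term: trapezoidal rule on a log-spaced grid (truncated range),
    # doubled to account for negative frequencies.
    ratio = (f_max / f_min) ** (1.0 / (points - 1))
    freqs = [f_min * ratio ** i for i in range(points)]
    values = [integrand(f) for f in freqs]
    phase_part = sum(0.5 * (values[i] + values[i + 1]) * (freqs[i + 1] - freqs[i])
                     for i in range(points - 1))
    return 2.0 * phase_part + noise_part


def best_integrator_gain(gain, noise_level, candidates):
    """Pick the candidate gain g giving the smallest mean-square error."""
    return min(candidates, key=lambda g: mean_square_error(g, gain, noise_level))


candidates = [10 ** (k / 10.0) for k in range(-20, 31)]   # 0.01 ... 1000
g_opt = best_integrator_gain(gain=1.0, noise_level=0.1, candidates=candidates)
print(g_opt, mean_square_error(g_opt, 1.0, 0.1))
```

A low gain leaves most of the atmospheric phase uncorrected, while a high gain passes more shot noise into the compensator, so the minimum lies between the two extremes. In the full problem the noise level itself depends on the residual error, so an outer iteration would be needed in addition to this search.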

The input power levels indicated on these graphs are normalized as follows:

In other words the curve
labelled 1.0 corresponds
to 1 photon per
coherence time T_{0} for
the case of a detector with 100% quantum efficiency (or, in
general, 1/q photons per T_{0} for a detector quantum efficiency of
q). It can be seen that normalized input powers as low as 1 or 2
are able to produce significant power concentrations, although
these results may be called into question as they entail
uncorrected phase errors
of more than 1 radian
rms, which defies the
small signal sine
approximation used in
**(16)**. **Figure 8**
indicates that somewhat
stronger input powers
are necessary in the
case of a longer
baseline to achieve the
same concentrating
powers.

Instead of using a simple integrator for H(s), an optimum control
system can be derived on the basis of a Kalman
filter for the estimation of f using the model of **Figure 5** with
the feedback path removed. The design of an optimum estimation
filter for a random process whose power spectrum is a rational function, observed in
additive white noise, is given in [M. C. Yovits, J. L. Jackson,
"Linear filter optimization with game theory considerations," **IRE
Nat. Conv. Rec.**, part 4, pp 193-199, 1955], together with an expression for
the mean-square estimation error. Unfortunately f
does not have a power spectrum given by a rational function;
however, a rational function of some order can be obtained to
approximate the actual function to any specified accuracy.
Therefore it is in principle possible to design a loop filter which
will approach the residual phase correction error given by the
following expression:

Substituting in previously determined values for the shot noise
level S_{nn}, the power spectrum of atmospheric phase noise S_{ff}, and
the effective gain G, we obtain the following result for the mean-square phase error:

where:

The integral as a function of A has been evaluated numerically.
Unfortunately A is itself a function of
e^{2}, the result we are
seeking. Thus **(27)** is solved iteratively.

The concentrating power C versus q for various input
powers, for a baseline B=r_{0}, is
plotted in **Figure 9**. The dashed
lines in **Figure 9** correspond to
the comparable values obtained
using a simple integrator for
H(s), as already plotted in
**Figure 6**. As can be seen, the
simple integrator delivers
performance approaching the
ideal filter, especially at
larger q where the reduction
in a_{1} **(11)** accounts for a larger
part of the loss in
concentrating power, rather than
the phase error. In any case,
the peak values of the suboptimum curves are not greatly reduced
below the peak values obtained using the ideal filter, indicating that even
a first order filter would be a reasonable (although unnecessarily
simple) choice.

**NET PERFORMANCE OF THE CONCENTRATOR TREE**

The expected
concentration of optical
power based on **(27)**
multiplies the assumed
input power at each tier
to obtain the input
power for the following
tier. Baselines for
each tier are based on
the hexagonal aperture
pattern values shown in
Table I. **Figure 10**
shows the result of
power concentration
along a 10 tier tree,
for subapertures of
diameter r_{0} collecting
light whose normalized
fluxes are 1.0, 1.4,
2.0, 2.8, 4.0, 5.6, and
8.0 photons per r_{0}^{2} per T_{0} (with ideal detectors used for phase
detection). At each tier the computer has found the optimum q
for the concentrator, and that value in degrees is printed on the
graph. It can be seen that incident fluxes of 1.0 and 1.4 are
wholly insufficient to allow optical concentration, whereas the net
concentrating power over 10 tiers (found as the decibel difference
between a curve at tier 0 and tier 10), increases from 5 dB at a
flux of 2.0, to 22 dB for an incident flux of 8.0. Perfect
concentration (which can be almost obtained with a large incident
flux) would be 30 dB for 10 tiers.
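The bookkeeping for the tree can be sketched in a few lines. The per-module model `module_power` below is a hypothetical stand-in for the optimized concentrating power of a single module (which in the paper comes from the analysis of the preceding section); the function names are illustrative.

```python
import math


def net_concentration_db(per_tier_power):
    """Net gain in dB for a sequence of per-tier concentrating powers C."""
    gain = 1.0
    for c in per_tier_power:
        gain *= c
    return 10.0 * math.log10(gain)


def propagate(input_power, module_power, tiers):
    """Return the power at tiers 0..tiers.

    `module_power(power, tier)` is a hypothetical model returning the
    concentrating power C achieved by a module with that input power.
    """
    powers = [input_power]
    for tier in range(1, tiers + 1):
        powers.append(powers[-1] * module_power(powers[-1], tier))
    return powers


# Ideal modules (C = 2 at every tier) over 10 tiers.
ideal_db = net_concentration_db([2.0] * 10)
print(round(ideal_db, 1), propagate(1.0, lambda power, tier: 2.0, 3))
```

Ten ideal tiers correspond to a factor of 2^{10} = 1024, or about 30 dB, which is the perfect-concentration figure quoted above.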

The thresholding
behavior relative to
optical input power is
clearly depicted in
**Figure 11** in which the
output of a concentrator
tree with 6, 8, or 10
tiers is plotted versus
the incident flux, for
aperture diameters equal
to 2r_{0} [*Note that the figure's caption is in error* -
J.M.]. The information
content of photon
streams with normalized
fluxes below about 2 is
simply insufficient to
permit optical
concentration, whereas
much larger brightness
levels rapidly take
advantage of the
capability of the
concentrator system. Changing the diameter of the subapertures
from 0.5r_{0} to 3r_{0} causes almost no change in the threshold point.
For a fixed number of tiers, 2r_{0} apertures, which individually
collect about 50% more light than r_{0} apertures, will produce
substantially larger concentrated outputs. However, for a given
**total** collecting area, the r_{0}-sized subapertures are able to
concentrate more light (but require four times the concentration
hardware).

**PRACTICAL IMPLEMENTATIONS**

If one were to build a
guided wave light
concentrating tree using
phase shifters and
photomixing networks, it
is questionable whether
the three-output
photomixing network
shown in **Figure 3**,
described by **(10)** -
**(12)**, would be a
practical choice.
However the two-output
balanced photomixer
shown in **Figure 2** is
routinely implemented
with a 50% transmissive
mirror, or in guided
wave technology, as a
directional coupler. In
fact, the net response corresponding to the three-output photomixer
shown in **Figure 3**, yielding intensities described in **(14)** and **(15)**,
can be implemented using the balanced two-output photomixer with
temporal multiplexing, as depicted in **Figure 12**. That is
accomplished by phase modulating one input to the network with a
square wave of amplitude ±q. The mean power concentrated in
the summing output is identical to the concentrated output |Y_{1}|^{2}
obtained from the three-output device evaluated in **(15)**. The
electrical output of a detector observing the light of the
difference output is treated alternately as either the |Y_{2}|^{2} signal
or the |Y_{3}|^{2} output, depending on the polarity of the dithered
phase. Again, the average signals for |Y_{2}|^{2} and |Y_{3}|^{2} are identical
to those found in **(14)** for the three-output photomixer used in
**Figure 3**.
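The time-multiplexed scheme can be checked with a short calculation. The sketch below assumes a lossless balanced mixer with unit input powers, a dither that spends exactly half of the time at each polarity, and a particular (arbitrary) sign convention for which polarity is credited to Y_{2}; the function name is illustrative.

```python
import math


def dithered_outputs(error, q, input_power=1.0):
    """Time-averaged outputs of the balanced mixer under a +/-q square-wave dither.

    `error` is the residual phase f - y. Returns the mean concentrated power and
    the mean powers credited to the Y2 and Y3 signals.
    """
    def powers(offset):
        total_phase = error + offset
        summing = input_power * (1.0 + math.cos(total_phase))
        difference = input_power * (1.0 - math.cos(total_phase))
        return summing, difference

    sum_plus, diff_plus = powers(+q)
    sum_minus, diff_minus = powers(-q)
    concentrated = 0.5 * (sum_plus + sum_minus)
    y2 = 0.5 * diff_plus    # difference output credited to Y2 while the dither is +q
    y3 = 0.5 * diff_minus   # and to Y3 while the dither is -q
    return concentrated, y2, y3


c, y2, y3 = dithered_outputs(error=0.2, q=0.5)
print(c, y2 - y3)
```

Under these assumptions the mean concentrated power is 1 + cos(q)cos(f-y) times the input power and the difference signal is sin(q)sin(f-y) times the input power, so the feedback gain is proportional to sin(q) while the concentrated output falls off as cos(q).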

In the construction of a practical concentrator module, the phase
dithering required to implement this scheme would simply employ the
delay correction compensator already required, with the ±q
square wave added to the desired phase correction feedback signal
y, as shown in **Figure 12**. In addition to the greatly simplified
fabrication of such a device, two other advantages stand out.
First, only a single low-noise photon-counting detector is required
(rather than two), and the net noise power due to dark counts is
cut in half. Secondly, with such an apparatus, the value of q
may be easily varied "on the fly" simply by altering the amplitude
of the square wave applied to the phase shifter. q could thus
be adapted to different atmospheric conditions and input light
levels, rather than being hard-wired to some "compromise" value.

It should also be noted that a practical concentrator module would
not require two individual delay compensators as depicted in **Figure 3**.
In that depiction, balanced phase shifts ensured that
correction of the relative phase shift between the two inputs did
not alter the phase of the output signal, which we intend to be the
average phase of the inputs. However the +y/2 and -y/2 phase
shifts of **Figure 3** are equivalent to the single -y phase shift in
**Figure 12** plus a phase shift of +y/2 in the concentrated light
output. That phase shift in the output, however, can just as well
be absorbed into the phase shift of the next tier concentrator
module, making the additional hardware superfluous. Of course, in
such an implementation, the concentrator modules are no longer
technically "independent" as originally stipulated.
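The equivalence of the two arrangements is easy to verify with complex amplitudes. In the sketch below, which input receives which sign and the 1/√2 normalization of the summing output are assumptions made for illustration, as are the function names.

```python
import cmath


def balanced_shifts(e1, e2, y):
    """Summing output with +y/2 applied to one input and -y/2 to the other."""
    x1 = e1 * cmath.exp(1j * y / 2)
    x2 = e2 * cmath.exp(-1j * y / 2)
    return (x1 + x2) / 2 ** 0.5


def single_shift(e1, e2, y):
    """Summing output with a single -y phase shift on the second input."""
    return (e1 + e2 * cmath.exp(-1j * y)) / 2 ** 0.5


e1 = 1.0 + 0j
e2 = cmath.exp(1j * 0.9)     # second input, shifted by an atmospheric phase of 0.9 rad
y = 0.6
out_balanced = balanced_shifts(e1, e2, y)
out_single = single_shift(e1, e2, y)
# The two outputs differ only by the overall phase factor exp(+j y/2).
print(abs(out_balanced), abs(out_single))
```

The output power is identical in the two cases; only the overall output phase differs, by +y/2, which is the shift that the next tier can absorb.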

Although a concentrator tree as described could be designed to
collect light from a lenslet array using the light gathered from a
large aperture, an equally effective implementation would result
from 2^{M} r_{0} sized objectives placed adjacently in a dense array as
depicted in **Figure 1**. This would generally be more cost effective,
since a large number of small objectives is less expensive than an
equivalent large objective of the same total area.

**APPLICATION OF ANALYSIS TO ALTERNATIVE HARDWARE CONFIGURATIONS**

While the concentrator tree analyzed above may be a practical
choice for feeding an interferometer designed around fiber optic
sources, the quantitative results may loosely apply to more
conventional adaptive optics systems when used for the purpose of
concentrating starlight into a single spatial mode. Although the
details of the phase sensor (using the three-output photomixer in
**Figure 3**) or the correction optics (using a controlled single-mode
delay compensator) may appear unlike an imaging adaptive optics
system, close parallels may be drawn.

For instance, consider a conventional Shack-Hartmann sensor
detecting the phase gradient applying to a subaperture in the case
of viewing a single point source. Viewing the sensor in a single
transverse dimension, we can loosely view the two sides of the
lenslet as two independent apertures of half the size, each
producing an image with twice the width, and interfering at the
detector plane. If we were to divide the potentially illuminated
area of the detector plane into three regions, then with the two
apertures exactly in phase (no wavefront tilt), most of the power
would fall in the center region. With a substantial phase
difference between the two half apertures (corresponding to a
wavefront tilt), constructive interference would occur on one of
the side regions and destructive interference on the other. The
three regions are thus similar to the three output modes of the
three-output photomixer in **Figure 3**. Likewise, the delay actuators
in **Figure 3** are similar to piston mode actuators in a wavefront
correction system. Thus one would expect sensitivity limits for
the system analyzed above to roughly apply to a well designed
conventional adaptive optics system observing a point source.

Thus the results of **Figure 11** in which the concentrated output
power is plotted as a function of incident flux would be expected
to be roughly applicable to the performance of a well designed
conventional system. The incident flux in **Figure 11** has been
normalized to photons per r_{0}^{2} per T_{0} (for an ideal detector in the
wavefront sensor). The plot indicates that a threshold of
approximately two photons per r_{0}^{2} per T_{0} is required for
concentration to become possible. Thus it could be concluded that
a different physical design would be subject to the same
approximate limitations.

**Footnotes:**

1. Note that in **Figure 3**, instead of using a single delay compensator, we
have split the required delay compensation into two balanced delays affecting
both inputs, in order that the absolute phase of the output signal will be the
mean phase of the inputs. While having no effect on the power concentration,
this will prevent adjustments to y in tier m from affecting the phase of the
light input to tier m+1 and thus altering the statistics of the random phase
affecting that tier.