This is a detailed paper written to accompany a much shorter paper (link through main page) submitted to the ASP Conference Proceedings, based on a poster presented at the ASP annual meeting symposium on interferometry and adaptive optics, June 28 - July 1, 1998.
Notes regarding this HTML version:
All of the figures have been placed in a separate file. Access all figures through the hyperlinks. (Depending on your browser and installed fonts, you might have some problems with certain characters in the text! However, the display equations are in graphic format and will come through fine.)
Concentration of Starlight from Large Apertures into a Single Spatial Mode for Long-Baseline Interferometry
(Unpublished version)
Jeff Meisner
ABSTRACT
A long-baseline optical stellar interferometer requires a minimum level of optical
power available from each arm in order to operate in the fringe-tracking mode which
enables coherent integration of fringe visibility. That optical power must be
concentrated in a single spatial mode in order to interfere coherently. However,
atmospheric seeing places a limit on the amount of optical power that will be
accepted into a single spatial mode for apertures much larger than the Fried
parameter, thus placing a magnitude limit on the coherent operation of the
interferometer.
However, the use of an adaptive optics system may enable larger apertures to
concentrate greater amounts of optical power in a single mode, thus extending the
magnitude limit of the interferometer. Aside from systems using laser guide stars,
an adaptive optics system requires a feedback signal derived from the detection of
a portion of the collected starlight in order to co-phase the sub-apertures.
Increasing the portion of light directed to the feedback system will therefore allow
the adaptive optics system to operate on dimmer objects.
On the other hand, the optical power which is sacrificed for the production of the
feedback signal becomes unavailable for the ultimate use by the interferometer.
However, any light which would not successfully be concentrated into the output mode
may be obtained "cost free." This observation leads one to different design criteria
for an adaptive optics system used for concentrating light into a single mode, as
opposed to one designed for high resolution imaging by a single large aperture.
The sensitivity limits and light concentrating power of any such adaptive optics
system can be found by the analysis of a hypothetical guided wave optical circuit
forming a binary tree structure. Optical power is concentrated from subapertures
feeding the branches of the tree toward the root yielding starlight concentrated in
a single mode which may supply one arm of a long-baseline interferometer.
Concentration of light along the tree structure occurs at 2-input modules, all of
which are similar, and each of which is optimized for maximum light concentration at its stage.
The control system for each module operates independently of the others, and is
optimized on the basis of the power spectrum of phase noise expected for a given
level of atmospheric turbulence. Performance limits are obtained for the resultant
optical concentrating power as a function of incident flux, the Fried parameter, and
the atmospheric coherence time parameter.
INTRODUCTION
A long-baseline stellar interferometer can accurately determine
underlying fringe visibilities only if interference over the entire
pupil plane occurs in a single phase, or if the rms phase error
over that area can be accurately determined. One simple and
effective means of ensuring this condition is to limit the
collecting aperture relative to the Fried parameter r0. This
ensures that both beams have almost all of their power in a single
spatial mode. Interference can only occur between waves launched
into the same mode. Interference occurring in other spatial modes
will occur in random relative phases and not substantially
contribute to the signal, yet the detected photons will still
contribute shot noise. Not only will the signal-to-noise ratio not
be augmented by additional light, but calibration of visibility
measurements will suffer as varying atmospheric conditions alter
the amount of optical power in the intended spatial mode. For dim
or weakly correlated sources, lack of a sufficient signal-to-noise
ratio will prevent fringe-tracking and coherent integration of
fringe visibility, greatly reducing the performance of the
instrument when observing such objects.
Aside from limiting the primary apertures, spatial mode filtering
can be obtained using pinholes or single-mode fibers. In each case
the amount of light accepted into a single mode is limited by the
Fried parameter. However, adaptive optics and other active systems
may be capable of measuring and partially compensating for random
phase functions over an aperture larger than r0. The simplest such
system, tip-tilt correction, can appreciably augment the power
obtained in a single mode. Higher order adaptive optics systems
have the potential to further concentrate power into a single mode,
according to the factor by which the Strehl ratio is improved.
MODEL
Consider the collection of light from a virtual point source by a
square or hexagonal array of 2^M subapertures as depicted in the
left side of Figure 1. Each subaperture collects a sample of the
same wavefront subject to a random phase shift dependent on the
column density of the atmosphere above that subaperture (intensity
variations will be ignored in the present analysis). The light
from each subaperture is collected into an optical fiber or similar
single-mode guided wave structure which enters an optical circuit
for the purpose of concentrating the light received from the 2^M
subapertures into a single mode.
Now suppose that we have designed a "concentrator module" which
accepts two optical signals derived from the same source but with
a random phase shift between them, and outputs most of the light
from the two inputs into a single output mode. Then using 2^M - 1
such concentrator modules in an M-tier tree structure, the light
gathered from the 2^M subapertures may be concentrated into a single
spatial mode as shown in Figure 1 for the case of 16 subapertures
(M=4) using 15 such concentrator modules.
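The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function name and the labelling of subapertures by a single integer index are assumptions made for the example, and pairing neighbours in index order merely stands in for the two-dimensional adjacency shown in Figure 1.

```python
def concentrator_tree(num_tiers):
    """Pair adjacent inputs tier by tier for 2**num_tiers subapertures.

    Returns one entry per tier; each entry lists the groups of subaperture
    indices that have been merged by the modules of that tier.
    """
    # Before the first tier every subaperture is its own group.
    groups = [(i,) for i in range(2 ** num_tiers)]
    tiers = []
    for _ in range(num_tiers):
        # Each module merges two neighbouring groups (0 and 1; 2 and 3; ...).
        groups = [groups[i] + groups[i + 1] for i in range(0, len(groups), 2)]
        tiers.append(groups)
    return tiers


tiers = concentrator_tree(4)                  # 16 subapertures, M = 4
modules_per_tier = [len(t) for t in tiers]    # [8, 4, 2, 1]
total_modules = sum(modules_per_tier)         # 15, i.e. 2**4 - 1
```

Each tier halves the number of optical signals, so the module count is the sum 2^(M-1) + ... + 2 + 1 = 2^M - 1.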
Each of the 8 first-tier concentrators accepts inputs received from
adjacent subapertures (0 and 1; 2 and 3; etc.) since these present
the shortest baselines and thus have the smallest rms
atmospherically induced relative phase shifts. The phase of the
light input to a first tier concentrator module will be
approximately the same as the phase of the wavefront as observed at
the position of the center of the subaperture from which it is
collected. The output of a first tier concentrator which combines
two adjacent subapertures will have a phase approximately the same
as that of the wavefront at the midpoint position between the
centers of the adjacent subapertures. In Figure 1 these positions
are marked with small dots. Likewise the outputs of the second
tier concentrators have phases which are approximately the same as
the phases of the wavefront present at positions (indicated by
larger dots) midpoint between the virtual positions defined by the
first tier concentrators, etc. In general, concentration at tier
m will involve correcting for a random phase function φ(t) whose
statistics correspond to the phase difference between virtual
points defined by the two outputs from concentrator modules at tier
m-1, producing a concentrated optical signal whose phase is
approximately that of a virtual point midway between the virtual
points defined by the two previous concentrators.
For circular subapertures, each of diameter D, closely packed in a rectangular or hexagonal pattern, the length of the effective baseline for concentrating light at the mth tier is given by Table I.
The design of a concentrator module at tier m should be tailored to the temporal statistics of the random process φ which denotes the relative phase between its two optical inputs. The (2-sided) power spectrum of φ over a baseline of length B is approximately given by:
where fb, the low frequency cutoff due to the baseline B, is given by:
V0 denotes the effective wind velocity of the turbulent atmospheric
layers, and k is a constant related to the direction of that wind
with respect to the baseline. In theory, if the wind were parallel
to the baseline, k would be 2; for the wind perpendicular to the
baseline k takes the value of 3. For simplicity, an intermediate
value is employed. T0 in (2) is the atmospheric coherence time
parameter defined as r0/V0. We have assumed an infinite outer scale
of turbulence, which is essentially a worst-case assumption.
In a practical system these parameters would be estimated empirically, for instance, by observing the power spectrum of phase noise. The amplitude of the high-frequency asymptote is governed solely by T0, while the amplitude of the low-frequency curves (and the cutoff frequency fb) also depends on the baseline, as given by Table I for the various tiers of the concentrator circuit. The integral of S_φφ over all frequencies will equal the well-known result for the mean-square value of φ over the baseline B:
Clearly the increased phase noise at longer baselines (higher
tiers) will make the task of correcting that phase more difficult.
On the other hand that increased difficulty at higher tiers will be
ameliorated by the stronger optical signals present at their
inputs, due to the previous stages of optical concentration. To
evaluate the net performance of the tree structure shown in
Figure 1, we must first design and analyze the operation of a single
concentrator module.
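How quickly the phase noise grows with baseline can be illustrated numerically. The sketch below assumes that the mean-square phase difference follows the standard Kolmogorov structure function, 6.88 (B/r0)^(5/3) rad^2, for an infinite outer scale; the function names are illustrative and the formula is a textbook form rather than a transcription of the display equation of this paper.

```python
import math


def mean_square_phase(baseline, r0):
    """Mean-square phase difference in rad^2 over a baseline.

    Assumes Kolmogorov turbulence with an infinite outer scale.
    `baseline` and `r0` must be in the same length unit.
    """
    return 6.88 * (baseline / r0) ** (5.0 / 3.0)


def coherence_time(r0, wind_speed):
    """Atmospheric coherence time T0 = r0 / V0, as defined in the text."""
    return r0 / wind_speed


# rms relative phase, in radians, for baselines of 1, 2 and 4 times r0.
rms_by_baseline = {b: math.sqrt(mean_square_phase(b, 1.0)) for b in (1.0, 2.0, 4.0)}
```

Under this assumption, every doubling of the baseline multiplies the phase variance by 2^(5/3), a little more than 3, while an ideal concentrator stage at most doubles the available optical power.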
THE CONCENTRATOR MODULE
Consider two optical signals E1 and E2 which differ only by the phase shift φ:
If φ is known, then the power from E1 and E2 can be easily combined using the optical circuit depicted in Figure 2. E2 is first run through a delay compensator which shifts its phase by ψ. Knowing φ, we set ψ to the same value, so that X1 and X2, the inputs to the passive network, are now identical signals, both equal to E1. The network produces outputs which are the (normalized) sum and difference of the inputs. Since X1=X2, the output Y2 is zero, but the output Y1 has an amplitude of √2·E1, that is, double the power of a single input. Of course this perfect concentration of the input power was only possible because we knew the relative phase difference φ. In general, for a setting of the delay compensator ψ not equal to φ, the resulting outputs of the network in Figure 2 would be given by:
If the phase φ were unknown (or, equivalently, if E1 and E2 were
incoherent relative to each other) then concentration of the input
light would not be possible. Instead, the expected intensities of
the outputs would be found by averaging the cosine over all angles
to yield zero, so that the expected output powers would each be
|E1|^2, the same as the power of each input, in accordance with the
brightness theorem.
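The behaviour of the two-output network described above can be checked with complex arithmetic. The sketch below assumes a lossless sum-and-difference network with 1/√2 normalization and the sign convention E2 = E1·exp(jφ); the function names and the convention are choices made for the example.

```python
import cmath
import math


def sum_difference_network(x1, x2):
    """Lossless two-input, two-output network: normalized sum and difference."""
    scale = 1.0 / math.sqrt(2.0)
    return scale * (x1 + x2), scale * (x1 - x2)


def output_powers(phi, psi, e1=1.0 + 0.0j):
    """Return (|Y1|^2, |Y2|^2) when E2 = E1*exp(j*phi) and the delay
    compensator removes a phase psi from E2 before the network."""
    x1 = e1
    x2 = e1 * cmath.exp(1j * (phi - psi))
    y1, y2 = sum_difference_network(x1, x2)
    return abs(y1) ** 2, abs(y2) ** 2


# Compensator matched to the phase: all power leaves through Y1.
matched = output_powers(0.7, 0.7)

# Unknown phase: average over a uniform grid of angles, no compensation.
n = 360
avg_y1 = sum(output_powers(2.0 * math.pi * k / n, 0.0)[0] for k in range(n)) / n
```

With unit input power, `matched` comes out as (2, 0) up to rounding, while `avg_y1` is 1: averaged over an unknown phase, each output carries only the power of a single input.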
In the important practical case in which an estimator for φ is used, so that ψ can be modelled as a Gaussian random variable with a mean equal to φ and a variance of ε^2, the expected outputs would be:
In (8) we have defined the concentrating power C as the factor by
which the expected output power exceeds the individual input
powers. Clearly C cannot exceed 2, and a concentrating power which
is not significantly greater than unity is obviously of no value.
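For the two-output network, averaging the cosine of a zero-mean Gaussian phase error gives exp(-variance/2), so the concentrating power takes the form 1 + exp(-variance/2). The sketch below compares that closed form with a direct numerical average; it is offered as a consistency check under the Gaussian assumption stated above, and the function names are illustrative.

```python
import math


def concentrating_power(rms_error):
    """Closed form for a Gaussian phase error with the given rms value (radians)."""
    return 1.0 + math.exp(-0.5 * rms_error ** 2)


def concentrating_power_numeric(rms_error, steps=4001, span=8.0):
    """Average 1 + cos(e) over a zero-mean Gaussian by a simple Riemann sum."""
    if rms_error == 0.0:
        return 2.0
    h = 2.0 * span * rms_error / (steps - 1)
    norm = 1.0 / (rms_error * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(steps):
        e = -span * rms_error + k * h
        weight = norm * math.exp(-0.5 * (e / rms_error) ** 2)
        total += weight * (1.0 + math.cos(e)) * h
    return total


# rms errors of 0, 0.5, 1.0 and 1.5 radians
table = [(s, concentrating_power(s)) for s in (0.0, 0.5, 1.0, 1.5)]
```

An rms error of 1 radian already reduces the concentrating power from 2 to about 1.61, and the value approaches 1 (no concentration) as the error grows.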
Now let us look at schemes for controlling ψ in order to maximize
the concentrating power C. Consider the network shown in Figure 2.
The light output at Y1 is the useful output, but the light output
at Y2 is unused and can be observed. In fact, the amount of light
observed at Y2 will be an indication of the concentrating power
achieved, according to (9). Unfortunately the power level observed
at Y2 is an even function of the control error φ-ψ, so it is
uncorrelated with that error and cannot be used in a linear system
to servo the delay compensator.
Instead we must substitute a network which has at least three
outputs.
A lossless network implies a unitary scattering matrix. For the application of concentrating light when ψ=φ, so that the two optical inputs to the network are identical, we desire an output which sums those inputs in phase and sends relatively little power to the other outputs. The class of possible photomixing networks which satisfy this property and produce two suitable outputs for feedback is described by the following scattering matrix, in which θ remains to be chosen.
where:
The resulting system using such a photomixing network is shown in Figure 3. The concentrated optical power is available at the #1 output of the network, and again is maximized when ψ=φ. Outputs 2 and 3 are each incident on optical detectors in order to generate a feedback signal for controlling ψ. The electrical signals from these two detectors are subtracted and input into a causal linear filter whose transfer function is denoted H(s), the output of which, ψ, controls the delay compensator. The filter transfer function H(s) and the network design parameter θ are to be chosen in order to maximize the concentrating power C.(1)
Let us call the intensity of either input in photons per second I0, and let the field amplitudes (E1, E2, X1, and X2) be in units such that the conversion between |E|^2 and I is unity:
For a given phase error φ-ψ, the intensities of the feedback outputs from the photomixing network, |Y2|^2 and |Y3|^2, are found to be:
So it can be seen that, to first order, the gain of the system in detecting the phase error φ-ψ is proportional to sin(θ). Meanwhile, the concentrating power C suffers with increasing θ according to:
Again, ε^2 denotes the mean-square value of the phase error φ-ψ. The achieved concentrating power as a function of θ is plotted in Figure 4 for rms phase errors of 0, 0.5, 1.0, and 1.5 radians. But to determine the ultimate concentrating power, we must determine the phase error, which itself generally will decrease as θ increases, since increasing θ increases the effective gain of the feedback network, according to (14).
The optical feedback signals are each detected by a photon-counting detector whose quantum efficiency is denoted q. We will assume a low rate of dark counts, so that quantum limited detection is achieved. These two electrical signals are subtracted to form the feedback signal z, which we will take to be in units of photons per second. Then the expected value of z is simply:
Since we hope for the phase error φ-ψ to be reasonably small, we have approximated the sine as the error angle itself, thus obtaining a linear model. This assumption, however, will be substantially violated in cases of lower light levels, calling into question the accuracy of such results.
z also contains a white noise component n, whose power spectral level is the sum of the shot noise contributions of the two detectors. Since the (two-sided) power spectral level of shot noise is equal to the count rate, we sum the count rate for the two detectors due to the optical signals Y2 and Y3 and obtain:
Thus modelling z as the mean value given by (16) plus a white noise term n whose power spectral level is given by (17), we can describe the resulting system with the equivalent circuit shown in Figure 5. G, the effective gain of the photomixer and detector combination is given by:
The feedback loop filter H(s) has yet to be selected among all possible causal filter functions. Once specified, it follows from analysis of Figure 5 that the power spectrum of the phase error ε=φ-ψ will be given by:
The power spectrum S_φφ of atmospheric phase noise φ over the
baseline B is given by (2), and the white noise level Snn is given
by (17). The mean-square phase error ε^2 can then be found by
integrating (19) over all frequencies.
We now wish to find a filter transfer function H(s) and value of θ which will maximize the concentrating power C. One suboptimum but simple choice for H is to employ an integrator of gain g:
Though suboptimum, we will find that the resulting performance is only somewhat poorer than the maximum obtainable. However, the value of g must be optimized. For a given integrator gain g, the resulting mean-square error ε^2 can be found by solving:
where A is given by:
ω_b is the radian frequency equivalent of the low-frequency cutoff fb previously given by (3), thus:
Q is a function given by the following integral which is evaluated numerically.
A computer program is used to optimize the integrator gain g in order to minimize the error ε^2 given by (21). With g optimized, the concentrating power is plotted as a function of θ for various input power levels in Figure 6. The assumed baseline is r0, corresponding, for instance, to first or second tier concentrators using subaperture diameters equal to r0. Similar curves for a baseline equal to 4r0 are plotted in Figure 8. The rms phase errors corresponding to the curves of Figure 6 are plotted in Figure 7.
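The trade-off that the gain optimization resolves can be illustrated with a little loop algebra. The sketch below assumes the loop implied by the description of Figure 5: the feedback signal is G times the phase error plus white noise n, and the compensator setting is that signal passed through H(s) = g/s. Under that assumption the error sees the atmospheric phase through |1/(1+GH)|^2 and the noise through |H/(1+GH)|^2, and the noise part integrates to Snn·g/(2G). The function names are illustrative, and the result is a derivation from the assumed loop rather than a transcription of the paper's equations.

```python
import math


def error_transfer_sq(omega, G, g):
    """|1/(1+G*H)|^2 for H(s) = g/s: a high-pass response with corner G*g (rad/s)."""
    a = G * g
    return omega ** 2 / (omega ** 2 + a ** 2)


def noise_transfer_sq(omega, G, g):
    """|H/(1+G*H)|^2 for H(s) = g/s: squared gain from detector noise to phase error."""
    a = G * g
    return g ** 2 / (omega ** 2 + a ** 2)


def noise_variance_closed_form(S_nn, G, g):
    """Noise contribution to the mean-square error for a two-sided white level S_nn."""
    return S_nn * g / (2.0 * G)


def noise_variance_numeric(S_nn, G, g, span=2000.0, steps=400000):
    """Midpoint-rule integral of S_nn*|H/(1+GH)|^2 over frequency, truncated at
    +/- span*G*g rad/s (the neglected tails make the result slightly low)."""
    w_max = span * G * g
    h = 2.0 * w_max / steps
    total = 0.0
    for k in range(steps):
        w = -w_max + (k + 0.5) * h
        total += noise_transfer_sq(w, G, g) * h
    return S_nn * total / (2.0 * math.pi)   # df = d(omega) / (2*pi)


closed = noise_variance_closed_form(2.0, 3.0, 0.5)
numeric = noise_variance_numeric(2.0, 3.0, 0.5)
```

Raising g widens the band over which atmospheric phase is rejected but admits proportionally more shot noise, which is why an optimum gain exists for each light level.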
The input power levels indicated on these graphs are normalized as follows:
In other words the curve labelled 1.0 corresponds to 1 photon per coherence time T0 for the case of a detector with 100% quantum efficiency (or, in general, 1/q photons per T0 for a detector quantum efficiency of q). It can be seen that normalized input powers as low as 1 or 2 are able to produce significant power concentrations, although these results may be called into question as they entail uncorrected phase errors of more than 1 radian rms, which violates the small-signal sine approximation used in (16). Figure 8 indicates that somewhat stronger input powers are necessary in the case of a longer baseline to achieve the same concentrating powers.
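The normalization described in words above amounts to counting detected photons per coherence time. A minimal sketch follows, with an illustrative function name and example numbers (r0 = 0.1 m, V0 = 10 m/s) that are assumptions chosen only to make the arithmetic concrete.

```python
def normalized_input_power(photon_rate, coherence_time, quantum_efficiency=1.0):
    """Detected photons per atmospheric coherence time T0.

    photon_rate is in photons per second at one concentrator input,
    coherence_time in seconds, quantum_efficiency between 0 and 1.
    """
    return quantum_efficiency * photon_rate * coherence_time


# Assumed example values: r0 = 0.1 m and V0 = 10 m/s give T0 = r0/V0 = 0.01 s.
t0 = 0.1 / 10.0
ideal_detector = normalized_input_power(100.0, t0)          # normalized power 1.0
half_efficiency = normalized_input_power(200.0, t0, 0.5)    # same normalized power
```

A detector with 50% quantum efficiency therefore needs twice the incident photon rate to sit on the same curve.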
Instead of using a simple integrator for H(s), an optimum control system can be derived on the basis of a Kalman filter for the estimation of φ using the model of Figure 5 with the feedback path removed. The design of an optimum estimation filter for a random process whose power spectrum is specified by a rational function with additive white noise is given in [M. C. Yovits, J. L. Jackson, "Linear filter optimization with game theory considerations," IRE Nat. Conv. Rec., part 4, pp 193-199, 1955], where an expression for the mean-square estimation error is also supplied. Unfortunately φ does not have a power spectrum given by a rational function. However, a rational function of some order can be obtained to approximate the actual function to any specified accuracy. Therefore it is in principle possible to design a loop filter which will approach the residual phase correction error given by the following expression:
Substituting in previously determined values for the shot noise level Snn, the power spectrum of atmospheric phase noise S_φφ, and the effective gain G, we obtain the following result for the mean-square phase error:
where:
The integral as a function of A has been evaluated numerically.
Unfortunately A is itself a function of ε^2, the result we are seeking. Thus (27) is solved iteratively.
Plots of the concentrating power C versus θ for various input powers, for a baseline B=r0, are shown in Figure 9. The dashed lines in Figure 9 correspond to the comparable values obtained using a simple integrator for H(s), as already plotted in Figure 6. As can be seen, the simple integrator delivers performance approaching the ideal filter, especially at larger θ where the reduction in a1 (11) accounts for a larger part of the loss in concentrating power, rather than the phase error. In any case, the peak values of the suboptimum curves are not greatly reduced below the peak values using the ideal filter, indicating that even a first order filter would be a reasonable (although unnecessarily simple) choice.
NET PERFORMANCE OF THE CONCENTRATOR TREE
The expected concentration of optical power based on (27) multiplies the assumed input power at each tier to obtain the input power for the following tier. Baselines for each tier are based on the hexagonal aperture pattern values shown in Table I. Figure 10 shows the result of power concentration along a 10 tier tree, for subapertures of diameter r0 collecting light whose normalized fluxes are 1.0, 1.4, 2.0, 2.8, 4.0, 5.6, and 8.0 photons per r0^2 per T0 (with ideal detectors used for phase detection). At each tier the computer has found the optimum θ for the concentrator, and that value in degrees is printed on the graph. It can be seen that incident fluxes of 1.0 and 1.4 are wholly insufficient to allow optical concentration, whereas the net concentrating power over 10 tiers (found as the decibel difference between a curve at tier 0 and tier 10) increases from 5 dB at a flux of 2.0 to 22 dB for an incident flux of 8.0. Perfect concentration (which can be almost obtained with a large incident flux) would be 30 dB for 10 tiers.
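The cascade arithmetic described above is simple to sketch: each tier multiplies the power by its concentrating power C, and the net gain is the product expressed in decibels. The function name and the per-tier values in the usage lines are illustrative assumptions, not results from the paper.

```python
import math


def cascade_db(per_tier_concentration):
    """Net gain in dB of a tree whose successive tiers multiply the optical
    power by the given concentrating powers (each between 0 and 2)."""
    gain = 1.0
    for c in per_tier_concentration:
        if not 0.0 < c <= 2.0:
            raise ValueError("concentrating power must lie in (0, 2]")
        gain *= c
    return 10.0 * math.log10(gain)


ideal_10_tiers = cascade_db([2.0] * 10)   # about 30.1 dB, the "30 dB" limit quoted above
no_gain = cascade_db([1.0] * 10)          # 0 dB when no stage concentrates at all
```

Because the output of one tier is the input of the next, a stage that achieves C close to 1 early in the tree also starves every later stage of the light it needs to track the larger phase excursions of its longer baseline.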
The thresholding behavior relative to optical input power is clearly depicted in Figure 11, in which the output of a concentrator tree with 6, 8, or 10 tiers is plotted versus the incident flux, for aperture diameters equal to 2r0 [Note that the figure's caption is in error - J.M.]. The information content of photon streams with normalized fluxes below about 2 is simply insufficient to permit optical concentration, whereas much larger brightness levels rapidly take advantage of the capability of the concentrator system. Changing the diameter of the subapertures from 0.5r0 to 3r0 causes almost no change in the threshold point. For a fixed number of tiers, 2r0 apertures, which individually collect about 50% more light than r0 apertures, will produce substantially larger concentrated outputs. However, for a given total collecting area, the r0-sized subapertures are able to concentrate more light (but require four times the concentration hardware).
PRACTICAL IMPLEMENTATIONS
If one were to build a guided wave light concentrating tree using phase shifters and photomixing networks, it is questionable whether the three-output photomixing network shown in Figure 3, described by (10) - (12), would be a practical choice. However, the two-output balanced photomixer shown in Figure 2 is routinely implemented with a 50% transmissive mirror or, in guided wave technology, as a directional coupler. In fact, the net response corresponding to the three-output photomixer shown in Figure 3, yielding intensities described in (14) and (15), can be implemented using the balanced two-output photomixer with temporal multiplexing, as depicted in Figure 12. That is accomplished by phase modulating one input to the network with a square wave of amplitude ±θ. The mean power concentrated in the summing output is identical to the concentrated output |Y1|^2 obtained from the three-output device evaluated in (15). The electrical output of a detector observing the light of the difference output is treated alternately as either the |Y2|^2 signal or the |Y3|^2 signal, depending on the polarity of the dithered phase. Again, the average signals for |Y2|^2 and |Y3|^2 are identical to those found in (14) for the three-output photomixer used in Figure 3.
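The dithering scheme can be sketched directly from the sum-and-difference outputs of the balanced network. The sketch below assumes unit-normalized port powers I0·(1 ± cos(error + dither)) and ignores the factor associated with each dither polarity occupying only half of the time; the function names and sign conventions are illustrative.

```python
import math


def difference_port_power(err, dither, i0=1.0):
    """Power at the difference output for phase error `err` plus applied `dither`."""
    return i0 * (1.0 - math.cos(err + dither))


def sum_port_power(err, dither, i0=1.0):
    """Power at the summing (useful) output for the same total phase offset."""
    return i0 * (1.0 + math.cos(err + dither))


def dithered_signals(err, theta, i0=1.0):
    """Return (error_signal, mean_sum_power) for a square-wave dither of +/- theta."""
    plus = difference_port_power(err, +theta, i0)
    minus = difference_port_power(err, -theta, i0)
    error_signal = plus - minus     # equals 2*i0*sin(theta)*sin(err)
    mean_sum = 0.5 * (sum_port_power(err, +theta, i0)
                      + sum_port_power(err, -theta, i0))   # i0*(1 + cos(theta)*cos(err))
    return error_signal, mean_sum


signal, concentrated = dithered_signals(0.1, 0.6)
no_dither_signal, _ = dithered_signals(0.1, 0.0)   # zero: no usable error signal
```

The two results show the trade-off discussed in the text: the error signal grows as sin of the dither amplitude, while the mean concentrated power falls as its cosine, and with no dither the difference output alone carries no sign information about the error.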
In the construction of a practical concentrator module, the phase
dithering required to implement this scheme would simply employ the
delay correction compensator already required, with the ±θ
square wave added to the desired phase correction feedback signal
ψ, as shown in Figure 12. In addition to the greatly simplified
fabrication of such a device, two other advantages stand out.
First, only a single low-noise photon-counting detector is required
(rather than two), and the net noise power due to dark counts is
cut in half. Second, with such an apparatus, the value of θ
may be easily varied "on the fly" simply by altering the amplitude
of the square wave applied to the phase shifter. θ could thus
be adapted to different atmospheric conditions and input light
levels, rather than being hard-wired to some "compromise" value.
It should also be noted that a practical concentrator module would
not require two individual delay compensators as depicted in
Figure 3. In that depiction, balanced phase shifts ensured that
correction of the relative phase shift between the two inputs did
not alter the phase of the output signal, which we intend to be the
average phase of the inputs. However, the +ψ/2 and -ψ/2 phase
shifts of Figure 3 are equivalent to the single -ψ phase shift in
Figure 12 plus a phase shift of +ψ/2 in the concentrated light
output. That phase shift in the output, however, can just as well
be absorbed into the phase shift of the next tier concentrator
module, making the additional hardware superfluous. Of course, in
such an implementation, the concentrator modules are no longer
technically "independent" as originally stipulated.
Although a concentrator tree as described could be designed to
collect light from a lenslet array using the light gathered from a
large aperture, an equally effective implementation would result
from 2^M r0-sized objectives placed adjacently in a dense array as
depicted in Figure 1. This would generally be more cost effective,
since a large number of small objectives is less expensive than an
equivalent large objective of the same total area.
APPLICATION OF ANALYSIS TO ALTERNATIVE HARDWARE CONFIGURATIONS
While the concentrator tree analyzed above may be a practical
choice for feeding an interferometer designed around fiber optic
sources, the quantitative results may loosely apply to more
conventional adaptive optics systems when used for the purpose of
concentrating starlight into a single spatial mode. Although the
details of the phase sensor (using the three-output photomixer in
Figure 3) or the correction optics (using a controlled single-mode
delay compensator) may appear unlike an imaging adaptive optics
system, close parallels may be drawn.
For instance, consider a conventional Shack-Hartmann sensor
detecting the phase gradient applying to a subaperture in the case
of viewing a single point source. Viewing the sensor in a single
transverse dimension, we can loosely view the two sides of the
lenslet as two independent apertures of half the size, each
producing an image with twice the width, and interfering at the
detector plane. If we were to divide the potentially illuminated
area of the detector plane into three regions, then with the two
apertures exactly in phase (no wavefront tilt), most of the power
would fall in the center region. With a substantial phase
difference between the two half apertures (corresponding to a
wavefront tilt), constructive interference would occur on one of
the side regions and destructive interference on the other. The
three regions are thus similar to the three output modes of the
three-output photomixer in Figure 3. Likewise, the delay actuators
in Figure 3 are similar to piston mode actuators in a wavefront
correction system. Thus one would expect sensitivity limits for
the system analyzed above to roughly apply to a well designed
conventional adaptive optics system observing a point source.
Thus the results of Figure 11, in which the concentrated output
power is plotted as a function of incident flux, would be expected
to be roughly applicable to the performance of a well designed
conventional system. The incident flux in Figure 11 has been
normalized to photons per r0^2 per T0 (for an ideal detector in the
wavefront sensor). The plot indicates that a threshold of
approximately two photons per r0^2 per T0 is required for
concentration to become possible. It could therefore be concluded that
a different physical design would be subject to the same
approximate limitations.
Footnotes:
1. Note that in Figure 3, instead of using a single delay compensator, we have split the required delay compensation into two balanced delays affecting both inputs, in order that the absolute phase of the output signal will be the mean phase of the inputs. While having no effect on the power concentration, this will prevent adjustments to ψ in tier m from affecting the phase of the light input to tier m+1 and thus altering the statistics of the random phase affecting that tier.