A Better Way to Search for ETI Signals

by Dr. Linas Vepstas <linas@linas.org>
April 2003

Abstract:
A better way to search for radio signals from extra-terrestrial intelligences is to look for PRN-encoded signals. Such encoding techniques offer powerful signal-to-noise improvements, making communication far, far more practical than using unencoded radio signals. This search technique will prove practical in the coming decades as Moore's Law provides the CPU cycles needed for such a search. ETI know Moore's Law, and depend on it to become intelligible.

Caution: This is an informal essay, a draft; conclusions in different parts of this text may contradict themselves. This is 'thinking aloud'/doodling with ideas.

Using Pseudo-Random Noise to Search for ETI

The current efforts being put forth by the SETI@Home team focus on searching for continuous-wave radio signals and energy spikes coming from the sky as signs of extra-terrestrial intelligence. This has been a marvelous effort, and has combed through very large quantities of recorded radio data on the worlds largest "supercomputer". But it strikes me that we are almost certainly looking for the wrong signal. We should be searching for 'PRN' signals, a form of carrier modulation and encoding that provides vastly superior noise, jamming and interference resistance. In particular, it offers interference resistance from both natural astronomical continuous-wave noise sources, and from terrestrial sources.

The PRN technique is used by the GPS (Global Positioning System), some cell radios, and by covert (spy) agencies. Pseudo-Random Noise phase-shift-keying has some very desirable noise-suppression properties that make it a natural candidate for use for communication by ETI. It allows low power transmissions to cut through noise, thus reducing the amount of power that a transmitter needs to be heard. For example, the GPS encoding system allows GPS satellites to transmit at power levels 30 dB below what they might 'otherwise' have to transmit. This is a huge amount of savings: three orders of magnitude. Of course, nothing is free: the penalty one pays is vastly reduced (three orders of magnitude) data rate. The GPS system has a 50-bits-per-second data rate, despite chewing up about 5 MHz of bandwidth. But that can be OK: 50 BPS is adequate for GPS, and I would imagine would be more than adequate for ETI transmissions.

I cannot over-emphasize the noise-suppression characteristics of pseudo-random encoding. The GPS encoding allows the signal level at the receiving antenna to be 30dB below the thermal noise figure at the antenna. That is, if you used a simple amplifier hooked up to a GPS antenna, all you would see, if you looked at the power level, would be pure, unadulterated thermal noise. Yet, there's a signal in there, and its 30dB down. To the un-initiated, this sounds impossible, a contradiction, free energy. But all GPS antennas operate in exactly this way, and the system was designed to work this way on purpose. There is an easy calculation one can perform: Computing the thermal noise figure for a transistor at room temperature for a 5MHz bandwidth. Its about -105dBm. You can look up the SIS (signal-in-space) figures from the GPS spec, ICD-200C, and see that its about -130 dB. The fact that such a weak signal is receivable is an offshoot of the Shannon-Hartley theorem, which relates channel capacity to bandwidth in the presence of noise.

It is in fact this property of being under the thermal noise floor that makes PRN modulation such an appealing thing for spy agencies. If a spy transmits a signal using this technique, and the counter-spy does not know the secret encoding key, the signal cannot be received and decoded. (Well, it can in, theory, but in practice it becomes very very hard). The encoding keys can be arbitrarily long and cryptographically secure, preventing the counter-spies from detecting the transmission. Indeed, a variant of this is used by the military to encode the GPS P/Y signal, preventing non-military users from getting the more accurate positions that the system is capable of providing.

It is outside of the scope of this essay to explain how PRN encoding works; it is carefully detailed in textbooks and papers. However, a brief description is in order. One starts by choosing a 'good, random' string of bits to act as the encryption key. In the case of GPS, this key is 1023 bits long. The key has nearly equal numbers of ones and zeros in it, and the distribution of pairs of ones, pairs of zeros, etc. falls off rapidly, as it would in a random (white-noise) spectrum. Each GPS satellite has its own unique key; the keys are published and well known and serve to identify the transmitting satellite. The signal is transmitted by repeating this key over and over. For GPS, each bit is slightly less than a microsecond, so that the pattern repeats every millisecond exactly, for a 1.023 MHz 'carrier' (which is then used to PSK the 1.575GHz radio signal). To modulate the 'carrier', the ones and zeros in the pattern are inverted. For GPS, this is done every 20 repeats (for a 50 BPS signal) and in WAAS, every 2 repeats. To find a signal, one searches for the particular pattern of bits, by convoluting the received radio signal with the known key of a given GPS satellite. The integration represented by the convolution will have the effect of smoothing out all noise that is not an exact match for transmitted pattern. The fact that the integration takes place over 1023 bits is what gives the 30 dB process gain (although even more gain can be gotten by integrating over the 20 millisecond bit period; but this is descending into the details.) Note, however, how this technique 'wastes bandwidth': a 1.023MHz carrier is used to encode a 50 (500 for WAAS) bit-per-second signal. If this were plain-old AM or FM radio, one would need only 50 Hz to transmit a 50BPS signal. Three orders of magnitude of bandwidth use are traded for three orders of magnitude of signal-to-noise suppression. It is the use of a 'pseudo-random' key that provides the conversion of bandwidth to SNR process gain.

ETI are well aware of the generic theory of PRN-based PSK encoding schemes. No doubt, they know of some we do not. It is important to stress that PRN encoding is a very 'intellectually natural' way to encode a signal, much as AM radio seems 'obvious' and 'natural' to the beginning radio student. There is nothing about PRN schemes that smells of 'human nature', something bizarre or freaky that only humans want, need or would invent. The cryptographic theory behind PRN is a natural subject in mathematics, and finds pervasive application in computing. It would be as natural to ETI as it is for us.

Of course, if an ETI was broadcasting a signal that it wanted to be heard, it would be pointless to use an excessively long encoding key, since doing so would make it very hard for us to detect the signal. But this is exactly where things get interesting. Short keys are much easier to detect, but offer less noise-suppression. In fact, the process gain is just the logarithm of the length of the repeating pseudo-random noise sequence. GPS uses a 1023-bit long PRN sequence, which is what provides the 30dB of process gain. But long keys are hard to detect. That's because the detection algorithm requires a convolution of the bit-sequence with the radio signal. The convolution will yield zero (or rather, a Gaussian noise distribution) unless the bit-sequence is within about half-a-bit of alignment with the signal, at which point the convolution strength jumps dramatically. Thus, to pick out a 1023-bit pattern, one must go through three steps:

Know what pattern to use, to look for. There is a fairly limited number of 1023-bit patterns that have the desirable noise properties; GPS uses 37 of these, plus another 20 for WAAS; some more are reserved for EGNOS. To find a GPS satellite, you try each pattern in turn.
One must try every correlation possibility: one must compute the convolution for each of 1023 shifts, and preferably two times that, to pick something out.
Because the GPS satellites are moving, one must try a variety of different Doppler shifts.

As can be seen, this is a significant computational problem. For GPS, this processing is done by hardware, in custom chips that can perform the needed computations in real-time. To use this technique for ETI signals would be even more complex. There would be far more PRN sequences to try, since we do not know what the sequence length is. One might expect ETI to try a million-bit long key, to provide +60dB SNR gain. We also don't know what frequency the ETI would be broadcasting at, and so need to perform a frequency search (analogous to the GPS Doppler search). Extra-terrestrial sources could also be accelerating at significant rates, and so a search through different chirp rates is required. So the amount of computing power needed to detect a signal is absolutely enormous.

But all this is doable, especially if you have the worlds largest supercomputer. Put yourself in the shoes of an advanced civilization that would be capable of mounting a project to transmit a strong signal. Presumably, they would be technologically advanced by many thousands of years over our own. Presumably, they might still know Moore's law to be true (or not). If we imagine the computing resources that humanity may have in 30 or 100 years, searching for a PRN needle in a thermal-noise haystack might be entirely reasonable. We know this, and, importantly, the ETI know this too. They would use this technology to provide process gain of many orders of magnitude to cut through the clutter of natural phenomena, and to cut through terrestrial interference. They know that we posses (or will soon posses) the computing power needed to find the signal. They know that it is far, far cheaper and easier to gain orders of magnitude of computing power than to build radio dishes that are orders of magnitude larger (or to build radio transmitters that are orders of magnitude more powerful. ETI may not need to build a terawatt transmitter, far less may suffice. They know this, and we know this, and thus it is natural that they would solve the problem through computing brute-force rather than through amplifier brute force. They know that computing brute-force will be a lot cheaper for us than radio-reception brute force is: they'll choose the economically viable solution.

I should also point out that once one has detected a signal, once one has locked onto a signal, once one knows the key, tracking it is easy. Once one knows the key, the carrier frequency, and the phase offset, tracking is nearly trivial. The only hard part is to try all of the different possibilities and combinations. The only hard part is the search: the actual reception, once the signal has been found, is easy. This is really the key concept being touted here: The search of PRN signals is really hard, because of the combinatorial explosion. However, the use of PRN makes possible extremely long integration periods and thus a tremendous amount of noise suppression. This, in turn, makes it possible for the transmitting civilization to use relatively low power and even (gasp!) an omni-directional antenna! It overcomes the question of whether the transmitting antenna is aimed at us, of whether we were looking while they were aiming at us.

I should point out that not all astronomical noise is 'white noise'. As the SET@Home project ably shows, there are vast amounts of pseudo-CW noise. In the SETI@Home terminology, a possible detection of a narrow-band continuous wave signal is a 'Gaussian'. They have found many, many transient Gaussians. I call them transient, because when one looks again at the same patch of sky, one almost always doesn't see them again. The origin of these 'Gaussians' is unknown. If they have a purely statistical distribution, then they are just the normal tail-end of statistically distributed noise. If the distribution of these 'false positives' is beyond the expected random distribution, then one has to look for a serious explanation. Besides ETI, they may be from astronomical sources (novas, neutron stars, etc.) or they may be of a particularly subtle form of terrestrial interference. From this experience, I think it might be appropriate to conclude that the cosmos is littered with naturally occurring, narrow-band, CW 'noise', giving searchers such as SETI@Home lots of false-positive hits. By contrast, a PRN-modulated signal would cut through this clutter; it would almost certainly not be naturally occurring. While one might be able to imagine a natural process that radiates narrow-band CW, it is much harder to imagine one that radiates PRN. Such a natural source would have to be governed by some sort of chaotic attractor that happens to have the signature of a fairly complex generating boolean polynomial. It seems unlikely. So this is the point: PRN can cut through not only white-noise clutter, but also through CW clutter. With the exception of GPS and spy-agency use, it should also cut through virtually all man-made interference.

Don't be mislead by the above paragraph. The search for PRN signals will also have a combinatorially large explosion of 'false-positives'. Because white noise is uncorrelated in the time domain, by exploring a combinatorially large space of possible signals, one will find a huge number of combinations where the white noise just happened to add instead of cancelling. The search for PRN is very hard. It's 'good' properties emerge only after the signal is found.

Idea:Quantum finite automata have the interesting property of being able to recognize a "language" such as a PRN signal. Of course, this is a property shared with standard classical discrete finite automata, which is how PRN's are normally recognized in actual signal processing code. However, unlike a standard classical finite automaton, the quantum finite automaton can track all possible starting sequences in parallel. This can avoid a huge part of the computational difficulty. Furthermore, the quantum automata are potentially simple and fast: there is a hope that standard optical bench setups might suffice for building the needed quantum finite state machine. Of course, this has never been done, but doesn't seem out of reach.

Finally, one should note that ETI use of PRN sequences easily explains the negative results seen so far by other SETI programs (e.g. the Harvard BETA Project). If a carrier were PSK modulated at a chipping rate of a few megahertz (for example), then the 1/2 second integration time used by the BETA project spectrometers would completely wipe out the signal. To detect a PRN signal, one needs not only to perform quadrature over long periods of time, but one needs to perform it while superimposing the same exact PRN sequence as the transmitter. Any other sequence, or a mismatch of the sequence, just wipes out the signal.

Thus, based on these arguments, I believe that it is of utmost importance to begin attempts to search for these types of signals. The search could be considerably more complex, and take much much longer than searches made so far. But in a different sense, it seems to have a much, much higher probability of success. It is far more natural. I would love to participate in such an effort.

Footnotes

The GPS Signal-in-Space strength is below the thermal noise threshold of a GPS antenna only if one is using an omni-directional antenna, as is needed to receive the full constellation of satellites. If instead, one uses a dish aimed at a satellite, together with a narrow-band receiver, then the gain of using a dish will pull the raw modulated energy out of the noise, and make it quite visible in FFT graphs/spectrum analyzers. Remember: thermal noise in a narrow-band receiver is much, much less than in a wideband receiver. To actually use GPS, one must use a wideband receiver.
It is possible to detect very very weak CW signals by integrating over very very long periods of time. The longer one integrates, the more noise suppression one achieves. This is the 'basic science' that allows weak signals to be detected, and underlies the SETI@Home search. Because integrating a PRN signal is a lot like integrating a CW signal, one might naively think that the use of PRN modulation provides absolutely no advantage over a pure CW signal. Naively, PRN seems to only make detection harder, not easier. Naively, one would be right, if one could assume that all noise sources were pure, non-correlated white noise. But noise sources are not pure white noise. There is lots of man-made clutter that is very non-random, non-white-noise. A lot of stellar noise is likely to be shot noise, which has a non-trivial autocorrelation. The use of PRN provides an additional filtering that allows this clutter to be eliminated, to eliminate all the false-positives that a narrow-band search is likely to lock onto.
The actual "best" communications encoding one could use for intra-galactic communications should be chosen in the light of a good model of galactic and local noise. In statistics, 'Bertrand's Paradox' illustrates that one cannot make meaningful statistical statements without having an underlying model of the physical processes. This applies to galactic noise as well as lines and circles. By having a good model, one can then devise a good code that can effectively suppress that noise. It is reasonable to assume that ETI has such a model, and that they are using an appropriate code. It is therefore appropriate to try to come up with the same model, and then divine the communications scheme. It is the conceit of this article the PRN-based phase- or frequency-shift keying is that scheme, although this is a leap.

How Hard Is This Really? Ballpark Estimates

Signal Strengths

What is a reasonable signal strength that we could expect to receive? If a modern-day entrepreneur had about a million dollars to spare, he/she might use a megawatt transmitter, modulate it with a signal of their choice, and use a modest-sized (a few meters) dish to beam the signal into space, all this with currently available, more-or-less off-the-shelf technology. If that entrepreneur were about 100 light-years away, the received signal strength back at earth would be in the ballpark of -300dBm. This is a very very weak signal, and would be about 200dB below the thermal noise in some small, home-made antenna here on earth. 200dB is 20 orders of magnitude. A PRN pattern providing 20 orders of magnitude of gain would have to have a very long length, repeating only every zillions of years, and would have a very poor data encoding rate: only one bit every zillion years. Never mind that the channel characteristics (total bandwidth, galactic non-white noise) probably wouldn't support this kind of process gain.

However, this 200dB gap can be bridged: if the transmitter were more powerful, if the transmitting antenna were more focused, if the transmitter were closer, and if the receiving antenna was larger and used modern space-communications electronics, then the gap between thermal noise in the receiving antenna and the received signal could drop to some 60 dB, which could be easily achievable with PRN sequences in the million-chip range, and a communications data rate of about 1 bit per second. To find which million-chip sequence was being used, and at what carrier frequency, is conceivable using todays technology. In fact, I believe that one could use technology that is available today, purchased with entirely reasonable civilian, non-governmental budgets, and communicate effectively with the nearest stars, were there anybody there to communicate with. To back this claim, more accurate estimates of the various gains & losses are needed.

The above discussion begs the question of why anyone would transmit such a signal towards earth. There are two important classes of transmitters: those who intentionally transmit towards us, and those whose signal accidentally just happens to splash onto the earth. The intentional transmitters are presumably those who have made a sky survey, identified the nearest, most-likely habitable solar systems, and are beaming messages in such a fashion as to make it easy for us to detect them. The second class would be those who are using this technique to communicate with someone else, and the earth just happens to be grazed by their signal. We should presume that such a signal would be intentionally difficult for us to receive and decode, and would be encrypted. An interesting tell-tale would be, though, another signal coming from the opposite direction of the sky!

Needed Estimates:

Thermal noise is proportional to receiver bandwidth, but the receiver bandwidth must equal or be better than the transmitter chipping rate. The chipping rate is in turn determined by the desired communications data rate, and the desired PRN length. The PRN length in turn determines how much thermal noise can be overcome. We need a graph of effective gain over thermal noise as a function of the communications rate and the PRN sequence length. (Answer: its flat, explain why).
For a given PRN length, how many different sequences are there that would have good communications properties (i.e. have good noise distributions)? Of these, how many could be generated with short boolean polynomial generators? These numbers determine how hard the search will be: We need to try the various different polynomials, and we need to try each different chip phase. In other words, we need a graph of search difficulty as a function of PRN sequence length, given that the actual PRN sequence is unknown. (Note: the only 'interesting' PRN lengths are those with N being a large prime, as these are the ones that spread out the spectrum the most. By contrast, if N is a product of small primes, then many/most spectral lines are absent.)
What is the range of frequencies we want to search? The obvious range is the 'water hole', 1.4GHz to 1.7 GHz. We need to try to fit each PRN pattern at each frequency band. The size of a frequency band is given by the chipping rate. Since we don't know the chipping rate, we would need to try many of these, thus searching many overlapping bands. What is a reasonable search algorithm through these overlapping bands? (A 'reasonable' algorithm would detect at least some energy even if it was not dead-on in chip rate or carrier frequency). We need a graph showing search complexity as a function of chipping rate.
Distant sources are likely to be receding with considerable (apparent) accelerations, and thus its likely that the received signal will be 'chirped'. Find the max astronomical chirp rate. Discuss the search strategy in face of chirp rate. Note that GPS avoids the chirp-rate question by using fairly short integration periods, and I & Q signals to feed a PLL which can lock onto and track changing carrier frequencies. Define the longest possible integration time before chirp rate become significant.
The number of false-positives increases in direct proportion to the number of searches made. This is because of the uncorrelated nature of white noise. If one integrates for a long time at one frequency, on average the noise will cancel out. But if one integrates a long time with a million different possible phase combinations at that frequency, one will have almost a million cases where the noise cancelled out on average, and a handful where, statistically, the noise failed to cancel out. These are the false positives. We need a formula that expresses the statistical distribution of the false positives. This is just the formula for false positives for integrating a noisy CW signal, times the combinatorial explosion of search possibilities. In short, the use of PRN modulation does not make searching easier; the number of false positives explodes with the same combinatorial vengeance as the search space.
We need a graph that puts the above together: the search complexity as a function of desired chipping rate and communications rate (minimizing noise/maximizing process gain).
Next, we need to work backwards from the above data: assuming a given search complexity, how far away, and how powerful, would the transmitter need to be, for us to be able to receive it & find it?
Practical considerations. Antennas, pre-amplifiers and down-converters all affect operation and provide gain in various ways. These need to be characterized ... Gain from use of modern-day, typical 2-meter or 10-meter dish antennas. Noise figure of preamp. Gain from down-conversion .. what is it -- 30 dB for 1.5 GHz to 150 MHz down-conversion. etc. etc.

Note that a modified fast-Fourier technique can be used to quickly search multiple PSK combinations. This insight implies that it searching for FSK (frequency shift-keyed) signal is roughly as hard as PSK, since the FFT provides the energy bins which must be summed in every possible combination to find the FSK pattern. Note that the large number of combinations makes this seem like an NP problem, but I think a simple hill-climbing search can be pretty good. First one looks for a 'hill' on a fairly short integration period. One then needs only to compare the 'hill' to where the next repeat of the PRN pattern. If there's no hill there, then nothing, and no signal was detected. If there is one, then one needs to examine a few nearest neighbors as well. This is no longer an NP search, this is a polynomial-time search. Right? Or did I confuse myself?

Noise and Antenna Factoids

Some quick points and references to facts about antenna gain and noise sources. These help establish that for a receiver with a 300MHz bandwidth, the practical noise floor for an antenna is about -100 dBm and that one can use antennas of about 50 dBi, that is, dishes of about 4 meters in diameter, before one trades thermal noise for galactic/CMB noise as the predominant noise source.

The 300MHz bandwidth is the approximate width of the water window. Arguments developed in later sections indicate that a PRN signal is likely to occupy the entire width of the window, thus motivating the use of this number for the noise calculations.

Background sky brightness in the water window is approx 1E-20 Watts / (meter² steradian Hz) and consists of roughly equal galactic synchrotron & cosmic 2.7K background. The synchrotron power varies by a factor of 100 from in-plane to pole views.
Sky Radio Brightness at 20, 40 and 80 GHz
Formula for blackbody radiation, for power impinging on a unit area, solid angle, frequency is P df dOmega dA = 2h (f³/c²) df dOmega dA / (e^hf/kT -1). For T=2.7K, f=1.5GHz we get hf/kT = 2.7e-2 and P = 1.8E-21 Watts/(Hz sr m²)
Overview of Blackbody Radiation, including simple derivation.
Black Body Radiation, (mirror) a detailed derivation from quantum principles.
The total amount of Johnson noise in the first antenna pre-amp transistor is given by Boltzmann's constant, which is k = 1.38 x 1e-23 Joules/Kelvin. At a room temperature of 290K, this corresponds to a thermal noise power of P = -173.98 dBm + 10 log₁₀ (frequency in Hz), where dBm refers to the decibel power level referenced to milliWatts. For an antenna held a liquid-nitrogen temperatures, the noise can held down to -179 dBm + ...
2.3 meter dish having gain of 38 dBi, beam width of 7.0 Degrees. (At what frequency? not 1.5 GHz!)
Agassiz Station 26 meter dish, effective aperture of 239 meters squared. (What is the dBi for this antenna?)
Assume a 300MHz bandwidth and a room-temperature receiver. The Johnson (thermal) noise energy at the input to the first antenna preamp is then -174 + 85 dBm = -89 dBm. This compares roughly to the background sky brightness for 2.3 meter dish which is about 4m² . 1E-2 ster . 300MHz . 1E-20 Watts/(m² sr Hz) = 1E-13 Watts = -100 dBm . All dishes will see about the background sky brightness, because the larger collecting area of a large dish is offset by its narrower beam. Both sources of noise vary linearly with the bandwidth. For small, cheap antennas, we can approximate the total noise by the thermal noise. For slightly better antennas (e.g. cooled), the dominant background becomes the CMB/Galactic background. The point being that both are roughly comparable in magnitude; and ETI would use the CMB/Galactic background as the noise floor for receiver calculations. Hmmmm ...

Equations & Constants

(Under construction). Definitions of variables and constants:

N = PRN Sequence Length
f = Chip Rate (number of PRN chips transmitted per second)
W = Carrier frequency (1.4 to 1.7 GHz)
R = Data transmission bitrate = f/N
B_chip = Bandwidth of the PRN signal = 4f (approximately)
A_r = Receiver aperture
t = Integration time
T_noise = Antenna temperature

Some simple formulas:

R = f/N. We assume that the PRN sequence will be modulated by the data at this rate, a rather natural rate for PRN modulation.
B = 4f (approximately). PRN PSK modulation splatters energy over a broad frequency band, this range encompasses most of the energy.
W_delta = search step size = (1/4) f/N = R/4. This is the optimal step size between searched frequencies. This assumes that the integration period is N/f which is the length of one data bit. Over the course of the integration, the pseudo-phase must less than 90 degrees (one-fourth of a full cycle), in order for the signal to not be wiped out. Ergo, the one-fourth in the step size.

Shannon-Hartley Theorem

The Shannon-Hartley theorem is a celebrated theorem that relates the theoretical maximal data rate that can be transmitted through a channel for a given bit-error rate (BER) and bandwidth B in the presence of Gaussian white noise. It does not specify the actual coding scheme needed to achieve this data rate and bit-error rate; it is merely a statement that an encoding technique exists. The first part of the theorem states that

C = B log₂ (1+Signal/Noise)

where C is the channel capacity, and B the bandwidth. The actual achievable data rate R must be less than C; to approach C while maintaining a low BER requires arbitrarily 'complex' encoding schemes. Rearranging this formula, we can see that using a very low data rate spread over a very large bandwidth can provide us with the ability to work with very low signal-to-noise ratios. In particular, for a 0.1Hz data rate spread over 400MHz of bandwidth, we can realize 90 dB of process gain; i.e. work with signal-to-noise ratios of 1.0e-9 of -90 dB.

The theorem also makes an important statement about minimum energy levels. The noise power N = BN₀ is a product of the receiver bandwidth times the noise density. The signal power S = R E_b is a product of the actual bit rate times the energy per bit. Substituting for S and N, and using the inequality R < C, we get an expression for the minimum energy per transmitted data bit:

(B/R) (2^R/B - 1) < E_b/N₀

For R << B that we are considering, the above can be expanded into

ln 2 (1+ (R/2B) ln 2) N₀ < E_b

which defines the minimum energy per transmitted data bit. If the energy per bit is less than this, there is no way, using any encoding scheme, to receive the data. This limit is called the Shannon Limit.

If you don't have a background in signalling theory, but know integrals and sine waves and noise, then you can understand the theorem by thinking about a sine wave added to white noise. To pull out the sine-wave signal, one must integrate/convolve with a sine wave. Think about how long one needs to integrate this combination before the integral of the sine-wave part equals/exceeds the integral of the noise part. Note that the inverse of this time period is the channel capacity.

Applying Shannon-Hartley to SETI

At the antenna feedhorn, N₀ = kT is just the Johnson noise density in the first stage of the preamp. The only way to overcome this value is to use a bigger dish to collect more signal energy out of the ether and direct it into the feedhorn. However, there is a limit to the size of the dish: Once the dish is large enough (about 3 or 4 meters or so), the dominant source of Gaussian/white noise is no longer the Johnson noise, but becomes the CMB and the Galactic synchrotron radiation. As reviewed above, the CMB is black-body thermal radiation, and its noise power is proportional to both the collecting area of the dish, and the solid-angle of the dish beamwidth. For parabolic dishes, the beamwidth is given by the diffraction limit as is inversely proportional to the dish area. Thus, increasing the size of the dish will not reduce the amount of CMB collected. The only way to move beyond the diffraction limit is to use a phased array/synthetic aperture interferometer setup, so that the solid angle becomes small, without overly increasing the size of the collecting area.

For a room-temperature feedhorn, ln2 N₀ = ln2 kT = 2.8E-21 Joules is the minimum bit energy. By comparison, the signal received from a one-megawatt transmitter driving an omni-directional antenna 100 light-years distant from us will be about 9E-32 Joules/(second meter²). To bridge this nearly eleven orders of magnitude, we need to hope that the ETI use a signalling rate of 0.01 Hz (for two orders of magnitude), we need a cryogenic feedhorn (for another one-n-a-half orders of magnitude), the hope that ETI is using a 10 megawatt transmitter (instead of one) and a directional antenna, and/or they're closer, and finally, a collecting area in the tens-of-thousands square meter range. Ugh. Only then can we expect the energy-per-bit to exceed the Shannon Limit imposed by thermal Johnson noise in the preamp. These limits and theoretical considerations apply equally to a simple, Morse-code-like CW modulation that traditional SETI searches look for, as it would to a PRN encoding scheme. PRN modulation does not provide a mechanism to get around the Shannon Limit.

A large collecting area can beef up the collected power to drown out the Johnson noise. However, roughly 10dB later, the Cosmic Microwave Background (CMB) kicks in, and simply increasing dish size does not change the amount of CMB received. The amount of extra CMB power received by the larger dish is exactly negated by the reduced CMB power due to the sharper focus of the dish. The dish area and the diffraction-limited solid angle vary in inverse proportion, and cancel out in the black-body noise formula. The total noise power is independent of the dish size. The Shannon Limit applies to all sources of noise in a communications channel: it applies to the CMB as well. The best way to minimize CMB in the communications channel is to make the solid angle as small as possible for a given collecting area size, using synthetic aperture/phased array interferometer techniques. A good way to understand this is to imagine building a traditional imaging radio astronomy telescope. Each pixel picks up a chunk of the CMB noise for that part of the sky. Making each pixel smaller (increasing the angular resolution) decreases the amount of sky viewed and thus decreases the CMB received in that pixel. Increasing the collecting area of the telescope makes each pixel brighter, increasing the amount of both signal (if any) and noise collected. If we now imagine that only one of the pixels contains a SETI source, and none of the others do, then we can throw away the other pixels (and their noise contribution), and focus on the one with a signal. (Of course we have to look at each pixel, first).

In an earlier section, we noted that CMB noise at this frequency (water window, 1.5 GHz) in a diffraction-limited dish is about 4E-22 Watts/Hz = 4E-22 Joules, independent of the dish size. The large dish needed to collect enough bit-energy to overcome Johnson noise should also be enough to overcome the CMB, more or less. Since one cannot build a huge dish (except at Aricebo), one must bolt together a bunch of small dishes. To keep the CMB noise from becoming additive, we need to hook up the dishes using a synthetic aperture technique so as to keep the beamwidth down. To provide a more accurate estimate, we need to look at how noise propagates through synthetic aperture electronics, including the down-conversion (digital and/or analog), Nyquist aliasing of the noise in the digitizer, and the fact that this needs to be broad-band to make the PRN detection work ... This requires some hefty work.

To recap: The Shannon-Hartley theorem shows that PRN trickery can serve to pluck a weak signal out of heavy noise. However, it only works up to a point; to go beyond that point, one needs to use antenna trickery to increase the amount of collected energy per bit, while using very narrow beam-widths to minimize noise coming from CMB and galactic sources.

ToDo: Determine if synchrotron radiation is important, and if so, how it affects the signalling theory (since its not (?) white noise). In particular, what is the autocorrelation function for galactic noise in the water window?

Shannon-Hartley and Amateur SETI Efforts

The above calculations, and in particular, the minimum energy per bit calculations, seem to imply that there is not a lot of hope for the amateur with a surplus TV satellite dish. The ETI would have to have a very powerful signal, and be very close. (And they would need to be transmitting CW, which I don't think is likely).

However, there may be some interesting games that individuals/amateurs can play, using high-bandwidth Internet connections to build very large synthetic aperture scopes. The basic idea is that 50KHz of bandwidth can be digitized into an approximate 400 kbit/second digital stream, which can be piped across DSL/Cable Internet connections. These can then be combined in more-or-less realtime to provide high-resolution images. Accurate timestamps are needed so that the raw signals can be time-correlated to create the synthetic aperture. The GPS signal can provide a (relatively) cheap microsecond-accurate timestamp. I don't know if this timestamp is enough, or if calibrated atomic clocks are needed. Hmm.... Can one even do interferometry after down-conversion, or must it be done before? I would think so, but maybe one looses too much data after narrow-band filtering. It shouldn't be too hard to make this work, and it could be fun! Small dishes on opposite ends of a continent could potentially generate some rather dramatic high-resolution images of traditional radioastronomy sources!

One of the interesting aspects of such a stunt might be that most terrestrial interference sources are geographically localized, and thus will appear in only one or a few antennas. Thus, antennas scattered across a continent can provide a degree of interference immunity. Satellites, such as GPS, and esp. geostationary satellites, can still blanket a continent, unfortunately.

The biggest problem with arrays of small dishes is their poor performance in rejecting Johnson noise. For example, if 100 small dishes are aimed at the same signal, the total signal power will be 100 times larger than it is for one dish. Noise power from Gaussian white noise sums as the root-mean-square: the total noise power of an array of 100 dishes will be only 10 times that for a single dish. Thus, the S/N ratio for 100 dishes is 100/10 = 10 times better than for a single dish. By comparison, one can achieve the same gain in S/N by increasing the diameter of a dish by sqrt(10) = 3.2. That is, a single 6-meter dish has better S/N characteristics than 100 2-meter dishes. In terms of total cost (including operating cost), building & running a single 6-meter dish is cheaper than 100 2-meter dishes.

References

Channel Capacity, a simple derivation (mirror).

Fourier Spectrum of a PSK Signal

(This section reviews a phase-shift signal and its Fourier transform. The math here is easy; its basic discrete Fourier, its just hampered by lack of Greek letters in HTML).

Assume a pseudo-random binary bit sequence of length N is given by b_k for integer k in [0,N-1]. Use this to phase-shift modulate a carrier of frequency W. The phase shift is p, the length of each bit is T. Then the

f(t) = Sum_k=0^N-1 S_k(t) sin (Wt + S_k(t) p b_k)

is the signal defined on the time interval [0,NT]. We used the notation S_k(t) to be a step function that is one on the interval t=[kT,(k+1)T] and is zero otherwise. To make f(t) quasi-periodic with period NT, one chooses a frequency W such that NTW = 2pi m for some integer m. The PRN (pseudo-random noise bit sequence) then repeats with period NT. This signal, as described above, does not carry any data; to modulate the signal with data, one inverts all of the bits to denote a one, or leaves them to denote a zero. That is, the maximum possible data rate is 1/NT, whereas the 'chipping' rate is 1/T.

If the data consists of all zero's, then f(t) is absolutely periodic, and a discrete Fourier transform is appropriate to describe the signal. The signal is given by

f(t) = Sum_n=0^inf c_n cos (2 pi nt / NT) + s_n sin (2 pi nt / NT)

Solving for the Fourier coefficients c_n and s_n one gets

c_n = 1/ (2 pi (m+n)) Sum_k=0^N-1 [ cos [2pi (m+n) k/N + p b_k] - cos [2pi (m+n) (k+1)/N + p b_k] ] + 1/ (2 pi (m-n)) Sum_k=0^N-1 [ cos [2pi (m-n) k/N + p b_k] - cos [2pi (m-n) (k+1)/N + p b_k] ]

and

s_n = 1/ (2 pi (m+n)) Sum_k=0^N-1 [ sin [2pi (m+n) k/N + p b_k] - sin [2pi (m+n) (k+1)/N + p b_k] ] - 1/ (2 pi (m-n)) Sum_k=0^N-1 [ sin [2pi (m-n) k/N + p b_k] - sin [2pi (m-n) (k+1)/N + p b_k] ]

Note that if N is the product of small primes, then it is very likely that adjacent terms will cancel each other out. If N is a large prime, then adjacent terms will almost certainly not cancel (unless N divides (m+n) or (m-n) which is unlikely). Overall, there is more energy spread over a broader bandwidth if N is prime, although it would be nice to back this claim with a proof.

From this formula, we can also see that most of the energy will be carried by about 4N coefficients centered about m=n. Thus, the longer the PRN sequence, the more Fourier coefficients need to be considered. From this, one can see that narrow-band interference will wreck maybe a few of the coefficients without spoiling the signal as a whole. In other words, the larger the N, the better the interference suppression.

Based on the above calculations, we can place some limits:

The maximum reasonable chip rate f=1/T is about 100 MHz. We conclude this by noting that most of the power in the signal is in the band of about 4f, and the width of the 'water window' is about 400MHz.
Scattering from turbulent galactic gasses will broaden a line by 0.01Hz to 0.1Hz. This seems to provide a natural limit to the spacing of spectral lines. Thus, the smallest reasonable spacing is given by 1/NT = 0.01Hz. Lines spaced more closely than this would seem to bleed energy into each and thus not provide any additional process gain.
Combining the above, we conclude the largest practical value for N is about 1E10. This provide a maximum of about 100 dB of process gain. Expressed as a power of 2, its 2^33 bits. This number is in the ballpark for todays computers.
Note that 1/NT is also the data rate. Thus, a natural data rate seems to be about 0.01 to 0.1 Hz.

Factiods

These factoids help support the above arguments

A 'perfect' (monochromatic) CW signal will be 'Doppler broadened' by scattering from turbulent interstellar gas. The smearing is approx 0.01Hz to 0.1Hz for the water window. (Need reference).

PRN Sequence Considerations

As we saw from the above, the only interesting PRN sequences are those whose length is a prime number. We also saw from bandwidth considerations that the largest practical N for the water window is about 1E10. How many prime numbers are there that are less than N=1E10? A rough approximate formula for the number of prime numbers less than N is N/log(N); so for N=1E10, Pi(1E10) = 1E10/23 = 5E8. In other words, slightly less than 500 million prime numbers need to be attempted.

For a given length N, how many different PRN sequences are there? The answer depends on how 'good' one wants the PRN sequence properties to be. A 'good' sequence will have the same Poisson distribution of same-bit lengths as white noise. A 'good' sequence will spread power evenly over all of the Fourier coefficients of of the signal. In fact, most random sequences will be 'good' sequences, and there are approximately 2^N of these. For N=1E10 this is insanely large; it is impossible to check all of these. Thus, we need to limit the number of sequences that might be searched for. To do so, we need to guess the kinds of sequences that ETI might choose to use.

An obvious set of candidates are those sequences that have small polynomial generators: i.e. those that can be generated by small shift registers whose various taps are XOR'ed together and fed back. Sequences of length N will typically have polynomial generators of degree log₂ N. Thus, for a bit sequence of approximate length N = 2³² we can work with a shift register of about 32 bits. There are 2³² ways to XOR the 32 taps together; however, most of these will fail to generate long sequences. I believe (this needs support/derivation) that about sqrt (2³² = 2¹⁶ polynomials are reasonable PRN generators. If this is correct, then there are approx 1E10 x sqrt(1E10) = 1E15 different PRN sequences that one might have to search for. This is starting to get to be a dauntingly large number. Don't forget that for any given sequence, one has to try each of the N different bit alignments, which implies 1E10 x 1E15 = 1E25 possibilities to search. Each of these in turn needs to be tried at many different frequencies and chirp rates. One might limit the later searches to chirp rates that assume that the transmitter has been corrected to appear stationary in the galactic frame of reference. However, one might also like to limit the number generators to attempt.

To this end, one might attempt to look for notable primes and generators. Are there notable, 'famous' primes in the vicinity that are 'preferable' in some way? e.g. 2³²+-1? Are there notable generators? e.g. generators that not only have long sequences and good PRN properties, but are also notable by their relationship to other famous problems, e.g. finite/sporadic groups, Fermat's Last Theorem, etc.?

Counting Primes provides some estimates for the number of prime numbers.
Chirp due to earth spin is 0.16 Hz/sec

Building a Receiver

Factoids

Since spread-spectrum is an inherently broadband phenomenon, most astronomy receivers are useless, as they are designed to be narrow-band.
LNBF -- Low Noise pre-amplifier, Block Frequency Down-converter is ideal. The point being that PSK signals survive down-conversion beautifully.
Noise figure of 0.4 dB in low-noise preamp Model 1691ULNA from Downeast Microwave, Inc.
Haystack SRT Small Radio Telescope Receiver Details
Antennas a Russian academic Journal.
Synthetic Aperture Theory
Synthetic Aperture Imaging with Arrays of Arbitrary Shape (abstract only)
The Society of Amateur Radio Astronomers (SARA)

Results & Support

SETI@Home graphs showing Gaussian fitting process and some results.

Draft of April 2003, Linas Vepstas
Spell checking and minor additonal URL links, August 2006.
Copyright (c) 2003 Linas Vepstas linas@linas.org

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included at the URL http://www.linas.org/fdl.html, the web page titled "GNU Free Documentation License".