Digital Sonar Experiment Part 1
For the last 10 years or so, I have wanted to build an underwater robot of some sort. I assume this is the result of reading too much Cussler and doing a bit of SCUBA diving, but I can't seem to shake it. I've been following a number of projects, particularly OpenROV, with great interest. One of the challenges that does not seem to have been tackled much by hobbyists is an affordable underwater sonar. There are plenty of inexpensive sonar units for land and aerial robotics - the AR Drone makes good use of them for low-altitude sensing - but there don't seem to be any for water use yet.
I figured it made the most sense to understand the algorithms and get them debugged in air first: sound travels much faster in water, and water introduces acoustic coupling and waterproofing issues that simply aren't a problem in air.
Although I intend to build the unit with a microprocessor board eventually, I chose to explore what could be done with a sound card first, just for ease of programming. I found that it has been done before, successfully, and documented well by this gentleman. His page gave some very useful information about echo detection, introducing me to the idea of correlation functions to detect a short signal within another signal stream. With that information, I found an online digital signal processing (DSP) book that discusses correlation functions and their uses.
I knew nothing of signal processing when I started this, and I was surprised to find that it's not really difficult from an algorithm standpoint, though it does involve a fair amount of computation. Essentially, an audio signal, once normalized, consists of a series of amplitude values between -1 and 1 at some sampling rate. For human-audible frequencies, that's commonly 44,100 samples per second. The reason for this is that to perfectly reproduce a signal, you must sample it at least twice the highest frequency you want to capture - this is the Nyquist rate. A sample rate of 44,100 Hz can therefore perfectly reproduce signals up to 22,050 Hz, which I figured was the limit of my laptop's audio hardware. A second's worth of audio is 44,100 float values between -1 and 1.
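As a concrete illustration (not taken from the project code), here is how raw 16-bit PCM samples map onto that normalized -1 to 1 range:

```python
import numpy as np

# 16-bit PCM samples span -32768..32767; multiplying by 1/32768.0
# maps them into the normalized range [-1.0, 1.0).
SHORT_NORMALIZE = 1.0 / 32768.0

raw = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)
normalized = raw * SHORT_NORMALIZE

print(normalized)
```

A full second of recorded audio is just 44,100 of these floats in a row.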
The simplest way of performing a correlation function is to load one array with the reference signal you are looking for - the ping - and another array with the recorded results. Start at position 0 in the recorded array, and multiply each value in the chirp with the corresponding sample from the recorded audio. Sum the products of each multiplication (essentially calculating the area under the resulting curve) and record the sum in a results array at position 0. Shift to position 1 in the recorded audio array and repeat, this time storing the result in position 1 of the results array. Repeat for every position in the recorded audio array, stopping when the chirp length would run past the end of the recorded audio.
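That sliding-window procedure can be sketched in a few lines of plain Python (the function and variable names are my own, and the tiny arrays here are just for demonstration):

```python
def correlate(chirp, recorded):
    """Slide the chirp across the recording; at each offset, sum the
    element-wise products. Large sums mean a close match."""
    result = []
    for offset in range(len(recorded) - len(chirp) + 1):
        total = 0.0
        for a in range(len(chirp)):
            total += chirp[a] * recorded[a + offset]
        result.append(total)
    return result

# Tiny demo: a copy of the chirp [1, -1] sits at offset 2 of the recording,
# so the correlation spikes there.
print(correlate([1.0, -1.0], [0.0, 0.0, 1.0, -1.0, 0.0]))
# -> [0.0, -1.0, 2.0, -1.0], with the peak at offset 2
```

For real signal lengths this nested loop is slow; `numpy.correlate` computes the same thing far faster.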
What you are doing is creating a sliding window, comparing the chirp to each chunk of audio in the recorded audio array. The output of the correlation is really interesting - when the audio is a very close match, you get a large positive spike in the results that corresponds to a position in the recorded audio. That spike represents a copy of your chirp in the recorded audio. The first time you see it, it's the chirp your speaker sent. After that, it's echoes of that chirp off objects.
Knowing the position of each echo and your sample rate, you can compute the time elapsed between the primary pulse and the echo. That's the time the sound was in flight. Multiply by the speed of sound, divide by two for the round trip, and boom, you've got distance to your target. Neat, huh?
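Here's that arithmetic as a sketch (the speed of sound in air is roughly 343 m/s at room temperature; the sample offset in the example is hypothetical):

```python
SAMPLE_RATE = 44100      # samples per second
SPEED_OF_SOUND = 343.0   # m/s, in air at room temperature

def echo_distance(echo_offset_samples):
    """Convert the sample offset between the primary pulse and an echo
    into a one-way distance in meters."""
    time_of_flight = echo_offset_samples / float(SAMPLE_RATE)   # seconds
    return (time_of_flight * SPEED_OF_SOUND) / 2.0  # halve for the round trip

# An echo 441 samples after the pulse is 10 ms of flight -> ~1.7 m away.
print(echo_distance(441))
```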
The correlation function works best on a chirp signal - audio that changes in frequency over time. The chirp must be very short, or the echo will arrive before the chirp finishes playing. I found that chirps in the 1.5 ms range gave the best results for me - that length yields a minimum range of 3-4 feet. I generated the chirp in Audacity, sweeping from 4000 to 10000 Hz at full amplitude.
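I made my chirp in Audacity, but an equivalent linear sweep can also be synthesized directly; here's a sketch using SciPy with the parameters above:

```python
import numpy as np
from scipy.signal import chirp

RATE = 44100        # samples per second
DURATION = 0.0015   # 1.5 ms chirp

# Time axis for the chirp, one entry per sample.
t = np.linspace(0, DURATION, int(RATE * DURATION), endpoint=False)

# Linear frequency sweep from 4 kHz to 10 kHz at full amplitude.
sweep = chirp(t, f0=4000, f1=10000, t1=DURATION, method='linear')

print(len(sweep))  # 66 samples at 44.1 kHz
```

Writing `sweep` out with `scipy.io.wavfile.write` (after scaling to 16-bit) would give a wav file you could play with PyAudio just like the Audacity-generated one.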
Here's a recorded slice of audio, showing the chirp, some echoes and background noise.
Here's the output of the correlation function:
Once you can do it for a single chirp, you can assign an intensity to the spikes and graph them over many pulses. This gives you a fish-finder-type view - in this example, time goes from top to bottom, and distance from left to right. Note the noise on the far left side of the image - that's probably caused by the speaker continuing to "ring" for a few milliseconds after the signal stops.
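The intensity mapping can be sketched like this (the 0.26 noise threshold echoes the one used in the full listing at the end of the post; the function name is my own):

```python
def to_grey_hex(value, threshold=0.26):
    """Map a correlation value to a Tk-style greyscale hex color.
    Values below the threshold are treated as background noise (black)."""
    if value < threshold:
        return '#000000'
    level = min(int(value * 255), 255)  # clip strong returns to white
    return '#%02x%02x%02x' % (level, level, level)

print(to_grey_hex(0.1))   # below threshold -> black
print(to_grey_hex(0.5))   # mid-grey
print(to_grey_hex(2.0))   # clipped to white
```

Each ping then becomes one row of grey rectangles, and stacking rows over successive pings builds the fish-finder image.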
I chose WinPython with the PyAudio library to test with, because it comes bundled with graphing libraries and other useful things. I used a USB combined microphone/speaker intended for chat, because I could move it around and it was pretty directional.
It works - here's a sample run showing the returns with the speaker/mic starting near the wall, then moving slowly away, then back in, and so on. You can see a white line with peaks and valleys corresponding to the distance, along with background noise and multipath echoes. The minimum distance was about 3 feet; the maximum was about 8.
The left side of the screen represents the location of the sensor - imagine it's attached to the hull of a boat. White and grey dots represent echoes at different distances. The white line produced as I bring the speaker/mic away from the wall moves farther to the right, indicating more time in flight and longer distances. As I bring it closer to the wall, it moves back toward the left side of the screen. One way to visualize this: if the sensor were mounted on a boat, the line would trace the depth of the water beneath it. You also get weaker returns from sounds that have bounced around on indirect paths, or off schools of fish.
Video of operation:
Sound Card Sonar Experiment 1 from Jason Bowling on Vimeo.
Next steps:

1) Implement it on a nice fast micro, probably a Stellaris Launchpad.
2) Get it working in water with new transducers.
3) If possible, at some point, maybe: Sidescan. :-)
Ideas for improvement? Did you find it useful? I'd appreciate a comment if so. Thanks!
Here is the full source:

```python
import pyaudio
import wave
import struct
import math
import pylab
import scipy
import numpy
import matplotlib
import os
import scipy.io.wavfile
import threading
import datetime
import time
from Tkinter import *


class SonarDisplay:
    #Displays greyscale lines to visualize the computed echoes from the sonar pings.
    #Time increases from top to bottom; distance increases from left to right.
    #It's like a fishfinder rotated on its side.
    def __init__(self):
        self.SCREENHEIGHT = 1000
        self.SCREENWIDTH = 1024
        self.SCREENBLOCKSIZE = 2
        master = Tk()
        self.w = Canvas(master, width=self.SCREENWIDTH, height=self.SCREENHEIGHT, background="black")
        self.w.pack()

    def getWidth(self):
        return self.SCREENWIDTH

    def getHeight(self):
        return self.SCREENHEIGHT

    def getBlockSize(self):
        return self.SCREENBLOCKSIZE

    def plotLine(self, result, y):
        x = 0
        numJunkSamples = 250
        intensityLowerThreshold = .26  #peaks lower than this don't get plotted, since they are probably just background noise
        #On my machine, the first couple hundred samples are super noisy, so I strip them off.
        #These are the first samples immediately after the peak that results from the mic hearing the ping being sent.
        #Until the echo is clear of these (about 3 feet in air) it is hard to pull out of the noise, so that is minimum range.
        result = result[numJunkSamples:]
        limit = self.SCREENWIDTH / self.SCREENBLOCKSIZE
        if limit < len(result):
            limit = len(result)
        for a in range(0, len(result)):
            intensity = 0
            if (result[a] > intensityLowerThreshold):
                intensity = result[a] * 255.0
                if (intensity > 255):
                    intensity = 255
                if (intensity < 0):
                    intensity = 0
            rgb = intensity, intensity, intensity
            Hex = '#%02x%02x%02x' % rgb
            self.w.create_rectangle(x, y, x + self.SCREENBLOCKSIZE, y + self.SCREENBLOCKSIZE,
                                    fill=str(Hex), outline=str(Hex))
            x = x + self.SCREENBLOCKSIZE
        self.w.update()


class Sonar:
    #Mic initialization and audio recording code was taken from this example on StackOverflow:
    #http://stackoverflow.com/questions/4160175/detect-tap-with-pyaudio-from-live-mic
    #Playback code based on the PyAudio documentation.
    def callback(self, in_data, frame_count, time_info, status):
        data = self.wf.readframes(frame_count)
        return (data, pyaudio.paContinue)

    def __init__(self):
        self.FORMAT = pyaudio.paInt16
        self.SHORT_NORMALIZE = (1.0 / 32768.0)
        CHANNELS = 2
        self.RATE = 44100
        INPUT_BLOCK_TIME = .20
        self.INPUT_FRAMES_PER_BLOCK = int(self.RATE * INPUT_BLOCK_TIME)
        WAVFILE = "test.wav"
        self.errorcount = 0
        print "Initializing sonar object..."

        #Load chirp wave file
        self.wf = wave.open(WAVFILE, 'rb')

        #init pyaudio
        self.pa = pyaudio.PyAudio()

        #identify mic device
        self.device_index = None
        for i in range(self.pa.get_device_count()):
            devinfo = self.pa.get_device_info_by_index(i)
            print("Device %d: %s" % (i, devinfo["name"]))
            for keyword in ["mic", "input"]:
                if keyword in devinfo["name"].lower():
                    print("Found an input: device %d - %s" % (i, devinfo["name"]))
                    self.device_index = 1  #I selected a specific mic - I needed the USB mic. You can select an input device from the list that prints.

        if self.device_index == None:
            print("No preferred input found; using default input device.")

        #open output stream using callback
        self.stream = self.pa.open(format=self.pa.get_format_from_width(self.wf.getsampwidth()),
                                   channels=self.wf.getnchannels(),
                                   rate=self.wf.getframerate(),
                                   output=True,
                                   stream_callback=self.callback)

        notNormalized = []
        self.chirp = []

        #read in chirp wav file to correlate against
        srate, notNormalized = scipy.io.wavfile.read(WAVFILE)
        for sample in notNormalized:
            #sample is a signed short in +/- 32768; normalize it to 1.0
            n = sample * self.SHORT_NORMALIZE
            self.chirp.append(n)

    def ping(self):
        #send ping of sound
        #set up input stream
        self.istream = self.pa.open(format=self.FORMAT,
                                    channels=1,  #The USB mic is only mono
                                    rate=self.RATE,
                                    input=True,
                                    input_device_index=self.device_index,
                                    frames_per_buffer=self.INPUT_FRAMES_PER_BLOCK)
        #start the stream
        self.stream.start_stream()
        #wait for stream to finish
        while self.stream.is_active():
            pass
        self.stream.stop_stream()
        #reset wave file for next ping
        self.wf.rewind()

    def listen(self):
        #record a short section of sound to capture the returned echo
        self.samples = []
        try:
            block = self.istream.read(self.INPUT_FRAMES_PER_BLOCK)
        except (IOError) as e:
            #Something bad happened during recording
            print("(%d) Error recording: %s" % (self.errorcount, e))
        count = len(block) / 2
        format = "%dh" % (count)
        shorts = struct.unpack(format, block)
        for sample in shorts:
            #sample is a signed short in +/- 32768; normalize it to 1.0
            n = sample * self.SHORT_NORMALIZE
            self.samples.append(n)
        self.istream.close()
        #Uncomment these lines to graph the samples. Useful for debugging.
        #matplotlib.pyplot.plot(self.samples)
        #matplotlib.pyplot.show()

    def correlate(self):
        #Perform correlation by multiplying the signal by the chirp, then shifting over one sample and doing it again.
        #The highest peaks correspond to the best matches of the original signal.
        #The highest peak will be when the mic picks up the speaker sending the ping. Secondary peaks represent echoes.
        #My audio system has a significant delay between when you send audio and when it plays. As such, we send the
        #wav, start recording, and get a large number of samples before we hear the ping. That just slows correlation
        #down, so we remove it. This probably requires tuning between different systems. Safest, but slowest, is zero.
        junkThreshold = 5000
        self.samples = self.samples[junkThreshold:]
        self.result = []
        for offset in range(0, len(self.samples) - len(self.chirp)):
            temp = 0
            for a in range(0, len(self.chirp)):
                temp = temp + (self.chirp[a] * self.samples[a + offset])
            self.result.append(temp)

    def clip(self):
        #The highest peak is the primary pulse. We don't need the audio before that, so strip it off.
        #The remaining highest peaks are echoes.
        largest = 0
        peak1 = 0
        for c in range(len(self.result)):
            if (self.result[c] > largest):
                largest = self.result[c]
                peak1 = c
        self.result = self.result[peak1:]
        return self.result


#main control code
#initialize sonar and display
sonar = Sonar()
display = SonarDisplay()

screenHeight = display.getHeight()
screenWidth = display.getWidth()
screenBlockSize = display.getBlockSize()  #size of each row in pixels

#Each ping results in an array of correlated values with peaks corresponding to the time that echoes were received.
#Send pings until we have reached the bottom of the display. An improvement would be to add scrolling.
y = 0
for a in range(0, screenHeight - screenBlockSize, screenBlockSize):
    sonar.ping()
    sonar.listen()
    sonar.correlate()
    result = sonar.clip()
    display.plotLine(result, y)
    y = y + screenBlockSize
```