Sunday, December 22, 2013

New Project - Sonar experiments

Digital Sonar Experiment Part 1

For the last 10 years or so, I have wanted to build an underwater robot of some sort. I assume this is the result of reading too much Cussler, and doing a bit of SCUBA diving, but I can't seem to shake it. I've been following a number of projects, partitcularly OpenROV, with great interest. One of the challenges that does not seem to have been tackled much by hobbyists is an affordable underwater sonar. There are plenty of inexpensive sonar units for land and aerial robotics - the AR Drone makes good use of them for low altitude sensing - but there don't seem to be any for water use yet.

I figured it made the most sense to understand the algorithms and get them debugged in air first, since sound travels much faster in water, and there are issues with acoustic coupling and waterproofing in water that are not a problem in air.

Although I intend to build the unit with a microprocessor board eventually, I chose to explore what could be done with a sound card first, just for ease of programming. I found that it has been done before, successfully, and documented well by this gentleman. His page gave some very useful information about echo detection, introducing me to the idea of correlation functions to detect a short signal in another signal stream. With that information, I found an online digital signal processing (DSP) book, that discussed correlation functions and their uses.

Echo Detection

I knew nothing of signal processing when I started this, I was surprised that it's not really difficult from an algorithm standpoint. It does involve a fair amount of computation. Essentially, an audio signal, once normalized, consists of a bunch of amplitude values between -1 and 1 at some sampling rate. For human audible frequencies, that's commonly 44100 samples per second. The reason for this is that for perfect reproduction of a signal, you must sample it at twice the frequency you are looking to reproduce. This is the Nyquist Rate. Therefore, 44100 hz should be able to perfectly reproduce the signal up to 22050 hz, which I figured was the limit of my laptop's audio hardware. A second's worth of audio is 44100 float values between -1 and 1.

The simplest way of performing a correlation function is to load one array with the reference signal you are looking for - the ping - and another array with the recorded results. Start with the recorded array at position 0, and multiply each value in the chirp with the next $chirplength samples from the recorded audio. Sum the products of each mutiplication (essentially calculating the area under the resulting curve) and record it into a results array at position 0. Shift to position 1 in the recorded audio array, and repeat, this time storing the results in position 1 of the results array. Repeat for all positions in the recorded audio array, up until you hit the spot where the chirp length would pass the end of the recorded audio array, and stop.

What you are doing is creating a sliding window, comparing the chirp to each chunk of audio in the recorded audio array. The output of the correlation is really interesting - when the audio is a very close match, you get a large positive spike in the results that correspond to a position in the recorded audio. That spike represents a copy of your chirp in the recorded audio. The first time you see it, it's the chirp your speaker sent. After that, it's echoes of that chirp off objects.

Knowing the position of each echo, you can compute the time that it took from when the primary pulse was sent with your sample rate. That's the time the sound was in flight. Divide by two for the round trip, compare against the speed of sound, and boom, you've got distance to your target. Neat, huh?

The correlation function works best on a chirp signal - audio that changes in frequency over time. The chirp must be very short, or the echo will arrive before the chirp finishes playing. I found that chirps in the 1.5 ms range gave best results for me - it results in a minimum range of 3-4 feet. I generated the chirp in Audacity, sweeping from 4000-10000 hz at full amplitude.

Here's a recorded slice of audio, showing the chirp, some echoes and background noise.

Here's the output of the correlation function:

Once you can do it for a single chirp, you can assign an intensity to the spikes, and graph them over many pulses. This gives you a fish finder type view - in this example, time goes from top to bottom, and distance from left to right. Note the noise on the far left side of the image - that's probably caused by the speaker continuing to "ring" for a few milliseconds after the signal stops.

I chose WinPython with the PyAudio library to test with, because it comes bundled with graphing libraries and other useful things. I used a USB combined microphone/speaker intended for chat, because I could move it around and it was pretty directional.

It works - here's a sample run showing the returns with the speaker/mic starting near the wall, and then moving slowly away, then back in, etc. You can see a white line that look like peaks and valleys corresponding to the distance, along with background noise and multipath echoes. The minimum distance was about 3 feet - the maximum was about 8.

The left side of the screen represents the location of the sensor - imagine it's attached to the hull of a boat. White and grey dots represent echoes at different distances. The white line that is produced as I bring the speaker/mic away from the wall gets farther to the right, indicating more time in flight, and longer distances. As I bring it closer to the wall, it gets closer to the left side of the screen. One way to visualize this - if the sensor was mounted on a boat, the line would show the depth of water over the bottom. You also get weaker returns for sounds that have bounced around on indirect paths, or schools of fish.

Video of operation:

Sound Card Sonar Experiment 1 from Jason Bowling on Vimeo.

Next steps:

1) Implement it on a nice fast micro, probably a Stellaris Launchpad.
2) Get it working in water with new transducers
3) If possible, at some point, maybe: Sidescan. :-)

Ideas for improvement? Did you find it useful? I'd appreciate a comment if so. Thanks!

Code follows:

import pyaudio
import wave
import struct
import math
import pylab
import scipy
import numpy
import matplotlib
import os
import threading
import datetime
import time
from Tkinter import *

class SonarDisplay:
     #Displays greyscale lines to visualize the computed echos from the sonar pings. Time and distance increase as you go from left to right. It's like a fishfinder rotated on its side.

     def __init__(self):
        self.SCREENHEIGHT = 1000
        self.SCREENWIDTH = 1024
        self.SCREENBLOCKSIZE = 2
        master = Tk()
        self.w = Canvas(master, width=self.SCREENWIDTH, height=self.SCREENHEIGHT, background="black")

     def getWidth(self):
       return self.SCREENWIDTH

     def getHeight(self):
       return self.SCREENHEIGHT

     def getBlockSize(self):
       return self.SCREENBLOCKSIZE

     def plotLine(self, result, y):
        x = 0
        numJunkSamples = 250
        intensityLowerThreshold = .26 #peaks lower than this don't get plotted, since they are probably just background noise
        #on my machine, the first couple hundred samples are super noisy, so I strip them off
        #these are the first samples immediately after the peak that results from the mic hearing the ping being sent
        #Until the echo is clear of these (about 3 feet in air) it is hard to pull out of the noise, so that is minimum range
        result = result[numJunkSamples:]
        limit = self.SCREENWIDTH / self.SCREENBLOCKSIZE
        if limit < len(result):
          limit = len(result) 
        for a in range(0,len(result)):
          intensity = 0
          if (result[a] > intensityLowerThreshold):
             intensity = result[a] * 255.0
          if (intensity > 255):
               intensity = 255
          if (intensity < 0):
               intensity = 0
          rgb = intensity, intensity, intensity
          Hex = '#%02x%02x%02x' % rgb
          self.w.create_rectangle(x, y, x + self.SCREENBLOCKSIZE, y + self.SCREENBLOCKSIZE, fill=str(Hex), outline=str(Hex))
          x = x + self.SCREENBLOCKSIZE    

class Sonar:

     #Mic initialization and audio recording code was taken from this example on StackOverflow:
     #Playback code based on Pyaudio documentation

     def callback(self, in_data, frame_count, time_info, status):
       data =
       return (data, pyaudio.paContinue)

     def __init__(self):
        self.FORMAT = pyaudio.paInt16 
        self.SHORT_NORMALIZE = (1.0/32768.0)
        CHANNELS = 2
        self.RATE = 44100  
        INPUT_BLOCK_TIME = .20
        WAVFILE = "test.wav"

        print "Initializing sonar object..."
        #Load chirp wavefile =, 'rb')        
        #init pyaudio = pyaudio.PyAudio()

        #identify mic device
        self.device_index = None            
        for i in range( ):     
           devinfo =   
           print( "Device %d: %s"%(i,devinfo["name"]) )

        for keyword in ["mic","input"]:
           if keyword in devinfo["name"].lower():
              print( "Found an input: device %d - %s"%(i,devinfo["name"]) )
              self.device_index = 1    # I selected a specific mic - I needed the USB mic. You can select an input device from the list that prints.
        if self.device_index == None:
           print( "No preferred input found; using default input device." )

        # open output stream using callback =,
        notNormalized = []
        self.chirp = []

        #read in chirp wav file to correlate against
        srate, notNormalized =
        for sample in notNormalized:
           # sample is a signed short in +/- 32768. 
           # normalize it to 1.0
           n = sample * self.SHORT_NORMALIZE

     def ping(self):
        #send ping of sound

        #set up input stream
        self.istream =   format = self.FORMAT,
                            channels = 1,  #The USB mic is only mono
                            rate = self.RATE,
                            input = True,
                            input_device_index = self.device_index,
                            frames_per_buffer = self.INPUT_FRAMES_PER_BLOCK)

        # start the stream
        # wait for stream to finish 
        #reset wave file for next ping

     def listen(self):
       #record a sort section of sound to record the returned echo

       self.samples = []
            block =
       except (IOError) as e:
            # Something bad happened during recording
            print( "(%d) Error recording: %s"%(self.errorcount,e) )
       count = len(block)/2
       format = "%dh"%(count)
       shorts = struct.unpack( format, block )
       for sample in shorts:
           # sample is a signed short in +/- 32768. 
           # normalize it to 1.0
           n = sample * self.SHORT_NORMALIZE

       #Uncomment these lines to graph the samples. Useful for debugging.

     def correlate(self):
    #perform correlation by multiplying the signal by the chirp, then shifting over one sample and doing it again. Highest peaks correspond to best matches of original signal.
    #Highest peak will be when the mic picks up the speaker sending the pings. Then secondary peaks represent echoes.
       #my audio system has a significant delay between when you send audio and when it plays. As such, we send the wav, start recording, and get a large number of
       #samples before we hear the ping. That just slows correlation down, so we remove it. This probably requires tuning between different systems. Safest, but slowest, is zero.       

       junkThreshold = 5000  
       self.samples = self.samples[junkThreshold:]
       self.result = []

       for offset in range(0, len(self.samples)-len(self.chirp)):
           temp = 0
           for a in range(0, len(self.chirp)):
               temp = temp + (self.chirp[a] * self.samples[a + offset])

     def clip(self):
       #highest peak is the primary pulse. We don't need the audio before that, or the chirp itself. Strip it + chirpLength off. Remaining highest peaks are echoes.
       largest = 0
       peak1 = 0        
       for c in range(len(self.result)):
            if (self.result[c] > largest):
               largest = self.result[c]
               peak1 = c
       self.result = self.result[peak1:]
       return self.result

#main control code

#initialize sonar and display  
sonar = Sonar()
display = SonarDisplay()

screenHeight = display.getHeight()
screenWidth = display.getWidth()
screenBlockSize = display.getBlockSize()   #size of each row in pixels

#each ping results in an array of correlated values with peaks corresponding to the time that echoes were recieved.
#send pings until we have reached the bottom of the display. An improvement would be to add scrolling.
y = 0

for a in range(0,screenHeight-screenBlockSize,screenBlockSize):
 result = sonar.clip()     
 display.plotLine(result, y)  
 y = y + screenBlockSize


  1. what if you used four microphones instead of one, and you read the mic data at very short intervals and in each interval you used the slight delays between the mics picking the signal up to compute a 3d location. the intervals would be even shorther than the click, so you could gather many 3d points per click.

  2. Hi. Nice piece of work. One thing: You must sample at greater-than the Nyquist rate. If you look at a plot of a sin(t), you will see that sampling at exactly the Nyquist rate will give you a constant value. Output will be zero if the sampling and signal have no phase difference. This is a common error, probably because it is stated incorrectly in so many text books. Wikepedia has it right You need to band-limit to the Nyquist rate and sample at greater-than. A lot greater-than unless you want a slowly varying result. 8 times faster is a good place to start.

    Take a look at synthetic aperture methods. You can get resolution that is independent of range - with a lot more computing. If you find an affordable way to chirp in water with reasonable power, I would love to hear about it. I have been trying various things for years.

  3. John: That's a neat idea. I know you can do similar things with radio for direction finding... I would think it could be made to work with sounds too.

  4. Mr. Springer: Thanks very much for the clarification, and the pointer to the synthetic aperture methods, I'll do some reading on both to understand them better. I would be interested in learning what has and has not worked for your in your experiments - if you have some time and wouldn't mind discussing it, feel free to drop me an email. I'm thinking that the availability of cheap computing hardware (like the Launchpad and it's kin) should open up new possibilities for hobbyists that would have been expensive or bulky before now.

    1. I can not find how to extract your email from the blog or profile. -Springer

    2. I am kb8rnu at Google's standard mail domain. Forgive the obfuscation, trying to avoid the spam. :-)

  5. Nice project! I ran into a few problems when I tried to run your code at first but they were all solved. First I had to generate the chirp file, but you had nice instructions of how to do that. I could only choose whole milliseconds as duration though. I also had a few runtime problems, it turned out I had an old version of PyAudio.

    To other people out there: make sure you have at least PyAudio v0.2.7 installed, the debian repos only have v0.2.4, but v0.2.7 can be downloaded as a .deb file at

  6. Hey, thanks for the feedback! Could you provide a little information about your setup? I'm curious what you were using for a mic and speaker, as well as which OS you were using if you don't mind sharing.

    By the way, you can get fractions of a millisecond on your chirp in Audacity by selecting "samples" as the unit when generating the chirp rather than time.

  7. This comment has been removed by the author.

  8. Sorry, on re-reading I understand you're running Debian.

  9. I tried to run this code on a yoga 14 laptop under ubuntu. It wasn't playing the audio. I learned that very short audio files may not play due to buffering...
    I added an extra 0.5ms of silence to the chirp file, and it plays fine now.

    1. Ok... that solution introduces cascades of new problems. I'm not sure how to get this to work.