For many people in New Zealand our native birdsong is drowned out by the sounds of modern life. Whilst our cities are unavoidably noisy places, many rural areas also experience man-made noise from road traffic, agriculture and other activities.
The OurTrees forest is located in a remote part of the West Coast. The unsealed road through our land has very little vehicle traffic. Our bird recorders aim to capture clean recordings of our Beech forest environment and the birds who live there.
In this post I will:
- describe our prototype bird recorder built with open-source software/hardware
- illustrate the audio processing used to pull individual bird calls out of the raw recordings
- list the open-source audio processing/editing software used
- review some potential applications of this technology
Our bird recorder is designed to sit in the forest unattended for up to 6 weeks at a time. It will wake up 4 times per day (dawn, noon, dusk, midnight) to record audio via a 4-channel microphone array. After recording for 20 minutes it goes to sleep until the next scheduled recording time.
A webcam image is captured at the start of each recording. This gives us an idea of the conditions (sun, mist, rain, night/day etc.) at the time of recording.
As our forest is off-grid with no cellphone or internet access, the device simply records and stores the audio for later collection.
After some experimentation with recording formats, we are currently storing the audio as 48 kHz uncompressed stereo WAV files. These are of sufficient quality to support the post-processing required to extract birdcalls from the raw recording.
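To get a feel for what this format means for storage, here is a rough back-of-the-envelope calculation. The sample rate, channel count, recording length and schedule come from this post; the 16-bit sample size is an assumption, as the bit depth is not stated here.

```python
# Rough storage estimate for one deployment.
# From the post: 48 kHz, stereo, uncompressed WAV, 20-minute recordings,
# 4 recordings per day, up to 6 weeks unattended.
# Assumption (not stated in the post): 16-bit samples.

SAMPLE_RATE_HZ = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2          # assumed 16-bit PCM
RECORDING_SECONDS = 20 * 60
RECORDINGS_PER_DAY = 4
DEPLOYMENT_DAYS = 6 * 7


def recording_bytes() -> int:
    """Size of the audio data in one recording (WAV header ignored)."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * RECORDING_SECONDS


def deployment_bytes() -> int:
    """Size of the audio data for a full six-week deployment."""
    return recording_bytes() * RECORDINGS_PER_DAY * DEPLOYMENT_DAYS


if __name__ == "__main__":
    print(f"Per recording:  {recording_bytes() / 1e6:.1f} MB")   # 230.4 MB
    print(f"Per deployment: {deployment_bytes() / 1e9:.1f} GB")  # 38.7 GB
```

Under that assumption a single 20-minute recording is roughly 230 MB, and a full six-week deployment would need a little under 40 GB of storage.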
More details on testing of our bird recorder can be found in our post Field Testing – Bird Recorder.
Raw audio files are processed and individual bird calls are extracted into audio clips using the Audacity sound editor.
Each Clip is then processed with additional noise reduction and filtering to improve the signal-to-noise ratio and clarity of the bird call before adding it to the clip database.
Clip creation is currently a manual process. The goal is to assemble a clip database that can be used to bootstrap training of a Machine Learning model.
Once trained, our Machine Learning Model will count and classify birdcalls in bulk audio files captured by our bird recorders.
Click the play button below. You will hear that the birdsong is at a very low level in the raw audio file and is barely audible.
Now we use Audacity to clean and boost the raw audio. The effects are visible in both the Waveform and Spectrogram views of the clip as we apply the Normalization and Noise Reduction steps.
After processing, the clip no longer sounds 100% natural. However, the cadence and frequency range of the processed clip produce a spectrogram image suitable for input to a machine-learning application.
Click play below to hear the end result.
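For readers who prefer code to screenshots, here is a minimal numpy sketch of the same idea: remove unwanted low-frequency noise, then boost the quiet clip. It is not what Audacity does internally. The crude FFT high-pass is a simple stand-in for Audacity's Noise Reduction effect, and the function names, the 500 Hz cutoff and the -1 dB target are all illustrative assumptions.

```python
import numpy as np


def peak_normalize(samples: np.ndarray, target_db: float = -1.0) -> np.ndarray:
    """Remove DC offset, then scale so the loudest sample sits at target_db dBFS."""
    centered = samples - np.mean(samples)
    peak = np.max(np.abs(centered))
    if peak == 0:
        return centered  # silent clip, nothing to scale
    target_amplitude = 10 ** (target_db / 20)
    return centered * (target_amplitude / peak)


def fft_high_pass(samples: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    """Crude high-pass filter: zero every frequency bin below cutoff_hz."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(samples))


if __name__ == "__main__":
    sample_rate = 48_000
    t = np.arange(sample_rate) / sample_rate        # one second of audio
    call = 0.01 * np.sin(2 * np.pi * 3000 * t)      # quiet synthetic "bird call"
    rumble = 0.05 * np.sin(2 * np.pi * 60 * t)      # louder low-frequency rumble
    raw = call + rumble

    # Filter first, so the rumble does not decide how much gain is applied.
    cleaned = peak_normalize(fft_high_pass(raw, sample_rate, 500.0))
    print(f"peak before: {np.max(np.abs(raw)):.3f}")
    print(f"peak after:  {np.max(np.abs(cleaned)):.3f}")
```

The order matters in this sketch: normalizing before filtering would set the gain from the loud rumble rather than from the bird call.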
Under ideal conditions (fine weather, no wind) the level of background noise is low, making it easier to extract and process the bird calls. On other occasions, however, we encounter challenging audio to work with. This may contain:
- Rain – the drops hitting tree leaves and dripping onto the forest floor create a significant amount of background noise. Drops that hit the microphone result in volume spikes which largely negate the Normalization step when processing the audio.
- Wind – leaves rustling in the breeze create fairly constant background noise, which can be removed via noise reduction. However, gusts hitting the microphone create large transients, making sections of the recording unusable. Locating the recorder in a sheltered spot has been proven to reduce wind noise.
- Insects and other critters – the microphone picks up insects as well as birds. A mosquito near the mic sounds like a dentist drill. A nearby bee or wasp sounds like a jumbo jet. The occasional tree frog is always a nice surprise though.
- Mechanical noise – airplanes, choppers and vehicles passing on the gravel road are all picked up by the microphone. Luckily these sounds are easily identifiable when viewing the audio waveform, so they can be removed before post-processing the remaining audio.
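The raindrop and wind-gust problems above come down to short, loud transients. The sketch below shows why a single spike defeats peak normalization, and one simple way such spikes could be flagged automatically. The function name, the 50 ms window and the 8x threshold are illustrative assumptions, not values from our processing.

```python
import numpy as np


def find_spike_windows(samples, sample_rate, window_s=0.05, ratio=8.0):
    """Return start times (s) of windows whose RMS exceeds ratio x the median RMS.

    window_s and ratio are illustrative guesses, not tuned values.
    """
    window = int(window_s * sample_rate)
    n_windows = len(samples) // window
    frames = samples[: n_windows * window].reshape(n_windows, window)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    threshold = ratio * np.median(rms)
    return np.flatnonzero(rms > threshold) * window_s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 48_000
    audio = 0.01 * rng.standard_normal(2 * sample_rate)   # 2 s of quiet background
    audio[sample_rate : sample_rate + 480] = 0.9          # 10 ms synthetic "raindrop" hit

    # Peak normalization scales by 1 / peak, so the spike alone sets the gain.
    print(f"gain with spike:    {1 / np.max(np.abs(audio)):.1f}x")
    quiet_only = np.delete(audio, np.s_[sample_rate : sample_rate + 480])
    print(f"gain without spike: {1 / np.max(np.abs(quiet_only)):.1f}x")
    print("spike windows start at (s):", find_spike_windows(audio, sample_rate))
```

Using the median as the reference level keeps the threshold tied to the background noise, so a handful of loud windows cannot drag it upwards.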
Machine Learning – The Hard Stuff
According to my research the most common approach to audio classification uses Spectrograms. In simple terms a spectrogram provides a visual representation of sound. This enables us to apply the following method in classifying birds in our recordings:
- Create a spectrogram image for each Clip in our audio Clip Database
- Use image-recognition/object-matching techniques to search for spectrogram patterns that look like our clip inside a bulk audio recording
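The first step in the list above, turning a clip into a spectrogram, can be sketched with numpy alone. This is a minimal illustration of the technique (a short-time Fourier transform), not the code used to produce the images in this post; the function name, frame length and hop size are assumptions.

```python
import numpy as np


def spectrogram_db(samples, sample_rate, frame_len=1024, hop=256):
    """Short-time Fourier transform power in dB.

    Returns (freqs_hz, times_s, power_db), where power_db has shape
    (frame_len // 2 + 1, n_frames). Requires len(samples) >= frame_len.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack(
        [samples[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    power_db = 10 * np.log10(power + 1e-12).T   # small offset avoids log(0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs, times, power_db


if __name__ == "__main__":
    sample_rate = 48_000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 3000 * t)          # synthetic 3 kHz test tone
    freqs, times, power_db = spectrogram_db(tone, sample_rate)
    print("spectrogram shape (bins, frames):", power_db.shape)
    print("strongest frequency in first frame (Hz):", freqs[np.argmax(power_db[:, 0])])
```

The resulting 2-D array is the "image": frequency runs along one axis, time along the other, and each value is the loudness at that frequency and moment. It can be rendered with a plotting library such as matplotlib.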
In the computer science field extensive research has been conducted in the areas of Shape Matching, Object Recognition and Facial Recognition which in turn has driven advances in the use of Machine Learning and AI within this problem space.
Here is a spectrogram of a single call from a Morepork (NZ native Owl).
And here is a longer section of the audio showing repeated calls by the same bird. You will hear some other birds calling in the background. Note the consistent visual appearance of the Morepork calls.
This video illustrates the process
Now that we have isolated the bird call we can use a Sliding Window approach to move through our large audio file looking for a visual pattern match with the spectrogram from our single bird call as follows.
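A minimal sketch of the Sliding Window idea, assuming both the bulk recording and the single call have already been converted to spectrogram arrays with the same frequency bins. It scores each time offset with a plain normalized cross-correlation, which is a simple stand-in for the machine-learning matching discussed in this post. The function name is hypothetical.

```python
import numpy as np


def sliding_match_scores(spec, template):
    """Slide template along the time axis of spec; return one score per offset.

    Both arrays are (n_freq_bins, n_frames) spectrograms with the same
    n_freq_bins. Each score is a normalized cross-correlation in [-1, 1].
    """
    n_bins, t_len = template.shape
    if spec.shape[0] != n_bins or spec.shape[1] < t_len:
        raise ValueError("template must fit inside spec with matching frequency bins")
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    scores = np.zeros(spec.shape[1] - t_len + 1)
    for offset in range(len(scores)):
        window = spec[:, offset : offset + t_len]
        w = window - window.mean()
        denom = np.linalg.norm(w) * t_norm
        scores[offset] = float(np.sum(w * t) / denom) if denom > 0 else 0.0
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bulk = rng.random((32, 200))        # stand-in for a long recording's spectrogram
    call = rng.random((32, 10))         # stand-in for a single call's spectrogram
    bulk[:, 120:130] = call             # hide one copy of the call in the recording
    scores = sliding_match_scores(bulk, call)
    print("best match at frame offset:", int(np.argmax(scores)))
```

Every offset needs its own multiply-and-sum over the whole window, which is where the computational cost comes from on long recordings.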
This requires a lot of computation. Luckily we now have Machine Learning libraries which leverage GPU processing power to perform these operations in a highly parallel manner. Our goal is to re-purpose some retired cryptocurrency mining rigs to provide the GPU compute power for processing the bulk audio data collected by our Bird Recorders.
Potential applications of this technology include:
- Measuring Bird Species distribution and population density in a given region
- Monitoring the effect of Predator Control operations by comparing data collected Before/After application of predator control measures in a given region
- Bulk collection of audio data for future analysis (as ML/AI technology becomes more advanced)
At this time much of the audio processing is performed manually. This obviously limits our ability to process the large amounts of audio data being collected by our Bird Recorder. In working towards using ML/AI to streamline this processing I will be focusing on the following areas:
- Building up the Clip Database of individual bird calls in our forest
- Automating analysis of bulk audio files to determine which Filter Chain to use (noise reduction, wind removal, raindrop removal), and automatically tagging files that are unusable (due to excessive wind/rain noise etc.)
- Automating the post-processing of raw recordings via pre-defined Filter Chains that take into account factors such as Microphone Noise, Night/Day, Windy/Calm, Fine/Rainy etc.
- Applying these techniques in processing the 30GB of audio data collected during field trials of our bird recorder.
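As a very rough illustration of the Filter Chain idea in the list above, the sketch below picks an ordered list of processing steps from the recording conditions. Nothing here exists yet: the function name and step names are hypothetical placeholders, and detecting the conditions automatically is the hard part that this sketch leaves out.

```python
def choose_filter_chain(windy: bool, rainy: bool) -> list[str]:
    """Return an ordered list of (hypothetical) processing step names.

    Transient removal comes first so that spikes from gusts or raindrops
    do not distort the later noise reduction and normalization steps.
    """
    chain = []
    if windy:
        chain.append("wind_removal")
    if rainy:
        chain.append("raindrop_removal")
    chain += ["noise_reduction", "normalize"]
    return chain


if __name__ == "__main__":
    print(choose_filter_chain(windy=False, rainy=False))
    print(choose_filter_chain(windy=True, rainy=True))
```

Keeping each chain as plain data would make it easy to add further conditions, such as Night/Day or microphone type, later on.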
I draw inspiration (and borrow code and ideas) from the ongoing work in birdsong classification by people much smarter than me.
These projects in particular have helped guide my explorations to date:
- The Cacophony Project – fellow Kiwis working on targeted predator control solutions using machine-vision, audio lures and other cool stuff
- AudioNotebooks by Kyle McDonald
- Bird Sounds – a Google Experiment by Kyle McDonald and Manny Tan
- Automatic acoustic detection of birds through deep learning – Dan Stowell and his team
Open-source software used on this project includes:
- Base platform – GNU/Linux, Python, Bash on a Raspberry Pi A+
- ML/AI – TensorFlow, Keras, numpy, scipy, matplotlib
- Audio processing – Audacity, librosa, sox, flac, lame
About the Author
I’m Dave Pugh, founder of The OurTrees Project.
My work is based around our 30-acre Native Beech Forest in the West Coast region of New Zealand. Having worked in Software Development for 35+ years my passion is in applying technology solutions to improve measurement, monitoring and management of our native forests. email@example.com