5 tons of flacs
(July 25, 2015)
Here are some things I learned and remembered on the way to writing a flac decoder in C++11. I actually wrote most of this up in April, but I found this post sitting here basically done, so out it goes.
It's convenient to use bc for base conversions and general arithmetic. Just enter obase=16, type in any decimal number, and bc will tell you what it is in hex. ibase=16 is also useful for the reverse conversion. Be careful if you're changing ibase, because you'll have to enter the new obase in the base specified by ibase.
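For comparison, here is a rough Python sketch of the same conversions, including the ibase pitfall. The helper name to_base is made up for illustration; bc itself does all of this natively.

```python
def to_base(n, base):
    """Render a non-negative integer in the given base (2..16), like bc's obase."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        out.append(digits[n % base])
        n //= base
    return "".join(reversed(out))


# Like "obase=16" followed by a decimal number.
hex_text = to_base(255, 16)

# Like "ibase=16" followed by a hex number.
value = int("FF", 16)

# The pitfall: once ibase is 16, typing "10" means sixteen, not ten.
# To ask for decimal output at that point, ten has to be written as "A".
ten_typed_in_hex = int("A", 16)
sixteen_typed_in_hex = int("10", 16)
```

So after ibase=16, entering obase=10 leaves the output in hex, and obase=A is what switches it back to decimal.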
hexdump is cool! The -C flag is particularly useful since it lets you see bytes in hex and as chars (did you know the first four bytes of a valid flac file spell out 'fLaC'?)
flac --analyze file.flac dumps out flac information for you when you get tired of squinting at hexdump output and parsing files manually.
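The 'fLaC' marker is easy to check by hand. Below is a minimal Python sketch that tests for it and prints bytes in hex next to their characters, loosely in the spirit of hexdump -C (the exact column layout is not reproduced). Both function names are hypothetical.

```python
def looks_like_flac(data):
    """True if the data starts with the four-byte 'fLaC' marker."""
    return data[:4] == b"fLaC"


def hexdump_line(data, offset=0):
    """Format bytes as hex plus printable ASCII; other bytes show as '.'."""
    hex_part = " ".join("{:02x}".format(b) for b in data)
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in data)
    return "{:08x}  {}  |{}|".format(offset, hex_part, text)


print(hexdump_line(b"fLaC\x00"))
# 00000000  66 4c 61 43 00  |fLaC.|
```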
Flac encodes integers in big-endian format, except when they're encoded in unary format or encoded using an extrapolation of UTF-8 (this is a neat trick; it never occurred to me that you could use the byte-saving powers of UTF-8 for something other than...well, unicode characters). Another exception is tags, which store little-endian integers to comply with the vorbis spec. Delightful.
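To make the encodings concrete, here is a hedged Python sketch. It assumes the UTF-8-style scheme works the way UTF-8 does (the count of leading 1 bits in the first byte gives the total length, and each following byte carries 6 payload bits), stretched to at most 7 bytes. The function name is made up, and this is not the decoder from this post.

```python
def decode_utf8_style(data):
    """Decode one UTF-8-style integer; return (value, bytes_used)."""
    first = data[0]
    # Count the leading 1 bits of the first byte.
    ones = 0
    while ones < 8 and first & (0x80 >> ones):
        ones += 1
    if ones == 0:
        return first, 1                 # 0xxxxxxx: single byte
    if ones == 1 or ones == 8:
        raise ValueError("invalid leading byte")
    if len(data) < ones:
        raise ValueError("truncated input")
    value = first & (0x7F >> ones)      # payload bits left in the first byte
    for b in data[1:ones]:
        if b & 0xC0 != 0x80:            # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value, ones


# The same four bytes read as big-endian and as little-endian.
raw = bytes([0x00, 0x00, 0x10, 0x00])
as_big = int.from_bytes(raw, "big")
as_little = int.from_bytes(raw, "little")
```

Since the scheme is modelled on UTF-8, ordinary UTF-8 text makes a handy test input: decoding the bytes of "€" should give its code point, 0x20AC.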
Bitfields are not your friend. Especially when the architecture of your dev machine doesn't match with the endianness of the data. They look more idiot-proof than bitshifts and masking, but don't be fooled. Here Be Dragons.
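As a sketch of the shift-and-mask approach, here is how a 4-byte metadata block header could be unpacked in Python. The layout assumed here (a 1-bit last-block flag, a 7-bit block type, then a 24-bit big-endian length) follows the FLAC format description, and parse_block_header is a hypothetical name.

```python
def parse_block_header(header):
    """Unpack a 4-byte metadata block header using shifts and masks."""
    if len(header) != 4:
        raise ValueError("expected exactly 4 bytes")
    is_last = (header[0] & 0x80) != 0   # top bit of the first byte
    block_type = header[0] & 0x7F       # low 7 bits of the first byte
    # The remaining 3 bytes are a big-endian 24-bit length.
    length = (header[1] << 16) | (header[2] << 8) | header[3]
    return is_last, block_type, length
```

Because every shift is written out against explicit byte positions, the result does not depend on the endianness of the machine running it, which is exactly where bitfields bite.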
STL bitsets are nice...when they're enough. They are frequently not enough, especially if you're trying to slice and dice bytes. They are really convenient for printing out numbers in binary, though. I learned this particular trick from Katy.
Gstreamer0.10 without Pulseaudio
(March 06, 2015)
You used to be able to configure gstreamer0.10-using applications to output
sound using OSSv4 via gconf; I had the following in
<?xml version="1.0"?>
<gconf>
  <entry name="audiosrc" mtime="1390923248" type="string">
    <stringvalue>osssrc</stringvalue>
  </entry>
  <entry name="videosrc" mtime="1390923248" type="string">
    <stringvalue>v4l2src</stringvalue>
  </entry>
  <entry name="videosink" mtime="1390923248" type="string">
    <stringvalue>autovideosink</stringvalue>
  </entry>
  <entry name="audiosink" mtime="1390923248" type="string">
    <stringvalue>oss4sink</stringvalue>
  </entry>
</gconf>
It wasn't working after I reinstalled Linux on a new SSD (I suspected my old
drive was starting to fail).
It appears gstreamer now uses GSettings/dconf, so instead I ran the following
in a bash prompt.
music-audiosink was the only one I needed to get audio working in luakit, but
I figured that setting the rest of them couldn't hurt.
gsettings set org.freedesktop.gstreamer-0.10.default-elements music-audiosink oss4sink
gsettings set org.freedesktop.gstreamer-0.10.default-elements chat-audiosink oss4sink
gsettings set org.freedesktop.gstreamer-0.10.default-elements sounds-audiosink oss4sink
If you're using plain old ALSA, you can replace that with the following.
gsettings set org.freedesktop.gstreamer-0.10.default-elements music-audiosink alsasink
gsettings set org.freedesktop.gstreamer-0.10.default-elements chat-audiosink alsasink
gsettings set org.freedesktop.gstreamer-0.10.default-elements sounds-audiosink alsasink
Also useful for exploration are list-keys and get, which are used like this:
gsettings list-keys org.freedesktop.gstreamer-0.10.default-elements
gsettings get org.freedesktop.gstreamer-0.10.default-elements music-audiosink
Fun with Microphones
(March 03, 2015)
In the singing world, there are (at least) two different ways to produce sounds. They're called chest voice and head voice, possibly after the way they feel when you're using them: with chest voice, you feel vibrations mostly in your chest, and with head voice, you feel them mostly in your head (possibly sinus cavities? They're hollow and in approximately the right area).
Chest voice is the mechanism used when you're speaking, and is generally said to have more "colour" than head voice. Okay, sure, but what's colour?
Here's a recording of me singing the F above middle C with head voice and then chest voice, and a screenshot of the file when run through the spectrum analyzer program baudline.
Spectrogram (warning, this is kinda huge)
First, some background
Rarely in real life do you encounter a pure tone; rather, you hear a pure fundamental frequency accompanied by (also pure) overtones. The strength and pitch of these overtones determine the timbre (character) of the sound you're hearing. When it comes to human speech, the strengths of these overtones relative to one another determine the vowel sound that you hear (perhaps I'll explain this in a future post). A formant is, roughly speaking, a frequency range in which the overtones are especially loud. They're caused by resonance of your voice in your mouth, nasal cavities, chest, etc.
Frequently (hurrhurrhurr), overtones are integer multiples of the fundamental frequency. Such overtones are called harmonics. If $f$ is the fundamental frequency, $kf$ is called the $k$th harmonic.
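As a small worked example of $kf$, the Python sketch below lists the harmonics of a fundamental. The value used for the F above middle C, 349.23Hz, is an assumption (equal temperament with A at 440Hz); the pitch actually sung will have been slightly different.

```python
# Assumption: F above middle C in equal temperament with A4 = 440 Hz.
F4_HZ = 349.23


def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics k*f for k = 1..count."""
    return [k * fundamental_hz for k in range(1, count + 1)]


for k, freq in enumerate(harmonics(F4_HZ, 5), start=1):
    print("harmonic {}: {:.2f} Hz".format(k, freq))
```

With that assumed pitch, the 23rd harmonic lands at roughly 8kHz.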
I annotated the harmonic numbers in red and spraypainted in yellow anything that looked like it might have been a formant. Frequency (pitch) is plotted horizontally, with brighter pixels representing a stronger pitch presence. Each horizontal strip is a slice of time, and the higher a strip is, the earlier that time slice was.
You can see how all harmonics up to about 23 were present when I was using chest voice, though some were stronger than others. Fewer show up for head voice, but I'm not sure if they were actually not there or if my microphone was too shitty/the noise floor was too high/I was singing louder for the second tone.
Oh well, at least there were some things that my microphone did pick up for sure. The formant around harmonics 20-22 seems to be completely absent in my head voice, and the formant centred around the $10$th harmonic seems to have split up; the $9$th harmonic is visibly quieter than the $8$th or $10$th.
As an aside, I find it interesting that the second and third harmonics are louder than the fundamental frequency. I'm not sure if this is a peculiarity of my voice or of voices in general.
Other weird stuff my microphone picked up
There's also some other stuff in the recording. In the space between the two tones, there are a few extra lines that aren't accounted for.
My laptop uses an SSD, so the hard drive wasn't producing noise. My laptop
does, however, have a fan. I ran the command
sensors to get fan speed.
I figured that the fan speed didn't change much after I finished recording.
I got 3530RPM, which translates to just under 60Hz (divide by 60 to go from RPM
to Hz because there are 60 seconds in a minute).
If you squint, you can see that the barely visible first harmonic is around 59.22Hz, which seems about right. Okay, that explains the lines at 178Hz, 296Hz, 355Hz, 414Hz and so forth all the way up to 1706Hz. It's hard to tell if the lowest even harmonics are actually not there or just lost in the noise, though.
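The arithmetic here can be sketched in a few lines of Python. It converts the fan speed to a frequency and then, for the four lowest lines read off the spectrogram, finds the nearest multiple of the 59.22Hz fundamental. The function names are made up for illustration.

```python
def rpm_to_hz(rpm):
    """Revolutions per minute to revolutions per second."""
    return rpm / 60.0


def nearest_harmonic(freq_hz, fundamental_hz):
    """Return (k, error_hz) for the multiple k*fundamental closest to freq_hz."""
    k = round(freq_hz / fundamental_hz)
    return k, freq_hz - k * fundamental_hz


print("fan: {:.2f} Hz".format(rpm_to_hz(3530)))
for line_hz in (178, 296, 355, 414):
    k, err = nearest_harmonic(line_hz, 59.22)
    print("{} Hz is near harmonic {} (off by {:+.2f} Hz)".format(line_hz, k, err))
```

Those four lines come out as harmonics 3, 5, 6 and 7, each within 1Hz of an exact multiple.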
On reflection, my fan does sound kind of like a saw, so this is not terribly surprising.
I don't know what's going on around 8750Hz, but I wouldn't be surprised if it was related to the fan. A lot of those frequencies are pretty close to multiples of 59.22.
Also, what's this all the way over at 19918Hz? It's far too high to be NTSC flyback whine, and too high to be an FM radio pilot tone (not that those should ever be audible anyway...)
Mysteries for another day, I guess.
- Loading WebKit2Gtk Web Extensions - December 02, 2014
- Steinitz' theorem - September 28, 2013
- PMATH 351 Final Exam Definitions - December 04, 2012
- PMATH 351 Midterm Definitions - October 31, 2012
- L'Hopital's Rule - October 07, 2012
- Chicken thyme - September 13, 2012
- Some advice from Miss Havisham - September 05, 2012