British Sign Language?
Could you not sign your secret message? Naturally, using no sound whatsoever.
Decryption is difficult and computationally expensive. So what if, instead of decrypting the content of a message, you found a correlation between the encrypted data and its meaning – without having to crack the code itself? Such an approach has been demonstrated by a group of University of North Carolina linguists working …
" ... patterns end up being reflected in the size of the data frame ... When the data created by CELP is encrypted, it retains the original frame size ... "
Even if you have no expertise in crypto at all (I certainly don't) that's pretty shocking. I suppose oversights can't really be avoided completely but this looks like a total absence of healthy paranoia.
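For anyone who wants to see the quoted point in action, here is a minimal sketch. The assumptions are mine: the frame sizes are invented, and the "cipher" is a toy XOR keystream built from SHA-256, standing in for any length-preserving stream cipher. It is not what Skype actually uses. The only thing it shows is that the ciphertext lengths an eavesdropper sees match the codec's frame lengths exactly.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_frame(key: bytes, index: int, frame: bytes) -> bytes:
    """XOR the frame with a per-frame keystream; output length equals input length."""
    nonce = index.to_bytes(8, "big")
    ks = keystream(key, nonce, len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))


# Hypothetical variable-bitrate frame sizes in bytes (made up for this sketch).
frame_sizes = [12, 31, 45, 20, 12, 38]

key = os.urandom(32)
frames = [os.urandom(n) for n in frame_sizes]  # stand-ins for encoded voice frames
ciphertexts = [encrypt_frame(key, i, f) for i, f in enumerate(frames)]

# What an eavesdropper can measure without knowing the key.
observed = [len(c) for c in ciphertexts]
print(observed)
```

The bytes themselves are scrambled, but the list of lengths comes through untouched, and that list is the side channel.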
The problem with Skype and other VoIP data is that they have to deal with the dual constraints of data and time. Time because voice communication has to be as close to realtime as possible in order for it to be of any practical use. That doesn't leave a whole lot of wiggle room for dealing with the data constraint of necessarily varying sizes (varying because of the need to optimize bandwidth in a tight pipeline like mobile or remote phoning).
Perhaps the best approach for the problem would be to find a computationally-modest but acceptable-quality voice codec that outputs at a constant bitrate. Then, with help from key rotation, phoneme reconstruction gets stymied because cryptanalysts will have to deal with uniform packets.
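A rough way to get uniform packets without changing the codec is to pad every frame to a fixed size before encrypting it. To be clear, this is padding, not a true constant-bitrate codec, and the packet size, the one-byte length prefix and the frame sizes below are all my own assumptions for the sake of the sketch. It also puts a number on the extra bandwidth this costs.

```python
PACKET_SIZE = 64  # hypothetical fixed payload size in bytes


def pad_frame(frame: bytes, size: int = PACKET_SIZE) -> bytes:
    """Prefix the frame with its length (1 byte) and zero-fill up to `size` bytes."""
    if size > 256:
        raise ValueError("a 1-byte length prefix only supports sizes up to 256")
    if len(frame) > size - 1:
        raise ValueError("frame does not fit in the fixed packet size")
    return bytes([len(frame)]) + frame + bytes(size - 1 - len(frame))


def unpad_frame(padded: bytes) -> bytes:
    """Recover the original frame using the length prefix."""
    length = padded[0]
    return padded[1:1 + length]


# Hypothetical variable-bitrate frame sizes in bytes (made up for this sketch).
frame_sizes = [12, 31, 45, 20, 12, 38]
frames = [bytes([i]) * n for i, n in enumerate(frame_sizes)]

# Pad first, then encrypt the padded frame with whatever cipher is in use.
padded = [pad_frame(f) for f in frames]

raw_total = sum(len(f) for f in frames)
padded_total = sum(len(p) for p in padded)
print("packet sizes on the wire:", [len(p) for p in padded])
print("bandwidth overhead factor:", round(padded_total / raw_total, 2))
```

Every packet now has the same length, so the length sequence tells an observer nothing, at the price of sending noticeably more bytes than the codec produced.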
This is my best guess based on having previously read into the research this was based on (concerning VBR in VoIP).
The music would have to be loud enough and varied enough (e.g. DnB as opposed to classical) in order to make a significant impact upon the bitstream (such being the nature of VBR encoding) in relation to the voice. Not sure if that makes sense.
If you had two people speaking simultaneously with short pauses between words and they both spoke with the same loudness, it would be harder to separate the words. If one person said one word, and the other another, the resulting bits would be as if only one person had spoken, and what he/she spoke was a single messy mash of the two words.
Perhaps an analogy is in order... if quiet background music is represented by a drop of yellow paint, and loud voice is a pot full of blue paint, mix the two together and you get a very-slightly-green blue paint. The yellow wasn't substantial enough to significantly alter the result and anyone looking at the paint will say it's blue, despite there being some yellow in it.
If you have a *pot* of yellow paint (*loud* background music) and mix the two together, you have a completely green paint. You have no idea if this was the original colour paint, or a combination of a range of colours, and there is increased difficulty in determining what the original colours/shades were.
tl;dr - Music would need to be noisy and make your voice pretty indistinguishable to a machine
Exactly how is this news? Statistical analysis and pattern matching to derive content or crack encryption has been in the standard toolkit of cryptanalysts and cryptologists forever. That is why a good encryption algorithm needs to have good diffusion. Unfortunately good diffusion is not possible for the almost-realtime and streaming nature of VoIP. In this case having streaming with a fixed bitrate would fix most of the problem.
It's news because they have improved upon previous methods in such a way that the feasibility of the attack is increased, and its accuracy can be constantly improved through sampling and training. Also because Skype is the main target for such an attack (popular and thought to be secure).
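To give a feel for what pattern matching on this sort of traffic means, here is a toy sketch. It is not the researchers' method: the word labels and length templates are invented, and the matcher is a plain nearest-template lookup using squared distance. It only illustrates guessing content from packet lengths without ever touching the key.

```python
# Hypothetical templates: typical frame-size sequences (bytes) for three labels.
# Labels and numbers are invented for illustration.
templates = {
    "word_a": [12, 31, 45, 20],
    "word_b": [30, 30, 14, 12],
    "word_c": [44, 18, 18, 40],
}


def distance(a: list[int], b: list[int]) -> int:
    """Squared Euclidean distance between two equal-length size sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def guess(observed: list[int]) -> str:
    """Return the label whose template is closest to the observed lengths."""
    return min(templates, key=lambda label: distance(templates[label], observed))


# Lengths measured from encrypted packets, slightly off from any template.
observed_lengths = [13, 30, 44, 21]
print(guess(observed_lengths))  # word_a
```

With fixed-size packets every observation looks identical, so a matcher like this has nothing to work with.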
...it's partly due to time. The CELP family of voice codecs is rather old and based more on the need to optimize limited bandwidth. Security wasn't exactly in mind at the time, so perhaps a newer codec is needed. Then again, mobile devices have computing and power constraints as well, so developing one that is CBR, good quality, AND low bitrate will have some problems of its own.
My reading of the paper suggests that the root of the problem is that a certain phoneme of a certain accent will encrypt to the same/similar string of bits every time with the same key. Essentially, they need to make the datastreams look truly random. Steganographic techniques should do the trick.
Unfortunately, this will mean more data has to go to and fro', and/or more computation will be required.