8/9/2019 Steg Final
1/28
Randal Calcote
Ty Welborn
MUSI 1335: Commercial Music Software
15 January, 2010
Models of Steganography for Digital Audio
Thesis:
Digital audio provides several robust media for embedding steganographic messages by
working with a variety of composition and recording techniques. The techniques of information
hiding are simple by comparison to mathematically complex methods currently in use.
Introduction
Steganography is the process of hiding a specific message within a larger body of
information. Historically, it dates back to the Greco-Persian wars (499-449 B.C.).
Herodotus (c. 484-425 B.C.) cites examples of transmitting secret tactical information using
wooden tablets and the human body as media. The Chinese and Europeans have also
developed systems of hiding information. In modern times, steganography has become
more important in the fields of privacy and financial transaction security. Increased
government restriction of cryptography and industry driven initiatives to protect
commercial and intellectual property have resulted in an ever increasing interest in
steganographic applications.
The distinction between cryptography and steganography is necessarily soft.
Cryptography is a process of encoding a secret message based on a predetermined
algorithm. Instead of a specific algorithm, pure steganography uses random processes for
distributing information within the context of a larger, less specific message. In practice,
both methods are used in tandem to provide greater security than either one alone. For
transmissions such as information and finance to be effective, it is necessary to validate
and maintain the identity and integrity of both the sender and receiver. All digital
transactions use encrypted passwords for this purpose. Intellectual and commercial rights
are usually established by public, digital signatures known as watermarks. These marks
vary from complex serial numbers to embedded logos in the software and digital media.
They may be visible or hidden, and they must withstand attempted removal procedures,
digitally or otherwise. These marks identify the owner or creator of film, music and
software. Similar to watermarking, cattle branding has been practiced from the 13th
century to the present to identify herd ownership at market. Watermarks were also added
to paper that would be used to print money in an attempt to recognize forgeries. Now,
watermarks are an integral part of most products transmitted digitally. They are usually
visible for graphics and invisible for data or sound files (Petitcolas 7).
Cryptographic and steganographic models evolved from the wax writing tablets of
the Greeks to the digital watermarks and copyright symbols of today's digital rights
management. Extensive material has already been published on the relevant mathematics
of cryptography based on large prime numbers and steganographic computer software for
distributing information within digital media. This paper will therefore refer only briefly to
relevant algorithms and texts. The systems will be presented summarily to demonstrate
some general principles of embedding covert information within several different media
formats.
Audio and sound are similar, but they are not exactly the same. Sound is produced
when a physical action creates an oscillating field of pressure in the air between the
occurrence of the event and the human ear. The ear converts these vibrations to a set of
electrical impulses which the human brain interprets as sound. The event is mechanical,
but the human experience of it is subjective. Audio is a representation of a sonic event
which is transmitted via electronic channels. It is then recorded for later use or broadcast
to a mass audience. Audio information, either analog or digital, provides a robust set of
media for embedding encrypted or steganographic objects.
1. Practical History
Steganography is ubiquitous in modern society. All commercially recorded music
and video products carry digital watermarks to identify the owner of a digital copyright.
Music, video and software products also carry encrypted serial numbers to aid in tracking
unauthorized copies. MP3 is a compressed audio file format developed by the
Moving Picture Experts Group for portability across the internet. Each file can carry ID3v1 and
ID3v2 tags embedded alongside the encoded data that becomes the sound you hear when an
MP3 is played. The ID tags contain information about the music, its owner and point of
origin. Though they are hidden in plain sight,
they can be viewed and edited by most MP3 players. However, the majority of users are
completely unaware of the existence of these tags (Chaum 85).
These electronic signatures are not only present in music and video files, but are
also included in electronic devices intended for reproduction and transmission across
extended computer and entertainment networks. For example, every printer carries a
unique identifier which it embeds in the printed output of every page (Brassil 1278-1287).
When anyone logs on to any computer or ATM, they do so using encrypted passwords to
allow access to that particular communications channel. These codes and watermarks
represent the most common, active elements in steganographic usage. The covert
channel is any communication path not originally designed to transfer information but
rather to validate the origin of materials. In computer systems, these channels are used to
return information to their owner while performing a service for another user or program,
such as trojans, ad-bots and spyware. They use the same structure as legitimate
programs made to validate identification and distribute information (Lampson 615).
Passwords and PINs, although not purely steganographic objects, are mentioned here
because cryptography is currently an important aspect of transmitting hidden information.
They are employed in an attempt to prevent malicious abuse of information systems.
Thus, cryptographic and steganographic systems used in tandem provide theoretically
more secure channels for transmitting covert or private information (Petitcolas 11).
These processes have been important for both commerce and war, and have
evolved throughout history. Herodotus cited several examples of steganography in his
Histories (c. 440 B.C.). Histiaeus was the military leader of Miletus under Darius I, King of
Persia. While living in Susa at the command of Darius, although loyal to the Persians, he
was unhappy with his condition and wanted to return to his home. He shaved the head of
a trusted slave, and tattooed a message onto his skin. After the slave's hair grew back, it
hid the message. This message was relayed in order to instigate a revolt against the
Persians, so that conditions at home would require his return to oversee the conflict
(Herodotus 84-87).
Another example cited in the Histories is that of Demaratus, the King of Sparta from
515-491 B.C. Cleomenes, a rival for the throne, bribed the Delphic Oracle to denounce him
as an illegitimate King. After being deposed, he was forced to flee to the Persian court.
Upon learning of the Persian invasion plans of Xerxes in 480 B.C., he devised a plan to
warn Sparta of a coming invasion. After removing the wax from a writing tablet, he
scratched a message on the bare wood and applied a fresh coat of wax to the tablet to
cover the message. This tactic worked so well that the Spartans thought it was a fresh,
new writing tablet. In fact, the Spartan king Leonidas almost did not find the message in
time to contrive an adequate defense at the Battle of Thermopylae (Herodotus 87-89).
One method of steganography attributed to Julius Caesar is known as the shift
cipher. It is simple to construct, and not very secure under scrutiny. In practice, each
letter in a plaintext message is replaced by a letter some fixed number of positions down
the alphabet. For example, with a shift of 3, "a" would be replaced by "d", "b" would be
replaced by "e", and so on, with "x", "y" and "z" being replaced with "a", "b" and "c" to complete
the cipher (Wikipedia.org).
Thus a message like "hello world" would be replaced by "khoor zruog".
However, this message would be easily deciphered, even if the space between words were
omitted: "khoorzruog". Only a slight improvement of the message security is achieved by
reversing the spelling of the whole message, "gourzroohk", or even by reversing the
spelling of each word separately, "roohkgourz" (which corresponds to the plaintext "ollehdlrow").
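The shift cipher described above can be sketched in a few lines of Python. This is only an illustration: the function name caesar_shift is an assumption made for this example, and the shift of 3 is the one used in the text.

```python
import string

def caesar_shift(text: str, shift: int) -> str:
    """Replace each letter with the one `shift` places down the alphabet,
    wrapping from z back to a. Non-letters (such as spaces) pass through."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    k = shift % 26
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)

ciphertext = caesar_shift("hello world", 3)
plaintext = caesar_shift(ciphertext, -3)  # shifting back recovers the message
print(ciphertext)  # khoor zruog
print(plaintext)   # hello world
```

Because only 25 useful shifts exist, an eavesdropper can simply try them all, which is why reversing or respacing the letters adds so little security.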
What this means is that simple transposition of letters is basically insecure.
Randomizing letters and ciphers brings an increasing complexity to the job of deciphering
stegotext. In ancient China, masks were made with holes in them to indicate the position
of hidden letters embedded within a larger text. The mask was used as an overlay to
decipher the secret held within the larger text (Katzenbeisser 21).
This process was reinvented by the 16th century Italian mathematician Girolamo
Cardano in 1550. He was known to the French as Jérôme Cardan. Hence, the grille cipher
is named the Cardan Grille in homage to Cardano. He proposed a method of hiding a
message in plain sight via use of a parchment cipher.
Holes were punched in the parchment at random intervals, and a message was
written inside the holes.
Then, the parchment was removed and the message was surrounded with letters
and numbers.
Cardano's proposal was primarily used as a literary game, quite common among
European aristocracy of the time. Decryption required the original, or an identical
parchment cipher. This made for relatively good security provided the original author
composed a reasonable cover message around the intended stegotext. Although well
received by European nobles, the grille cipher served little use other than as a source of
amusement. The two main weaknesses of the process were the necessity of composing a
suitable cover, and the incriminating nature of possessing a grille cipher if apprehended by
an enemy (Wikipedia.org).
Playing games with words held a place of esteem among European literati as early
as the Italian poet Boccaccio (1313-75). Boccaccio wrote the world's longest acrostic in
the form of a set of sonnets, the Amorosa Visione, which used the first letter of every line
of the 1,500 word poem to pay homage to a certain noble lady who would be forever
beyond his means (Wilkins, E.H. 105-106). The acrostic is a literary form that takes the
first letter of every syllable, word or line to construct a new word. It would be difficult to
understand the decrypted message which would result from this poem in Italian.
However, a variant example in English will illustrate the concept. Here is an excerpt of text
from an email which I recently sent to a friend:
you should be glad you are not Here.
my days begin very Early.
a day in college is very Long.
my classes always end Late.
i am always glad when they are Over.
no one fails if they Work.
this is not really Obvious.
it does not seem Right.
no one gets to do what they Like.
our work is never Done.
In this example, the first letter of the last word in each line was chosen as a cipher,
and a text message was constructed around it. A cipher is a set of rules to follow in order
to understand a secret communication. The Caesar Shift and Cardan Grille are both
examples of ciphers. Obviously, the letters were not bold, underlined or upper-case in the
original message. The message and cover text were both contrived to illustrate a basic
principle of steganography: that the most secure transmission should not only be hidden
from all but the intended receiver, but should not even be discernible to casual observation.
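The rule used in the email example (take the first letter of the last word of each sentence) can be written out as a short Python sketch. The function name last_word_initials and the choice to split sentences on periods are assumptions made for illustration only.

```python
COVER_TEXT = (
    "you should be glad you are not here. my days begin very early. "
    "a day in college is very long. my classes always end late. "
    "i am always glad when they are over. no one fails if they work. "
    "this is not really obvious. it does not seem right. "
    "no one gets to do what they like. our work is never done."
)

def last_word_initials(text: str) -> str:
    """Return the first letter of the final word of every sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return "".join(s.split()[-1][0].upper() for s in sentences)

print(last_word_initials(COVER_TEXT))  # HELLOWORLD
```

The cover text here is deliberately all lower case, as it would have been in the original message, so nothing in it points to the hidden letters.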
The word steganography is derived from Greek words meaning "covered" and
"writing." Various tactics have been used, such as highlighting specific letters with invisible
ink or changing the stroke length of important letters in a hidden message. Printing in the
late 16th and early 17th centuries provided a medium for embedding messages by
concealing, instead of encrypting information (Brassil 88). As seen in the image below, the
inaccuracies inherent in the printing process left random spaces and spurious information
characteristics that made casual detection difficult. J. Wilkins published a pamphlet in
1694, with an excruciatingly long title, wherein he describes how letters could be accented
by long strokes, errors, fonts and stylistic features at random points within the text in
order to ". . . send swift and secret communications with a friend in privacy . . ." Wilkins
would poke small holes above significant letters to distinguish them in the cover text
(Wilkins, J. 88-96).
The German scientist Gaspar Schott (1608-1666) made the first drawings of
universal joints, air pumps and other devices long before they were actually invented
somewhere else. Like many scholars of the time, he also studied music, and described a
steganographic model which substituted music notes for letters as a means of encoding
and decoding messages. The assignments were, and still may be random, or specific to a
given message. The outcome was rarely musical. However, it did serve the purpose of
hiding information from non-musicians (Petitcolas 13). The following illustration shows
both an encoded message, and the cipher which relates letters to specific notes. We will
return to this concept later when we examine the potential uses of steganography in
conjunction with MIDI.
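As a rough sketch of the note-for-letter substitution, the Python below assigns each letter a MIDI note number. The particular assignment (the letter a mapped to note 48, b to 49, and so on) is purely hypothetical and is not Schott's cipher; as the text notes, any agreed assignment would serve.

```python
import string

# Hypothetical cipher table: 'a' -> MIDI note 48, 'b' -> 49, ... 'z' -> 73.
BASE_NOTE = 48
LETTER_TO_NOTE = {ch: BASE_NOTE + i for i, ch in enumerate(string.ascii_lowercase)}
NOTE_TO_LETTER = {note: ch for ch, note in LETTER_TO_NOTE.items()}

def encode_notes(message: str) -> list[int]:
    """Turn the letters of a message into a list of MIDI note numbers."""
    return [LETTER_TO_NOTE[ch] for ch in message.lower() if ch in LETTER_TO_NOTE]

def decode_notes(notes: list[int]) -> str:
    """Recover the letters from a list of MIDI note numbers."""
    return "".join(NOTE_TO_LETTER[n] for n in notes)

melody = encode_notes("hello")
print(melody)                # [55, 52, 59, 59, 62]
print(decode_notes(melody))  # hello
```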
As time passed, cultures, and their methods of hiding information became
increasingly sophisticated. The French photographer René Dagron was granted the first
microfilm patent in 1859. During the siege of Paris (1870-71) by the Prussian army, Dagron
sent carrier pigeons with messages on microdots across German lines. This was the first
military application of microfilm. Once Dagron achieved a photographic reduction of more
than 40 diameters, the microfilms produced weighed approximately 0.05 grams each, and
a pigeon could carry up to 20 at a time (Wikipedia.org).
In 1940, actress Hedy Lamarr met George Antheil, an avant-garde
composer and author. In 1942, they received a patent for a secure communications
system that provided radio guidance for torpedoes. Remote control guidance of torpedoes
was first proposed in 1906 by Wilhelm von Siemens. Lamarr gained her knowledge of
guidance systems from her first husband, Fritz Mandl, an Austrian arms dealer. What
Lamarr added to the new system was the idea of frequency hopping, or spread spectrum
transmission, which is still used extensively in military communications. It involves
transmitting a signal over a seemingly random set of radio frequencies, switching between
them at split-second intervals. A radio receiver synchronized to the same switching pattern
will receive the full transmission, but any radio that is not in sync will not be able to
decode a complete message. Such radio receivers will only detect small portions of the
broadcast, and instead, will only be able to intercept what appear to be static blips, thus
hiding the message from all but the intended recipients.
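The idea of a shared switching pattern can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the patented mechanism: a seeded pseudo-random generator stands in for the synchronized control pattern, and the channel count of 88 and the seed values are chosen only for the example.

```python
import random

CHANNELS = 88  # assumed number of available frequencies, for illustration only

def hop_sequence(seed: int, hops: int) -> list[int]:
    """Derive a repeatable channel sequence from a shared seed."""
    rng = random.Random(seed)
    return [rng.randrange(CHANNELS) for _ in range(hops)]

def transmit(message: str, seed: int) -> list[tuple[int, str]]:
    """Send one symbol per hop; each pair is (channel, symbol) on the air."""
    return list(zip(hop_sequence(seed, len(message)), message))

def receive(broadcast: list[tuple[int, str]], seed: int) -> str:
    """Keep only the symbols that arrive on the channel being listened to."""
    tuned = hop_sequence(seed, len(broadcast))
    return "".join(sym for (chan, sym), t in zip(broadcast, tuned) if chan == t)

on_air = transmit("turn left", seed=1942)
print(receive(on_air, seed=1942))  # synchronized receiver recovers the message
print(receive(on_air, seed=7))     # unsynchronized receiver catches fragments at best
```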
Antheil contributed ideas from his composition and performance experience. He
based his part of the design on a mechanism similar to the one used in his Ballet
Mécanique. The Ballet was first performed in Paris in June of 1926. The Ballet, which was
composed for various mechanical instruments, featured a player piano, electric bells,
airplane propellers and sirens, with all the devices on stage controlled mechanically by rolls
of paper tape punched with holes, similar to the ones used to control a player piano. In
the original patent, Antheil incorporated this concept as a means to control the rapid
switching of both the transmitters and receivers used to relay messages via what
became known as spread spectrum technology.
The design was not received well by the US Navy. The patent described the
mechanism as ". . . being similar to that of a player piano . . ." The US Navy, in
considering the patent submission as a practical solution to guiding torpedoes,
disregarded the entire proposal based on their considered opinion that it would not be
feasible to ". . . fit a player piano inside a torpedo." Antheil responded, unsuccessfully,
that the device could be manufactured to be about the size of a watch. Consequently, the
design was not used until 1962 during the Cuban missile crisis. By this time, the original
patent had expired, but researchers at Sylvania repeatedly cited the patent as the original
source for developing the idea to a useful stage. The lesson to be learned from this is that
times of military crisis are not very good times to introduce newly developing technologies,
regardless of their potential. This case also demonstrates that new ideas can come from
unlikely sources that do not rely on established methods (Braun 11-15).
Mathematicians from Cardan to the present have speculated on developing a
method for encrypting messages that will be secure against detection, and that will be
statistically sound under mathematical scrutiny. A model for achieving this goal, the
Prisoners' Problem, was formulated by Gustavus Simmons in 1983. It should not be
confused with the Prisoner's Dilemma, which Merrill Flood and Melvin Dresher developed
while they were working at the RAND Corporation in 1950. The Prisoners' Problem has
become a classic, textbook model of invisible communications.
Alice and Bob are prisoners in separate cells and they want to plan an escape. To do
so, they must communicate in secret. The prison warden, Wendy, arbitrates all aspects of
their daily lives, including communications. As an opponent of their plan, she may be
either a passive, active or malicious agent. If passive, she will allow communications to
pass between them. If active, she may either block or alter their messages. But, if she is
malicious, she can send fake messages to either or both parties, or put them both in
solitary confinement, thus preventing any chance of an escape. If they send messages
with content that is scrambled, they are using encryption. If they use steganographic
techniques, however, they will be sending messages that attempt to conceal the fact that
there is any covert communication. Under this system, they can openly send messages
along unclassified channels, which contain confidential information (Simmons 51-67).
To achieve this goal, they both select a pair of random numbers, which they will use
to encrypt and decrypt their information. The reciprocal pair of equations used to derive
and verify results during the communication process also allows them to verify the identity
of the sender and the validity of the message content. After choosing their numbers, they
will each send their first number in the pair to the other one. They will keep their
respective second numbers secret. These will be used in verifying that the sender has used
the number that was originally exchanged to encrypt all of their communications.
Modular arithmetic plays an important role in deriving the random numbers used to
encrypt, or hide, and decrypt, or observe, sent messages. The expression used in this
process is stated as $C = M^K \pmod{n}$. M and n, like the secret numbers, are assigned
specific values, and are placed within the context of the following equations to assure
secure transmissions. Let M = 7, n = 13, a = 5 and b = 8. 5 and 8 are Alice's and Bob's
secret numbers.
By choosing these values, Alice and Bob can now establish a shared key with the
following calculations. Alice computes $A = M^a \pmod{n}$, which she sends to Bob. She also
receives B from Bob, which he must calculate as $B = M^b \pmod{n}$ and send. She will compute
her key by calculating $K = B^a \pmod{n}$, and Bob will obtain the same key with the
equation $K = A^b \pmod{n}$. Therefore,

$A = M^a \pmod{n} = 7^5 \pmod{13} = 16{,}807 \pmod{13} = 11$

$B = M^b \pmod{n} = 7^8 \pmod{13} = 5{,}764{,}801 \pmod{13} = 3$

Every natural number X is congruent, modulo n, to the remainder obtained by dividing X
by n. This remainder is called the residue of X (mod n). The residue for Alice's value is
obtained in this manner:

$7^5 = 16{,}807$
$16{,}807 / 13 = 1{,}292.846154$
$1{,}292.846154 - 1{,}292.000000 = 0.846154$
$0.846154 \times 13 = 11$
$16{,}807 \equiv 11 \pmod{13}$

When Alice wants to send Bob a message, she will create harmless content, known as a
cover object, that will include A = 11. Bob raises A to the power of his secret number to
obtain the key, $K = 11^8 \pmod{13} = 9$. Similarly, Alice will use the value B = 3 that she
received from Bob and her own secret number:

$3^5 = 243$
$243 / 13 = 18.692308$
$18.692308 - 18.000000 = 0.692308$
$0.692308 \times 13 = 9$
$243 \equiv 9 \pmod{13}$

Both of them now hold the key K = 9, which can be used to interpret the order of hidden
letters, for example by treating every 9th character of the cover text as significant to the
encoded message. Each of them can check that the two calculations must agree, using their
respective secret values for a and b, as follows:

$B^a = (M^b)^a$, because $B = M^b$
$= M^{ba}$, by the rule of exponents $(x^m)^n = x^{mn}$
$= M^{ab}$, by the commutative property $ab = ba$
$= (M^a)^b$, by the rule of exponents $(x^m)^n = x^{mn}$
$= A^b$, because $A = M^a$
So, Alice and Bob can now, theoretically, exchange secret information over an insecure
channel, hoping that Wendy will not notice the message within their cover objects (Miller,
Herren, Hornsby, et al. 240-257). No third party observer should be able to distinguish
whether the sender is passively sending an empty cover object or an active message.
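The exchange can be checked numerically with Python's built-in three-argument pow, which computes modular powers directly. This is a minimal sketch using the values M = 7, n = 13, a = 5 and b = 8 given in the text; the variable names are chosen for this example only.

```python
M, n = 7, 13   # public values known to everyone, including the warden
a, b = 5, 8    # Alice's and Bob's secret numbers

A = pow(M, a, n)  # Alice sends this to Bob
B = pow(M, b, n)  # Bob sends this to Alice

key_alice = pow(B, a, n)  # Alice combines Bob's value with her secret
key_bob = pow(A, b, n)    # Bob combines Alice's value with his secret

print(A, B)                # 11 3
print(key_alice, key_bob)  # 9 9
```

Both parties arrive at the same key without ever transmitting a or b, which is the point of the exponent rule: $(M^b)^a = (M^a)^b$. With numbers this small an observer could find the secrets by trial and error, so the example illustrates the arithmetic only, not a secure choice of values.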
The security of invisible communication depends entirely on the inability to distinguish
between a cover object and a secret transmission. Modifications should not be visible to
anyone but those involved in the communication process. For security purposes, the same
cover object should not be used more than once, as this would provide a framework for
deciphering future communications. Both the sender and receiver should destroy all sent
and received cover objects. No potential opponent should have access to the cover object
before the time of transmission. And finally, the cover object must contain a sufficient
amount of redundant data or space to conceal all of the secret information. Most
encryption and steganography software exploits the LSB, or least significant bit portions of
a binary file. This requires a short discussion on the topic of how computers speak
(Katzenbeisser 32).
2. Digital Media
A cover object can be any data, image or sound file. At the fundamental level, any
digital file consists of a series of 1s and 0s. Each 1 or 0 is called a bit, and a group of eight bits
placed in a sequence is called a byte. Bits are also grouped into 16s, 32s and 64s.
Computers are actually complex systems of electrical switches that are either turned on
(1) or off (0). When electricity flows through a switch, current travels to a specific device
that performs its predefined function.
Each device in a computer system accesses a stored table of binary values which
correspond to a list of specific conditions for performing a task. For example, if you press
the "a" key on a computer keyboard, the keyboard and computer are wired in such a way that
a signal is sent through an electronic network, where it is then compared to the binary
value 01100001 from a standard, shared list of 8 bit combinations. 01100001 is binary for 97,
the code for "a" in the ASCII standard table of binary communication codes. If there is a positive
match, then the human symbol a is first stored at a location in memory, and then sent
through a computer program to either be displayed on a monitor screen, printed by a
printer or stored for later recall in a collection of bytes that will be some type of text file.
If, however, the upper case A is sent by using the shift and a key at the same time, the
computer system uses 01000001 to transmit instructions between components as the
human symbol A. The following chart lists the 52 upper and lower case letters as
represented in ASCII.
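The correspondence between a character, its ASCII code number and its 8 bit pattern can be inspected with a few lines of Python. The helper name to_bits is an assumption for this sketch; ord and format are standard built-ins.

```python
def to_bits(ch: str) -> str:
    """Return the 8 bit binary pattern for a single ASCII character."""
    return format(ord(ch), "08b")

for ch in ("a", "A"):
    print(ch, ord(ch), to_bits(ch))
# a 97 01100001
# A 65 01000001
```

The two patterns differ in a single bit, which is the only difference between upper and lower case in the table.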
This system of substitution, called the ASCII code, was developed by the X3.2
Committee of the American Standards Association from 1960 to 1986. ASCII stands for
American Standard Code for Information Interchange, and was developed for Bell Labs as
a method of transmitting telegraphic code. 128 binary codes for printed characters and
control characters are used to transmit text across the internet and other electronic
communications media (Wikipedia.org).
In a text file, any letter, number, punctuation mark or keyboard command can be
expressed as a byte, a set of 1s and 0s. To say "cat", typing the ASCII characters C
(#67 = 01000011), A (#65 = 01000001) and T (#84 = 01010100) will print out the word CAT
to a computer screen, a file or a printer (Huber and Runstein 216-219). Each byte has
eight bits. The bit at the left of the chain represents the largest numerical value in binary
and is called the most significant bit, or MSB. The bit on the far right represents the
smallest binary value and is called the least significant bit, or LSB. Most computer
documents, text files, emails, and even pictures are relatively small, requiring only a few
thousand bytes for a complete representation of their data. By comparison, audio files are
large, with sizes going up into millions of bytes, and are therefore more complex simply by
virtue of their size. The assembled bytes represent information about frequency (musical
pitch), amplitude (loudness) and elements of time, or the duration of a pitch. All of these
bytes have an MSB and an LSB. Current steganographic software uses the LSB section of files
as the medium of space to embed hidden messages. Either bit can be used, but the LSB is
the most common area for hiding information within digital media. Encryption software is
used to generate this code and its distribution within the file (Katzenbeisser 37).
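A bare-bones version of LSB embedding can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the method of any particular software package: the cover object is a plain sequence of 8 bit values standing in for audio samples, the message is not encrypted, and the function names are invented for the example.

```python
def embed_lsb(samples: bytes, message: bytes) -> bytes:
    """Overwrite the least significant bit of each sample with one message bit."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover object is too small for the message")
    stego = bytearray(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0b11111110) | bit  # clear the LSB, then set it
    return bytes(stego)

def extract_lsb(stego: bytes, length: int) -> bytes:
    """Rebuild `length` message bytes from the LSBs of the first samples."""
    out = bytearray()
    for i in range(length):
        value = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(100, 124))  # 24 stand-in samples, enough for 3 bytes
stego = embed_lsb(cover, b"CAT")
print(extract_lsb(stego, 3))                           # b'CAT'
print(max(abs(x - y) for x, y in zip(cover, stego)))   # no sample moves by more than 1
```

Each sample changes by at most one step out of 256, which is why the alteration is so hard to hear or see, and why a cover object needs eight samples for every hidden byte.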
In the example of the Prisoners' Problem above, very small numbers were
used to generate the encryption keys which were used by both parties to send private
messages. The resulting coded messages soon became insecure with repeated
transmissions. This short term usefulness led to the development of the RSA secure
transaction standard at MIT in the late 1970s. The significant improvement lies in the
fact that the variables for the equation $C = M^K \pmod{n}$ which generate the encryption keys
have increased from single digit numbers to prime numbers with up to several hundred
digits. This drastically increases the time and magnitude of both the calculations and the
encryption/decryption process. Absolute security is never guaranteed, but robustness
against attack is greatly increased.
3. Models of transmission: beat coding
It is convenient at this point to move away from current theories and methods
which are so harshly burdened with math in order to consider some models of information
hiding within other disciplines, particularly those of music and audio recording.
Computers speak binary, or digital, at the fundamental level of machine
operation. Sophisticated electronic switching networks turn circuits either on (1) or off (0).
The sequence of 1s and 0s tells the computer how to generate electrical impulses which
will eventually be heard as sound. Large groupings of bits are assembled to make digital
audio files.
Although audio files are digital, their output is ultimately analog sound. When
sound is recorded using computers and software, it becomes audio, and therefore digital.
Noise or any other information can be added to a recording at any time. From this point
on in this discussion, it may safely be assumed that a recording is digital. The models being
presented were produced on software currently available to anyone. None of the software
is spy-ware or malicious code, and it is not considered to be a steganographic tool. It was
actually designed for recording music and producing audio in various formats.
Digital audio has two major format divisions. They are digital audio and MIDI. The
distinction between digital audio and sampling is slight. It mainly pertains to the length
of the sound being recorded. In modern culture, a sample refers to a short, recorded
sound. This can be a short piece of music, a celebrity quote, or a sound effect. They are
used in hip hop records, broadcast commercials, movies and live theater.
A digital recording converts electrical impulses from a sound source to a string of
1s and 0s, by splitting an original sound into thousands of extremely short slices of digital
information, which are called samples in the realms of consumer and professional audio
recording. They are recorded and played back so fast, generally at 44,100 times per
second, or 44.1 kHz, that the human ear interprets them as continuous
sound in much the same way that a movie filmed at 24 frames per second projects an
illusion of unbroken activity on a movie screen. By comparison, analog tape recorders
capture real time sound events as a constantly changing and continuous stream of
information that mirrors the experience of sound.
Regardless of the format, digital or analog, the end result is the same. We produce
a recording of an event, which we can store, replay and manipulate with the proper
software and equipment. In that sense, all recordings are equal, and therefore, all content
is equal. Content is susceptible to manipulation. A typical recording of a rock
band would include drums, bass, guitar and vocals. These are all recorded separately, and
later mixed together to make a sound file that will become a commercial CD.
Besides the typical group of instruments, an engineer, band, or songwriter might
want to include any other combination of instruments from kazoos to symphony
orchestras. These decisions are always arbitrary, and they always contribute to the final
recorded sound of all the instruments playing together. Many commercial recordings are
made using Pro Tools, a software package that emulates a traditional recording studio
inside a computer. A typical recording session brings some musicians into a room, where they are
connected through microphones and cables to a DAW, or Digital Audio Workstation. The
DAW is comprised of the computer, Pro Tools, or some other recording software, and the
associated hardware to connect the musicians to their virtual recording studio. As the
musicians perform their song, an engineer establishes electrical contact and records the
performance on separate tracks, which will later be manipulated to produce a recording of
their song for commercial release.
Consumer demands and industry standards guarantee that almost any recording
will be in stereo. This means that there will be two separate channels, or sets of sound
coming out of the speakers used for listening. These two separate channels will be almost
identical, but will have slight differences in content for the purpose of recreating an
experience of live performance. Our hypothetical session has gone well, and we have a
song ready to release, when suddenly the bass player says that he wants to add a secret
message for all of his fans. Being an Eagle Scout, and a great fan of steganography, he
decides to use Morse code to send his message, and he is going to spread it across the left
and right channels to indicate the dots and dashes, respectively. He could have just used
two notes played repeatedly on his bass to substitute for the dot and dash. But, he decided
that the left/right distinction between the dots and dashes would add another layer of
security, since most people would not be looking for a secret coded message in a
tambourine track. He taps out the message with the tambourine, and it is recorded onto
a track. The pattern is simple, with no syncopation or ornamentation, and each tap
falls exactly on a quarter note in the song. The engineer later separates the designated
beats to the left and right tracks of the final mix.
There are ten letters in the phrase "hello world." The Morse code ciphers for the
letters are no longer than four pulses each, so one letter can be placed in each measure.
Their distribution across time is less important than their placement in the left or right
channel. The Morse code for "hello world" is:
H         E   L         L         O       W       O       R       L         D
* * * *   *   * - * *   * - * *   - - -   * - -   - - -   * - *   * - * *   - * *
L L L L   L   L R L L   L R L L   R R R   L R R   R R R   L R L   L R L L   R L L
The L and R notes below the * and - symbols directly transpose the Morse code elements to a
stereo mix of the song. The part is neither very musical nor long, but it is also neither
offensive nor out of place within the context of a rock song.
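The dot-to-left, dash-to-right transposition described above can be sketched in a few lines of Python. This is only an illustration of the mapping, not a tool used in the session. The names `MORSE` and `to_channels` are made up for this example, and the table holds only the letters the message needs.

```python
# International Morse code for the letters used in the message.
MORSE = {
    "D": "-..", "E": ".", "H": "....", "L": ".-..",
    "O": "---", "R": ".-.", "W": ".--",
}

def to_channels(message):
    """Return one (letter, morse, channels) tuple per letter; spaces are skipped."""
    rows = []
    for letter in message.upper():
        if letter == " ":
            continue
        code = MORSE[letter]
        # Dot -> left channel, dash -> right channel, as in the session above.
        channels = "".join("L" if symbol == "." else "R" for symbol in code)
        rows.append((letter, code, channels))
    return rows

rows = to_channels("hello world")
for letter, code, channels in rows:
    print(letter, code.ljust(4), channels)
```

Each printed row gives a letter, its Morse code, and the channel for each tap, which is the same information the engineer needs when separating the beats.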
Everyone in the studio agrees that this is a good method of hiding the code.
Everyone, that is, except the drummer, who feels that the tambourine is directing attention
away from his drum part. But he does have a valid argument when he says that not
many people will easily notice the difference between a single beat being in the left
channel versus the right. So he proposes that the code be placed using two different
notes on a piano. The notes will be played on the same beats as the tambourine, still
audible, but less intrusive. The engineer adds some reverb to the piano track to make it
sound far away and groovy, which it does.
While all of this has been going on, the guitar player has been out in the parking lot
trying to flag down a pizza delivery truck, to no avail; there is no pizza in sight. So, when
he comes back to the studio and hears the crazy stuff that has been done to his song, he
is livid. Still, agreeing that a secret message in the song could boost sales, he proposes a
way to keep it really secret. Instead of using notes or beats that can actually be heard, he
suggests using short pulses of high frequency noise placed on the same beats. The human
ear can detect frequencies ranging from 20 hertz (cycles per second) on the low end to
20,000 hertz on the high end. The higher the number of vibration cycles per second, the
higher the pitch will sound. The 20-20,000 Hz range is an average. In
fact, many people cannot hear all the way up to the top of this range, actually
losing the ability to hear sounds between 15,000 and 20,000 Hz. Knowing this, the band
agrees to use a pitch of 15,800 Hz, which will be almost indiscernible to the ear, but which
will nonetheless appear in the left and right channels of the final mixed output. This
method works so well that the message cannot be heard without a little electronic
processing. After a high-pass audio filter removes the lower frequencies and a compressor
reduces the dynamic range (the difference between the loudest and softest sounds heard in
the song), the message can be heard as a series of high frequency beeps playing along
with the rest of the song, which sounds like it is being played through very bad speakers.
If we ignore the inferior sound quality that this final processing creates, our secret
message can now be heard, regardless of the fact that most consumers will not be aware
of it being in the mix, nor will they have a copy of Pro Tools on their home computer or car
stereo. However, a real spy could use a piece of equipment known as a spectrum
analyzer to graphically represent the frequency and duration of the high frequency pulses,
thus revealing the code to a trained eye. The band has successfully hidden their message
in the audio of their song.
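To make the idea concrete, here is a rough Python sketch of hiding and recovering the pulses. It is a toy model under several assumptions that are not part of the session described above: the "song" is a single loud 220 Hz tone, the beat and pulse lengths (`BEAT`, `PULSE`) are arbitrary, and the Goertzel algorithm stands in for the spectrum analyzer by measuring the energy at 15,800 Hz in each channel.

```python
import math

RATE = 44100      # samples per second (CD-quality audio)
TONE_HZ = 15800   # pulse pitch chosen by the band
BEAT = 4410       # samples per beat (hypothetical, shortened for the demo)
PULSE = 882       # samples per pulse, about 20 ms (hypothetical)

def render(channels):
    """Build left/right sample lists: a loud 220 Hz 'song' plus quiet pulses."""
    left, right = [], []
    for beat, channel in enumerate(channels):
        for i in range(BEAT):
            t = (beat * BEAT + i) / RATE
            song = 0.5 * math.sin(2 * math.pi * 220 * t)
            pulse = 0.02 * math.sin(2 * math.pi * TONE_HZ * t) if i < PULSE else 0.0
            left.append(song + (pulse if channel == "L" else 0.0))
            right.append(song + (pulse if channel == "R" else 0.0))
    return left, right

def goertzel_power(samples, freq):
    """Signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / RATE)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def decode(left, right):
    """Recover the L/R pattern by comparing pulse energy on each beat."""
    pattern = []
    for start in range(0, len(left), BEAT):
        l = goertzel_power(left[start:start + PULSE], TONE_HZ)
        r = goertzel_power(right[start:start + PULSE], TONE_HZ)
        pattern.append("L" if l > r else "R")
    return "".join(pattern)

left, right = render("LRLL")   # the letter L: dot, dash, dot, dot
print(decode(left, right))
```

The pulse is 25 times quieter than the song tone in this sketch, yet the per-channel energy comparison still separates left from right, which is the same information a spectrum analyzer would show graphically.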
But digital audio is only one method of creating music with computers. MIDI is an
industry standard set of computer codes which allows computers and other electronic
musical instruments to exchange information and create music. This differs from digital
audio in that no sound is played outside the computer for recording purposes. Instead, a
set of binary codes introduced in 1982 and known as the MIDI standard instructs
the instrument and computers to generate a note with an electronic music synthesizer
inside the computer or controlling instrument.
Digital Performer is a computer-based music sequencing and notation program which gives
composers several methods of arranging notes in a song: playing them on an
electronic music keyboard, writing them on a virtual musical staff, or editing the actual
computer code that represents pitch, volume, duration and the special effect processing
that makes the notes sound more natural. The main benefit of MIDI over digital audio is
that it creates very small files which can easily be transferred over the internet. It is also
a convenient way to write down musical scores which can then be played by musicians in a
live setting.
INSERT BERG AND TRADITION
For the last set of examples, we will again take the same message, "hello world,"
and apply the system of note substitution demonstrated by Gaspar Schott. As previously
mentioned, the method of substitution is arbitrary. The possibilities include using the
notes in the scale of a piece of music, or a chromatic or other type of scale. In either case, a
simple letter-to-note relationship is established, and the message is spelled out using
notes instead of letters. Here is the cipher for this example:
Since H is the first letter in the message, C5, the C one octave above middle C on a
piano, was chosen for it, with the rest of the letters being sequenced consecutively on either
side of C5. Now, by playing these notes on a MIDI keyboard that is hooked up to Digital
Performer, a melody is created.
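One way to picture such a cipher is the short Python sketch below. It rests on assumptions that go beyond the text: the letters are laid out chromatically (one semitone per letter) around H, and C5 is taken as MIDI note 72, with middle C (C4) as note 60. The cipher actually used in this example may follow a different scale, so the note names printed here are illustrative only.

```python
H_NOTE = 72  # MIDI number for C5, taking middle C (C4) as note 60 (assumed convention)
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def letter_to_note(letter):
    """Place H on C5 and every other letter one semitone per step away from it."""
    return H_NOTE + (ord(letter.upper()) - ord("H"))

def note_to_letter(note):
    """Invert the substitution: recover the letter from a MIDI note number."""
    return chr(note - H_NOTE + ord("H"))

def note_name(note):
    """Human-readable name such as 'C5' for a MIDI note number."""
    return f"{NAMES[note % 12]}{note // 12 - 1}"

def encode(message):
    """Turn the letters of a message into MIDI note numbers; spaces are dropped."""
    return [letter_to_note(c) for c in message if c.isalpha()]

melody = encode("hello world")
print([note_name(n) for n in melody])
print("".join(note_to_letter(n) for n in melody))
```

The second `print` shows that anyone who knows the letter-to-note relationship can read the message straight back out of the melody.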
In the next examples, our melody, although not very musical, offers a medium to
transmit a message. Adding harmony parts to this melody, a la Cardon, obscures the
message even more from casual observation.
In this example, the melody has been placed in the alto part, the second staff of notes
from the top of our song. Another approach would be to embellish our message with
ornamental notes to intentionally distract the eye and ear from the message.
And, finally, a combination of harmony and spread spectrum distribution of notes across
the staff could be set up to create even more confusion for potential attackers looking for
our message.
In this example, the music begins with the first note being placed in the top staff,
the second note in the next staff down and so on until all ten notes have been played. The
placement of notes on the staff begins again with the fifth and ninth being placed in the
top staff, and all other notes cascading across the staves in order to spell out our message.
While this pattern is being spelled out, the bass line, in the bottom staff, walks steadily
through the same melody, with a few extra notes at the end to make it last as long as the
rest of the song.
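The cascading placement can be sketched as a simple round-robin deal. The Python below assumes four upper staves (so that the first, fifth and ninth notes land in the top staff) and uses the message letters as stand-ins for the notes. The function names `spread` and `gather` are invented for this illustration, and the bass line is left out.

```python
def spread(notes, staves=4):
    """Deal notes to the staves in turn; None marks a rest at that position."""
    parts = [[None] * len(notes) for _ in range(staves)]
    for i, note in enumerate(notes):
        parts[i % staves][i] = note
    return parts

def gather(parts):
    """Reassemble the message by reading whichever staff sounds at each step."""
    return [next(n for n in column if n is not None) for column in zip(*parts)]

notes = list("HELLOWORLD")  # letters stand in for the substituted notes
parts = spread(notes)
for staff in parts:
    print(" ".join(n if n is not None else "." for n in staff))
print("".join(gather(parts)))
```

In a real score the rests would be filled with harmony notes, so a reader would need to know the dealing order to pick the message notes out of each staff.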
These three embellishments represent only a small sampling of the possible tactics
for embedding a secret message within a MIDI file. Other tactics might include an acrostic
approach, where the first note of each measure or phrase holds a significant note. An
encryption process could also require a message to be transposed with a shift cipher
before the note substitution ever begins.
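As a small illustration of that last tactic, the sketch below applies a shift cipher to the message before any notes are assigned. The key of 3 is an arbitrary choice for the example, and the function name `shift` is hypothetical.

```python
def shift(message, key):
    """Shift each letter A-Z by `key` places, wrapping around; drop other characters."""
    shifted = []
    for c in message.upper():
        if "A" <= c <= "Z":
            shifted.append(chr((ord(c) - ord("A") + key) % 26 + ord("A")))
    return "".join(shifted)

KEY = 3  # hypothetical key, agreed on in advance by sender and receiver
hidden = shift("hello world", KEY)
print(hidden)               # the letters that would then be turned into notes
print(shift(hidden, -KEY))  # shifting back recovers the message
```

An attacker who worked out the letter-to-note relationship would then recover only the shifted letters, not the message itself.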
In all of the audio examples presented, there is one important quality that
separates the examples from most hidden communications. As mentioned earlier, most
steganographic and cryptographic software and practices hide their information in the LSB
(least significant bit) portion of the cover object, since this has little or no effect on the
appearance of the cover. By placing the coded information squarely on the main beats of
the song, either by note values or by adding extra sound, the information now resides in
the MSB (most significant bit) area of the song. It is not randomized noise; it has been
made a fundamental aspect of constructing the file. This placement does not exempt the
code from scrutiny by computer analysis, but it places the code within the context of the
composition and therefore validates its placement, making it potentially less suspect than
code that was added after the cover object was generated.
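For contrast, conventional LSB embedding can be sketched as follows. This is a generic illustration rather than the method of any particular tool, and the sample values and message bits are made up. Each audio sample gives up its lowest bit to carry one bit of the message, so no sample changes by more than one step.

```python
def embed_lsb(samples, bits):
    """Overwrite the lowest bit of the first len(bits) samples with message bits."""
    marked = [(s & ~1) | b for s, b in zip(samples, bits)]
    return marked + samples[len(bits):]

def extract_lsb(samples, count):
    """Read the lowest bit back out of the first `count` samples."""
    return [s & 1 for s in samples[:count]]

cover = [1000, -2001, 32767, 0, 517, -8]   # made-up 16-bit audio samples
bits = [1, 0, 1, 1, 0, 1]                  # made-up message bits
stego = embed_lsb(cover, bits)
print(stego)
print(extract_lsb(stego, len(bits)))
```

The change is inaudible, but it sits in exactly the place where detection software expects hidden data, which is the difference from the on-the-beat placement described above.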
4. Conclusion
It is possible to embed covert communications at several different levels of audio
production that do not adhere to current steganographic conventions. It is uncertain
whether current algorithms can detect these contextual, arbitrary ciphers. They will
appear as binary code that is subject to decryption. However, the key and randomization
variables are derived outside of normal decryption methods. Further study will be
required to determine exactly how robust they are to detection by current methods. Given
these questions, it is still fair to assume that the best played scans of 1s and 0s might still
lead someone else astray from the intended message, and may hint at a new set of tactics
in the field of audio forensics and steganography.