8/9/2019 Steg Final

1/28

Randal Calcote

Ty Welborn

MUSI 1335: Commercial Music Software

15 January, 2010

Models of Steganography for Digital Audio

Thesis:

Digital audio provides several robust media for embedding steganographic messages by

working with a variety of composition and recording techniques. The techniques of information

hiding are simple by comparison to mathematically complex methods currently in use.

Introduction

Steganography is the process of hiding a specific message within a larger body of

information. Historically, it dates back to the Greco-Persian wars (499-449 B.C.).

Herodotus (c. 484-425 B.C.) cites examples of transmitting secret tactical information using

wooden tablets and the human body as media. The Chinese and Europeans have also

developed systems of hiding information. In modern times, steganography has become

more important in the fields of privacy and financial transaction security. Increased

government restriction of cryptography and industry driven initiatives to protect

commercial and intellectual property have resulted in an ever increasing interest in

steganographic applications.

The distinction between cryptography and steganography is necessarily soft.

Cryptography is a process of encoding a secret message based on a predetermined

algorithm. Instead of a specific algorithm, pure steganography uses random processes for

distributing information within the context of a larger, less specific message. In practice,

both methods are used in tandem to provide greater security than either one alone. For

transmissions of information and financial data to be effective, it is necessary to validate

and maintain the identity and integrity of both the sender and the receiver. Most digital

transactions use encrypted passwords for this purpose. Intellectual and commercial rights

are usually established by public, digital signatures known as watermarks. These marks

vary from complex serial numbers to embedded logos in the software and digital media.

They may be visible or hidden, and they must withstand attempted removal procedures,

digitally or otherwise. These marks identify the owner or creator of film, music and

software. Similar to watermarking, cattle branding has been practiced from the 13th

century to the present to identify herd ownership at market. Watermarks were also added

to paper that would be used to print money in an attempt to recognize forgeries. Now,

watermarks are an integral part of most products transmitted digitally. They are usually

visible for graphics and invisible for data or sound files (Petitcolas 7).

Cryptographic and steganographic models evolved from the wax writing tablets of

the Greeks to the digital watermarks and copyright symbols of today's digital rights

management. Extensive material has already been published on the relevant mathematics

of cryptography based on large prime numbers and steganographic computer software for

distributing information within digital media. This paper will therefore only briefly refer to

relevant algorithms and texts. The systems will be presented summarily to demonstrate

some general principles of embedding covert information within several different media

formats.

Audio and sound are similar, but they are not exactly the same. Sound is produced

when a physical action creates an oscillating field of pressure in the air between the

occurrence of the event and the human ear. The ear converts these vibrations to a set of

electrical impulses which the human brain interprets as sound. The event is mechanical,

but the human experience of it is subjective. Audio is a representation of a sonic event

which is transmitted via electronic channels. It is then recorded for later use or broadcast

to a mass audience. Audio information, either analog or digital, provides a robust set of

media for embedding encrypted or steganographic objects.

1. Practical History

Steganography is ubiquitous in modern society. All commercially recorded music

and video products carry digital watermarks to identify the owner of a digital copyright.

Music, video and software products also carry encrypted serial numbers to aid in tracking

unauthorized copies. MP3 is a compressed audio file format developed by the

Moving Picture Experts Group for portability across the internet. Most files carry ID3v1 or

ID3v2 tags embedded alongside the encoded data that becomes the sound you hear when an

MP3 is played. The ID tags contain information about the music, its owner and point of

origin. Though they are hidden in plain sight,

they can be viewed and edited by most MP3 players. However, the majority of users are

completely unaware of the existence of these tags (Chaum 85).

These electronic signatures are not only present in music and video files, but are

also included in electronic devices intended for reproduction and transmission across

extended computer and entertainment networks. For example, every printer carries a

unique identifier which it embeds in the printed output of every page (Brassil 1278-1287).

When anyone logs on to any computer or ATM, they do so using encrypted passwords to

allow access to that particular communications channel. These codes and watermarks

represent the most common, active elements in steganographic usage. A covert

channel is any communication path not originally designed to transfer information, but

rather to validate the origin of materials. In computer systems, these channels are used to

return information to their owner while performing a service for another user or program,

as trojans, ad-bots and spyware do. They use the same structure as legitimate

programs made to validate identification and distribute information (Lampson 615).

Passwords and PINs, although not purely steganographic objects, are mentioned here

because cryptography is currently an important aspect of transmitting hidden information.

They are employed in an attempt to prevent malicious abuse of information systems.

Thus, cryptographic and steganographic systems used in tandem provide theoretically

more secure channels for transmitting covert or private information (Petitcolas 11).

These processes have been important for both commerce and war, and have

evolved throughout history. Herodotus cited several examples of steganography in his

Histories (c. 440 B.C.). Histiaeus was the military leader of Miletus under Darius I, King of

Persia. While living in Susa at the command of Darius, although loyal to the Persians, he

was unhappy with his condition and wanted to return to his home. He shaved the head of

a trusted slave, and tattooed a message onto his scalp. After the slave's hair grew back, it

hid the message. This message was relayed in order to instigate a revolt against the

Persians, so that conditions at home would require his return to oversee the conflict

(Herodotus 84-87).

Another example cited in the Histories is that of Demaratus, the King of Sparta from

515-491 B.C. Cleomenes, a rival for the throne, bribed the Delphic Oracle to denounce him

as an illegitimate King. After being deposed, he was forced to flee to the Persian court.

Upon learning of the Persian invasion plans of Xerxes in 480 B.C., he devised a plan to

warn Sparta of a coming invasion. After removing the wax from a writing tablet, he

scratched a message on the bare wood and applied a fresh coat of wax to the tablet to

cover the message. This tactic worked so well that the Spartans thought it was a fresh,

new writing tablet. In fact, the Spartan king Leonidas almost did not find the message in

time to contrive an adequate defense at the Battle of Thermopylae (Herodotus 87-89).

One method of secret writing attributed to Julius Caesar is known as the shift

cipher. It is simple to construct, and not very secure under scrutiny. In practice, each

letter in a plaintext message is replaced by a letter some fixed number of positions down

the alphabet. For example, with a shift of 3, "a" would be replaced by "d", "b" would be

replaced by "e", and so on, with "x", "y" and "z" being replaced by "a", "b" and "c" to complete

the cipher (Wikipedia.org).

Thus a message like "hello world" would be replaced by "khoor zruog".

However, this message would be easily deciphered, even if the space between words were

omitted: "khoorzruog". Only a slight improvement in message security is achieved by

reversing the spelling of the whole message, "gourzroohk", or by reversing the

spelling of each word while keeping the word order, "roohkgourz" (the enciphered form of "ollehdlrow").
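
The shift cipher is short enough to sketch in a few lines of Python. This is only an illustration: the function name caesar_shift is my own, and it assumes a lowercase message. The last lines show why the cipher is weak, since trying all 26 shifts recovers the plaintext.

```python
import string

def caesar_shift(text: str, shift: int) -> str:
    """Move each lowercase letter `shift` places down the alphabet, wrapping at z."""
    alphabet = string.ascii_lowercase
    k = shift % 26
    shifted = alphabet[k:] + alphabet[:k]
    # Characters outside a-z (such as the space) pass through unchanged.
    return text.translate(str.maketrans(alphabet, shifted))

cipher = caesar_shift("hello world", 3)
print(cipher)                     # khoor zruog
print(caesar_shift(cipher, -3))   # hello world

# A brute-force attack: only 26 shifts exist, so every candidate can be listed.
candidates = [caesar_shift(cipher, -k) for k in range(26)]
```

Reading down the list of candidates, an opponent would spot the readable plaintext almost immediately.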

What this means is that simple substitution and transposition of letters are basically insecure.

Randomizing letters and ciphers brings an increasing complexity to the job of deciphering

stegotext. In ancient China, masks were made with holes in them to indicate the position

of hidden letters embedded within a larger text. The mask was used as an overlay to

decipher the secret held within the larger text (Katzenbeisser 21).

This process was reinvented by the 16th century Italian mathematician Girolamo

Cardano in 1550. He was known to the French as Jérôme Cardan. Hence, the grille cipher

is named the Cardan Grille in homage to Cardano. He proposed a method of hiding a

message in plain sight via use of a parchment cipher.

Holes were punched in the parchment at random intervals, and a message was

written inside the holes.

Then, the parchment was removed and the message was surrounded with letters

and numbers.

Cardano's proposal was primarily used as a literary game, quite common among

European aristocracy of the time. Decryption required the original, or an identical

parchment cipher. This made for relatively good security provided the original author

composed a reasonable cover message around the intended stegotext. Although well

received by European nobles, the grille cipher served little use other than as a source of

amusement. The two main weaknesses of the process were the necessity of composing a

suitable cover, and the incriminating nature of possessing a grille cipher if apprehended by

an enemy (Wikipedia.org).
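
The grille can be modeled as a set of hole positions laid over a grid of characters. The sketch below is a simplification under stated assumptions: the names hide_with_grille and read_with_grille, the 5 by 8 grid, and the hole positions are all hypothetical, and random letters stand in for the readable cover text that a real author would compose around the message.

```python
import random
import string

def hide_with_grille(message, holes, rows, cols, seed=0):
    """Write `message` into the hole positions, then fill every other cell."""
    ordered = sorted(holes)  # read the holes row by row, left to right
    if len(message) > len(ordered):
        raise ValueError("grille has too few holes for this message")
    rng = random.Random(seed)
    # Random filler is a stand-in for a composed cover message.
    grid = [[rng.choice(string.ascii_lowercase) for _ in range(cols)]
            for _ in range(rows)]
    for (r, c), ch in zip(ordered, message):
        grid[r][c] = ch
    return ["".join(row) for row in grid]

def read_with_grille(grid, holes, length):
    """Lay the same grille over the grid and read the visible letters."""
    return "".join(grid[r][c] for r, c in sorted(holes)[:length])

# Hypothetical grille: ten holes as (row, column) pairs.
holes = {(0, 2), (0, 5), (1, 1), (1, 6), (2, 0),
         (2, 4), (3, 3), (3, 7), (4, 2), (4, 5)}
grid = hide_with_grille("helloworld", holes, rows=5, cols=8)
recovered = read_with_grille(grid, holes, 10)
```

Without the set of hole positions, the grid is just forty letters; with it, the message reads off directly, which is also why being caught with the grille is incriminating.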

Playing games with words held a place of esteem among European literati as early

as the Italian poet Boccaccio (1313-75). Boccaccio wrote the world's longest acrostic in

the form of a set of sonnets, the Amorosa Visione, which used the first letter of every line

of the 1,500 word poem to pay homage to a certain noble lady who would be forever

beyond his means (Wilkins, E.H. 105-106). The acrostic is a literary form that takes the

first letter of every syllable, word or line to construct a new word. It would be difficult to

understand the decrypted message which would result from this poem in Italian.

However, a variant example in English will illustrate the concept. Here is an excerpt of text

from an email which I recently sent to a friend:

you should be glad you are not Here.
my days begin very Early.
a day in college is very Long.
my classes always end Late.
i am always glad when they are Over.
no one fails if they Work.
this is not really Obvious.
it does not seem Right.
no one gets to do what they Like.
our work is never Done.

In this example, the first letter of the last word in each line was chosen as a cipher,

and a text message was constructed around it. A cipher is a set of rules to follow in order

to understand a secret communication. The Caesar Shift and Cardan Grille are both

examples of ciphers. Obviously, the letters were not bold, underlined or upper-case in the

original message. The message and cover text were both contrived to illustrate a basic

principle of steganography, that the most secure transmission should not only be hidden

from all but the intended receiver, but should not even be discernible to casual observation.
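
The rule used in the email excerpt, taking the first letter of the last word in each line, can be applied mechanically. The following is a minimal sketch; the function name read_acrostic is my own, and the lines are typed in lower case, as they would have appeared in the original message.

```python
cover_lines = [
    "you should be glad you are not here.",
    "my days begin very early.",
    "a day in college is very long.",
    "my classes always end late.",
    "i am always glad when they are over.",
    "no one fails if they work.",
    "this is not really obvious.",
    "it does not seem right.",
    "no one gets to do what they like.",
    "our work is never done.",
]

def read_acrostic(lines):
    """Collect the first letter of the last word in each line."""
    return "".join(line.rstrip(".").split()[-1][0] for line in lines)

hidden = read_acrostic(cover_lines)
print(hidden)  # helloworld
```

The cover text carries no visible marking at all; only the receiver who knows the rule can recover the message.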

The word steganography is derived from Greek words meaning "covered" and

"writing". Various tactics have been used, such as highlighting specific letters with invisible

ink or changing the stroke length of important letters in a hidden message. Printing in the

late 16th and early 17th centuries provided a medium for embedding messages by

concealing, instead of encrypting, information (Brassil 88). As seen in the image below, the

inaccuracies inherent in the printing process left random spaces and spurious information

characteristics that made casual detection difficult. J. Wilkins published a pamphlet in

1694, with an excruciatingly long title, wherein he describes how letters could be accented

by long strokes, errors, fonts and stylistic features at random points within the text in

order to ". . . send swift and secret communications with a friend in privacy . . ." Wilkins

would poke small holes above significant letters to distinguish them in the cover text

(Wilkins, J. 88-96).

The German scientist Gaspar Schott (1608-1666) made the first drawings of

universal joints, air pumps and other devices long before they were actually invented

somewhere else. Like many scholars of the time, he also studied music, and described a

steganographic model which substituted music notes for letters as a means of encoding

and decoding messages. The assignments were, and still may be, random or specific to a

given message. The outcome was rarely musical. However, it did serve the purpose of

hiding information from non-musicians (Petitcolas 13). The following illustration shows

both an encoded message, and the cipher which relates letters to specific notes. We will

return to this concept later when we examine the potential uses of steganography in

conjunction with MIDI.
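
A note-for-letter substitution of this kind is easy to sketch. The assignment below is not Schott's: it is a hypothetical table that maps the letters a through z onto 26 consecutive MIDI note numbers starting at 48, chosen only to anticipate the later discussion of MIDI. The names BASE_NOTE, letters_to_notes and notes_to_letters are likewise my own.

```python
import string

BASE_NOTE = 48  # hypothetical choice: the letter "a" maps to MIDI note number 48

def letters_to_notes(message):
    """Replace each letter with a note number; spaces and punctuation are dropped."""
    letters = string.ascii_lowercase
    return [BASE_NOTE + letters.index(ch) for ch in message.lower() if ch in letters]

def notes_to_letters(notes):
    """Reverse the substitution using the same shared table."""
    return "".join(string.ascii_lowercase[n - BASE_NOTE] for n in notes)

melody = letters_to_notes("hello world")
print(melody)                    # [55, 52, 59, 59, 62, 70, 62, 65, 59, 51]
print(notes_to_letters(melody))  # helloworld
```

As the text observes, the resulting melody is unlikely to be musical, but to a listener who does not hold the table it is simply a strange tune.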

As time passed, cultures and their methods of hiding information became

increasingly sophisticated. The French photographer René Dagron was granted the first

microfilm patent in 1859. During the siege of Paris (1870-71) by the Prussian army, Dagron

sent carrier pigeons with messages on microdots across German lines. This was the first

military application of microfilm. Once Dagron achieved a photographic reduction of more

than 40 diameters, the microfilms produced weighed approximately 0.05 grams each, and

a pigeon could carry up to 20 at a time (Wikipedia.org).

In 1940, actress Hedy Lamarr met George Antheil, an avant-garde

composer and author. In 1942, they received a patent for a secure communications

system that provided radio guidance for torpedoes. Remote control guidance of torpedoes

was first proposed in 1906 by Wilhelm von Siemens. Lamarr gained her knowledge of

guidance systems from her first husband, Fritz Mandl, an Austrian arms dealer. What

Lamarr added to the new system was the idea of frequency hopping, or spread spectrum

transmission, which is still used extensively in military communications. It involves

transmitting a signal over a seemingly random set of radio frequencies, switching between

them at split-second intervals. A radio receiver synchronized to the same switching pattern

will receive the full transmission, but any radio that is not in sync will not be able to

decode a complete message. Such radio receivers will detect only small portions of the

broadcast, intercepting what appear to be static blips, thus

hiding the message from all but the intended recipients.
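
The principle of frequency hopping can be imitated in software. This is a toy model rather than a description of the patented mechanism: a seeded pseudo-random generator stands in for the shared switching pattern, the 88 numbered channels are an illustrative choice, and the function names are hypothetical.

```python
import random

CHANNELS = list(range(88))  # illustrative: 88 numbered channels

def hop_sequence(seed, length):
    """Derive a repeatable, seemingly random list of channels from a shared seed."""
    rng = random.Random(seed)
    return [rng.choice(CHANNELS) for _ in range(length)]

def transmit(message, seed):
    """Send one character per hop, as (channel, character) pairs."""
    return list(zip(hop_sequence(seed, len(message)), message))

def receive(broadcast, seed):
    """Keep only the characters that arrive on the channel the receiver is tuned to."""
    tuned = hop_sequence(seed, len(broadcast))
    return "".join(ch for (chan, ch), t in zip(broadcast, tuned) if chan == t)

broadcast = transmit("hello world", seed=1942)
in_sync = receive(broadcast, seed=1942)     # same pattern: the full message
out_of_sync = receive(broadcast, seed=7)    # different pattern: fragments at best
```

A receiver that shares the seed follows every hop, while any other receiver catches a character only when its own pattern happens to land on the right channel.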

Antheil contributed ideas from his composition and performance experience. He

based his part of the design on a mechanism similar to the one used in his Ballet

Mécanique. The Ballet was first performed in Paris in June of 1926. The Ballet, which was

composed for various mechanical instruments, featured a player piano, electric bells,

airplane propellers and sirens, with all the devices on stage controlled mechanically by rolls

of paper tape punched with holes, similar to the ones used to control a player piano. In

the original patent, Antheil incorporated this concept as a means to control the rapid

switching of both the transmitters and receivers used to relay messages via what

became known as spread spectrum technology.

The design was not received well by the US Navy. The patent described the

mechanism as ". . . being similar to that of a player piano . . ." The US Navy, in

considering the patent submission as a practical solution to guiding torpedoes,

disregarded the entire proposal based on their considered opinion that it would not be

feasible to ". . . fit a player piano inside a torpedo." Antheil responded, unsuccessfully,

that the device could be manufactured to be about the size of a watch. Consequently, the

design was not used until 1962 during the Cuban missile crisis. By this time, the original

patent had expired, but researchers at Sylvania repeatedly cited the patent as the original

source for developing the idea to a useful stage. The lesson to be learned from this is that

times of military crisis are not very good times to introduce newly developing technologies,

regardless of their potential. This case also demonstrates that new ideas can come from

unlikely sources that do not rely on established methods (Braun 11-15).

Mathematicians from Cardan to the present have speculated on developing a

method for encrypting messages that will be secure against detection, and that will be

statistically sound under mathematical scrutiny. A model for achieving this goal was

developed by Merrill Flood and Melvin Dresher while they were working at the RAND

Corporation in 1950. This has become a classic, textbook model of invisible

communications.

Alice and Bob are prisoners in separate cells and they want to plan an escape. To do

so, they must communicate in secret. The prison warden, Wendy, arbitrates all aspects of

their daily lives, including communications. As an opponent of their plan, she may be

either a passive, active or malicious agent. If passive, she will allow communications to

pass between them. If active, she may either block or alter their messages. But, if she is

malicious, she can send fake messages to either or both parties, or put them both in

solitary confinement, thus preventing any chance of an escape. If they send messages

with content that is scrambled, they are using encryption. If they use steganographic

techniques, however, they will be sending messages that attempt to conceal the fact that

there is any covert communication. Under this system, they can openly send messages

along unclassified channels, which contain confidential information (Simmons 51-67).

To achieve this goal, they both select a pair of random numbers, which they will use

to encrypt and decrypt their information. The reciprocal pair of equations used to derive

and verify results during the communication process also allows them to verify the identity

of the sender and the validity of the message content. After choosing their numbers, they

will each send their first number in the pair to the other one. They will keep their

respective second numbers secret. These will be used in verifying that the sender has used

the number that was originally exchanged to encrypt all of their communications.

Modular arithmetic plays an important role in deriving the numbers used to

encrypt (hide) and decrypt (observe) sent messages. The expression used in this

process is stated as $C = M^K \pmod{n}$. M and n, like the secret numbers, are assigned

specific values, and are placed within the context of the following equations to assure

secure transmissions. Let M = 7, n = 13, a = 5 and b = 8. Here 5 and 8 are Alice's and Bob's

secret numbers.

By choosing these values, Alice and Bob can now establish encryption keys with the

following calculations: Alice computes $A = M^a \pmod{n}$, which she sends to Bob. She also receives B from

Bob, which he must calculate as $B = M^b \pmod{n}$ and send. She will compute her

decryption key by calculating $K = B^a \pmod{n}$, and Bob will obtain the same key with the

equation $K = A^b \pmod{n}$. Therefore,

$$
\begin{aligned}
A &= M^a \pmod{n} = 7^5 \pmod{13} = 16{,}807 \pmod{13} = 11 \\
B &= M^b \pmod{n} = 7^8 \pmod{13} = 5{,}764{,}801 \pmod{13} = 3 \\
K &= B^a \pmod{n} = 3^5 \pmod{13} = 243 \pmod{13} = 9
\end{aligned}
$$

Since every natural number X is congruent, modulo n, to the remainder obtained by dividing X

by n, and this remainder is called the residue of X (mod n), the residue obtained will become

the encryption key in this manner:

$$
\begin{aligned}
7^5 &= 16{,}807 \\
16{,}807 / 13 &= 1{,}292.846154 \\
1{,}292.846154 - 1{,}292.000000 &= 0.846154 \\
0.846154 \times 13 &\approx 11 \\
16{,}807 &\equiv 11 \pmod{13}
\end{aligned}
$$

When Alice wants to send Bob a message, she will create harmless content, known as a

cover object, that will include A = 11. Bob will use A to interpret the order of encrypted

letters: every 11th character is significant to the encoded text of the message.

Similarly, Alice will use Bob's value B = 3 and her secret number a = 5 to compute $3^5 = 243$:

$$
\begin{aligned}
243 / 13 &= 18.692307 \\
18.692307 - 18.000000 &= 0.692307 \\
0.692307 \times 13 &\approx 9 \\
243 &\equiv 9 \pmod{13}
\end{aligned}
$$

Bob obtains the same value from $A^b = 11^8 \equiv 9 \pmod{13}$, and will send his message to Alice with the key K = 9. Now, each of them can check the

results by comparison with their respective, secret values for a and b as follows:

$$
\begin{aligned}
B^a &= (M^b)^a && \text{since } B = M^b \\
    &= M^{ba} && \text{rule of exponents, } (x^p)^q = x^{pq} \\
    &= M^{ab} && \text{commutative property, } ab = ba \\
    &= (M^a)^b && \text{rule of exponents, } (x^p)^q = x^{pq} \\
    &= A^b && \text{since } A = M^a
\end{aligned}
$$

So, Alice and Bob can now, theoretically, exchange secret information over an insecure

channel, hoping that Wendy will not notice the message within their cover objects (Miller,

Herren, Hornsby, et al. 240-257). No third party observer should be able to distinguish

whether the sender is passively sending an empty cover object or an active message.
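
The exchange can be checked with Python's built-in three-argument pow, which raises a number to a power and reduces it by a modulus in one step. This is a sketch using the values stated in the text (M = 7, n = 13, a = 5, b = 8); the variable names key_alice and key_bob are my own.

```python
M, n = 7, 13   # values both parties know
a, b = 5, 8    # Alice's and Bob's secret numbers

A = pow(M, a, n)   # Alice computes this and sends it to Bob
B = pow(M, b, n)   # Bob computes this and sends it to Alice

key_alice = pow(B, a, n)   # Alice combines Bob's value with her secret
key_bob = pow(A, b, n)     # Bob combines Alice's value with his secret

print(A, B, key_alice, key_bob)  # 11 3 9 9
```

Both parties arrive at the same key without either secret number ever crossing the channel. With numbers this small Wendy could find a and b by trial, which is why practical systems use far larger values.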

The security of invisible communication depends entirely on the inability to distinguish

between a cover object and a secret transmission. Modifications should not be visible to

anyone but those involved in the communication process. For security purposes, the same

cover object should not be used more than once, as this would provide a framework for

deciphering future communications. Both the sender and receiver should destroy all sent

and received cover objects. No potential opponent should have access to the cover object

before the time of transmission. And finally, the cover object must contain a sufficient

amount of redundant data or space to conceal all of the secret information. Most

encryption and steganography software exploits the LSB, or least significant bit, portions of

a binary file. This requires a short discussion on the topic of how computers speak

(Katzenbeisser 32).

2. Digital Media

A cover object can be any data, image or sound file. At the fundamental level, any

digital file consists of a series of 1s and 0s. Each 1 or 0 is called a bit, and a group of eight bits

placed in a sequence is called a byte. Bits are also grouped into 16s, 32s and 64s.

Computers are actually complex systems of electrical switches that are either turned on

(1) or off (0). When electricity flows through a switch, current travels to a specific device

that performs its predefined function.

Each device in a computer system accesses a stored table of binary values which

correspond to a list of specific conditions for performing a task. For example, if you press

the "a" on a computer keyboard, the keyboard and computer are wired in such a way that

a signal is sent through an electronic network, where it is then compared to the binary

value 01100001 from a standard, shared list of 8-bit combinations. 01100001 is code number 97

in the ASCII standard table of binary communication codes. If there is a positive

match, then the human symbol "a" is first stored at a location in memory, and then sent

through a computer program to either be displayed on a monitor screen, printed by a

printer or stored for later recall in a collection of bytes that will be some type of text file.

If, however, the upper case "A" is sent by using the shift and "a" keys at the same time, the

computer system uses 01000001 to transmit instructions between components as the

human symbol "A". The following chart lists the 52 upper and lower case letters as

represented in ASCII.
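
The letter codes can be generated rather than looked up. The short sketch below uses Python's built-in ord and format; the helper name to_bits and the dictionary ascii_chart are my own.

```python
import string

def to_bits(ch):
    """Return the 8-bit binary pattern for a single ASCII character."""
    return format(ord(ch), "08b")

# The 52 upper and lower case letters with their binary codes.
ascii_chart = {ch: to_bits(ch)
               for ch in string.ascii_uppercase + string.ascii_lowercase}

for ch in "aA":
    print(ch, ord(ch), to_bits(ch))
# a 97 01100001
# A 65 01000001
```

The upper and lower case forms of a letter differ in a single bit, the one worth 32.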

This system of substitution, called the ASCII code, was developed by the X3.2

Committee of the American Standards Association from 1960 to 1986. ASCII stands for

American Standard Code for Information Interchange, and was developed for Bell Labs as

a method of transmitting telegraphic code. 128 binary codes for printed characters and

control characters are used to transmit text across the internet and other electronic

communications media (Wikipedia.org).

In a text file, any letter, number, punctuation mark or keyboard command can be

expressed as a byte, a set of 1s and 0s. To say "cat", typing the ASCII characters C

(#67 = 01000011), A (#65 = 01000001) and T (#84 = 01010100) will print out the word CAT

to a computer screen, a file or a printer (Huber and Runstein 216-219). Each byte has

eight bits. The one at the left of the chain represents the largest numerical value in binary

and is called the most significant bit, or MSB. The number on the far right represents the

smallest binary value and is called the least significant bit, or LSB. Most computer

documents, text files, emails, and even pictures are relatively small, requiring only a few

thousand bytes for a complete representation of their data. By comparison, audio files are

large, with sizes going up into millions of bytes, and are therefore more complex simply by

virtue of their size. The assembled bytes represent information about frequency (musical

pitch), amplitude (loudness) and elements of time, or the duration of a pitch. All of these

bytes have an MSB and an LSB. Current steganographic software uses the LSB section of files

as the medium of space to embed hidden messages. Either bit can be used, but the LSB is

the most common area for hiding information within digital media. Encryption software is

used to generate this code and its distribution within the file (Katzenbeisser 37).
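
LSB embedding can be sketched in a few lines. This is an illustration under stated assumptions, not the method of any particular steganographic program: the function names are hypothetical, the cover is a plain run of bytes standing in for audio sample data, and the message is written into consecutive bytes with no encryption or scattering.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Overwrite the least significant bit of each cover byte with one message bit."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover object is too small for the message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return bytes(stego)

def extract_lsb(stego: bytes, length: int) -> bytes:
    """Rebuild `length` message bytes from the LSBs of the first 8 * length cover bytes."""
    out = bytearray()
    for i in range(length):
        value = 0
        for byte in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(100, 200))      # stand-in for 100 bytes of audio data
stego = embed_lsb(cover, b"hello")
print(extract_lsb(stego, 5))        # b'hello'
```

No cover byte changes by more than 1, which is why the alteration is hard to hear. Each message byte consumes eight cover bytes, which illustrates the earlier requirement that the cover object contain enough redundant data.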

In the example of the Prisoners' Problem above, small numbers were

used to generate the encryption keys which were used by both parties to send private

messages. The resulting coded messages soon became insecure with repeated

transmissions. This short term usefulness led to the development of the RSA secure

transaction standard at MIT in the late 1970s. The significant improvement lies in the

fact that the variables for the equation $C = M^K \pmod{n}$ which generate the encryption keys

have increased from small numbers to prime numbers with up to several hundred

digits. This drastically increases the time and magnitude of both the calculations and the

encryption/decryption process. Absolute security is never guaranteed, but robustness

against attack is greatly improved.

3. Models of Transmission: Beat Coding

It is convenient at this point to move away from current theories and methods

which are so harshly burdened with math in order to consider some models of information

hiding within other disciplines, particularly those of music and audio recording.

Computers speak binary, or digital, at the fundamental level of machine

operation. Sophisticated electronic switching networks turn circuits either on (1) or off (0).

The sequence of 1s and 0s tells the computer how to generate electrical impulses which

will eventually be heard as sound. Large groupings of bits are assembled to make digital

audio files.

Although audio files are digital, their output is ultimately analog sound. When

sound is recorded using computers and software, it becomes audio, and therefore digital.

Noise or any other information can be added to a recording at any time. From this point

on in this discussion, it may safely be assumed that a recording is digital. The models being

presented were produced on software currently available to anyone. None of the software

is spyware or malicious code, and none of it is considered to be a steganographic tool. It was

actually designed for recording music and producing audio in various formats.

Digital audio has two major format divisions. They are digital audio and MIDI. The

distinction between digital audio and sampling is slight. It mainly pertains to the length

of the sound being recorded. In modern culture, a sample refers to a short, recorded

sound. This can be a short piece of music, a celebrity quote, or a sound effect. They are

used in hip hop records, broadcast commercials, movies and live theater.

A digital recording converts electrical impulses from a sound source to a string of

1s and 0s, by splitting an original sound into thousands of extremely short slices of digital

information, which are called samples in the realms of consumer and professional audio

recording. They are recorded and played back so fast, generally at 44,100 times per

second, or 44.1 k samples per second, that the human ear interprets them as continuous

sound in much the same way that a movie filmed at 24 frames per second projects an

illusion of unbroken activity on a movie screen. By comparison, analog tape recorders

capture real time sound events as a constantly changing and continuous stream of

information that mirrors the experience of sound.

Regardless of the format, digital or analog, the end result is the same. We produce

a recording of an event, which we can store, replay and manipulate with the proper

software and equipment. In that sense, all recordings are equal, and therefore, all content

is equal. Content is susceptible to manipulation. A typical recording of a rock

band would include drums, bass, guitar and vocals. These are all recorded separately, and

later mixed together to make a sound file that will become a commercial CD.

Besides the typical group of instruments, an engineer, band, or songwriter might

want to include any other combination of instruments from kazoos to symphony

orchestras. These decisions are always arbitrary, and they always contribute to the final

recorded sound of all the instruments playing together. Many commercial recordings are

made using Pro Tools, a software package that emulates a traditional recording studio

inside a computer. A typical recording session brings some musicians into a room, where they are

connected through microphones and cables to a DAW, or Digital Audio Workstation. The

DAW is comprised of the computer, Pro Tools, or some other recording software, and the

associated hardware to connect the musicians to their virtual recording studio. As the

musicians perform their song, an engineer establishes electrical contact and records the

performance on separate tracks, which will later be manipulated to produce a recording of

their song for commercial release.

Consumer demands and industry standards guarantee that almost any recording

will be in stereo. This means that there will be two separate channels, or sets of sound

coming out of the speakers used for listening. These two separate channels will be almost

identical, but will have slight differences in content for the purpose of recreating an

experience of live performance. Our hypothetical session has gone well, and we have a

song ready to release, when suddenly the bass player says that he wants to add a secret

message for all of his fans. Being an Eagle Scout, and a great fan of steganography, he

decides to use Morse code to send his message, and he is going to spread it across the left

and right channels to indicate the dots and dashes, respectively. He could have just used

two notes played repeatedly on his bass to substitute for the dot and dash. But, he decided

that the left-right distinction between the dots and dashes would add another layer of

security, since most people would not be looking for a secret coded message in a

tambourine track. He taps out the message with the tambourine, and it is recorded onto

a track. The pattern is simple, with no syncopation or ornamentation, and each tap always

falls exactly on a quarter note in the song. The engineer later separates the designated

beats to the left and right tracks of the final mix.

There are ten letters in the phrase "hello world." The Morse code ciphers for the

letters are no longer than four pulses each, so one letter can be placed in each measure.

Their distribution across time is less important than their placement in the left or right

channel. The Morse code for "hello world" is:

H        E  L        L        O      W      O      R      L        D
* * * *  *  * - * *  * - * *  - - -  * - -  - - -  * - *  * - * *  - * *
L L L L  L  L R L L  L R L L  R R R  L R R  R R R  L R L  L R L L  R L L

The L and R notes below the * and - symbols directly transpose the Morse code elements to a

stereo mix of the song. The part is not very musical or long, but it is also neither offensive,

nor out of place within the context of a rock song.
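
The dot-to-left, dash-to-right mapping described above can be sketched in a few lines of Python. This is only an illustration, not the engineer's actual workflow: the function name to_channels and the trimmed Morse table are assumptions made for the example, and the word gap between "hello" and "world" is simply skipped, as it is in the text.

```python
# Morse table limited to the letters of the example message.
MORSE = {
    "D": "-..", "E": ".", "H": "....", "L": ".-..",
    "O": "---", "R": ".-.", "W": ".--",
}

def to_channels(message):
    """Return one (letter, morse, channels) tuple per letter.

    Dots are assigned to the left channel (L), dashes to the right (R).
    """
    rows = []
    for letter in message.upper():
        if letter == " ":
            continue  # the word gap is not encoded in this scheme
        code = MORSE[letter]
        channels = "".join("L" if symbol == "." else "R" for symbol in code)
        rows.append((letter, code, channels))
    return rows

rows = to_channels("hello world")
for letter, code, channels in rows:
    print(letter, code.ljust(4), channels)
```

Because no letter needs more than four symbols, each tuple fits in one four-beat measure when every symbol falls on a quarter note.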

Everyone in the studio agrees that this is a good method of hiding the code;

everyone, that is, except the drummer, who feels that the tambourine is directing attention

away from his drum part. But he does have a valid argument when he says that not

many people will easily notice the difference between a single beat being in the left

channel versus the right. So, he proposes that the code be placed using two different

notes on a piano. The notes will be played on the same beats as the tambourine, still

audible, but less intrusive. The engineer adds some reverb to the piano track to make it

sound far away and groovy, which it does.

While all of this has been going on, the guitar player has been out in the parking lot

trying to flag down a pizza delivery truck, to no avail; there is no pizza in sight. So, when

he comes back to the studio and hears the crazy stuff that has been done to his song, he

is livid. Still, agreeing that a secret message in the song could boost sales, he proposes a way

to keep it really secret. Instead of using notes or beats that can actually be heard, he

suggests using short pulses of high frequency noise placed on the same beats. The human

ear can detect frequencies ranging from 20 hertz, or 20 cycles per second, on the low end to

20,000 hertz, or 20,000 cycles per second, on the high end. The higher the number of vibration cycles that

occur each second, the higher the pitch will sound. The 20-20,000 Hz range is an average range. In

fact, many people cannot hear all the way up to the top of this frequency range, and

lose the ability to hear sounds between 15,000 and 20,000 Hz. Knowing this fact, the band

agrees to use a pitch of 15,800 Hz, which will be almost indiscernible to the ear, but which

will nonetheless appear in the left and right channels of the final mixed output. This

method works so well that it cannot be heard without a little electronic processing. Using

a high-pass audio filter, which removes the lower frequencies, and a compressor to reduce

the dynamic range, or difference between the loudest and softest sounds heard in the

song, the message can now be heard as a series of high frequency beeps playing along

with the rest of the song, which sounds like it is being played through very bad speakers.

If we ignore the inferior sound quality that this final processing creates, our secret

message can now be heard, regardless of the fact that most consumers will not be aware

of it being in the mix, nor will they have a copy of Pro Tools on their home computer or car

stereo. However, a real spy could use a piece of equipment known as a spectrum

analyzer to graphically represent the frequency and duration of the high frequency pulses,

thus revealing the code to a trained eye. The band has successfully hidden their message

in the audio of their song.
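
A rough sketch of the high-frequency pulse idea, in Python with only the standard library, might look like the following. Several things here are assumptions made for the demonstration and do not come from the session described: the 44,100 Hz sample rate, the 10 ms slot length (far shorter than a real quarter note), the pulse amplitude, and a 440 Hz tone standing in for the song. Detection uses the Goertzel recurrence, a swapped-in technique that measures energy at one frequency, in place of the filter, compressor, or spectrum analyzer named in the text.

```python
import math

SAMPLE_RATE = 44_100   # assumed CD-quality sample rate
TONE_HZ = 15_800       # pitch chosen by the band in the text
COVER_HZ = 440         # stand-in for the song itself (assumption)
SLOT = 441             # samples per beat slot: 10 ms, shortened for the demo
AMPLITUDE = 0.05       # pulse level, quiet relative to the cover (assumption)

def embed(channels):
    """Build left/right sample lists with one 15.8 kHz burst per symbol."""
    left, right = [], []
    for side in channels:
        for n in range(SLOT):
            t = n / SAMPLE_RATE
            cover = 0.5 * math.sin(2 * math.pi * COVER_HZ * t)
            pulse = AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * t)
            left.append(cover + (pulse if side == "L" else 0.0))
            right.append(cover + (pulse if side == "R" else 0.0))
    return left, right

def goertzel_power(samples, freq):
    """Signal power near `freq`, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect(left, right):
    """Recover the L/R sequence by comparing tone power in each slot."""
    out = []
    for start in range(0, len(left), SLOT):
        p_left = goertzel_power(left[start:start + SLOT], TONE_HZ)
        p_right = goertzel_power(right[start:start + SLOT], TONE_HZ)
        out.append("L" if p_left > p_right else "R")
    return "".join(out)

left, right = embed("LLLLLLRLL")   # channel symbols for H, E, L
print(detect(left, right))
```

A real mix would need the pulses aligned to the song's tempo and would have to survive any lossy compression applied later, neither of which this sketch attempts.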

But digital audio is only one method of creating music with computers. MIDI is an

industry-standard set of computer codes which allows computers and other electronic

musical instruments to exchange information and create music. This differs from digital

audio in that no sound is played outside the computer for recording purposes. Instead, a

set of binary codes, introduced in 1982 and known as the MIDI standard, instructs

the instruments and computers to generate a note with an electronic music synthesizer

inside the computer or controlling instrument.

Digital Performer is a computer-based music notation program which allows

composers several methods of arranging notes in a song: by playing them on an

electronic music keyboard, by writing them on a virtual musical staff, or by editing the actual

computer code that represents pitch, volume, duration and special effect processing to

make the notes sound more natural. The main benefit of MIDI over digital audio is that it

creates very small files which can easily be transferred over the internet. It is a convenient

way to write down musical scores which can then be played by musicians in a live setting.

INSERT BERG AND TRADITION

For the last set of examples, we will again take the same message, "hello world,"

and apply the system of note substitution demonstrated by Gaspar Schott. As previously

mentioned, the method of substitution is arbitrary. The possibilities include using the

notes in the scale of a piece of music, a chromatic or other type of scale. In either case, a

simple letter-to-note relationship is established, and the message is spelled out using

notes instead of letters. Here is the cipher for this example:

Since H is the first letter in the message, C5, the C one octave above middle C on a

piano, was chosen, with the rest of the letters being sequenced consecutively on either

side of C5. Now, by playing these notes on a MIDI keyboard that is hooked up to Digital

Performer, a melody is created.
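
One possible reading of this letter-to-note substitution can be sketched in Python. The cipher chart itself is not reproduced here, so the mapping below is an assumption: each step through the alphabet moves one semitone, with H fixed at C5. The MIDI note number 72 for C5 also assumes the common convention in which middle C (C4) is note 60. The names letter_to_note, encode, and decode are hypothetical.

```python
C5 = 72  # MIDI note number for C5, assuming middle C (C4) is note 60

def letter_to_note(letter):
    """Map a letter to a MIDI note: one semitone per alphabet step, H = C5."""
    return C5 + (ord(letter.upper()) - ord("H"))

def encode(message):
    """Turn the letters of a message into a list of MIDI note numbers."""
    return [letter_to_note(ch) for ch in message if ch.isalpha()]

def decode(notes):
    """Invert the mapping to recover the letters."""
    return "".join(chr(n - C5 + ord("H")) for n in notes)

melody = encode("hello world")
print(melody)          # [72, 69, 76, 76, 79, 87, 79, 82, 76, 68]
print(decode(melody))  # HELLOWORLD
```

Under this assumed mapping, D and E fall below C5 and L, O, R, and W fall above it, which matches the description of letters sequenced on either side of C5.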

In the next examples, our melody, although not very musical, offers a medium to

transmit a message. By adding harmony parts to this melody, a la Cardon, the message

can be obscured even more from casual observation.

In this example, the melody has been placed in the alto part, the second staff of notes

from the top of our song. Another approach would be to embellish our message with

ornamental notes to intentionally distract the eye and ear from the message.

And, finally, a combination of harmony and spread spectrum distribution of notes across

the staff could be set up to create even more confusion for potential attackers looking for

our message.

In this example, the music begins with the first note being placed in the top staff,

the second note in the next staff down and so on until all ten notes have been played. The

placement of notes on the staff begins again with the fifth and ninth being placed in the

top staff, and all other notes cascading across the staves in order to spell out our message.

While this pattern is being spelled out, the bass line, in the bottom staff, walks steadily

through the same melody, with a few extra notes at the end to make it last as long as the

rest of the song.
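
The cascading placement can be sketched as a round-robin deal of notes across staves. The count of four staves is an inference from the remark that the fifth and ninth notes return to the top staff. The bass line is left out, and letters stand in for the notes to keep the example readable. The function names spread and gather are hypothetical.

```python
def spread(notes, staves=4):
    """Deal notes round-robin across staves; None marks a rest."""
    parts = [[None] * len(notes) for _ in range(staves)]
    for i, note in enumerate(notes):
        parts[i % staves][i] = note
    return parts

def gather(parts):
    """Recover the melody by reading the one sounding note in each time slot."""
    return [next(n for n in column if n is not None) for column in zip(*parts)]

parts = spread(list("HELLOWORLD"))
for staff in parts:
    print(" ".join(note or "." for note in staff))
print("".join(gather(parts)))
```

Each printed row is one staff, top to bottom, with a dot for every rest; reading down the columns from left to right recovers the message.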

These three embellishments represent only a small sampling of the possible tactics

of embedding a secret message within a MIDI file. Other tactics might include an acrostic

approach, in which the first note of each measure or phrase is the significant note. An

encryption process could also require a message to be transposed with a shift cipher

before the note substitution ever began.
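
As a minimal sketch of that last idea, a shift cipher could be applied to the letters before any note substitution takes place. The key value of 3 and the function name shift are arbitrary choices for illustration only.

```python
def shift(message, key):
    """Caesar-style shift over A-Z; characters outside A-Z are dropped."""
    return "".join(
        chr((ord(ch) - ord("A") + key) % 26 + ord("A"))
        for ch in message.upper()
        if "A" <= ch <= "Z"
    )

secret = shift("hello world", 3)
print(secret)             # KHOORZRUOG
print(shift(secret, -3))  # HELLOWORLD
```

The shifted letters, not the original ones, would then be converted to notes, so a listener who guessed the note cipher would still need the key.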

In all of the audio examples presented, there is one important quality that

separates the examples from most hidden communications. As mentioned earlier, most

steganographic and cryptographic software and practices hide their information in the LSB

portion of the cover object, since this has little or no effect on the appearance of the cover.

By placing the coded information squarely on the main beats of the song, either by note

values, or by adding extra sound, the information now resides in the MSB area of the song.

It is not randomized noise; rather, it has been made a fundamental aspect of constructing

the file. This placement does not exempt the code from scrutiny by computer analysis, but

places it within the context of the composition, and therefore validates its placement,

making it potentially less suspect as code that was added after the cover object was

generated.
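
For contrast, the conventional LSB approach mentioned above can be sketched as follows. The sample values are made up for the illustration, and the function names embed_lsb and extract_lsb are hypothetical. The point is only that each sample changes by at most one unit, which is why LSB hiding has so little effect on the cover.

```python
def embed_lsb(samples, bits):
    """Overwrite the least significant bit of each sample with a message bit."""
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract_lsb(samples, count):
    """Read back the least significant bit of the first `count` samples."""
    return [s & 1 for s in samples[:count]]

cover = [1000, -2001, 15000, 7, -8, 32767, 0, 123]  # made-up 16-bit PCM values
bits = [0, 1, 0, 0, 1, 0, 0, 0]                      # ASCII "H" = 0x48
stego = embed_lsb(cover, bits)
print(stego)
print(extract_lsb(stego, len(bits)))
```

The musical methods in this paper do the opposite: the message is carried by the notes and beats themselves rather than by the least significant bit of each sample.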

4. Conclusion

It is possible to embed covert communications at several different levels of audio

production that do not adhere to current steganographic conventions. It is uncertain

whether current algorithms can detect these contextual, arbitrary ciphers. They will

appear as binary code that is subject to decryption. However, the key and randomization

variables are derived outside of normal decryption methods. Further study will be

required to determine exactly how robust they are to detection by current methods. Given

these questions, it is still fair to assume that the best played scans of 1s and 0s might still

lead someone else astray from the intended message, and may hint at a new set of tactics

in the field of audio forensics and steganography.