optionsScalper

verbose=on, snakeOil=off, pontification=on, humanIntelligence=off

Subscriptions

<August 2008>
SuMoTuWeThFrSa
272829303112
3456789
10111213141516
17181920212223
24252627282930
31123456

News

I have been having problems with comments. If you need to comment, please see the contact button at the top of the page.

Navigation

Post Categories

About Me

JJBR

Articles

Milwaukee Bloggers

"Gentlemen" bloggers

GA/GP/EC/ML

Sensible People

F#

Math, NT, GT, TOC

Security Blogs

DirectX/Game Development

Wednesday, May 25, 2005 - Posts

Star Wars III: Revenge of the Math

(warning:  Mathematics silliness ahead)

Ok, so I'm invited by Microsoft to a showing of Star Wars III: Revenge of the Sith in Chicago last Thursday (thanks again guys).  As I mentioned here I enjoyed this movie, but did see a few things that didn't sit well with me.  Here's the biggie:

Anakin Skywalker is "the chosen one".  Everyone knows it, including all of the Jedi.  According to prophecy, he is supposed to bring balance to the force.  HYPOTHESIS:  balance means "same amount of badness as there is goodness".  Let's see if we can't apply a little math (I've got to learn how to do math symbology in HTML with MathML or I can't even attempt to be condescending).

  • Let A be the set of Sith Lords.  Further assume that each element k of A, has an attribute m, representing the Sith's midi-chlorian count.  (Note:  Insert condescending tuple relational calculus here)
  • Let B be the set of Jedi Knights.  Further assume that each element j of B, has an attribute m, representing the Jedi's midi-chlorian count.  (Note:  Insert condescending tuple relational calculus here)
  • Let SA be the cardinal number for the set A.
  • Let SB be the cardinal number for the set B.
  • Let FD be the amount of the force in the universe allocated to the Sith (dark side) and measured as the sum of m for each member of A.  (Note:  Insert condescending sigma notation here)
  • Let FJ be the amount of the force in the universe allocated to the Jedi and measured as the sum of m for each member of B.  (Note:  Insert condescending sigma notation here)
  • Let t mark time, or the temporal aspect, of the model. t0 is the initial state at the beginning of Episode III.  t1 is the ending state at the end of Episode III.

So at the beginning of Episode III (at time t0), we find our model's state to be:

  • SA = 2
  • SB  100 (I don't really know, so I'll guess)
  • FJ > FD

At the end of the movie (at time t1), we find that the model's observable facts show:

  • SA = 2
  • SB = 2 (and 2 infants)
  • FJ   FD

Since SA = SB and FJ   FD, it is clear that we can observe that the final state as described by the term "balance" was met.  What is left to determine is whether the balance that was achieved was causal or coincidental w.r.t "the prophecy".  I'll leave that to you the reader and suggest a look at the Student's T or other statistical measures.  Considering the degrees of freedom, this may be difficult.  As there is a lack of observations on the Jedi data set and their corresponding times of death on an axis with insufficient precision (temporal axis), it may be difficult to apply OLS or other regressions, ARIMA, GARCH (is the conditional heteroskedasticity on midi-chlorian count? or should I be a good capitalist and make a market in options on midi-chlorians?) or other models to this problem.

Analysis

Apparently all of the Jedi in the movie didn't understand that it was a simple math problem.  Yoda even states that the prophecy could be interpreted another way.  While Yoda survives (and could actually introduce survivor bias), it is clear that we must not allow for any subjective interpretation in the model.  No bias allowed.

To move from the initial state at time t0 to the final state t1, one or both of the following conditions must be satisfied:

  1. Jedi must die to reduce the number of elements in the Jedi set (B).
  2. Jedi must be converted to the dark side to simultaneously reduce the elements in the Jedi set (B) and increase the number of elements in the Sith set (A).

These statements can be made under the assumption that a Sith or a Jedi cannot noticably increase their midi-chlorian count at will.

So we find Anakin first converts to the dark side, then says, "Hey, let's bring some balance here.  Let's kill some Jedi."

Yet even in the face of the facts that Anakin will bring "balance to the force", many of the Jedi don't believe that they will die.  Clearly they will die or convert to the dark side.  They, at some point, must realize they are only elements of sets.

Dont' forget that when a Jedi or a Sith dies, their force power just goes back into the universe.  Contrast this with the Highlander model where in death, a being's lifeforce is transferred to the victor in battle.  But that is another math problem.

posted Wednesday, May 25, 2005 8:07 PM by optionsScalper with 2 Comments

Crypto: Steganography - Not a dinosaur version of crypto

When discussing cryptography, i.e. the practice of hiding information, it is common to think of ciphers and public keys and other protective measures.  Steganography is another information hiding strategy.

Recall that in the discussion on parties, there was the "unaware party".  The unaware party was any party who does not know that information exists.  As a cryptographer, this is an ideal situation.  If an adversary is unaware that I have information to convey, there is less risk to that information.  The "closed and determined adversary" will never be unaware, because they believe that all information from an originator is valuable in some form.  How can I in the face of a closed and determined adversary, still convey information that they are unaware of because they capture all of my communications?

Steganography is one answer to this problem.  Steganography is the hiding of information by means, such that, no one other than the receiver of the information is aware of the existence of the message.  So you may ask "Hey Scalper, ummmmmmmm, if someone is already capturing all of my communications (information), aren't they aware of all of the information?"  The answer is no.  Consider the simple example.

I (originator) would like to email the following message to my friend (recipient):

"Don't buy xyz stock today.  Bad news."

I also don't want any one to know of the existence of this message.  Through another means of communications, whether it be in person, on the phone or otherwise, we agree, that we'll transmit any hidden messages of this nature as the low order bit in jpg images that we transmit.

So now I need to send this message.  Lets' assume that I have a jpg of my kid and I want to hide the above message in that picture.  I construct a simple program which takes the ASCII (Unicode or otherwise agreed upon encoding for the exchange) and I take the bits from each character, one at a time, and substitute them into the jpg picture's pixels, one at a time, in the lowest order bit for red, green, blue or some other agreed upon position.  I modify the jpg so that it retains its integrity for display in any other informational fields due to changes that I have introduced.  Because I'm modifying the low order bits of the colors, the image maintains its fidelity (to the naked eye).

With my image modified, I compose the email:  "Dear so-and-so,  Here's my kid in her latest karate match.  She permanently injured her opponent.  See you this weekend.  -O" and attach the picture.

The recipient receives the email.  Without prompting, the recipient runs a program that looks at the low order bits of the pixels of the jpg and attempts to construct a message.  In doing so, the recipient can recover the message.  If there is no message, the results will be garbage.

So in doing this communications transaction, the closed and determined adversary can assume that my kid is a mean karate student and can even see my kid in a picture.  But the adversary is unaware that I'm transmitting another hidden message.

If I constantly send pictures through email to the same recipient, the adversary will likely find this communications to be consistent and will likely not look for steganography.  If on the other hand, I had a large message to send, e.g. source code and binaries for a project, and used steganography to hide the information, it may require 20-30 pictures.  An email with 20-30 pictures in a channel where the typical email has between 0 and 2 pictures would raise suspicion with an adversary.

NOTE:  Any stego that I transmit would be done in clear text in my writings here.  With "verbose=on" there is plenty of room for hiding.

posted Wednesday, May 25, 2005 7:58 PM by optionsScalper with 6 Comments

Bananas for the Honey Monkeys

If you live in an ant-hill or 5,000 feet below the surface of the ocean and you have not heard of the Honey Monkeys, I'll forgive you.  Otherwise, I'll expect that you know of these creatures.  They are socially responsible bots that live on the internet looking for spyware and viruses.  As a user of the internet, I truly appreciate what these guys are doing on my behalf.  Let's introduce the monkeys and then let's have a discussion on how to feed them bananas, i.e. simple EC methods for distracting the monkeys from doing their jobs.  Disclaimer:  I'm not a distributor nor author of spyware or viruses.  I'm not a proponent of spyware and viruses.  I hate this stuff.  I'm a socially responsible citizen (or at least that's where I try to keep my moral compass) in the physical world and on the internet.  Let's further be clear that this information is not a comprehensive document on the work that was done, nor was this information put under any scrutiny by any group other than JJBR (public and termites).

Static Nature of the Monkeys

the crazy one and I actually have a few ECs that measure many things about networks.  We are both TOC guys, i.e. networks have complexities and can be classified.  We thought it would be fun to put some monkeys on the simulators and see what happened.  While we remained unbiased for the experiments, we have agents that act like the monkeys (with evolvable strategies), so we had a pretty good idea of how the monkeys would respond.

A quick review of the monkeys reveals that they perform very simple tasks (observable facts):

  • Acquire URLs from HTML pages.
  • Navigate to the URLs
  • Determine if the URL planted a virus or spyware.  If so, perform some reporting activity of the URL.

Unknowns:

  • Do the monkeys always originate from the same IPs/domain?
  • Do the monkeys always present the same browser capabilities to the host of the URL?
  • Do the monkeys have any limitations on the number of URLs (pages) that they will scan for a particular domain?
  • Do the monkeys have any limitations on the consecutive number of URLs (pages) that they will visit for a particular domain?
  • Do the monkeys have any adaptive behaviors (observable or through induction)?

Since we don't have answers on the unknowns, we'll assume for the moment that the answer is no to each question.

A note on Framing:  A frame is temporal window in a run of an environment of an EC experiment.  A run contains a collection of frames, ordered in time.  A frame contains a record of all activities of the EC for that frame's frame index.  Those unfamiliar with the term decile, it means to break into 10 groups (quantile is the general term).  So when referring to a decile of frames, take the current number of frames, divide it by ten and then collect the frames in each decile.  For an experiment in progress, the only observable fact is the current frame, so the process of decilization can only occur for studies on this fact, e.g. if an experiment is on frame index 820, then the size of the decile is 82; frames 1-82 are considered the first decile, frames 83-164 are the second decile and so on.

Still in the City: No Monkeys, No Banana Trees

What we wanted to observe was whether the static monkeys would provide any advantage in the desired OEB.  The OEB we thought would be useful would be:  the monkeys must sustain a maximum infection rate below 5% (of all vertices) for an indefinite period of time (at least the last four deciles of framing of any experimental run).  As my OEB entry already has some of the basic concepts, let's copy it and expand on it.

Terms

  • Vertex (Vertices) - Each vertex of the graph in the EC represents a computer, enumerated as workstations, servers and bigservers.
  • Infection - A program that runs on a vertex.  The existence on the vertex of one or more infections marks the vertex as infected.  Note:  For these experiments, we use isolated infections, an infection cannot contact another vertex on its own and infect that vertex.
  • Disinfection - A program that identifies a single infection and provides a strategy for removal and protection of the infection.  Used by the disinfector.
  • Infections - Property of a vertex.  Implemented as a collection, infections are placed here.  If the count of the infection property of the vertex is greater than 0, the vertex is infected.
  • Occupation - A vertex must have at least one occupation (work to do and reason for doing it).  Occupations are from the enumeration: Infection Author, Infection Distributor, Disinfection Author, Disinfection Distributor, Disinfector, Page Distributor, End User.
  • Programs - Property of a vertex.  Implemented as a collection, programs are executed by users, i.e. Authors, Distributors, Disinfectors, End Users.
  • Infection Author - An agent that runs a program that occupies a vertex and is run for the purpose of the construction of a new infection.  An infection author occupies only one vertex at a time.  Infection authors carry their programs and may use any vertex marked as workstation.
  • Disinfection Author - An agent that runs a program that occupies a vertex and is run for the purpose of the construction of a disinfection.  A disinfection author occupies only one vertex at a time.  Disinfection authors carry their programs and may use any vertex marked as workstation.
  • User - An agent that runs programs on a vertex, demands disinfection from disinfection distributors and demands pages from page distributors.
  • Page - A unit of data requested by a user delivered by a page distributor.
  • Page Distributor - An agent on a vertex that provides on-demand pages to other vertices.
  • Disinfection Distributor - An agent on a vertex that contains a large number of disinfections and is responsible for distributing the disinfections to other computers.
  • Infection Distributor - An agent on a vertex that acts as a page distributor and contains a large number of infections.  It is responsible for distributing the infections with pages to other computers.
  • Disinfector - A program that runs on a vertex.  This program is responsible for infection identification, proactive infection protection and removal of infections.

A control environment that contains these items is set up and will have parameters set such that the network will be 100% saturated with infections (every vertex has an infection count > 1) at or before frame 10,000.  The control environment is run 100 times.  The mean frame for saturation is frame 9,994.  The last four decile infection mean is 81%.

Note that the population of agents remains constant.  All strategy for this control group is held constant.

Hey, hey we're the Monkeys

Another environment is set up with the same control set in the same experimental EC.  The following is added:

  • Monkey - An agent that acts like a user and spends 100% of it's vertex time demanding pages.  If a fulfilled page delivery results in an infection count increase on the monkey's vertex, the monkey notifies the police station.  One note: the monkey receives its vertex and page(s) information initially from a pages from "probabilistic infection distributors" and then from the last demanded unvisited page as determined by the current page.  If a monkey has no more pages to search, it either sits idle, or is awoken with a new list from another monkey.  Monkey communication is enabled.
  • Police Station - Message manager that receives notification from monkeys and keeps state of a collection of police units.
  • Police Unit - An agent that when notified visits a node and if present, removes and destroys an infection author or an infection distributor.  A police unit has statuses: idle, in transit, destroying.  Exactly one frame is required for "Notification to destroy" message to be sent from the station to a police unit.  Exactly one frame is required upon receipt of "Notification to destroy" to travel (in transit) to a vertex.

Everything else is held constant from the control group and this group is labeled the monkey group.  This group is run 100 times.  The mean frame for saturation is not measurable (through frame 100,000).  The last four decile infection mean is 61%.  No saturation and lowered four decile infection mean, i.e. the monkeys reduce the infections considerably.  We are not near the 5% OEB that we had hoped for, but we could try to improve the monkeys.  Without a full discussion, the monkeys were not able to improve the four decile infection mean to a value below 50% without significant bias in the model.  At this point, we have a failed hypothesis for the OEB.  Let's see if we can't make matters worse.

Knock knock; who's there; banana banana

Another environment is set up with the monkey group in the same experimental EC.  The following is added:

  • Banana Tree - A program on a vertex that is a page distributor.  The banana is a vertex page pair that points to a banana program.
  • Banana Program - A program on a vertex that generates fake vertex page pairs and allows for delivery of fake vertex page pairs.

Everything else is held constant from the monkey group and this group is labeled the banana group.  This group is run 100 times.  The mean frame for saturation is 66,442, i.e. the monkeys reduce the infections considerably, but the network still becomes saturated.  Also notice that the police are incapable of recognizing or shutting down a banana tree.

Toxic banana trees

Another environment is set up with the banana group in the same experimental EC.  The following is added:

  • Authors are allowed to spawn, i.e. evolvable strategy for authoring of infections and disinfections (in equal amounts) that "jumps" from agent to agent.  An author can "infect" another agent on the network with authoring skills.
  • Distributors are allowed to spawn (asexually reproduce) and move from vertex to vertex.
  • Vertex creation is allowed at probabilistic rates of growth with probabilistic programs and agents at initialization.  A vertex will always be created without infections and with disinfections.
  • Infection distributors are allowed to spawn vertices with banana trees that share vertex page pairs for page delivery, i.e. a banana tree can reference another banana tree for a contained page destination.
  • Monkeys are held constant, because they are static.

Everything else is held constant from the banana group and this group is labeled the Toxic group.  This group is run 100 times.  The mean frame for saturation is 3,019.  The last four decile infection mean is 86%.

Conclusions

The honey monkeys are a great idea.  When adaptive behaviors are introduced, the static monkeys become less effective, as expected.  This likely means either:

  1. All other experimental data must be held constant for effective honey monkeys.
  2. The honey monkeys must become adaptive.

Let's hope we hear some goodness from the researchers that brought us the monkeys.

posted Wednesday, May 25, 2005 7:45 PM by optionsScalper with 7 Comments

Powered by Community Server, by Telligent Systems