Saturday, November 17, 2012

Examining suspected voter fraud in Chicago


I read an interesting article today suggesting that there could have been some fraud in the Presidential election in Chicago and Philadelphia.  Each of these cities reported a double-digit number of precincts where Romney received zero votes.  This seems rather odd, and it has prompted some people to cry voter fraud in these precincts.  Some have responded by saying that there are probably plenty of precincts where Romney received all of the vote.  I decided to see if I could find any information about precincts where Obama received zero votes.

I did find some.  This article reports there were a few precincts in Utah where Obama received zero votes.  The article gives us some good information about a few of the precincts, where Romney won by scores of 14-0, 17-0, and 14-0.  The article also mentions that in some counties Romney received more than 84% of the vote.

Since this has happened to both candidates, it must be a statistical anomaly; one of those strange things that looks weird at first, but makes perfect sense.  Clearly there is no voter fraud in Chicago, or Philadelphia, or Cleveland.  Or is there?

The difference between the Utah precincts and the Chicago precincts is the number of people voting.  What is the probability that Obama would receive zero votes in a precinct of 14 voters if each voter in that county has a 16% chance of voting for him?  I asked R.  dbinom(0,14,.16) = .087.  A chance of nearly 9% is hardly a freak coincidence.  Overall, Obama took 27.8% of the vote in Utah, so at the statewide rate the probability of him getting 0 votes in a precinct of 14 voters is almost right on 1%.  And according to the Utah election results, there are 2332 precincts in Utah.  I don't know how many of them have 14 voters, but considering that most of the state is rural it doesn't seem outlandish to think that there are a lot.  And if there are at least 300 such precincts then it would be downright expected for us to find 3 precincts where Romney collected 100% of the vote.
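Here is a minimal sketch of the Utah arithmetic in R.  It treats every voter as an independent draw with the same chance of voting for Obama, which is the simplification used in this post and not a claim about how real precincts behave.  The count of 300 precincts is the hypothetical figure from the paragraph above, and the variable names are made up for illustration.

```r
# Probability that Obama gets 0 of 14 votes when each voter picks him
# with probability 0.16 (the county-level share implied by the 84% figure)
p_county <- dbinom(0, size = 14, prob = 0.16)

# Same question using his statewide Utah share of 27.8%
p_state <- dbinom(0, size = 14, prob = 0.278)

# Expected number of shutouts among a HYPOTHETICAL 300 precincts of 14 voters
n_precincts <- 300
expected_shutouts <- n_precincts * p_state

round(c(county = p_county, state = p_state, expected = expected_shutouts), 3)
```

The expected count is just the number of precincts multiplied by the per-precinct probability, so it scales directly with however many 14-voter precincts really exist.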

Now let's look at the 15th precinct in Chicago's 7th Ward where Obama won by a score of 526-0.  It is reported in the article that Obama received 98.6% of the vote in that Ward.  What is the probability of getting all 526 votes in a precinct if that is the case?  dbinom(0,526,.014) = 0.00060153 or about 6 one-hundredths of a percent.

The 17th Ward in Chicago went 98.3% to Obama.  What is the probability that the 38th precinct would go 576-0?  5.138381e-05, about 5 one-thousandths of a percent.

How about that 3rd precinct in the 27th Ward?  That ward went 89.3% to Obama and the precinct went 381-0 in favor of Obama.  What is the probability of that?  1.881074e-19
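The three Chicago calculations can be run in one pass, since dbinom is vectorized over its arguments.  This is only a sketch under the same independence assumption as before; the data frame and its column names are made up for illustration, and the vote counts and ward shares are the ones quoted above.

```r
# Figures quoted in the post: precinct vote totals and Obama's ward-level share
precincts <- data.frame(
  ward     = c("7th", "17th", "27th"),
  precinct = c(15, 38, 3),
  voters   = c(526, 576, 381),
  obama    = c(0.986, 0.983, 0.893)
)

# Probability that Romney gets 0 votes if each voter picks him with
# probability (1 - Obama's ward share), independently of everyone else
precincts$p_shutout <- dbinom(0, size = precincts$voters,
                              prob = 1 - precincts$obama)

print(precincts)
```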

So I don't think that the fact that Romney won 100% of the vote in any of those precincts in Utah is nearly as extraordinary as what Obama managed to pull off in Chicago.  It's interesting to see how the number of voters can make such a big difference in the likelihood of each of these events happening.  When you've only got 14 voters, pulling off a shutout isn't all that spectacular.
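To see the effect of precinct size on its own, here is a small sketch that holds the losing candidate's share fixed at 16% and changes only the number of voters.  The precinct sizes are illustrative picks, not data from either article.

```r
# Shutout probability at a fixed 16% share for the losing candidate,
# varying only the precinct size (sizes chosen for illustration)
sizes <- c(14, 50, 100, 526)
shutout <- dbinom(0, size = sizes, prob = 0.16)

data.frame(voters = sizes, p_shutout = signif(shutout, 3))
```

Under this model the shutout probability is (1 - 0.16) raised to the number of voters, so it shrinks geometrically as the precinct grows.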

Wednesday, November 7, 2012

Evidence-based spending decisions?


You know that I'm a big supporter of evidence-based risk management.  And since risk management is basically a decision support system, I'm really advocating for evidence-based decision making.  This is an example of something I was thinking about that doesn't have to do with information security, but does involve making a decision based on evidence.  It's also an excuse to play around in R.

I came across an article tonight about the relationship between spending on education (per-pupil) and a state's overall rank in education.  Oftentimes I hear that my state, Minnesota, used to be one of the best states for education in the nation and that now we are slipping.  And I often hear that the solution to the problem is that we should spend more money on education.  But a few weeks ago I was wondering if that is really true.  Should we spend more money on education so that Minnesota can be the best in the nation?

The way I see it, states (like people) can be good at some things and poor at other things.  It would be very expensive for us to try to be the best at everything, so we need some way to decide which things we're going to be the best at and which ones we're going to try to suck less at but not compete for #1.  My idea was that instead of looking at per-pupil spending alone, we should look at state rank in relation to per-pupil spending and see if there is any evidence that we're better at it than other states.  But I was busy and I was in the car so I didn't get around to doing it.

Well tonight I read the article I mentioned which talked about the relationship between per-pupil spending and state rank.  The link is here: http://nationaljournal.com/thenextamerica/education/analysis-how-much-states-spend-on-their-kids-really-does-matter-20121016

After I read the article I went to kidscount.org, which seems to be an initiative of the Annie E. Casey Foundation.  There were some things that pleasantly surprised me and some which disappointed me.  For example, I was impressed to see that I can look up a variety of indicators about state education ranking and download them in CSV format for easy number crunching.  I was disappointed that while we can find a 2012 ranking of each state's education quality, we can only get 2009 per-pupil spending.  But I decided to work with what I had rather than hunt down another data source because I was feeling lazy tonight.  So I took a couple spreadsheets, combined them together and then ran this R code to look at the data:

# Load the combined spreadsheet of state ranks and per-pupil spending
edu <- read.csv("education_rank.csv", header=TRUE)
# Fit a linear regression of (reversed) rank on spending
reg <- lm(rank2 ~ spending, data=edu)
plot(edu$spending,edu$rank2,xlab="Spending",ylab="Rank")
abline(reg)
# Click points on the plot to label them with the state name
identify(edu$spending, edu$rank2,edu$Location)

I picked out a few points to label and came up with this graphic.


One thing worth mentioning is that I reversed the states' ranks because I wanted the regression line to point up instead of down.  I just think it looks prettier that way.  So the state ranked last (50th) got changed to 1 and 1 got changed to 50.  The regression line shows roughly how much a state's rank should move in relation to its per-pupil spending.  Some states, like Texas and Kentucky, are right on the regression line, which means they are getting the average bang for their education buck.

Then there are the states that are above the line.  These are states that are ranking higher than their spending suggests they should.  In other words, these states appear to have some competitive advantage which allows them to get higher quality for fewer dollars.  These are the states that we should be looking to for ideas on how to improve education everywhere.  The vertical distance between the line and a state's point shows just how much better they are.  Colorado, for example, is way above the line.  So whatever they're doing in Colorado, Massachusetts, and (yes) Minnesota, they are doing it very well.

States like Alaska and West Virginia are very far below the line.  West Virginia spends nearly the same amount of money per-pupil as Pennsylvania and yet has one of the worst rankings.  These states can pour more money into education, but the graphic suggests that there are efficiency problems and that they would have to put a tremendous amount of money into education if they want to move up in the standings.  These states should probably put money into researching why they are so inefficient.  Just looking at the states I can't help but wonder if population density is part of the problem.  Maybe Alaska spends a lot of money getting teachers to really remote parts of the state, for example.
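The "distance from the line" idea can be made concrete with regression residuals.  The sketch below uses a tiny made-up data set (the states, ranks, and spending figures are all hypothetical, not the kidscount.org numbers); the rank column and the residual column are names invented for the example, while rank2, spending, and Location follow the code above.

```r
# HYPOTHETICAL data: made-up states and numbers, for illustration only
edu <- data.frame(
  Location = c("State A", "State B", "State C", "State D", "State E"),
  rank     = c(1, 2, 3, 4, 5),                    # 1 = best
  spending = c(12000, 9000, 11000, 8000, 10000)   # per-pupil dollars
)

# Reverse the rank so a higher number means a better state
# (with 50 states this would be 51 - rank)
edu$rank2 <- nrow(edu) + 1 - edu$rank

reg <- lm(rank2 ~ spending, data = edu)

# Positive residual: ranked better than spending alone would predict
edu$residual <- resid(reg)
edu[order(-edu$residual), c("Location", "spending", "rank2", "residual")]
```

Sorting by residual puts the states that get the most rank for their dollars at the top, which is the same information as reading vertical distances off the plot.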

So should Minnesota spend more on education?  I think the data says that we should if being the number one state is really something that we want to do.  If we want those bragging rights then we can probably achieve it because we appear to have a competitive advantage over other states.  But if Colorado decides to get serious about its rank it could potentially dominate in the standings.  Colorado's competitive advantage is strong enough that they could take Massachusetts' rank while spending Delaware's dollars.

Sunday, November 4, 2012

My personal experience with the fog of passion


A couple years ago I got conned into working on a political campaign.  This wasn't my first time volunteering on a campaign, but it was the first time that I worked with someone who I didn't know personally from before the campaign.  We worked our butts off, put in all kinds of time that I'll never get back.  Sadly, though, my record of having never won a campaign still stands.

About halfway through the summer I got curious about some things and I decided to try to model the outcome of the election.  I set up my spreadsheet with columns for each of the precincts that would be reporting.  Then I dug through the Secretary of State's website to get information about the minimum and maximum number of voters in each precinct and how much each one favors one party over the other.  My approach was very similar to the process that Nate Silver has used to make his now-famous forecast that there is a 70% chance of Barack Obama winning a second term.  And my results back then were very similar to what Silver is reporting today.  My candidate was probably going to lose.
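A model like this can be sketched as a small Monte Carlo simulation.  What follows is an illustration in R, not the original spreadsheet: the precinct numbers are hypothetical, and the uniform turnout draw and binomial vote draw are assumptions chosen to match the description of minimum voters, maximum voters, and party lean.

```r
set.seed(42)  # reproducible draws

# HYPOTHETICAL precincts: turnout bounds and our candidate's expected share
precincts <- data.frame(
  min_voters = c(200, 350, 500, 150),
  max_voters = c(400, 600, 900, 300),
  our_share  = c(0.55, 0.48, 0.42, 0.51)
)

simulate_once <- function(p) {
  # Draw a turnout for each precinct uniformly between its bounds
  turnout <- round(runif(nrow(p), p$min_voters, p$max_voters))
  # Each voter backs our candidate with the precinct's expected share
  ours <- rbinom(nrow(p), size = turnout, prob = p$our_share)
  # TRUE when our candidate wins a strict majority of all votes cast
  sum(ours) > sum(turnout) / 2
}

wins <- replicate(10000, simulate_once(precincts))
mean(wins)  # estimated probability that our candidate wins
```

The fraction of simulated elections won is the forecast, in the same sense as a "70% chance" headline number.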

The reason I'm thinking about this today is because of an article I read about betting on the outcome of our models.  The article can be found here: http://marginalrevolution.com/marginalrevolution/2012/11/a-bet-is-a-tax-on-bullshit.html  

The article has a profound thought that I had to copy up to Twitter about a bet being a tax on bullshit paid by the bullshitter to people with genuine knowledge.  But right now as I'm writing this I'm more taken in by this quote: "A blind trust bet creates incentives for Silver to be disinterested in the outcome but very interested in the accuracy of the forecast."  The idea is that the person making the bet should be disinterested in the outcome in order to ensure that passion doesn't start to make the odds seem different. It's like our passion about the outcome forms a fog that makes it difficult to see the mistakes we're making or the biases we're introducing.

After I saw that my candidate was about to lose, something inside of me knew that I had wasted a lot of time, and that if I didn't quit the campaign I was going to waste a lot more time.  I also knew that it would be very unprofessional to quit the campaign, so I was pretty much stuck wasting my time.  And I was also a little disappointed because I didn't WANT to lose.  And then something happened; I added another parameter to the model.

You see, that year the conventional wisdom was that people from our political party were going to do much better than usual.  So I added a variable which was how favorable our party turnout was going to be.  Then I correlated that variable with our party turnout in each of the precincts.  So if my random variable came up that we were going to have a good or very good year (which was more than half of the time) it would adjust the parameters for each precinct to give us a more favorable outcome.  I ran my simulation again, and things got better.  But we were still most likely going to lose.
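To show how a "good year" parameter of this kind nudges a forecast, here is a hedged sketch in R.  Every number in it is hypothetical (the precinct inputs, the 60% chance of a good year, and the two-point bump), and the function win_prob is invented for this illustration rather than taken from the original model.

```r
set.seed(1)  # reproducible draws

# HYPOTHETICAL precinct inputs, not the campaign's real numbers
min_voters <- c(200, 350, 500, 150)
max_voters <- c(400, 600, 900, 300)
our_share  <- c(0.55, 0.48, 0.42, 0.51)

win_prob <- function(p_good_year, bump, n_sims = 5000) {
  wins <- replicate(n_sims, {
    # One draw per simulated election: is this a good year for the party?
    good_year <- runif(1) < p_good_year
    shift <- if (good_year) bump else 0
    share <- our_share + shift
    turnout <- round(runif(length(share), min_voters, max_voters))
    ours <- rbinom(length(share), size = turnout, prob = share)
    sum(ours) > sum(turnout) / 2
  })
  mean(wins)
}

baseline <- win_prob(p_good_year = 0,   bump = 0)
adjusted <- win_prob(p_good_year = 0.6, bump = 0.02)
c(baseline = baseline, adjusted = adjusted)
```

The adjustment raises the win probability without any new evidence behind it, which is exactly the kind of parameter that is easy to justify when you want a particular answer.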

Then I added another variable, this time to reflect how hard we had worked in a precinct.  Surely if we were working hard in a precinct we would get a bump there too.

On and on this went, until I was exhausted.  And at some point I realized that my projections had turned to pure shit.  I had skin in the game; I was not a disinterested bettor, and I needed to find a way to justify my bet.  So I kept skewing the model to make it more likely that I hadn't wasted a bunch of time.  

I never shared my projections with the candidate, and when election day came we got kicked about as hard as my original forecast had suggested.  In a sense I could have been right if I had just stopped with my first model.  And since I never went on to publish or promote the monster that my forecast mutated into, I was never wrong.  Instead I was almost right and almost wrong.  But I think this story illustrates the value of getting parameters and projections from dispassionate people that don't have any personal interest in the outcome.  

Sunday, October 21, 2012

On Security Evangelists and Thought Leaders


A couple days ago at work a few of us were debating whether a person was a "thought leader" or a "security evangelist."  It seems to me that most of the time when I hear someone use one of those terms they are using it to describe someone that has a lot of fans but is not well-liked by the person using that term.  But the conversation was interesting because we were discussing what makes a person an actual thought leader.  I had a few ideas that I'd like to share with the Internet.

I think defining a thought leader is actually pretty easy.  Lots of people will say they are thought leaders or will have others call them thought leaders.  But in my opinion, you're not a thought leader unless you meet these criteria that I'm going to lay out.  First of all, you can't be a thought leader unless you have original ideas.  Now I'm not saying that none of the ideas can be derivative, but your conclusions should be your own.  If I'm just rewording and repeating everything that Wade Baker says then I'm not a thought leader.  So thought leaders have thoughts, and those thoughts are their own.  Thought leaders also need to have followers.  You can't lead if nobody is following.  So when Wade Baker talks about Evidence Based Risk Management and I say "ERMAGERD!  He's totally right" then I'm a follower of Baker's.  If you get enough followers and enough original thoughts then you're a thought leader.  People that meet my criteria to be called thought leaders are rare, but they do exist and should be recognized as such.

Notice that I didn't say anything about the quality or correctness of your thoughts or followers.  I don't think that being right is a requirement for thought leadership.  You can be a bad thought leader.  The fact is we need to have thought leaders that are saying a variety of things, some of which are destined for failure, if we're going to have a security ecosystem that produces good work.  This is one of the main points of a book that I read recently called "Adapt: Why Success Always Starts with Failure."

Further down the hierarchy we have "security evangelists."  For me, the evangelist serves a vital role for the thought leader.  It is the job of the security evangelist to bring the ideas of the thought leader to the masses, just like a religious evangelist brings the gospel to remote parts of the world.  Security evangelists may have many people that listen to them, or just a few.  When you go to the application developers at work and start talking to them about how to integrate security into their development lifecycle you're being a security evangelist.  And in some offices the work is just as dangerous.  There are application developers that will kill a security evangelist for coming to their village.  Security evangelists are typically evangelizing a message that was created by a thought leader.  A thought leader can be his or her own evangelist, but needs to get some other evangelists to prevent becoming irrelevant and not have enough followers to be called a thought leader anymore.  So the security evangelist is not usually preaching his own ideas, but a good evangelist will know how to craft the message to the audience.  It is fairly common for a security evangelist to be credited as a thought leader.

Now I know that some people loathe the term "security evangelist."  There is an article by Bill Brenner [1] where he talks about the gut-wrenching feelings that some people get from the word, in part because information security is not a religion and we shouldn't be using religious terms like evangelist.  Kevin Riggins said very nicely that you didn't write any of the Gospels [2].  Kyle Maxwell told me that the term really rubs him the wrong way.  I get that, I really do.  But I think it's kind of like arguing about the terms "Hacker" and "Cracker."  I know a lot of people really wanted cracker to win out when describing malicious computer users, but it didn't.  Everyone uses hacker and we have largely come to accept that.  I think that security evangelist is winning the war in terms of what words are being used and we should just accept that and move on.

Next up we have the security practitioners.  These are the people that are just working their job and trying to make things a little better in their organization.  They listen to the evangelists and try to decide which ones they're going to pay more attention to.  They are going to try out the ideas that the evangelists bring to them and either accept them or reject them.  If an evangelist is rejected by too many practitioners then the evangelist may stop evangelizing for that thought leader.

What happens to a thought leader that can't get enough evangelists and followers?  Or thought leaders that lose all of their followers because their ideas didn't work out?  Well they have several choices.  A thought leader can continue to hold on to the idea serving as his own evangelist while fewer people pay attention to him.  At that point he becomes a kook.  Or, the thought leader can invent a new thought and start recruiting a new group of followers.  And of course some of them will just go back to being practitioners or evangelists.

So there you have it, my definitions for thought leader, security evangelist, and practitioner.

Friday, July 20, 2012

git branch is ahead of master

I ran into this problem today.  Not really a problem, but a bit of a head scratcher.  I wrote some code, pushed it to GitHub, and then went over to a production server and pulled the code from GitHub.  Then I ran git status and it said that I was 23 commits AHEAD of master.

So the obvious answer is that I must have committed code 23 times in this directory and not yet pushed it up to GitHub, but this directory is only used for pulling.  I wouldn't write code in the production folder.  A quick git diff origin/master seemed to show that almost all of the code in my folder was going to be pushed to GitHub if I did a push.

It turned out that I needed to run git fetch.  I did some searching on the Internet and found where someone said that my refs could be out of date.  I don't know what a ref is, but I ran git fetch anyway.  That caught my refs up, because git status then said I was up to date.

Tuesday, July 17, 2012

Programming languages as past relationships

This is an odd thing to talk about, but one of my coworkers got into a conversation with me about programming languages from our past.  We started to use a metaphor to describe each of them, that being what kind of relationship we had with each language.  So I thought I would share the list of programming languages I've known and what kind of relationship we've had.  Just for fun.

Language: Relationship
Pascal: The girl I fooled around with in high school that wouldn't let me get past second base.
C++: The girl I dated in college. We had some good times and we still see each other from time to time, but we've both moved on.
Perl: That crazy chick I hooked up with a couple times. I tried to avoid her, but she left some of her stuff at my place.
Python: My first love. It didn't last and I'm with somebody new now, but I'll always have fond feelings for her.
Sharepoint: I know, not a programming language per se, but this is my blog post so it's going in here. Sharepoint is the annoying girl that I never wanted to go out with but I also never had the heart to tell her that it's never going to happen.
Ruby/Rails: The love of my life. We're going to raise kids together and be happy forever.
VB Script: The dude that sits in the cubicle across from me farting and picking his nose all day. I'm that turned off by VB Script.

So there you have it.  As I think of other languages I might go back and fill this in.  Of all these Perl is the only one I hope I never see again.  That bitch will stab you.

Thursday, May 31, 2012

And now for something new

I'm a little late to bring this to the blog, but about a month ago I left my cushy job at Minnesota State University, Mankato to join the Verizon Risk Team.  I guess I haven't really left MSU though because I am still going to teach a class for at least one more semester.

For me, one of the most exciting things about a new job is the incredible new things that I learn.  When I came to MSU my understanding of Linux, scripting, and development grew by leaps and bounds.  It's probably too early to know what I'm going to latch onto with the new job, but there are a lot of opportunities.  Hopefully after I've learned something cool, I'll have more to put on the blog.  I try hard not to write updates unless I have something fresh and interesting, so it might be a while.