Maybe you are a hedge fund that wants to identify who might be a good trader. One idea is to hire everyone who applies and let them trade for a year. At the end of the year, you evaluate everyone’s trading performance and fire everyone who is not good at trading.
Is this a good idea?
Well, firing is expensive. The average severance pay at Meta for the 11,000 employees laid off in 2022 was $88,000. Letting people trade when they are not good at it is also pretty expensive. They can lose you $7 billion, like a 31-year-old trader did in 2008.
What’s a better idea?
I’ve never run a hedge fund, but it seems to me that the central aim should be to get a sense of whether someone is any good at trading before you give them a lot of money to trade. This could take many forms. Maybe you give them only a little bit of money at first. Maybe you’re selective in who you hire. Maybe you have people trade in a simulated environment, and then predict how good they are in the real world. Or maybe all three?
Regardless of the specific approach, it seems reasonable to assume that you’ll want to use data at some level to make a prediction about someone’s performance, whether that means assessing how important GPA is to trading success or exploring the relationship between simulated trading and real-world trading. So we’re going to learn how to think about data so that, hopefully, you can make good decisions with it.
Our initial focus in this class has been on how to get Python code to run. Someone’s given us a list, for example, and they want us to return the second element of the list. Not a particularly interesting problem, and potentially quite frustrating: why does Python start counting at zero? But we’ve tackled it, and we did so by learning the Python syntax. Given the list
cars = ['Toyota Corolla', 'Honda Civic', 'Ford Mustang', 'Chevrolet Camaro', 'Tesla Model S']
the second element is cars[1], because index 0 refers to the first element.
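To make the zero-based counting concrete, here is a minimal sketch using the same cars list. The variable names first_car, second_car, and last_car are introduced only for illustration and are not part of the original example.

```python
# The list from the example above.
cars = ['Toyota Corolla', 'Honda Civic', 'Ford Mustang', 'Chevrolet Camaro', 'Tesla Model S']

# Python counts positions from zero, so index 0 is the first element
# and index 1 is the second element.
first_car = cars[0]
second_car = cars[1]

# A negative index counts from the end of the list; -1 is the last element.
last_car = cars[-1]

print(first_car)   # Toyota Corolla
print(second_car)  # Honda Civic
print(last_car)    # Tesla Model S
print(len(cars))   # 5
```

Note that the list has five elements, so the valid non-negative indices run from 0 to 4. Asking for cars[5] raises an IndexError.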
Now, getting Python code to run is no small accomplishment. But it’s also not our only aim. Ideally, we’d like to get to the level where we can also pose questions: not questions like which element of a list to select, but questions about what type of statistical analysis to run. To get to this level and ask good questions about data, it helps to understand how to think about probability.