Robert Fustero's Blog


Summaries and Sarcasm

Tiny deadlines

This project was what I like to call sneaky hard. On the surface it looked pretty simple: analyze the dataset, come up with some questions, and answer them. But coming up with good questions was more difficult than answering them.


Learning from numbers

This was an interesting project. As a person who likes numbers and earning money, I approached this project as an opportunist. I wanted to see if there were effective insights I could gather from the data. The two things I knew about real estate before I started Flatiron were: 1. Size matters. 2. Location, location, location. Using linear regression with size as a predictor of housing prices is pretty simple to set up: Price = m x Size + b. Simple, but how do I incorporate the other important factor, location, into this equation? That was the challenge of this project.
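Here is a minimal sketch of that size-only model, assuming a dataframe with hypothetical column names like "sqft_living", "price", and a file name I made up for illustration:

```python
# Size-only baseline: Price = m * Size + b
# (column and file names below are assumptions, not the actual project files)
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("kc_house_data.csv")  # hypothetical dataset file

X = df[["sqft_living"]]  # predictor: house size
y = df["price"]          # target: sale price

model = LinearRegression().fit(X, y)
print(f"slope m = {model.coef_[0]:.2f}, intercept b = {model.intercept_:.2f}")
print(f"R^2 = {model.score(X, y):.3f}")
```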
The challenge is that location is a category. There isn't a linear relationship between location and price. But location can tell us something else: the AVERAGE price of the houses per zipcode. I got inspired by watching sklearn videos and learning about KNN. I didn't use KNN, but I made a ranking system based on the average price per zipcode. I binned my houses into 6 tiers based on the distribution of average housing prices per zipcode. I then created a pandas series based on the zipcode ranks and added it to my dataset. I WAS SO HAPPY.

Linear regression only works well when the predictors are independent. Multicollinearity will screw up your model when you have multiple predictors that are all highly correlated with each other. My neighborhood ranking system was not highly correlated with housing size, and that is what made my model give me a good R² value.

So zipcode was important for the model, but when I looked at the houses that were resold, it also gave me insight into why certain people made money and lost money. The lesson I learned from the data is: if you want to flip a house, it's better to buy a small house for cheap in a good neighborhood than it is to buy a big house in a bad neighborhood. One person made over half a million dollars by doing just that, and another person lost 40 grand by making the second mistake.
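A rough sketch of that ranking idea, under the same assumptions about column names as above: average the price per zipcode, cut those averages into 6 ranked bins, map each house's zipcode to its bin, and fit size plus rank together.

```python
# Zipcode ranking feature (column and file names are illustrative assumptions)
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("kc_house_data.csv")  # hypothetical dataset file

# average sale price for each zipcode
zip_avg = df.groupby("zipcode")["price"].mean()

# split the zipcode averages into 6 ranked tiers (0 = cheapest, 5 = priciest)
zip_rank = pd.qcut(zip_avg, q=6, labels=False)

# map each house's zipcode to its neighborhood rank and add it as a feature
df["zip_rank"] = df["zipcode"].map(zip_rank)

# sanity check: the rank should not be highly correlated with size
print(df[["sqft_living", "zip_rank"]].corr())

# two-predictor model: size plus neighborhood rank
X = df[["sqft_living", "zip_rank"]]
y = df["price"]
model = LinearRegression().fit(X, y)
print(f"R^2 = {model.score(X, y):.3f}")
```

The correlation check is the point of the whole exercise: if the rank feature tracked size too closely, the second predictor would add multicollinearity instead of new information.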


Why Data Science?

I'm sure my story is somewhat familiar to a lot of people here. I originally went to college for music composition. But as my program went along, I realized it wasn't the program for me. I wanted to learn music psychology, music physics, music programming, etc., but the program was more focused on traditional musicianship and becoming a music teacher. Nothing wrong with the school, but my mind was on bigger things.
The Carl Sagan quote about how we live in a world run by technology that barely anyone understands influenced me to get an Associate's in Electrical Engineering at my community college. My original plan was to graduate with a Bachelor's in Computer Engineering with a concentration in Digital Signal Processing. Unfortunately, my university experience was very disappointing. I won't get into too much detail, but looking at the quality of service I was getting versus the price I was paying per semester, it was not worth it.
What motivated me the most to switch from a DSP focus to an ML focus was DeepMind's AlphaZero, Google's AI that taught itself how to play chess in 4 hours and beat the best chess computer at the time (Stockfish 8). When I learned about reinforcement learning, Bayesian networks, and Markov processes, I just wanted to keep learning about it. I loved how Fourier analysis came up again as well. So many things I had found interesting through my academic career and my own Googling came into play in one subject. I had a feeling deep down that this is what I want to dedicate my life to.

One more thing to mention: I came up with a cool math formula in the summer of 2016. It represents a series of ratios that, when multiplied together, always equal a constant. The cool thing about it is that for the x values 0 - 6 it shows all the Pythagorean ratios (the notes in the music scale). I gave a TEDx talk about it, but I have done more work on it since then and went on to discover its Fourier series expansion. I would love to do more with it if I can, but for now I'm ready to publish my findings. I'm looking forward to this bootcamp and can't wait to apply what I learn!