November 20, 2019
Greg Faletto is a research assistant and PhD student in the Statistics group of the Department of Data Sciences and Operations at the University of Southern California Marshall School of Business. Greg’s research broadly focuses on feature selection, machine learning, and applied statistical methods for business and social science. Over the past decade Greg has developed original data science models for companies like Live Nation and ZipRecruiter, and his team won “Best Model” at the Orange County R Users Group Hackathon 2019 for their predictive model that correlated regional health outcomes with the presence of water pollutants in California.
1) What does data science mean to you?
To me, data science is about developing models to predict outcomes. That could mean predicting whether a new medicine is effective, what sales revenue will be next month for a retail store, or whether a government program successfully reduces homelessness. Data science can help not just with making accurate predictions, but also with understanding which data is important for predictions and how the predictors affect the response. Creating visualizations, working with data pipelines, and countless other roles are important to data science too, but turning data into predictions is my favorite part.
2) What do you think is the biggest challenge facing data science today?
We need to understand algorithmic fairness. Algorithmic bias can come about when machine learning models are trained on biased data. For example, Amazon developed an algorithm to screen resumes. Most of the examples of successful employees that the algorithm saw were men. The algorithm filtered out qualified female candidates, because it mistakenly correlated being male with being a successful employee. (Amazon acted admirably and discontinued the program. They never used the algorithm in hiring.) We don’t yet have clear, universal, actionable standards for algorithmic fairness. We need them to ensure fair outcomes and build societal trust in data science.
3) How did you get started in the data world?
I started by taking a popular, free online data science class. There are a lot of great free online resources for people looking to get into data science. I had taken AP Statistics in high school and several college math classes, and I was interested in data science after reading books and blog articles by data scientists like Nate Silver and Christian Rudder. Eventually I downloaded R and saw that these kinds of models weren’t as distant or scary as I had thought. They are freely available to anyone who is interested and has a computer with internet access.
4) What is your work day like?
I’m a Ph.D. student in statistics, so my days usually involve class, research, or both. The classes I take cover traditional probability and statistics, but also modern data science. One of my current research projects concerns “feature selection”: choosing which predictors to include in a model when many are available but many of them are unimportant to the response. The other is an applied project developing a model for a tech company. My days involve a lot of reading, coding, doing math, and talking to people about all of this, which I love!
5) What was your favorite childhood toy, and why?
My favorite toy as a kid was the first video game system I ever got, a Nintendo 64. I had a ton of fun exploring other worlds and playing games with friends. I don’t have as much time for video games these days, but I can still hold my own in Super Smash Bros.