Deep Data
Data science has become a key component of business and research success across industries. This semester, six Conn students were recognized for their prowess at delving into large, unwieldy data sets, winning an award at DataFest, an American Statistical Association competition for undergraduate students with an interest in data and applied mathematics.
The Conn team—which won in the “best business application” category—was accompanied by Priya Kohli, associate professor of statistics and assistant chair of the Mathematics and Statistics Department.
The contest began when teams from Conn, Wesleyan University, Yale University, Bentley University and Trinity College were presented with a surprise data set and asked to analyze the data before sharing a two-slide presentation with a panel of judges on Sunday afternoon.
The data this year related to PlayForward: Elm City Stories, a video game developed to reduce risky behavior, and in turn HIV infection rates, among minority youth. Players aged 11 to 14 use an in-game avatar to navigate certain life decisions, see how those choices affect their future, and then decide whether to “go back in time” to make a different choice. The primary data set recorded how long players remained in the mobile game. A secondary data set consisted of players' self-reported survey responses.
According to Linh-Chi Pham ’24, a statistics major, the inconsistencies in the data proved the greatest challenge.
“There were a huge number of observations and missing values,” said Pham, who signed up for DataFest after hearing about the promise of working with untouched data. “It took us around 4.5 hours to come up with a metric to transform the time and evaluate the data.”
Kohli stressed that the Conn team’s approach was key to its success.
“The data was saying that the game was working how it should,” Kohli explained. “But the survey, which was a self-assessment, was going in the other direction. I think most other groups had completely ignored this supporting data set, but we didn’t—and that’s what won us the prize.”
The other winning tactic was the team’s “clarity of ideas and the simplicity of visualization,” she added.
Lindsay Salvati ’22, a mathematics major with a statistics concentration, helped craft a crisp final presentation.
“Some other groups had so much information that it was confusing to listen to their presentations,” said Salvati. “We stuck with a few simple ideas and that helped us keep ours clear and concise.”
Salvati, Pham and team members Wenjie Wang ’23, Long Ta ’22, Isabelle Patino ’22 and Theodora Moldovan ’23 used their presentation to link time spent in the game to survey data on how likely a player was to refuse a “bad decision,” such as an offer of alcohol, in the real world.
“We won because we provided practical design recommendations to the game creators and backed those up with solid data,” Pham said.
For Salvati, the experience will be invaluable as she begins her doctorate in biostatistics next year.
“In grad school, you have to work with messy data and try to figure out what you are going to do with it, so having some more experience under my belt can never hurt!”