“Quarantine, a time some people might view as a downtime, really was used for creativity and advancement of knowledge for us because that’s when the lab was developed,” said Megan Stubbs-Richardson, an assistant research professor at the Social Science Research Center (SSRC).

The Data Science for the Social Sciences Laboratory (DS3), one of the SSRC’s newest labs, is working with that creativity and knowledge to build a multidisciplinary team ready to explain problems from different viewpoints and technologies as they approach their first projects.

Discussion of developing the lab began at the SSRC’s retreat in February of 2020. Stubbs-Richardson and John Edwards, an associate research professor and director of the SSRC’s Wolfgang Frese Survey Research Laboratory, led the charge of bringing in mentors, students, and colleagues from computer science, sociology, public administration, criminology, and other disciplines. The team received its first funded project in July of 2020.

“Coming from different disciplines, we have a unique way of looking at the same idea,” said Edwards.

Stubbs-Richardson also notes the benefits of having seasoned researchers like SSRC Director Art Cosby, along with student researchers, and how that builds the dynamics of the group.

“The students are equally creative to all of us, and so, they bring their own fresh ideas to the project. For example, Shelby Gilbreath, an undergraduate student, is our hashtag trending expert for our COVID project. She keeps track of all things related to COVID and lets us know the latest hashtags and information across demographics,” she said.

Edwards had interest in the work, partly because he had participated in a previous lab at the SSRC working with social media and machine learning. The Innovative Data Laboratory was in many ways a forerunner for the DS3, collecting social media data and using it to identify trends or aid communities.

“At the time we were looking exclusively at Twitter data, and we were one of a select few research institutions doing that. We received a grant from the National Oceanic and Atmospheric Administration that ran from 2014-2015. My thought all along has been ‘how can we do some good with this data?’” said Edwards.

He saw how machine learning and social media could be put to good use in the team’s work five years ago. The team collected tweets from people who didn’t evacuate during Hurricane Sandy, and the group at the time programmed computers to search those tweets for images showing damage. From the images, the team learned that those residents were identifying damage publicly before first responders could even get to the area. Many times, the tweets were so exact that the team knew precisely which street or address had a fallen tree or other major debris. They saw how analyzing these tweets could help first responders reach the hardest-hit areas first.

Stubbs-Richardson also sees the insights open source data can provide, especially when combined with data science approaches.

“We’re taking a big data approach with machine learning. More computer science methods that can take in huge amounts of data and make sense of it rather quickly,” she said.

The DS3 hopes to grow projects that will work with social media, machine learning, and other big data sources. Currently, the team has a collection of geolocated tweets from Twitter that it has been amassing since 2015. Researchers have used the tweets in work concerning Hurricane Sandy, the Flint Water Crisis, and other societal and cultural issues.

The researchers see the opportunities within open source data like social media in part because of the instantaneous nature of many of the platforms. As Edwards explains, with a survey you can get targeted direct answers, but social media opens the door to thoughts the audience might not share in the more formal setting of a telephone-based survey. Both are necessary but can give different insights based on your research needs.

The technical members of the group write programs and algorithms to pull posts from social media sites that use key phrases, words, or locations that may be of interest. The team is currently researching emotional expressions associated with the COVID-19 pandemic as they relate to various social institutions such as family, economy/work, government, education, religion, and healthcare. The work spans ten to fifteen different social media platforms and is funded by a National Science Foundation (NSF) RAPID grant (Analyses of Emotions Expressed in Social Media and Forums During the COVID-19 Pandemic). To accomplish this, the team uses the data pulled from the platforms, and then research assistants like Shelby Gilbreath and others on the team sort through the posts to categorize them.
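The selection step described above can be illustrated with a minimal Python sketch. It is not the lab's actual code: the `Post` record, the function names, and the bounding-box format are all assumptions made for illustration, and the posts are assumed to have already been pulled from a platform.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Post:
    """Hypothetical record for one social media post."""
    text: str
    lat: Optional[float] = None  # latitude, if the post is geolocated
    lon: Optional[float] = None  # longitude, if the post is geolocated


def matches_keywords(post: Post, keywords: Sequence[str]) -> bool:
    """Return True if any keyword appears in the post text (case-insensitive)."""
    text = post.text.lower()
    return any(keyword.lower() in text for keyword in keywords)


def in_bounding_box(post: Post, box: Tuple[float, float, float, float]) -> bool:
    """Return True if the post lies inside (south, west, north, east)."""
    if post.lat is None or post.lon is None:
        return False
    south, west, north, east = box
    return south <= post.lat <= north and west <= post.lon <= east


def select_posts(
    posts: Sequence[Post],
    keywords: Sequence[str],
    box: Optional[Tuple[float, float, float, float]] = None,
) -> List[Post]:
    """Keep posts that mention a keyword or fall inside the area of interest."""
    selected = []
    for post in posts:
        if matches_keywords(post, keywords) or (
            box is not None and in_bounding_box(post, box)
        ):
            selected.append(post)
    return selected


if __name__ == "__main__":
    sample = [
        Post("Stay safe everyone #COVID19", 33.45, -88.82),
        Post("Nice weather today"),
    ]
    for post in select_posts(sample, ["covid"]):
        print(post.text)
```

Substring matching is the simplest possible filter; a real pipeline would likely need tokenization and hashtag handling, and the selected posts would still go to human coders for categorization as the article describes.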

“We have for starters all of the geolocated tweets since 2015 and a lot of what we can do with that is link some community factors to the geo coordinate level, and they can assist us in explaining public opinion or whatever topic we’re approaching at the time,” said Stubbs-Richardson.
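One way to picture linking community factors to the coordinate level is a nearest-community lookup, sketched below in Python. This is only an illustration under stated assumptions: the dictionary layout, the function names, and the `example_factor` placeholder are hypothetical, and the article does not say which method or data the team uses. The distance calculation is the standard haversine formula.

```python
import math
from typing import Dict, List


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_community(lat: float, lon: float, communities: List[Dict]) -> Dict:
    """Return the community whose reference point is closest to (lat, lon)."""
    return min(
        communities,
        key=lambda c: haversine_km(lat, lon, c["lat"], c["lon"]),
    )


def attach_community_factors(tweets: List[Dict], communities: List[Dict]) -> List[Dict]:
    """Copy each tweet and add the name and factors of its nearest community."""
    linked = []
    for tweet in tweets:
        community = nearest_community(tweet["lat"], tweet["lon"], communities)
        linked.append(
            {**tweet, "community": community["name"], "factors": community["factors"]}
        )
    return linked


if __name__ == "__main__":
    # Placeholder values only; these are not real community statistics.
    communities = [
        {"name": "community_a", "lat": 33.45, "lon": -88.82, "factors": {"example_factor": 1.0}},
        {"name": "community_b", "lat": 32.30, "lon": -90.18, "factors": {"example_factor": 2.0}},
    ]
    tweets = [{"text": "hypothetical tweet", "lat": 33.46, "lon": -88.80}]
    print(attach_community_factors(tweets, communities))
```

A nearest-point lookup keeps the sketch dependency-free; matching tweets to actual boundary polygons (for example, counties or census tracts) would be more precise but requires a geospatial library.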

In addition to the Twitter data collection that has been in place for years, the team’s programmers use application programming interfaces (APIs), endpoints that various websites make available so programmers can connect to their platforms. With this connection, programmers write code to pull data from the platforms.

“Platforms are interested in giving you the application programming interface because they don’t want you scraping data off their platform and slowing down both the website and your system,” explained Stubbs-Richardson.

Stubbs-Richardson adds that one aspect of the current NSF grant is that it is a RAPID project, meaning the team is analyzing data from a current event, the pandemic in this case, so that the findings might help people better intervene in the event.

“We’re really trying to capture all of the emotions that are out there related to coping and not just the negative like panic and fear or anxiety, but we’re also getting into some positives people are experiencing,” she said. “We’re building a data set that captures all of those emotional responses and then looking at how it’s associated with the different institutions.”

She adds that the team is examining how the pandemic may be affecting food concerns for those who have unstable employment, and whether families are experiencing additional stress as they spend more time at home and indoors. These may all be part of the final dataset they produce over the next few months.

The issues of stress and security are common themes in Stubbs-Richardson’s work, and she and others in the team are also looking at sentiments toward police officers and racial tensions right now following many of the current societal events.

“We all continue to advance our skill sets in the methods and different data analytics that we can apply to this large data because there will always be some kind of social problem that we can approach as a group,” said Stubbs-Richardson, “and we have the perfect environment because of all the different disciplines and the creative energy that each faculty, staff, and student brings to the mission of the DS3 laboratory.”

For more information, visit ds3.ssrc.msstate.edu/.

Digging into Big Data: How the SSRC’s newest lab is using open source data and computer science