by PublicABCP
Translated and reviewed by Matheus Lucas Hebling
The study “The Interfaces Twitter Elections Dataset: Construction Process and Characteristics of Big Social Data During the 2022 Presidential Elections in Brazil”, published in PLOS One by the INTERFACES – Center for Sociopolitical Studies on Algorithms and Artificial Intelligence (UFSCar), documents the creation of a large-scale database of interactions on Twitter (now X) during Brazil’s 2022 presidential elections.
The article is authored by Sylvia Iasulaitis, leader of the INTERFACES group, in co-authorship with Alan Demétrius Baria Valejo, Bruno Cardoso Greco, Vinicius Gonçalves Perillo, Guilherme Henrique Messias, and Isabella Vicari. The multidisciplinary team—composed of researchers in Political Science, Computer Science, and Information Science—received support from FAPESP under the project Analysis of Large Volumes of Political Data and Complex Networks.
The research spans the pre-election, election, and post-election periods, including the January 8, 2023 events, when protesters stormed the headquarters of the Executive, Legislative, and Judiciary branches in Brasília. The resulting dataset, named ITED-Br, contains over 282 million tweets, making it one of the largest political social data collections in the world. According to PLOS One editors, the dataset is of high scientific value for research in Social Sciences and Computational Politics.
To ensure the preservation and scientific usability of the data, the ITED-Br dataset has been made publicly available on the INTERFACES group’s GitHub, in accordance with platform terms of use. The dataset was released in a “dehydrated” format, containing only anonymized tweet and user identifiers. The “rehydration process”, which allows researchers to retrieve original tweet content through Twitter’s (X) API, is detailed in the accompanying documentation.
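As a rough illustration of what working with a dehydrated release involves, the sketch below reads tweet identifiers and groups them into batches ready for lookup requests. It assumes one identifier per line and a batch size of 100; the actual file layout of ITED-Br and the limits of the API tier in use should be checked against the dataset documentation. The function names `read_ids` and `batch_ids` are hypothetical, and the network call itself is deliberately left out.

```python
from typing import Iterable, Iterator, List


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty tweet IDs, assuming one ID per line (an assumption)."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def batch_ids(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size` for batched lookup requests."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


if __name__ == "__main__":
    sample = ["111\n", "\n", "222\n", "333"]
    for group in batch_ids(read_ids(sample), size=2):
        # Each group would be sent to the platform's lookup endpoint.
        print(group)
```

Each batch would then be passed to whichever lookup endpoint the researcher has access to; tweets that were deleted or made private since collection will simply not be returned.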
The primary objective of the study was to describe the process of collecting and organizing the ITED-Br dataset, built from public Twitter interactions related to Brazil’s main presidential candidates in 2022. For this purpose, the team developed specific data collection strategies, combining keyword searches, profile tracking, and post-based queries.
To overcome technical restrictions imposed by the platform—such as API rate limits—the researchers developed a proprietary algorithm called token farm, which automatically managed multiple academic access keys, ensuring continuous data collection despite these limitations.
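The article does not describe the internals of the token farm, so the following is only a minimal sketch of the general idea: a pool of access keys in which a key that hits its rate limit is set aside until its reset time while collection continues with another key. The class name `TokenFarm` and its methods are hypothetical and do not reproduce the authors' implementation.

```python
import time
from typing import Callable, Dict, List, Optional


class TokenFarm:
    """Hypothetical pool of API tokens, rotated when one is rate-limited."""

    def __init__(self, tokens: List[str],
                 clock: Callable[[], float] = time.time) -> None:
        self.tokens = list(tokens)
        self.clock = clock  # injectable so the logic can be tested offline
        # token -> epoch seconds at which it may be used again
        self._blocked_until: Dict[str, float] = {}

    def acquire(self) -> Optional[str]:
        """Return the first token that is not rate-limited, or None."""
        now = self.clock()
        for token in self.tokens:
            if self._blocked_until.get(token, 0.0) <= now:
                return token
        return None

    def mark_limited(self, token: str, reset_at: float) -> None:
        """Record that `token` is exhausted until `reset_at`."""
        self._blocked_until[token] = reset_at

    def seconds_until_available(self) -> float:
        """Shortest wait before any token becomes usable again."""
        now = self.clock()
        waits = [max(0.0, self._blocked_until.get(t, 0.0) - now)
                 for t in self.tokens]
        return min(waits) if waits else 0.0
```

In a collection loop, the caller would request a token with `acquire()`, call `mark_limited()` when the API reports a rate limit along with its reset time, and sleep for `seconds_until_available()` only when every token is exhausted.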
Data collection lasted one year and involved storing and processing an extensive volume of information, which required custom technical solutions for organizing, structuring, and analyzing the data. Limited infrastructure posed additional challenges, which the team addressed with open-source libraries and optimized Python programming environments.
According to the authors, working with big social data requires interdisciplinary expertise and a balance between technical and sociopolitical knowledge—conditions essential to extract informational and analytical value from large-scale datasets.
Among the study’s key findings, the authors highlight that the discontinuation of Twitter’s academic API, announced after the platform’s acquisition by Elon Musk, makes it unlikely that future data collections of similar scale will be feasible for research institutions. Based on current API pricing, reproducing the ITED-Br dataset would cost over 1.5 million Brazilian reais, underscoring its scientific and historical significance.
The study also draws attention to the limits of public access to digital data: while Twitter interactions are technically public, transforming them into meaningful information requires specialized expertise and substantial infrastructure.
Author Profiles
Sylvia Iasulaitis holds a PhD in Political Science from the Federal University of São Carlos (UFSCar), where she is a Professor. She is a permanent faculty member in the Graduate Programs in Science, Technology, and Society and in Information Science, and currently coordinates the Social Sciences program. She leads the INTERFACES – Center for Sociopolitical Studies on Algorithms and Artificial Intelligence, certified by CNPq. Her work focuses on Computational Social Science and Social Data Science.
Alan Demétrius Baria Valejo is an Associate Professor and researcher in the Department of Computing at UFSCar. He earned his Bachelor’s in Computer Science (2012), followed by a Master’s (2014) and a PhD (2019) in Computer Science and Computational Mathematics, all from ICMC-USP. In 2020, he completed a Postdoctoral Fellowship at USP (FFCLRP-USP), funded by FAPESP.
Bruno Cardoso Greco works in the fields of Information and Computer Science, with an emphasis on Software Engineering and Information Theory. He is a member of INTERFACES, certified by CNPq.
Vinicius Gonçalves Perillo is an undergraduate student in Computer Science at UFSCar, specializing in Machine Learning and Data Science. He is a research assistant at the INTERFACES group.
Guilherme Henrique Messias is an undergraduate student in Computer Science at UFSCar.
Isabella Vicari is a PhD candidate in Political Science at UFSCar. She holds a Master’s in Science, Technology, and Society (2024) and a Bachelor’s in Social Sciences (2021) with a dual concentration in Political Science and Sociology, both from UFSCar.
Technical Information
Title: The Interfaces Twitter Elections Dataset: Construction Process and Characteristics of Big Social Data During the 2022 Presidential Elections in Brazil
Authors: Sylvia Iasulaitis, Alan Demétrius Baria Valejo, Bruno Cardoso Greco, Vinicius Gonçalves Perillo, Guilherme Henrique Messias, and Isabella Vicari – INTERFACES Group
Year: 2025
Published in: PLOS One, vol. 20, no. 2
Dataset: ITED-Br, available on GitHub




