Beyond the technology

Improving qualitative data collection: KoboToolbox’s advanced features for transcription, translation, and analysis

The challenge

To rapidly assess the scope and scale of needs in affected communities during an emergency, data collectors in the field often have to prioritize quantitative data over qualitative data. The urgency of a crisis requires quickly gathering crucial information from as many people as possible. Although qualitative data can provide more detailed answers and more nuanced insights, the resources needed for its collection and analysis in large volumes are often lacking.

Data collection practitioners regularly face the choice between quantitative and qualitative data, trading breadth for depth in survey coverage. Multiple choice questions allow us to rapidly collect basic information from a large number of respondents, while open-ended questions capture more complex details. However, recording responses by hand or transcribing audio recordings is time intensive, limiting us to collecting qualitative answers from only a handful of respondents. As a result, we have less coverage and representation than can be achieved with quantitative data.

Similarly, when qualitative methods are used, a significant amount of the data collected is never analyzed due to the resources required. Crucial information is therefore missed, and valuable insights are lost. This limits our understanding of complex issues and lessens our ability to make an impact. Given the limited resources available for collecting and analyzing large amounts of qualitative data, organizations facing the choice between rich qualitative insights from a few respondents and broader quantitative data from many typically opt for the latter. To respond efficiently to the needs of affected communities, they prioritize immediate numerical data. Still, most practitioners agree that qualitative data collection provides valuable insights on complex topics, including the detailed information needed to better target interventions and understand the experiences of affected communities.

The solution

At Kobo, our team has been working to address this problem head-on. We’ve developed new natural language processing tools and analysis features to overcome the challenges of collecting and analyzing qualitative data. These language processing functions are integrated directly into the KoboToolbox interface, so users can now collect, manage, and analyze qualitative data more effectively. The tools include automatic speech-to-text transcription, machine translation, and qualitative analysis features that support improved data quality, greater efficiency, and richer insights.

Use automatic transcription to efficiently produce full transcripts of audio responses.

Features and functionalities

KoboToolbox’s language processing tools offer significant benefits for organizations working with qualitative data. These tools use speech recognition technology to turn audio into text and machine translation to translate that text into over one hundred languages. Additional features support qualitative analysis of the resulting transcripts and translations.

Key features and functionalities include:

  • Enhanced audio recording: Capture more detailed, nuanced data with features for recording audio responses to individual questions and conducting full interviews.
  • Automatic transcription: Quickly create complete transcripts of audio recorded responses for more efficient qualitative data processing.
  • Machine translation: Produce full translations of transcripts in different languages using automated translation, or use the manual feature for languages not yet available.
  • Wide language coverage: Automatically transcribe audio from 72 languages, including 138 regional variants, and use machine translation for 106 languages.
  • Advanced editing features: Improve data quality with intuitive editing features for reviewing transcript accuracy and making corrections as needed.
  • High volume data processing: Easily transcribe and translate thousands of audio responses using integrated speech recognition and machine translation.
  • Advanced qualitative analysis: Create tags to identify themes and efficiently produce response summaries to generate insights from qualitative data.

New features allow you to easily review and edit automatic transcriptions.

Getting started with automatic transcription and automated translation

With our new language processing features, you can easily transcribe audio files collected in your surveys and then translate them into multiple languages. To get started, simply choose the audio response that you want to transcribe and begin the transcription. Once audio files have been transcribed to text, you can quickly translate them into different languages with the automated translation feature. You can also add multiple translations of the same audio response, improving access to data for multilingual teams and allowing for multiple target languages. The original transcript and any translations are then added to the data table as new columns and are readily available for downloading.

The automated translation feature allows you to easily select the target language and rapidly produce translations of your transcripts in over 100 languages.
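
Once transcripts and translations have been added to the data table and downloaded, they can be handled like any other column. The following is a minimal sketch in Python, assuming a CSV download; the column names and sample rows are hypothetical and are not KoboToolbox's actual export headers. It shows one way to pull out the translated text for each respondent while skipping responses that have not been translated yet.

```python
import csv
import io

# Hypothetical export: the column names and rows below are assumptions made
# for illustration, not KoboToolbox's actual export headers or data.
SAMPLE_EXPORT = """respondent_id,needs_audio_transcript_fr,needs_audio_translation_en
1,Nous avons besoin d'eau potable.,We need drinking water.
2,,
3,L'école est fermée depuis un mois.,The school has been closed for a month.
"""


def load_translations(csv_text, id_column, translation_column):
    """Return {respondent id: translated text}, skipping rows with no translation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[id_column]: row[translation_column].strip()
        for row in reader
        if row[translation_column].strip()
    }


translations = load_translations(
    SAMPLE_EXPORT, "respondent_id", "needs_audio_translation_en"
)
for respondent, text in translations.items():
    print(respondent, text)
```

With a real download, the sample string would be replaced by the contents of the exported file and the column names by whatever headers appear in it.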

Enhanced qualitative data analysis

Our team has also developed advanced features for analyzing qualitative data collected with audio questions. These new features help to streamline the process of coding and summarizing in-depth qualitative data. Users can easily create tags, summaries, and preset data assessment questions to identify themes and patterns in open-ended responses. With the enhanced qualitative analysis tools, users will be able to produce detailed results more effectively.

These analysis features drastically improve efficiency and give us accurate real-time qualitative insights on complex topics. In less time and using fewer resources, users can reliably categorize responses and analyze larger quantities of qualitative data to produce rich insights, so interventions can make an even greater impact.

Create tags, summaries, and assessment questions to streamline the analysis of open-ended responses.
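
To illustrate how tags can surface themes across many open-ended responses, here is a minimal sketch in Python. The tag names and record layout are hypothetical and do not reflect KoboToolbox's internal data format; the sketch only shows the general idea of counting how many responses carry each tag.

```python
from collections import Counter

# Hypothetical tagged responses: the tag names and record layout are
# assumptions made for illustration, not KoboToolbox's data format.
tagged_responses = [
    {"respondent_id": "1", "tags": ["water", "health"]},
    {"respondent_id": "2", "tags": ["shelter"]},
    {"respondent_id": "3", "tags": ["water", "education"]},
]


def count_tags(responses):
    """Count how many responses carry each tag (a tag counts once per response)."""
    counts = Counter()
    for response in responses:
        # set() guards against the same tag being listed twice on one response.
        counts.update(set(response["tags"]))
    return counts


tag_counts = count_tags(tagged_responses)
for tag, n in tag_counts.most_common():
    print(f"{tag}: {n}")
```

A tally like this gives a quick view of which themes come up most often before reading individual responses in depth.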

Why it matters

Advanced functions for qualitative data can transform the work of social impact organizations. By making it possible to collect and analyze qualitative data from a greater number of respondents with greater efficiency, these tools offer a solution that supports both breadth and depth in survey coverage. The integration of transcription, translation, and analysis allows data collectors to capture rich qualitative data from thousands of respondents. The ability to quickly create transcripts in dozens of languages and translations in more than 100 further streamlines the data analysis process.

With these tools, we can collect rich, detailed qualitative responses from a larger number of respondents. Collecting more nuanced and diverse data becomes feasible, even with limited resources. These improved capacities for collecting and analyzing qualitative data help us produce more accurate assessments of complex realities on the ground. With these insights, we gain a better understanding of the needs and priorities of affected communities, which leads to more effective social impact initiatives.

What’s next

The automated language processing features for collecting, transcribing, and translating qualitative data are just the beginning. To further enhance these new tools, our team is developing integrated open source Large Language Models (LLMs) to support transcription and translation for more languages. This includes languages not currently covered by commercial speech recognition providers that are critically needed for data collection during emergency response efforts. We are also working with LLMs and other language processing methods to develop innovative new tools for automatic qualitative analysis that will further advance our ability to categorize and summarize open-ended responses at scale.

Learn more about using the new features for qualitative data analysis and for transcription and translation.

Note: These features are not yet available to all users. We are currently seeking beta testers for these features. If you’d like to join the beta testing program, please send us a message with your username, the name of the server your account is on, and a brief description of the project you would like to use these features for. We are very interested in hearing user feedback about these new features, so thanks in advance for testing them!

Join the effort for inclusive data collection

This is an example of the work we do and the impact we enable. We believe high quality data collection tools can change the lives of millions for the better, and we work hard to make that possible.

But we can’t do it alone.

Help us continue making high quality data collection tools accessible to everyone, especially under-resourced organizations. Support our work, and their impact, with a donation of any amount.