KoBoSync User Guide

KoBoSync is a post processor: it is used after data collection to synchronize data from the Android devices onto a local computer, and then to aggregate the individual survey records into a simple database. The resulting Comma Separated Value (CSV) data can be imported into many analytical software packages (such as R, SPSS, or SAS) and can be viewed easily in Excel or other spreadsheet software. Previously, the Post Processor was a standalone Java application, but it is now packaged along with the KoBoForm Builder to simplify your development environment. If you want the standalone version of the Post Processor, it can be downloaded from the KoBo Code Site.

Mac users should use the standalone JAR file.

Introduction

Whether you are using KoBoSync as a standalone application, or built in to KoBoForm, the usage is the same except for how you launch it. There is a "Post Process" button in the navigation bar of KoBoForm which will start KoBoSync, or you can launch the standalone application from your programs menu.

KoBoSync copies all completed forms from the directory you point it at (the XML source directory) into a directory of your choosing (the XML storage directory), so that files from multiple phones end up in a single location on the computer running the app. The search is recursive, so it does not matter that ODK puts each form into a separate folder; just point it at the top-level folder. The individual surveys are renamed based on the survey instance name, the DeviceID of the phone used to collect the data, and the time at which the survey was started, to the millisecond. This combination is used as a unique key throughout the process of backing up and transcribing the surveys into CSV, and it allows surveys from multiple phones to be collected into one location without losing or overwriting existing data.
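The copy-and-rename step can be pictured with a short Python sketch. It is an illustration only, not KoBoSync's actual Java code: the element names `deviceid` and `start`, the underscore-joined file name, and the helper names `unique_name` and `sync` are all assumptions made for the example.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def unique_name(xml_path):
    """Build a file name from instance name, device ID and start time.

    Assumption: the instance stores these values in <deviceid> and
    <start> elements directly under the root; real forms may differ.
    """
    root = ET.parse(xml_path).getroot()
    instance = root.tag
    device_id = root.findtext("deviceid", default="unknown")
    start = root.findtext("start", default="unknown")
    # Colons are awkward in file names on some systems.
    start = start.replace(":", "-")
    return f"{instance}_{device_id}_{start}.xml"


def sync(source_dir, storage_dir):
    """Recursively copy completed forms, skipping ones already stored."""
    storage = Path(storage_dir)
    storage.mkdir(parents=True, exist_ok=True)
    copied = []
    for xml_path in sorted(Path(source_dir).rglob("*.xml")):
        target = storage / unique_name(xml_path)
        if not target.exists():  # never overwrite an existing record
            shutil.copyfile(xml_path, target)
            copied.append(target.name)
    return copied
```

Because the target name is derived from the record itself, running `sync` a second time on the same source folder copies nothing, and two phones that both produce a file called `form.xml` do not collide.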

From the storage directory, KoBoSync then aggregates all the records into a single CSV file and places that file in a directory of your choosing. The application uses the combination of instance name, DeviceID, and start time to create a unique key for each record within the CSV file. It is smart enough to handle changes in the schema over time, so adding a question to your survey won't confuse the sync. There will be a new column in the CSV for the new data field, and records collected before that question was added will have a null in that column.

The Java app can be run from the command line, or you can just double click on the JAR. Either way, there is a GUI. The GUI will remember your selected folders the next time you run it. It works offline with no trouble.

Mac Users: When you mount the SDcard, you will be able to find your XML source directory under

/Volumes/SDcard/odk/instances

Synchronize Data to Computer

We needed a way to synchronize data from the phones without any internet connectivity, using just a laptop. KoBoSync allows you to go from data collected on the phone to an aggregated CSV containing all the records. Many researchers use SPSS for analysis, so a CSV is perfect for import, and if you need to import more CSVs over time, it is easy to merge the data. For those who do not use SPSS, almost any database or analysis software will happily import the CSV file.

Most users should be able to run KoBoSync by clicking on the JAR file. If you have used the Windows or Unix installer, KoBoSync will be available as a program in your Start menu. If neither of these options works, any user should be able to run the app from the command line.

There is a link in the sidebar for downloading the app as an installer or a standalone JAR. Please get KoBoSync before continuing.

Start KoBoSync by selecting it in the Start menu, or by clicking on the downloaded JAR file.

If you have any trouble, you can run it from the command line. In a command terminal, type:

java -cp KoboSync_0.93.jar org.oyrm.kobo.postproc.ui.KoboPostProcFrame

This will give you a little GUI. It is titled "Kobo Post Processor". The GUI has five buttons.

Change XML source directory

Use this to set the directory where your completed instances are stored. If you have an Android device plugged in, this will be \SDcard\odk\instances\

Change XML storage directory

Use this to set the directory where you would like to copy all the completed forms. I do this so that you can store your completed forms off the phone in case you lose it or it gets stolen. You can then also delete them from the phone. The app is smart enough not to duplicate forms it has already copied.

Sync XML surveys

This is the button that actually backs up all your surveys from the phone to the hard disk, assuming you have set the directories using the previous two buttons. Click it and don't blink: there is a progress bar, but this operation is very fast. Now your forms are backed up to your machine. The storage directory is where the Transcriber will look when you tell it to make a CSV.

Change CSV storage Directory

Use this to set the directory where the CSV will be placed after the Transcriber does its business with whatever forms it finds in the storage directory.

Transcribe XML to CSV

Just what it says on the tin. Click this and the app will look in the XML storage directory and read each completed form. It scrapes out the schema and writes headers to the first line of the CSV, writes a record on the next line, and then opens each additional XML form and gives each one a row in the CSV. The CSV will be stored in the directory you chose for CSV storage. XML files that were previously written to the CSV will not be written again, so there is no need to sanitize your XML storage directory between runs of the Transcriber. If the Transcriber comes across an XML form whose schema is different from the others, it handles that by modifying the headers to include the new fields and inserting blank fields for those columns in all the previous records. While the headers are updated to accommodate changes to the schema, the existing data will not be altered or deleted by the transcription process.
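The way a changing schema ends up as extra columns with blanks can be sketched in a few lines of Python. This is only a sketch of the idea, not the Transcriber's own code: the function name `transcribe` and the field names in the usage note are made up for illustration, and each survey is assumed to have already been read into a dictionary of field name to value.

```python
import csv
import io


def transcribe(records):
    """Write records (dicts of field -> value) to CSV text.

    The header is the union of all fields in first-seen order, so a
    record collected before a question existed gets a blank cell.
    """
    headers = []
    for record in records:
        for field in record:
            if field not in headers:
                headers.append(field)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=headers, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

For example, passing one record with the fields `key` and `age` followed by a later record that also has `village` yields a three-column CSV in which the first record's `village` cell is empty.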

Random Record Generator

I also needed a way to test my system with lots and lots of records. My survey has more than 300 questions, so it is unwieldy and making fake records takes forever. So, to be able to test the system under the weight of hundreds of records, we included a randomizer. You point it at a completed ODK form, tell it how many random records to make and where to put them, and it churns them out. In a second it will make you 100 fake records. 100 is the default, but you can do more. The data consists of random strings, longs, ints, and date stamps, which are generated by inspecting the completed ODK form to determine the data type for each question.

From the command line:

java -cp KoboSync_0.93.jar org.oyrm.kobo.postproc.test.KoboXMLGen <XMLFile> <DestinationDirectory>

XMLFile should refer to an instance, that is, a completed form as stored in the ODK/Instances/ directory after a survey is completed.

DestinationDirectory should refer to any nice empty folder. It doesn't have to be empty, but this will make 100 files, so it's a good idea to give them a place of their own.

Example: 

java -cp KoboSync_0.93.jar org.oyrm.kobo.postproc.test.KoboXMLGen ../Instance/CAR_2009-09-27_17-41-53.xml ../test/
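To give a feel for what the randomizer does, here is a rough Python sketch of the same idea: look at each value in a completed instance, guess its type, and write a random value of that type in its place. It is not the KoboXMLGen implementation. The type-guessing rules, the `%Y-%m-%d` date format, the value ranges, and the function names `random_like` and `random_records` are assumptions for illustration.

```python
import random
import string
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def random_like(value, rng):
    """Return a random value of the same guessed type as `value`."""
    try:
        int(value)
        return str(rng.randint(0, 9999))
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")
        day = datetime(2009, 1, 1) + timedelta(days=rng.randint(0, 364))
        return day.strftime("%Y-%m-%d")
    except ValueError:
        pass
    # Anything else is treated as free text.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))


def random_records(xml_text, count=100, seed=None):
    """Generate `count` XML strings shaped like the given instance."""
    rng = random.Random(seed)
    records = []
    for _ in range(count):
        root = ET.fromstring(xml_text)  # fresh copy of the template
        for element in root.iter():
            if len(element) == 0 and element.text:  # leaf with a value
                element.text = random_like(element.text.strip(), rng)
        records.append(ET.tostring(root, encoding="unicode"))
    return records
```

Each generated record keeps the element structure of the original instance, so it can be fed through the same sync and transcription steps as a real survey.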

MULTI Select with the Post Processor

Here are some instructions for taking advantage of the MULTI select behavior:

The new Multi Select handling feature requires that you create a select whose name ends with "_MULTI_", exactly as quoted. When a record is processed through the CSV transcriber, a new column is created for each answer that appears in the gathered survey record. The new column is named by appending the answer to the question name: questionname_answer

So, if you add a question "myquestion_MULTI_" and a record contains the answer set "1 9 7", then the CSV file will have column headers reading: myquestion_MULTI_1, myquestion_MULTI_9, myquestion_MULTI_7. If a given survey record contains an answer for one of the multi columns, then a 1 is recorded in its CSV column for that answer. If not, a 0 is recorded. Proceeding with the example, we'd see that the CSV file contains something like:

myquestion_MULTI_1, myquestion_MULTI_9, myquestion_MULTI_7

1, 1, 1

If a new survey comes up with other multi answers logged by ODK, then new columns are created, with 0s and 1s inserted where appropriate. So if we get a second survey with answers 2 and 4, we'd end up with:

myquestion_MULTI_1, myquestion_MULTI_9, myquestion_MULTI_7, myquestion_MULTI_2, myquestion_MULTI_4

1, 1, 1, 0, 0

0, 0, 0, 1, 1

and so on.
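The expansion shown in the tables above can be reproduced with a small Python sketch. It is meant only to make the rule concrete and is not taken from the Post Processor source; the function name `expand_multi` and the dictionary-per-record input are assumptions for the example.

```python
def expand_multi(records, question="myquestion_MULTI_"):
    """Turn space-separated multi-select answers into 0/1 columns."""
    # First pass: one column per answer that appears in any record,
    # in the order the answers are first seen.
    columns = []
    for record in records:
        for answer in record.get(question, "").split():
            name = question + answer
            if name not in columns:
                columns.append(name)
    # Second pass: 1 where the record chose that answer, 0 otherwise.
    rows = []
    for record in records:
        chosen = {question + a for a in record.get(question, "").split()}
        rows.append([1 if name in chosen else 0 for name in columns])
    return columns, rows
```

Calling it with two records whose answers are "1 9 7" and "2 4" returns the five column names myquestion_MULTI_1, myquestion_MULTI_9, myquestion_MULTI_7, myquestion_MULTI_2, myquestion_MULTI_4 and the rows [1, 1, 1, 0, 0] and [0, 0, 0, 1, 1].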

Catches :

If nobody ever answers a question with a given selection, then that column will never be created. In the above example, if the valid answers were 1, 2, 3, 4, 5, 6, 7, 8, or 9, but we had only received the two records previously described, then columns for 3, 5, 6, and 8 would never be created.

Finally, the only way to distinguish the answers in a select is to rely on the answers being separated by spaces. So an answer must not include a space, or it will be split into several answers and the columns will come out wrong. Use any labels you want, display any text to anyone at all, and use answers with spaces for other question types, and you'll be fine. But the answer values that you code for select questions must not contain spaces.