Mind the Mode

Settling the (Food Consumption) Score in South Sudan

Photo: UNMISS/Nektarios Markogiannis

For the second installment of our ‘Mind the Mode’ series, we’re taking you to Juba, South Sudan, where we conducted a mode experiment. We wanted to see how food security indicators compare when data is collected face-to-face and when it is collected by operators over the phone.

South Sudan is a complex setting for mobile surveys to begin with. The country has low cell phone penetration, estimated at only 20%. Network quality is a problem: calls often don’t go through or the audio is poor. Last but not least, the country has been extremely unstable. While we have been using key informant phone interviews to date, we are investigating the feasibility of conducting phone surveys to collect household food security indicators. Given the complexities, starting with a test to evaluate biases related to survey mode seemed prudent.

Methodology

The mode experiment took place in “POC 3”, a Protection of Civilians (POC) camp in Juba near the main UN compound. POC 3 is the largest of three camps at the UN House site in Juba, with an estimated population of 20,000 people, according to the International Organization for Migration. People come to the POC in search of protection from the violence and conflict that South Sudan has been experiencing. We’re hoping to use mobile phones to monitor food security indicators in POC communities. POC 3 happens to have good cell phone coverage – a 2014 survey estimated that some 70% of households in the camp had access to a phone.

Photo: WFP/Silvia Passeri

We evaluated how mode affects the Food Consumption Score (FCS), which measures how often a household consumed different food groups during the 7 days before the survey. The FCS is a commonly used proxy for household food security: a higher score indicates better household food security.
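
To make the indicator concrete, here is a minimal sketch of how an FCS can be computed from the number of days each food group was eaten. It assumes the standard WFP food-group weights, which this post does not list, and the group names and example household are purely illustrative.

```python
# Standard WFP food-group weights (an assumption: the post does not list them).
FCS_WEIGHTS = {
    "staples": 2.0,
    "pulses": 3.0,
    "vegetables": 1.0,
    "fruit": 1.0,
    "meat_fish": 4.0,
    "milk": 4.0,
    "sugar": 0.5,
    "oil": 0.5,
}


def food_consumption_score(days_eaten):
    """Sum weight * days for each food group; days must lie between 0 and 7."""
    score = 0.0
    for group, weight in FCS_WEIGHTS.items():
        days = days_eaten.get(group, 0)  # groups not mentioned count as 0 days
        if not 0 <= days <= 7:
            raise ValueError(f"{group}: days must be between 0 and 7, got {days}")
        score += weight * days
    return score


# Hypothetical household, not taken from the survey data.
household = {"staples": 7, "pulses": 2, "vegetables": 3, "oil": 4, "sugar": 2}
print(food_consumption_score(household))  # 26.0
```

Under these weights the score ranges from 0 to 112, so a shift of a few points can move a household across the thresholds analysts use to classify food consumption.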

We carried out two rounds of data collection: round 1 in March and round 2 in May 2016. In round 1, half of the respondents received a voice call survey and the other half participated in an identical interview face-to-face. The ‘treatment’ (voice call) was randomly assigned. In round 2, some of the respondents who had received a voice call took the exact same survey face-to-face, and vice versa.
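
For readers who want to replicate the design, random assignment to a survey mode can be sketched as below. This is an illustration only, not the procedure we used in Juba: the function name, the seed and the respondent IDs are all hypothetical.

```python
import random


def assign_modes(respondent_ids, seed=2016):
    """Shuffle respondents with a fixed seed and split them into two mode groups."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"voice": shuffled[:half], "face_to_face": shuffled[half:]}


# Hypothetical respondent IDs 1 to 10.
groups = assign_modes(range(1, 11))
print(len(groups["voice"]), len(groups["face_to_face"]))  # 5 5
```

Keeping the seed and the shuffled list on file makes it possible to swap the two groups in a second round, as in the crossover described above.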

Security in the POC posed challenges, and some of the respondents from March could not be found in the camp when we conducted the second round in May. As a result, we had 132 voice and 333 face-to-face interviews in round 1, but 138 voice and only 117 face-to-face interviews in round 2. This sample is smaller than we would have liked, but we think it is large enough to indicate how responses to a phone survey differ from responses given face-to-face.

Calls were placed by operators who were ‘converted’ enumerators – field monitors who usually carry out WFP’s post-distribution monitoring but were new to phone-based surveys. This meant that they were already familiar with the food security indicators and the camp community, but needed training on the protocol for phone-based surveys.

Results

We observed substantial mode effects in round 1. We obtained a mean FCS of 34 via face-to-face surveys, but a much higher score of 45 through voice calls. Our regression analysis shows that mode alone accounted for 7 points of the difference in a household’s response (p<0.01), with other factors accounting for the remainder. This means that a voice survey would inflate the FCS by about 20%, leading to a gross underestimation of the severity of food insecurity in the population of interest. During round 1, the voice FCS questions behaved almost like binary variables – we would get 1s and 7s, but very few answers of 2, 3, 4 or 5. In other words, many people said they ate a given food item on one day or every day, and very few other answers were recorded.
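
The arithmetic behind the "about 20%" figure can be checked in a few lines. The sketch below uses only the three numbers reported above (means of 34 and 45, and a 7-point mode effect) and takes the face-to-face mean as the baseline, which is our reading of the comparison. The function name is illustrative.

```python
def mode_inflation(face_to_face_mean, voice_mean, mode_effect):
    """Split the raw gap between modes and express the mode effect relative to the baseline."""
    raw_gap = voice_mean - face_to_face_mean
    return {
        "raw_gap": raw_gap,                      # total difference between the two means
        "other_factors": raw_gap - mode_effect,  # part not attributed to mode
        "relative_inflation": mode_effect / face_to_face_mean,
    }


result = mode_inflation(face_to_face_mean=34.0, voice_mean=45.0, mode_effect=7.0)
print(result["raw_gap"], result["other_factors"])  # 11.0 4.0
print(f"{result['relative_inflation']:.1%}")       # 20.6%
```

So of the 11-point gap, 7 points are attributed to mode and 4 points to other factors.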

FCS results, round 1

In round 2, the difference between voice calls and face-to-face surveys diminished substantially and was no longer statistically significant. In fact, the slight remaining difference between the two groups was due to respondent households’ socio-economic profile, not to the mode we used to collect data.

FCS results, round 2

Lessons learned

For the Food Consumption Score, the differences between voice and face-to-face due to the mode effect were large in round 1, but vanished in round 2. This is a positive finding for us, as we are seeking to rigorously test and validate the data collected through mobile surveys and to report on the results with some degree of confidence. We want to highlight a few lessons here that could help guide others in the right direction.

Lesson 1: Practice makes perfect. We suspect that the poor quality of the data collected in round 1 is due to our call center being brand new and experiencing ‘teething’ problems. When an in-house call center is first set up, it tends to be small, comprising one or two operators. Resources permitting (and provided information needs increase), the call center may be expanded with additional operators who will receive regular training and coaching. Our analysts have been saying anecdotally that data quality improves as time goes by and the system becomes more established. We have a good illustration of the phenomenon here in South Sudan.

Lesson 2: Close supervision is required! Although our operators were familiar with data collection, it took time to train them to conduct good-quality surveys by phone. This again shows that operator selection, training, and supervision are key to obtaining good-quality data.

Lesson 3: Work with professional call centers. Overall, this encourages us to continue working with professional call centers when possible, and to avoid doing things in-house in a hurry – something that can be all too tempting in an emergency setting.

We also think the method used in South Sudan could be applied elsewhere to help evaluate mode effects. We will post the survey design on the mVAM Resource Center for others to use.

Mind the Mode

IVR vs SMS in Zimbabwe

It’s all in the mode. Or is it? Would you respond differently over the phone than you would with a person in front of you asking the question? When answering over the phone, would you respond differently if you were speaking to a friendly operator, listening to a recorded voice, or replying by SMS? These are pretty key considerations when you are in the business of asking people questions from afar, and we get asked about them a lot.

So, welcome to the first edition of our ‘Mind the Mode’ series. We have been conducting some mode experiments to find out whether people respond differently to different survey modes: live calls, IVR (Interactive Voice Response, that recorded voice asking you to press 1 for English or 2 for Spanish), SMS, or face-to-face. In this first edition, we look at IVR and SMS in Zimbabwe.

You might never have thought about it before, but it turns out that IVR and SMS compete. In the automated data collection space, there are two schools of thought: one favors data collection via SMS, the other IVR. The SMS advocates argue that a respondent can take the survey at the time of their choice and at their pace. Proponents of IVR point to the fact that voice recordings are easier to understand than a text message because you don’t need to be literate to take the survey.  It’s therefore the more ‘democratic’ tool.

At mVAM, we’ve mostly been using SMS but in Zimbabwe, we had the opportunity to compare these two modes. Food security data was collected by both SMS and IVR in August 2016. IVR responses were received from 1760 randomly selected respondents throughout Zimbabwe and 2450 SMS responses were received from a different set of random respondents stratified by province. Most responses came from Manicaland, Harare, Masvingo and Midlands for both types of surveys due to higher population densities, better network coverage and higher phone ownership in these areas.

Respondents were asked pretty similar questions in both surveys. Both surveys asked:

  • demographic and location questions, such as the age and gender of the respondent, the gender of the head of household, and the province and district that they lived in;
  • type of toilet in their house (to gain a rough estimate of socio-economic status);
  • daily manual labour wage; and
  • whether they used any of the five coping strategies (a proxy for food insecurity):
    1. Rely on less preferred or less expensive food due to lack of food or money to buy food?
    2. Borrow food, or rely on help from a friend or relative due to lack of food or money to buy food?
    3. Reduce the number of meals eaten in a day due to lack of food or money to buy food?
    4. Limit portion sizes at mealtime due to lack of food or money to buy food?
    5. Restrict consumption by adults so children could eat?

However, there were a few aspects where the surveys were slightly different. The SMS survey gave an incentive of USD 0.50 airtime credit to respondents who completed the survey whilst there was no incentive to do the IVR one. In the IVR survey, respondents could choose between English or Shona (most respondents chose to take it in Shona) whereas the SMS survey was only conducted in English.

So, what have we learned?

IVR and SMS reach different demographics.

Our IVR and SMS surveys reached different demographics. Compared with SMS, a higher proportion of IVR responses came from worse-off households, i.e. those with no toilet or with pit latrines. Similarly, a higher proportion of households headed by women participated in the IVR survey than in the SMS survey. WFP generally finds that households headed by women are more food insecure. So IVR surveys appear to have greater reach among worse-off households. This may be because, unlike SMS surveys, they do not require literacy or knowledge of English.

Fig. 1a: IVR respondents by toilet type

Fig. 1b: SMS respondents by toilet type

Fig. 1c: IVR respondents by head of household sex

Fig. 1d: SMS respondents by head of household sex

IVR surveys give higher food insecurity estimates than SMS. Spoiler: The reason is unclear.

In general, we found that IVR responses showed higher coping levels than SMS responses. The mean reduced coping strategy index (rCSI) is used as a proxy for food insecurity. A higher rCSI means people have to cope more in response to lack of food or money to buy food, meaning they are more food insecure. In Zimbabwe, mean rCSI captured through IVR (21.9) was higher than that captured through SMS (18.3) for the entire country. This difference in mean rCSI was consistent across cross-sections by the sex of the household head and by province (Figs. 2 and 3).
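
As a rough illustration of how the five coping questions roll up into one number, the sketch below computes an rCSI for a single household. It assumes each question is answered as a number of days (0 to 7) in the past week, and it assumes the commonly used rCSI severity weights (1, 2, 1, 1, 3). Neither assumption is spelled out in this post, and the strategy keys and example household are hypothetical.

```python
# Commonly used rCSI severity weights (an assumption: not stated in the post).
RCSI_WEIGHTS = {
    "less_preferred_food": 1,
    "borrow_food": 2,
    "reduce_meals": 1,
    "limit_portions": 1,
    "restrict_adult_consumption": 3,
}


def reduced_csi(days_used):
    """Sum weight * days for each coping strategy; days must lie between 0 and 7."""
    total = 0
    for strategy, weight in RCSI_WEIGHTS.items():
        days = days_used.get(strategy, 0)  # unreported strategies count as 0 days
        if not 0 <= days <= 7:
            raise ValueError(f"{strategy}: days must be between 0 and 7, got {days}")
        total += weight * days
    return total


# Hypothetical household, not taken from the survey data.
household = {
    "less_preferred_food": 5,
    "borrow_food": 2,
    "reduce_meals": 3,
    "limit_portions": 4,
    "restrict_adult_consumption": 1,
}
print(reduced_csi(household))  # 19
```

With these weights the index runs from 0 to 56, which gives a sense of scale for the means of 21.9 and 18.3 reported above.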

Fig. 2: rCSI by sex of household head

Fig. 3: Mean rCSI by province

However, when the data was analysed by toilet type, which was used as the proxy indicator for wealth, we saw a slightly different pattern. Flush toilets are considered as a proxy for the best-off, followed by Blair pit latrine (a ventilated pit latrine), then pit latrine and then no toilets. We also asked about composting toilets but too few households had them to make any meaningful comparisons. The mean rCSI was only significantly different for households with flush toilets and with pit latrines (in both cases IVR responses had higher rCSI). The mean rCSI results for the other two toilet categories (Blair pit latrine and no toilet) were not significantly different in the two types of surveys. Therefore, the commonly observed difference between IVR and SMS responses is not observed across all wealth groups (Fig. 4).
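
One way to compare mean rCSI between modes within a single toilet-type group is a Welch t statistic, sketched below with the standard library only. This post does not say which test we ran, so treat the choice of test as an assumption. The two samples are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with possibly unequal variances."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical rCSI values for one wealth group, not real survey data.
ivr_rcsi = [20, 22, 24]
sms_rcsi = [10, 12, 14]
print(round(welch_t(ivr_rcsi, sms_rcsi), 2))  # 6.12
```

A full test would also need the degrees of freedom and a p-value, which a statistics package would normally supply. Running the comparison separately for each toilet type is what shows whether the IVR and SMS gap holds within wealth groups.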

Fig. 4: rCSI by toilet type

This suggests that the higher overall mean rCSI among IVR respondents compared to SMS respondents does not simply come from IVR reaching more worse-off households. However, we say this with a big caveat. Toilet type, as we said above, is a rough indicator and might not accurately show which households are worse off. It’s possible that we would have seen different results if we had used a different proxy indicator for wealth groups.

When we examine this a bit further and break down the rCSI into the individual coping strategies in Figure 5, we see that IVR respondents use more coping strategies more frequently than SMS respondents. This makes sense because the rCSI is calculated from the individual coping strategies, and we already observed a higher mean rCSI among IVR respondents.

Fig. 5: Percentage of households using different coping strategies

However, we also noticed something else when looking at responses to each coping strategy. There is much more variation in coping strategy use among SMS respondents than among IVR respondents (see Figure 5). This suggests that IVR respondents may be ‘straightlining’, i.e. providing the same response to every question. Straightlining would suggest that people just don’t respond well to a recorded voice over the phone. While SMS is a barrier for respondents with limited literacy, it does give the respondent more control over the pace of the survey. With SMS, respondents have as much time as they want to read (or re-read) the whole text and respond. With IVR, people have to go at the speed of the questions. They could get impatient waiting to hear all the answer options to a question, or they might not have enough time to understand the question. In both cases, they might just start pressing the same answer to get to the next question. Thus IVR might not give good-quality results.
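
A simple screen for straightlining is to flag respondents who gave the identical answer to all five coping questions and compare the flagged share across modes. The sketch below is one possible check under that definition, not the analysis behind Figure 5. The response lists are invented for illustration.

```python
def is_straightliner(answers):
    """True when a respondent gave the same answer to every question."""
    return len(answers) > 1 and len(set(answers)) == 1


def straightlining_share(responses):
    """Fraction of respondents whose answers are all identical."""
    if not responses:
        return 0.0
    flagged = sum(1 for answers in responses if is_straightliner(answers))
    return flagged / len(responses)


# Hypothetical answers (days per coping strategy), not real survey data.
ivr_responses = [[7, 7, 7, 7, 7], [3, 3, 3, 3, 3], [5, 2, 3, 4, 0], [7, 7, 7, 7, 7]]
sms_responses = [[5, 2, 3, 4, 0], [1, 0, 2, 2, 0], [4, 4, 4, 4, 4], [0, 1, 0, 3, 0]]

print(straightlining_share(ivr_responses))  # 0.75
print(straightlining_share(sms_responses))  # 0.25
```

The flag marks cases for review; it is not proof of poor data. A household that genuinely used no coping strategies would answer 0 to every question, so all-zero rows may deserve separate treatment.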

Interestingly, we saw a similar pattern in Liberia during the Ebola epidemic. We used both SMS and IVR to collect information during the emergency. IVR results showed very high rCSI with limited variation. SMS data consistently produced lower (and more credible) rCSI estimates, and the variation in the data was greater (perhaps a sign of greater data quality).

Different demographics or differences in user experiences (i.e. straightlining) could be contributing to different food security estimates in IVR and SMS.

The upshot is that different survey tools lead to different results, and we need to understand these differences as the use of automated mobile data collection expands. We are not sure whether the higher food insecurity estimates for IVR are caused by the different demographics of IVR and SMS respondents or by different user experiences, in particular IVR respondents straightlining their answers rather than accurately reporting their coping levels. We suspect that a bit of both might be in play.

Stay tuned for the next editions of our ‘Mind the Mode’ series as we continue to document what we learn from our mode experiments.