Mind the mode:

Who's texting & who's talking in Malawi?

Malawi mVAM respondent
WFP/Alice Clough

It’s time for another installment of our Mind the Mode series. For those of you who follow this blog regularly, you know that the mVAM team is continually evaluating the quality of the data we collect. Past Mind the Mode blogs have discussed our work in Mali comparing face-to-face interviews with voice calls, our comparison of SMS and IVR in Zimbabwe, and the differences in the Food Consumption Score (FCS) between face-to-face and Computer-Assisted Telephone Interviews (CATI) in South Sudan.

This month, we turn our attention to Malawi, where we recently completed a study analyzing the differences in the reduced Coping Strategies Index (rCSI) when it is collected via CATI and via SMS. This indicator helps measure a household’s food security by telling us what actions it may be taking to cope with stresses, such as reducing the number of meals eaten per day or borrowing food or money from friends or family. From February to April 2017, around 2,000 respondents were randomly selected for an SMS survey, and 1,300 respondents were contacted on their mobile phones by an external call centre to complete a CATI survey.

People Profiling: Who’s Texting and Who’s Talking?

Across all three rounds, a greater proportion of respondents in both modalities were men who lived in the South and Central Regions of the country and came from male-headed households. However, the respondents taking the SMS survey were much younger (average age 29) than those who took the CATI survey (average age 40). This probably isn’t surprising: young people around the world tend to be more interested in new technologies, and in Malawi they are also more likely to be literate.

The results from our mode experiment in Zimbabwe showed that IVR and SMS surveys reached different demographic groups, so we expected to see the same in Malawi. Surprisingly, this was not the case: both CATI and SMS participants seemed to come from better-off households. In our surveys, we determine this by asking respondents what material the walls of their home are made from (cement, baked bricks, mud, or unbaked bricks).

Figure: Better-off vs. worse-off respondents by wall type, Malawi

Most respondents (60%) said they have cement or baked brick walls rather than mud or unbaked brick walls, which is an indicator of being better off.

Digging into the rCSI

So what about the results observed for the rCSI between the two modes? The CATI rCSI distribution shows a peak at zero (meaning that respondents are not employing any negative coping strategies) and is similar to the typical pattern expected of the rCSI in face-to-face surveys (as you can see in the two graphs below).

Density plot for CATI Feb-April 2017

 

SMS rCSI

The SMS results, on the other hand, tend to show a slightly higher rCSI than the CATI results, meaning that SMS respondents are employing more negative coping strategies than households surveyed via CATI. This is counter-intuitive, especially since the data show that these households are not more vulnerable than CATI respondents. If anything, they are likely to be better educated (read: literate!), since they need to read to respond to SMS surveys. We’re therefore looking forward to doing more research into why this is the case.

Figure: Box plot of CATI rCSI

It’s All in the Numbers

We also observed some interesting response patterns across the two modalities. SMS respondents were more likely to answer all five rCSI questions by entering the same value for each one (think: 00000, 22222… you get the idea!). At the beginning of the survey, SMS respondents were told that they would earn a small airtime credit upon completing the questionnaire. We conjecture that some respondents may have simply entered numbers at random to finish the questionnaire as quickly as possible and receive their credit. Keep in mind that entering the same value for all five rCSI questions is much harder via CATI, because the operator can ask follow-up questions to make sure the respondent clearly understands each question before entering the response. For SMS, there is no check preventing the respondent from dashing through the questionnaire and entering the same response each time.
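A minimal sketch of how such "same value for every question" records could be flagged, assuming each respondent's five rCSI answers (number of days, 0 to 7) are stored as a list. The function names are hypothetical and not part of any mVAM tool. Note that an all-zero record can be genuine (a household using no coping strategies), so the flag is a screening signal rather than proof of a bad response.

```python
from typing import List, Sequence


def is_straightlined(answers: Sequence[int]) -> bool:
    """True if all five rCSI answers are identical, e.g. 0,0,0,0,0 or 2,2,2,2,2."""
    # Hypothetical layout: one list of five day-counts per respondent.
    if len(answers) != 5:
        raise ValueError("expected answers to all five rCSI questions")
    return len(set(answers)) == 1


def straightline_rate(responses: List[Sequence[int]]) -> float:
    """Share of respondents whose five rCSI answers are all the same value."""
    if not responses:
        return 0.0
    flagged = sum(is_straightlined(r) for r in responses)
    return flagged / len(responses)
```

Computing `straightline_rate` separately for the SMS and CATI records gives a simple per-mode comparison of this pattern.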

We also saw that the percentage of respondents reporting that they were employing between zero and four strategies was much lower among SMS respondents than among CATI respondents across all three months of data collection. Conversely, more SMS respondents (three out of five) than CATI respondents reported using all five negative coping strategies. Again, this is counter-intuitive. It might mean that SMS respondents didn’t always understand the questionnaire correctly, that they didn’t take the time to reflect on each question and answered as rapidly as possible to get their credit, or that they simply entered random numbers in the absence of an operator to validate their responses. The graphs below illustrate the differences in rCSI responses between CATI and SMS.
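The distribution described here (the share of respondents reporting 0, 1, ..., 5 strategies) can be tabulated with a short sketch like the one below. It assumes, as a hypothesis, that a strategy counts as "used" if the respondent reported it on at least one of the past 7 days; the post does not state the exact rule.

```python
from collections import Counter
from typing import Dict, List, Sequence


def strategies_used(answers: Sequence[int]) -> int:
    """Number of coping strategies reported on at least one of the past 7 days."""
    # Assumption: any non-zero day count means the strategy was used.
    return sum(1 for days in answers if days > 0)


def strategy_count_distribution(responses: List[Sequence[int]]) -> Dict[int, float]:
    """Percentage of respondents reporting 0, 1, ..., 5 coping strategies."""
    if not responses:
        return {k: 0.0 for k in range(6)}
    counts = Counter(strategies_used(r) for r in responses)
    n = len(responses)
    return {k: 100 * counts.get(k, 0) / n for k in range(6)}
```

Running it once per mode and month, then summing the 0 to 4 bins, gives the "between zero and four strategies" share compared in the text.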

Figure 3: Distribution of the number of coping strategies reported by SMS and CATI respondents by months

From these results, you can see that we still have a lot to learn about how survey modality affects results. This is just the start of our research, so expect more to come as the team digs deeper to better understand these important differences.

Mind the mode… and the non-response

How voice and face-to-face survey data compares in Mali

This is the third entry in our ‘Mind the Mode’ series on the mVAM blog. We are constantly assessing our data collection modalities to better understand which produce the most accurate results and what biases may be present. One of our recent experiments took us to Mali, where we compared the food consumption score from face-to-face (F2F) interviews with that from mVAM live calls.

It’s all in the details
To do this, in February and March the WFP team first conducted a baseline assessment in four regions of the country. As part of the baseline, we collected phone numbers from participants. Approximately 7-10 days later, we re-contacted the households that had phones, reaching roughly half of those encountered during the face-to-face survey; we were unable to contact the other households. To ensure the validity of the results, we made sure the questionnaire was exactly the same for the F2F and telephone interviews. Any differences in wording or in the way the questions were asked could adversely affect our analysis.

The findings from our analysis were quite interesting. We found that food consumption scores (FCS) collected via the mVAM survey tended to be slightly higher than those collected via the face-to-face survey. The graph below illustrates this shift to higher scores between the two rounds. Higher FCS in mVAM than in F2F surveys is not unique to Mali: we’ve observed similar outcomes in South Sudan and other countries where mVAM studies have taken place.

Figure: FCS distribution, F2F vs. mVAM (Mali)

 

Why could this be? There are two main possible explanations. The difference might be due to the data collection modality (i.e., people report higher food consumption on the phone), or there might be a selection bias. Remember that we were only able to contact roughly half of the participants from the F2F survey during the telephone calls. So it’s possible that the people who responded to the phone calls are less food insecure. This would make sense, since we often see that the poorest of the poor either don’t own a phone or have limited economic means to charge their phone or buy phone credit.

To test these hypotheses, we dug a bit deeper.

Same same…
Are people telling the same story on the phone versus face-to-face? Based on our results, the answer is yes! If we compare the same pool of respondents who participated in both the F2F and telephone survey rounds, their food security indicators are more or less the same. For example, the mean mVAM FCS was 56.21 while the mean F2F FCS was 55.65, with no statistically significant difference between the two.
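Because the same households answered both rounds, a paired comparison is a natural way to check a result like this. The post does not say which test was used; the sketch below shows one common option, a paired t statistic on the per-household differences (phone minus F2F), using only the standard library. Compare the statistic against a t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(f2f: Sequence[float], phone: Sequence[float]) -> Tuple[float, float]:
    """Mean paired difference (phone - F2F) and the paired t statistic (df = n - 1)."""
    if len(f2f) != len(phone) or len(f2f) < 2:
        raise ValueError("need at least two matched household pairs")
    # One difference per household interviewed in both rounds.
    diffs = [p - f for f, p in zip(f2f, phone)]
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("paired differences have zero variance")
    return d_bar, d_bar / se
```

A small |t| (relative to the critical value for n - 1 degrees of freedom) is consistent with "no statistically significant difference" between the two modes for the same households.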

But different…
So what about selection bias? In the F2F round, there are essentially three groups of people: 1) those who own phones and participated in both the F2F and mVAM survey; 2) people who own phones but didn’t participate in the mVAM survey, because they either didn’t answer the calls or their phone was off; and 3) people who do not own a phone and thus couldn’t participate in the mVAM survey.

People who replied to the mVAM survey had higher FCS overall than those we were unable to contact. What we learned from this experiment is that bias comes not only from households that do not own a phone but also from non-respondents (households that shared their phone number and gave consent but were not reachable later for the phone interview). They may not have been reachable because they have less access to electricity to charge their phone or because they live in areas with poor network coverage. The graph below illustrates the FCS distribution by respondent type.
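The three-group breakdown can be sketched as follows, assuming each baseline record carries its F2F FCS plus two hypothetical flags: whether the household reported owning a phone and whether it was later reached by phone. Using the baseline FCS for every group matters, since non-respondents and households without phones have no phone-round score to compare.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable


def respondent_group(has_phone: bool, reached_by_phone: bool) -> str:
    """Assign a baseline (F2F) household to one of the three groups described above."""
    if not has_phone:
        if reached_by_phone:
            raise ValueError("a household without a phone cannot have been reached")
        return "no_phone"
    return "phone_respondent" if reached_by_phone else "phone_nonrespondent"


def mean_fcs_by_group(records: Iterable[dict]) -> Dict[str, float]:
    """Mean baseline FCS per group; hypothetical keys 'fcs', 'has_phone', 'reached'."""
    groups = defaultdict(list)
    for r in records:
        groups[respondent_group(r["has_phone"], r["reached"])].append(r["fcs"])
    return {g: mean(v) for g, v in groups.items()}
```

If phone respondents show a higher mean than the other two groups, the phone sample over-represents better-fed households, which is the selection bias described here.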

Figure: FCS by respondent type (Mali)

When you compare the demographics of people in these three groups based on the data collected in the baseline, you can see significant differences, as in the example below. Notice that education levels vary among the three groups: those without a phone tend to be less educated than those who own a phone and participated in the mVAM survey.

Figure: Respondent profile by group (Mali)

This study taught us a valuable lesson. While we are confident that there is no statistically significant difference between face-to-face and phone responses in the Mali context, there is a selection bias in mVAM-collected data. By not including those without phones, as well as those who did not respond, we are missing an important (and likely poorer) subset of the population, meaning that the reported FCS is likely higher than it would be if these groups were included. One way to reduce this bias is to have telephone operators attempt to contact households multiple times over the course of several days; it’s important that they make a real effort to reach them. The team is also studying how to account for this bias in our data analyses.

Mind the Mode

Settling the (Food Consumption) Score in South Sudan

POC 3
Photo: UNMISS/Nektarios Markogiannis

For the second installment of our ‘Mind the Mode’ series, we’re taking you to Juba, South Sudan, where we previously conducted a mode experiment. We wanted to see how food security indicators compare when data are collected face-to-face and when they are collected by operators over the phone.

South Sudan is a complex setting for mobile surveys to begin with. The country has low cell phone penetration, estimated at only 20%. Network quality is also a problem: calls often don’t go through, or the audio is poor. Last, but not least, the country has been extremely unstable. While we have been using key informant phone interviews to date, we are investigating the feasibility of conducting phone surveys to collect household food security indicators. Given these complexities, starting with a test to evaluate biases related to survey mode seemed prudent.

Methodology

The mode experiment took place in “POC 3”, a Protection of Civilians (POC) camp in Juba near the main UN compound. POC 3 is the largest of three camps at the UN House site in Juba, with an estimated population of 20,000 people, according to the International Organization for Migration. People in the POC have come in search of protection from the violence and conflict that South Sudan has been experiencing. We’re hoping to use mobile phones to monitor food security indicators in POC communities. POC 3 happens to have good cell phone coverage: a 2014 survey estimated that some 70% of households in the camp had access to a phone.

 

Photo: WFP/Silvia Passeri

We evaluated how mode affects the Food Consumption Score (FCS), which measures how frequently a household consumed different food groups during the 7 days before the survey. A higher score indicates better household food security. The FCS is a commonly used proxy for household food security.
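The post does not spell out the FCS formula. As a sketch, here is the calculation using the standard food-group weights from WFP's FCS guidance as we understand them (days eaten in the past 7 days, times a weight per group, summed; maximum 112), with the commonly used 21/35 thresholds. Some contexts with high sugar and oil consumption use 28/42 instead, so the thresholds are parameters. The group keys are hypothetical names.

```python
from typing import Dict

# Standard FCS food-group weights (keys are illustrative names).
FCS_WEIGHTS: Dict[str, float] = {
    "staples": 2.0,     # cereals, grains, roots and tubers
    "pulses": 3.0,
    "vegetables": 1.0,
    "fruit": 1.0,
    "meat_fish": 4.0,   # meat, fish and eggs
    "milk": 4.0,
    "sugar": 0.5,
    "oil": 0.5,
}


def food_consumption_score(days: Dict[str, int]) -> float:
    """Sum of weight x days eaten (capped at 7) over all food groups."""
    score = 0.0
    for group, weight in FCS_WEIGHTS.items():
        d = days.get(group, 0)  # a missing group counts as 0 days
        if d < 0:
            raise ValueError(f"{group}: days cannot be negative")
        score += weight * min(d, 7)
    return score


def fcs_category(score: float, poor_max: float = 21, borderline_max: float = 35) -> str:
    """Classify a score as poor, borderline or acceptable."""
    if score <= poor_max:
        return "poor"
    if score <= borderline_max:
        return "borderline"
    return "acceptable"
```

For example, a household that ate staples and oil every day, vegetables on 3 days and sugar on 2 days scores 14 + 3 + 3.5 + 1 = 21.5, just inside the borderline band.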

We carried out two rounds of data collection: round 1 in March and round 2 in May 2016. In round 1, half of the respondents received a voice call survey and the other half participated in an identical face-to-face interview. Assignment to the ‘treatment’ (voice call) was random. In round 2, some of the respondents who had received a voice call took exactly the same survey face-to-face, and vice versa.
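A minimal sketch of this design, assuming a list of household IDs (hypothetical) and a seeded random generator so the assignment can be reproduced: round 1 randomly splits households between voice and face-to-face, and round 2 switches every household that could be found again to the other mode. It illustrates the idea only; the post does not describe how assignment was actually implemented.

```python
import random
from typing import Dict, List, Set


def assign_round1(household_ids: List[str], seed: int = 0) -> Dict[str, str]:
    """Randomly assign half of the households to 'voice' and the rest to 'f2f'."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    ids = list(household_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {hid: ("voice" if i < half else "f2f") for i, hid in enumerate(ids)}


def crossover_round2(round1: Dict[str, str], found: Set[str]) -> Dict[str, str]:
    """Give each household found again in round 2 the other mode."""
    swap = {"voice": "f2f", "f2f": "voice"}
    return {hid: swap[mode] for hid, mode in round1.items() if hid in found}
```

Restricting round 2 to the `found` set mirrors the attrition described below, where some March respondents could not be located in May.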

There were security challenges in the POC, and some of the respondents from March could not be found in the camp when we conducted the second round in May. As a result, we had 132 voice and 333 face-to-face interviews in round 1, but 138 voice and only 117 face-to-face interviews in round 2. This sample is smaller than we would have liked, but we think it is large enough to indicate how responses to a phone survey differ from those given face-to-face.

Calls were placed by operators who were ‘converted’ enumerators: field monitors who usually carry out WFP’s post-distribution monitoring but were new to phone-based surveys. This meant that they were already familiar with the food security indicators and the camp community, but needed training on the protocol for phone-based surveys.

Results

We observed substantial mode effects in round 1. We obtained a mean FCS of 34 via face-to-face surveys, but a much higher score of 45 through voice calls. Our regression analysis shows that mode alone accounted for 7 points of this 11-point difference (p<0.01), with other factors accounting for the remainder. This means that a voice survey would inflate the FCS by about 20%, leading to a gross underestimation of the severity of food insecurity in the population of interest. During round 1, the voice FCS question behaved as an almost binary variable: we would get 1s and 7s, but very few 2s, 3s, 4s or 5s. In other words, many respondents said they ate a given food item on one day or on every day, and very few other answers were recorded.
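The post reports a regression but does not list its covariates. A minimal sketch of the idea, assuming a design matrix with an intercept, a voice-call dummy (1 = voice, 0 = face-to-face) and a hypothetical household covariate: the coefficient on the dummy estimates the mode effect net of the other regressors. It is written in plain Python to stay self-contained; a real analysis would use a statistics package that also reports standard errors and p-values.

```python
from typing import List, Sequence


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y].
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear or constant regressors)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]
```

Each row of `X` would be `[1, voice_dummy, covariate]`, and `ols(X, fcs)[1]` is the mode coefficient. With a mode coefficient of 7 and a face-to-face mean of 34, the implied inflation is 7 / 34 ≈ 0.21, in line with the "about 20%" quoted above.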

FCS results, round 1

In round 2, the difference between voice calls and face-to-face surveys diminished substantially and was no longer statistically significant. In fact, the slight remaining difference between the two groups was due to the respondent households’ socio-economic profile, not to the mode we used to collect data.

 

FCS results, round 2

Lessons learned

For the food consumption score, the differences between voice and face-to-face due to the mode effect were large in round 1 but vanished in round 2. This is a positive finding for us, as we are seeking to rigorously test and validate the data collected through mobile phones and to report the results with some degree of confidence. We want to highlight a few lessons here that could help guide others in the right direction.

Lesson 1: Practice makes perfect. We suspect that the poor quality of the data collected in round 1 was due to our call center being brand new and experiencing ‘teething’ problems. When an in-house call center is first set up, it tends to be small, comprising one or two operators. Resources permitting (and provided information needs increase), the call center may then be expanded with additional operators, who receive regular training and coaching. Our analysts have said anecdotally that data quality improves over time as the system becomes more established, and South Sudan provides a good illustration of this phenomenon.

Lesson 2: Close supervision is required! Although our operators were familiar with data collection, it took time to train them to conduct high-quality phone surveys. This again shows that operator selection, training, and supervision are key to obtaining good-quality data.

Lesson 3: Work with professional call centers. Overall, this encourages us to continue working with professional call centers when possible, and avoid the temptation to do things in-house in a hurry – something that can be all too tempting in an emergency setting.

We also think the method used in South Sudan could be applied elsewhere to help evaluate mode effects. We will post the survey design on the mVAM Resource Center for others to use.

Mind the Mode

IVR vs SMS in Zimbabwe

It’s all in the mode. Or is it? Would you answer a question differently over the phone than you would to a person sitting in front of you? When answering a question over the phone, would you respond differently if you were speaking to a friendly operator, listening to a recorded voice, or replying by SMS? These are pretty key considerations when you are in the business of asking people questions from afar, and we get asked about them a lot.

So, welcome to the first edition of our ‘Mind the Mode’ series. We have been conducting mode experiments to find out whether people respond differently to different survey modes: live calls, IVR (Interactive Voice Response: the recorded voice asking you to press 1 for English or 2 for Spanish), SMS, or face-to-face. In this first edition, we look at IVR and SMS in Zimbabwe.

You might never have thought about it before, but it turns out that IVR and SMS compete. In the automated data collection space, there are two schools of thought: one favors data collection via SMS, the other IVR. The SMS advocates argue that a respondent can take the survey at the time of their choice and at their pace. Proponents of IVR point to the fact that voice recordings are easier to understand than a text message because you don’t need to be literate to take the survey.  It’s therefore the more ‘democratic’ tool.

At mVAM, we’ve mostly been using SMS, but in Zimbabwe we had the opportunity to compare these two modes. Food security data were collected by both SMS and IVR in August 2016. IVR responses were received from 1,760 randomly selected respondents throughout Zimbabwe, and 2,450 SMS responses were received from a different set of random respondents stratified by province. For both surveys, most responses came from Manicaland, Harare, Masvingo and Midlands, owing to higher population densities, better network coverage and higher phone ownership in these areas.
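The post does not describe how the stratified SMS sample was drawn. As a hypothetical sketch, assuming a sampling frame of (phone number, province) pairs and a chosen number of respondents per province, a simple random sample can be drawn independently within each stratum:

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_sample(
    frame: List[Tuple[str, str]], per_stratum: Dict[str, int], seed: int = 0
) -> List[str]:
    """Simple random sample of phone numbers drawn separately within each province."""
    # Hypothetical frame: (phone_number, province) pairs; quotas are illustrative.
    rng = random.Random(seed)
    by_province: Dict[str, List[str]] = defaultdict(list)
    for number, province in frame:
        by_province[province].append(number)
    sample: List[str] = []
    for province, n in per_stratum.items():
        pool = by_province.get(province, [])
        if n > len(pool):
            raise ValueError(f"only {len(pool)} numbers available in {province}")
        sample.extend(rng.sample(pool, n))  # sampling without replacement
    return sample
```

How the per-province quotas are set (proportional to population, equal, or otherwise) is a separate design choice that the post does not specify.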

Respondents were asked pretty similar questions in both surveys. Both surveys asked:

  • demographic and location questions, such as the age and gender of the respondent, the gender of the head of household, and the province and district they lived in;
  • type of toilet in their house (to gain a rough estimate of socio-economic status);
  • daily manual labour wage; and
  • whether they used any of the five coping strategies (a proxy for food insecurity):
    1. Rely on less preferred or less expensive food due to lack of food or money to buy food?
    2. Borrow food, or rely on help from a friend or relative due to lack of food or money to buy food?
    3. Reduce the number of meals eaten in a day due to lack of food or money to buy food?
    4. Limit portion sizes at mealtime due to lack of food or money to buy food?
    5. Restrict consumption by adults so children could eat?
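The five coping questions above feed the reduced Coping Strategies Index. The post does not state the weights; the sketch below uses the standard rCSI severity weights as commonly published (1, 2, 1, 1, 3 for the five strategies in the order listed), applied to the number of days (0 to 7) each strategy was used, so the index ranges from 0 to 56. The dictionary keys are hypothetical names.

```python
from typing import Dict

# Standard rCSI severity weights, in the order of the five questions above.
RCSI_WEIGHTS: Dict[str, int] = {
    "less_preferred_food": 1,
    "borrow_food": 2,
    "reduce_meals": 1,
    "limit_portions": 1,
    "restrict_adults": 3,
}


def rcsi(days: Dict[str, int]) -> int:
    """Weighted sum of the days (0-7) each coping strategy was used in the past week."""
    total = 0
    for strategy, weight in RCSI_WEIGHTS.items():
        d = days.get(strategy, 0)  # an unreported strategy counts as 0 days
        if not 0 <= d <= 7:
            raise ValueError(f"{strategy}: days must be between 0 and 7, got {d}")
        total += weight * d
    return total
```

For example, relying on less preferred food on 3 days, borrowing on 1 day and restricting adult consumption on 2 days gives 3 + 2 + 6 = 11.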

However, the surveys differed in a few respects. The SMS survey gave an incentive of USD 0.50 in airtime credit to respondents who completed it, whereas the IVR survey offered no incentive. In the IVR survey, respondents could choose between English and Shona (most chose Shona), whereas the SMS survey was conducted only in English.

So, what have we learned?

IVR and SMS reach different demographics.

Our IVR and SMS surveys reached different demographics. A higher proportion of IVR responses than SMS responses came from worse-off households, i.e. those with no toilet or with a pit latrine. Similarly, a higher proportion of households headed by women participated in the IVR survey than in the SMS survey. WFP generally finds that households headed by women are more food insecure. So IVR surveys appear to have greater reach to worse-off households, possibly because, unlike SMS surveys, they do not require literacy or knowledge of English.

Fig. 1a: IVR respondents by toilet type

Fig. 1b: SMS respondents by toilet type

Fig. 1c: IVR respondents by head of household sex

Fig. 1d: SMS respondents by head of household sex


IVR surveys give higher food insecurity estimates than SMS. Spoiler: The reason is unclear.

In general, we found that IVR responses showed higher coping levels than SMS responses. We use the mean reduced Coping Strategies Index (rCSI) as a proxy for food insecurity: a higher rCSI means people have to cope more in response to a lack of food or money to buy food, meaning they are more food insecure. In Zimbabwe, the mean rCSI captured through IVR (21.9) was higher than that captured through SMS (18.3) for the country as a whole. This difference held when we broke the data down by sex of the household head and by province (Figs. 2 and 3).
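Checking whether a gap "holds across cross-sections" amounts to computing mean rCSI for every (mode, subgroup) pair and looking at the IVR minus SMS difference within each subgroup. A minimal sketch, assuming hypothetical record keys `mode` ("IVR" or "SMS"), `rcsi`, and a subgroup field such as `hoh_sex` or `province`:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_rcsi_by(records: Iterable[dict], key: str) -> Dict[Tuple[str, str], float]:
    """Mean rCSI for every (mode, subgroup) pair, e.g. key='hoh_sex' or 'province'."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["mode"], r[key])].append(r["rcsi"])
    return {k: mean(v) for k, v in groups.items()}


def ivr_minus_sms(means: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """IVR mean minus SMS mean for each subgroup observed in both modes."""
    subgroups = {sub for _, sub in means}
    return {
        sub: means[("IVR", sub)] - means[("SMS", sub)]
        for sub in subgroups
        if ("IVR", sub) in means and ("SMS", sub) in means
    }
```

A consistent pattern shows up as a positive gap in every subgroup; whether each gap is statistically significant is a separate question.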

Fig. 2: rCSI by sex of household head

Fig. 3: Mean rCSI by province

However, when we analysed the data by toilet type, which we used as a proxy indicator for wealth, we saw a slightly different pattern. Flush toilets are taken as a proxy for the best-off households, followed by Blair pit latrines (ventilated pit latrines), then pit latrines, and then no toilet. We also asked about composting toilets, but too few households had them to make any meaningful comparisons. The mean rCSI differed significantly between the two surveys only for households with flush toilets and for those with pit latrines (in both cases, IVR responses had a higher rCSI). The mean rCSI for the other two toilet categories (Blair pit latrine and no toilet) did not differ significantly between the two types of survey. Therefore, the overall difference between IVR and SMS responses does not hold across all wealth groups (Fig. 4).
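The post does not say which significance test was used per toilet type. One common choice for comparing two independent groups with unequal sizes and variances is Welch's t-test; the sketch below computes the Welch t statistic and Welch-Satterthwaite degrees of freedom with the standard library, plus a normal-approximation p-value that is only reasonable when the degrees of freedom are large (for small strata, use a t distribution from a statistics package instead).

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t for mean(a) - mean(b) and the Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value from a normal approximation (large df only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))
```

Running `welch_t(ivr_rcsi, sms_rcsi)` once per toilet category gives one comparison per wealth group, matching the stratified reading in the text.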

Fig. 4: rCSI by toilet type

This suggests that the higher overall mean rCSI among IVR respondents compared with SMS respondents does not come from IVR reaching more worse-off households. However, we say this with a big caveat. As noted above, toilet type is a rough indicator and might not accurately identify which households are worse off. We might have seen different results had we used a different proxy indicator for wealth.

When we examine this a bit further and break down the rCSI into the individual coping strategies in Figure 5, we see that IVR respondents use more of the coping strategies, and use them more frequently, than SMS respondents. This makes sense because the individual coping strategies are used to calculate the rCSI, and we had already observed a higher mean rCSI among IVR respondents.

Fig. 5: Percentage of households using different coping strategies

However, we also noticed something else when looking at responses to each coping strategy: there is much more variation in coping strategy use among SMS respondents than among IVR respondents (see Figure 5). This suggests that IVR respondents may be ‘straightlining’, i.e. giving the same response to every question. Straightlining could indicate that people just don’t respond well to a recorded voice over the phone. While SMS has drawbacks for people with limited literacy, it does give respondents more control over the pace of the survey. With SMS, respondents have as much time as they want to read (or re-read) the whole text and respond. With IVR, people have to go at the speed of the recording. They could get impatient waiting to hear all the answer options to a question, or they might not have enough time to understand the question. In both cases, they might just start pressing the same answer to get to the next question. Thus, IVR might produce lower-quality results.
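The post does not say exactly how variation was measured. One hypothetical way to quantify it is the spread of each respondent's five coping-strategy answers (a standard deviation of 0 means a perfectly straightlined record), averaged by mode, as sketched below with the standard library.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def within_respondent_sd(answers: Sequence[int]) -> float:
    """Population SD of one respondent's five answers; 0.0 means straightlined."""
    return pstdev(answers)


def mean_within_sd_by_mode(responses: Dict[str, List[Sequence[int]]]) -> Dict[str, float]:
    """Average within-respondent SD for each mode, e.g. {'IVR': [...], 'SMS': [...]}."""
    return {mode: mean(within_respondent_sd(r) for r in rs) for mode, rs in responses.items()}
```

A mode whose average is close to 0 has many respondents giving the same answer to every coping question, which is the pattern the text attributes to IVR.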

Interestingly, we saw a similar pattern in Liberia during the Ebola epidemic. We used both SMS and IVR to collect information during the emergency. IVR results showed very high rCSI with limited variation. SMS data consistently produced lower (and more credible) rCSI estimates, and the variation in the data was greater (perhaps a sign of greater data quality).

Different demographics or differences in user experiences (i.e. straightlining) could be contributing to different food security estimates in IVR and SMS.

The upshot is that different survey tools lead to different results, and we need to understand these differences as the use of automated mobile data collection expands. We are not sure whether the higher food insecurity estimates from IVR are caused by the different demographics of IVR and SMS respondents or by their different user experiences, in particular the possibility that IVR respondents are straightlining their answers rather than accurately reporting their coping levels. We suspect that a bit of both is at play.

Stay tuned for the next editions of our ‘Mind the Mode’ series as we continue to document what we learn from our mode experiments.