Appendix containing the robustness checks.

The econometric models used in the core article to which this appendix relates were adapted from a previous article on the Province of Quebec’s elections (Sanger & Warin, 2018). For each day, we identified the main topic of the conversation (one of four main topics). Then, for each topic and each day, we observed which candidate was mentioned the most (either Jonathan or Buhari). Finally, we segmented our dataset into four periods: March 1 to 6, March 7 to 13, March 14 to 20, and March 21 to 27.
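The data preparation described above can be sketched as follows. This is a minimal illustration and not the pipeline used for the study: the function names `period_of` and `main_topic` and the sample counts are hypothetical, and the year 2015 is assumed from the election year.

```python
from datetime import date

# Period boundaries as stated in the text (March 1-6, 7-13, 14-20, 21-27).
# The year 2015 is an assumption based on the election year.
PERIODS = [
    (date(2015, 3, 1), date(2015, 3, 6)),
    (date(2015, 3, 7), date(2015, 3, 13)),
    (date(2015, 3, 14), date(2015, 3, 20)),
    (date(2015, 3, 21), date(2015, 3, 27)),
]


def period_of(day):
    """Return the 1-based period index for a day, or None outside the window."""
    for index, (start, end) in enumerate(PERIODS, start=1):
        if start <= day <= end:
            return index
    return None


def main_topic(counts):
    """Return the topic with the highest tweet count (first one wins on ties)."""
    return max(counts, key=counts.get)


# Hypothetical tweet counts for a single day (not taken from the study's data).
sample_counts = {"social": 120, "integrity": 340, "economy": 95, "geopolitics": 60}
print(period_of(date(2015, 3, 10)), main_topic(sample_counts))
```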

In this context, the question is which logistic estimation will extract high-quality information. We first have to decide between a discrete model with only two outcomes and a discrete model with more than two outcomes. Choosing two outcomes implies creating four binary variables, one for each of the four categories of our initial dependent variable.
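As a small illustration of the two-outcome option, the four-category variable can be recoded into four binary indicators. The helper `to_indicators` and the sample sequence are hypothetical and serve only to show the recoding.

```python
TOPICS = ["social", "integrity", "economy", "geopolitics"]


def to_indicators(topic):
    """Map one topic label to four 0/1 indicators, one per category."""
    if topic not in TOPICS:
        raise ValueError(f"unknown topic: {topic}")
    return {name: int(name == topic) for name in TOPICS}


# Hypothetical sequence of daily main topics (illustrative only).
daily_topics = ["integrity", "social", "geopolitics"]
indicators = [to_indicators(topic) for topic in daily_topics]
print(indicators[0])
```

Each indicator could then serve as the dependent variable of a separate binary logit, at the cost of estimating four models instead of one.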

Second, if we keep our categorical variable with four categories, another decision has to be made: choosing between a multinomial logistic estimation and an ordered logistic estimation. The issue is not trivial. Although the demarcation line is often clear, in our context we are in a grey area.

Due to the nature of the study, it is not clear which estimation technique is best. In a traditional setting, the categories can be ranked; in this example, they have no natural order. We collect tweets and aggregate the number of tweets per category at the end of each day. In our context, a person can tweet about one topic and then, at another time of day, tweet about another.

In many respects, this is like observing a permanent poll, which raises interesting statistical questions and thus requires, or allows for, new techniques and protocols. Even if the categories we chose have no inherent order, the people tweeting during the day make a choice, as in a poll, and we can assume that if they tweet more about a topic, it is because they believe this topic matters more to them than another one.^{1} If this hypothesis holds, the next question is what the order is. A specific logistic estimator is designed for this kind of situation: the stereotype logistic estimator. Unlike ordered logistic models, stereotype logistic models do not impose the proportional-odds assumption. They are often used when subjects are asked to assess or judge something. For these validity tests, we propose here:

- A plain-vanilla (unordered) multinomial estimation
- A mixed-ordered estimation: a stereotype-ordered logistic estimation

The multinomial logistic estimation fits maximum likelihood models with discrete dependent variables when the dependent variable takes on more than two outcomes and the outcomes have no natural ordering (Greene, 2012; Hosmer Jr, Lemeshow, & Sturdivant, 2013; Long, 1997; Long & Freese, 2014; Treiman, 2009).

We also estimate a model that is a compromise between the ordered and unordered logistic estimations, the aforementioned stereotype logistic estimation (Anderson, 1984; Greenland, 1985), because the relevance of the ordering is uncertain.

The multinomial estimator assumes that there is no order among the categories used to code the dependent variable. The stereotype estimator instead relies on one hypothesis: people tweeting during the day make a choice, as in a poll, and if they tweet more about a topic, it is because that topic matters more to them than another. Unlike with the ordered logit estimator, however, we make the reasonable assumption that we do not know all the latent variables needed to establish a proper ranking.
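As a sketch, the two estimators can be written side by side. The notation is introduced here for illustration only and is an assumption: \(b\) denotes the base outcome, \(x\) the vector of regressors, and the stereotype form follows the general parameterization of Anderson (1984).

\[
\log\frac{\Pr(y=k\mid x)}{\Pr(y=b\mid x)} = \alpha_k + x'\beta_k, \qquad \alpha_b = 0,\ \beta_b = 0 \quad \text{(multinomial logit)}
\]

\[
\log\frac{\Pr(y=k\mid x)}{\Pr(y=b\mid x)} = \theta_k - \phi_k\, x'\beta, \qquad \theta_b = 0,\ \phi_b = 0 \quad \text{(stereotype logit)}
\]

Under this form, the stereotype model is the multinomial model with the restriction \(\beta_k = -\phi_k \beta\): all outcomes share one coefficient vector \(\beta\), scaled by an outcome-specific parameter \(\phi_k\), with one \(\phi_k\) normalized to 1 for identification. Because the \(\phi_k\) are estimated, any ordering of the categories is inferred from the data rather than imposed beforehand.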

In the following table, we calculate the relative risk ratios. Compared to the base outcome (social category), Mr. Jonathan is less likely to be associated with the integrity category than Mr. Buhari. The same is true for the economy category.

**Table 1.** Dependent variable: topic {social, integrity, economy, geopolitics}

| Model: multinomial logit | (1) Coef. | (1) Relative Risk Ratios | (2) Coef. | (2) Relative Risk Ratios |
|---|---|---|---|---|
| **Topic 1 (social)** | base outcome | | | |
| **Topic 2 (integrity)** | | | | |
| Jonathan | -0.0027595*** | 0.9972443*** | -0.0028211*** | 0.9971828*** |
| Buhari | 0.0011461* | 1.001147* | 0.0011777* | 1.001178* |
| PDP | 0.0031242** | 1.003129** | 0.0031851** | 1.00319** |
| APC | -0.0010757 | 0.9989249 | -0.001078 | 0.9989226 |
| Periods (ref = Period 1) | | | | |
| Period 2 | | | -0.030186 | 0.970265 |
| Period 3 | | | 0.0424364 | 1.04335 |
| Period 4 | | | 0.0435374 | 1.044499 |
| Constant | 0.0271056 | | 0.0124693 | |
| **Topic 3 (economy)** | | | | |
| Jonathan | -0.001625*** | 0.9983764*** | -0.0016934*** | 0.998308*** |
| Buhari | 0.0013541** | 1.001355** | 0.0014034** | 1.001404* |
| PDP | -0.0014103 | 0.9985907 | -0.0012894 | 0.9987115 |
| APC | 0.004636** | 1.004647** | 0.004534** | 1.004544** |
| Periods (ref = Period 1) | | | | |
| Period 2 | | | -0.1076294 | 0.8979603 |
| Period 3 | | | -0.000384 | 0.9996161 |
| Period 4 | | | -0.0027362 | 0.9972675 |
| Constant | -0.3239868** | | -0.294147 | |
| **Topic 4 (geopolitics)** | | | | |
| Jonathan | 0.000212 | 1.000212 | 0.0002218 | 1.000222 |
| Buhari | 0.0005398 | 1.00054 | 0.0005451 | 1.000545 |
| PDP | -0.0004569 | 0.9995432 | -0.0003948 | 0.9996053 |
| APC | 0.0055712*** | 1.005587*** | 0.0055733*** | 1.005589*** |
| Periods (ref = Period 1) | | | | |
| Period 2 | | | 0.097285 | 1.102174 |
| Period 3 | | | -0.0254721 | 0.9748496 |
| Period 4 | | | -0.2292305 | 0.7951452 |
| Constant | -0.5721883*** | | 0.5426475* | |
| **Predicted probabilities** | Coef. | | Coef. | |
| Pr(y=1) | 0.251581*** | | 0.2520359*** | |
| Pr(y=0) | 0.748419*** | | 0.7479641*** | |
| Topic = 1 (social) | 0.251581*** | | 0.2520359*** | |
| Topic = 2 (integrity) | 0.2182243*** | | 0.2184104*** | |
| Topic = 3 (economy) | 0.2602211*** | | 0.2602559*** | |
| Topic = 4 (geopolitics) | 0.2699736*** | | 0.2692978*** | |
| **Statistics** | | | | |
| Number of observations | 540 | | 540 | |
| LR chi2 | 96.93 | | 98.79 | |
| Prob > chi2 | 0.0000 | | 0.0000 | |
| Pseudo R2 | 0.0647 | | 0.0660 | |
| Log likelihood | -700.1345 | | -699.20412 | |

P-value: \(*<0.1\), \(**<0.05\), \(***<0.01\)
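The relative risk ratios in Table 1 are the exponentiated coefficients. The short check below illustrates this for two integrity coefficients from the first pair of columns; the function name `relative_risk_ratio` is hypothetical.

```python
import math

# Coefficients for Jonathan and Buhari in the integrity equation,
# taken from the first pair of columns of Table 1.
coefficients = {"Jonathan": -0.0027595, "Buhari": 0.0011461}


def relative_risk_ratio(coef):
    """Exponentiate a multinomial logit coefficient to get its relative risk ratio."""
    return math.exp(coef)


for name, coef in coefficients.items():
    print(f"{name}: RRR = {relative_risk_ratio(coef):.7f}")
```

A ratio below 1, as for Jonathan, indicates that a one-unit increase in the regressor lowers the odds of the integrity category relative to the social (base) category; a ratio above 1 indicates the opposite.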

We compute the marginal effects as a robustness check. The following table shows that Mr. Jonathan is more associated with the conversations about the social and geopolitics categories.

**Table 2.** Dependent variable: topic {social, integrity, economy, geopolitics}

Model: multinomial logit (marginal effects)

| Independent variables | Social | Integrity | Economy | Geopolitics |
|---|---|---|---|---|
| Jonathan | 0.0002435*** | -0.000391*** | -0.000171* | 0.0003185*** |
| Buhari | -0.0001882* | 0.0000868 | 0.0001577** | -0.0000563 |
| PDP | -0.0000482 | 0.00064*** | -0.0004168 | -0.000175 |
| APC | -0.0006228* | -0.000775** | 0.0005621** | 0.0008357*** |
| **Statistics** | | | | |
| Number of observations | 540 | | | |
| LR chi2 | 96.93 | | | |
| Prob > chi2 | 0.0000 | | | |
| Pseudo R2 | 0.0647 | | | |
| Log likelihood | -700.1345 | | | |

P-value: \(*<0.1\), \(**<0.05\), \(***<0.01\)
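The marginal effects can be related to the multinomial logit coefficients through the standard formula \(\partial P_k / \partial x = P_k\,(\beta_k - \sum_l P_l \beta_l)\). The sketch below applies it to the Jonathan coefficients and the predicted probabilities from the first specification in Table 1. Evaluating the effects at those reported probabilities is an assumption, and the function name `marginal_effects` is hypothetical.

```python
# Predicted probabilities and Jonathan coefficients from the first
# specification in Table 1 (social is the base outcome, so its coefficient is 0).
TOPICS = ["social", "integrity", "economy", "geopolitics"]
probabilities = [0.251581, 0.2182243, 0.2602211, 0.2699736]
jonathan_coefs = [0.0, -0.0027595, -0.001625, 0.000212]


def marginal_effects(probs, coefs):
    """Multinomial logit marginal effects: P_k * (beta_k - sum_l P_l * beta_l)."""
    weighted_mean = sum(p * b for p, b in zip(probs, coefs))
    return [p * (b - weighted_mean) for p, b in zip(probs, coefs)]


effects = marginal_effects(probabilities, jonathan_coefs)
for topic, effect in zip(TOPICS, effects):
    print(f"{topic}: {effect:.7f}")
```

Under this assumption the computed values agree with the Jonathan row of Table 2 to the reported precision. Because the four probabilities sum to one, the marginal effects of a regressor sum to zero across the four categories.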

Now, let us present the results based on the stereotype logistic regressions. Stereotype logistic models are used in particular when categories may be indistinguishable. The stereotype logistic model should be seen as a restriction on the multinomial model.

**Table 3.** Dependent variable: topic {social, integrity, economy, geopolitics}

| Model: stereotype ordered logit | Without constraint | With constraint |
|---|---|---|
| **Independent variables** | Coef. | Coef. |
| Jonathan | 0.0011052*** | 0.000968** |
| Buhari | -0.0001663 | 0.0000215 |
| PDP | -0.0013216** | -0.0012697 |
| APC | 0.0036409** | 0.0055193*** |
| /phi1_1 | 1*** | 1*** |
| /phi1_2 | 1.856209*** | 1*** |
| /phi1_3 | 0.5181356** | 0.3122322*** |
| /phi1_4 | | |
| /theta1 | 0.4423166*** | 0.659083*** |
| /theta2 | 0.607536*** | 0.659083*** |
| /theta3 | 0.2966307** | 0.2649479* |
| /theta4 | 0 | |
| (category 4 is the base outcome) | | |
| **Statistics** | | |
| Number of observations | 540 | |
| Wald chi2 | 17.69 | |
| Prob > chi2 | 0.0014 | |
| Log likelihood | -715.05661 | |

P-value: \(*<0.1\), \(**<0.05\), \(***<0.01\)

In the previous table, we observe that Mr. Jonathan is more associated with conversations about the geopolitics category (\(coef.=0.0011^{***}\)), as is the APC party (\(coef.=0.0036^{**}\)). These results are interesting because they corroborate those obtained with the plain multinomial logit estimator, although they are somewhat harder to interpret since they must be read relative to the base category (here, geopolitics). It is nonetheless useful to be able to apply a stereotype logistic estimator to our dataset: our framing strategy appears to support this analysis and lends some robustness to the analysis of conversations on Twitter.

Anderson, J. A. (1984). Regression and Ordered Categorical Variables. Journal of the Royal Statistical Society. Series B (Methodological), 46(1), 1–30.

Dupont, W. D. (2009). Statistical modeling for biomedical researchers: A simple introduction to the analysis of complex data. Cambridge University Press.

Gould, W. (2000). Interpreting logistic regression in all its forms. Stata Technical Bulletin, 9(53).

Greene, W. H. (2012). Econometric analysis. Boston; London: Pearson.

Greenland, S. (1985). An Application of Logistic Models to the Analysis of Ordinal Responses. Biometrical Journal, 27(2), 189–197. http://doi.org/10.1002/bimj.4710270212

Hilbe, J. M. (2009). Logistic regression models. CRC Press.

Hosmer Jr, D. W., & Lemeshow, S. (2004). Applied logistic regression. John Wiley & Sons.

Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398). John Wiley & Sons.

Kleinbaum, D. G., & Klein, M. (2010). Logistic Regression. New York, NY: Springer New York.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. SAGE.

Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata. Stata Press.

Pagano, M., & Gauvreau, K. (2000). Principles of biostatistics (Vol. 2). Duxbury Pacific Grove, CA.

Pampel, F. C. (2000). Logistic regression: A primer (Vol. 132). Sage.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. Wiley.

^{1} For a clear introduction to logistic regression, see Hosmer Jr & Lemeshow (2004), Pagano & Gauvreau (2000), or Pampel (2000); for a nonmathematical presentation of logistic regression, see Kleinbaum & Klein (2010); and for a more thorough and formal presentation, see Hosmer Jr, Lemeshow, & Sturdivant (2013). See also Gould (2000), Dupont (2009), or Hilbe (2009) for guidance on interpreting the results.

For attribution, please cite this work as

Sanger & Warin, "SKEMA Global Lab in AI: Nigeria’s 2015 Presidential Election: A Spatial and Econometric Perspective Based on a Framing Strategy - Online Appendix", Figshare, 2019

BibTeX citation

@article{sanger2019NigeriaElection, author = {Sanger, William and Warin, Thierry}, title = {SKEMA Global Lab in AI: Nigeria’s 2015 Presidential Election: A Spatial and Econometric Perspective Based on a Framing Strategy - Online Appendix}, journal = {Figshare}, year = {2019}, note = {https://skemagloballab.io/posts/2019-04-12-nigerias-2015-presidential-election-online-appendix/}, doi = {10.6084/m9.figshare.7990835.v2} }