Preparation for the first day of the Project Weekend
Computational Analysis of Communication — Spring Term 2024
To prepare for the next session, please complete the following tasks and submit your work via ILIAS by April 26th.
1. Familiarize yourself with research on the dynamics of parasocial phenomena
2. Specify intimacy and intensity as characteristics of (para)social relationships
3. Brainstorm ideas for measurement through content analysis
3. Brainstorm ideas for measurement through content analysis
Based on what you have learned so far about (para)social relationships and the characteristics of relationship intimacy and intensity, think of at least one way in which these constructs could be observed or measured in written text (i.e., interpersonal communication). This could be done, for example, by training a machine learning model, by fine-tuning an LLM, or by something else entirely.
- Think about and discuss, for example, what training data we would need, how this data could be generated, what potential problems could arise, …
- Present your idea(s) as well as the potential challenges in roughly 300–500 words.
- In addition, you can and should conduct another literature search to check whether any research papers have applied a similar approach or offer anything we could use for inspiration (perhaps even a model or dataset we could reuse). Add any relevant findings to your submission.
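To make the task more concrete, here is a purely illustrative sketch (not part of the required submission) of the simplest possible approach: a dictionary-based intimacy score. The word list is hypothetical and would need proper construction and validation in a real study (see the dictionary literature below).

```python
# Minimal dictionary-based sketch for scoring relationship intimacy in text.
# The word list is purely hypothetical; a real study would require a
# validated dictionary and checks against human annotations.
import re

INTIMACY_TERMS = {"love", "miss", "trust", "close", "dear", "share"}

def intimacy_score(text: str) -> float:
    """Share of tokens that match the (hypothetical) intimacy dictionary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in INTIMACY_TERMS)
    return hits / len(tokens)

print(intimacy_score("I miss you and I trust you"))  # 2 of 7 tokens match
```

A supervised classifier or fine-tuned LLM would replace the fixed word list with patterns learned from labeled examples, which is exactly where the questions about training data above become relevant.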
As additional resources, you can find some (sometimes outdated) methods literature below.
Basic Principles of Content Analysis
- Krippendorff, K. (2018). Content Analysis: An Introduction to its Methodology (4th edition). Sage.
- Krippendorff, K., & Bock, M. A. (Eds.). (2009). The Content Analysis Reader. Sage.
- Neuendorf, K. A. (2017). The Content Analysis Guidebook (2nd edition). Sage. https://dx.doi.org/10.4135/9781071802878 (licensed access via Mannheim University Library)
Epistemology of Automated Media Content Analysis
- Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8–23. https://doi.org/10.1080/21670811.2015.1096598
- Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028
- Lucas, C., Nielsen, R. A., Roberts, M. E., Stewart, B. M., Storer, A., & Tingley, D. (2015). Computer-assisted text analysis for comparative politics. Political Analysis, 23(2), 254–277. https://doi.org/10.1093/pan/mpu019
- Trilling, D., & Jonkman, J. G. F. (2018). Scaling up content analysis. Communication Methods and Measures, 12(2–3), 158–174. https://doi.org/10.1080/19312458.2018.1447655
- Zamith, R., & Lewis, S. C. (2015). Content analysis and the algorithmic coder: What computational social science means for traditional modes of media analysis. The ANNALS of the American Academy of Political and Social Science, 659(1), 307–318. https://doi.org/10.1177/0002716215570576
Data Preprocessing in R & Python
- Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it. Political Analysis, 26(2), 168–189.
- McKinney, W. (2018). Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython (2nd edition). O’Reilly. (licensed access via Mannheim University Library)
- Wickham, H. (2019). Advanced R (2nd edition). CRC Press. https://adv-r.hadley.nz/ (open access)
- Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly. https://r4ds.had.co.nz/ (open access)
Conducting Automated Media Content Analysis
- Barberá, P., Boydstun, A. E., Linn, S., McMahon, R., & Nagler, J. (2021). Automated text classification of news articles: A practical guide. Political Analysis, 29(1), 19–42. https://doi.org/10.1017/pan.2020.8
- Burscher, B., Vliegenthart, R., & De Vreese, C. H. (2015). Using supervised machine learning to code policy issues: Can classifiers generalize across contexts? The ANNALS of the American Academy of Political and Social Science, 659(1), 122–131. https://doi.org/10.1177/0002716215569441
- Chan, C., Bajjalieh, J., Auvil, L., Wessler, H., Althaus, S., Welbers, K., van Atteveldt, W., & Jungblut, M. (2021). Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: A large-scale p-hacking experiment. Computational Communication Research, 3(1), 1–27. https://doi.org/10.5117/CCR2021.1.001.CHAN
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition). O’Reilly. (licensed access via Mannheim University Library)
- Lind, F., Eberl, J.-M., Heidenreich, T., & Boomgaarden, H. G. (2019). When the journey is as important as the goal: A roadmap to multilingual dictionary construction. International Journal of Communication, 13, 4000–4020.
- Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., Häussler, T., Schmid-Petri, H., & Adam, S. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2–3), 93–118. https://doi.org/10.1080/19312458.2018.1430754
- Muddiman, A., McGregor, S. C., & Stroud, N. J. (2019). (Re)claiming our expertise: Parsing large text corpora with manually validated and organic dictionaries. Political Communication, 36(2), 214–226. https://doi.org/10.1080/10584609.2018.1517843
- Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. O’Reilly. https://www.tidytextmining.com/ (open access)
- van Atteveldt, W., Trilling, D., & Arcila Calderón, C. (2022). Computational Analysis of Communication: A Practical Introduction to the Analysis of Texts, Networks, and Images with Code Examples in Python and R. Wiley. https://cssbook.net/ (open access)
Quality Control for Automated Content Analysis
- Boukes, M., van de Velde, B., Araujo, T., & Vliegenthart, R. (2020). What’s the tone? Easy doesn’t do it: Analyzing performance and agreement between off-the-shelf sentiment analysis tools. Communication Methods and Measures, 14(2), 83–104. https://doi.org/10.1080/19312458.2019.1671966
- Chan, C., & Sältzer, M. (2020). Oolong: An R package for validating automated content analysis tools. Journal of Open Source Software, 5(55), 2461. https://doi.org/10.21105/joss.02461
- Chan, C. (2022). Sweater: Speedy word embedding association test and extras using R. Journal of Open Source Software, 7(72), 4036. https://doi.org/10.21105/joss.04036
- Nelson, L. K. (2019). To measure meaning in big data, don’t give me a map, give me transparency and reproducibility. Sociological Methodology, 49(1), 139–143. https://doi.org/10.1177/0081175019863783
- Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2021). The future of coding: A comparison of hand-coding and three types of computer-assisted text analysis methods. Sociological Methods & Research, 50(1), 202–237. https://doi.org/10.1177/0049124118769114
- Song, H., Tolochko, P., Eberl, J.-M., Eisele, O., Greussing, E., Heidenreich, T., Lind, F., Galyga, S., & Boomgaarden, H. G. (2020). In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication, 37(4), 550–572. https://doi.org/10.1080/10584609.2020.1723752
- van Atteveldt, W., van der Velden, M. A. C. G., & Boukes, M. (2021). The validity of sentiment analysis: Comparing manual annotation, crowd-coding, dictionary approaches, and machine learning algorithms. Communication Methods and Measures, 15(2), 121–140. https://doi.org/10.1080/19312458.2020.1869198
- Vijayakumar, R., & Cheung, M. W.-L. (2021). Assessing replicability of machine learning results: An introduction to methods on predictive accuracy in social sciences. Social Science Computer Review, 39(5), 768–801. https://doi.org/10.1177/0894439319888445