Preparation for the first day of the Project Weekend

Computational Analysis of Communication — Spring Term 2024

Categories: CCS, Exercise, Data Science

Author: Felix Dietrich

To prepare for the next session, please complete the following tasks and submit your work via ILIAS by April 26.

  1. Familiarize yourself with research on the dynamics of parasocial phenomena
  2. Specify intimacy and intensity as characteristics of (para)social relationships
  3. Brainstorm ideas for measurement through content analysis


1. Familiarize yourself with research on the dynamics of parasocial phenomena

To get a better idea of what entertainment researchers mean when they talk about parasocial phenomena (i.e., parasocial interactions and parasocial relationships), please familiarize yourself with the following literature, which will be provided via ILIAS:

Mandatory reading:

Supplementary literature:

2. Specify intimacy and intensity as characteristics of (para)social relationships

Conduct a literature search (e.g., by combining Google Scholar search and databases such as Communication and Mass Media Complete or PSYNDEX) and identify one theoretical model of (para)social relationships that describes the role of intimacy and intensity as important characteristics of (para)social relationships.

  • Note down in a few sentences how this theory or model defines these constructs, as well as other theoretical considerations of interest (e.g., how they relate to each other, how they develop over time, …).
  • Also identify at least three empirical studies that examine these constructs as defined in the theoretical model. Note down how they were operationalized (e.g., as dependent or independent variable; experimentally manipulated; measured through self-report; measured in some other way; …) as well as what the main findings of these studies were.

3. Brainstorm ideas for measurement through content analysis

Based on what you have learned so far about (para)social relationships as well as the characteristics of relationship intimacy and intensity, think about at least one way in which these constructs could be observed or measured in written text (i.e., interpersonal communication). This could be, for example, by training a machine learning model or fine-tuning an LLM, or something completely different; a minimal illustration of the machine learning route follows the list below.

  • Think about and discuss, for example, what training data we would need, how this data could be generated, what potential problems could arise, …
  • Present your idea(s) as well as potential challenges in roughly 300–500 words.
  • In addition, you should also conduct another literature search to see whether any research papers have applied a similar approach or offer anything we could use for inspiration (or maybe even a model or dataset from this research that we could utilize). Add any relevant findings to your submission.
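To make the brainstorming more concrete, here is a minimal, hypothetical sketch of the supervised machine learning route in Python: messages that have been hand-coded for intimacy are used to train a simple text classifier. The file name (messages_coded.csv), the column names, and the three-level label scheme are illustrative assumptions, not part of the assignment.

```python
# Hypothetical sketch: supervised classification of message intimacy
# with scikit-learn. Data file, columns, and labels are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumed training data: one message per row, hand-coded for intimacy
# on a three-point scale ("low", "medium", "high").
df = pd.read_csv("messages_coded.csv")  # columns: text, intimacy

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["intimacy"],
    test_size=0.2, random_state=42, stratify=df["intimacy"],
)

# Bag-of-words baseline: TF-IDF features + logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Held-out evaluation; in a real study this should be complemented by
# validation against additional human coding (see the quality control
# literature below).
print(classification_report(y_test, model.predict(X_test)))
```

The same hand-coded data could just as well be used to fine-tune a transformer model; the bag-of-words baseline above is merely the simplest starting point and a useful benchmark against anything more complex.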

As additional resources, you can find some (partly outdated) methods literature below.

Basic Principles of Content Analysis

  • Krippendorff, K. (2018). Content Analysis: An Introduction to its Methodology (4th edition). Sage.
  • Krippendorff, K., & Bock, M. A. (Eds.). (2009). The Content Analysis Reader. Sage.
  • Neuendorf, K. A. (2017). The Content Analysis Guidebook (2nd edition). Sage. https://dx.doi.org/10.4135/9781071802878 (licensed access via Mannheim University Library)

Epistemology of Automated Media Content Analysis

  • Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8–23. https://doi.org/10.1080/21670811.2015.1096598
  • Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028
  • Lucas, C., Nielsen, R. A., Roberts, M. E., Stewart, B. M., Storer, A., & Tingley, D. (2015). Computer-assisted text analysis for comparative politics. Political Analysis, 23(2), 254–277. https://doi.org/10.1093/pan/mpu019
  • Trilling, D., & Jonkman, J. G. F. (2018). Scaling up content analysis. Communication Methods and Measures, 12(2–3), 158–174. https://doi.org/10.1080/19312458.2018.1447655
  • Zamith, R., & Lewis, S. C. (2015). Content analysis and the algorithmic coder: What computational social science means for traditional modes of media analysis. The ANNALS of the American Academy of Political and Social Science, 659(1), 307–318. https://doi.org/10.1177/0002716215570576

Data Preprocessing in R & Python

  • Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it. Political Analysis, 26(2), 168–189.
  • McKinney, W. (2018). Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython (2nd edition). O’Reilly. (licensed access via Mannheim University Library)
  • Wickham, H. (2019). Advanced R (2nd edition). CRC Press. https://adv-r.hadley.nz/ (open access)
  • Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly. https://r4ds.had.co.nz/ (open access)

Conducting Automated Media Content Analysis

  • Barberá, P., Boydstun, A. E., Linn, S., McMahon, R., & Nagler, J. (2021). Automated text classification of news articles: A practical guide. Political Analysis, 29(1), 19–42. https://doi.org/10.1017/pan.2020.8
  • Burscher, B., Vliegenthart, R., & De Vreese, C. H. (2015). Using supervised machine learning to code policy issues: Can classifiers generalize across contexts? The ANNALS of the American Academy of Political and Social Science, 659(1), 122–131. https://doi.org/10.1177/0002716215569441
  • Chan, C., Bajjalieh, J., Auvil, L., Wessler, H., Althaus, S., Welbers, K., van Atteveldt, W., & Jungblut, M. (2021). Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: A large-scale p-hacking experiment. Computational Communication Research, 3(1), 1–27. https://doi.org/10.5117/CCR2021.1.001.CHAN
  • Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition). O’Reilly. (licensed access via Mannheim University Library)
  • Lind, F., Eberl, J.-M., Heidenreich, T., & Boomgaarden, H. G. (2019). When the journey is as important as the goal: A roadmap to multilingual dictionary construction. International Journal of Communication, 13, 4000–4020.
  • Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., Häussler, T., Schmid-Petri, H., & Adam, S. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2–3), 93–118. https://doi.org/10.1080/19312458.2018.1430754
  • Muddiman, A., McGregor, S. C., & Stroud, N. J. (2019). (Re)claiming our expertise: Parsing large text corpora with manually validated and organic dictionaries. Political Communication, 36(2), 214–226. https://doi.org/10.1080/10584609.2018.1517843
  • Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. O’Reilly. https://www.tidytextmining.com/ (open access)
  • van Atteveldt, W., Trilling, D., & Arcila Calderón, C. (2022). Computational Analysis of Communication: A Practical Introduction to the Analysis of Texts, Networks, and Images with Code Examples in Python and R. Wiley.

Quality Control for Automated Content Analysis

  • Boukes, M., van de Velde, B., Araujo, T., & Vliegenthart, R. (2020). What’s the tone? Easy doesn’t do it: Analyzing performance and agreement between off-the-shelf sentiment analysis tools. Communication Methods and Measures, 14(2), 83–104. https://doi.org/10.1080/19312458.2019.1671966
  • Chan, C., & Sältzer, M. (2020). Oolong: An R package for validating automated content analysis tools. Journal of Open Source Software, 5(55), 2461. https://doi.org/10.21105/joss.02461
  • Chan, C. (2022). Sweater: Speedy word embedding association test and extras using R. Journal of Open Source Software, 7(72), 4036. https://doi.org/10.21105/joss.04036
  • Nelson, L. K. (2019). To measure meaning in big data, don't give me a map, give me transparency and reproducibility. Sociological Methodology, 49(1), 139–143. https://doi.org/10.1177/0081175019863783
  • Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2021). The future of coding: A comparison of hand-coding and three types of computer-assisted text analysis methods. Sociological Methods & Research, 50(1), 202–237. https://doi.org/10.1177/0049124118769114
  • Song, H., Tolochko, P., Eberl, J.-M., Eisele, O., Greussing, E., Heidenreich, T., Lind, F., Galyga, S., & Boomgaarden, H. G. (2020). In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication, 37(4), 550–572. https://doi.org/10.1080/10584609.2020.1723752
  • van Atteveldt, W., van der Velden, M. A. C. G., & Boukes, M. (2021). The validity of sentiment analysis: Comparing manual annotation, crowd-coding, dictionary approaches, and machine learning algorithms. Communication Methods and Measures, 15(2), 121–140. https://doi.org/10.1080/19312458.2020.1869198
  • Vijayakumar, R., & Cheung, M. W.-L. (2021). Assessing replicability of machine learning results: An introduction to methods on predictive accuracy in social sciences. Social Science Computer Review, 39(5), 768–801. https://doi.org/10.1177/0894439319888445