Preparation for the first day of the Project Weekend

Computational Analysis of Communication — Spring Term 2024

Categories: CCS, Exercise, Data Science

Author: Felix Dietrich

To prepare for the next session, please complete the following tasks and submit your work via ILIAS by April 26.

  1. Familiarize yourself with research on the dynamics of parasocial phenomena
  2. Specify intimacy and intensity as characteristics of (para)social relationships
  3. Brainstorm ideas for measurement through content analysis


1. Familiarize yourself with research on the dynamics of parasocial phenomena

To get a better idea of what entertainment researchers mean when they talk about parasocial phenomena (i.e., parasocial interactions and parasocial relationships), please familiarize yourself with the following literature, which will be provided via ILIAS:

Mandatory reading:

Supplementary literature:

2. Specify intimacy and intensity as characteristics of (para)social relationships

Conduct a literature search (e.g., by combining Google Scholar search and databases such as Communication and Mass Media Complete or PSYNDEX) and identify one theoretical model of (para)social relationships that describes the role of intimacy and intensity as important characteristics of (para)social relationships.

  • Note down in a few sentences how this theory or model defines these constructs, as well as other theoretical considerations of interest (e.g., how they relate to each other, how they develop over time, …).
  • Also identify at least three empirical studies that examine these constructs as defined in the theoretical model. Note down how they were operationalized (e.g., as dependent or independent variable; experimentally manipulated; measured through self-report; measured in some other way; …) as well as what the main findings of these studies were.

3. Brainstorm ideas for measurement through content analysis

Based on what you have learned so far about (para)social relationships as well as the characteristics of relationship intimacy and intensity, think about at least one way in which these constructs could be observed or measured in written text (i.e., interpersonal communication). This could be, for example, by training a machine learning model or fine-tuning an LLM, or something completely different; a minimal illustration of the machine learning route follows the list below.

  • Think about and discuss, for example, what training data we would need, how this data could be generated, what potential problems could arise, …
  • Present your idea(s) as well as potential challenges in roughly 300–500 words.
  • In addition, you should also conduct another literature search to see whether any research papers have applied a similar approach or offer anything we could use for inspiration (or maybe even a model or dataset from this research that we could utilize). Add any relevant findings to your submission.
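To make the brainstorming more concrete, here is a minimal, hypothetical sketch of the supervised machine learning route in Python: messages that have been hand-coded for intimacy are used to train a simple text classifier. The file name (messages_coded.csv), the column names, and the three-level label scheme are illustrative assumptions, not part of the assignment.

```python
# Hypothetical sketch: supervised classification of message intimacy
# with scikit-learn. Data file, columns, and labels are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumed training data: one message per row, hand-coded for intimacy
# on a three-point scale ("low", "medium", "high").
df = pd.read_csv("messages_coded.csv")  # columns: text, intimacy

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["intimacy"],
    test_size=0.2, random_state=42, stratify=df["intimacy"],
)

# Bag-of-words baseline: TF-IDF features + logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Held-out evaluation; in a real study this should be complemented by
# validation against additional human coding (see the quality control
# literature below).
print(classification_report(y_test, model.predict(X_test)))
```

The same hand-coded data could just as well be used to fine-tune a transformer model; the bag-of-words baseline above is merely the simplest starting point and a useful benchmark against anything more complex.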

As additional resources, you can find some (partly outdated) methods literature below.

Basic Principles of Content Analysis

  • Krippendorff, K. (2018). Content Analysis: An Introduction to its Methodology (4th edition). Sage.
  • Krippendorff, K., & Bock, M. A. (Eds.). (2009). The Content Analysis Reader. Sage.
  • Neuendorf, K. A. (2017). The Content Analysis Guidebook (2nd edition). Sage. https://dx.doi.org/10.4135/9781071802878 (licensed access via Mannheim University Library)

Epistemology of Automated Media Content Analysis

  • Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8–23. https://doi.org/10.1080/21670811.2015.1096598
  • Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028
  • Lucas, C., Nielsen, R. A., Roberts, M. E., Stewart, B. M., Storer, A., & Tingley, D. (2015). Computer-assisted text analysis for comparative politics. Political Analysis, 23(2), 254–277. https://doi.org/10.1093/pan/mpu019
  • Trilling, D., & Jonkman, J. G. F. (2018). Scaling up content analysis. Communication Methods and Measures, 12(2–3), 158–174. https://doi.org/10.1080/19312458.2018.1447655
  • Zamith, R., & Lewis, S. C. (2015). Content analysis and the algorithmic coder: What computational social science means for traditional modes of media analysis. The ANNALS of the American Academy of Political and Social Science, 659(1), 307–318. https://doi.org/10.1177/0002716215570576

Data Preprocessing in R & Python

  • Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it. Political Analysis, 26(2), 168–189.
  • McKinney, W. (2018). Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython (2nd edition). O’Reilly. (licensed access via Mannheim University Library)
  • Wickham, H. (2019). Advanced R (2nd edition). CRC Press. https://adv-r.hadley.nz/ (open access)
  • Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly. https://r4ds.had.co.nz/ (open access)

Conducting Automated Media Content Analysis

  • Barberá, P., Boydstun, A. E., Linn, S., McMahon, R., & Nagler, J. (2021). Automated text classification of news articles: A practical guide. Political Analysis, 29(1), 19–42. https://doi.org/10.1017/pan.2020.8
  • Burscher, B., Vliegenthart, R., & De Vreese, C. H. (2015). Using supervised machine learning to code policy issues: Can classifiers generalize across contexts? The ANNALS of the American Academy of Political and Social Science, 659(1), 122–131. https://doi.org/10.1177/0002716215569441
  • Chan, C., Bajjalieh, J., Auvil, L., Wessler, H., Althaus, S., Welbers, K., van Atteveldt, W., & Jungblut, M. (2021). Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: A large-scale p-hacking experiment. Computational Communication Research, 3(1), 1–27. https://doi.org/10.5117/CCR2021.1.001.CHAN
  • Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition). O’Reilly. (licensed access via Mannheim University Library)
  • Lind, F., Eberl, J.-M., Heidenreich, T., & Boomgaarden, H. G. (2019). When the journey is as important as the goal: A roadmap to multilingual dictionary construction. International Journal of Communication, 13, 4000–4020.
  • Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., Häussler, T., Schmid-Petri, H., & Adam, S. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2–3), 93–118. https://doi.org/10.1080/19312458.2018.1430754
  • Muddiman, A., McGregor, S. C., & Stroud, N. J. (2019). (Re)claiming our expertise: Parsing large text corpora with manually validated and organic dictionaries. Political Communication, 36(2), 214–226. https://doi.org/10.1080/10584609.2018.1517843
  • Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. O’Reilly. https://www.tidytextmining.com/ (open access)
  • van Atteveldt, W., Trilling, D., & Arcila Calderón, C. (2022). Computational Analysis of Communication: A Practical Introduction to the Analysis of Texts, Networks, and Images with Code Examples in Python and R. Wiley.

Quality Control for Automated Content Analysis

  • Boukes, M., van de Velde, B., Araujo, T., & Vliegenthart, R. (2020). What’s the tone? Easy doesn’t do it: Analyzing performance and agreement between off-the-shelf sentiment analysis tools. Communication Methods and Measures, 14(2), 83–104. https://doi.org/10.1080/19312458.2019.1671966
  • Chan, C., & Sältzer, M. (2020). Oolong: An R package for validating automated content analysis tools. Journal of Open Source Software, 5(55), 2461. https://doi.org/10.21105/joss.02461
  • Chan, C. (2022). Sweater: Speedy word embedding association test and extras using R. Journal of Open Source Software, 7(72), 4036. https://doi.org/10.21105/joss.04036
  • Nelson, L. K. (2019). To measure meaning in big data, don't give me a map, give me transparency and reproducibility. Sociological Methodology, 49(1), 139–143. https://doi.org/10.1177/0081175019863783
  • Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2021). The future of coding: A comparison of hand-coding and three types of computer-assisted text analysis methods. Sociological Methods & Research, 50(1), 202–237. https://doi.org/10.1177/0049124118769114
  • Song, H., Tolochko, P., Eberl, J.-M., Eisele, O., Greussing, E., Heidenreich, T., Lind, F., Galyga, S., & Boomgaarden, H. G. (2020). In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication, 37(4), 550–572. https://doi.org/10.1080/10584609.2020.1723752
  • van Atteveldt, W., van der Velden, M. A. C. G., & Boukes, M. (2021). The validity of sentiment analysis: Comparing manual annotation, crowd-coding, dictionary approaches, and machine learning algorithms. Communication Methods and Measures, 15(2), 121–140. https://doi.org/10.1080/19312458.2020.1869198
  • Vijayakumar, R., & Cheung, M. W.-L. (2021). Assessing replicability of machine learning results: An introduction to methods on predictive accuracy in social sciences. Social Science Computer Review, 39(5), 768–801. https://doi.org/10.1177/0894439319888445