141028 Parlor Slides
Image: © flickr/srqpix CC BY 2.0
GENDER/GENRE: GENDER DIFFERENCES IN PROFESSIONAL WRITING
Brian N. Larson 29 October 2014
Current Research in Writing Studies
www.Rhetoricked.com @Rhetoricked
Housekeeping
• www.Rhetoricked.com (these slides + some additional)
• Communicate with me:
– @Rhetoricked
– [email protected]
• Research supported by:
– Graduate Research Partnership Program fellowship (U of M CLA), 2012
– James I. Brown Summer Research Fellowship, 2014
Gender, sex, and research constructs
• When I talk about my own data, I’ll refer to
– Gender F authors/writers
– Gender M authors/writers
• These categories may or may not correspond to other researchers’
– {woman, female, feminine}
– {man, male, masculine}
• That’s the subject of another talk (or for Q&A)
Many researchers have asked
• Do men and women communicate differently?
• Much work inspired by Robin Lakoff (1975)
• Scholarly and popular works by Deborah Tannen (e.g., 1990 [2001]) and others
• Much of this research in oral/face-to-face communication
Writing: Process and product
• In writing studies, we can (roughly) divide process and product
– Do men and women produce writing using different processes?
– Is the writing they produce distinguishable based on author gender?
Previous studies: Process research
• Focus on interpersonal communications in mixed-gender contexts
– Lay, 1989 (Schuster); Rehling, 1996; Raign & Sims, 1993; Tong & Klecun, 2004; Wolfe & Alexander, 2005; Brown & Burnett, 2006; Wolfe & Powell, 2006, 2009.
Previous studies: Product research
• In technical and professional communication
– Sterkel, 1988 (20 stylistic characteristics)
– Smeltzer & Werbel, 1986 (16 stylistic and evaluative measures)
– Tebeaux, 1990 (quality of responses)
– Allen, 1994 (markers of authoritativeness)
• Manual methods, small samples
Enter computational methods
• Natural language processing (NLP)
• Allows processing of large quantities of text data
• Studies that attracted my attention
– Koppel, Argamon & Shimoni, 2002 (machine-learning algorithms)
– Argamon et al., 2003 (statistical analysis)
– I’ll focus on Argamon et al. in this talk
Argamon et al. 2003
• Used 500 published texts from the British National Corpus (BNC)
• Mean 34,000 words (‘tokens’) per text
• Statistical analysis showed correspondence to Biber’s (1995) “informational/involved” dimension
Gender in computer-mediated communication (CMC)
• CMC popular for NLP studies
– Data are readily available
– Data are voluminous
• Examples (MLA = machine-learning algorithm)
– Herring & Paolillo, 2006 (blog posts, statistical analysis)
– Yan & Yan, 2006 (blog posts, MLA analysis)
– Argamon et al., 2007 (blog posts, MLA analysis)
– Rao et al., 2010 (Twitter, MLA analysis)
– Burger et al., 2011 (Twitter, MLA analysis)
Rationale: Why is the question important?
• Lend support to one or more theories of gender
– ‘Two cultures’ (Maltz & Borker, 1982)
– ‘Standpoint’ (Barker & Zifcak, 1999)
– ‘Performative’ (Butler 1993, 1999, 2004)
– Others
• Sorting out methodological problems, particularly use of gender as a variable
Study design goals
• Research questions
– Did Gender F and Gender M writers in a disciplinary genre in which they are being trained use lexical and quasi-syntactic stylistic features with relative frequencies that varied with their genders?
– If so, did the differences appear in interpretable patterns?
• Examine a corpus of texts
– All of the same genre
– Where we can be confident of single authorship
– Where author gender is self-identified
Data collection
• Major writing project at end of first year of law school
– Students address hypothetical problem (writing in same ‘genre’)
– Students not allowed to collaborate
– Plagiarism difficult (but still possible)
• Students self-identified gender*
• 193 texts (mean word tokens = 3764)
* This study IRB-approved (UMN Study #1202E10685)
Text genre: Memorandum regarding motion to dismiss
• Written to hypothetical court
• Supporting or opposing a motion before the court
• High-level organization is formulaic
Memorandum Sections
• Caption**
• Introduction/summary*
• Facts
• Legal standard of review*
• Argument
• Conclusion
• Signature block**
* Not always present.
** I did not analyze (content is highly formulaic)
Feature (“variable”) selection
• For now, those of Argamon et al. 2003
• Relative frequencies of
– 429 “function words” (Argamon used 405)
– 45 parts of speech from the Penn Treebank tagset (Argamon used 76 BNC POS tags)
– 100 common part-of-speech bigrams
– 500 common POS trigrams
‘Part-of-speech’ tags? ‘Bigrams & trigrams’?
• First, ‘tokenize’ each sentence (automated):
– ‘My aunt’s pen is on the table.’
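The tokenizing step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which tokenizer was used, and the `tokenize` function and its regular expression are hypothetical stand-ins that split words, possessive/contraction clitics, and punctuation. It does not handle every Penn Treebank convention (for example, it would not split "don't" into "do" + "n't").

```python
import re

# Hypothetical, simplified tokenizer (an assumption, not the study's tool).
# Pattern alternatives, tried in order at each position:
#   \w+        a run of letters/digits (an ordinary word)
#   ['’]\w+    an apostrophe clitic such as 's (straight or curly apostrophe)
#   [^\w\s]    any single remaining punctuation mark
TOKEN_PATTERN = re.compile(r"\w+|['’]\w+|[^\w\s]")


def tokenize(sentence):
    """Split a sentence into word, clitic, and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


tokens = tokenize("My aunt's pen is on the table.")
print(tokens)
# ['My', 'aunt', "'s", 'pen', 'is', 'on', 'the', 'table', '.']
```

Note that the possessive marker becomes its own token, so the nine-word-and-punctuation sentence yields nine tokens.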
POS tags
• Purple words are function words
• Tag the parts of speech (automated)
• Then calculate relative frequency of function words and POS tags (automated)
POS bigrams and trigrams
• A bigram or trigram is a 2- or 3-token ‘window’ on the sentence.
– Automated calculation
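The sliding ‘window’ idea can be sketched as follows. The tag sequence is a hand-assigned Penn Treebank tagging of the example sentence ‘My aunt’s pen is on the table.’ (an assumption for illustration; the study used an automated tagger), and `ngrams` is a hypothetical helper name.

```python
def ngrams(items, n):
    """Return every run of n consecutive items as a tuple (a sliding window)."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Hand-assigned Penn Treebank tags (illustrative assumption):
# My/PRP$ aunt/NN 's/POS pen/NN is/VBZ on/IN the/DT table/NN ./.
tags = ["PRP$", "NN", "POS", "NN", "VBZ", "IN", "DT", "NN", "."]

bigrams = ngrams(tags, 2)    # 8 windows of width 2
trigrams = ngrams(tags, 3)   # 7 windows of width 3

print(bigrams[:3])   # [('PRP$', 'NN'), ('NN', 'POS'), ('POS', 'NN')]
print(trigrams[:2])  # [('PRP$', 'NN', 'POS'), ('NN', 'POS', 'NN')]
```

A sequence of k tags always yields k − n + 1 windows of width n, which is why the bigram and trigram totals differ slightly from the token total.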
Feature (“variable”) selection
• First-person pronouns (total)
– Singular: I, me, my, mine, myself.
– Plural: We, us, our, ours, ourselves.
• Second-person pronouns: You, your, yours, yourself.
• Third-person pronouns (total)
– Singular (total): feminine (She, her, hers, herself) and masculine (He, him, his, himself).
– Plural: They, them, their, theirs, themselves.
• Contractions: Including all instances of n’t, ’ld, ’ve, etc.
• All relative frequencies calculated (automated)
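A minimal sketch of how the pronoun categories above could be counted. The word lists come from the slide; the category key names, the `pronoun_frequencies` function, and the sample tokens are hypothetical, and contractions are left out because the slide does not fully specify them.

```python
from collections import Counter

# Word lists taken from the slide; dictionary keys are illustrative names.
PRONOUN_CATEGORIES = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself"},
    "third_singular_feminine": {"she", "her", "hers", "herself"},
    "third_singular_masculine": {"he", "him", "his", "himself"},
    "third_plural": {"they", "them", "their", "theirs", "themselves"},
}


def pronoun_frequencies(tokens):
    """Relative frequency (count / total tokens) of each pronoun category."""
    total = len(tokens) or 1  # avoid division by zero on an empty text
    counts = Counter()
    for token in tokens:
        word = token.lower()
        for category, words in PRONOUN_CATEGORIES.items():
            if word in words:
                counts[category] += 1
    freqs = {cat: counts[cat] / total for cat in PRONOUN_CATEGORIES}
    # Totals that the slide marks as "(total)".
    freqs["first_total"] = freqs["first_singular"] + freqs["first_plural"]
    freqs["third_singular_total"] = (
        freqs["third_singular_feminine"] + freqs["third_singular_masculine"]
    )
    freqs["third_total"] = freqs["third_singular_total"] + freqs["third_plural"]
    return freqs


# Made-up sample sentence, already tokenized (10 tokens).
sample = ["She", "gave", "me", "their", "book", "and", "I", "thanked", "her", "."]
freqs = pronoun_frequencies(sample)
print(freqs["first_total"], freqs["third_singular_feminine"])
```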
Each student’s text is represented by variables
• A series of numerical values expressing each feature (variable), i.e., the relative frequency of:
– Function words / total tokens
– POS tags / total tokens
– Bigrams / total bigrams*
– Trigrams / total trigrams*
– Pronouns
– Automated calculation
* Multiplied by a factor.
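As a rough sketch of what "function words / total tokens" means in practice, the snippet below turns one tokenized text into a small feature vector. The five-word `FUNCTION_WORDS` set is a hypothetical stand-in for the study's 429-word list, `relative_frequencies` is an illustrative name, and the unspecified scaling factor applied to bigrams and trigrams is deliberately not reproduced.

```python
from collections import Counter

# Hypothetical tiny subset; the study used a list of 429 function words.
FUNCTION_WORDS = {"my", "is", "on", "the", "all"}


def relative_frequencies(tokens, vocabulary):
    """Map each vocabulary item to (its count in tokens) / (total tokens)."""
    total = len(tokens) or 1  # avoid division by zero on an empty text
    counts = Counter(t.lower() for t in tokens)
    return {word: counts[word] / total for word in sorted(vocabulary)}


tokens = ["My", "aunt", "'s", "pen", "is", "on", "the", "table", "."]
features = relative_frequencies(tokens, FUNCTION_WORDS)
for word, freq in features.items():
    print(f"{word}: {freq:.4f}")
```

Each text in the corpus would get one such dictionary (one value per feature), so a word that never occurs, like "all" here, still appears with a frequency of zero.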
Example 1
• Tokens of the function word-type “all” in paper 1007 account for less than 7/100 of 1% of all tokens in that paper.
Example 2
• Bigrams made up of a plural common noun (NNS) followed by a coordinating conjunction (CC) accounted for 1/10 of 1% of bigrams in paper 1009.
Mean relative frequencies calculated
• For each feature
– Mean frequency (SD) for Gender F authors
– Mean frequency (SD) for Gender M authors
– Statistical significance assessed with Mann-Whitney U test (expressed as p-value)
• A priori threshold for significance: 0.05
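For readers unfamiliar with the test, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test using the normal approximation with a tie correction and no continuity correction. It is an assumption-laden illustration, not the study's procedure: the slides do not say which software or variant was used, the function names are hypothetical, and the two sample lists are made-up numbers rather than study data.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every value identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Made-up relative frequencies of one feature in two groups of texts.
gender_f = [0.0012, 0.0009, 0.0015, 0.0011, 0.0008]
gender_m = [0.0010, 0.0007, 0.0013, 0.0006, 0.0005]
u_stat, p_value = mann_whitney_u(gender_f, gender_m)
print(u_stat, p_value, p_value < 0.05)  # compare with the 0.05 threshold
```

The test compares ranks rather than raw values, so it does not assume that relative frequencies are normally distributed. For very small samples an exact test would be more appropriate than this approximation.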
What Argamon et al. 2003 found: Men
• Males used significantly more
– Determiners: a, the, these
– Determiner+noun bigrams: the books, a dog, these Tories
– Attributive-adjective+noun bigrams: great leaders, old form
– Prepositions: at, from, for, of, behind
– Its
What Argamon et al. 2003 found: Women
• Females used significantly more
– Pronouns (all), including 1st person sing. (I, my, mine), 2nd person (you, yours), and 3rd person (they, them, theirs)
– Present tense verbs: walks, eradicates
– Contractions
– Negation with “not”
Informational/involved
• Biber (1995) labeled this a dimension of register variation after doing factor analyses on feature frequencies to identify sets of co-varying features as “dimensions”
• Consistent with popular conceptions and works such as Tannen (1990 [2001]) that characterize women as “affiliative” and men as “informative”
What I found: Nouns & determiners
• Nouns
– Some categories showed non-significant Gender F preference (weakly contradicting Argamon)
• Determiners and determiner+noun
– Only significant: DET-NNP (proper noun)
– But all showed non-significant Gender M preference
– (Overall, weakly supporting Argamon)
What I found: Adjectives & prepositions
• Attributive-adjective+noun
– Non-significant Gender M preference (weakly supporting Argamon)
• Prepositions
– Non-significant Gender M preference (weakly supporting Argamon)
What I found: Pronouns (i.e., a mess)
• All pronouns: Non-significant Gender M preference (weakly contradicting Argamon)
• 1st p sing., 2nd p., 3rd p. overall, 3rd s. fem: Non-significant Gender F preference (weakly supporting Argamon)
• 3rd p. plural: Significant Gender M preference (contradicting Argamon)
• Its: Non-significant Gender F preference (weakly contradicting Argamon)
What I found: Verbs, contractions, “not”
• Present-tense verbs
– Significant Gender M preference for 3rd p. singular (contradicting Argamon)
– Non-significant Gender M preference for the rest (weakly contradicting Argamon)
• Contractions: Non-significant Gender F preference (weakly supporting Argamon)
• Negation with “not”: Non-significant Gender F preference (weakly supporting Argamon)
The take-away?
• Statistics: The non-significant differences should probably be regarded as non-significant
– In that case, M-informational/F-involved is not confirmed in this study
• If the non-significant differences are real, evidence for M-informational/F-involved is still mixed, especially in pronouns and present-tense verbs
Explaining the findings with relevance theory
• Relevance theory (Sperber & Wilson 1995) recognizes the effects of habituation
• If boys and girls are acculturated to writing in certain genres and on certain topics in their youth . . .
• . . . they may unconsciously habituate to certain (appropriate) word choices
• . . . and may not be completely free to vary their word choices consciously later.
Situating the findings within gender & language theories
• Findings weakly support or contradict
– Two sociolinguistic cultures view (Maltz & Borker 1982; Tannen 1990 [2001])
– Intersectionality/performativity views (Barker & Zifcak 1999; Butler; many others)
• Some gendered linguistic habits appeared to resist retraining and conscious efforts to conform to register conventions . . .
• . . . others were apparently overcome.
I’m left with more questions than answers . . .
• But you are entitled to ask some questions now . . .
THANK YOU!
Works cited
Allen, J. (1994). Women and authority in business/technical communication scholarship: An analysis of writing... Technical Communication Quarterly, 3(3), 271.
Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender, genre, and writing style in formal written texts. Text, 23(3), 321–346.
Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2007). Mining the Blogosphere: Age, gender and the varieties of self-expression. First Monday, 12(9). Retrieved from http://firstmonday.org/issues/issue12_9/argamon/index.html
Armstrong, C. L., & McAdams, M. J. (2009). Blogs of information: How gender cues and individual motivations influence perceptions of credibility. Journal of Computer-Mediated Communication, 14(3), 435–456.
Barker, R. T., & Zifcak, L. (1999). Communication and gender in workplace 2000: Creating a contextually-based integrated paradigm. Journal of Technical Writing & Communication, 29(4), 335.
Biber, D. (1995). Dimensions of register variation: A cross-linguistic comparison. Cambridge; New York: Cambridge University Press.
Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python (1st ed.). O’Reilly Media.
Brown, S. M., & Burnett, R. E. (2006). Women hardly talk. Really! Communication practices of women in undergraduate engineering classes (pp. T3F1–T3F9). Presented at the 9th International Conference on Engineering Education, San Juan, Puerto Rico: International Network for Engineering Education & Research. Retrieved from http://ineer.org/Events/ICEE2006/papers/3219.pdf
Burger, J., Henderson, J., Kim, G., & Zarrella, G. (2011). Discriminating gender on Twitter. Bedford, MA: MITRE Corporation. Retrieved from http://www.mitre.org/work/tech_papers/2011/11_0170/
Butler, J. (1993). Bodies that matter: On the discursive limits of “sex.” New York: Routledge.
Butler, J. (1999). Gender trouble. New York: Routledge.
Butler, J. (2004). Undoing gender. New York: Routledge.
Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., Aswani, N., Roberts, I., … Peters, W. (2012, December 28). Developing Language Processing Components with GATE Version 7 (a User Guide). GATE: General Architecture for Text Engineering. Retrieved January 1, 2013, from http://gate.ac.uk/sale/tao/split.html
Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting More Out of Biomedical Documents with GATE’s Full Lifecycle Open Source Text Analytics. PLoS Computational Biology, 9(2), e1002854.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), 10–18.
Herring, S. C., & Paolillo, J. C. (2006). Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4), 439–459.
Koppel, M., Argamon, S., & Shimoni, A. R. (2002). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401–412.
Lakoff, R. T. (1975/2004). Language and Woman’s Place: Text and Commentaries. (M. Bucholtz, Ed.) (Revised and expanded ed.). New York: Oxford University Press.
Lay, M. M. (1989). Interpersonal conflict in collaborative writing: What we can learn from gender studies. Journal of Business and Technical Communication, 3(2), 5–28.
Maltz, D. N., & Borker, R. (1982). A cultural approach to male-female miscommunication. In J. J. Gumperz (Ed.), Language and social identity (pp. 196–216). Cambridge, U.K.: Cambridge University Press.
Pakhomov, S. V., Hanson, P. L., Bjornsen, S. S., & Smith, S. A. (2008). Automatic classification of foot examination findings using clinical notes and machine learning. Journal of the American Medical Informatics Association, 15, 198–202.
Raign, K. R., & Sims, B. R. (1993). Gender, persuasion techniques, and collaboration. Technical Communication Quarterly, 2(1), 89–104.
Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010). Classifying latent user attributes in Twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents (pp. 37–44). Toronto, ON, Canada: ACM.
Rehling, L. (1996). Writing together: Gender’s effect on collaboration. Journal of Technical Writing and Communication, 26(2), 163–176.
Smeltzer, L. R., & Werbel, J. D. (1986). Gender differences in managerial communication: Fact or folk-linguistics? Journal of Business Communication, 23(2), 41–50.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and Cognition (2nd ed.). Wiley-Blackwell.
Sterkel, K. S. (1988). The relationship between gender and writing style in business communications. Journal of Business Communication, 25(4), 17–38.
Tannen, D. (2001). You Just Don’t Understand: Women and Men in Conversation. William Morrow Paperbacks.
Tebeaux, E. (1990). Toward an understanding of gender differences in written business communications: A suggested perspective for future research. Journal of Business and Technical Communication, 4(1), 25–43.
Tong, A., & Klecun, E. (2004). Toward accommodating gender differences in multimedia communication. Professional Communication, IEEE Transactions on, 47(2), 118–129.
Wolfe, J., & Alexander, K. P. (2005). The computer expert in mixed-gendered collaborative writing groups. Journal of Business and Technical Communication, 19(2), 135–170.
Wolfe, J., & Powell, B. (2006). Gender and expressions of dissatisfaction: A study of complaining in mixed-gendered student work groups. Women & Language, 29(2), 13–20.
Wolfe, J., & Powell, E. (2009). Biases in interpersonal communication: How engineering students perceive gender typical speech acts in teamwork. Journal of Engineering Education, 98(1), 5–16.
Yan, X., & Yan, L. (2006). Gender classification of weblog authors. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs (pp. 228–230).