Postings on science, wine, and the mind, among other things.
The act of sex is an essential element of life. Nearly all of Earth's plants and animals owe their existence to sexual reproduction. For (most) humans, sex is an intensely pleasurable activity. Desire for sex motivates, directly or indirectly, a wide range of important social, cultural, and economic behaviors.
Despite the importance of sexual activity to human life, shockingly little academic research is devoted to the subject and much of that is epidemiological in nature. Given the commonality and centrality of sex, it seems obvious that a scientific understanding of its psychological aspects would be an excellent investment in increasing human happiness.
Unfortunately, in many human societies sex is still to some degree taboo. Here in the US, half the population seems eager not to admit it happens and most of the rest fear offending such people. As a result, candid discussion, never mind research, of sex remains very much on the national backburner. Sex researchers are few in number, poorly funded, and frequently ridiculed.
While my own primary research interests are at best tangentially related, this blog post represents an effort on my part to expand what we collectively know about sex. Using Python and Beautiful Soup, I scraped ~300,000 erotic stories from Literotica.com, a popular site for sharing self-authored pieces. Each of these stories belonged to one of several categories discussed below. I removed entries from a number of non-story or catch-all categories, resulting in a final set of 23 categories. I also scraped the public profiles of the authors (~65K) of these stories, obtaining a variety of useful self-reported information from many of them, including sex, age, sexual orientation, and relationship status.
Shown below is a comparison wordcloud based on the text from these stories. I processed this text in a similar fashion to that in my previous post on what interests reddit. The Literotica categories are denoted by labels and the colors of words in their respective segments of the cloud. The size of the words scales with the degree to which they are disproportionately used in stories of that category (relative to the others). The figure was generated using R with the wordcloud package.
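To make "disproportionately used" concrete, here is a minimal sketch in Python. It assumes that disproportion means a word's usage rate within one category minus its average rate across all categories; the post does not spell out the exact formula, so treat this as one plausible reading rather than the computation behind the figure. The function name and the toy texts are made up for illustration.

```python
from collections import Counter

def distinctive_words(category_texts, top_n=3):
    """Rank words in each category by how far their usage rate
    exceeds the mean rate across all categories (an assumed measure)."""
    # Relative frequency of each word within each category.
    rates = {}
    for category, text in category_texts.items():
        counts = Counter(text.lower().split())
        total = sum(counts.values())
        rates[category] = {w: c / total for w, c in counts.items()}

    vocabulary = set().union(*rates.values())
    # Mean rate of each word across categories (0 where the word is absent).
    mean_rate = {
        w: sum(r.get(w, 0.0) for r in rates.values()) / len(rates)
        for w in vocabulary
    }

    result = {}
    for category, r in rates.items():
        excess = {w: r.get(w, 0.0) - mean_rate[w] for w in vocabulary}
        # Largest excess first; ties broken alphabetically for stable output.
        ranked = sorted(excess.items(), key=lambda kv: (-kv[1], kv[0]))
        result[category] = [(w, s) for w, s in ranked[:top_n] if s > 0]
    return result

# Toy texts standing in for the concatenated stories of two categories.
toy = {
    "Romance": "love heart love kiss",
    "Sci-Fi": "ship alien ship love",
}
for category, words in distinctive_words(toy).items():
    print(category, [w for w, _ in words])
```

In a comparison cloud, the excess score would set the font size and the category would set the color. Note that "love" appears in both toy texts but only counts as distinctive for the category that uses it at a higher rate.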
As you can probably tell, this method does a fairly good job of producing words associated with each category. However, the remainder of this post will be devoted to analyzing the associations between (self-reported) demographics and propensity to write stories in these categories. A more in-depth exploration of the text will have to wait until later.
Before we examine the associations between demographics and fantasy, it's probably a good idea to think about some basic descriptive statistics. The results reported below are derived from a dataset composed of 290,844 erotic stories by 62,789 authors. Of those authors, 33% were female, 44% were male, 2% were a couple sharing an account, 0.5% were transsexual, and the rest did not report their sex. Below you can see how these categories break down by age. Note that throughout all demographic explorations, I stick to labels used by Literotica.
Perhaps the most obvious trend here is the divergence of men and women in the later age groups, with female authors rapidly declining in frequency after age 40. Note that I decided to omit authors who listed themselves as transsexual from analysis. I was conflicted about excluding this group because on one hand, I did not want to contribute to the marginalization of people who already suffer from significant stigma. However, on the other hand, there were so few individuals in this sample (100s vs. 10,000s for female and male) that I was worried that the results would not be at all representative of the larger transsexual population. What ultimately decided me in favor of omitting them was the fact that I lacked information to disambiguate the ultimate self-identified sex of these individuals. In addition to failing to respect such a central issue as identity, analyzing their data without such information would have made it very difficult to interpret.
The other two cases we focus on here are sexual orientation and relationship status. Breakdowns of these variables (with respect to sex) are shown below.
Straight men are far and away the most common group of authors on the site. Authors who identify as gay, meanwhile, are surprisingly uncommon. I've been told since the original publication of this post that this trend is likely due to the fact that homosexual authors tend to selectively patronize other sites.
Here we can see that most authors of both sexes report themselves as "attached", with single trailing close behind in frequency and swingers a distant but not insubstantial third.
Our tool of choice for visualizing these associations will be mosaic plots from the vcd package in R. The dimensions of the boxes in these plots scale with the proportion of stories in the respective cells of the data table. Throughout the rest of the article, gold boxes indicate disproportionately many stories while the blue green boxes indicate disproportionately few. If that doesn't make sense yet, don't worry - it should become clear in the interpretation of the first case below. Of course, bear in mind a couple of caveats while interpreting the results below: first, these results are ultimately correlational, and second, the sample (and the information they provide) is ultimately self-selecting and thus potentially biased. I also take it for granted that people write stories that largely reflect their own personal fantasies. With that in mind, let's get started!
The first case we will examine is the association between sex and erotic story category. Note that throughout this post, story number will be taken as a measure of interest in a particular subject. This practice deliberately conflates the number of people (within a group) writing in a particular category with the number of stories per capita that they produce. Since both seem like reasonable measures of enthusiasm, I simply collapse across them here.
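A small worked example shows what collapsing across the two measures means. The sketch below uses invented numbers, not figures from the dataset: two hypothetical groups reach the same story total in different ways, and a count-based analysis treats them identically.

```python
def story_count_breakdown(stories_per_author):
    """Split a group's story total into number of authors and stories per author.

    stories_per_author: list of story counts, one entry per author in the group.
    """
    n_authors = len(stories_per_author)
    total_stories = sum(stories_per_author)
    per_capita = total_stories / n_authors if n_authors else 0.0
    return n_authors, per_capita, total_stories

# Hypothetical groups: many occasional writers vs. a few prolific ones.
many_casual = [1] * 12
few_prolific = [4, 4, 4]

print(story_count_breakdown(many_casual))   # (12, 1.0, 12)
print(story_count_breakdown(few_prolific))  # (3, 4.0, 12)
```

Both groups contribute 12 stories to a category, so they would fill the same area in the plots that follow, even though one reflects breadth of interest and the other reflects intensity.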
Let's review how to interpret this type of chart again, now that we have a concrete example in front of us. On the x-axis, we have Female and Male - these correspond to the left-right position of the boxes throughout the chart. Thus the box on the left (regardless of size) always refers to stories written by women and the box on the right refers to stories written by men. The length of the boxes (along the x-axis) indicates the proportion of stories within each category written by each sex. Similarly, the position of boxes on the y-axis corresponds to the category of erotic story. The height of the boxes (y-axis) corresponds to the proportion of all stories that were in that category. Finally, the color of the boxes has nothing to do with sex - it has to do with expectations. We can calculate an expectation for how large each box 'should' be by looking at the overall proportion of stories written by women (or men) and the overall number of stories within a particular category. If the actual number of stories in that category written by women exceeds the expectation, the box is colored gold (bolder gold indicating a greater divergence). Similarly, boxes that have fewer stories than expected are colored blue green. The lines around the boxes also indicate whether the box is larger (solid) or smaller (dotted) than we should expect - an aid to colorblind readers.
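The expectation described above can be sketched in a few lines of Python. Under independence, the expected count for a cell is its row total times its column total, divided by the grand total. The post does not say exactly which statistic drives the shading, so the sketch assumes Pearson residuals, (observed - expected) / sqrt(expected), a common choice for shaded mosaic plots. The counts and category labels are invented for illustration.

```python
import math

def pearson_residuals(table):
    """table: dict mapping category -> dict mapping sex -> story count.

    Returns the same nested shape, holding each cell's Pearson residual
    under the assumption that category and sex are independent.
    """
    row_totals = {r: sum(cols.values()) for r, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand_total = sum(row_totals.values())

    residuals = {}
    for r, cols in table.items():
        residuals[r] = {}
        for c, observed in cols.items():
            expected = row_totals[r] * col_totals[c] / grand_total
            residuals[r][c] = (observed - expected) / math.sqrt(expected)
    return residuals

# Hypothetical counts, not taken from the dataset.
toy = {
    "Category A": {"Female": 30, "Male": 20},
    "Category B": {"Female": 10, "Male": 40},
}
for category, cells in pearson_residuals(toy).items():
    for sex, value in cells.items():
        shade = "gold" if value > 0 else "blue green"
        print(f"{category} / {sex}: residual {value:+.2f} -> {shade}")
```

A positive residual corresponds to a gold box (more stories than expected) and a negative one to a blue green box, with larger magnitudes mapping to bolder shading.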
Here we can see the categories that females and males write more about. Women seem to have a preference for BDSM, Erotic Couplings, Erotic Horror, Interracial Love, Lesbian Sex, NonConsent/Reluctance, NonHuman, Romance, and Toys & Masturbation stories. Meanwhile men are more interested in stories featuring Anal, Fetish, First Time, Gay Male, Group Sex, Humor & Satire, Incest & Taboo, Loving Wives, Mature, Mind Control, Sci-Fi & Fantasy, and Transsexuals and Crossdressers. Stories involving Celebrities and Exhibitionist & Voyeur appear to be relatively equally appealing to both sexes. In general, men write about 2/3rds of the stories, which is why the grey bars above do not meet in the center of the graph.
In all of the following cases, I separate effects by female and male. It is quite possible, and indeed likely, that there are additional interactions between the other demographic groups considered here. However, given the difficulty of visualizing and interpreting three-way categorical interactions, I confine myself to the most obvious two-way interactions here.
How does growing older affect one's sexual interests? Literotica data provides ample opportunity to investigate this question.
A few trends are easily observable amongst aging women. Older women seem to particularly prefer to write stories in the BDSM, Erotic Couplings, Group Sex, Loving Wives, and Mature categories. Meanwhile younger women seem to have the strongest preference for Interracial Love, NonConsent/Reluctance, NonHuman, Romance, and Sci-Fi & Fantasy genres. A couple of categories have notable peaks in the middle age bins that span the late 20s and early 30s: Celebrities and Exhibitionist & Voyeur. Note that although Literotica does have a 60+ range, I omit it for the women because there are so few stories in that bin. One definite priority for future exploration of the text will be to disambiguate roles in various categories. For example, does increasing interest in BDSM as women age indicate the rise of submissive or dominant tendencies?
The greying of men tells a rather different story from that of women. Older men seem to have few consistent preferences, but tend to focus on stories involving Romance, Loving Wives and Mature people. Younger men, too, only have a couple of clear preferences: Celebrities and Gay Male. All other story categories manifest quadratic or even more complicated trends. Amongst the former, middle aged men seem to be interested in Anal, BDSM, Erotic Couplings, Exhibitionism, Fetish, Interracial, Mind Control, NonConsent, Nonhuman, and Sci-Fi/Fantasy. Note that I retain the 60+ category here, and it is far from the least populated. This is particularly surprising in contrast to the results for women because it is likely that considerably more women than men are alive in that bracket.
Here we turn to an obviously relevant dimension: sexual orientation. Given the paucity of stories by gay authors, as noted above, I will focus on the "Straight" and "Bi" categories.
Bi women appear to write more stories in the BDSM, Celebrity, Exhibitionist, Fetish, Group Sex, Incest/Taboo, and Lesbian Sex categories. In contrast, Erotic Couplings, Interracial Love, Mature, NonConsent, NonHuman, and Romance categories appear to be dominated by straight women.
Stories in the categories of Anal, Fetish, Gay Male, Group Sex, Interracial, and Transsexuals & Crossdressers are disproportionately written by bisexual men, while nearly all of the other categories are strongly biased towards being written by straight men.
Our final case for this post will be relationship status. Literotica has four categories: Single, Swinger, Attached, and Curious. Since it's not clear to me precisely what the final category is meant to denote (curious about other types of relationship? if so, which kinds?), I omit it from the analysis below.
Single women tend to write mostly in the Celebrities, Incest/Taboo, Lesbian Sex, and Mature categories. Attached women populate the BDSM, Horror, Loving Wives, Romance, and Sci-Fi categories. Female swingers show particular interest in Group Sex and Interracial Love.
Single men write disproportionately in the categories Anal, BDSM, Celebrities, Horror, Gay Male, Incest/Taboo, Interracial Love, NonHuman and Sci-Fi & Fantasy. Attached men, meanwhile, show more interest in Exhibitionism, Loving Wives, and Romance. Swingers appear to show the greatest interest in Fetish and Group Sex categories.
I hope you've enjoyed this initial foray into understanding sexual fantasies with big data. In general I've deliberately refrained from commenting too much on the trends observed in the analyses above, leaving that as an exercise to the reader. However, it seems to me that on the whole the results support a number of established stereotypes. One of the more interesting aspects of the data is the mismatches in interests. For example, BDSM seems to keenly interest women in relationships, but not their male counterparts. Similarly, younger women seem interested in (presumably consensual) NonConsent/Reluctance, but have grown out of this by the time men have started to find the idea exciting.
As I mentioned earlier, I've only begun to examine this data, and the text itself is virtually unexplored - so hopefully there will be even more surprising and salacious results to come. To help tide you over for the moment, however, please enjoy some additional mosaic plots showing associations between fantasies and weight (female & male), height (female & male), smoking (female & male), and drinking (female & male).
Want to learn more? Like about how certain sexual interests are associated with each other? Check out my next data blog on this topic here.
© 2015 Mark Allen Thornton. All rights reserved.