CCBC-Net Archives

Mathematical Necessity of Diversity

From: Charles Bayless <charles.bayless_at_ttmd.com>
Date: Wed, 27 Feb 2013 10:24:10 -0500

I was surprised by the degree of diversity revealed by the statistical analysis of 100 children's books. If I am a member of one of the most common (in terms of percentages) groups and there is only 15% matching, what is driving that degree of diversity?

Further consideration leads me to believe that once again I have missed the forest for the trees. I got around to thinking about how one would express this phenomenon of reading diversity in mathematical terms. Now I am no mathematician, so the following expression is, I think, directionally right but clunky at best. How would we determine the number of books in which a child is likely to encounter a character with whom they can identify (where we are defining identify as sharing specified attributes)? Here is what I came up with.

Assuming universal access to books and assuming that all attributes with which children identify are proportionally and randomly present in book characters as they are in real life, then: Number of books with self-identity is a Function of Number of Books Read X Number of Characters per Book X Probability of each of the Attributes by which the Child Identifies.

SI F (B X C X P(A sub1-n)). When expressed this way, it becomes much easier to see that diversity of reading is a mathematical inevitability, particularly where self-identity involves more than a couple of attributes. This is predicated on the null hypothesis that the characteristics of a reader are randomly distributed in literature. What this exercise shows is that, even if the null hypothesis is true and all characteristics are represented proportionally in literature as they are in real life, a child has a low probability of encountering their matching attributes in literature. If the child self-identifies based on two or more attributes and there are more than two elements associated with each attribute (for example, Race, as used in America, has five elements: White, Black, Asian, Native American and Pacific Islander), then there is mathematical certainty that each child will only have a small degree of matching in their reading. Here is the construction of the model.

SI F (B X C X P(A sub1-n))

Number of Books Read (B) - What is the average number of books a child reads (or has read to them) between 0 and 18? I have data that indicates for the top 10% of readers it is on the order of 800-1000 books. However, reading proclivity follows a Pareto distribution (a few readers do most of the reading), not a normal distribution. Consequently we are more interested in the median than the mean. The median is the number of books that divides the top half of readers from the bottom half. I am simply guessing (given that a very high percentage of households have near zero elective reading rates) that the median number is likely to be in the 200-300 range. I have chosen to use the high number of 300. It may be much lower; it could possibly be higher, but I doubt it.

Number of Characters (C) - Given that children are sometimes just as likely to identify with a secondary or tertiary character as the protagonist, what is the average number of identifiable characters in a book? I have no idea. It is a determinable number, I just haven't sat down with a random sample of books to calculate it. I have taken ten as an easy number. Could be five, might be twenty or thirty.

Probability (P) - How prevalent is a particular Attribute in the population at large? So if I am interested in Gender, it is basically 50%. If it is Race, it is roughly 65:15:13:5:1.5:0.5, etc.

Number of Attributes (A sub1-n) - This is the really interesting and squirrely variable. I suspect that both the number and the type of variables differ over a reading life cycle. Most of the elements of identity are culturally developed, with variable ages at which they develop. For a given child at a given moment in time, what are the most important elements by which they identify themselves and to what degree? Traditional candidates include Race, Ethnicity, Culture, Gender, Age, Class, Religion, Income level, Morbidity, Orientation, Familial Status, Domicile (urban, suburban, rural), Sibling status (only, first, middle, youngest), Social Stereotype (Nerd, Popular Crowd, Wallflower, Jock, Joker, etc.), Nationality, Social status (inclusion/exclusion), etc. The insight is that the more Attributes necessary to define your distinctive Self-Identity, the less likely you are to encounter such a character, even if they are representatively available in literature. The second insight is that there is a mathematically disadvantaging ratchet effect when the proportion of that trait is less than 50% in the general population.
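
The model as constructed above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (attributes proportionally represented and independent of one another, which is what allows their shares to be multiplied). The function names joint_probability and expected_matches are hypothetical and are not part of the original model. Note that, strictly, B X C X P counts matching characters; treating that as a count of books is a reasonable approximation only when the result is small.

```python
from math import prod


def joint_probability(attribute_probs):
    """P(A sub1-n): chance that one random character has every listed attribute.

    Assumption: the attributes are independent, so their shares multiply.
    """
    return prod(attribute_probs)


def expected_matches(books_read, characters_per_book, attribute_probs):
    """SI as B x C x P(A sub1-n): the expected number of matching characters."""
    return books_read * characters_per_book * joint_probability(attribute_probs)


# Hypothetical illustration: every attribute is shared by half the population.
# Each additional attribute halves the joint probability.
for n in range(1, 6):
    print(n, joint_probability([0.5] * n))
```

With shares below 50% the product shrinks faster still, which is the ratchet effect noted above.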

Let's run a couple of examples. Say I am a fifteen-year-old, white, male, Evangelical, of general northwestern European/British heritage, and these are the key Attributes by which I think of myself (doubtful that a young teen male is doing all that much self-reflection but let's go with the example). So five key self-identity attributes. I have deliberately chosen what are common categories. Applying the equation to calculate the probability for a reasonably numerous group, we get SI F (300 X 10 X 12% (percent of population that are teens) X 75% (percentage that are white, US census 2011) X 49% (percentage that are male) X 26% (percentage that are Evangelical) X 13% (percentage of population reporting ancestry from the British Isles)) = 4.5 books, i.e. 1.5% of all books read (4.5/300). Even though I self-identify only on five attributes, I am, in my reading life cycle 0-18, unlikely to encounter more than 5 books where there is a character with whom I can identify based on just those five attributes.

At the other end of the spectrum, let's look at a more particular array of attributes that are also less common. Say I am 8 years old, Native American, female, Catholic, rural. Again five attributes. Applying SI F (B X C X P(A sub1-n)), we get SI F (300 X 10 X 13% (percent of population 0-9 years old) X 1.5% (percent of population claiming Native American ancestry) X 25% (percent of population that are Catholic) X 51% (females) X 17% (percent of population rural)) = roughly 0.13 books, i.e. 0.04% of all books read.

So in one scenario I encounter myself in five books over a childhood of reading and in the second, I encounter myself in a couple of chapters. One is much less well represented than the other but they are both hardly represented at all.
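
The arithmetic in the two examples can be reproduced with a short Python sketch. The percentages are the ones quoted above; the dictionary labels and the helper name expected_matches are illustrative only, and the calculation assumes, as the model does, that the attributes are independent.

```python
from math import prod

BOOKS_READ = 300          # B, the guessed median from the model
CHARACTERS_PER_BOOK = 10  # C, the "easy number" from the model


def expected_matches(attribute_probs, books=BOOKS_READ, characters=CHARACTERS_PER_BOOK):
    """SI as B x C x product of attribute shares (independence assumed)."""
    return books * characters * prod(attribute_probs)


# Shares as quoted in the two worked examples.
teen = {"teen": 0.12, "white": 0.75, "male": 0.49,
        "Evangelical": 0.26, "British Isles ancestry": 0.13}
girl = {"age 0-9": 0.13, "Native American": 0.015, "Catholic": 0.25,
        "female": 0.51, "rural": 0.17}

for label, profile in (("Example 1", teen), ("Example 2", girl)):
    si = expected_matches(profile.values())
    print(f"{label}: {si:.2f} books, {si / BOOKS_READ:.2%} of books read")
```

The first profile comes out at about 4.5 books and the second at about 0.13, matching the figures in the text to rounding.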

It is important to distinguish what this equation tells us and does not tell us. What it tells us is that in a hypothetically perfectly represented world, where every attribute is proportionately represented in literature, there will never be a significant amount of matching if there are more than a couple of attributes by which a child self-defines and if there are more than a couple of elements to each attribute. Every child is always reading diversely, usually in a range above 80% non-matching. That's the perfect world outcome. This formula says nothing about the actual world of children's literature. It is unrealistic to expect that all attributes are proportionately represented, even if we don't have measures of where and to what extent there are over- and under-representations. This formula sheds no light on the actual proportion. But it does usefully tell us that even were we to achieve perfect representation, we would mathematically not see more than 20% matching, and that the average degree of matching across a heterogeneous population of children with widely varying definitions of self-identity would likely be in the low single digits.

What does this equation tell us? Four key takeaways: 1) a confirmation that everyone is reading with great diversity, 2) the more attributes by which you define yourself, the less likely you are to see a match in literature, 3) if you are in the minority on any or most of the attributes by which you define yourself, the probability of randomly encountering a book which has a character that matches drops very close to 0, 4) even if you are in the majority on most of the attributes, you still have only a very small probability of matching.

What else can we glean from this exercise? First, I think it calls into question the rigid definition of identity attributes. I am guessing that at different stages in their life, kids will probably focus on no more than one or two key attributes, that those are not likely to be the ones about which we usually speak, and that those key attributes will change with some frequency. They are lonely, they feel unappreciated, they are bullied, they have high expectations of themselves, they are in love for the first time, they feel born in the wrong time period, they are worried about change, they seek admiration, etc. Those types of attributes are, I suspect, the attributes with which they identify, and they will identify with characters that share those attributes regardless of the more obvious identities (e.g. RCG). To some degree, this would be arguing that adults are imposing identities that don't fit on children who don't care about those identities.

Second, it gives us the opportunity to focus more critically on what can be changed to improve the outcome. If we are bound and determined to increase SI matching, then what can we do? Let's unpack the equation, SI F (B X C X P(A sub1-n)).

1 - Assumption 1: Universal access is not a reality, even if we are rapidly approaching it. For any particular population though, if they are not encountering themselves in literature, part of the solution is likely to increase their access, both real world and virtual.

2 - Assumption 2: The model is predicated on random selections. Change the parameters. The easiest, fastest way to increase identity matching is to move from random encounters with literature to directed encounters. In the portfolio of all available titles, almost regardless of how tightly you have defined your self-identity, there will be some number of books that match. They are out there; it is a matter of finding them. This becomes a task of definitions and matching: how do you define yourself and how can you effect a matching of that definition to the huge body of literature? It also becomes somewhat problematic from a philosophical and ethical perspective. To what extent is it appropriate for adults to discern how a child might be self-defining and then guide them to those self-identity attributes, particularly given how volatile those attributes are likely to be? And which adults are doing the matching under what circumstances?

3 - SI: Focus on expanding the definition of self-identity. If you are defining yourself as having an expansive identity ("I am human"), then you will have a very high matching ratio. The more you particularize your self-definition, the less likely you are to find yourself in the portfolio of literature. An MLK definition of defining people by the content of their character opens up far greater matching than if you are defining people by elements such as race or gender. Everyone or nearly everyone falls in love, wants to be admired for accomplishments, has their heart broken, has dark nights of doubt, thrills to discovery, etc. Not everyone can share the same race, culture, gender, etc.

4 - B: Increasing reading encounters and reading volume. The tighter your self-identity the less likely you are to find a match. In the absence of some efficient and effective market matching mechanism, then you only have two choices. First is to increase the volume of encounters or exposures. You are more likely to find a match among a population of 10,000 books than a collection of ten books. Second is to increase the volume of reading. If you read 1,000 books from 0-18, you are more than three times as likely to encounter a match than if you read 300.

5 - B: Reduce the barriers and increase the rewards of enthusiastic reading.

6 - C: Focus on books with a greater population of characters. Think Dickens and Tolstoy more than Crockett Johnson. Think histories rather than novels. The richer the character tapestry, the greater the probability for a match.

7 - P: Focus on pluralism; how am I like others, rather than how am I different.

8 - A: Focus on one or two attributes rather than many.
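
The effect of reading volume in item 4 can be checked numerically. The sketch below keeps B X C X P as the expected count and adds, as an assumption that is not in the original model, the probability of at least one matching character, 1 - (1 - P)^(B X C), treating every character as an independent draw. It uses the shares from the second example; the function names are hypothetical.

```python
from math import prod


def expected_matches(books_read, characters_per_book, attribute_probs):
    """SI as B x C x P(A sub1-n), the expected number of matching characters."""
    return books_read * characters_per_book * prod(attribute_probs)


def chance_of_any_match(books_read, characters_per_book, attribute_probs):
    """Probability of at least one matching character.

    Assumption (not in the original model): every character is an
    independent draw with the same joint probability.
    """
    p = prod(attribute_probs)
    return 1.0 - (1.0 - p) ** (books_read * characters_per_book)


# Shares from the second worked example (8 years old, Native American,
# Catholic, female, rural).
rural_girl = [0.13, 0.015, 0.25, 0.51, 0.17]

for books in (300, 1000):
    print(books,
          round(expected_matches(books, 10, rural_girl), 3),
          round(chance_of_any_match(books, 10, rural_girl), 3))
```

Under these assumptions the expected count scales exactly with the number of books read (1,000/300, a little over three times), while the chance of at least one match grows somewhat more slowly, since it can never exceed 1.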

These are fairly fruitful questions and suggestive of many initiatives that might yield a much greater and richer reading experience for children.

Charles
Received on Wed 27 Feb 2013 10:24:10 AM CST