Words by Maugham

Sorting through 25 of W. Somerset Maugham's books, short stories and plays 📖

Text Analysis Exercise > A Journalistic Story 🦄

By Eve Lu

August 8, 2022

There are times when I feel perplexed in life. Feeding myself satire feels almost like an obligation. This is how my life became connected to Somerset Maugham, a great British novelist of the twentieth century who was known for an elaborate yet lucid literary style, often delivered with a wry and vicious tongue.

In this project, to practice my text analysis skills, I scraped 25 accessible works by Maugham, including short stories and plays, from Project Gutenberg, an online library that offers free access to plain-text electronic books. I sorted through the publications, more than 1.5 million words in total, and asked two questions: which words did Maugham use most and least often, and how did he write about his female and male characters?
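To give a feel for the counting step, here is a minimal sketch, not the exact code behind this project. It assumes each book has already been downloaded as plain text and that the file wraps the book body in `*** START OF ... ***` and `*** END OF ... ***` marker lines. The function names and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the plain-text file marks the book body with
# "*** START OF ... ***" and "*** END OF ... ***" lines.
START_RE = re.compile(r"\*\*\* ?START OF .*?\*\*\*", re.IGNORECASE | re.DOTALL)
END_RE = re.compile(r"\*\*\* ?END OF .*?\*\*\*", re.IGNORECASE | re.DOTALL)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END markers, if both exist."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    if start and end and start.end() < end.start():
        return text[start.end():end.start()]
    return text  # markers not found: leave the text untouched


def tokenize(text):
    """Lowercase the text and keep alphabetic words (with inner apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def count_words(texts):
    """Count every word across a list of book texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(strip_gutenberg_boilerplate(text)))
    return counts


# Made-up sample standing in for one downloaded book.
sample = (
    "Produced by volunteers\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "She smiled. He smiled too, but she didn't answer.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "License text"
)
word_counts = count_words([sample])
```

Removing the header and footer first keeps licence wording out of the counts. The simple regular expression tokenizer is one plausible source of the mis-split words mentioned in the notes further down.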

🔍 Hover and click the included books for further information👇

Note: there should be 27 e-publications by Maugham currently available on Project Gutenberg. To keep the amount of data manageable, my analysis includes only the 25 publications that appear on the first page of results when searching by the author's name. Jack Straw: A Farce in Three Acts and The Unknown: A Play in Three Acts are excluded here.

Top 1000: What were Maugham's favorites?!

Hover to see how many times each word shows up 🧐

Note: this treemap shows the 1,000 most frequent words, the ones Maugham possibly 'favored' most. To focus on the meaningful words, I filtered out stopwords with NLTK and manually added a few more after looking through the raw word list. If you want to check my customized stopwords list, please click here.
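The filtering described in this note could look roughly like the sketch below. To keep it runnable without any downloads, it uses a tiny stand-in stopword set instead of NLTK's English list. The `EXTRA_STOPWORDS` entries are hypothetical and are not the actual customized list.

```python
from collections import Counter

# Stand-in stopword list so the sketch runs without downloads. With NLTK
# installed and its "stopwords" corpus downloaded, this could instead be
# set(nltk.corpus.stopwords.words("english")).
BASE_STOPWORDS = {"the", "a", "and", "of", "to", "he", "she", "was", "it", "in"}

# Hypothetical manual additions, standing in for the customized list.
EXTRA_STOPWORDS = {"said", "would"}


def top_words(tokens, n=1000, stopwords=frozenset()):
    """Return the n most frequent tokens that are not stopwords."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Made-up tokens for illustration only.
tokens = "the painter said he would paint and the painter smiled".split()
top = top_words(tokens, n=2, stopwords=BASE_STOPWORDS | EXTRA_STOPWORDS)
```

Passing the stopwords in as a parameter makes it easy to compare the treemap with and without the manual additions.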

Words that only appeared once in all 25 publications

How many of these words have you actually never seen before 🤭?

Note: the last 1,000 words in the data are shown here. Words mis-split during processing, misspellings or non-English words in the original texts, and manual data processing may all limit the accuracy of this text analysis.
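As a rough sketch of how the rarest words can be pulled out of a frequency table: the function names and the sample sentence below are assumptions for illustration, not the project's actual code.

```python
from collections import Counter


def words_appearing_once(counts):
    """Return, in alphabetical order, the words counted exactly once."""
    return sorted(word for word, c in counts.items() if c == 1)


def least_common(counts, n=1000):
    """Return the last n entries of the frequency ranking."""
    if n <= 0:
        return []
    return counts.most_common()[-n:]


# Made-up sample for illustration only.
counts = Counter("she wept and she sobbed and she murmured".split())
once = words_appearing_once(counts)
rare = least_common(counts, n=2)
```

Any token that the splitting step mangles will almost certainly be unique, so a list like `once` is where tokenization errors and misspellings tend to surface.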

What were these novel characters doing?

Let's look at it by...gender in general?!

👆 Each bar represents one word. Explore the words that were hidden from the labels 🧐.

Given the large amount of data, I only looked at the words that come after "he" and "she" and that are counted more than 17 times. I picked 17 because it is the rounded mean of the males' average count (19) and the females' average count (14): (19 + 14) / 2 = 16.5.
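Here is a minimal sketch of this step, under two assumptions of mine: "comes after" means the word immediately following the pronoun, and the threshold applies to a word's combined count after "he" and "she". The function names and the sample text are hypothetical.

```python
import math
import re
from collections import Counter


def words_after_pronouns(text):
    """Count the word that immediately follows each "he" and each "she"."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    after = {"he": Counter(), "she": Counter()}
    # Note: this simple pairing ignores sentence boundaries.
    for current, following in zip(tokens, tokens[1:]):
        if current in after:
            after[current][following] += 1
    return after


def frequent_words(after, min_total):
    """Keep words whose combined count after "he"/"she" exceeds min_total."""
    totals = after["he"] + after["she"]
    return {word for word, c in totals.items() if c > min_total}


# Threshold from the text: mean of 19 and 14 is 16.5, rounded up to 17.
# Python's built-in round() rounds halves to the even number (16 here),
# so the half is rounded up explicitly.
MALE_AVG, FEMALE_AVG = 19, 14
threshold = math.floor((MALE_AVG + FEMALE_AVG) / 2 + 0.5)

# Made-up sample; a low cutoff is used because the sample is tiny.
sample = "She wept. He said nothing. She wept again and he said so. She smiled."
after = words_after_pronouns(sample)
kept = frequent_words(after, min_total=1)
```

On the full corpus the call would be `frequent_words(after, min_total=threshold)`.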

For example, let's randomly look at a verb in the chart: "murmured".

When you click the word "murmured", the number "62.03" appears after it. In this case, it means that 62.03% of the times "murmured" is counted, it is applied to female characters, while males take up only 37.97% of the total counts.

Another fun fact: at the top of the chart, the word "wept" takes up 100% of the total counts, which means that every time "wept" is counted across all the publications, it follows a female pronoun. Does it imply that male characters never weep in Maugham's works?! Well...probably🤠. Let's take a closer look at the top 30 verbs that come after "he" and "she" separately, as shown below.

Note: for this part of the analysis, instead of removing all the stopwords, I manually removed the words that are not verbs, for better data accuracy. Total counts vary from word to word. The chart is sorted by the percentage of each action performed by female characters.
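The percentages and the sort order described above can be sketched as follows. The counts are hypothetical: I chose them only so that "murmured" reproduces the 62.03% shown in the chart and "wept" reaches 100%. They are not the real totals.

```python
def female_share(she_count, he_count):
    """Percentage of a word's pronoun-following uses that come after "she"."""
    total = she_count + he_count
    if total == 0:
        raise ValueError("word never follows 'he' or 'she'")
    return round(100 * she_count / total, 2)


# Hypothetical (she_count, he_count) pairs, NOT the real data.
pronoun_counts = {
    "murmured": (49, 30),
    "wept": (20, 0),
    "painted": (3, 21),
}

shares = {word: female_share(s, h) for word, (s, h) in pronoun_counts.items()}

# Sort from the most "female" verb to the most "male" one, like the chart.
ranked = sorted(shares, key=shares.get, reverse=True)
```

Because each share is a ratio, a word with only a handful of occurrences can reach 100% as easily as a common one, which is why the total counts are worth checking alongside the percentages.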

Top 30: Gender Verbs Dynamics

Deep mining by gender🔧


Words such as "wept", "sobbed" and "cried" show a relatively high percentage among females' actions. When you go through all the words listed on the left, you will notice that most of them describe how a person emotionally expresses her feelings, rather than an action performed to interact with other people. By contrast, the verbs applied to males are more diverse. They go beyond descriptions of verbal behavior and include words such as "painted", "proposed" and "wandered".

However, any text should be assessed within its historical and cultural setting, and it is unfair to discuss the appropriateness of the phrasing without understanding each protagonist's personality and social status in the different stories. The limitations of the era might also have affected the author's handling of characters of different genders. Therefore, an in-depth text analysis of Maugham's works should be applied separately to every single publication to ensure rigor.