In search of the author
By Dr Laila Abdel Aal Alghalban
Professor of linguistics
Faculty of Arts
Kafrelsheikh University
Artificial intelligence (AI) technology bombards us with mind-boggling stories around the clock. Although I am not a technology nerd, I am always gripped by AI news. Indeed, AI has changed a great deal in our lives, including the way we represent, discover, learn, process, use, analyse and even invest in our language. Within the fast-evolving field of digital humanities, the special interest AI specialists take in the study of language is fuelled by the fact that language and cognition are key assets of human intelligence. It is no wonder, then, that AI, which simulates human intelligence, places emphasis on language and on teaching the machine to develop human-level natural language capabilities. For example, text generators are "trained" on a dataset of millions of web pages and "learn" to generate new texts that abide by the instructions or parameters of form and content fed to them by program developers. Furthermore, algorithms (mathematical procedures carried out by computers) are increasingly used everywhere: in sending an email, where natural language processing systems help us check grammar and spelling and filter spam; in Google search, where AI algorithms scan the web and craft personalized ads for products and services we might favor; in voice assistants and streaming services, which use our past history to suggest what we might want to read, watch and buy; and in machine translation, text-to-voice converters, machine-generated journalism and deep writing, to cite only a few. All are AI-powered. This week I came across a story that, at least to me, was incredible.
Wasn't Shakespeare Shakespeare?
The story is about literary historians' conclusion that Christopher Marlowe co-authored Shakespeare's Henry VI, a conclusion based on computer-assisted parsing and textual analysis. That happened in 2016. But for me, and possibly for many readers and Shakespeare lovers worldwide, the news is shocking. It casts doubt on the marvelous works penned by the most celebrated bard and the biggest literary icon in human history. I kept tracing the issue and found that doubts over whether Shakespeare really wrote his poetry and drama have circulated in academic circles and among literary historians since 1856. Before that, his authorship had never been questioned. Shakespeare authorship studies have singled out many candidates as the real authors of William Shakespeare's plays; the most popular among his contemporaries are Francis Bacon, the Elizabethan diplomat Sir Henry Neville, Edward de Vere, the 17th Earl of Oxford, and Christopher Marlowe. Some have gone further, proposing that Shakespeare was a fictional character, a pseudonym under which many talented and adventurous people wrote. Others wonder: could he be an alien?!
Fakespeare theory
Conspiracy theory advocates, including actors, critics and literary historians, point to the themes, wisdom, allure and descriptive detail of the plays and claim that "the plays attributed to Shakespeare could only have been written by someone deeply familiar with court life, Elizabethan politics, Italy and France." Shakespeare's family background, namely that he was raised by his father, a butcher, in faraway Stratford-upon-Avon, has ignited doubts over his authorship and served as the main ground on which many doubters have built their arguments: "it was impossible for Shakespeare to have written all the plays because of his family background, and lack of opportunities," Richard Malim, general secretary of The De Vere Society, told BBC News Online. Lately, computational linguists and neuroscientists have joined the heated controversy, shifting the focus from literature to algorithms, big data, statistics, natural language processing and machine learning. Welcome to the age of digital humanities! As a linguist, I find that the story has filled me with more enthusiasm and passion for AI, though I still have some legitimate fears that the data, instructions and parameters of form and content fed to algorithms might be biased towards or against Shakespeare or anyone else on the suspect list. At the end of the day, AI is a tool; it can be misused.
Author identification
In the past, it was common to attribute works to authors on the basis of handwriting. With the advent of the internet, the need for authorship identification has become more pressing than ever. At the outset, some procedural preparations must be made, including data preparation, baseline performance observation, error analysis and data optimization. The analysis is then run to predict the author.
One kind of computer-assisted textual analysis is called stylometric analysis. It is a form of statistical stylistic analysis that uses linguistic features to identify authors. The idea is that writers, over their writing careers, develop unconscious habits of using language: vocabulary, sentence structures, themes, punctuation, organization, etc. In other words, they develop idiosyncratic features and styles. Here are some of the features employed in stylometry to compare works and identify authors:
One is the n-gram. N-grams are sequences of n consecutive words (or characters) in a corpus or collection of texts. Try the Google Ngram Viewer: search for the frequency of a word, a phrase, a collocate or a sequence of words over the last two centuries, and it will wow you with graphs illustrating the popularity and trajectory of the target word or structure. Character n-grams can help identify topics. Function words such as 'the' and 'and' are key markers of style. Every writer has their own way of using parts of speech (POS) and of combining words; sequences of two and three words are called bigrams and trigrams, respectively.
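The n-gram and function-word features described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in any Shakespeare study: the helper names (tokenize, word_ngrams, function_word_profile), the simple regular-expression tokenizer and the very short function-word list are all hypothetical choices made for the example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Assumption: a deliberately tiny, illustrative list of function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that")

def function_word_profile(tokens):
    """Relative frequency of each listed function word among all tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}

sample = "To be, or not to be, that is the question."
tokens = tokenize(sample)
bigrams = Counter(word_ngrams(tokens, 2))   # counts of two-word sequences
trigrams = Counter(word_ngrams(tokens, 3))  # counts of three-word sequences
profile = function_word_profile(tokens)

print(bigrams.most_common(3))
print(profile)
```

Comparing such bigram counts and function-word profiles across texts of known and disputed authorship is the basic idea; real analyses work on whole plays and use far larger feature sets.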
Another tool is phraseology analysis, which investigates lexical diversity in the data; lexical diversity "refers to the ratio of total number of words to the number of different unique word stems." Figurative language density is another parameter taught to algorithms; it is measured as the ratio of figurative expressions, whose meaning departs from the dictionary meaning, to the overall number of words in the data. Punctuation features also count in authorship investigation. Lexical usage analysis is a further principle taught to the machine, with special emphasis on the usage of certain words. The number of linguistic and stylistic variables that could be analysed is 'infinite'. The computed variables can cover a wide range of phonological, syntactic, semantic and lexical aspects of the data under investigation. It is an exciting yet complicated process, and it would be impossible to cover all of these variables here.
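Two of the measures above, lexical diversity and punctuation features, are simple enough to sketch in Python. The sketch makes some loud assumptions: it treats lowercased word forms as a stand-in for word stems (no stemmer is applied), it reports punctuation per 100 characters as one arbitrary normalization, and the function names are hypothetical.

```python
from collections import Counter
import re
import string

def words_per_unique_word(text):
    """Total number of words divided by the number of distinct word forms.

    Assumption: lowercased word forms approximate word stems here.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(tokens) / len(set(tokens))

def punctuation_rates(text):
    """Occurrences of each punctuation mark per 100 characters of text."""
    marks = Counter(ch for ch in text if ch in string.punctuation)
    length = len(text) or 1  # avoid dividing by zero on empty input
    return {mark: 100 * count / length for mark, count in marks.items()}

sample = "To be, or not to be, that is the question."
diversity = words_per_unique_word(sample)
rates = punctuation_rates(sample)

print(diversity)
print(rates)
```

A value close to 1 means the writer rarely repeats a word; higher values mean more repetition. Many studies report the inverse ratio (unique words divided by total words) instead, so it matters to state which direction is being used when comparing figures.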
For me, even if it turns out that the magical Shakespearean legacy is not his, I will continue to celebrate 'its unknown authors', whoever they are. The aha moments I experience when reading Shakespeare will live on. Finally, Roland Barthes' masterpiece 'The Death of the Author' might offer some consolation to those who cannot cope with a different reality.