Language, culture, and data science

Posts in NLP

Emoji are great and/or they will destroy the world

Outside of emoji researchers, lots of people still forecast disaster or dream of universal communication even if most of us are confident that neither is nigh. Despite our protests, emoji inspire visions of apocalypse and utopia.

As with many linguistic resources (sounds, words, syntax), people use emoji to grind all sorts of axes. For example, people who say that women use more emoji than men are usually making some point that the data don't support. The first step in such an analysis is to ignore or discount the fact that, say, Snoop Dogg and Kyle MacLachlan are among the biggest emoji users in the world.

In this talk, I'll demonstrate how ideologies of emoji work themselves out across 870 journalists whom political scientists have separately scored as liberal, conservative, or centrist. This lets us compare objective and subjective stances, and it inverts the idea that gender explains emoji: instead, emoji are one way that people "do" gender differently depending on their political commitments.

Emoji, Gender, Language, Multimedia, NLP | Tyler Schnoebelen | June 25, 2018

Who do you trust most: a robot, an alarm clock or your partner?

I’ll kick off this post with a definition of trust, but focus on an analysis of trust in everyday and not-so-everyday situations: about 12,000 conversations among friends, family members and strangers. About 10% of all these conversations make some mention of trust. Then I’ll turn to more extreme situations exemplified by characters in 135 different TV shows. Episodes are longer than conversations, but on average 53% of TV episodes make at least one mention of trust.

ArtificialIntelligence, CorpusLinguistics, Language, NLP, Ethics, Industry | Tyler Schnoebelen | May 9, 2018

Conversational AI and emoji

A podcast you can listen to (or read the transcript of) about conversational AIs (chatbots!) and what we learn from thinking about emoji.

Emoji, ArtificialIntelligence, NLP | Tyler Schnoebelen | August 18, 2017

The carrots and sticks of ethical NLP

Professions run into ethical problems all the time. Consider engineering: the US sold $9.9b worth of arms in 2016 ($3.9b in missiles). The most optimistic reading is that instruments of death prevent death. Consider medicine: Medical research is dominated by concerns of market size and patentability, leaving basic questions like “is this fever from bacteria or virus” unanswered for people treating illnesses in low-income countries. Consider law: Lawyers upholding the law can break any normal definition of justice. Even in philosophy, ethicists are not known to be more moral than anyone else.

ArtificialIntelligence, DataScience, Ethics, MachineLearning, NLP | Tyler Schnoebelen | April 12, 2017

Ethics in machine-learning, natural language processing, and AI

This is the visual version of my 5-pg paper, “Goal-oriented design for ethical machine learning and NLP”, which you can find alongside a bunch of others by going to http://ethicsinnlp.com/program.

ArtificialIntelligence, DataScience, MachineLearning, NLP, Ethics | Tyler Schnoebelen | March 27, 2017

Budgeting for Training Data

Organizations build machine learning systems so that they can predict and categorize data. But to get a system to do anything, you have to train it. This post is meant to help you figure out a budget for training data based on best practices.

ArtificialIntelligence, DataScience, MachineLearning, NLP | Tyler Schnoebelen | November 17, 2016

Trump does NOT talk like a woman (BREAKING NEWS: gender continues to be complicated and confusing)

TL;DR: gender doesn’t make for good soundbites if you’re doing it right.

Here’s a headline from Politico that is counter-intuitive, aggravating, and compelling: Donald Trump talks like a woman. I’d like to speak out on behalf of a bunch of linguists who say KNOCK IT OFF.

CorpusLinguistics, DataScience, Gender, NLP, Politics | Tyler Schnoebelen | November 5, 2016

Extreme language in presidential debates: Reagan, Trump and everyone in between

If you follow politics in America even a little bit, you know that Republicans talk a lot about taxes and that Donald Trump loves the word tremendous. But how do these rank relative to each other and to what Democrats (and Hillary Clinton, in particular) tend to talk about? Well, one finding is that over the years, Republican candidates have been even more preoccupied with Hillary Clinton than they have been with Ronald Reagan. Another finding is that the debates for the current election have been ~157% more negative than all previous debates.

CorpusLinguistics, Politics, NLP, Emotion | Tyler Schnoebelen | October 19, 2016

U.S. presidential debates through the eyes of a computer

This post wraps up a series I’ve been doing on using machine learning models to understand recent American political debates (here and here). By taking all the transcripts of the debates since last year, I show which words and phrases most distinguish debaters’ styles and issues. Training a computer to identify speakers is usually thought of as a way of doing forensics or personalization. But here, I’m interested in something closer to summarization. If you can pick one section of talk for each candidate from the last debate, which moments are most consistent with everything they’ve said up to then?

CorpusLinguistics, DataScience, MachineLearning, NLP, Politics | Tyler Schnoebelen | October 13, 2016

The most Trumpian and Clintonesque moments in the debate (according to a computer)

Let’s teach a computer to guess who-said-what in the first US presidential debate between Hillary Clinton and Donald Trump. This is a way of finding out which moments the candidates were most like themselves — as well as when they were most like Bernie Sanders or Ted Cruz.

CorpusLinguistics, NLP, Politics, DataScience, MachineLearning | Tyler Schnoebelen | September 28, 2016

Why Technology Has Not Killed the Period. Period.

“Periods are not dead,” says computational linguist Tyler Schnoebelen, who turned to his own trove of 157,305 text messages to analyze how the final period—a period at the end of a thought or sentence—was being used and shared his initial results exclusively with TIME. “They’re actually doing interesting things.”

CorpusLinguistics, DataScience, NLP, Press | Tyler Schnoebelen | September 24, 2016

More data beats better algorithms

Most academic papers and blogs about machine learning focus on improvements to algorithms and features. At the same time, the widely acknowledged truth is that throwing more training data into the mix beats work on algorithms and features. This post will get down and dirty with algorithms and features vs. training data by looking at a 12-way classification problem: people accusing banks of unfair, deceptive, or abusive practices.

DataScience, MachineLearning, NLP | Tyler Schnoebelen | September 23, 2016

Nattering Nabobs of Negativity: Bigrams, “Nots,” and Text Classification

You can get pretty far in text classification just by treating documents as bags of words where word order doesn’t matter. So you’d treat “It’s not reliable and it’s not cheap” the same as “It’s cheap and it’s not not reliable”, even though the first is a strong indictment and the second is a qualified recommendation. Surely it’s dangerous to ignore the ways words come together to make meaning, right?
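To make the bag-of-words point concrete, here is a minimal sketch in Python. It is an illustration only, not the code from the post: the `tokenize`, `bag_of_words`, and `bigrams` helpers and the simple regex tokenizer are assumptions. It counts words in the two example sentences with and without regard to adjacent word pairs.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical, simplistic tokenizer)."""
    normalized = text.lower().replace("\u2019", "'")  # treat curly apostrophes like straight ones
    return re.findall(r"[a-z']+", normalized)


def bag_of_words(text):
    """Count each word, discarding word order entirely."""
    return Counter(tokenize(text))


def bigrams(text):
    """Count adjacent word pairs, which keeps a little word order."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


indictment = "It's not reliable and it's not cheap"
recommendation = "It's cheap and it's not not reliable"

# The two sentences contain exactly the same words the same number of times.
print(bag_of_words(indictment) == bag_of_words(recommendation))

# Their adjacent word pairs differ, e.g. ("not", "cheap") vs. ("not", "not").
print(bigrams(indictment) == bigrams(recommendation))
```

Under these assumptions the two unigram counts are identical while the bigram counts are not, which is the sense in which a pure bag-of-words model cannot tell the indictment from the recommendation.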

CorpusLinguistics, DataScience, Emotion, MachineLearning, NLP | Tyler Schnoebelen | September 8, 2016

Failed vs. fighting: the linguistic differences between speeches at the RNC and the DNC conventions

We know that Republicans and Democrats talk differently, but what’s the best way to describe these differences? Commentators note the relative darkness of the Republican National Convention and the focus on optimism and higher production quality for the Democratic National Convention. Looking at the words speakers use helps–but you can’t just use simple frequency (for details, check out the methodology section at the bottom).

CorpusLinguistics, DataScience, NLP, Politics | Tyler Schnoebelen | August 1, 2016

Searching For The Perfect Emoji For Any Occasion

Thanks in part to the massive popularity of emojis, several tech companies are exploring ways not only to make finding emojis easier, but to predict which ones you may want to use.

Emoji, DataScience, Press, NLP | Tyler Schnoebelen | July 11, 2016

The gender of artificial intelligence

There’s Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa, and Nuance’s Nina. Sure, Facebook has “M”, Google has “Google Now”, and Siri’s voice isn’t always that of a woman. But it does feel worth noting that (typically male-dominated) engineering groups routinely give women’s names to the things you issue commands to. Is artificial intelligence work about Adams making Eves?

ArtificialIntelligence, DataScience, Gender, MachineLearning, NLP | Tyler Schnoebelen | July 11, 2016

Getting a chatbot to understand 100,000 intentions

At their best, chatbots help you get things done. At their worst, they spew toxic nonsense. Whether we call them chatbots, intelligent agents, or virtual agents, the basic idea is that you shouldn’t need to bother with human interaction for things that computers can do quickly and efficiently: ask questions about a flight, manage your expenses, order a pizza, tell you the weather, and apply for a job. A lot of these are handy but may not feel quite like artificial intelligence–later in this post, we’ll tackle the relationship between detecting intentions, having conversations and building trust as the core pieces that make a chatbot feel more like artificial intelligence.

ArtificialIntelligence, DataScience, NLP, MachineLearning | Tyler Schnoebelen | June 29, 2016

Which new emoji will be the most popular?

June 21st is the release of Unicode 9, which will feature 72 new emoji–folks at Emojipedia have helpfully put them all together. The question in this blog post is: which ones will turn out to be the most popular? (Note that most people aren’t going to be able to use them immediately–you have to get an update of your phone/browser for them to show up and so will anyone you want to send them to.)

CorpusLinguistics, DataScience, Emotion, NLP, Emoji | Tyler Schnoebelen | June 20, 2016