AugLy: Facebook’s New Augmentation Library

Trying out Facebook’s AugLy for text augmentation

Shivam Sharma
5 min read · Jul 12, 2021

In June 2021, Facebook open-sourced a new Python library for data augmentation.

AugLy is a novel open source data augmentation library that combines multiple modalities: audio, image, video, and text, which is increasingly important in many AI research fields

With over 100 different augmentation methods focused on things people do on the internet, such as combining text and image or image and audio on platforms like Facebook and Instagram, AugLy aims to give AI researchers and practitioners a tool for improving the robustness of their models. For example, the sentence “Look how many people love you” changes meaning when added to a photo of an empty, barren land.

An example of a hateful meme from Facebook’s Hateful Memes Challenge

AugLy offers various interesting augmentation methods for image, video and audio datasets, but in this article we are going to explore the text augmentation techniques provided in this library.

Installation

AugLy requires python-magic as one of its dependencies, so I recommend using a macOS or Linux based system. I am using Google Colab to install AugLy and run all the experiments.

pip install -U augly
sudo apt-get install python3-magic

The two commands above install AugLy on your system. If you are using Google Colab like me, remember to restart the runtime before proceeding.

Text Augmentation

Examples of a few augmentations provided by AugLy

AugLy provides 11 different augmentation methods for textual data.

  1. Apply Lambda : Apply a user-defined lambda function on a list of text documents.
  2. Get Baseline : Generates a baseline by tokenizing and detokenizing the text.
  3. Insert Punctuation : Inserts punctuation characters in each input text.
  4. Insert Zero Width Character : Inserts zero-width characters in each input text.
  5. Replace Bi-Directional : Reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps.
  6. Replace Fun Fonts : Replaces words or characters depending on the granularity with fun fonts applied.
  7. Replace Similar Characters : Replaces letters in each text with similar characters.
  8. Replace Similar Unicode Characters : Replaces letters in each text with similar-looking Unicode characters.
  9. Flip Upside Down : Flips words in the text upside down depending on the granularity.
  10. Simulate Typos : Simulates typos in each text using misspellings, keyboard distance, and swapping.
  11. Split Words : Splits words in the text into sub-words.
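
To make item 4 of the list above a little more concrete, here is a minimal pure-Python sketch of what inserting zero-width characters does to a string. This is my own illustration and not AugLy's implementation; the helper name insert_zero_width and the choice of U+200B (zero-width space) are assumptions made for the example.

```python
ZERO_WIDTH_SPACE = "\u200b"  # invisible when rendered, but still a real code point


def insert_zero_width(text: str, char: str = ZERO_WIDTH_SPACE) -> str:
    """Place an invisible character between every pair of adjacent characters."""
    return char.join(text)


original = "Hello World!"
augmented = insert_zero_width(original)

print(augmented == original)                                # False: the strings differ
print(len(original), len(augmented))                        # 12 23
print(augmented.replace(ZERO_WIDTH_SPACE, "") == original)  # True
```

The two strings look the same on screen, yet a model that compares or tokenizes raw characters sees different inputs, which is why this kind of augmentation is useful for robustness testing.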

We will be discussing a few interesting functions from the above list.

1. Inserting Punctuation

Let us assume that the sentence we want to augment is “Hello World! How are we doing today?”

import augly.text as txtaugs
input_text = "Hello World! How are we doing today?"
aug_text = txtaugs.insert_punctuation_chars(input_text, granularity='word')

The above transformation changes the input_text to the following:

"H.e.l.l.o W,o,r,l,d,! H,o,w a-r-e w'e d'o'i'n'g t...o...d...a...y...?"

Because we set the granularity to “word”, each word gets its own type of punctuation. If I change the granularity to “all”, we get the following result:

'H!e!l!l!o! !W!o!r!l!d!!! !H!o!w! !a!r!e! !w!e! !d!o!i!n!g! !t!o!d!a!y!?'
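
The difference between the two granularities can be sketched in a few lines of plain Python. This is a simplified illustration under my own assumptions, not AugLy's code: the function name insert_punctuation, the small PUNCTUATION list and the seed parameter are all hypothetical.

```python
import random

# Small illustrative set of characters; not the list AugLy uses.
PUNCTUATION = [".", ",", "-", "'", "!"]


def insert_punctuation(text: str, granularity: str = "all", seed: int = 0) -> str:
    rng = random.Random(seed)  # seeded so the example is repeatable
    if granularity == "all":
        # One character for the whole text, inserted between every pair of
        # characters, spaces included.
        return rng.choice(PUNCTUATION).join(text)
    if granularity == "word":
        # Each space-separated word is handled on its own, so the spaces
        # survive and every word can draw a different character.
        return " ".join(
            rng.choice(PUNCTUATION).join(word) for word in text.split(" ")
        )
    raise ValueError(f"unknown granularity: {granularity!r}")


text = "Hello World! How are we doing today?"
print(insert_punctuation(text, granularity="word"))
print(insert_punctuation(text, granularity="all"))
```

With “word”, the punctuation is chosen once per word and the spaces between words are left alone; with “all”, the whole sentence is treated as a single sequence of characters.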

2. Replace Fun Fonts

This function replaces words or characters in the text with the same text rendered in a different “fun” font.

import augly.text as txtaugs
input_text = "Hello World! How are we doing today?"
aug_text = txtaugs.replace_fun_fonts(input_text, granularity='word')

The above code gives us the following results

Hello World! How are we 𝔡𝔬𝔦𝔫𝔤 𝔱𝔬𝔡𝔞𝔶?

As we can see, the part “doing today?” changed its font.

As before, we can set the granularity to “word”, “all” or “char”; with “char”, individual characters are replaced instead of entire words.

Beyond granularity, we can also control the minimum and maximum number of words to be augmented with the “aug_min” and “aug_max” parameters. For example, changing “aug_min” to 2 gives the following result.

Hello World! 𝘏𝘰𝘸 are we doing 𝘵𝘰𝘥𝘢𝘺?
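
Here is a rough sketch of how word-level font replacement with “aug_min” and “aug_max” bounds could work. It is not AugLy's implementation. I am assuming that the number of augmented words is a fraction of the sentence clamped between the two bounds, and I use Unicode Mathematical Bold letters as the only “font”. The function name fun_font_words and the aug_p and seed parameters are hypothetical.

```python
import random
import string

# Mathematical Bold A-Z and a-z occupy 52 consecutive code points starting
# at U+1D400, which makes them easy to map from ASCII letters.
BOLD = str.maketrans(
    string.ascii_uppercase + string.ascii_lowercase,
    "".join(chr(0x1D400 + i) for i in range(52)),
)


def fun_font_words(text, aug_p=0.3, aug_min=1, aug_max=1000, seed=0):
    """Restyle a random subset of words, bounded by aug_min and aug_max."""
    rng = random.Random(seed)
    words = text.split(" ")
    # Assumed rule: take a fraction aug_p of the words, clamp it to
    # [aug_min, aug_max], and never ask for more words than exist.
    target = max(aug_min, min(aug_max, round(aug_p * len(words))))
    count = min(len(words), target)
    chosen = set(rng.sample(range(len(words)), count))
    return " ".join(
        word.translate(BOLD) if i in chosen else word
        for i, word in enumerate(words)
    )


text = "Hello World! How are we doing today?"
print(fun_font_words(text, aug_min=2))
```

Punctuation is left untouched because it has no entry in the translation table, so only the letters of the chosen words change.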

3. Replace Similar Characters

This function replaces characters in the given input with similar-looking characters; for example, “a” is replaced with “@”.

import augly.text as txtaugs
input_text = "Hello World! How are we doing today?"
aug_text = txtaugs.replace_similar_chars(input_text)

The above code replaces random characters and gives the following output

Hello World! How @re we doin9 today?

As we can see, “a” is replaced with “@” and “g” is replaced with “9”.

Similar to the replace fun fonts function, we can define the minimum and maximum number of characters to be replaced in each word with the “aug_char_min” and “aug_char_max” parameters. We can also change the probability that a word is augmented with the “aug_word_p” parameter.

import augly.text as txtaugs
input_text = "Hello World! How are we doing today?"
aug_text = txtaugs.replace_similar_chars(input_text, aug_char_min=2, aug_word_p=0.6)

For example, the above code would give the following results.

H3l7o uuor7d! |-|ovv are vv3 doing 7od4y?

As we can see, a lot more words are augmented compared to the previous example, and every augmented word has at least 2 replaced characters.
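
To show how “aug_word_p” and “aug_char_min” interact, here is a small pure-Python sketch. It is a simplification and not AugLy's implementation: the LOOKALIKES table is a tiny hand-written mapping, the function name replace_lookalikes and the seed parameter are hypothetical, and the sketch replaces exactly “aug_char_min” characters per selected word (or fewer if the word does not have enough replaceable characters).

```python
import random

# Tiny hand-written look-alike table for illustration only.
LOOKALIKES = {
    "a": ["@", "4"],
    "e": ["3"],
    "g": ["9"],
    "l": ["1", "7"],
    "o": ["0"],
    "t": ["7"],
}


def replace_lookalikes(text, aug_word_p=0.3, aug_char_min=1, seed=0):
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        # Positions in this word that have a look-alike available.
        candidates = [i for i, ch in enumerate(word) if ch.lower() in LOOKALIKES]
        # aug_word_p decides whether this word is touched at all.
        if candidates and rng.random() < aug_word_p:
            k = min(len(candidates), aug_char_min)
            chars = list(word)
            for i in rng.sample(candidates, k):
                chars[i] = rng.choice(LOOKALIKES[chars[i].lower()])
            word = "".join(chars)
        out.append(word)
    return " ".join(out)


text = "Hello World! How are we doing today?"
print(replace_lookalikes(text, aug_char_min=2, aug_word_p=0.6))
```

Raising “aug_word_p” makes more words eligible, while raising “aug_char_min” makes each selected word harder to read.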

4. Flip Upside Down

This function flips the text upside down and also reverses it, so that it reads correctly when rotated by 180 degrees.

import augly.text as txtaugs
input_text = "Hello World! How are we doing today?"
aug_text = txtaugs.replace_upside_down(input_text)

The above code returns the following output

¿ʎɐpoʇ ɓuᴉop ǝʍ ǝɹɐ ʍoH ¡plɹoM ollǝH
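
The idea behind this transformation is simple enough to sketch by hand: map each character to a rotated look-alike and then reverse the string. The snippet below is my own minimal illustration, not AugLy's implementation. The FLIPPED table is hand-written, covers only lowercase letters plus “?” and “!”, and the function name flip_upside_down is hypothetical.

```python
# Hand-written table of rotated look-alikes; characters that are missing
# here (uppercase letters, digits, spaces) are left unchanged.
FLIPPED = {
    "a": "ɐ", "b": "q", "c": "ɔ", "d": "p", "e": "ǝ", "f": "ɟ", "g": "ɓ",
    "h": "ɥ", "i": "ᴉ", "j": "ɾ", "k": "ʞ", "l": "l", "m": "ɯ", "n": "u",
    "o": "o", "p": "d", "q": "b", "r": "ɹ", "s": "s", "t": "ʇ", "u": "n",
    "v": "ʌ", "w": "ʍ", "x": "x", "y": "ʎ", "z": "z", "?": "¿", "!": "¡",
}
TABLE = str.maketrans(FLIPPED)


def flip_upside_down(text: str) -> str:
    # Swap each character for its rotated look-alike, then reverse the
    # string so it reads left to right once the page is turned around.
    return text.translate(TABLE)[::-1]


print(flip_upside_down("Hello World! How are we doing today?"))
```

Because the mapping is one character to one character, the flipped text always has the same length as the input.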

5. Simulate Typos

This function changes the text to look as though there are typos in the text. This is done using misspellings, keyboard distance, and swapping.

import augly.text as txtaugs
input_text = "Hello World! How are we doing today?"
aug_text = txtaugs.simulate_typos(input_text)

The above code gives the following output.

Hello World! How rae we donig today?

As we can see, the words “are” and “doing” are replaced by “rae” and “donig”, respectively.

As with the functions above, we can tune several probabilities to get the output we expect. For example, the “aug_char_p” parameter changes the probability of replacing a letter in a word. The “aug_char_min” and “aug_char_max” parameters set the minimum and maximum number of letters to be replaced in each word. We can also change the probability of a word being augmented with the “aug_word_p” parameter.

import augly.text as txtaugs
input_text = "Hello World! How are we doing today?"
aug_text = txtaugs.simulate_typos(input_text, aug_char_p=0.8, aug_word_p=0.6)

For example, the above code gives the following output.

Heklo World! Hwo arw we donig todya?

As we can see, there are a lot more misspellings and swaps compared to the last example.
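
Of the three typo sources mentioned above, swapping is the easiest to reproduce by hand. The sketch below covers only that part (misspellings and keyboard distance are not modelled) and is not AugLy's implementation. The function name swap_typos and the seed parameter are hypothetical; “aug_word_p” plays the same role as in the examples above.

```python
import random


def swap_typos(text, aug_word_p=0.3, seed=0):
    """Swap one pair of adjacent letters in a random subset of words."""
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        # Positions where two different letters sit next to each other;
        # punctuation is never moved.
        positions = [
            i
            for i in range(len(word) - 1)
            if word[i].isalpha() and word[i + 1].isalpha() and word[i] != word[i + 1]
        ]
        if positions and rng.random() < aug_word_p:
            i = rng.choice(positions)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


text = "Hello World! How are we doing today?"
print(swap_typos(text, aug_word_p=0.6))
```

A swap keeps the letters of a word intact and only changes their order, which is the pattern behind “rae” and “donig” in the outputs above.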

These are a few of the augmentation methods provided by Facebook’s latest augmentation library, AugLy. Some of these techniques can change the text significantly and make the augmented text lose its original meaning. Even so, training a system on such augmentations should make it more robust, and it might also perform better on unstructured, raw real-world data.

I aim to follow up on this by testing this library on various standardized text-based competitions and datasets, but please feel free to leave your thoughts or findings in the comments below.

Here is a link to the Google Colab notebook I used to test out this library.


