Nowadays, the term artificial intelligence (AI) is not new to anybody. "Intelligent" devices and algorithms are everywhere – as digital assistants or chatbots, for example. The notion that an algorithm can handle human language – a very complex construct – has always baffled me. After all, it is just code written by someone – and some very complicated maths. Can a machine truly "learn"? Can it "understand" language?
When I started my bachelor project as a design student, I knew nearly nothing about how machine learning algorithms work. I wanted to explore what "learning" means for a machine – and whether it in any way resembles what humans do. My dataset consists of so-called "word embeddings": each word is assigned a multidimensional vector. By comparing the vectors in multidimensional space, you also compare the relationships of those words to each other.
In my understanding, word embeddings show an AI's perspective on human language – its "vocabulary", if you will. By visualising a dataset trained by a machine learning algorithm, I got a better understanding of the possibilities and limitations of AI. And that it is only maths after all – but a powerful tool nonetheless.
Here are my five interactive data visualizations using a word embedding dataset trained on TED Talk transcripts. If you want to explore them right away, go ahead:
Otherwise, just keep scrolling! ;)
(1/8) A machine learning algorithm called Word2Vec was used for this project. It needs continuous text as training data – in this case, transcripts of over 4,000 TED Talks.
(2/8) Before the training process, each word in every sentence is assigned its context words – the words standing directly before or after it in the sentence. These pairs are later used as input for the machine learning algorithm.
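This pairing step can be sketched in a few lines of Python. The window size of two words on each side is an assumption for illustration; the window actually used in the project is not stated.

```python
# Extract (center word, context word) pairs from a tokenised sentence.
# `window` is the number of neighbours taken on each side (assumed: 2).
def context_pairs(sentence, window=2):
    pairs = []
    for i, center in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs
```

For the sentence `["many", "people", "watch", "ted", "talks"]`, the word "people" is paired with "many", "watch" and "ted".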
(3/8) A so-called weight matrix, initially filled with random numbers, contains one column for each intended dimension of the word embedding – in my case, 100 dimensions. Each row represents an individual word from the training text (the TED Talks).
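Setting such a matrix up is straightforward. The sizes below come from the numbers in this post (about 7,500 words, 100 dimensions); the initialisation range is an assumption.

```python
import random

vocab_size, dims = 7500, 100  # one row per word, one column per dimension

# Initially just random numbers; training will gradually give them meaning.
weights = [[random.uniform(-0.5, 0.5) for _ in range(dims)]
           for _ in range(vocab_size)]
```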
(4/8) The machine training starts: the algorithm gets some context words as input (for simplification, only one word is represented in the sketch) and selects the equivalent rows in the matrix. That's actually already the 100-dimensional vector of the word. Right now, it has no meaning because the numbers were randomly initialised. This will change.
(5/8) A second weight matrix is introduced. Its rows also represent the individual words of the TED Talks. Multiplying the selected row of the first matrix with a row of the second matrix yields a score for how likely those two words are to appear next to each other in a sentence. This is done for each row in the second matrix.
(6/8) The result is a probability for every row (every word), and the probabilities sum up to 1, or 100%. In our example, the algorithm is supposed to predict the word "people" but calculated a probability of only 30%. This value should be increased.
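The conversion from raw row products to probabilities that sum to 1 is typically done with a softmax function. A toy sketch with three 3-dimensional rows standing in for thousands of 100-dimensional ones:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Subtract the maximum for numerical stability, then normalise.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy values: the selected input row and three rows of the second matrix.
hidden = [0.2, -0.1, 0.4]
output_rows = [[0.5, 0.1, 0.0], [-0.3, 0.2, 0.6], [0.1, 0.1, 0.1]]
probs = softmax([dot(hidden, row) for row in output_rows])
```

`probs` now holds one probability per word, and they sum to exactly 1.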
(7/8) The last step is called back-propagation and is responsible for the "learning" process. It updates the numbers in the weight matrices to better match the intended outcome. This causes the 100-dimensional vectors of the words to change.
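A heavily simplified sketch of such an update, assuming cross-entropy loss over the softmax output (only the second matrix is updated here; real Word2Vec also updates the first):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def update_output_rows(output_rows, hidden, probs, target, lr=0.5):
    # Error per row: predicted probability minus 1 for the true context word.
    # Moving each row against this error nudges the prediction towards 1.
    for i, row in enumerate(output_rows):
        error = probs[i] - (1.0 if i == target else 0.0)
        for d in range(len(row)):
            row[d] -= lr * error * hidden[d]

hidden = [0.2, -0.1, 0.4]
output_rows = [[0.5, 0.1, 0.0], [-0.3, 0.2, 0.6], [0.1, 0.1, 0.1]]
before = softmax([dot(hidden, r) for r in output_rows])
update_output_rows(output_rows, hidden, before, target=1)
after = softmax([dot(hidden, r) for r in output_rows])
```

After the update, the probability of the intended word (`after[1]`) is higher than it was before.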
(8/8) These steps are repeated for every extracted word combination in the training data. The whole process also runs for several iterations – 20 in my case. With each iteration, the predictions get better and the word vectors gain meaning when compared to each other.
The transcripts contained over 90,000 individual words. For the following tools, I only considered words that were used more than 50 times.
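The frequency filter itself boils down to a word count – a minimal sketch, with the project's threshold of 50 as the default:

```python
from collections import Counter

def frequent_words(tokens, min_count=50):
    # Keep only words that occur more than `min_count` times.
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count > min_count}
```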
I now have a dataset consisting of about 7,500 words – each represented by a 100-dimensional vector. I can't visualize multidimensional space. So why not show every axis separately?
Since my word embeddings were trained on TED Talks, I also have information about the talks. Each of them has tags referring to its theme and content, e.g. "education" or "technology". I assumed that many of those terms could also be found in my dataset of words. If so, I could create a subset of only TED tags and compare their vectors to each other.
After doing so, I experimented with finding the most similar TED tag for each word of my whole dataset using cosine similarity. Maybe that’s a way of automatically sorting words by theme?
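A sketch of that lookup, using the tag names from this post but made-up 2-D vectors in place of the real 100-dimensional embeddings:

```python
import math

def cosine(u, v):
    # Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-D stand-ins for the real 100-dimensional tag vectors.
tags = {"education": [1.0, 0.1], "technology": [0.0, 1.0]}

def nearest_tag(word_vector, tags):
    # The tag whose vector has the highest cosine similarity to the word.
    return max(tags, key=lambda t: cosine(word_vector, tags[t]))
```

A word vector pointing roughly the same way as "education", e.g. `[0.9, 0.2]`, gets sorted under that tag.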
Until now, I have only looked at the closeness of data points in multidimensional space. "Similarity" can mean a lot of things in this context. I would love to look at specific word relationships.
"good / bad“, "happy / unhappy“ and "safe / dangerous“ are word pairs with a similar relationship between two words. Same goes for "say / said“ and "go / went“. By finding examples like that, I could calculate how to get from one word to its significant other regarding a specific word relationship which could be of contextual or grammatical nature.