Etymological Wordnet

Introduction

Some might be surprised to find out that the English word "muscular" and the German word for the animal "bat" share the same origins.

The Etymological Wordnet project provides information about how words in different languages are etymologically related.

The information is for the most part mined from Wiktionary. The semi-structured data is turned into a machine-readable etymological database that also incorporates some additional manually added etymological relationships.

Access Data

Browse Online
A very basic interface to (an older version of) the data is provided at Lexvo.org. A more advanced browsing interface will be available later.
Download
Download Etymological Wordnet 2013-02-08 version in Text (TSV) format
Java API
Programmatic access for JVM-based programming languages such as Java and Scala. Download the UWN library and the Etymological Wordnet plugin.
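
For readers who prefer to work with the TSV download directly, the following is a minimal Python sketch of how such a file might be loaded. The column layout is an assumption, not something stated on this page: each line is taken to hold three tab-separated fields (source term, relation, target term), with terms written as a language code, a colon, a space, and the word. The function names, the relation label, and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def parse_term(field):
    """Split an assumed 'lang: word' field into a (lang, word) pair.

    If the separator is missing, the whole field is returned as the
    language part and the word part is empty.
    """
    lang, _, word = field.partition(": ")
    return lang, word


def read_relations(handle):
    """Yield (source, relation, target) triples from a TSV file handle."""
    # QUOTE_NONE keeps quote characters inside words as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        source, relation, target = row
        yield parse_term(source), relation, parse_term(target)


def index_by_source(triples):
    """Group (relation, target) pairs under each source term."""
    index = defaultdict(list)
    for source, relation, target in triples:
        index[source].append((relation, target))
    return index


# Illustrative rows in the assumed layout (hypothetical relation label).
SAMPLE = (
    "eng: muscular\trel:etymology\tlat: musculus\n"
    "lat: musculus\trel:etymology\tlat: mus\n"
)

index = index_by_source(read_relations(io.StringIO(SAMPLE)))
for relation, (lang, word) in index[("eng", "muscular")]:
    print(relation, lang, word)
```

To read the actual download, open the file with encoding="utf-8" and newline="" and pass the handle to read_relations; if the real layout differs from the assumption above, only parse_term and the column check should need adjusting.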

Note: A new version is currently in the works and expected to be released in the summer of 2020. Please check back soon!

References


For academic use, please cite the following publications:

Gerard de Melo.
Etymological Wordnet: Tracing the History of Words
In: Proc. LREC 2014. ELRA, 2014, Paris, France.

Gerard de Melo and Gerhard Weikum.
Towards Universal Multilingual Knowledge Bases
In: Principles, Construction, and Applications of Multilingual Wordnets. Proceedings of the 5th Global Wordnet Conference (GWC 2010).
Narosa Publishing, 2010, New Delhi, India.

Contact

Please get in touch with Gerard de Melo if you would like to contribute to the Etymological Wordnet or if you have suggestions or research proposals. If you have a specific research project in mind, there are several ways in which the data could be improved.

Further Resources

Universal Wordnet (UWN)
One of the largest multilingual knowledge graphs, transforming the well-known WordNet database into a massively multilingual resource covering over 1 million words and several million named entities in a single semantically organized hierarchy. It is built using machine learning, together with the MENTA extension based on Wikipedia. Our derivative project OpenWordNet-PT (GitHub) is being used by Google Translate.
Lexvo.org
Contributes information about words and other language-related entities to the Linked Data Web and Semantic Web. The British Library, the Spanish National Library, and others have linked their data to Lexvo.org, and Lexvo.org in turn connects its own data to other valuable resources.
Sentiment/Emotion
Datasets and resources for sentiment analysis and fine-grained emotion analysis, in part available for multiple languages.
Other Resources
We provide a number of other linguistic and knowledge-related resources.

 



© 2022 Gerard de Melo