
Learning Visually-Grounded Representations Using Cross-Lingual Multimodal Pre-Training

View/Open
menekşe kuyu-yeni.pdf (5.193 MB)
Date
2020-09
Author
Kuyu, Menekşe
Abstract
In recent years, pre-training approaches have emerged in the field of NLP, driven by the growing amount of available data and advances in computational power. Although these approaches initially covered pre-training on only a single language, cross-lingual and multimodal approaches were later proposed that employ multiple languages and modalities. While cross-lingual pre-training focuses on representing multiple languages, multimodal pre-training bridges Natural Language Processing and Computer Vision, fusing visual and textual information and representing them in the same embedding space. In this work, we combine cross-lingual and multimodal pre-training to learn visually-grounded word embeddings. Our work builds on the cross-lingual pre-training model XLM, which has shown success on various downstream tasks such as machine translation and cross-lingual classification. In this thesis, we propose a new pre-training objective called Visual Translation Language Modeling (vTLM), which combines visual content and natural language to learn visually-grounded word embeddings. For this purpose, we extended the large-scale image captioning dataset Conceptual Captions to another language, German, using a state-of-the-art translation system, creating the cross-lingual multimodal dataset required for pre-training. We fine-tuned our pre-trained model on Machine Translation (MT) and Multimodal Machine Translation (MMT) tasks using the Multi30k dataset, and obtained state-of-the-art results on the Multi30k test2016 set for both tasks. We also examined the model's attention weights to analyze how the model operates over the visual content.
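The abstract does not spell out the vTLM objective, but its shape can be inferred from what it extends: XLM's translation language modeling (masked prediction over a concatenated sentence pair), with image features added to the input sequence. The following is a minimal PyTorch sketch under those assumptions; the 15% masking rate, 2048-dimensional detector region features, and all names and sizes (VTLMSketch, VOCAB, D, MASK_ID) are illustrative, not taken from the thesis, and position/language embeddings are omitted for brevity.

import torch
import torch.nn as nn

VOCAB, D, MASK_ID = 30000, 512, 4  # hypothetical vocab size, model width, [MASK] id

class VTLMSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, D)   # word embeddings
        self.vis_proj = nn.Linear(2048, D)      # project detector region features to D
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.lm_head = nn.Linear(D, VOCAB)      # predict masked tokens

    def forward(self, en_ids, de_ids, regions):
        # TLM-style: concatenate the two language streams of the same caption
        text = torch.cat([en_ids, de_ids], dim=1)
        # Mask 15% of text positions, BERT/XLM-style (visual tokens stay intact)
        mask = torch.rand(text.shape) < 0.15
        inp = text.masked_fill(mask, MASK_ID)
        # Fuse text embeddings and projected image regions into one sequence
        h = torch.cat([self.tok_emb(inp), self.vis_proj(regions)], dim=1)
        h = self.encoder(h)
        logits = self.lm_head(h[:, : text.size(1)])  # score only text positions
        # Cross-entropy on the masked tokens only
        return nn.functional.cross_entropy(logits[mask], text[mask])

# Toy usage: batch of 2, 10 tokens per language, 36 regions of dim 2048
model = VTLMSketch()
loss = model(torch.randint(5, VOCAB, (2, 10)),
             torch.randint(5, VOCAB, (2, 10)),
             torch.randn(2, 36, 2048))
loss.backward()

Because the visual tokens are never masked, the model can resolve masked words from either the other language or the image regions, which is what grounds the learned word embeddings visually.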
URI
http://hdl.handle.net/11655/23170
Collections
  • Bilgisayar Mühendisliği Bölümü Tez Koleksiyonu [134]
Citation
Kuyu, M. (2020). Learning Visually-Grounded Representations Using Cross-lingual Multimodal Pre-training. Master's thesis, Hacettepe Üniversitesi, Ankara.