Hamidreza Ghader

ILPS alumnus, PhD in Computer Science, currently lead NLP/ML scientist at Contexta360
Research Interests
Neural Machine Translation
Statistical Machine Translation
Natural Language Processing
Statistical Machine Learning

Detailed CV

Office C3.230
Science Park 904
1098 XH Amsterdam
The Netherlands

Hamid dot [MY-FAMILY-NAME] at contexta360 dot com

I work as the lead NLP/ML scientist at Contexta360. I earned my PhD at the University of Amsterdam as a member of the Language Technology Lab, where I worked on Statistical Machine Translation under the supervision of Dr. Christof Monz and Prof. Maarten de Rijke. Before starting my PhD, I worked as a research assistant in the Natural Language and Text Processing Laboratory at the University of Tehran, where we developed the Faraazin machine translation system.

Short Bio

BSc in Computer Engineering, ECE Department, University of Tehran

MSc in Artificial Intelligence, Computer Engineering Department, Iran University of Science and Technology

PhD in Computer Science, University of Amsterdam

Selected Works

Classifying Wikipedia in a fine-grained hierarchy: what graphs can contribute.

Wikipedia is a huge opportunity for machine learning, being the largest semi-structured knowledge base available. Because of this, many works examine its contents and focus on structuring it to make it usable in learning tasks, for example by classifying it into an ontology. Beyond its textual contents, Wikipedia also displays a typical graph structure, where pages are linked together through citations. In this paper, we address the task of integrating graph (i.e. structure) information to classify Wikipedia into a fine-grained named entity (NE) ontology, the Extended Named Entity hierarchy.

More Info

An Intrinsic Nearest Neighbor Analysis of Neural Machine Translation Architectures.

Earlier approaches indirectly studied the information captured by the hidden states of recurrent and non-recurrent neural machine translation models by feeding them into different classifiers. In this paper, we look at the encoder hidden states of both transformer and recurrent machine translation models from the nearest neighbors perspective. We investigate to what extent the nearest neighbors share information with the underlying word embeddings as well as related WordNet entries. Additionally, we study the underlying syntactic structure of the nearest neighbors to shed light on the role of syntactic similarities in bringing the neighbors together.

More Info

What does Attention in neural machine translation pay attention to?

Attention in neural machine translation provides the possibility to encode relevant parts of the source sentence at each translation step. As a result, attention is considered to be an alignment model as well. However, there is no work that specifically studies attention and provides an analysis of what is being learned by attention models. Thus, the question remains how attention is similar to or different from traditional alignment. In this paper, we provide a detailed analysis of attention and compare it to traditional alignment. We answer the question of whether attention is only capable of modelling translational equivalence or whether it captures more information. We show that attention differs from alignment in some cases and captures useful information other than alignments.

More Info

Which Words Matter in Defining Phrase Reorderings in Statistical Machine Translation?

We propose two models that use shorter sub-phrase pairs of an original phrase pair to smooth the phrase reordering distributions. In the first model, we follow the classic idea of backing off to shorter histories, commonly used in language model smoothing. In the second model, we use syntactic dependencies to identify the most relevant words in a phrase to back off to. We show how these models can be easily applied to existing lexicalized and hierarchical reordering models. The results show that not all the words inside a phrase pair are equally important in defining phrase reordering behavior, and that shortening towards the important words reduces the sparsity problem for long phrase pairs.

More Info

Automatic WordNet Construction Using Markov chain Monte Carlo

In this work, we propose a fully automated approach for constructing a Persian WordNet. The resulting WordNet has a precision of 90.46%, which is a considerable improvement compared with other automatically built Persian WordNets. Just send me an email if you want the WordNet.

More Info

Faraazin Machine Translation System

Formerly, I was a member of the machine translation development team in the Natural Language and Text Processing Laboratory at the University of Tehran. There, we developed a hybrid machine translation system that combines transfer-based models with statistical approaches.

More Info