Robots that incorporate social norms in their behaviors are seen as more supportive, friendly,and understanding. Since it is impossible to manually specify the most appropriate behavior for all possible situations, robots need to be able to learn it through trial and error, by observing interactions between humans, or by utilizing theoretical knowledge available in natural language. In contrast to the former two approaches, the latter has not received much attention because understanding natural language is non-trivial and requires proper grounding mechanisms to link words to corresponding perceptual information. Previous grounding studies have mostly focused on grounding of concepts relevant to object manipulation, while grounding of more abstract concepts relevant to the learning of social norms has so far not been investigated. Therefore, this paper presents an unsupervised cross-situational learning based online grounding framework to ground emotion types, emotion intensities and genders. The proposed framework is evaluated through a simulated human–agent interaction scenario and compared to an existing unsupervised Bayesian grounding framework. The obtained results show that the proposed framework is able to ground words, including synonyms, through their corresponding perceptual features in an unsupervised and open-ended manner, while outperfoming the baseline in terms of grounding accuracy, transparency, and deployability.