Natural human-robot interaction requires robots to link words to objects and actions, a process known as grounding. Although grounding has been investigated in previous studies, few have considered the grounding of synonyms, and most of the employed models only work offline. In this paper, we address this gap by introducing an online learning framework that grounds synonymous object and action names using cross-situational learning. Words are grounded through geometric characteristics of objects and kinematic features of the robot's joints during action execution. The proposed framework is evaluated in an interaction experiment between a human tutor and an HSR robot. The results show that the framework successfully grounds all words used in the experiment.
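To make the idea of online cross-situational learning concrete, the following is a minimal sketch and not the framework proposed in this paper. It assumes that object geometry and joint kinematics have already been discretized into symbolic percept labels (the labels `cylinder`, `cuboid`, `lift_motion`, and `push_motion` are hypothetical), and it uses a simple co-occurrence score chosen for illustration. The class name `CrossSituationalLearner` and its methods are likewise illustrative assumptions.

```python
from collections import defaultdict


class CrossSituationalLearner:
    """Illustrative online word-to-percept grounding via co-occurrence counts."""

    def __init__(self):
        # joint[word][percept]: number of situations containing both
        self.joint = defaultdict(lambda: defaultdict(int))
        # number of situations in which each word / percept occurred
        self.word_totals = defaultdict(int)
        self.percept_totals = defaultdict(int)

    def observe(self, words, percepts):
        """Update the counts from a single situation (online, one at a time)."""
        words, percepts = set(words), set(percepts)
        for percept in percepts:
            self.percept_totals[percept] += 1
        for word in words:
            self.word_totals[word] += 1
            for percept in percepts:
                self.joint[word][percept] += 1

    def ground(self, word):
        """Return the percept with the highest association score, or None."""
        if word not in self.joint:
            return None

        def score(percept):
            # Assumed score: P(percept | word) * P(word | percept).
            # It penalizes percepts that appear in many unrelated situations.
            c = self.joint[word][percept]
            return c * c / (self.word_totals[word] * self.percept_totals[percept])

        return max(self.joint[word], key=score)


# Hypothetical situations: an utterance paired with the percepts present.
# "cup"/"mug" and "lift"/"raise" play the role of synonyms.
situations = [
    (["lift", "cup"], ["cylinder", "lift_motion"]),
    (["push", "mug"], ["cylinder", "push_motion"]),
    (["lift", "box"], ["cuboid", "lift_motion"]),
    (["raise", "box"], ["cuboid", "lift_motion"]),
    (["push", "cup"], ["cylinder", "push_motion"]),
    (["raise", "mug"], ["cylinder", "lift_motion"]),
]

learner = CrossSituationalLearner()
for words, percepts in situations:
    learner.observe(words, percepts)

for word in ["cup", "mug", "lift", "raise", "box", "push"]:
    print(word, "->", learner.ground(word))
```

In this toy setting no single situation disambiguates a word, but the counts accumulated across situations do, so the synonyms end up grounded to the same percept. Because each update touches only the counts for the current situation, the learner can be queried at any point during the interaction rather than after an offline training phase.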