MODELING THE MIRROR: GRASP LEARNING AND

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
Copyright 2002 Erhan Oztop
To my Grandmother
My time at USC during the Ph.D. was a very enriching period of my life. I had the opportunity to work in an exciting and stimulating research environment, led by Michael Arbib.
I would like to present my deepest gratitude to The Scientific and Technical Research Council of Turkey (TUBITAK) for providing the scholarship that made it possible for me to attempt and complete the Ph.D. study presented in this thesis. The study would not have been possible had TUBITAK not provided support for the very first and the final semesters at USC.
I would like to thank Michael Arbib for guiding and educating me throughout my years at USC. He has been an extraordinary advisor and mentor, to whom I owe all the brain theory I have learned. I would also like to present my gratitude to Michael Arbib for providing support via the HFSP and USCBP grants. Without HFSP and USCBP support, this study would not have been possible.
Stefan Schaal is a great mentor who has been a constant support and source of inspiration, with never-ending energy. Besides introducing me to robotics, he and his colleague Sethu Vijayakumar were very influential in maturing the concept of machine learning in my mind.
I also owe a great debt of gratitude to Nina Bradley for being a source of positive energy and mentoring me, especially in infant motor development. Without her, the thesis would lack a major component.
I am also full of gratitude to my Ph.D. qualification exam committee members Maja Mataric and Christoph von der Malsburg for their guidance and support.
I am greatly indebted to Giacomo Rizzolatti for enabling me to visit his lab and providing the opportunity to communicate with not only himself but also with Vittorio Gallese and Leonardo Fogassi, who have provided invaluable insights about mirror neurons. In addition, I am very thankful to Massimo Matelli and Giuseppe Luppino for the first-hand information on the mirror neuron system connectivity, and to Luciano Fadiga for stimulating discussions. I would also like to thank Christian Keysers and Evelyne Kohler for not only actively involving me in their recording sessions but also offering their sincere friendship.
I am very thankful to Hideo Sakata for giving me the opportunity to visit his lab in Tokyo and interact with many researchers including Murata-san with whom I had very stimulating discussions.
I am deeply thankful to Mitsuo Kawato, for giving me the opportunity to interact with various researchers in Japan by having me at ATR during the summer of 2001. My research experience at ATR was very rewarding; I greatly expanded my knowledge on motor control and motor learning. I would like to salute the staff at ATR for all their help. I also would like to present my thanks to the friends at ATR for welcoming me and making me feel at home.
I would like to present my appreciation and thanks to my mentors at Middle East Technical University in Turkey. I present my sincere thanks to my master's advisor Marifi Guler for introducing me to neural computation, and to Volkan Atalay for introducing me to computer vision and supporting my Ph.D. application. In particular, I would like to present my gratitude to Fatos Yarman Vural for her guidance and support during my Master's study and for preparing me for the Ph.D. work presented in this dissertation. Other influential Computer Science professors to whom I am grateful for educating me are Nese Yalabik, Gokturk Ucoluk and Sinan Neftci.
I would like to present my gratitude to Tosun Terzioglu, Albert Ekip, Turgut Onder and Semih Koray, who were professors in the Mathematics Department at the Middle East Technical University. They taught me how to think ‘right’ and revealed the beauty of mathematics to me.
Throughout these six years at USC, I had the pleasure of meeting several valuable people who contributed to this dissertation. Firstly, I am very thankful to Aude Billard and Auke Jan Ijspeert for all their support, scientific interaction and feedback. They played a huge role in helping me get through the tough times during my Ph.D. work. In addition, I would like to thank Aude Billard for providing me with computing hardware and helping me have a nice working environment, which was essential for the progress of my Ph.D. study. I am thankful to my great friend Sethu Vijayakumar for his support and stimulating discussions. Jun Nakanishi, Jun Mei Zu, Aaron D’Souza, Jan Peters and Kazunori Okada, besides being of great support, were always there to discuss issues and helped me overcome various obstacles. I am deeply thankful to Shrija Kumari for offering not only her smile and friendship but also her energy in proofreading my manuscript. I owe a lot to Jun Mei Zu: she has always been there as a great friend and has always offered her help and support when I needed it most. I am indebted to my ex-officemate and very valuable friend Mihail Bota for his constant support, for interactions that improved the thesis, and for the psychological support to overcome many obstacles throughout my Ph.D. years. Finally, I would like to thank Juan Salvador Yahya Marmol for being a good friend and sharing my workload during various periods of my Ph.D. I count myself very lucky to have these great friends and colleagues, to whom once again I present my gratitude: Thank you guys!
Not a Hedco Neuroscience inhabitant but a very valuable friend, Lay Khim Ong was always there to offer her help, both psychological and physical (Hey Khim: thank you for your great editing!). I would like to thank other great friends who supported me (in spite of my negligence in keeping in touch with them). Kyle Reynolds, Awe Vinijkul, Aye Vinijkul Reynolds, Alper Sen, Ebru Dincel: please accept my sincere thanks and appreciation.
I owe deep gratitude to my wife Mika Satomi for her patience in dealing with me in difficult times. She was always there. Her contribution to this thesis is indispensable. I especially celebrate and thank her for the artistic talents and hard work that she generously offered throughout the Ph.D. study.
I am greatly indebted to Paulina Tagle and Laura Lopez for their support and help over all these years. I also would like to thank Laura Lopez and Yolanda Solis for their kind friendship and support. My gratitude goes to Luigi Manna, who helped me with hardware and software issues during the Ph.D. study.
I would like to thank Laurent Itti for his generous help in improving our lab environment and providing partial support for my research. Also, I would like to salute his lab members for their support and friendship. Florence Miau, in particular, always offered her warm friendship during her internship at USC.
I would like to present my appreciation to the good things in life; in particular, I would like to thank the ocean for comforting and rejuvenating me during difficult times.
Finally, I am deeply indebted to my family, to whom I owe much more than can be expressed. This work would not have been possible without the help of my parents. (Mom and Dad, Evrim and Nurdan: thank you very, very much for everything!)
Figure 2.1 Lateral view of the macaque brain showing the areas of the agranular frontal cortex and posterior parietal cortex (adapted from Geyer et al. 2000). Naming conventions: frontal regions, Matelli et al. (1991); parietal regions, Pandya and Seltzer (1982)
Figure 2.2 A canonical neuron's response during grasping of various objects in the dark (left to right and top to bottom: plate, ring, cube, cylinder, cone and sphere). The rasters and histograms are aligned with object presentation. Small grey bars in each raster mark the onset of key press, go signal, key release, onset of object pulling, release signal, and object release, respectively. The peaks in the ring and sphere cases correspond to the grasping of the object by the monkey (adapted from Murata et al. 1997a)
Figure 2.3 The motor responses of the same neuron shown in Figure 2.2. The motor preference of the neuron also carries over to its visual preference (compare the ring and sphere histograms of the two figures) (adapted from Murata et al. 1997a)
Figure 2.4 Activity of a cell during action observation (left) and action execution (right). There is no activity during presentation of the object, either at the initial presentation or while the tray is brought towards the monkey. The vertical line over the histogram indicates the hand-object contact onset. (from Gallese et al., 1996).
Figure 2.5 Visual response of a mirror neuron. A) Precision grasp. B) Power grasp. C) Mimicking of a precision grasp. The vertical lines over the histograms indicate the hand-object contact onset. (adapted from Gallese et al., 1996)
Figure 2.6 Example of a strictly congruent manipulating mirror neuron: A) The experimenter retrieved the food from a well in a tray. B) Same action, but performed by the monkey. C) The monkey grasped a small piece of food using a precision grip. The vertical lines over the histograms indicate the hand-object contact onset (adapted from Gallese et al., 1996).
Figure 2.7 The classification of area F5 neurons derived from published literature (di Pellegrino et al. 1992; Gallese 2002; Gallese et al. 1996; Murata et al. 1997a; Murata et al. 1997b; Rizzolatti et al. 1996a; Rizzolatti and Gallese 2001). All F5 neurons fire in response to some motor action. In addition, canonical neurons fire for object presentation while mirror neurons fire for action observation. The majority of hand-related F5 neurons are purely motor (Gallese 2002) (labelled as Motor Neurons in the figure)
Figure 2.8 The macaque parieto-frontal projections from mesial parietal cortex, medial bank of the intraparietal sulcus and the surface of the superior parietal lobule (adapted from Rizzolatti et al. 1998). Note that the Brodmann’s area 7m corresponds to Pandya and Seltzer's (1982) area PGm
Figure 2.10 An AIP visual-dominant neuron's activity under three task conditions: object manipulation in the light, object manipulation in the dark, and object fixation in the light. The neuron is active during the fixation and holding phases when the action is performed in the light. However, during grasping in the dark the neuron shows no activity. Fixation of the object alone, without grasping, also produces a discharge (adapted from Sakata et al. 1997a)
Figure 2.12 An AIP visual-dominant neuron's axis-orientation tuning and object fixation response are shown. The neuron fires maximally during fixation of a vertical bar or a cylinder. The tuning is demonstrated in the lower half of the figure (adapted from Sakata et al. 1999)
Figure 2.13 Response of an axis-orientation-selective (AOS) neuron in the caudal part of the lateral bank of the intraparietal sulcus (c-IPS) to a luminous bar tilted 45° forward (left) or 45° backward (right) in the sagittal plane. The monkey views the bar with binocular vision. The line segments under the histograms mark the fixation start and a period of 1 second. (adapted from Sakata et al. 1999)
Figure 2.15 Orientation tuning of a surface-orientation selective (SOS) neuron. First row: Stimuli presented. Middle row: responses of the cell with binocular view. Last row: responses of the cell with monocular view (adapted from Sakata et al. 1997a)
Figure 3.2 AIP extracts the affordances and F5 selects the appropriate grasp from the AIP ‘menu’. Various biases are sent to F5 by the prefrontal cortex (PFC), which relies on the recognition of the object by the inferotemporal cortex (IT). The dorsal stream through AIP to F5 is replicated in the MNS model
Figure 3.3 Each of the 3 grasp types here is defined by specifying two "virtual fingers", VF1 and VF2, which are groups of fingers or a part of the hand such as the palm which are brought to bear on either side of an object to grasp it. The specification of the virtual fingers includes specification of the region on each virtual finger to be brought in contact with the object. A successful grasp involves the alignment of two "opposition axes": the opposition axis in the hand joining the virtual finger regions to be opposed to each other, and the opposition axis in the object joining the regions where the virtual fingers contact the object. (Iberall and Arbib 1990)
Figure 3.4 The components of hand state F(t) = (d(t), v(t), a(t), o1(t), o2(t), o3(t), o4(t)). Note that some of the components are purely hand configuration parameters (namely v, a, o3, o4) whereas others are parameters relating the hand to the object
Figure 3.5 The MNS (Mirror Neuron System) model. (i) Top diagonal: a portion of the FARS model. Object features are processed by cIPS and AIP to extract grasp affordances; these are sent on to the canonical neurons of F5, which choose a particular grasp. (ii) Bottom right: recognizing the location of the object provides parameters to the motor programming area F4, which computes the reach. The information about the reach and the grasp is taken by the motor cortex M1 to control the hand and the arm. (iii) New elements of the MNS model: bottom left are two schemas, one to recognize the shape of the hand, and the other to recognize how that hand is moving. (iv) Just to the right of these is the schema for hand-object spatial relation analysis. It takes information about object features, the motion of the hand and the location of the object to infer the relation between hand and object. (v) The center two regions marked by the gray rectangle form the core mirror circuit. This complex associates the visually derived input (hand state) with the motor program input from the F5 canonical neurons during the learning process for the mirror neurons. The grand schemas introduced in section 3.2 are illustrated as follows: the “Core Mirror Circuit” schema is marked by the center grey box; the “Visual Analysis of Hand State” schema is outlined by solid lines just below it; and the “Reach and Grasp” schema is outlined by dashed lines. (Solid arrows: established connections; dashed arrows: postulated connections)
Figure 3.6 (a) For purposes of simulation, we aggregate the schemas of the MNS (Mirror Neuron System) model of Figure 3.5 into three "grand schemas" for Visual Analysis of Hand State, Reach and Grasp, and Core Mirror Circuit. (b) For detailed analysis of the Core Mirror Circuit, we dispense with simulation of the other two grand schemas and use other computational means to provide the three key inputs to this grand schema
Figure 3.7 (Left) The final state of arm and hand achieved by the reach/grasp simulator in executing a power grasp on the object shown. (Right) The hand state trajectory read off from the simulated arm and hand during the movement whose end-state is shown at left. The hand state components are: d(t), distance to target at time t; v(t), tangential velocity of the wrist; a(t), index finger tip and thumb tip aperture; o1(t), cosine of the angle between the object axis and the (index finger tip – thumb tip) vector; o2(t), cosine of the angle between the object axis and the (index finger knuckle – thumb tip) vector; o3(t), the angle between the thumb and the palm plane; o4(t), the angle between the thumb and the index finger
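The seven components listed in the caption above form a simple fixed-size record. As an illustrative sketch only (this is not the author's simulator code; the class and method names are my own), the hand state can be bundled the way a network would consume it at each time step:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HandState:
    """One sample F(t) of the hand state trajectory (components of Figure 3.7)."""
    d: float   # distance to target at time t
    v: float   # tangential velocity of the wrist
    a: float   # index finger tip - thumb tip aperture
    o1: float  # cos(angle between object axis and index-tip/thumb-tip vector)
    o2: float  # cos(angle between object axis and index-knuckle/thumb-tip vector)
    o3: float  # angle between the thumb and the palm plane
    o4: float  # angle between the thumb and the index finger

    def to_vector(self) -> List[float]:
        # Flatten into the 7-component form F(t) = (d, v, a, o1, o2, o3, o4)
        return [self.d, self.v, self.a, self.o1, self.o2, self.o3, self.o4]
```

A trajectory is then just a time-indexed sequence of such records, one per simulation step.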
Figure 3.9 (a) Training the color expert, based on colored images of a hand whose joints are covered with distinctively colored patches. The trained network will be used in the subsequent phase for segmenting images. (b) A hand image (not from the training sample) is fed to the augmented segmentation program. The color decision during segmentation is made by consulting the Color Expert. Note that a smoothing step (not shown) is performed before segmentation
Figure 3.10 Illustration of the model matching system. Left: markers located by the feature extraction schema. Middle and Right: initial and final stages of model matching. After matching is performed, a number of parameters for the hand configuration are extracted from the matched 3D model
Figure 3.11 The scaling of an incomplete input to form the full spatial representation of the hand state. As an example, only one component of the hand state, the aperture, is shown. When 66 percent of the action is completed, the pre-processing we apply effectively causes the network to receive the stretched hand state (the dotted curve) as input, as a re-representation of the hand state information accessible up to that time (represented by the solid curve; the dashed curve shows the remaining, unobserved part of the hand state)
Figure 3.12 The solid curve shows the effective input that the network receives as the action progresses. At each simulation cycle the scaled curves are sampled (30 samples each) to form the spatial input for the network. Towards the end of the action the network's input gets closer to the final hand state
Figure 3.13 (a) A single grasp trajectory viewed from three different angles to clearly show its 3D pattern. The wrist trajectory during the grasp is marked by square traces, with the distance between any two consecutive trace marks traveled in equal time intervals. (b) Left: The input to the network. Each component of the hand state is labelled. (b) Right: How the network classifies the action as a power grasp: squares: power grasp output; triangles: precision grasp; circles: side grasp output. Note that the response for precision and side grasp is almost zero
Figure 3.14 Power and precision grasp resolution. The conventions used are as in the previous figure. (a) The curves for power and precision cross towards the end of the action showing the change of decision of the network. (b) The left shows the initial configuration and the right shows the final configuration of the hand
Figure 3.15: (Top) Strong precision grip mirror response for a reaching movement with a precision pinch. (Bottom) Spatial location perturbation experiment. The mirror response is greatly reduced when the grasp is not directed at a target object. (Only the precision grasp related activity is plotted. The other two outputs are negligible.)
Figure 3.16 Altered kinematics experiment. Left: The simulator executes the grasp with bell-shaped velocity profile. Right: The simulator executes the same grasp with constant velocity. Top row shows the graphical representation of the grasps and the bottom row shows the corresponding output of the network. (Only the precision grasp related activity is plotted. The other two outputs are negligible.)
Figure 3.17 Grasp and object axes mismatch experiment. Rightmost: the change of the object from cylinder to a plate (an object axis change of 90 degrees). Leftmost: the output of the network before the change (the network turns on the precision grip mirror neuron). Middle: the output of the network after the object change. (Only the precision grasp related activity is plotted. The other two outputs are negligible.)
Figure 3.18 The plots show the mirror response levels, with explicit affordance coding, for an observed precision pinch in four cases (tiny, small, medium and big objects). The filled circles indicate the precision grasp related activity while the empty squares indicate the power grasp related activity
Figure 3.19 The solid curve: the precision grasp output, for the non-explicit affordance case, directed at a tiny object. The dashed curve: the precision grasp output of the model for the explicit affordance case, for the same object
Figure 3.20: Empty squares indicate the precision grasp related cell activity, while the filled squares represent the power grasp related cell activity. The grasps show the effect of changing the object affordance, while keeping a constant hand state trajectory. In each case, the hand-state trajectory provided to the network is appropriate to the medium-sized object, but the affordance input to the network encodes the size shown. In the case of the biggest object affordance, the effect is enough to overwhelm the hand state’s precision bias.
Figure 3.21 The graph shows the decision switch time versus object size. The minimum is not at the boundary; that is, the network will detect a precision pinch quickest with a medium object size. Note that the graph does not include a point for "Biggest object" since there is no resolution point in this case (see the final panel of Figure 3.20)
Figure 3.24 The plot shows the output of the MNS model when driven by the visual recognition system while observing the action depicted in Figure 3.22. It must be emphasized that the training was performed using the synthetic data from the grasp simulator while testing is performed using the hand state extracted by the visual system only. Dashed line: Side grasp related activity; Solid line: Precision grasp related activity. Power grasp activity is not visible as it coincides with the time axis
Figure 4.1 The elevated circular region corresponds to the area defined by the equation x² + y² < 0.25. The environment returns +1 as the reward if the action falls into the circular region; otherwise −1 is returned.
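The reward rule in this caption is simple enough to state as code. A minimal sketch (the function name is my own), assuming the action is a point (x, y):

```python
def reward(x: float, y: float) -> int:
    # +1 if the action lands inside the elevated circular region
    # x*x + y*y < 0.25 (a circle of radius 0.5 about the origin); -1 otherwise.
    return 1 if x * x + y * y < 0.25 else -1
```

For example, an action at the origin earns +1, while one at (0.5, 0.5) falls outside the circle and earns −1.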
Figure 5.1 Infant grip configurations can be divided into two categories: power and precision grips. Infants tend to switch from power grips to precision grips as they grow (adapted from Butterworth et al. 1997)
Figure 5.3 The Hand Position layer specifies the approach direction of the hand towards the object. The representation is allocentric (centred on the object). Geometrically, the space around the object can be uniquely specified with the vector (azimuth, elevation, radius). The Hand Position layer generates the vector by a local population vector computation. The locus of the local neighbourhood is determined by the probability distribution represented in the firing potential of the Hand Position layer neurons (see Chapter 4 for details)
Figure 5.4: The grasp stability measure used in the simulations is illustrated for a hypothetical precision pinch grip (note that this is a simplified illustration; the actual hand used in the simulations has five fingers)
Figure 5.5 The trained model's Hand Position layer is shown as a 3D plot. One dimension is summed over to reduce the 4D map to a 3D map. Intuitively, the map says: ‘when the object is above the shoulder and in front, grasp it from the bottom’
Figure 5.6: The output of the trained model's target position layer is shown as a 3D plot. One dimension is summed over to reduce the 4D map to a 3D map. The object is on the left side of the (right-handed) arm. Intuitively, the map says ‘when the object is on the left side, grasp it from the right side of the object’
Figure 5.7 The learning evolution of the distribution of the Hand Position layer is shown as a 3D plot. Note that the 1000 neurons shown represent the probability distribution of approach directions. Initially, the layer is not trained and responds in a random fashion to the given input. As the learning progresses, the neurons gain specificity for this object location.
Figure 5.9 Two learned precision grips (left: three-fingered; right: four-fingered) are shown. Note the wrist configuration in each case. ILGM learned to combine the wrist location with the correct wrist rotations to secure the object
Figure 5.12 ILGM learned a ‘menu’ of precision grips with the common property that the wrist was placed well above the object. The orientation of the hand and the contact points on the object showed some variability. Two example precision grips are shown in the figure