Category: Research Paper


by Calvin Leather, Yuqing Hu

In response to this article: http://www.jneurosci.org/content/36/39/10016

Recent literature in reinforcement learning has demonstrated that the context in which a decision is made influences subject reports and neural correlates of perceived reward. For example, consider visiting a restaurant where you have previously had many excellent meals. Expecting another excellent meal, when you receive a merely satisfactory meal, your subjective experience is negative. Had you received this objectively decent meal elsewhere, without the positive expectations, your experience would have been better. This intuition is captured in adaptive models of value, where a stimulus’s reward (i.e. Q-value) is expressed relative to the expected reward in a situation, and such models have been found to accurately describe activation in value regions (Palminteri et al., 2015). Such a model can also be beneficial because it allows reinforcement learning models to learn to avoid punishment, as avoiding a contextually-expected negative payoff results in a positive reward. This had previously been challenging to express within a standard reinforcement learning framework (Kim et al., 2006).

Alongside these benefits, there has been concern that adaptive models might be confused by certain choice settings. In particular, agents with an adaptive model of value would have an identical hedonic experience (i.e. identical Q-values in the model) when receiving a reward of +10 units in a setting where they might receive either +10 or 0 units, and when receiving a reward of 0 units in a setting where they might receive either -10 or 0 units (we will refer to this later as the ‘confusing situation’). With this issue in mind, Burke et al. (2016) develop an extension to the adaptive model, in which contextual information has only a partial influence on reward. So, whereas the previous, fully-adaptive model has a subjective reward (Q-value) of +5 units for receiving an objective reward of 0 in the context where the possibilities were 0 and -10, and an absolute model ignoring context would experience a reward of 0, the Burke model would experience a reward of +2.5. It takes the context into account, but only partially, and accordingly they call their model ‘partially-adaptive’. Burke et al. compare this partially-adaptive model with a fully-adaptive model and an absolute model (which ignores context). When subjects were given the same contexts and choices as the confusing situation outlined above, Burke et al. found that the partially-adaptive model reflects neural data in the vmPFC and striatum better than the fully-adaptive or absolute models.
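To make the arithmetic above concrete, here is a minimal sketch of the three models. It assumes (our simplification, not necessarily the exact form used by Burke et al.) that the context value is the mean of the two possible outcomes, and that a hypothetical weight `w` scales how much of it is subtracted: `w = 0` gives the absolute model, `w = 1` the fully-adaptive model, and `w = 0.5` reproduces the +2.5 figure quoted above.

```python
def subjective_value(reward, context_outcomes, w):
    """Reward minus w times the context's mean outcome (assumed functional form)."""
    context_mean = sum(context_outcomes) / len(context_outcomes)
    return reward - w * context_mean


gain_context = [10, 0]    # possible outcomes: +10 or 0
loss_context = [-10, 0]   # possible outcomes: -10 or 0

models = [("absolute", 0.0), ("partially-adaptive", 0.5), ("fully-adaptive", 1.0)]
for name, w in models:
    gain_value = subjective_value(10, gain_context, w)  # receiving +10 in the gain context
    loss_value = subjective_value(0, loss_context, w)   # receiving 0 in the loss context
    print(f"{name}: gain context {gain_value:+.1f}, loss context {loss_value:+.1f}")
```

Under these assumptions only the fully-adaptive model assigns the two outcomes the same value, which is the confusing situation.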

The partially-adaptive model is interesting, as it has the same advantages as the fully-adaptive model (reflecting subjective experience and neural data well, allowing for avoidance learning), while potentially avoiding the confusion outlined above. Here, we seek to investigate the implications and benefits of Burke et al.’s partially-adaptive model more thoroughly. In particular, we will consider the confusing situation’s ecological validity and potential resolution, whether it is reasonable that partially-adaptive representations might extend beyond decision (to learning and memory), and the implications of the theory for future work. Before we do this, we would like to briefly present an alternative interpretation of their findings.

The finding that the fMRI signal is best fit by a partially-adaptive model does not necessarily entail that the brain uses a partially-adaptive encoding as the value over which decisions occur. All neurons within a voxel can influence the fMRI signal, so it is possible that the signal reflects a combination of multiple activity patterns present within a voxel. This mixing phenomenon has been used to explain the success of decoding in early visual cortex, where the overall fMRI signal in a voxel reflects the specific distribution of orientation-specific columns within that voxel (Swisher et al., 2010). Similarly, the partially-adaptive model’s fit might be explained by the average contribution of some cells with a fully-adaptive encoding and other cells with absolute encodings of value (within biological constraints). This concern is supported by the co-occurrence of adaptive and non-adaptive cells in macaque OFC (Kobayashi et al., 2010). Therefore, more work is needed to understand the local circuitry and encoding heterogeneity of regions supporting value-based decision making.
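The mixing argument can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that a voxel's signal is a linear average over its cells, that a fraction of cells code value fully adaptively (reward minus context mean) and the rest code it absolutely. The function names and the linear-averaging assumption are ours.

```python
def voxel_signal(reward, context_mean, fraction_adaptive):
    """Average response of a voxel mixing fully-adaptive and absolute cells (toy model)."""
    adaptive_cells = reward - context_mean   # fully-adaptive coding
    absolute_cells = reward                  # context-independent coding
    return fraction_adaptive * adaptive_cells + (1 - fraction_adaptive) * absolute_cells


def partially_adaptive(reward, context_mean, w):
    """A single partially-adaptive code with context weight w (assumed form)."""
    return reward - w * context_mean


# Reward of 0 in the loss context (outcomes -10 or 0, so the mean is -5),
# with half of the cells fully adaptive.
mixed = voxel_signal(0, -5, 0.5)
single_code = partially_adaptive(0, -5, 0.5)
print(mixed, single_code)
```

In this toy model a voxel with a fraction `p` of fully-adaptive cells is indistinguishable, at the level of the averaged signal, from a population of partially-adaptive cells with weight `w = p`.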

Returning to the theory presented by the authors, we would like to consider whether a fully-adaptive encoding of value is truly suboptimal. The type of confusing situation presented above was shown to be problematic for real decision makers in Pompilio and Kacelnik (2010), where starlings became indifferent between two options with different objective values, due to the contexts those options appeared in during training. However, this type of choice context might not be ecologically valid. If two stimuli are exclusively evaluated within different contexts, as in Pompilio and Kacelnik, it is not relevant whether they are confusable, as the decision maker would never need to compare them.

Separate from the confusion problem’s ecological validity is the inquiry into its solution. Burke et al. suggest partially-adaptive encoding avoids confusion, and therefore should be preferred to a fully-adaptive encoding. However, this might only be true for the particular payoffs used in the experiment. Consider a decision maker who makes choices in two contexts. One, the loss context, has two outcomes, L0 (worth 0) and Lneg (worth less than 0), while the other, the gain context, has two outcomes, G0 (worth 0) and Gpos (worth more than 0). If L0 - Lneg = Gpos - G0, as in Burke et al., a fully-adaptive agent would be indifferent between G0 and Lneg (and between Gpos and L0). A partially-adaptive agent, however, would not be indifferent, as the value of G0 would be higher than that of Lneg. Now consider what happens if we raise the value of Gpos. By doing this, we can raise the average value of the gain context by any amount. Now consider what this does to the experienced value (Q-value) of G0. As we increase the average reward of the context, G0 becomes a poorer option in terms of its Q-value. Note that since the only reward we are changing is Gpos, the Q-values for the loss context do not change. Therefore, we can decrease the Q-value for G0 until it is equal to that of Lneg. This is exactly the confusion that we had hoped the partially-adaptive model would avoid. Furthermore, this argument will work for any partially-adaptive model: we are unable to defeat this concern by parameterizing the influence of context in the update equations and manipulating this parameter.
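The argument can be checked numerically. The sketch below assumes the same simplified form as the worked numbers earlier (Q-value = reward minus `w` times the mean of the context's two outcomes, with `w` a hypothetical context weight between 0 and 1). Setting Q(G0) = Q(Lneg) under that assumption gives Gpos = -Lneg * (2 - w) / w, which the helper `confusing_gpos` (our name) computes.

```python
def q_value(reward, context_outcomes, w):
    """Reward minus w times the context's mean outcome (assumed functional form)."""
    context_mean = sum(context_outcomes) / len(context_outcomes)
    return reward - w * context_mean


def confusing_gpos(l_neg, w):
    """Value of Gpos at which Q(G0) equals Q(Lneg), for 0 < w <= 1."""
    return -l_neg * (2 - w) / w


l_neg = -10
for w in (0.25, 0.5, 1.0):
    g_pos = confusing_gpos(l_neg, w)
    q_g0 = q_value(0, [0, g_pos], w)        # zero outcome in the gain context
    q_lneg = q_value(l_neg, [0, l_neg], w)  # loss outcome in the loss context
    print(f"w={w}: Gpos={g_pos}, Q(G0)={q_g0}, Q(Lneg)={q_lneg}")
```

With `w = 1` the confusing payoff is Gpos = 10, the symmetric case described above. With `w = 0.5` it is Gpos = 30. As `w` shrinks toward 0 the required Gpos grows without bound, so under this form only a fully absolute model escapes the problem.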

As mentioned earlier, it is possible that some cells might encode partially-adaptive value, while others might have a fully- or non-adaptive encoding. We should be open to the possibility that even if partially-adaptive value occurs in decision, non-adaptive encodings might be used for storage of value information, and are transformed at the time of decision into the observed partially-adaptive signals. Why might this be reasonable? An agent who maintains partially-adaptive representations in memory faces several computational issues. One is efficiency: a partially-adaptive representation requires S*C quantities or distributions to be stored (one for each of the S stimuli in each of the C contexts). On the other hand, consider an agent who stores non-adaptive stimulus values and the average value of each context, and then adjusts stimulus values using the context values at the time of decision. This agent could use the same information while storing only S+C quantities. Another problem with storing value information in an adaptive format is the transfer of learning across contexts. If I encounter a stimulus in context A, my experiences should alter my evaluation of that stimulus in context B: getting sick after eating a food should reduce my preference for that food in every context. An agent who stores value adaptively would need to update one quantity for the encountered stimulus in each context, namely C quantities. An agent who stores value non-adaptively only updates a single quantity. So, even if decision utilizes partially-adaptive encoding, a non-adaptive representation is more efficient for storage. Furthermore, non-adaptive information is present in the state of the world (e.g. the concentration of sucrose in a juice does not adapt to expectations), so this information is available to agents during learning. Accordingly, it must be asked why agents would discard information that might ease learning.
While these differences do not necessarily affect the authors’ claims about value during decision, they should be considered when investigating the merits of different models of value.
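The storage argument can be sketched in a few lines. Everything here is hypothetical and for illustration only: the stimulus names, the stored numbers, the context weight `w`, and the assumption that the decision-time value is the stored value minus `w` times the stored context mean.

```python
# Non-adaptive storage: one value per stimulus (S entries) and one mean per context (C entries).
stimulus_values = {"juice": 4.0, "water": 1.0, "shock": -6.0}
context_means = {"gain": 5.0, "loss": -5.0}


def value_at_decision(stimulus, context, w=0.5):
    """Compute the partially-adaptive value on demand from non-adaptive stores."""
    return stimulus_values[stimulus] - w * context_means[context]


stored_non_adaptive = len(stimulus_values) + len(context_means)  # S + C quantities
stored_adaptive = len(stimulus_values) * len(context_means)      # S * C quantities
print(stored_non_adaptive, stored_adaptive)

# Learning that juice is worse than expected needs a single update...
stimulus_values["juice"] -= 3.0

# ...and the change is reflected in every context at decision time.
for context in context_means:
    print(context, value_at_decision("juice", context))
```

The gap between S+C and S*C is small in this toy case (5 versus 6) but widens quickly as the numbers of stimuli and contexts grow.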

In sum, while partial adaptation is an exciting theory that may provide novel motivations for empirical work, more effort is needed to understand when and where it is optimal. If we can overcome these concerns, the new theory opens up potential investigation into the nature of contextual influence: if we allow a range of contextual influence (via a parameter) in the partially-adaptive model, do certain individuals have more contextual influence, and does this heterogeneity correlate with learning performance? Do different environments (e.g. noise in signals conveying the context) alter the parameter? Do different cells or regions respond with different amounts of contextual influence? As such, the theory opens up new experimental hypotheses that might allow us to better understand how the brain incorporates context in the learning and decision-making processes.

 

 

References

 

Burke, C. J., Baddeley, M., Tobler, P. N., & Schultz, W. (2016). Partial Adaptation of Obtained and Observed Value Signals Preserves Information about Gains and Losses. Journal of Neuroscience, 36(39), 10016–10025. doi:10.1523/JNEUROSCI.0487-16.2016

Kim, H., Shimojo, S., & O’Doherty, J. P. (2006). Is Avoiding an Aversive Outcome Rewarding? Neural Substrates of Avoidance Learning in the Human Brain. PLoS Biology, 4(8), 1453–1461. doi:10.1371/journal.pbio.0040233

Kobayashi, S., Pinto de Carvalho, O., & Schultz, W. (2010). Adaptation of Reward Sensitivity in Orbitofrontal Neurons. Journal of Neuroscience, 30(2), 534–544. doi:10.1523/JNEUROSCI.4009-09.2010

Palminteri, S., Khamassi, M., Joffily, M., & Coricelli, G. (2015). Contextual modulation of value signals in reward and punishment learning. Nature Communications, 6, 1–14. doi:10.1038/ncomms9096

Pompilio, L., & Kacelnik, A. (2010). Context-dependent utility overrides absolute memory as a determinant of choice. PNAS, 107(1), 508–512. doi:10.1073/pnas.0907250107

Swisher, J. D., Gatenby, J. C., Gore, J. C., Wolfe, B. A., Moon, H., Kim, S., & Tong, F. (2010). Multiscale pattern analysis of orientation-selective activity in the primary visual cortex. Journal of Neuroscience, 30(1), 325–330. doi:10.1523/JNEUROSCI.4811-09.2010


Coauthored with Xiaoyan Lei, John Giles, Albert Park, John Strauss and Yaohui Zhao. Full paper link here. 

Abstract: 
Using the China Health and Retirement Longitudinal Study 2008 pilot, this paper analyzes the patterns and correlates of intergenerational transfers between elderly parents and adult children in Zhejiang and Gansu Provinces. The pilot is a unique data source from China that provides information on the direction as well as amount of transfers between parents and each of their children, and clearly distinguishes transfers between parents and children from those among other relatives or friends. The paper shows that transfers flow predominantly from children to elderly parents, with transfers from children playing an important role in elderly support. Taking advantage of the rich information available in this survey, the authors find strong evidence that transfers are significantly affected by the financial capabilities of individual children. Educated and married children have a higher tendency to provide transfers to their parents; and oldest sons are less likely to provide transfers than their younger brothers. With future continued rapid economic growth in China, the income disadvantage of the elderly will persist and upward generational transfers will likely remain the most common form of private transfers. In the absence of some other source of elderly support (such as a public pension or own savings), the dwindling number of children implies that the financial burden associated with supporting the elderly is likely to increase.

 

Econ 206 Term Paper, finally done!!! Thank Prof. Kibris and Prof. Becker!

Abstract: Pre-exam recruitment (PER) self-arranged by colleges in China is the alternative admission method outside the centralized National College Entrance Examination (NCEE) System, and it has become increasingly prevalent among colleges in recent years. We attribute this rapid spread of PER to the reform of the admission policy, which renders the admission mechanism vulnerable to college manipulation. The former “sequential mechanism” is equivalent to the Boston mechanism, and it is not strategy-proof for students or colleges. Under that old mechanism, students strategize at a sophisticated level that leaves no incentive for colleges to manipulate, since the overall space for Pareto improvement is limited. The reform brings about the “parallel mechanism”, which generates matchings that are equivalent to those of both the student-optimal and college-optimal deferred acceptance algorithms under the acyclic priority structure. Since students can only submit a fixed length of preference list in the application, this new mechanism is strategy-proof for them. However, it is manipulable for individual colleges because PER allows them to reallocate their type-specific quotas. Colleges can reduce the quality gap between different types of students, so as to improve the overall quality of students admitted. However, unlike the manipulation under the inefficient “sequential mechanism”, the “parallel mechanism”, which is college-optimal, has no space for Pareto improvement for all the colleges as a whole. Under this mechanism, individual colleges benefit themselves at the expense of other colleges, but some students with cyclic preferences strictly benefit from PER as they are able to attend colleges they prefer. In equilibrium, all the colleges participate in PER and allocate their quotas in accordance with the distributions of students across types.
However, this equilibrium is unattainable, as the proportion of capacities set aside for PER can never be 100% under the existing education policy. Colleges thus keep enlarging this proportion in response to other colleges’ PER, and this may partly explain why we observe the increasing prevalence of PER.

Coauthored with Xiaoyan Lei, James P. Smith and Yaohui Zhao. Full paper link here.

Abstract: 
Using the China Health and Retirement Longitudinal Study (CHARLS) 2008 pilot, the authors investigate the relationship between cognitive abilities and social activities for people aged 45 or older. They group cognition measures into two dimensions: intact mental status and episodic memory. Social activities are defined as participating in certain common specified activities in China such as playing chess, card games, or Mahjong, interacting with friends, and other social activities. OLS association results show that playing Mahjong, chess or card games and interacting with friends are significantly related to episodic memory, both individually and taken as a whole (any of the 3 activities), but individually they are not related to mental intactness, while taken as a whole they are. Because social activities may be endogenous, they further investigate, using OLS reduced-form models, whether having facilities that enable social activities at the community level is related to cognition. They find that having an activity center in the community is significantly related to higher episodic memory but is not related to mental intactness. These results point to a possible causal relationship between social activities and cognitive function, especially in strengthening short-term memory.

First draft, criticism and suggestions are welcome 🙂 Thank Jim, Prof. Lei, and Prof. Zhao.

The full paper link: click here

Abstract:
In this paper, we model gender differences in cognitive ability in China using a new sample of middle-aged and older Chinese respondents. Modeled after the American Health and Retirement Survey (HRS), the CHARLS Pilot survey respondents are 45 years and older in two quite distinct provinces: Zhejiang, a high-growth industrialized province on the East Coast, and Gansu, a largely agricultural and poor province in the West. Our measures of cognition in CHARLS rely on two measures that proxy for different dimensions of adult cognition: episodic memory and intact mental status. We relate both these childhood health measures to adult health and SES outcomes during the adult years. We find large cognitive differences to the detriment of women that were mitigated by large gender differences in education among these generations of Chinese people. These gender differences in cognition are especially concentrated within poorer communities in China with gender difference being more sensitive to community level attributes than to family level attributes, with economic resources. In traditional poor Chinese communities, there are strong economic incentives to favor boys at the expense of girls not only in their education outcomes, but in their nutrition and eventually their adult height. These gender cognitive differences have been steadily decreasing across birth cohorts as the economy of China grew rapidly. Among younger cohorts of young adults in China, there is no longer any gender disparity in cognitive ability.