# Program: Recent Advances in Info-Metrics Research April 8, 2021

All times Eastern Daylight Time. Presentation titles link to abstracts below.

## Workshop Openers

Welcoming Remarks: **Max Paul Friedman** (Dean, College of Arts and Sciences, American U)

10:00-10:10

Opener/Objectives: **Aman Ullah** (UC, Riverside)

10:10-10:20

## Session I — Invited: Deep Inference 10:20-11:10

- Chair
- Robin Lumsdaine (American U)
- Speaker
**Karl J. Friston** (Wellcome Trust Centre for Neuroimaging, Institute of Neurology, UCL, UK)

Title: **Deep Inference**

Break: 11:10-11:20

## Session II: Info-Metrics Across Sciences 11:20-12:20

- Chair
- John Harte (UC Berkeley)
- Speaker 1
**Michael Stutzer** (Leeds School of Business, U Colorado, Boulder)

Title: **Entropy in Asset Pricing Research**

- Speaker 2
**Wojciech (Wojtek) Szpankowski** (Computer Science and Director, Center for Science of Information, Purdue)

Title: **Precise Minimax Regret for Logistic Regression**

- Speaker 3
**Ariel Caticha** (Department of Physics, SUNY-Albany)

Title: **Entropic Dynamics on Statistical Manifolds**

## Session III: Information Processing in Cell-Cell Communication 12:20-1:00

- Chair
- Nataly Kravchenko-Balasha (Laboratory of Biophysics and Cancer Research, Hebrew U)
- Speaker 1
**Amir Erez** (Racah Institute of Physics, Hebrew U, Jerusalem)

Title: **Cell-to-Cell Information at a Feedback-Induced Bifurcation Point**

- Speaker 2
**Bo Sun** (Department of Physics, Oregon State U)

Title: **Information Flow in Multicellular Systems — from One to Many**

Lunch Break: 1:00-1:30

## Session IV: Information-Theoretic Econometrics, Statistics and Bayesian 1:30-2:30

- Chair
- Essie Maasoumi (Emory U)
- Speaker 1
**Eric Renault** (Economics, U of Warwick)

Title: **Identification of Beliefs in the Presence of Disaster Risk and Misspecification**

- Speaker 2
**Duncan Foley** (New School for Social Research)

Title: **Bayesian Quantum Wave/Amplitude Inference in Urn Mixture Models**

- Speaker 3
**Aman Ullah** (Economics, UC Riverside)

Title: **A Flexible Information Theoretic Approach for Inference of Multiple Regression Function and Marginal Effects**

## Session V: Info-Metrics in Data Science 2:30-3:10

- Organizer
- Min Chen (Department of Engineering Science, Oxford U)
- Chair
- Duncan Foley (New School for Social Research)
- Speaker 1
**Mateu Sbert** (Graphics and Imaging Laboratory, U of Girona)

Title: **The Role of the Information Channel in Data Analysis**

- Speaker 2
**Min Chen**

Title: **Finding a Bounded Measure for Estimating the Benefit of Data Visualization**

Break: 3:10-3:30

## Session VI: Info-Metrics and AI Modeling 3:30-4:30

- Chair
- Eric Renault (Economics, U of Warwick)
- Speaker 1
**William F. Lawless** (Math & Psychology, Paine College) and Ira S. Moskowitz (NRL)

Title: **Info-Metrics for Autonomous Human-Machine Teams and Systems (A-HMT-S)**

- Speaker 2
**Ellis Scharfenaker** (Economics, U of Utah) and Duncan Foley

Title: **Unfulfilled Expectations and Labor Market Interactions: An Info-Metrics Approach**

- Speaker 3
**Ximing Wu** (Agricultural Economics, Texas A&M)

Title: **Maximum Entropy Distribution Based on Quantile-grouped Interval Averages**

Break: 4:30-4:45

## Session VII: Info-Metrics in Action — Short Presentations 4:45-5:45

- Chair
- Rossella Bernardini Papalia (Department of Statistical Sciences, U Bologna)
- Speaker 1
- Tinatin Mumladze & Danielle Wilson (Department of Economics, American U)

Title: **Effect of Policy-relevant Factors on COVID-19 Patient Survival Probability: An Information-Theoretic Analysis**

- Speaker 2
- Irene Bardi (Department of Statistical Sciences, U Bologna)

Title: **Big Data, Data Quality and Entropy: A Proposal for Data Quality Analysis of Digital Platforms**

- Speaker 3
- Danielle Wilson (Department of Economics, American U)

Title: **An Info-Metrics Approach to Estimating the Supplemental Poverty Rates of Public Use Microdata Areas**

- Speaker 4
- Second Bwanakare (U of Information Technology and Management, Poland)

Title: **Introduction to Non-Extensive Cross Entropy (NCE) for Social Phenomena Modelling**

## Session VIII — Invited: Veridical Data Science and AI 5:45-6:35

- Chair
- J. Michael Dunn (Indiana U, Bloomington)
- Speaker
**Bin Yu** (Departments of Statistics and Electrical Engineering and Computer Sciences, UC Berkeley)

Title: **Veridical Data Science for Responsible AI: Characterizing V4 Neurons through DeepTune**

Concluding Remarks

6:45-7:00

**Amos Golan** (American U and SFI) and **J. Michael Dunn** (Indiana U, Bloomington)

# Abstracts

Listed in chronological order of the workshop program.

# Deep Inference Karl J. Friston

In the cognitive neurosciences and machine learning, we have formal ways of understanding and characterising perception and decision-making; however, the approaches appear very different: current formulations of perceptual synthesis call on theories like predictive coding and the Bayesian brain hypothesis. Conversely, formulations of decision-making and choice behaviour often appeal to reinforcement learning and the Bellman optimality principle. On the one hand, the brain seems to be in the game of optimising beliefs about how its sensations are caused; on the other hand, our choices and decisions appear to be governed by value functions and reward. Are these formulations irreconcilable, or is there some underlying imperative that renders perceptual inference and decision-making two sides of the same coin?

**Key words**: active inference ∙ cognitive ∙ dynamics ∙ free energy ∙ epistemic value ∙ self-organization

# Entropy in Asset Pricing Research Michael Stutzer

Three widely studied asset pricing research problems are the data-based prediction of derivative securities prices, the estimation and testing of extant asset pricing models, and the choice of shortfall-minimizing retirement portfolios. Entropy-based methods have been fruitfully used in all three areas. In addition to the familiar axiomatic, information-theoretic rationalization of entropic approaches to these problems, the statistical theory of large deviations provides an appealing frequentist rationale.

# Precise Minimax Regret for Logistic Regression P. Jacquet (INRIA), G. Shamir (Google) and W. Szpankowski (Purdue)

In this talk we shall discuss online logistic regression with binary labels and general feature values, in which a learner sequentially tries to predict an outcome/label based on data/features received in rounds. Our goal is to evaluate precisely the (maximal) minimax regret, which we analyze using a unique and novel combination of information-theoretic and analytic combinatoric tools such as the Fourier transform, the saddle point method, and the Mellin transform in multi-dimensional settings.

To be more precise, the pointwise regret of an online algorithm is defined as the (excess) loss it incurs over a constant comparator (weight vector) that is used for prediction. It depends on the feature values, label sequence, and the learning algorithm. In the maximal minimax scenario we seek the best weights for the worst label sequence over all label distributions.

For dimension $d = o(T^{1/3})$ we show that the maximal minimax regret grows as

$$\frac{d}{2}\log\left(\frac{2T}{\pi}\right) + c_d + O\left(\frac{d^{3/2}}{\sqrt{T}}\right)$$

where $T$ is the number of rounds of running a training algorithm and $c_d$ is an explicitly computable constant that depends on the dimension $d$ and the data. For features uniformly distributed on a $d$-dimensional sphere or ball we estimate the constant $c_d$ precisely, showing that $c_d \sim -(d/2)\log(d/\sqrt{2\pi})$, leading to the minimax regret growing for large $d$ as $(d/2)\log(T/d) - (\sqrt{8\pi}) + O(1)$. We also extend these results to non-binary labels. The precise maximal minimax regret presented here is the first result of this kind for any feature values and a wide range of $d$.

# Entropic Dynamics on Statistical Manifolds Ariel Caticha

Entropic dynamics is a framework in which the laws of dynamics are derived as an application of entropic methods of inference. The most detailed applications so far have been mostly in physics (quantum mechanics, quantum field theory, and gravity) but the basic ideas are applicable to any generic system described by a probability distribution. The dynamics unfolds on a statistical manifold that is automatically endowed with a metric structure provided by information geometry. The curvature of the manifold can exert a significant influence. A central feature is that the model includes an “entropic” notion of time that is tailored to the system under study; the system is its own clock. As one might expect, entropic time is intrinsically directional; there is a natural arrow of time which leads to a simple description of the approach to equilibrium. (This work has been carried out in collaboration with Pedro Pessoa and Felipe Xavier Costa.)

# Cell-to-cell Information at a Feedback-induced Bifurcation Point A. Erez, T.A. Byrd, M. Vennettilli, A. Mugler

A ubiquitous way that cells share information is by exchanging molecules. Yet, the fundamental ways that this information exchange is influenced by intracellular dynamics remain unclear. Here we use information theory to investigate a simple model of two interacting cells with internal feedback. We show that cell-to-cell molecule exchange induces a collective two-cell critical point and that the mutual information between the cells peaks at this critical point. Information can remain large far from the critical point on a manifold of cellular states but scales logarithmically with the correlation time of the system, resulting in an information-correlation time trade-off. This trade-off is strictly imposed, suggesting the correlation time as a proxy for the mutual information.

# Information Flow in Multicellular Systems — from One to Many Bo Sun

*In collaboration with Ayelet Lesman lab (TAU), Mike Overholtzer lab (MSKCC), and Assaf Zaritsky (Ben Gurion U) with contributions from Amos Zamir, Assaf Nahum, Yishaia Zabary, Maor Boublil, Chen Galed.*

Abstract forthcoming.

**Related papers:**

- Ferroptosis occurs through an osmotic mechanism and propagates independently of cell rupture
- Emergence of synchronized multicellular mechanosensing from spatiotemporal integration of heterogeneous single-cell information transfer
- Quantifying the dynamics of long-range cell-cell mechanical communication

# Identification of Beliefs in the Presence of Disaster Risk and Misspecification Saraswata Chaudhuri, Eric Renault, Oscar Wahlstrom

This paper discusses the econometric underpinnings of Barro (2006)’s defense of the rare disaster model as a way to bring back an asset pricing model “into the right ballpark for explaining the equity-premium and related asset-market puzzles”. By construction, arbitrarily low-probability economic disasters can restore the validity of model-implied moment conditions only if the amplitude of economic disaster may be arbitrarily large in due proportion. Unfortunately, we prove an impossibility theorem stating that in the case of potentially unbounded disasters, there is no such thing as a population empirical likelihood-based model-implied probability distribution. In other words, one cannot identify any belief distortions for which the empirical likelihood-based implied probabilities in sample, as computed by Julliard and Ghosh (2012), could be a consistent estimator. This may lead one to consider alternative statistical discrepancy measures to avoid the perverse behavior of empirical likelihood. By application of Csiszar (1995)’s generalized projections, we do prove that, under sufficient integrability conditions, power divergence Cressie-Read measures with a positive power coefficient properly define a unique population model-implied probability measure. However, when this computation is useful because the reference asset pricing model is misspecified, each power divergence will deliver a different model-implied belief distortion. Unfortunately, since empirical likelihood is ill behaved, there is no compelling argument for choosing one of these alternative discrepancy measures.

# Maximum Entropy Distribution Based on Quantile-grouped Interval Averages Ximing Wu

Samples are often summarized by a frequency table of intervals with known boundaries. It is widely known that the Maximum Entropy (ME) distribution based on this type of summary is the histogram. Another common format of data summary is the averages of fixed frequency intervals (for instance, the average of each sample decile). The histogram is not applicable here since typically the defining interval boundaries are not reported. We show that a maximum entropy quantile function arises naturally from entropy optimization subject to this type of summary statistics. We establish the existence and uniqueness of this solution. An algorithm and closed form solutions for the ME quantile function and its associated distribution, density and Lorenz curve are derived. A number of illustrations are provided. We also compare the informativeness of equal interval grouping and equal frequency grouping within the diagram of ME estimation.

# A Flexible Information Theoretic Approach for Inference of Multiple Regression Function and Marginal Effects Amos Golan, Tae-Hwy Lee, Millie Yi Mao, Aman Ullah

We develop an estimation procedure of multiple regression function and its derivatives (for marginal effects) based on multivariate maximum entropy (ME) density estimation. In estimating a multivariate density, the number of moment constraints in inferring a joint density function increases at a rate much faster than the dimension of the variables. This is because it depends on high order moments and cross moments of the variables. Hence, reducing the moment constraints by selecting the relevant moments is essential. We propose two methods for selecting the moment constraints. One is based on a Lasso-type regularization and the other is by testing the significance level of additional moment constraints. The ME-based estimators are asymptotically normal and root-n convergent. Monte Carlo simulations show that our proposed moment selection methods are consistent and produce the correct joint density estimation leading to superior estimation and prediction of the regression function and its derivatives. It is also shown that our entropy-based procedure outperforms nonparametric kernel-based methods in estimating and predicting the regression function and its derivatives. An empirical example of the out-of-sample prediction of the wage using education and experience shows that the ME method produces smaller mean squared prediction errors relative to parametric and nonparametric methods.

**Key words**: Shannon Entropy ∙ Information theory ∙ Multivariate maximum entropy distributions ∙ Multiple regression ∙ Recursive integration ∙ Moment constraint selection ∙ Lasso

**JEL Classification**: C1, C3, C5

# The Role of the Information Channel in Data Analysis Mateu Sbert, Amos Golan, Miquel Feixas, Shuning Chen, Marius Vila

The concept of an information (or communication) channel was introduced by Shannon to model the communication between source and receiver. This concept has been general enough to be applied to the joint distribution of any two variables and, beyond its original application to communication, has shown its usefulness in understanding and modeling data in several branches of science. We review here the basic concepts of the information channel and consider its application to the social sciences, and in particular to the social accounting matrix.
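As a rough illustration of treating a joint distribution of two variables as a channel, the sketch below computes the mutual information between the two variables from a joint probability table. The function name and the two example tables are hypothetical and are not taken from the talk; the talk's own treatment of the social accounting matrix may differ.

```python
import math

def mutual_information(joint):
    """Mutual information I(X;Y) in bits.

    `joint` is a list of rows holding the joint probabilities p(x, y);
    the entries are assumed to be non-negative and to sum to 1.
    """
    px = [sum(row) for row in joint]            # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]      # marginal of Y (column sums)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                         # zero cells contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Hypothetical examples: independent variables versus a noiseless binary channel.
independent = [[0.25, 0.25], [0.25, 0.25]]
noiseless = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(noiseless))
```

Independent variables carry no information about each other, while a noiseless binary channel with equiprobable inputs transmits one full bit.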

# Finding a Bounded Measure for Estimating the Benefit of Data Visualization Min Chen, Mateu Sbert, Alfie Abdul-Rahman, Deborah Silver

Most measurement systems are not ground truth. They are functions that map some reality to some quantitative values, in order to aid the explanation of that reality and the making of predictions. The cost-benefit measure proposed by Chen and Golan is one such function. While the cost-benefit measure successfully captures trade-offs in various data intelligence workflows, its values can easily shoot up toward infinity, hindering the reconstruction of the reality from the measured values. Motivated by the practical need for a more interpretable cost-benefit measure, we have started a journey to find a function where the measurement of "benefit" is bounded in relation to the entropy of the information space concerned.

# Info-Metrics for Autonomous Human-Machine Teams and Systems (A-HMT-S) W.F. Lawless, Ira S. Moskowitz

The info-metrics for the performance of a team or an organization are likely to follow a rational approach that confronts noisy, incomplete and uncertain information organized around Shannon’s theory of information. When these metrics are applied, uncertainty is commonly located in methods, algorithms, models or incomplete data. But info-metrics as presently constituted have to be transformed to confront autonomous systems facing uncertain environments, especially when the uncertainty is caused by opposing autonomous systems (e.g., conflict, deception, competition). We address this more difficult problem in our research with our theory of the interdependence of complementarity.

**Key words**: interdependence ∙ complementarity ∙ bistability ∙ uncertainty and incompleteness ∙ non-factorable information and tradeoffs ∙ info-metrics ∙ geometry

# Unfulfilled Expectations and Labor Market Interactions: An Info-Metrics Approach Ellis Scharfenaker & Duncan Foley

We apply the principle of maximum entropy to the problem of inferring equilibrium outcomes in a labor market composed of informational-entropy constrained workers and employers. The model predicts persistent unemployment and job vacancies as a statistical feature of labor market interactions and unfulfilled expectations. A downward-sloping convex Beveridge curve emerges as a feature of the model. Worker and employer interactions also feed back on the wage and determine a statistical equilibrium wage distribution defined by individual behavioral parameters as well as market-level parameters.

# Bayesian Quantum Wave/Amplitude Inference in Urn Mixture Models Duncan Foley

We compute the posterior probability of compositions of two urns from a sample drawn from them but without knowledge of which urn each sample observation comes from using both the classical probability method, where the frequency distributions describing the urns are real nonnegative numbers that sum to 1, and the quantum wave/amplitude method, where the frequency distributions describing the urns are complex numbers whose squared magnitudes sum to 1. When the source of sample observations is not known, these posterior probabilities differ due to an interference between the unobserved phases of the descriptions of the urns. The quantum wave method must imply posterior probabilities that compress the sample data at least as well as the classical method.

# Effect of Policy-relevant Factors on COVID-19 Patient Survival Probability: An Information-theoretic Analysis Amos Golan, Tinatin Mumladze, Danielle Wilson

The possibility of recurring waves of the novel coronavirus that triggered the 2020 pandemic makes it critical to identify underlying policy-relevant factors that could be leveraged to decrease future COVID-19, and other SARS-related, death rates. We examine variation in several underlying, policy-relevant, country-level factors and COVID-19 patient death rates across twenty countries. We find three such factors that significantly impact the survival probability of patients infected with COVID-19. In order of impact, these are universal TB (BCG) vaccination policies, air pollution deaths and health-related expenditure. We quantify each probability change by age and sex. To deal with our small sample size, high correlations, and inference at the tails of the distribution, we use an information-theoretic inferential method that also allows us to introduce prior information. These priors are constructed from independent SARS data.

# Big Data, Data Quality, and Entropy: A Proposal for Data Quality Analysis of Digital Platforms Irene Bardi

Data is nowadays everywhere. During the past 20 years, the world data flow grew explosively in terms of speed, dimensions and types.

Data enters into every sphere of our life (society, economy, privacy and politics) with or without our knowledge of it happening. We are currently living in a data economy. This phenomenon is called ‘Big Data’, and in this work it will be presented from different points of view: definitions, sources, history, properties and, most important, its capacity to produce value.

The value of data is strictly connected to its use: through a process of elaboration and aggregation, as well as through the use of algorithms, Big Data is used to create a typed schema of individual behaviour that reveals correlation insights between choices, behaviours, actions, preferences, etc. Consequently, Big Data is a useful means to understand the needs of consumers and to build highly predictive models for supply and demand, services, cultural content, etc., but also to show their evolution. However, even if the amount of data is constantly growing, that amount can create real value only if combined with quality: the poor Data Quality of Big Data can easily lead to big errors, as well as to low efficiency and decision-making mistakes.

In this work I summarized many aspects of the Data Quality of Big Data, analysing the related challenges by providing definitions, indicators and good practices; I also reported a hierarchical structure of a Data Quality framework, together with some methods and techniques for Data Quality improvement. Then I described the application of an innovative approach to Data Quality assessment that uses the entropy concept from information theory. Lastly, I described the phases of a Data Quality project that I took part in at a real company, and some of the procedures that were carried out to improve the Data Quality.

# An Info-Metrics Approach to Estimating the Supplemental Poverty Rates of Public Use Microdata Areas Danielle Wilson

The Supplemental Poverty Measure (SPM) is an extension of the Official Poverty Measure (OPM) that considers non-cash benefits, tax credits and necessary expenses when determining an individual’s poverty status. The U.S. Census Bureau annually produces SPM estimates using the Current Population Survey’s Annual Social and Economic Supplement (CPS-ASEC). Although the CPS-ASEC collects detailed income and relationship data, it can only produce estimates at the national and state level. Although the larger American Community Survey (ACS) can produce estimates at a more disaggregate level, it does not collect the detailed data necessary to produce SPM estimates. In 2020, the Census Bureau published a series of ACS micro-data sets with imputed values for the necessary components considered by the SPM. However, the ACS estimates differ from their CPS-ASEC counterparts at both the national and state level. These differences are likely due to the additional imputation needed in the ACS, but their magnitude is unknown at the Public Use Microdata Area (PUMA) level. Using an information-theoretic approach proposed by Papalia & Fernandez-Vasquez (2020), the relative error of disaggregate SPM rates in the ACS is estimated by constraining their weighted average to more reliable state-level CPS-ASEC SPM rates. The estimated (relative) errors are subsequently used to refine all original average PUMA ACS rates for 2016 to 2018.

# Introduction to Non-extensive Cross-entropy (NCE) for Social Phenomena Modelling Second Bwanakare

**Motivation:** The Tsallis NCE is suitable for statistical modelling of complex (non-ergodic) systems. It generalises the Shannon-Gibbs entropy and depicts the level of complexity of the system.

**Main topics:**

- Power law (PL) in social sciences
- Tsallis entropy vs Shannon-Gibbs entropy
- q-Tsallis generalized Kullback-Leibler Information Divergence (q-TKL_ID)
- Inference Application
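To make the relationship between the Tsallis and Shannon-Gibbs entropies concrete, the following minimal sketch uses the standard textbook definitions of the Tsallis entropy and of the Tsallis relative entropy (a q-generalized Kullback-Leibler divergence), and checks numerically that both approach their Shannon counterparts as q tends to 1. The function names and the example distributions are hypothetical; the NCE estimator presented in the talk is not specified here and may be formulated differently.

```python
import math

def shannon_entropy(p):
    """Shannon-Gibbs entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def tsallis_entropy(p, q):
    """Tsallis entropy S_q = (1 - sum_i p_i**q) / (q - 1); Shannon at q = 1."""
    if q == 1:
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)

def kl_divergence(p, r):
    """Kullback-Leibler divergence D(p || r) in nats; assumes r_i > 0 where p_i > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, r) if x > 0)

def tsallis_divergence(p, r, q):
    """Tsallis relative entropy (sum_i p_i**q * r_i**(1 - q) - 1) / (q - 1)."""
    if q == 1:
        return kl_divergence(p, r)
    total = sum(x ** q * y ** (1.0 - q) for x, y in zip(p, r) if x > 0)
    return (total - 1.0) / (q - 1.0)

# Hypothetical distributions for illustration only.
p = [0.5, 0.3, 0.2]
prior = [1 / 3, 1 / 3, 1 / 3]

for q in (0.5, 0.999, 1.0, 1.001, 2.0):
    print(q, tsallis_entropy(p, q), tsallis_divergence(p, prior, q))
```

Values of q away from 1 weight rare and frequent events differently, which is one reason this family is often used for heavy-tailed (power-law) data.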

# Veridical Data Science for Responsible AI: Characterizing V4 Neurons through DeepTune Bin Yu

"AI is like nuclear energy — both promising and dangerous" — Bill Gates, 2019.

Data Science is a pillar of AI and has driven most of the recent cutting-edge discoveries in biomedical research. In practice, Data Science has a life cycle (DSLC) that includes problem formulation, data collection, data cleaning, modeling, result interpretation and the drawing of conclusions. Human judgment calls are ubiquitous at every step of this process, e.g., in choosing data cleaning methods, predictive algorithms and data perturbations. Such judgment calls are often responsible for the "dangers" of AI. To maximally mitigate these dangers, we developed a framework based on three core principles: Predictability, Computability and Stability (PCS). Through a workflow and documentation (in R Markdown or Jupyter Notebook) that allows one to manage the whole DSLC, the PCS framework unifies, streamlines and expands on the best practices of machine learning and statistics, bringing us a step forward towards veridical Data Science.

The PCS framework will be illustrated through the development of the DeepTune framework for characterizing V4 neurons. DeepTune builds predictive models using DNNs and linear regression and applies the stability principle to obtain stable interpretations of 18 predictive models. Finally, a general DNN interpretation method based on contextual decomposition (CD) will be discussed with applications to sentiment analysis and cosmological parameter estimation.