2021 IEEE East Asian School of Information Theory

Poster abstracts

[1] Haiyun He, Optimal change-point detection with training sequences in the large and moderate deviations regimes

  • Abstact: This paper investigates a novel offline change-point detection problem from an information-theoretic perspective. In contrast to most related works, we assume that the knowledge of the underlying pre- and post-change distributions are not known and can only be learned from the training sequences which are available. We further require the probability of the emph{estimation error} to decay either exponentially or sub-exponentially fast (corresponding respectively to the large and moderate deviations regimes in information theory parlance). Based on the training sequences as well as the test sequence consisting of a single change-point, we design a change-point estimator and further show that this estimator is optimal by establishing matching (strong) converses. This leads to a full characterization of the optimal confidence width (i.e., half the width of the confidence interval within which the true change-point is located at with high probability) as a function of the undetected error, under both the large and moderate deviations regimes.

[2] Jaewoong Cho, A fair classifier using kernel density estimation

  • Abstact: As machine learning becomes prevalent in a widening array of sensitive applications such as job hiring and criminal justice, one critical aspect in the design of machine learning classifiers is to ensure fairness: Guaranteeing the irrelevancy of a prediction to sensitive attributes such as gender and race. This work develops a kernel density estimation (KDE) methodology to faithfully respect the fairness constraint while yielding a tractable optimization problem that comes with high accuracy-fairness tradeoff. One key feature of this approach is that the fairness measure quantified based on KDE can be expressed as a differentiable function w.r.t. model parameters, thereby enabling the use of prominent gradient descent to readily solve an interested optimization problem. This work focuses on classification tasks and two well-known measures of group fairness: demographic parity and equalized odds. We empirically show that our algorithm achieves greater or comparable performances against prior fair classifers in accuracy-fairness tradeoff as well as in training stability on both synthetic and benchmark real datasets.

[3] Moonseok Choi, Equal experience of recommender systems

  • Abstact: We explore the fairness issue that arises in recommender systems. Biased data due to inherent stereotypes of particular groups (e.g., female students prefer humanities to mathematics, and vice versa for males) may yield a limited scope of suggested items to a certain group of users. Our main contribution lies in the introduction of a novel fairness notion (that we call equal experience), which can serve to regulate such unfairness in the presence of biased data. The notion captures the degree of the equal experience of item recommendations across distinct groups. We propose an optimization framework that incorporates the fairness notion as a regularization term, as well as introduce computationally-efficient algorithms that solve the optimization while faithfully implementing the regularization term. Experiments on synthetic and benchmark real datasets demonstrate that the proposed framework can indeed mitigate such unfairness while exhibiting a minor degradation of recommendation accuracy.

[4] Geewon Suh, When to use graph side information in matrix completion

  • Abstact: We consider a matrix completion problem that leverages graph as side information. One common approach in recently developed efficient algorithms is to take a two-step procedure: (i) clustering communities that form the basis of the graph structure; (ii) exploiting the estimated clusters to perform matrix completion together with iterative local refinement of clustering. A major limitation of the approach is that it achieves the information-theoretic limit on the number of observed matrix entries, promised by maximum likelihood estimation, only when a sufficient amount of graph side information is provided (the quantified measure is detailed later). The contribution of this work is to develop a computationally efficient algorithm that achieves the optimal sample complexity for the entire regime of graph information. The key idea is to make a careful selection for the information employed in the first clustering step, between two types of given information: graph and matrix ratings. Our experimental results conducted both on synthetic and real data confirm the superiority of our algorithm over the prior approaches in the scarce graph information regime.

[5] Junhyung Ahn, Matrix completion with hierarchical graph side information

  • Abstact: We consider a matrix completion problem that exploits social or item similarity graphs as side information. We develop a universal, parameter-free, and computationally efficient algorithm that starts with hierarchical graph clustering and then iteratively refines estimates both on graph clustering and matrix ratings. Under a hierarchical stochastic block model that well respects practically-relevant social graphs and a low-rank rating matrix model (to be detailed), we demonstrate that our algorithm achieves the information-theoretic limit on the number of observed matrix entries (ie, optimal sample complexity) that is derived by maximum likelihood estimation together with a lower-bound impossibility result. One consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to the one that simply identifies different groups without resorting to the relational structure across them. We conduct extensive experiments both on synthetic and real-world datasets to corroborate our theoretical results as well as to demonstrate significant performance improvements over other matrix completion algorithms that leverage graph side information.

[6] Mingeun Kang, Autoencoder-based graph construction for semi-supervised learning

  • Abstact: We consider graph-based semi-supervised learning that leverages a similarity graph across data points to better exploit data structure exposed in unlabeled data. One challenge that arises in this problem context is that conventional matrix completion which can serve to construct a similarity graph entails heavy computational overhead, since it re-trains the graph independently whenever model parameters of an interested classifier are updated. In this paper, we propose a holistic approach that employs a parameterized neural-net-based autoencoder for matrix completion, thereby enabling simultaneous training between models of the classifier and matrix completion. We find that this approach not only speeds up training time (around a three-fold improvement over a prior approach), but also offers a higher prediction accuracy via a more accurate graph estimate. We demonstrate that our algorithm obtains state of-the-art performances by respectful margins on benchmark dataset.

[7] Kiwon Lee, Predicting kidney disease via fundus images

  • Abstact: We consider a deep learning framework to detect chronic kidney disease (CKD). As the recent increase in life expectancy has led to an aging society, detection for CKD is a new challenge. One challenge that arises in screening of CKD is that experts and analytic device are required to draw blood out and calculate the estimated glomerular filtration rate (eGFR), which is a milestone in measuring kidney function. In this paper, we propose a holistic deep learning framework to diagnose chronic kidney disease based on both retinal photographs and results of urine dipstick. We employed data from CHA hospitals in South Korea to develop and evaluate the deep learning algorithm. The area under the curve (AUC) was used as performance metrics. We demonstrate that our algorithm achieves the AUC performance of 93.95 (95 CI 92.96-94.94) for chronic kidney disease on the internal test dataset.

[8] Geonho Han, Multi-vehicle velocity estimation using IEEE 802.11ad waveform

  • Abstact: Wireless communication systems are to use millimeter-wave (mmWave) spectra, which can enable extra radar functionalities. In this paper, we propose a multi-target velocity estimation technique using IEEE 802.11ad waveform in a vehicle-to-vehicle (V2V) scenario. We form a wide beam to consider multiple target vehicles. The Doppler shift of each vehicle is estimated from least square estimation (LSE) using the round-trip delay obtained from the auto-correlation property of Golay complementary sequences in IEEE 802.11ad waveform, and the phase wrapping is compensated by the Doppler shift estimates of proper two frames. Finally, the velocities of target vehicles are obtained from the estimated Doppler shifts. Simulation results show the proposed velocity estimation technique can achieve significantly high accuracy even for short coherent processing interval (CPI).

[9] Giseung Park, Blockwise sequential model learning for partially observable reinforcement learning

  • Abstact: In this work, we propose a new sequential model learning to solve control problems when the environment is partially observable. Rather than compressing sequential information at every timestep as conventional recurrent neural network-based methods such as model-free or variational approaches, the proposed architecture generates a latent variable in each data block and passes the most relevant information to the next block for policy optimization. The proposed blockwise sequential model is implemented based on self-attention, making the model capable of detailed sequential learning in partial observable settings. The proposed model builds an additional learning network to efficiently implement the gradient estimation using self-normalized importance sampling, bypassing the complex blockwise input data reconstruction in the model learning. Numerical results show that the proposed method outperforms previous approaches including the state-of-the-art algorithm for partially observable reinforcement learning in the considered benchmark environments.

[10] Haoning Chen, Coded computing for master-aided distributed computing systems

  • Abstact: We consider a MapReduce-type task running in a distributed computing model which consists of K edge computing nodes distributed across the edge of the network and a Master node that assists the edge nodes to compute output functions. The Master node and the edge nodes, both equipped with some storage memories and computing capabilities, are connected through a multicast network. We define the communication time spent during the transmission for the sequential implementation (all nodes send symbols sequentially) and parallel implementation (the Master node can send symbols during the edge nodes’ transmission), respectively. We propose a mixed coded distributed computing scheme that divides the system into two subsystems where the coded distributed computing (CDC) strategy proposed by Songze Li et al. is applied into the first subsystem and a novel master-aided CDC strategy is applied into the second subsystem. We prove that this scheme is optimal, i.e., achieves the minimum communication time for both the sequential and parallel implementation, and establish an optimal information-theoretic tradeoff between the overall communication time, computation load, and the Master node’s storage capacity. It demonstrates that incorporating a Master node with storage and computing capabilities can further reduce the communication time. For the sequential implementation, we deduce the approximately optimal file allocation between the two subsystems, which shows that the Master node should map as many files as possible in order to achieve smaller communication time. For the parallel implementation, if the Master node’s storage and computing capabilities are sufficiently large (not necessary to store and map all files), then the proposed scheme requires at most 1/2 of the minimum communication time of system without the help of the Master node.

[11] Hwanjin Kim, Massive MIMO channel prediction: Kalman filtering vs. machine learning

  • Abstact: This paper focuses on channel prediction techniques for massive multiple-input multiple-output (MIMO) systems. Previous channel predictors are based on theoretical channel models, which would be deviated from realistic channels. In this paper, we develop and compare a vector Kalman filter (VKF)-based channel predictor and a machine learning (ML)-based channel predictor using the realistic channels from the spatial channel model (SCM), which has been adopted in the 3GPP standard for years. First, we propose a low-complexity mobility estimator based on the spatial average using a large number of antennas in massive MIMO. The mobility estimate can be used to determine the complexity order of developed predictors. The VKF-based channel predictor developed in this paper exploits the autoregressive (AR) parameters estimated from the SCM channels based on the Yule-Walker equations. Then, the ML-based channel predictor using the linear minimum mean square error (LMMSE)-based noise pre-processed data is developed. Numerical results reveal that both channel predictors have substantial gain over the outdated channel in terms of channel prediction accuracy and data rate. The ML-based predictor has larger overall computational complexity than the VKF-based predictor, but once trained, the operational complexity of ML-based predictor becomes smaller than that of VKF-based predictor.

[12] Jongeui Park, active learning for imbalanced datasets based on normalized prediction error

  • Abstact: In this paper, we consider active learning for imbalanced datasets in which class imbalance exists and propose a new active learning scheme for imbalanced datasets. The proposed scheme is composed of three main parts: pre-training of the representation using both labeled and unlabeled samples to extract good representation, a novel self-referenced prediction error-based acquisition function that fits into the representation pre-training, and a mechanism to compensate for the effect of class imbalance. We evaluate the proposed scheme on multiple types of class imbalance, and numerical results show that the proposed scheme yields noticeable performance gain over existing active learning algorithms.

[13] Kyungrak Son, Distributed matrix multiplication using group algebra for on-device edge computing

  • Abstact: Leveraging the idea of group theory, we explore the distributed matrix multiplication problem posed in on-device edge computing. We first revisit how to embed matrix multiplication into group structure and then propose a distributed matrix multiplication scheme using the cyclic group. We identify the condition for the perfect reconstruction with the proposed scheme, and exhibit that the proposed scheme has better error performance than uncoded scheme in a noisy channel owing to the diversity gain of the proposed scheme.

[14] Seunghye Lee, Privacy-preserving estimation over wireless network

  • Abstact: In the era of big data, more and more user data is being collected, which leads to potential invasion of privacy. In this poster session, we will talk about basic privacy definition and popular LDP algorithms. And also based on preliminaries, we propose a privacy protection mechanisms for estimation over wireless network. In the Binary+Gaussian channel model, we suggested a Unbiased Estimator to infer the underlying statistics theta from a large dataset, while ensuring the privacy protection of each individual’s data. And finally, we calculated Privacy-Utility Tradeoff in given model.

[15] Seungyul Han, Diversity Actor-Critic (DAC): Sample-aware entropy regularization for sample-efficient exploration

  • Abstact: In this paper, sample-aware policy entropy regularization is proposed to enhance the conventional policy entropy regularization for better exploration. Exploiting the sample distribution obtainable from the replay buffer, the proposed sample-aware entropy regularization maximizes the entropy of the weighted sum of the policy action distribution and the sample action distribution from the replay buffer for sample-efficient exploration. A practical algorithm named diversity actor-critic (DAC) is developed by applying policy iteration to the objective function with the proposed sample-aware entropy regularization. Numerical results show that DAC significantly outperforms existing recent algorithms for reinforcement learning.

[16] Whiyoung Jung, Population-guided parallel policy search for reinforcement learning

  • Abstact: Parallel learning is one of the main research areas in reinforcement learning to enhance the control performance. In this work, we propose a new population-based parallel learning scheme to improve speed and stability of off-policy reinforcement learning algorithms. The proposed scheme has multiple learners with their own value-functions and policies and a chief for sharing a common information to all learners, and searches a good policy by exploiting the best policy information. The main contribution of our work is that the best policy information is used in a soft manner by constructing an augmented loss function for policy update to guide the population of policies to search a wide area in the policy space centered at the previous best policy. The guidance by the previous best policy and the enlarged search area enable faster and better policy search by escaping bad local optima. Furthermore, we prove monotone improvement of the expected cumulative return by the proposed scheme. In order to show the effectiveness of the proposed parallel learning scheme, we apply our parallel learning scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm, and compare its performance with other parallel learning schemes combined with TD3. Numerical results show that the proposed algorithm out-performs most of the current state-of-the-art parallel learning algorithms, and the gain is significant in the case of sparse reward environment.

[17] Woojun Kim, A variational approach to mutual information-based coordination for multi-agent reinforcement learning

  • Abstact: In this paper, we propose a new approach to the maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) to enable multiple agents to learn simultaneously coordinated behavior rather than causally coordinated behavior. By introducing a latent variable, we induce nonzero mutual information between multi-agent simultaneous actions. Then, we derive a tractable lower bound on the considered MMI-regularized objective function based on variational bound. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution (CTDE). We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC noticeably outperforms other MARL algorithms in multi-agent tasks requiring coordination.

[18] Laixi Shi, Provable convolutional sparse coding via nonconvex optimization

  • Abstact: Multi-channel sparse blind deconvolution, or convolutional sparse coding, refers to the problem of learning an unknown filter by observing its circulant convolutions with multiple input signals that are sparse. This problem finds numerous applications in signal processing, computer vision, and inverse problems. However, it is challenging to learn the filter efficiently due to the bilinear structure of the observations with respect to the unknown filter and inputs, as well as the sparsity constraint. In this paper, we propose a novel approach based on nonconvex optimization over the sphere manifold by minimizing a smooth surrogate of the sparsity-promoting loss function. It is demonstrated that manifold gradient descent with random initializations will provably recover the filter, up to scaling and shift ambiguity, as soon as the number of observations is sufficiently large under an appropriate random data model. Numerical experiments are provided to illustrate the performance of the proposed method with comparisons to existing ones.