2021 IEEE East Asian School of Information Theory
Poster abstracts
[1] Haiyun He, Optimal changepoint detection with training sequences in the large and moderate deviations regimes
Abstact: This paper investigates a novel offline changepoint detection problem from an informationtheoretic perspective. In contrast to most related works, we assume that the knowledge of the underlying pre and postchange distributions are not known and can only be learned from the training sequences which are available. We further require the probability of the emph{estimation error} to decay either exponentially or subexponentially fast (corresponding respectively to the large and moderate deviations regimes in information theory parlance). Based on the training sequences as well as the test sequence consisting of a single changepoint, we design a changepoint estimator and further show that this estimator is optimal by establishing matching (strong) converses. This leads to a full characterization of the optimal confidence width (i.e., half the width of the confidence interval within which the true changepoint is located at with high probability) as a function of the undetected error, under both the large and moderate deviations regimes.
[2] Jaewoong Cho, A fair classifier using kernel density estimation
Abstact: As machine learning becomes prevalent in a widening array of sensitive applications such as job hiring and criminal justice, one critical aspect in the design of machine learning classifiers is to ensure fairness: Guaranteeing the irrelevancy of a prediction to sensitive attributes such as gender and race. This work develops a kernel density estimation (KDE) methodology to faithfully respect the fairness constraint while yielding a tractable optimization problem that comes with high accuracyfairness tradeoff. One key feature of this approach is that the fairness measure quantified based on KDE can be expressed as a differentiable function w.r.t. model parameters, thereby enabling the use of prominent gradient descent to readily solve an interested optimization problem. This work focuses on classification tasks and two wellknown measures of group fairness: demographic parity and equalized odds. We empirically show that our algorithm achieves greater or comparable performances against prior fair classifers in accuracyfairness tradeoff as well as in training stability on both synthetic and benchmark real datasets.
[3] Moonseok Choi, Equal experience of recommender systems
Abstact: We explore the fairness issue that arises in recommender systems. Biased data due to inherent stereotypes of particular groups (e.g., female students prefer humanities to mathematics, and vice versa for males) may yield a limited scope of suggested items to a certain group of users. Our main contribution lies in the introduction of a novel fairness notion (that we call equal experience), which can serve to regulate such unfairness in the presence of biased data. The notion captures the degree of the equal experience of item recommendations across distinct groups. We propose an optimization framework that incorporates the fairness notion as a regularization term, as well as introduce computationallyefficient algorithms that solve the optimization while faithfully implementing the regularization term. Experiments on synthetic and benchmark real datasets demonstrate that the proposed framework can indeed mitigate such unfairness while exhibiting a minor degradation of recommendation accuracy.
[4] Geewon Suh, When to use graph side information in matrix completion
Abstact: We consider a matrix completion problem that leverages graph as side information. One common approach in recently developed efficient algorithms is to take a twostep procedure: (i) clustering communities that form the basis of the graph structure; (ii) exploiting the estimated clusters to perform matrix completion together with iterative local refinement of clustering. A major limitation of the approach is that it achieves the informationtheoretic limit on the number of observed matrix entries, promised by maximum likelihood estimation, only when a sufficient amount of graph side information is provided (the quantified measure is detailed later). The contribution of this work is to develop a computationally efficient algorithm that achieves the optimal sample complexity for the entire regime of graph information. The key idea is to make a careful selection for the information employed in the first clustering step, between two types of given information: graph and matrix ratings. Our experimental results conducted both on synthetic and real data confirm the superiority of our algorithm over the prior approaches in the scarce graph information regime.
[5] Junhyung Ahn, Matrix completion with hierarchical graph side information
Abstact: We consider a matrix completion problem that exploits social or item similarity graphs as side information. We develop a universal, parameterfree, and computationally efficient algorithm that starts with hierarchical graph clustering and then iteratively refines estimates both on graph clustering and matrix ratings. Under a hierarchical stochastic block model that well respects practicallyrelevant social graphs and a lowrank rating matrix model (to be detailed), we demonstrate that our algorithm achieves the informationtheoretic limit on the number of observed matrix entries (ie, optimal sample complexity) that is derived by maximum likelihood estimation together with a lowerbound impossibility result. One consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to the one that simply identifies different groups without resorting to the relational structure across them. We conduct extensive experiments both on synthetic and realworld datasets to corroborate our theoretical results as well as to demonstrate significant performance improvements over other matrix completion algorithms that leverage graph side information.
[6] Mingeun Kang, Autoencoderbased graph construction for semisupervised learning
Abstact: We consider graphbased semisupervised learning that leverages a similarity graph across data points to better exploit data structure exposed in unlabeled data. One challenge that arises in this problem context is that conventional matrix completion which can serve to construct a similarity graph entails heavy computational overhead, since it retrains the graph independently whenever model parameters of an interested classifier are updated. In this paper, we propose a holistic approach that employs a parameterized neuralnetbased autoencoder for matrix completion, thereby enabling simultaneous training between models of the classifier and matrix completion. We find that this approach not only speeds up training time (around a threefold improvement over a prior approach), but also offers a higher prediction accuracy via a more accurate graph estimate. We demonstrate that our algorithm obtains state oftheart performances by respectful margins on benchmark dataset.
[7] Kiwon Lee, Predicting kidney disease via fundus images
Abstact: We consider a deep learning framework to detect chronic kidney disease (CKD). As the recent increase in life expectancy has led to an aging society, detection for CKD is a new challenge. One challenge that arises in screening of CKD is that experts and analytic device are required to draw blood out and calculate the estimated glomerular filtration rate (eGFR), which is a milestone in measuring kidney function. In this paper, we propose a holistic deep learning framework to diagnose chronic kidney disease based on both retinal photographs and results of urine dipstick. We employed data from CHA hospitals in South Korea to develop and evaluate the deep learning algorithm. The area under the curve (AUC) was used as performance metrics. We demonstrate that our algorithm achieves the AUC performance of 93.95 (95 CI 92.9694.94) for chronic kidney disease on the internal test dataset.
[8] Geonho Han, Multivehicle velocity estimation using IEEE 802.11ad waveform
Abstact: Wireless communication systems are to use millimeterwave (mmWave) spectra, which can enable extra radar functionalities. In this paper, we propose a multitarget velocity estimation technique using IEEE 802.11ad waveform in a vehicletovehicle (V2V) scenario. We form a wide beam to consider multiple target vehicles. The Doppler shift of each vehicle is estimated from least square estimation (LSE) using the roundtrip delay obtained from the autocorrelation property of Golay complementary sequences in IEEE 802.11ad waveform, and the phase wrapping is compensated by the Doppler shift estimates of proper two frames. Finally, the velocities of target vehicles are obtained from the estimated Doppler shifts. Simulation results show the proposed velocity estimation technique can achieve significantly high accuracy even for short coherent processing interval (CPI).
[9] Giseung Park, Blockwise sequential model learning for partially observable reinforcement learning
Abstact: In this work, we propose a new sequential model learning to solve control problems when the environment is partially observable. Rather than compressing sequential information at every timestep as conventional recurrent neural networkbased methods such as modelfree or variational approaches, the proposed architecture generates a latent variable in each data block and passes the most relevant information to the next block for policy optimization. The proposed blockwise sequential model is implemented based on selfattention, making the model capable of detailed sequential learning in partial observable settings. The proposed model builds an additional learning network to efficiently implement the gradient estimation using selfnormalized importance sampling, bypassing the complex blockwise input data reconstruction in the model learning. Numerical results show that the proposed method outperforms previous approaches including the stateoftheart algorithm for partially observable reinforcement learning in the considered benchmark environments.
[10] Haoning Chen, Coded computing for masteraided distributed computing systems
Abstact: We consider a MapReducetype task running in a distributed computing model which consists of K edge computing nodes distributed across the edge of the network and a Master node that assists the edge nodes to compute output functions. The Master node and the edge nodes, both equipped with some storage memories and computing capabilities, are connected through a multicast network. We define the communication time spent during the transmission for the sequential implementation (all nodes send symbols sequentially) and parallel implementation (the Master node can send symbols during the edge nodes’ transmission), respectively. We propose a mixed coded distributed computing scheme that divides the system into two subsystems where the coded distributed computing (CDC) strategy proposed by Songze Li et al. is applied into the first subsystem and a novel masteraided CDC strategy is applied into the second subsystem. We prove that this scheme is optimal, i.e., achieves the minimum communication time for both the sequential and parallel implementation, and establish an optimal informationtheoretic tradeoff between the overall communication time, computation load, and the Master node’s storage capacity. It demonstrates that incorporating a Master node with storage and computing capabilities can further reduce the communication time. For the sequential implementation, we deduce the approximately optimal file allocation between the two subsystems, which shows that the Master node should map as many files as possible in order to achieve smaller communication time. For the parallel implementation, if the Master node’s storage and computing capabilities are sufficiently large (not necessary to store and map all files), then the proposed scheme requires at most 1/2 of the minimum communication time of system without the help of the Master node.
[11] Hwanjin Kim, Massive MIMO channel prediction: Kalman filtering vs. machine learning
Abstact: This paper focuses on channel prediction techniques for massive multipleinput multipleoutput (MIMO) systems. Previous channel predictors are based on theoretical channel models, which would be deviated from realistic channels. In this paper, we develop and compare a vector Kalman filter (VKF)based channel predictor and a machine learning (ML)based channel predictor using the realistic channels from the spatial channel model (SCM), which has been adopted in the 3GPP standard for years. First, we propose a lowcomplexity mobility estimator based on the spatial average using a large number of antennas in massive MIMO. The mobility estimate can be used to determine the complexity order of developed predictors. The VKFbased channel predictor developed in this paper exploits the autoregressive (AR) parameters estimated from the SCM channels based on the YuleWalker equations. Then, the MLbased channel predictor using the linear minimum mean square error (LMMSE)based noise preprocessed data is developed. Numerical results reveal that both channel predictors have substantial gain over the outdated channel in terms of channel prediction accuracy and data rate. The MLbased predictor has larger overall computational complexity than the VKFbased predictor, but once trained, the operational complexity of MLbased predictor becomes smaller than that of VKFbased predictor.
[12] Jongeui Park, active learning for imbalanced datasets based on normalized prediction error
Abstact: In this paper, we consider active learning for imbalanced datasets in which class imbalance exists and propose a new active learning scheme for imbalanced datasets. The proposed scheme is composed of three main parts: pretraining of the representation using both labeled and unlabeled samples to extract good representation, a novel selfreferenced prediction errorbased acquisition function that fits into the representation pretraining, and a mechanism to compensate for the effect of class imbalance. We evaluate the proposed scheme on multiple types of class imbalance, and numerical results show that the proposed scheme yields noticeable performance gain over existing active learning algorithms.
[13] Kyungrak Son, Distributed matrix multiplication using group algebra for ondevice edge computing
Abstact: Leveraging the idea of group theory, we explore the distributed matrix multiplication problem posed in ondevice edge computing. We first revisit how to embed matrix multiplication into group structure and then propose a distributed matrix multiplication scheme using the cyclic group. We identify the condition for the perfect reconstruction with the proposed scheme, and exhibit that the proposed scheme has better error performance than uncoded scheme in a noisy channel owing to the diversity gain of the proposed scheme.
[14] Seunghye Lee, Privacypreserving estimation over wireless network
Abstact: In the era of big data, more and more user data is being collected, which leads to potential invasion of privacy. In this poster session, we will talk about basic privacy definition and popular LDP algorithms. And also based on preliminaries, we propose a privacy protection mechanisms for estimation over wireless network. In the Binary+Gaussian channel model, we suggested a Unbiased Estimator to infer the underlying statistics theta from a large dataset, while ensuring the privacy protection of each individual’s data. And finally, we calculated PrivacyUtility Tradeoff in given model.
[15] Seungyul Han, Diversity ActorCritic (DAC): Sampleaware entropy regularization for sampleefficient exploration
Abstact: In this paper, sampleaware policy entropy regularization is proposed to enhance the conventional policy entropy regularization for better exploration. Exploiting the sample distribution obtainable from the replay buffer, the proposed sampleaware entropy regularization maximizes the entropy of the weighted sum of the policy action distribution and the sample action distribution from the replay buffer for sampleefficient exploration. A practical algorithm named diversity actorcritic (DAC) is developed by applying policy iteration to the objective function with the proposed sampleaware entropy regularization. Numerical results show that DAC significantly outperforms existing recent algorithms for reinforcement learning.
[16] Whiyoung Jung, Populationguided parallel policy search for reinforcement learning
Abstact: Parallel learning is one of the main research areas in reinforcement learning to enhance the control performance. In this work, we propose a new populationbased parallel learning scheme to improve speed and stability of offpolicy reinforcement learning algorithms. The proposed scheme has multiple learners with their own valuefunctions and policies and a chief for sharing a common information to all learners, and searches a good policy by exploiting the best policy information. The main contribution of our work is that the best policy information is used in a soft manner by constructing an augmented loss function for policy update to guide the population of policies to search a wide area in the policy space centered at the previous best policy. The guidance by the previous best policy and the enlarged search area enable faster and better policy search by escaping bad local optima. Furthermore, we prove monotone improvement of the expected cumulative return by the proposed scheme. In order to show the effectiveness of the proposed parallel learning scheme, we apply our parallel learning scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm, and compare its performance with other parallel learning schemes combined with TD3. Numerical results show that the proposed algorithm outperforms most of the current stateoftheart parallel learning algorithms, and the gain is significant in the case of sparse reward environment.
[17] Woojun Kim, A variational approach to mutual informationbased coordination for multiagent reinforcement learning
Abstact: In this paper, we propose a new approach to the maximum mutual information (MMI) framework for multiagent reinforcement learning (MARL) to enable multiple agents to learn simultaneously coordinated behavior rather than causally coordinated behavior. By introducing a latent variable, we induce nonzero mutual information between multiagent simultaneous actions. Then, we derive a tractable lower bound on the considered MMIregularized objective function based on variational bound. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multiagent actorcritic (VM3AC), which follows centralized learning with decentralized execution (CTDE). We evaluated VM3AC for several games requiring coordination, and numerical results show that VM3AC noticeably outperforms other MARL algorithms in multiagent tasks requiring coordination.
[18] Laixi Shi, Provable convolutional sparse coding via nonconvex optimization
Abstact: Multichannel sparse blind deconvolution, or convolutional sparse coding, refers to the problem of learning an unknown filter by observing its circulant convolutions with multiple input signals that are sparse. This problem finds numerous applications in signal processing, computer vision, and inverse problems. However, it is challenging to learn the filter efficiently due to the bilinear structure of the observations with respect to the unknown filter and inputs, as well as the sparsity constraint. In this paper, we propose a novel approach based on nonconvex optimization over the sphere manifold by minimizing a smooth surrogate of the sparsitypromoting loss function. It is demonstrated that manifold gradient descent with random initializations will provably recover the filter, up to scaling and shift ambiguity, as soon as the number of observations is sufficiently large under an appropriate random data model. Numerical experiments are provided to illustrate the performance of the proposed method with comparisons to existing ones.
