Thesis @ SequeL

Links:

Hadrien Glaude. Méthodes des moments pour l'inférence de systèmes séquentiels linéaires rationnels, 2016. [ bib | PDF | slides ]


Abstract:

During my PhD, I explored the inference of distributions with latent variables over sequences of symbols, modeled by Multiplicity Automata, using the Method of Moments. Multiplicity Automata (MA), also called weighted finite automata, encompass a large variety of linear systems, such as:
  • Stochastic languages (like PFA),
  • Stochastic processes (like HMM),
  • Controlled processes (like POMDP),
  • Sequence-to-sequence models (like transducers).
MA allow treating the existing models for all these kinds of linear systems within a common framework. An obvious advantage of using a common formalism is to extend established results to all sequential linear systems and to derive common learning algorithms and analyses.
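Concretely, an MA with n states over an alphabet is given by an initial weight vector, one n×n transition matrix per symbol, and a terminal weight vector; the value it assigns to a word is a product of these linear maps. Here is a minimal sketch of that computation — the names and the toy weights are illustrative, not taken from the thesis:

```python
import numpy as np

def ma_value(alpha0, transitions, alpha_inf, word):
    """Value an MA assigns to a word w_1...w_k:
    f(w) = alpha0^T . A_{w_1} ... A_{w_k} . alpha_inf."""
    v = alpha0
    for symbol in word:
        v = v @ transitions[symbol]
    return float(v @ alpha_inf)

# Toy 2-state MA over {'a', 'b'} (weights illustrative, not necessarily a
# normalized distribution over strings).
alpha0 = np.array([1.0, 0.0])            # initial weights
alpha_inf = np.array([0.2, 0.5])         # terminal (stopping) weights
transitions = {
    'a': np.array([[0.5, 0.1], [0.0, 0.3]]),
    'b': np.array([[0.1, 0.1], [0.2, 0.0]]),
}

p_ab = ma_value(alpha0, transitions, alpha_inf, "ab")  # → 0.039
```

Specializing the weights recovers the familiar models: stochastic constraints on the vectors and matrices give PFA, per-action transition matrices give POMDP-style controlled processes, and so on.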

Traditional algorithms for learning latent variable models, like Baum-Welch or gradient ascent on the likelihood, are iterative, slow, and prone to getting stuck in local optima. A recent alternative is to use the Method of Moments (MoM) to design fast and consistent algorithms with PAC-style guarantees.

However, MoM-based algorithms have two main drawbacks. First, PAC-style guarantees hold only if the dimension of the learned model matches the dimension of the true one. Second, most MoM-based algorithms are improper, meaning that they learn a function close to the target distribution that is not constrained to be a distribution itself. So, with a finite training set, the learned function can return negative values or values that do not sum to one. Practical applications rely on heuristics, like thresholding and normalizing the function outputs, to recover probability-like values. These heuristics introduce errors that degrade performance and come with no theoretical guarantees.

So, in my thesis, I have shown some results on the performance of a well-known MoM-based algorithm, the Spectral algorithm, when the learned model is smaller than the true one. My results, in the form of finite-sample bounds, reflect the fact that with little training data it can be preferable to learn smaller models to avoid overfitting, even if this introduces a bias. Here, the gap between singular values of certain matrices containing probabilities of sequences plays a crucial role.
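To make the role of those singular values concrete, here is a hedged sketch: build the Hankel-style matrix H[u, v] = f(uv) of a toy rank-2 MA over a small prefix/suffix basis and inspect its singular values. The Spectral algorithm truncates the SVD of such a matrix, and picking a rank below the true one is exactly the smaller-model trade-off discussed above. The toy MA and the basis are illustrative assumptions, not the thesis's experiments:

```python
import numpy as np

def ma_value(alpha0, A, alpha_inf, word):
    """Value assigned by a multiplicity automaton to a word."""
    v = alpha0
    for s in word:
        v = v @ A[s]
    return float(v @ alpha_inf)

# Toy 2-state MA over {'a', 'b'} (weights illustrative).
alpha0 = np.array([1.0, 0.0])
alpha_inf = np.array([0.2, 0.5])
A = {
    'a': np.array([[0.5, 0.1], [0.0, 0.3]]),
    'b': np.array([[0.1, 0.1], [0.2, 0.0]]),
}

# Hankel-style matrix over a small prefix/suffix basis: H[u, v] = f(uv).
basis = ["", "a", "b", "aa", "ab", "ba", "bb"]
H = np.array([[ma_value(alpha0, A, alpha_inf, u + v) for v in basis]
              for u in basis])

# With exact values from a rank-2 MA, everything past the second singular
# value is numerically zero: the spectrum reveals the model's true size.
sv = np.linalg.svd(H, compute_uv=False)
numerical_rank = int(np.sum(sv > 1e-10 * sv[0]))  # → 2
```

With empirical (noisy) probabilities instead of exact values, the trailing singular values are small but nonzero, and the size of the gap governs how safely one can truncate — which is where finite-sample bounds of the kind described above come into play.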

The rest of my thesis was driven by the design of learning algorithms that return true probabilities, with PAC-style guarantees, for a particular class of MA. Interestingly, this involved new results on Non-negative Matrix Factorization (NMF) for separable matrices. Experiments on real and synthetic datasets demonstrated state-of-the-art performance in comparison to other MoM-based algorithms and Baum-Welch.
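For context, a nonnegative matrix is separable when every column is a nonnegative combination of a few anchor columns of the matrix itself; under that assumption the factorization can be computed exactly and efficiently. The sketch below uses the Successive Projection Algorithm, one standard anchor-finding method for (near-)separable NMF — an illustration of the setting, not the specific procedure developed in the thesis:

```python
import numpy as np

def spa(M, r):
    """Successive Projection Algorithm: greedily pick r anchor columns of a
    (near-)separable nonnegative matrix M, so that M ≈ M[:, anchors] @ H
    for some nonnegative H."""
    R = M.astype(float).copy()
    anchors = []
    for _ in range(r):
        # Pick the column with the largest residual norm...
        j = int(np.argmax(np.sum(R ** 2, axis=0)))
        anchors.append(j)
        # ...and project every column onto its orthogonal complement.
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)
    return anchors

# Separable toy matrix: columns 0 and 1 are the anchors, columns 2 and 3
# are convex combinations of them (all numbers here are made up).
W = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
Hcoef = np.array([[1.0, 0.0, 0.5, 0.2],
                  [0.0, 1.0, 0.5, 0.8]])
M = W @ Hcoef

anchors = spa(M, 2)  # recovers columns 0 and 1
```

The greedy geometric selection is what makes such algorithms fast and analyzable: they tolerate noise up to a level depending on how well-conditioned the anchor columns are, which is the kind of property that PAC-style guarantees can be built on.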

Although I mainly addressed the learning of stochastic languages and processes, the natural next step to this thesis is to apply my algorithms to learning controlled processes and to tackle reinforcement learning problems in partially observable environments.