# Conditional random fields

Disponível somente no TrabalhosFeitos
• Páginas : 13 (3243 palavras )
• Publicado : 16 de abril de 2013

Amostra do texto
Conditional Random Fields
Andrea Passerini passerini@disi.unitn.it
Complex Systems

Conditional Random Fields

Generative vs discriminative models
joint distributions Traditional graphical models (both BN and MN) model joint probability distributions p(x, y) In many situations we know in advance which variables will be observed, and which will need to be predicted (i.e. x vs y) HiddenMarkov Models (as a special case of BN) also model joint probabilities of states and observations, even if they are often used to estimate the most probable sequence of states y given the observations x A problem with joint distributions is that they need to explicitly model the probability of x, which can be quite complex (e.g. a textual document)

Conditional Random Fields

Generative vsdiscriminative models
y y1 y2 y n−1 yn yn+1

x1

xi

xn

x1

x2

xn−1

xn

xn+1

Naive Bayes

Hidden Markov Model

generative models Directed graphical models are called generative when the joint probability decouples as p(x, y) = p(x|y)p(y) The dependencies between input and output are only from the latter to the former: the output generates the input Naive Bayes classiﬁers andHidden Markov Models are both generative models

Conditional Random Fields

Generative vs discriminative models

Discriminative models If the purpose is choosing the most probable conﬁguration for the output variables, we can directly model the conditional probability of the output given the input: p(y|x) The parameters of such distribution have higher freedom wrt those of the full p(x, y),as p(x) is not modelled This allows to effectively exploit the structure of x without modelling the interactions between its parts, but only those with the output Such models are called discriminative as they aim at modeling the discrimination between different outputs

Conditional Random Fields

Conditional Random Fields (CRF, Lafferty et al. 2001)

Deﬁnition Conditional random ﬁelds areconditional Markov networks: p(y|x) = 1 exp Z (x) (−E((x, y)C ))
(x,y)C

The partition function Z (x) is summed only over y to provide a proper conditional probability: Z (x) =
y

exp
(x,y )C

−E((x, y )C )

Conditional Random Fields

Conditional Random Fields

Feature functions 1 p(y|x) = exp Z (x)
K

λk fk ((x, y)C )
(x,y)C k=1

The negated energy function is often writtensimply as a weighted sum of real-valued feature functions Each feature function should capture a certain characteristic of the clique variables

Conditional Random Fields

Linear chain CRF

Description (simple form) 1 p(y|x) = exp Z (x)
K h

λk fk (yt , yt−1 ) +
t k=1 h=1

µh fh (xt , yt )

Models the relation between an input and an output sequence Output sequences are modelled asa linear chain, with a link between each consecutive output element Each output element is connected to the corresponding input.
Conditional Random Fields

Linear chain CRF
Description (more generic form) p(y|x) = 1 exp Z (x)
K

λk fk (yt , yt−1 , xt )
t k=1

the linear chain CRF can model arbitrary features of the input, not only identity of the current observation (like in HMMs) Wecan think of xt as a vector containing input information relevant for position t, possibly including inputs at previous or following positions We can easily make transition scores (between consecutive outputs yt−1 , yt ) dependent also on current input xt

Conditional Random Fields

Linear chain CRF
Parameter estimation Parameters λk of feature functions need to be estimated from data Weestimate them from a training set of i.i.d. input/output sequence pairs D = {(x (i) , y (i) )} i = 1, . . . , N

each example (x (i) , y (i) ) is made of a sequence of inputs and a corresponding sequence of outputs: x (i) = {x1 , . . . , xT }
(i) (i)

y (i) = {y1 , . . . , yT }

(i)

(i)

Note For simplicity of notation we assume each training sequence have the same length. The generic...