Bayesian Decision Models Made Easy - Part 2. Structure of a Bayesian Decision Model


Whaleen 2024. 12. 22. 15:44


Previous post:

2024.12.17 - [Study] - Bayesian Decision Models Made Easy - Part 1. What Is Bayesian Decision Theory?

Contents

1. Assumptions of the Bayesian model

2. The four steps of Bayesian modeling

3. A simple example: the golden apple

**I am still learning this myself, so some parts may be inaccurate. Please check the source!**

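In the previous post I briefly discussed Bayesian decision theory. Bayesian decision theory treats the uncertain elements of a decision probabilistically. A Bayesian decision model built on it is therefore a model of decision-making under uncertainty. For convenience, I will shorten "Bayesian decision model" to "Bayesian model" for the rest of this post.

Concepts related to Bayesian theory usually come with their own terminology, and I personally feel that this terminology makes things harder to understand when you first learn them. So I will try to spell those terms out in plain words as much as I can.

Put simply, a Bayesian model describes a decision process in which a person acquires information, uses it to update their prior knowledge, and then acts on the updated knowledge (= posterior knowledge) so as to maximize gain or minimize loss.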

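1. Assumptions of the Bayesian model

The Bayesian model rests on two assumptions.

Assumption 1. When a person obtains new information, they update their prior knowledge (the prior probability) according to Bayes' theorem to form a posterior probability.

(I will write a separate post on Bayes' theorem.)

What this assumption says is that, given a new observation, a person does not throw away their prior knowledge but combines it with the observation to form knowledge that supports a better judgment (= posterior knowledge). In other words, when we have a new experience related to something we already know, we revise what we thought we knew a little in light of that experience. To me this is a very reasonable assumption, because humans are animals that constantly gather information in order to survive. My guess is that acting after refining existing knowledge with new information turned out to be better for survival. (This aside is only my personal opinion.)

Assumption 2. A person takes into account the gains and losses that result from an action, and always decides in the direction that maximizes gain or minimizes loss.

This means that, consciously or not, people do not decide on probabilities alone; they also weigh the actual gain and the potential loss an action would bring. This is what determines the optimal action once the Bayesian probability update is done.

Suppose we have updated our prior knowledge by Bayes' theorem and obtained more accurate posterior knowledge. The assumption says that a person will then act on that knowledge to maximize gain and minimize loss. Such behavior is called "rational". I am not sure I fully agree with this, but I think it is a plausible hypothesis. People do seem to be driven largely by gains and losses. And depending on how those gains and losses are defined, even some seemingly irrational choices can be modeled, so I think it makes a good skeleton model.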

2. The four steps of Bayesian modeling

Step 1) Defining the generative model

Somewhat loosely, the generative model is the prior knowledge. More precisely, it describes how the decision-maker thinks the world works. It consists of the prior probability $P(s)$ that the decision-maker assigns to a state of the world $s$, and the conditional probability $P(x \mid s)$ that relates the state $s$ to a newly observed piece of information $x$. With just these two, the observer can judge how plausible the current observation $x$ is under what they already know: the probability $P(s)$ that the state is $s$, and the probability $P(x \mid s)$ of observing $x$ when the state is $s$.

Building a generative model is therefore simple: write down expressions for $P(s)$ and $P(x \mid s)$.

It is called a "generative model" because these two expressions let us simulate how the observed data $x$ was generated.
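As a rough illustration of "simulating how $x$ was generated", the sketch below draws a state from $P(s)$ and then an observation from $P(x \mid s)$. The two states, the two observations, and every probability in it are made-up assumptions for illustration only; they do not come from the source.

```python
import random

# Hypothetical two-state world; all numbers are illustrative assumptions.
PRIOR = {"s1": 0.7, "s2": 0.3}                  # P(s)
LIKELIHOOD = {                                   # P(x | s)
    "s1": {"x1": 0.9, "x2": 0.1},
    "s2": {"x1": 0.2, "x2": 0.8},
}

def sample_from(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def generate(rng):
    """Simulate one (state, observation) pair: s ~ P(s), then x ~ P(x | s)."""
    s = sample_from(PRIOR, rng)
    x = sample_from(LIKELIHOOD[s], rng)
    return s, x

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [generate(rng) for _ in range(10_000)]
freq_x1 = sum(1 for _, x in samples if x == "x1") / len(samples)
print(f"simulated frequency of x1: {freq_x1:.3f}")
```

With these assumed numbers, the simulated frequency of `x1` should land near $0.7 \cdot 0.9 + 0.3 \cdot 0.2 = 0.69$, which is exactly the $P(x)$ that the two expressions imply.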

Step 2) Performing inference

Inference is, as described above, the process of updating prior knowledge according to Bayes' theorem. It lets us revise our prior knowledge $P(s)$ about the state of the world $s$ in light of what we observed, turning it into more accurate posterior knowledge. Because this updated knowledge about $s$ comes after observing $x$, it is written as the conditional probability $P(s \mid x)$, which adds the condition that $x$ was observed.

Let us update the existing knowledge using Bayes' theorem:

$$P(s \mid x) = \frac{P(x \mid s)\,P(s)}{P(x)}$$

Here, $P(x)$ can be obtained as follows:

$$P(x) = \sum_{s} P(x \mid s)\,P(s)$$
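The two formulas above can be sketched in a few lines of Python. The function name `posterior`, the three states, and the numbers are assumptions chosen for illustration; only the arithmetic follows the formulas.

```python
def posterior(prior, likelihood_of_x):
    """Return P(s | x) for every state s.

    prior:            {state: P(s)}
    likelihood_of_x:  {state: P(x | s)} for the one observation x that was made
    """
    joint = {s: likelihood_of_x[s] * prior[s] for s in prior}  # P(x | s) P(s)
    evidence = sum(joint.values())                             # P(x)
    if evidence == 0:
        raise ValueError("the observation has zero probability under the model")
    return {s: joint[s] / evidence for s in joint}

# Hypothetical numbers for three states.
prior = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
likelihood_of_x = {"s1": 0.2, "s2": 0.6, "s3": 0.5}

post = posterior(prior, likelihood_of_x)
for s, p in post.items():
    print(f"P({s} | x) = {p:.3f}")
```

Dividing by $P(x)$ only rescales the products $P(x \mid s)\,P(s)$ so that they sum to 1; the ranking of the states is already decided by the numerators.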

Step 3) Deciding on an action or response

This step corresponds to the second assumption above. The decision-maker uses the revised knowledge $P(s \mid x)$ to take the action that maximizes gain. To do so, they consider every action available to them, work out the gain and loss each one would bring, and then make the optimal decision. For convenience I will fold gains and losses into the single word "cost". A cost is close in meaning to a loss: it is the quantity the decision-maker wants to minimize, and a gain is simply written as a negative cost.

Because the same action $a$ can have a different cost depending on the situation in which it is taken, the cost function $C(s, a)$ takes both the state of the world $s$ and the action $a$ as arguments. The cost function quantifies the loss incurred by each combination of state and action.

Now let us think about which action to take after observing $x$. Suppose there are only three possible states of the world, $s_{1}, s_{2}, s_{3}$, and four available actions, $a_{1}, a_{2}, a_{3}, a_{4}$. The procedure then goes as follows.

1. Compute the probability of each state of the world $s$ given that $x$ was observed.

Having observed $x$, I express the probabilities of $s_{1}, s_{2}, s_{3}$ as the probability distribution $P(s \mid x)$.
This distribution is the result of updating the prior $P(s)$ to reflect the new observation $x$.

2. Compute the cost incurred by taking action $a$ in each state $s$.

If I work out the cost of taking action $a$ in each state $s$ and add these up, weighting each by how probable that state is, I get the cost I should expect from action $a$. I observed $x$, but in reality I cannot be 100% sure which $s$ holds, so the costs of $a$ under all states $s$ are combined.
Ex. cost of taking action $a_{1}$ after observing $x$ = (cost of $a_{1}$ when the state is $s_{1}$ + cost of $a_{1}$ when the state is $s_{2}$ + cost of $a_{1}$ when the state is $s_{3}$), with each term weighted by the probability of its state.
That is, the expected cost of action $a_{1}$ is the average of the per-state costs, weighted by the probability of each state:

$$EC(a_{1}) = P(s_{1} \mid x)\,C(s_{1}, a_{1}) + P(s_{2} \mid x)\,C(s_{2}, a_{1}) + P(s_{3} \mid x)\,C(s_{3}, a_{1})$$

The same procedure is repeated for each of the actions $a_{2}, a_{3}, a_{4}$.

The expected cost is the average cost incurred by a particular action $a$, taking into account the probabilities of the states $s$ inferred from the observation $x$. Mathematically:

$$EC(a) = \sum_{s} P(s \mid x)\,C(s, a).$$
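The weighted sum can be written directly as code. In the sketch below the posterior and the cost values for $a_{1}$ are hypothetical numbers, picked so that the result is easy to check by hand; the function name `expected_cost` is likewise just an illustrative choice.

```python
def expected_cost(posterior, cost, action):
    """EC(a) = sum over s of P(s | x) * C(s, a)."""
    return sum(p * cost[(s, action)] for s, p in posterior.items())

# Hypothetical posterior P(s | x) and costs C(s, a1).
posterior = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
cost = {("s1", "a1"): 4, ("s2", "a1"): 10, ("s3", "a1"): 25}

ec_a1 = expected_cost(posterior, cost, "a1")
print(f"EC(a1) = {ec_a1:.2f}")  # 0.5*4 + 0.3*10 + 0.2*25 = 10.00
```

Note that the costly state $s_{3}$ contributes as much as the other two together even though it is the least probable one, which is why probabilities alone are not enough to decide.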

3. Compute the expected cost of every available action and choose the action with the smallest expected cost.

For example, if $EC(a_{1}) = 10$, $EC(a_{2}) = 15$, $EC(a_{3}) = 7$, and $EC(a_{4}) = 20$, then $a_{3}$ is chosen. It minimizes the expected cost, so it is regarded as the optimal choice.

As a formula:

$$a^{*} = \arg\min_{a} EC(a).$$

Here $a^{*}$ is the decision-maker's action as predicted by our Bayesian model.
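In code, the $\arg\min$ is just "the key with the smallest value". A minimal sketch using the four expected costs from the example above (the helper name `choose_action` is an assumption):

```python
def choose_action(expected_costs):
    """Return the action with the smallest expected cost (the arg min)."""
    return min(expected_costs, key=expected_costs.get)

# Expected costs from the example in the text.
expected_costs = {"a1": 10, "a2": 15, "a3": 7, "a4": 20}

best_action = choose_action(expected_costs)
print(best_action)  # a3
```

If two actions tie, Python's `min` returns the first one it encounters, so a real model would need an explicit rule for ties.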

If the state of the world $s$ is continuous, an integral replaces the sum:

$$EC(a) = \int p(s \mid x)\,C(s, a)\,ds$$
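For a continuous state, the integral can be approximated numerically on a grid. The sketch below is only an illustration under assumptions that are not in the source: the posterior is taken to be a Gaussian with mean 2 and standard deviation 1, and the cost is taken to be the squared error $C(s, a) = (s - a)^2$.

```python
import math

POST_MEAN, POST_SD = 2.0, 1.0   # assumed Gaussian posterior p(s | x)
STEP = 0.01                      # grid spacing for the numerical integral
N_POINTS = 1200                  # covers mean - 6 sd .. mean + 6 sd
LOW = POST_MEAN - 6 * POST_SD
GRID = [LOW + i * STEP for i in range(N_POINTS + 1)]

def posterior_pdf(s):
    """Density of the assumed Gaussian posterior at s."""
    z = (s - POST_MEAN) / POST_SD
    return math.exp(-0.5 * z * z) / (POST_SD * math.sqrt(2 * math.pi))

def expected_cost(a):
    """Riemann-sum approximation of the integral of p(s|x) * (s - a)^2 ds."""
    return sum(posterior_pdf(s) * (s - a) ** 2 for s in GRID) * STEP

# Candidate actions 0.0, 0.1, ..., 4.0; pick the one with the lowest expected cost.
candidates = [i * 0.1 for i in range(41)]
best_action = min(candidates, key=expected_cost)
print(f"best action: {best_action:.1f}, expected cost: {expected_cost(best_action):.3f}")
```

Under squared-error cost the minimizer is the posterior mean, so the search should settle on the candidate closest to 2.0. A different cost function would generally give a different optimal action from the same posterior.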

Step 4) Generating behavior or response data

With the steps above we have completed a Bayesian model of which decision $a$ the decision-maker makes after acquiring some information $x$. We can now run simulations with this model to predict behavior, or compare the model with data from real people who received particular information and made particular decisions, and adjust the model to make it more realistic. (This is mostly a matter of optimizing parameters, which improves the accuracy of the Bayesian model.)

3. A simple example: the golden apple

Step 1: Generative model

The decision-maker, a farmer, knows that most apples are red (ordinary) but that, rarely, a golden apple appears in the orchard. The generative model is then:

State of the world $s$: the apple is either a genuine golden apple ($s = 1$) or a fake ($s = 0$).
Observation $x$: one day the farmer observes an apple that looks golden.
Prior distribution $p(s)$: from experience, $p(s = 1) = 0.1$ (golden apples are rare) and $p(s = 0) = 0.9$ (ordinary red apples are the majority).
Likelihood $p(x \mid s)$: the probability of observing a golden-looking apple is higher when the apple is genuine ($p(x \mid s = 1) = 0.8$) and lower when it is fake ($p(x \mid s = 0) = 0.3$).

Step 2: Inference (posterior calculation)

Based on the observation $x$, the farmer uses Bayes' theorem to update the probability that the golden apple is genuine ($s = 1$):

$$p(s \mid x) = \frac{p(x \mid s)\,p(s)}{p(x)}.$$

For $s = 1$ (a genuine golden apple):

$$p(s = 1 \mid x) = \frac{p(x \mid s = 1)\,p(s = 1)}{p(x \mid s = 1)\,p(s = 1) + p(x \mid s = 0)\,p(s = 0)}.$$

Substituting the values:

$$p(s = 1 \mid x) = \frac{0.8 \cdot 0.1}{(0.8 \cdot 0.1) + (0.3 \cdot 0.9)} = \frac{0.08}{0.08 + 0.27} = \frac{0.08}{0.35} \approx 0.229.$$

So the probability that the golden apple is genuine is about 22.9%.
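The expected-cost calculation in the next step also uses the posterior probability that the apple is a fake. It follows from the same formula with the same denominator $p(x) = 0.35$, and it can be written out as:

$$p(s = 0 \mid x) = \frac{p(x \mid s = 0)\,p(s = 0)}{p(x)} = \frac{0.3 \cdot 0.9}{0.35} = \frac{0.27}{0.35} \approx 0.771 = 1 - p(s = 1 \mid x).$$

Because there are only two states, the two posterior probabilities must sum to 1, which gives a quick check on the arithmetic.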

Step 3: Taking an action

The farmer decides whether to pick the apple based on expected cost:

Cost function $C(s, a)$:
- $C(s = 1, \text{pick}) = -5$: a genuine golden apple is sold for a profit (a negative cost).
- $C(s = 0, \text{pick}) = 10$: picking a fake golden apple causes a monetary loss.
- $C(s, \text{don't pick}) = 0$: not picking the apple brings neither cost nor gain, whatever the state.

The expected cost of picking the apple is:

$$EC(\text{pick}) = p(s = 1 \mid x) \cdot (-5) + p(s = 0 \mid x) \cdot (+10),$$

$$EC(\text{pick}) = 0.229 \cdot (-5) + 0.771 \cdot (+10) = -1.145 + 7.71 = 6.565.$$

The expected cost of not picking the apple is:

$$EC(\text{don't pick}) = 0.$$

Since $EC(\text{pick}) > EC(\text{don't pick})$, not picking has the lower expected cost, so the farmer decides not to pick the apple.
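The whole golden-apple calculation, from prior to chosen action, can be reproduced in a short script. The probabilities and costs are the ones given in the text; the variable names and the `dont_pick` label are illustrative choices. Because the script does not round the posterior to 0.229 first, its expected cost comes out slightly different from the hand calculation.

```python
# Values from the golden-apple example; s = 1 is genuine, s = 0 is fake.
PRIOR = {1: 0.1, 0: 0.9}            # p(s)
LIKELIHOOD = {1: 0.8, 0: 0.3}       # p(x | s) for the golden-looking observation x
COST = {                            # C(s, a)
    (1, "pick"): -5, (0, "pick"): 10,
    (1, "dont_pick"): 0, (0, "dont_pick"): 0,
}
ACTIONS = ("pick", "dont_pick")

# Step 2: posterior p(s | x) by Bayes' theorem.
evidence = sum(LIKELIHOOD[s] * PRIOR[s] for s in PRIOR)
post = {s: LIKELIHOOD[s] * PRIOR[s] / evidence for s in PRIOR}

# Step 3: expected cost of each action, then the arg min.
ec = {a: sum(post[s] * COST[(s, a)] for s in post) for a in ACTIONS}
best_action = min(ec, key=ec.get)

# Posterior above which picking would have a negative expected cost:
# -5 p + 10 (1 - p) < 0  is equivalent to  p > 2/3.
threshold = 10 / (10 + 5)

print(f"p(s=1 | x) = {post[1]:.3f}")
print(f"EC(pick) = {ec['pick']:.3f}, EC(dont_pick) = {ec['dont_pick']:.3f}")
print(f"chosen action: {best_action} (picking needs p(s=1 | x) > {threshold:.3f})")
```

The threshold line shows why the observation is not convincing enough here: with these costs, picking only pays off once the farmer is more than two-thirds sure the apple is genuine.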

Step 4: Response distribution

The farmer's response (whether or not to pick) is modeled probabilistically. Note that the 22.9% from step 2 is the posterior probability that the apple is genuine, not the probability of picking: under the cost-minimizing rule of step 3, the model predicts that the farmer does not pick an apple observed this way. The prediction is probabilistic because the observation $x$ that the farmer receives varies from one occasion to the next, even when the true state $s$ is the same.

If the farmer observes golden apples repeatedly under various conditions, the response distribution $p(a \mid s)$ can be compared with the actual behavior to adjust the model parameters (e.g. $p(x \mid s)$ or $p(s)$).
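One way to make the response distribution concrete is to compute $p(\text{pick} \mid s)$ by running the decision rule on every possible observation and weighting by $p(x \mid s)$. The sketch below adds two assumptions that are not in the source: the observation is treated as binary ("golden" or "plain", with $p(\text{plain} \mid s) = 1 - p(\text{golden} \mid s)$), and a second, hypothetical cost of 1 for picking a fake is tried to show how a parameter changes the prediction.

```python
PRIOR = {1: 0.1, 0: 0.9}        # p(s), from the example
P_GOLDEN = {1: 0.8, 0: 0.3}     # p(x = golden | s), from the example
OBSERVATIONS = ("golden", "plain")

def likelihood(x, s):
    """p(x | s), assuming 'plain' is simply the complement of 'golden'."""
    return P_GOLDEN[s] if x == "golden" else 1.0 - P_GOLDEN[s]

def decide(x, cost_false_pick):
    """Cost-minimizing action after observing x (not picking costs 0)."""
    evidence = sum(likelihood(x, s) * PRIOR[s] for s in PRIOR)
    post_real = likelihood(x, 1) * PRIOR[1] / evidence
    ec_pick = post_real * (-5) + (1.0 - post_real) * cost_false_pick
    return "pick" if ec_pick < 0 else "dont_pick"

def response_distribution(cost_false_pick):
    """p(pick | s): total probability of the observations that lead to picking."""
    return {
        s: sum(likelihood(x, s) for x in OBSERVATIONS
               if decide(x, cost_false_pick) == "pick")
        for s in PRIOR
    }

print(response_distribution(10))  # costs from the text: the farmer never picks
print(response_distribution(1))   # hypothetical cheaper mistake: picks on "golden"
```

Comparing such predicted pick rates with how often a real farmer picks is one way to decide which parameter values describe the farmer better.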

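That covers how to build a Bayesian decision model. I tried to explain it as simply as I could, but given the nature of the concept, I worry the explanation ended up long and harder to read. If anything looks wrong or you have questions, please feel free to leave a comment.

In the next post I plan to take up a more substantial example and model it with a Bayesian model.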

Source

Ma, Wei Ji, et al. “Bayesian Decision-Making in the Brain.” Neuron, vol. 92, no. 4, 2019, pp. 731–742, https://doi.org/10.1016/j.neuron.2019.09.034.
