# Stochastic Data Processing

#### Members

R. Kimura, Y. Kusakabe

#### Purpose

To propose conservative estimation methods for statistics and to develop applications that use conservative estimation.

#### Keywords

Conservative estimation, Observation frequency, Conditional probability, Likelihood ratio

#### Summary

Statistical estimation based on the observed frequency of events is easy to implement on a computer and remains widely used when handling large-scale data (for example, the probability of a word is estimated from the number of times the word occurs in a text). Most such estimates use an unbiased estimator. However, when the observed frequency is low, an unbiased estimate becomes unstable and may overestimate the true value. A common workaround is to estimate the statistic only from events whose frequency is above a threshold, but this approach cannot handle low-frequency events that fall below the threshold.
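The instability described above can be seen in a minimal Python sketch. The rule names, the counts, and the threshold value `min_count` are all hypothetical and chosen only for illustration. The plain relative-frequency estimate of a conditional probability gives an event seen once out of one trial a higher score than an event seen 90 times out of 100, and a frequency threshold resolves this only by discarding the rare event.

```python
def conditional_probability(joint_count, condition_count):
    """Relative-frequency estimate of P(B | A) = count(A and B) / count(A)."""
    if condition_count == 0:
        raise ValueError("the condition was never observed")
    return joint_count / condition_count


# Hypothetical counts: name -> (count(A and B), count(A)).
observations = {
    "rare_rule": (1, 1),
    "frequent_rule": (90, 100),
}

# Plain estimates: the rare rule gets 1.0 from a single observation.
estimates = {
    name: conditional_probability(joint, cond)
    for name, (joint, cond) in observations.items()
}

# Threshold workaround: keep only rules whose condition was seen often enough.
min_count = 5  # illustrative value, not taken from the source
kept = {
    name: estimates[name]
    for name, (_, cond) in observations.items()
    if cond >= min_count
}

print(estimates)
print(kept)
```

With these counts the rare rule outranks the frequent one before thresholding and disappears entirely after it, which is the trade-off the paragraph describes.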

We devised an approach called "conservative estimation", in which estimated values are intentionally biased downward according to the observed frequency. So far, we have proposed conservative estimation for two statistics: conditional probability and likelihood ratio. We have also applied conservative estimation to practical tasks such as association rule mining, named-entity recognition, and the multi-armed bandit problem, and confirmed its effectiveness. Conservative estimation makes it possible to give priority to high-frequency events while still treating low-frequency but important events statistically instead of ignoring them. Future work includes extending conservative estimation to other statistics and developing applications centered on it.
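The summary does not give the formula used for conservative estimation, so the following Python sketch is only an illustration of the general idea and should not be read as the group's method. As a stand-in, it uses the lower end of the Wilson score interval for a binomial proportion, which pulls an estimate downward more strongly when the number of observations is small. The function name `wilson_lower_bound` and the default `z=1.96` (roughly a 95% interval) are assumptions made for this example.

```python
import math


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion.

    Used here as a stand-in for a downward-biased estimate; it is not
    the estimator proposed in the project described above.
    """
    if trials == 0:
        return 0.0  # no evidence at all: the most conservative value
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = p_hat + z * z / (2.0 * trials)
    margin = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return max(0.0, (centre - margin) / denom)


# Hypothetical counts: (count(A and B), count(A)).
rare = wilson_lower_bound(1, 1)
frequent = wilson_lower_bound(90, 100)

print(f"rare rule:     {rare:.3f}")
print(f"frequent rule: {frequent:.3f}")
```

Under this stand-in, the event seen 90 times out of 100 ranks above the event seen once out of one trial, yet the rare event still receives a nonzero score instead of being dropped by a threshold.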