1.H. Driven IFS and Data Analysis

Coarse-Graining the Data

We turn the data y₁, y₂, ..., y_N into a sequence i₁, i₂, ..., i_N of 1s, 2s, 3s, and 4s, called the symbol string associated with the data.

Often the data values y_i are measured as decimals and because we are converting these to only four values, the process of turning the y_k into i_k is called coarse-graining.

The range of y values for corresponding to a symbol is the bin of that symbol.

Though there are others, we use five kinds of coarse-graining:

equal-size bins Divide the range of values into four intervals of equal length.

equal weight bins Arrange the bin boundaries so (approximately) the same number of points lie in each bin.

zero-centered bins For data whose sign is important, take 0 as the boundary between bins 2 and 3; place the other boundaries symmetrically above and below 0. Unlike the first two cases, this is a family of coarse-grainings depending on the placements of the other two bin boundaries.

mean-centered bins Take the mean of the data to be the boundary between bins 2 and 3; place the other boundaries symmetrically above and below the mean, usually expressed as a multiple of the standard deviation.

median-centered bins Take the median of the data to be the boundary between bins 2 and 3; place the other boundaries symmetrically above and below the median, usually expressed as a multiple of the range. Note the equal-weight bins are a special case of this.

Return to Data-Driven IFS