Weight computation utilities for weighted least squares.

Classes:

| Name | Description |
| --- | --- |
| WeightInfo | Metadata for weights derived from factor columns. |

Functions:

| Name | Description |
| --- | --- |
| compute_inverse_variance_weights | Compute inverse-variance weights from a factor column. |
| detect_weight_type | Check if a column is categorical (should use inverse-variance weights). |

Classes

WeightInfo

WeightInfo(weights: np.ndarray, column: str, group_labels: list[str], group_variances: np.ndarray, group_counts: np.ndarray, group_indices: np.ndarray) -> None

Metadata for weights derived from factor columns.

This dataclass stores information needed for inference adjustments when weights come from a categorical column (inverse-variance weighting).

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| weights | ndarray | Weight array, shape (n,). Contains w_i = 1/var(y \| group_i). |
| column | str | Original column name used for weights. |
| group_labels | list[str] | Group names (factor levels). |
| group_variances | ndarray | Variance of y within each group, shape (k,). |
| group_counts | ndarray | Number of observations per group, shape (k,). |
| group_indices | ndarray | Group membership per observation, shape (n,). Values are 0-based indices into group_labels. |
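
The attribute descriptions imply a simple relationship between the per-group and per-observation arrays. The sketch below illustrates it with hand-written, hypothetical values rather than a real WeightInfo instance, and it assumes that group_variances and group_counts are ordered the same way as group_labels.

```python
import numpy as np

# Hypothetical field values for n = 6 observations in k = 2 groups
group_labels = ["A", "B"]
group_variances = np.array([1.0, 4.0])          # shape (k,)
group_counts = np.array([3, 3])                 # shape (k,)
group_indices = np.array([0, 0, 0, 1, 1, 1])    # shape (n,), 0-based

# Per-observation weights: look up each observation's group variance, then invert
weights = 1.0 / group_variances[group_indices]  # shape (n,)

# Map each observation back to its group label
obs_labels = [group_labels[i] for i in group_indices]

# The counts can be recovered from the membership indices
counts_from_indices = np.bincount(group_indices, minlength=len(group_labels))
```

Under these assumptions, observations in group "A" get weight 1.0 and observations in group "B" get weight 0.25.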

Attributes

column
column: str
group_counts
group_counts: np.ndarray
group_indices
group_indices: np.ndarray
group_labels
group_labels: list[str]
group_variances
group_variances: np.ndarray
weights
weights: np.ndarray

Functions

compute_inverse_variance_weights

compute_inverse_variance_weights(data: pl.DataFrame, y_col: str, group_col: str, valid_mask: np.ndarray | None = None) -> WeightInfo

Compute inverse-variance weights from a factor column.

For each observation, computes w_i = 1/var(y|group_i), where var(y|group_i) is the variance of y within the observation’s group.

This implements the standard inverse-variance weighting used in meta-analysis and Welch’s t-test. When combined with WLS, it gives more weight to observations from groups with lower variance.
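
The calculation described above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the function name inverse_variance_weights_sketch and its ddof parameter are hypothetical. The sample variance (ddof=1) is used because it reproduces the var=1.0 figure quoted in the example for the values [1.0, 2.0, 3.0].

```python
import numpy as np

def inverse_variance_weights_sketch(y, groups, ddof=1):
    """Hypothetical stand-in: w_i = 1 / var(y | group_i)."""
    y = np.asarray(y, dtype=float)
    # labels: sorted unique group names; idx: 0-based group index per observation
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    # Variance of y within each group (sample variance when ddof=1)
    variances = np.array([y[idx == g].var(ddof=ddof) for g in range(len(labels))])
    # Broadcast each group's inverse variance back to its observations
    return 1.0 / variances[idx], labels, variances

# Group B is more spread out than group A, so its observations get less weight
y = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
groups = ["A", "A", "A", "B", "B", "B"]
weights, labels, variances = inverse_variance_weights_sketch(y, groups)
```

Here group A has variance 1.0 and group B has variance 4.0, so the weights are 1.0 for the first three observations and 0.25 for the last three. A group with a single observation or zero variance would produce a non-finite weight in this sketch; how the library handles that case is not stated in this reference.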

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame | DataFrame containing both columns. | required |
| y_col | str | Name of the response variable column. | required |
| group_col | str | Name of the grouping column (factor). | required |
| valid_mask | ndarray \| None | Boolean mask for valid (non-missing) observations. If None, all observations are considered valid. | None |

Returns:

| Type | Description |
| --- | --- |
| WeightInfo | WeightInfo containing weights and group statistics. |

Examples:

>>> import polars as pl
>>> df = pl.DataFrame({
...     "y": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
...     "group": ["A", "A", "A", "B", "B", "B"],
... })
>>> info = compute_inverse_variance_weights(df, "y", "group")
>>> # Group A and B both have var=1.0, so weights are equal
>>> info.weights
array([1., 1., 1., 1., 1., 1.])

detect_weight_type

detect_weight_type(data: pl.DataFrame, col: str) -> bool

Check if a column is categorical (should use inverse-variance weights).

Returns True for String, Categorical, and Enum dtypes, which indicate the column represents factor levels rather than numeric weights.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame | DataFrame containing the column. | required |
| col | str | Column name to check. | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if the column is categorical, False if numeric. |

Examples:

>>> import polars as pl
>>> df = pl.DataFrame({"group": ["A", "B", "A"], "w": [1.0, 2.0, 1.0]})
>>> detect_weight_type(df, "group")
True
>>> detect_weight_type(df, "w")
False