weights - bossanova

Weight computation utilities for weighted least squares.

Classes:

Name	Description
`WeightInfo`	Metadata for weights derived from factor columns.

Functions:

Name	Description
`compute_inverse_variance_weights`	Compute inverse-variance weights from a factor column.
`detect_weight_type`	Check whether a column is categorical and should therefore use inverse-variance weights.

Classes

WeightInfo

WeightInfo(weights: np.ndarray, column: str, group_labels: list[str], group_variances: np.ndarray, group_counts: np.ndarray, group_indices: np.ndarray) -> None

Metadata for weights derived from factor columns.

This dataclass stores information needed for inference adjustments when weights come from a categorical column (inverse-variance weighting).

Attributes:

Name	Type	Description
`weights`	`ndarray`	Weight array, shape (n,). Contains w_i = 1/var(y|group_i).
`column`	`str`	Original column name used for weights.
`group_labels`	`list[str]`	Group names (factor levels).
`group_variances`	`ndarray`	Variance of y within each group, shape (k,).
`group_counts`	`ndarray`	Number of observations per group, shape (k,).
`group_indices`	`ndarray`	Group membership per observation, shape (n,). Values are 0-indexed indices into group_labels.
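
The attributes are related to one another: because `group_indices` holds 0-indexed positions into `group_labels`, it can also index the per-group arrays. The snippet below illustrates this with a stand-in dataclass. `WeightInfoSketch` is hypothetical and only mirrors the documented fields, the values are made up, and the reciprocal relationship ignores any epsilon adjustment the library may apply.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class WeightInfoSketch:
    """Hypothetical stand-in mirroring the documented WeightInfo fields."""

    weights: np.ndarray
    column: str
    group_labels: list[str]
    group_variances: np.ndarray
    group_counts: np.ndarray
    group_indices: np.ndarray


# Illustrative values: k = 2 groups, n = 5 observations
group_variances = np.array([1.0, 4.0])     # shape (k,)
group_indices = np.array([0, 0, 1, 1, 1])  # shape (n,), indexes group_labels

info = WeightInfoSketch(
    # Broadcast each group's variance to its observations, then invert
    weights=1.0 / group_variances[group_indices],
    column="group",
    group_labels=["A", "B"],
    group_variances=group_variances,
    group_counts=np.bincount(group_indices),
    group_indices=group_indices,
)

# Recover the group label of every observation
obs_labels = [info.group_labels[i] for i in info.group_indices]
```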

Attributes

column

column: str

group_counts

group_counts: np.ndarray

group_indices

group_indices: np.ndarray

group_labels

group_labels: list[str]

group_variances

group_variances: np.ndarray

weights

weights: np.ndarray

Functions

compute_inverse_variance_weights

compute_inverse_variance_weights(data: pl.DataFrame, y_col: str, group_col: str, valid_mask: np.ndarray | None = None) -> WeightInfo

Compute inverse-variance weights from a factor column.

For each observation, computes w_i = 1/var(y|group_i), where var(y|group_i) is the variance of y within the observation’s group.

This implements the standard inverse-variance weighting used in meta-analysis and Welch’s t-test. When combined with weighted least squares (WLS), it gives more weight to observations from groups with less variability.
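
The per-observation rule can be sketched directly in NumPy. This is a minimal illustration rather than bossanova's implementation: the helper name `inverse_variance_weights_sketch`, the use of the sample variance (`ddof=1`), and the epsilon value are all assumptions.

```python
import numpy as np


def inverse_variance_weights_sketch(y, groups, eps=1e-12):
    """Hypothetical helper: w_i = 1 / var(y | group_i)."""
    y = np.asarray(y, dtype=float)
    # labels: sorted unique group names; idx: 0-indexed group of each observation
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    # Sample variance (ddof=1) within each group; the ddof choice is an assumption
    variances = np.array([y[idx == g].var(ddof=1) for g in range(len(labels))])
    # Floor the variances at eps (assumed value) so a zero variance cannot divide by zero
    weights = 1.0 / np.maximum(variances, eps)[idx]
    return weights, labels, variances


y = [1.0, 2.0, 3.0, 10.0, 14.0, 18.0]
groups = ["A", "A", "A", "B", "B", "B"]
weights, labels, variances = inverse_variance_weights_sketch(y, groups)
# Group B is more spread out than group A, so its observations get smaller weights
```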

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	DataFrame containing both columns.	required
`y_col`	`str`	Name of the response variable column.	required
`group_col`	`str`	Name of the grouping column (factor).	required
`valid_mask`	`ndarray | None`	Boolean mask for valid (non-missing) observations. If None, all observations are considered valid.	`None`

Returns:

Type	Description
`WeightInfo`	WeightInfo containing weights and group statistics.

Notes:

A group with a single observation has zero within-group variance (var=0) and would therefore receive infinite weight; a small epsilon is used to avoid division by zero.
The weights are computed on valid observations only, but the returned weight array has length equal to the original data.
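
Both notes can be illustrated with a short NumPy sketch. Everything here is an assumption made for illustration: the helper name `masked_weights_sketch`, the epsilon value, the `ddof=1` variance, and in particular the choice to leave `NaN` in positions excluded by the mask (the source does not say what value the library stores there).

```python
import numpy as np


def masked_weights_sketch(y, groups, valid_mask=None, eps=1e-12):
    """Hypothetical helper: full-length weights computed from valid rows only."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    if valid_mask is None:
        # No mask given: treat every observation as valid
        valid_mask = np.ones(len(y), dtype=bool)
    valid_mask = np.asarray(valid_mask, dtype=bool)

    # Output has the length of the original data; excluded rows stay NaN
    # (NaN is an assumed placeholder, not confirmed library behaviour)
    weights = np.full(len(y), np.nan)
    for g in np.unique(groups[valid_mask]):
        in_group = valid_mask & (groups == g)
        var = y[in_group].var(ddof=1)
        # Floor the variance at eps so a zero variance gives a large finite weight
        weights[in_group] = 1.0 / max(var, eps)
    return weights


y = [1.0, 2.0, 3.0, 100.0, 5.0, 5.0]
groups = ["A", "A", "A", "A", "B", "B"]
mask = [True, True, True, False, True, True]
w = masked_weights_sketch(y, groups, mask)
```

In this sketch the masked-out value 100.0 does not inflate group A's variance, and group B's zero variance yields a weight of 1/eps instead of a division error.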

Examples:

>>> import polars as pl
>>> df = pl.DataFrame({
...     "y": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
...     "group": ["A", "A", "A", "B", "B", "B"],
... })
>>> info = compute_inverse_variance_weights(df, "y", "group")
>>> # Group A and B both have var=1.0, so weights are equal
>>> info.weights
array([1., 1., 1., 1., 1., 1.])

detect_weight_type

detect_weight_type(data: pl.DataFrame, col: str) -> bool

Check whether a column is categorical and should therefore use inverse-variance weights.

Returns True for String, Categorical, and Enum dtypes, which indicate the column represents factor levels rather than numeric weights.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	DataFrame containing the column.	required
`col`	`str`	Column name to check.	required

Returns:

Type	Description
`bool`	True if the column is categorical, False if numeric.

Examples:

>>> import polars as pl
>>> df = pl.DataFrame({"group": ["A", "B", "A"], "w": [1.0, 2.0, 1.0]})
>>> detect_weight_type(df, "group")
True
>>> detect_weight_type(df, "w")
False