Weight computation utilities for weighted least squares.
Classes:
| Name | Description |
|---|---|
| WeightInfo | Metadata for weights derived from factor columns. |
Functions:
| Name | Description |
|---|---|
| compute_inverse_variance_weights | Compute inverse-variance weights from a factor column. |
| detect_weight_type | Check if a column is categorical (should use inverse-variance weights). |
Classes
WeightInfo
WeightInfo(weights: np.ndarray, column: str, group_labels: list[str], group_variances: np.ndarray, group_counts: np.ndarray, group_indices: np.ndarray) -> None
Metadata for weights derived from factor columns.
This dataclass stores information needed for inference adjustments when weights come from a categorical column (inverse-variance weighting).
Attributes:
| Name | Type | Description |
|---|---|---|
| weights | ndarray | Weight array, shape (n,). Contains w_i = 1/var(y \| group_i). |
| column | str | Original column name used for weights. |
| group_labels | list[str] | Group names (factor levels). |
| group_variances | ndarray | Variance of y within each group, shape (k,). |
| group_counts | ndarray | Number of observations per group, shape (k,). |
| group_indices | ndarray | Group membership per observation, shape (n,). Values are 0-based indices into group_labels. |
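The fields above are linked by simple indexing. As a minimal sketch with made-up values (not output from the library), the per-observation weights can be recovered by broadcasting the per-group variances through `group_indices`:

```python
import numpy as np

# Hypothetical values that mirror the documented WeightInfo fields.
group_labels = ["A", "B"]                    # k = 2 factor levels
group_variances = np.array([1.0, 4.0])       # shape (k,)
group_counts = np.array([3, 2])              # shape (k,)
group_indices = np.array([0, 0, 0, 1, 1])    # shape (n,), indices into group_labels

# Broadcast each group's variance to its observations, then invert.
weights = 1.0 / group_variances[group_indices]

# Map every observation back to its group label.
labels_per_obs = [group_labels[i] for i in group_indices]
```

Here `weights` has shape (n,) and `group_counts` agrees with `np.bincount(group_indices)`.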
Attributes
column
column: str
group_counts
group_counts: np.ndarray
group_indices
group_indices: np.ndarray
group_labels
group_labels: list[str]
group_variances
group_variances: np.ndarray
weights
weights: np.ndarray
Functions
compute_inverse_variance_weights
compute_inverse_variance_weights(data: pl.DataFrame, y_col: str, group_col: str, valid_mask: np.ndarray | None = None) -> WeightInfo
Compute inverse-variance weights from a factor column.
For each observation, computes w_i = 1/var(y|group_i), where var(y|group_i) is the variance of y within the observation’s group.
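The formula can be illustrated with plain NumPy. This is a sketch, not the library's implementation: the helper name `inverse_variance_weights_sketch` is hypothetical, and the use of the sample variance (`ddof=1`) is an assumption inferred from the docstring example, where `[1.0, 2.0, 3.0]` is said to have variance 1.0.

```python
import numpy as np

def inverse_variance_weights_sketch(y, groups):
    """Return (weights, labels, variances) with w_i = 1 / var(y | group_i)."""
    y = np.asarray(y, dtype=float)
    # labels: sorted unique group names; idx: group index of each observation
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    # Assumption: sample variance (ddof=1) within each group.
    variances = np.array([y[idx == g].var(ddof=1) for g in range(len(labels))])
    return 1.0 / variances[idx], labels, variances

weights, labels, variances = inverse_variance_weights_sketch(
    [1.0, 2.0, 3.0, 10.0, 12.0, 14.0],
    ["A", "A", "A", "B", "B", "B"],
)
```

Group A has variance 1.0 and group B has variance 4.0, so observations in A get weight 1.0 and observations in B get weight 0.25.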
This implements the standard inverse-variance weighting used in meta-analysis and Welch’s t-test. When combined with WLS, this gives more weight to groups with less variability.
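To see the effect on a WLS fit, the sketch below solves a weighted least squares problem by scaling each row with the square root of its weight and calling `np.linalg.lstsq`. The helper `wls_fit_sketch` and the data are illustrative assumptions; how the library itself fits the model is not stated here.

```python
import numpy as np

def wls_fit_sketch(X, y, weights):
    """Minimise sum_i w_i * (y_i - x_i @ beta)^2 via sqrt-weight row scaling."""
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Two groups: a low-variance group near 2 and a high-variance group near 12.
y = np.array([1.0, 2.0, 3.0, 10.0, 12.0, 14.0])
weights = np.array([1.0, 1.0, 1.0, 0.25, 0.25, 0.25])  # 1/var per group
X = np.ones((len(y), 1))                               # intercept-only model

beta = wls_fit_sketch(X, y, weights)
unweighted_mean = y.mean()
```

With an intercept-only design the WLS estimate equals the weighted mean of `y`, which is 4.0 here, compared with an unweighted mean of 7.0. The estimate is pulled toward the group with less variability.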
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | DataFrame containing both columns. | required |
| y_col | str | Name of the response variable column. | required |
| group_col | str | Name of the grouping column (factor). | required |
| valid_mask | ndarray \| None | Boolean mask for valid (non-missing) observations. If None, all observations are considered valid. | None |
Returns:
| Type | Description |
|---|---|
| WeightInfo | WeightInfo containing weights and group statistics. |
Notes:
A group with a single observation has zero variance (var=0) and would receive infinite weight; a small epsilon is used to avoid division by zero.
The weights are computed on valid observations only, but the returned weight array has length equal to the original data.
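The two notes can be sketched as follows. Several details are assumptions, since the source does not specify them: the epsilon value (`1e-8`), applying it as a lower bound on the variance, and filling masked-out rows with NaN. The helper name `masked_inverse_variance_weights` is also hypothetical.

```python
import numpy as np

EPS = 1e-8  # assumed value; the actual epsilon is not documented here

def masked_inverse_variance_weights(y, groups, valid_mask=None, eps=EPS):
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n = len(y)
    if valid_mask is None:
        valid_mask = np.ones(n, dtype=bool)   # all observations valid
    # Output keeps the original length; assumption: invalid rows become NaN.
    weights = np.full(n, np.nan)
    y_valid = y[valid_mask]
    labels, idx = np.unique(groups[valid_mask], return_inverse=True)
    variances = np.zeros(len(labels))
    for g in range(len(labels)):
        vals = y_valid[idx == g]
        # Single-observation groups are treated as having zero variance.
        variances[g] = vals.var(ddof=1) if len(vals) > 1 else 0.0
    # Assumption: epsilon is a floor on the variance before inverting.
    weights[valid_mask] = 1.0 / np.maximum(variances, eps)[idx]
    return weights

w = masked_inverse_variance_weights(
    [1.0, 2.0, 3.0, float("nan"), 5.0],
    ["A", "A", "A", "B", "C"],
    valid_mask=np.array([True, True, True, False, True]),
)
```

In this example the masked row stays NaN, the three observations in group A get weight 1.0, and the single observation in group C gets a very large but finite weight (1/eps).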
Examples:
>>> import polars as pl
>>> df = pl.DataFrame({
... "y": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
... "group": ["A", "A", "A", "B", "B", "B"],
... })
>>> info = compute_inverse_variance_weights(df, "y", "group")
>>> # Group A and B both have var=1.0, so weights are equal
>>> info.weights
array([1., 1., 1., 1., 1., 1.])
detect_weight_type
detect_weight_type(data: pl.DataFrame, col: str) -> bool
Check if a column is categorical (should use inverse-variance weights).
Returns True for String, Categorical, and Enum dtypes, which indicate the column represents factor levels rather than numeric weights.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | DataFrame containing the column. | required |
| col | str | Column name to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the column is categorical, False if numeric. |
Examples:
>>> import polars as pl
>>> df = pl.DataFrame({"group": ["A", "B", "A"], "w": [1.0, 2.0, 1.0]})
>>> detect_weight_type(df, "group")
True
>>> detect_weight_type(df, "w")
False