computes account_length as the cumulative count of records per account
and then computes the median of account_length per status date. Can this be done without calling .with_columns() twice?
import polars as pl

# Sample data: one row per account per status date
df = pl.DataFrame({
    "account_id": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "status_date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
    "value": [10, 20, 30, 40, 50, 60, 70, 80, 90]
})
print("Original data:")
print(df)
Original data:
shape: (9, 3)
┌────────────┬─────────────┬───────┐
│ account_id ┆ status_date ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞════════════╪═════════════╪═══════╡
│ A ┆ 2023-01-01 ┆ 10 │
│ A ┆ 2023-01-02 ┆ 20 │
│ A ┆ 2023-01-03 ┆ 30 │
│ B ┆ 2023-01-01 ┆ 40 │
│ B ┆ 2023-01-02 ┆ 50 │
│ C ┆ 2023-01-01 ┆ 60 │
│ C ┆ 2023-01-02 ┆ 70 │
│ C ┆ 2023-01-03 ┆ 80 │
│ C ┆ 2023-01-04 ┆ 90 │
└────────────┴─────────────┴───────┘
I would like to do this:
df = df.with_columns(
    pl.col("status_date")
    .cum_count()
    .over("account_id")
    .median()
    .over("status_date")
    .alias("account_length_median")
)
Out:
Error: ComputeError: cannot nest window expressions