Code: Select all
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Digit": [
        1, 3, 5, 7, 0, 0, 0,
        4, 8, 9, 7, 7, 7, 7,
        9, 3, 3, 1, 6, 8, 0,
        8, 8, 8, 8, 8, 3, 1,
    ]
})
Code: Select all
Digit
0 1
1 3
2 5
3 7
4 0
5 0
6 0
7 4
8 8
9 9
10 7
11 7
12 7
13 7
14 9
15 3
16 3
17 1
18 6
19 8
20 0
21 8
22 8
23 8
24 8
25 8
26 3
27 1
My idea was to do it like this: first, create a column for each value of k that equals 1 when Digit starts a sequence of k identical digits, i.e. when it is identical to the next k-1 digits.
Code: Select all
# For each run length k, flag rows whose digit equals the next k-1 digits.
for i in np.arange(2, 5+1):
    df[f"k{i}"] = np.prod(
        [df.Digit == df.Digit.shift(-j) for j in np.arange(1, i)],
        axis=0
    )
Code: Select all
Digit k2 k3 k4 k5
0 1 0 0 0 0
1 3 0 0 0 0
2 5 0 0 0 0
3 7 0 0 0 0
4 0 1 1 0 0
5 0 1 0 0 0
6 0 0 0 0 0
7 4 0 0 0 0
8 8 0 0 0 0
9 9 0 0 0 0
10 7 1 1 1 0
11 7 1 1 0 0
12 7 1 0 0 0
13 7 0 0 0 0
14 9 0 0 0 0
15 3 1 0 0 0
16 3 0 0 0 0
17 1 0 0 0 0
18 6 0 0 0 0
19 8 0 0 0 0
20 0 0 0 0 0
21 8 1 1 1 1
22 8 1 1 1 0
23 8 1 1 0 0
24 8 1 0 0 0
25 8 0 0 0 0
26 3 0 0 0 0
27 1 0 0 0 0
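To make the step above easier to follow, here is a minimal sketch on a tiny, made-up Series (the name `s` and its values are only for illustration). It shows that multiplying the shifted comparisons is the same as combining them with a logical AND, and that rows near the end are compared against NaN and therefore come out as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a run of three 7s followed by a 2.
s = pd.Series([7, 7, 7, 2])

# Compare each value with the values 1 and 2 rows further down.
# Near the end the shifted values are NaN, so the comparison is False.
masks = [s == s.shift(-j) for j in (1, 2)]

# Multiplying 0/1 flags behaves like a logical AND across the masks.
k3_prod = np.prod(masks, axis=0)
k3_and = np.logical_and.reduce(masks)

print(k3_prod)
print(k3_and)
```

Only the first row starts a run of three identical digits, so only the first flag is set in both variants.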
Code: Select all
# Keep only the first row of each run: the flag is 1 here but was not 1 on the previous row.
for i in np.arange(2, 5+1):
    df[f"k{i}"] = (df[f"k{i}"] == 1) & (df[f"k{i}"] != df[f"k{i}"].shift(1))
Code: Select all
Digit k2 k3 k4 k5
0 1 False False False False
1 3 False False False False
2 5 False False False False
3 7 False False False False
4 0 True True False False
5 0 False False False False
6 0 False False False False
7 4 False False False False
8 8 False False False False
9 9 False False False False
10 7 True True True False
11 7 False False False False
12 7 False False False False
13 7 False False False False
14 9 False False False False
15 3 True False False False
16 3 False False False False
17 1 False False False False
18 6 False False False False
19 8 False False False False
20 0 False False False False
21 8 True True True True
22 8 False False False False
23 8 False False False False
24 8 False False False False
25 8 False False False False
26 3 False False False False
27 1 False False False False
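Once the boolean columns exist, the start indices can be read off with boolean indexing. The sketch below rebuilds the same example data so that it runs on its own; the dictionary name `starts` is just my choice for illustration, and I use `np.logical_and.reduce` instead of `np.prod` so the flags stay boolean throughout.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Digit": [
        1, 3, 5, 7, 0, 0, 0,
        4, 8, 9, 7, 7, 7, 7,
        9, 3, 3, 1, 6, 8, 0,
        8, 8, 8, 8, 8, 3, 1,
    ]
})

starts = {}
for k in range(2, 6):
    # True where the digit equals each of the next k-1 digits.
    in_run = np.logical_and.reduce(
        [df.Digit == df.Digit.shift(-j) for j in range(1, k)]
    )
    in_run = pd.Series(in_run, index=df.index)
    # A run starts where the flag is True but was not True one row earlier.
    is_start = in_run & ~in_run.shift(1, fill_value=False)
    starts[k] = df.index[is_start].tolist()

print(starts)
```

For the example data this gives the same rows that are marked True in the table above, e.g. index 21 for every k from 2 to 5.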
Code: Select all
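def index_of_identical_consecutive_digits(_df, colname, max_id=6):
    # Note: the columns k2..k{max_id} are added to _df in place.
    # Step 1: flag rows whose value equals the next i-1 values.
    for i in np.arange(2, max_id+1):
        _df[f"k{i}"] = np.prod(
            [_df[colname] == _df[colname].shift(-j) for j in np.arange(1, i)],
            axis=0
        )
    # Step 2: keep only the first row of each run.
    for i in np.arange(2, max_id+1):
        _df[f"k{i}"] = (_df[f"k{i}"] == 1) & (_df[f"k{i}"] != _df[f"k{i}"].shift(1))
    return _df

np.random.seed(0)
# One billion values: about 8 GB for this column alone if stored as 64-bit integers.
DF = pd.DataFrame({
    "Digit": np.random.randint(10, size=1_000_000_000)
})
DF = index_of_identical_consecutive_digits(DF, "Digit")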
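For comparison, here is a sketch of a different technique (run-length encoding with plain NumPy), not the approach from the code above. It computes the start and length of every run in one pass instead of building one column per k, which may need less memory on large inputs, though I have not measured it. The function name `run_starts_and_lengths` and the `runs` frame are my own, hypothetical names.

```python
import numpy as np
import pandas as pd

def run_starts_and_lengths(digits):
    """Return (starts, lengths, values) for every run of equal values."""
    values = np.asarray(digits)
    if values.size == 0:
        empty = np.array([], dtype=int)
        return empty, empty, values
    # A new run begins wherever a value differs from its predecessor.
    change = np.flatnonzero(values[1:] != values[:-1]) + 1
    starts = np.concatenate(([0], change))
    # The length of a run is the distance to the next start (or to the end).
    lengths = np.diff(np.concatenate((starts, [values.size])))
    return starts, lengths, values[starts]

digits = [
    1, 3, 5, 7, 0, 0, 0,
    4, 8, 9, 7, 7, 7, 7,
    9, 3, 3, 1, 6, 8, 0,
    8, 8, 8, 8, 8, 3, 1,
]
starts, lengths, values = run_starts_and_lengths(digits)
runs = pd.DataFrame({"start": starts, "length": lengths, "digit": values})

# Runs of at least two identical digits.
print(runs[runs["length"] >= 2])
```

The start indices for a given k are then `runs.loc[runs["length"] >= k, "start"]`, which matches the rows flagged in the `k{i}` columns of the example.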