Ersetzen Sie die Werte in Pandas DataFrame anhand eines anderen DataFrame als Nachschlagetabelle
Posted: 30 Apr 2025, 21:24
Ich möchte Werte in einem Datenrahmen durch die Werte in einem zweiten Datenrahmen ersetzen, indem ich die Werte im ersten Datenrahmen mit den Spalten aus dem zweiten Datenrahmen übereinstimmt. < /p>
Beispiel: < /p>
import numpy as np
import pandas as pd
dt_index = pd.to_datetime(['2003-05-01', '2003-05-02', '2003-05-03', '2003-05-04'])
df = pd.DataFrame({'A':[1,1,3,12], 'B':[12,1,3,3], 'C':[3,12,12,1]}, index = dt_index)
df2 = pd.DataFrame({1:[1.4,4.2,1.3,5.6], 12:[2.3,7.3,9.5,0.4], 3:[8.8,0.1,8.7,2.4], 4:[9.6,9.8,5.5,1.8]}, index = dt_index)
< /code>
df =
A B C
2003-05-01 1 12 3
2003-05-02 1 1 12
2003-05-03 3 3 12
2003-05-04 12 3 1
df2 =
1 12 3 4
2003-05-01 1.4 2.3 8.8 9.6
2003-05-02 4.2 7.3 0.1 9.8
2003-05-03 1.3 9.5 8.7 5.5
2003-05-04 5.6 0.4 2.4 1.8
< /code>
Expected output:
expect = pd.DataFrame({'A':[1.4,4.2,8.7,0.4], 'B':[2.3,4.2,8.7,2.4], 'C':[8.8,7.3,9.5,5.6]}, index = dt_index)
< /code>
expect =
A B C
2003-05-01 1.4 2.3 8.8
2003-05-02 4.2 4.2 7.3
2003-05-03 8.7 8.7 9.5
2003-05-04 0.4 2.4 5.6
< /code>
Attempt:
X = df.copy()
for i in np.unique(df):
X.mask(df == i, df2, axis=0, inplace=True)
< /code>
My attempt seems to work but I'm not sure if it has any pitfalls and how it would scale as the sizes of the Dataframe increase.
Are there better or faster solutions?
EDIT:
After cottontail's helpful answer, I realised I've made an oversimplification in my example. The values in df and columns of df and df2 cannot be assumed to be sequential.
I've now modified the example to reflect that.
Beispiel: < /p>
import numpy as np
import pandas as pd
dt_index = pd.to_datetime(['2003-05-01', '2003-05-02', '2003-05-03', '2003-05-04'])
df = pd.DataFrame({'A':[1,1,3,12], 'B':[12,1,3,3], 'C':[3,12,12,1]}, index = dt_index)
df2 = pd.DataFrame({1:[1.4,4.2,1.3,5.6], 12:[2.3,7.3,9.5,0.4], 3:[8.8,0.1,8.7,2.4], 4:[9.6,9.8,5.5,1.8]}, index = dt_index)
< /code>
df =
A B C
2003-05-01 1 12 3
2003-05-02 1 1 12
2003-05-03 3 3 12
2003-05-04 12 3 1
df2 =
1 12 3 4
2003-05-01 1.4 2.3 8.8 9.6
2003-05-02 4.2 7.3 0.1 9.8
2003-05-03 1.3 9.5 8.7 5.5
2003-05-04 5.6 0.4 2.4 1.8
< /code>
Expected output:
expect = pd.DataFrame({'A':[1.4,4.2,8.7,0.4], 'B':[2.3,4.2,8.7,2.4], 'C':[8.8,7.3,9.5,5.6]}, index = dt_index)
< /code>
expect =
A B C
2003-05-01 1.4 2.3 8.8
2003-05-02 4.2 4.2 7.3
2003-05-03 8.7 8.7 9.5
2003-05-04 0.4 2.4 5.6
< /code>
Attempt:
X = df.copy()
for i in np.unique(df):
X.mask(df == i, df2, axis=0, inplace=True)
< /code>
My attempt seems to work but I'm not sure if it has any pitfalls and how it would scale as the sizes of the Dataframe increase.
Are there better or faster solutions?
EDIT:
After cottontail's helpful answer, I realised I've made an oversimplification in my example. The values in df and columns of df and df2 cannot be assumed to be sequential.
I've now modified the example to reflect that.