I am training a RandomForestClassifier from sklearn.ensemble with the following code:
import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.sparse import issparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# data_dir, ct, and rfc_dir are defined elsewhere
adata = ad.read_h5ad(f'{data_dir}{ct}_clean_log1p_normalized.h5ad')
adata = adata[:, adata.var.highly_variable]
print(f'AnnData for {ct}: {adata}')
# Extract feature matrix (X) and target vector (y)
X = adata.X
y = adata.obs['clinical_dx']
# Convert sparse matrix to dense
if issparse(X):
    X = X.toarray()
# Encode the target variable as integer labels
le = LabelEncoder()
y_encoded = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)
# Initialize the classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the classifier
rf_classifier.fit(X_train, y_train)
# Validate on the test set
y_pred_rf = rf_classifier.predict(X_test)
# View validation report
validation_report = classification_report(y_test, y_pred_rf, target_names=le.classes_)
print(validation_report)
with open(f'{rfc_dir}validation_report.txt', "w") as report_file:
    report_file.write(validation_report)
# Generate the confusion matrix
cm = confusion_matrix(y_test, y_pred_rf, labels=rf_classifier.classes_)
# Normalize each row so cells show the fraction of each true class
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize=(10, 9))
sns.heatmap(cm_normalized, annot=True, fmt='.1%', cmap='Blues',
            xticklabels=le.inverse_transform(rf_classifier.classes_),
            yticklabels=le.inverse_transform(rf_classifier.classes_))
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix (Random Forest)')
plt.savefig(f"{rfc_dir}confusion_matrix", bbox_inches='tight')
# Get feature importances
feature_importances_rf = rf_classifier.feature_importances_
number_of_features = 200
# Create a DataFrame for better visualization
feature_importance_rf_df = pd.DataFrame({
    'Ensembl': adata.var_names,
    'Importance': feature_importances_rf
})
top_features = feature_importance_rf_df.sort_values(
    by='Importance', ascending=False
).head(number_of_features)
top_features.to_csv(f'{rfc_dir}markers.csv', index=False)
Unfortunately, I can't share the data because it is HIPAA-protected for some reason. In any case, the highest feature importance score is only about 0.01. Has anyone run into this before?
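For context on that number: scikit-learn normalizes `feature_importances_` so the values sum to 1, which means that with N features a perfectly uniform spread would give 1/N each. The sketch below illustrates this on synthetic data, not the real dataset. `make_classification` and all the sizes are stand-ins chosen for illustration, and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the expression matrix (all sizes are assumptions).
n_features = 500
X_demo, y_demo = make_classification(
    n_samples=300,
    n_features=n_features,
    n_informative=20,
    n_redundant=0,
    random_state=42,
)

rf_demo = RandomForestClassifier(n_estimators=50, random_state=42)
rf_demo.fit(X_demo, y_demo)

importances = rf_demo.feature_importances_
# Importance each feature would get if it were spread evenly.
uniform_baseline = 1.0 / n_features

print(f"Sum of importances: {importances.sum():.3f}")
print(f"Uniform baseline (1 / n_features): {uniform_baseline:.4f}")
print(f"Max importance: {importances.max():.4f}")
print(f"Max / baseline ratio: {importances.max() / uniform_baseline:.1f}")
```

Comparing the maximum against `1 / n_features` rather than against 1 may be a more useful way to judge whether a top score of about 0.01 is low for a given number of highly variable genes.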
Thanks