In Ensemble.


Post by Anonymous »

I'm training a RandomForestClassifier from sklearn.ensemble with the following code:
```python
import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.sparse import issparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# data_dir, ct and rfc_dir are set elsewhere in the script
adata = ad.read_h5ad(f'{data_dir}{ct}_clean_log1p_normalized.h5ad')
adata = adata[:, adata.var.highly_variable]  # keep only highly variable genes
print(f'AnnData for {ct}: {adata}')

# Extract feature matrix (X) and target vector (y)
X = adata.X
y = adata.obs['clinical_dx']

# Convert sparse matrix to dense (optional: RandomForestClassifier also accepts scipy sparse input)
if issparse(X):
    X = X.toarray()

# Encode the diagnosis labels as integers 0..n_classes-1 (le.classes_ keeps the original names)
le = LabelEncoder()
y_encoded = le.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)

# Initialize the classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier
rf_classifier.fit(X_train, y_train)

# Predict on the held-out test set
y_pred_rf = rf_classifier.predict(X_test)

# Per-class precision/recall/F1; labels= keeps target_names aligned even if a class is missing from y_test
validation_report = classification_report(y_test, y_pred_rf,
                                          labels=np.arange(len(le.classes_)),
                                          target_names=le.classes_)
print(validation_report)

with open(f'{rfc_dir}validation_report.txt', "w") as report_file:
    report_file.write(validation_report)

# Generate the confusion matrix
cm = confusion_matrix(y_test, y_pred_rf, labels=rf_classifier.classes_)

# Row-normalize so each true-label row sums to 1 (displayed as percentages below)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

plt.figure(figsize=(10, 9))
sns.heatmap(cm_normalized, annot=True, fmt='.1%', cmap='Blues',
            xticklabels=le.inverse_transform(rf_classifier.classes_),
            yticklabels=le.inverse_transform(rf_classifier.classes_))

plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix (Random Forest)')

plt.savefig(f"{rfc_dir}confusion_matrix.png", bbox_inches='tight')
plt.close()

# Get impurity-based feature importances (normalized to sum to 1 over all genes)
feature_importances_rf = rf_classifier.feature_importances_

number_of_features = 200

# Collect gene IDs and importances in a DataFrame for sorting and export
feature_importance_rf_df = pd.DataFrame({
    'Ensembl': adata.var_names,
    'Importance': feature_importances_rf
})

top_features = feature_importance_rf_df.sort_values(
    by='Importance', ascending=False
).head(number_of_features)

top_features.to_csv(f'{rfc_dir}markers.csv', index=False)
```
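As a side note on the confusion-matrix step: the manual row normalization (`cm / cm.sum(axis=1)[:, np.newaxis]`) should give the same result as scikit-learn's built-in `confusion_matrix(..., normalize='true')` (available since scikit-learn 0.22). A minimal sketch on made-up toy labels, not the poster's data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels, for illustration only
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 1, 2, 0, 2, 2])

# Manual row normalization, as in the script above
cm = confusion_matrix(y_true, y_pred)
cm_manual = cm.astype(float) / cm.sum(axis=1)[:, np.newaxis]

# Built-in equivalent: normalize over the true labels (rows)
cm_builtin = confusion_matrix(y_true, y_pred, normalize='true')

print(np.allclose(cm_manual, cm_builtin))
```

One difference worth knowing: if a class has no true samples in the test split, the manual division produces NaN (with a runtime warning), whereas recent scikit-learn versions return zeros for that row with `normalize='true'`.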
Unfortunately, I can't share the data because it is HIPAA-protected. In any case, the highest value in `feature_importances_` is only about 0.01. Has anyone run into this before?
Thanks
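For context on the ~0.01 value: scikit-learn's impurity-based `feature_importances_` are normalized to sum to 1 across all features, so with many input genes even the top feature can have a small absolute value (with 2,000 features, the average is 1/2000 = 0.0005). A minimal sketch on synthetic data from `make_classification`; the sample and feature counts are arbitrary assumptions and say nothing about the poster's dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: many features, only a few informative (arbitrary sizes)
X, y = make_classification(n_samples=300, n_features=1000, n_informative=10,
                           n_redundant=0, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

importances = rf.feature_importances_
# Importances are normalized, so they sum to 1 over all features
print(f"sum of importances: {importances.sum():.3f}")
# Hence the mean is exactly 1 / n_features
print(f"mean importance:    {importances.mean():.5f}")
print(f"max importance:     {importances.max():.5f}")
# Comparing the top value to the 1 / n_features baseline is more telling than its absolute size
print(f"max / mean ratio:   {importances.max() / importances.mean():.1f}")
```

Because the values are relative shares, a top importance of 0.01 is best read against the 1/n_features baseline and against the other genes' values rather than as an absolute score.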
