Multiple REST API calls for 1M data entries with Databricks + Scala? - Programmiererforum

Post by Anonymous » 24 Feb 2025, 02:00

I am trying to make an API call that retrieves all buildings in LA County. The website for the dataset is here.
The county has 3 million buildings; I have filtered them down to 1 million. You can see my QUERY_PARAMS in the code.
From the ESRI developer website, I understand that a single API call is limited to 10,000 results. For my problem, however, I need to retrieve all 1 million buildings.

import aiohttp
import asyncio
import nest_asyncio

nest_asyncio.apply()  # Required if running in Jupyter Notebook

# Base URL for the API query
BASE_URL = "https://services.arcgis.com/RmCCgQtiZLDCtblq/arcgis/rest/services/Countywide_Building_Outlines/FeatureServer/1/query"

# Records fetched per request; also the step between page offsets
PAGE_SIZE = 1000

# Parameters for the query
QUERY_PARAMS = {
    "where": "(HEIGHT < 33) AND UseType = 'RESIDENTIAL' AND SitusCity IN('LOS ANGELES CA','BEVERLY HILLS CA', 'PALMDALE')",
    "outFields": "*",
    "outSR": "4326",
    "f": "json",
    "resultRecordCount": PAGE_SIZE,
}


async def fetch_total_count():
    """Fetch total number of matching records."""
    params = QUERY_PARAMS.copy()
    params["returnCountOnly"] = "true"

    async with aiohttp.ClientSession() as session:
        async with session.get(BASE_URL, params=params) as response:
            response.raise_for_status()  # Fail loudly on HTTP errors
            data = await response.json()
            return data.get("count", 0)  # Extract total count


async def fetch(session, offset):
    """Fetch a batch of records using pagination."""
    params = QUERY_PARAMS.copy()
    params["resultOffset"] = offset

    async with session.get(BASE_URL, params=params) as response:
        response.raise_for_status()  # Fail loudly on HTTP errors
        return await response.json()


async def main():
    """Fetch all records asynchronously with pagination."""
    all_data = []
    total_count = await fetch_total_count()
    print(f"Total Records to Retrieve: {total_count}")

    semaphore = asyncio.Semaphore(10)  # Limit concurrency to prevent API overload

    async with aiohttp.ClientSession() as session:

        async def bound_fetch(offset):
            # At most 10 requests are in flight at any time
            async with semaphore:
                return await fetch(session, offset)

        # Generate one task per page of PAGE_SIZE records
        tasks = [bound_fetch(offset) for offset in range(0, total_count, PAGE_SIZE)]
        results = await asyncio.gather(*tasks)

    # Keep only pages that actually returned features
    for data in results:
        if "features" in data:
            all_data.extend(data["features"])

    print(f"Total Records Retrieved: {len(all_data)}")
    return all_data


# Run the async function
all_data = asyncio.run(main())
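
A possible sanity check for the pagination above, as a minimal sketch. The loop only keeps pages that contain "features", so a page that came back with an error body would be dropped without notice and the final count would simply be lower than expected. The helpers below list the offsets needed for a given total and separate usable pages from offsets that should be retried. The names page_offsets and split_results are hypothetical, and the sketch assumes a failed request shows up as a JSON object with an "error" key instead of "features"; the sample pages are made up for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

PAGE_SIZE = 1000  # assumed to match resultRecordCount in QUERY_PARAMS


def page_offsets(total_count: int, page_size: int = PAGE_SIZE) -> List[int]:
    """Return the resultOffset value of every page needed to cover total_count."""
    return list(range(0, total_count, page_size))


def split_results(
    offsets: Sequence[int], results: Sequence[Any]
) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Collect features from good pages and list the offsets of bad ones."""
    features: List[Dict[str, Any]] = []
    failed: List[int] = []
    for offset, data in zip(offsets, results):
        if isinstance(data, dict) and "features" in data and "error" not in data:
            features.extend(data["features"])
        else:
            failed.append(offset)  # candidate for a retry
    return features, failed


# Made-up responses for three pages; the middle one simulates a failure.
sample_offsets = page_offsets(2500)
sample_results = [
    {"features": [{"attributes": {"id": 1}}]},
    {"error": {"code": 429, "message": "Too many requests"}},
    {"features": [{"attributes": {"id": 2}}]},
]
sample_features, sample_failed = split_results(sample_offsets, sample_results)
print(len(sample_features), sample_failed)
```

The failed offsets could be passed through fetch again, ideally after a short pause, until the list is empty or a retry limit is reached.
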
I turned to Databricks + Scala to speed up the data retrieval, but I am brand new to big data computing. I am vaguely aware that you need to "parallelize" your API calls and combine the results into one big DataFrame?
Can anyone give me suggestions?
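
To make the parallelize-and-combine idea concrete, here is a minimal sketch of the pattern in plain Python rather than Scala or Spark, so it only illustrates the shape of the approach and is not a Databricks recipe. A pool of worker threads fetches one page each, and the pages are then flattened into a single list of row dicts that a DataFrame constructor could consume. The names fetch_all, to_rows and fake_fetch are hypothetical; fake_fetch stands in for the real HTTP request, and the OBJECTID values it returns are invented test data. The sketch assumes each feature carries "attributes" and "geometry" keys.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def fetch_all(
    fetch_page: Callable[[int], Dict[str, Any]],
    total_count: int,
    page_size: int = 1000,
    max_workers: int = 10,
) -> List[Dict[str, Any]]:
    """Call fetch_page(offset) for every page, using a pool of worker threads."""
    offsets = range(0, total_count, page_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the offsets
        return list(pool.map(fetch_page, offsets))


def to_rows(pages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten pages of features into one list of row dicts."""
    rows: List[Dict[str, Any]] = []
    for page in pages:
        for feature in page.get("features", []):
            row = dict(feature.get("attributes", {}))
            row["geometry"] = feature.get("geometry")
            rows.append(row)
    return rows


def fake_fetch(offset: int) -> Dict[str, Any]:
    """Stand-in for the HTTP call: returns two invented features per page."""
    return {
        "features": [
            {"attributes": {"OBJECTID": offset + i}, "geometry": None}
            for i in range(2)
        ]
    }


rows = to_rows(fetch_all(fake_fetch, total_count=3000))
print(len(rows))  # 3 pages x 2 fake features = 6
```

If pandas is available, pandas.DataFrame(rows) would turn the rows into a single DataFrame. Threads are a reasonable fit here because the work is dominated by waiting on the network, not by computation.
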
