by Anonymous » 07 Feb 2025, 08:34
I am trying to integrate Scrapy into Django. This topic helped me with that. I am not scraping any website yet.
myspider.py:
[code]
import scrapy

from scrapers.items import ScrapersItem


class ErascraperSpider(scrapy.Spider):
    name = "erascraper"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    def parse(self, response):
        # No real scraping yet: emit one hard-coded item to exercise the pipeline.
        return ScrapersItem(name="Argus")
[/code]
mypipeline.py:
[code]
class ScrapersPipeline:
    def process_item(self, item, spider):
        # item.save() performs a synchronous Django ORM insert; this call raises the error below.
        item.save()
        print("pipeline ok")
        return item
[/code]
Also, I use scrapy_djangoitem in my items.py. Running the spider produces this error:
[code]
2024-05-21 22:01:37 [scrapy.core.engine] DEBUG: Crawled (200) (referer: None)
2024-05-21 22:01:37 [scrapy.core.scraper] ERROR: Error processing {'name': 'Argus'}
Traceback (most recent call last):
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/twisted/internet/defer.py", line 1078, in _runCallbacks
    current.result = callback( # type: ignore[misc]
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/scrapy/utils/defer.py", line 340, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/scrapers/scrapers/pipelines.py", line 14, in process_item
    item.save()
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/scrapy_djangoitem/__init__.py", line 35, in save
    self.instance.save()
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 822, in save
    self.save_base(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 909, in save_base
    updated = self._save_table(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 1071, in _save_table
    results = self._do_insert(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 1112, in _do_insert
    return manager._insert(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/query.py", line 1847, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 1821, in execute_sql
    with self.connection.cursor() as cursor:
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/utils/asyncio.py", line 24, in inner
    raise SynchronousOnlyOperation(message)
django.core.exceptions.SynchronousOnlyOperation: You cannot call this from an async context - use a thread or sync_to_async.
[/code]
I have read a lot about solving this, in particular about using sync_to_async and await, but I don't see where I can use them in my script.
To get around this error, in my Scrapy project's settings.py I use:
[code]
import os
os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"
[/code]
However, according to the Django documentation, this setting is not suitable for a production environment; the documentation explicitly warns against it. See this topic.
So, have you already run into this issue, and how did you solve it?
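For reference, here is a minimal sketch of what the error message suggests ("use a thread or sync_to_async"). It uses only the standard library so it can be run on its own. FakeItem, ThreadedSavePipeline and save_directly are hypothetical names, the local SynchronousOnlyOperation class only imitates Django's exception, and asyncio.to_thread stands in for asgiref's sync_to_async. In a real project the idea would presumably be an `async def process_item` that awaits `sync_to_async(item.save)()`, which as far as I understand also requires Scrapy to run on the asyncio reactor; I have not verified that setup here.

```python
import asyncio


class SynchronousOnlyOperation(Exception):
    """Local stand-in for django.core.exceptions.SynchronousOnlyOperation."""


class FakeItem:
    """Hypothetical item whose save() refuses to run on an event-loop thread."""

    def __init__(self, name):
        self.name = name
        self.saved = False

    def save(self):
        # Imitate the guard: a running event loop in this thread means async context.
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            pass  # no loop in this thread, so blocking work is acceptable
        else:
            raise SynchronousOnlyOperation("You cannot call this from an async context")
        self.saved = True


class ThreadedSavePipeline:
    async def process_item(self, item, spider):
        # Hand the blocking save() to a worker thread, which has no running loop.
        await asyncio.to_thread(item.save)
        return item


async def save_directly(item):
    # Calling save() on the event-loop thread reproduces the failure.
    try:
        item.save()
    except SynchronousOnlyOperation:
        return "rejected"
    return "saved"


direct_result = asyncio.run(save_directly(FakeItem("Argus")))
threaded_item = asyncio.run(
    ThreadedSavePipeline().process_item(FakeItem("Argus"), spider=None)
)
print(direct_result, threaded_item.saved)  # prints: rejected True
```

The direct call fails because it runs on the thread that owns the event loop, while the threaded call succeeds because the worker thread has no running loop. That is the same distinction the traceback above is complaining about.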