Aioscpy 0.3.12
An asyncio + aio-libs crawler framework that imitates Scrapy.
Overview
The Aioscpy framework is based on the open-source projects Scrapy and scrapy_redis.
Aioscpy is a fast, high-level web crawling and scraping framework used to crawl websites and extract structured data from their pages.
It implements dynamic variable injection and supports asynchronous coroutines.
Distributed crawling and scraping are supported.
Requirements

Python 3.8+
Works on Linux, Windows, macOS, BSD

Install
The quick way:
# default
pip install aioscpy

# latest version (from GitHub)
pip install git+https://github.com/ihandmine/aioscpy

# install all dependencies
pip install aioscpy[all]

# install extra packages
pip install aioscpy[aiohttp,httpx]
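To confirm the install, you can print the installed version from the package metadata (a minimal sketch; importlib.metadata is in the standard library on Python 3.8+):

# Print the installed aioscpy version from package metadata.
from importlib.metadata import version

print(version("aioscpy"))  # expected: 0.3.12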

Usage
Create a project and generate a spider:
aioscpy startproject project_quotes

cd project_quotes
aioscpy genspider quotes


quotes.py:
from aioscpy.spider import Spider


class QuotesSpider(Spider):
    name = 'quotes'
    custom_settings = {
        "SPIDER_IDLE": False
    }
    start_urls = [
        'https://quotes.toscrape.com/tag/humor/',
    ]

    async def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
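The response.css and response.xpath calls follow Scrapy-style selector semantics. As a standalone illustration, the same expressions can be tried against a static HTML fragment with the parsel library (using parsel directly here is an assumption made for demonstration; this page does not document aioscpy's actual selector backend):

# Demonstrate the quote-extraction selectors on a static fragment
# using parsel, Scrapy's selector library (assumed comparable here).
from parsel import Selector

html = '''
<div class="quote">
  <span class="text">"A day without sunshine is like, you know, night."</span>
  <span>by <small class="author">Steve Martin</small></span>
</div>
'''

sel = Selector(text=html)
for quote in sel.css('div.quote'):
    print({
        'author': quote.xpath('span/small/text()').get(),
        'text': quote.css('span.text::text').get(),
    })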

Create a single-script spider:
aioscpy onespider single_quotes

single_quotes.py:
from aioscpy.spider import Spider
from anti_header import Header
from pprint import pformat


class SingleQuotesSpider(Spider):
    name = 'single_quotes'
    custom_settings = {
        "SPIDER_IDLE": False
    }
    start_urls = [
        'https://quotes.toscrape.com/',
    ]

    async def process_request(self, request):
        # Attach a randomized set of request headers before the request is sent.
        request.headers = Header(url=request.url, platform='windows', connection=True).random
        return request

    async def process_response(self, request, response):
        # Returning the request re-queues it when the server answers
        # with a retryable status.
        if response.status in [404, 503]:
            return request
        return response

    async def process_exception(self, request, exc):
        raise exc

    async def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

    async def process_item(self, item):
        # Log each scraped item, pretty-printed.
        self.logger.info("{item}", **{'item': pformat(item)})


if __name__ == '__main__':
    quotes = SingleQuotesSpider()
    quotes.start()
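The process_item hook above only logs each item. A common variation is to persist items instead; the hypothetical replacement below assumes process_item receives every dict yielded by parse, as the example suggests, and appends each one to a JSON-lines file:

import json

    async def process_item(self, item):
        # Hypothetical variant: persist each scraped item as one JSON line.
        with open('quotes.jl', 'a', encoding='utf-8') as f:
            f.write(json.dumps(item, ensure_ascii=False) + '\n')

For heavy output the blocking file write would be better moved off the event loop (for example via a thread executor), since the hook runs inside a coroutine.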

Run the spider:
aioscpy crawl quotes
aioscpy runspider quotes.py


start.py (programmatic entry point):
from aioscpy.crawler import call_grace_instance
from aioscpy.utils.tools import get_project_settings

"""start spider method one:
from cegex.baidu import BaiduSpider
from cegex.httpbin import HttpBinSpider

process = CrawlerProcess()
process.crawl(HttpBinSpider)
process.crawl(BaiduSpider)
process.start()
"""


def load_file_to_execute():
    # Discover a spider from files under the given path and run it.
    process = call_grace_instance("crawler_process", get_project_settings())
    process.load_spider(path='./cegex', spider_like='baidu')
    process.start()


def load_name_to_execute():
    # Run a spider by name, resolved from the given path.
    process = call_grace_instance("crawler_process", get_project_settings())
    process.crawl('baidu', path="./cegex")
    process.start()


if __name__ == '__main__':
    load_file_to_execute()
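The docstring's "method one" omits its import. Filled in (the CrawlerProcess import path is an assumption; the calls themselves are taken from the docstring above), it would read:

# "Method one" from the docstring, made self-contained.
# Assumption: CrawlerProcess is exported by aioscpy.crawler.
from aioscpy.crawler import CrawlerProcess

from cegex.baidu import BaiduSpider
from cegex.httpbin import HttpBinSpider

process = CrawlerProcess()
process.crawl(HttpBinSpider)
process.crawl(BaiduSpider)
process.start()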

More commands:
aioscpy -h

Ready
Please submit your suggestions to the owner by opening an issue.
Thanks
aiohttp
scrapy
loguru
httpx
