H5W3

Writing a crawler with Scrapy, but it doesn't return anything

My idea: type in a movie title and have the spider return that movie's information.

# -*- coding: utf-8 -*-
import sys
sys.path.append("..")
reload(sys)
sys.setdefaultencoding('utf8')
from scrapy.spider import Spider
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spiders import Rule,CrawlSpider
from items import doubanSpiderItem
from scrapy.contrib.linkextractors import LinkExtractor

class doubanSpider(CrawlSpider):

    name = 'doubanSpider'
    allowed_domains=[]
    start_urls = ['http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB']

    def start_requests(self):
        movie_name = raw_input("输入电影名:")
        try:
            url_head = "http://movie.douban.com/subject_search?search_text="
            self.start_urls.append(url_head+str(movie_name))
            for url in self.start_urls:
                yield self.make_requests_from_url(url)
        except:
            print "can not connect"
            # fetch the movie search results page

    def parse(self, response):

        sel=Selector(response)
        print sel

        movie_link = sel.xpath("//div[@class='pl2']/a/@href/text()").extract()
        print movie_link
        if movie_link:
             yield Request(movie_link[0],callback=self.parse_item)
        # go to the page of the movie that was searched for
    def parse_item(self,response):
        sel = Selector(response)
        movie_name = sel.xpath("//span[@property = 'v:itemreviewed']/text()").extract()
        print movie_name
        

That's my code. Here is the terminal output:

timmys-MacBook-Pro:spiders apple$ scrapy crawl doubanSpider
/Users/apple/Desktop/doubanSpider/doubanSpider/spiders/doubanSpider.py:6: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
  from scrapy.spider import Spider
/Users/apple/Desktop/doubanSpider/doubanSpider/spiders/doubanSpider.py:11: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors` is deprecated, use `scrapy.linkextractors` instead
  from scrapy.contrib.linkextractors import LinkExtractor
2015-11-08 20:50:51 [scrapy] INFO: Scrapy 1.0.3 started (bot: doubanSpider)
2015-11-08 20:50:51 [scrapy] INFO: Optional features available: ssl, http11
2015-11-08 20:50:51 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'doubanSpider.spiders', 'SPIDER_MODULES': ['doubanSpider.spiders'], 'BOT_NAME': 'doubanSpider'}
2015-11-08 20:50:51 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-11-08 20:50:51 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-11-08 20:50:51 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-11-08 20:50:51 [scrapy] INFO: Enabled item pipelines: doubanSpiderPipeline
2015-11-08 20:50:51 [scrapy] INFO: Spider opened
2015-11-08 20:50:51 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-11-08 20:50:51 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
输入电影名:移动迷宫
2015-11-08 20:50:58 [scrapy] DEBUG: Crawled (200) <GET http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB> (referer: None)
2015-11-08 20:50:58 [scrapy] DEBUG: Crawled (200) <GET http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB> (referer: None)
<Selector xpath=None data=u'<html lang="zh-CN" class="">\n<head>\n    '>
[]
<Selector xpath=None data=u'<html lang="zh-CN" class="">\n<head>\n    '>
[]
2015-11-08 20:50:58 [scrapy] INFO: Closing spider (finished)
2015-11-08 20:50:58 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 554,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 16250,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 11, 8, 12, 50, 58, 566941),
 'log_count/DEBUG': 3,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2015, 11, 8, 12, 50, 51, 888328)}
2015-11-08 20:50:58 [scrapy] INFO: Spider closed (finished)
timmys-MacBook-Pro:spiders apple$  
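The two identical "Crawled (200)" lines, and the two empty [] lists, have a simple cause: percent-encoding the typed title 移动迷宫 as UTF-8 yields exactly the URL already hardcoded in start_urls, so the list ends up holding the same URL twice. In Scrapy 1.0, make_requests_from_url builds its Request with dont_filter=True, so the duplicate filter does not drop the second copy. A minimal sketch to check the encoding, assuming Python 3's urllib.parse.quote; SEARCH_HEAD, HARDCODED and search_url are hypothetical names used only for illustration:

```python
# Sketch (Python 3): rebuild the search URL the spider requests for a typed title.
from urllib.parse import quote

# Hypothetical names; the URL values are copied from the question's code and log.
SEARCH_HEAD = "http://movie.douban.com/subject_search?search_text="
HARDCODED = ("http://movie.douban.com/subject_search?search_text="
             "%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB")


def search_url(title):
    # quote() percent-encodes the UTF-8 bytes of the title by default
    return SEARCH_HEAD + quote(title)


url = search_url("移动迷宫")
print(url == HARDCODED)  # True: the appended URL duplicates the hardcoded one
```

In Python 2, as used in the question, the rough equivalent is urllib.quote(title.encode('utf-8')). Keeping either the hardcoded entry or the appended one, but not both, leaves a single request.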

And here is the Douban HTML: [screenshot]

Answer:

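Fix your XPath. That's not how you write it: @href/text(). An attribute node has no text() child, so that expression always matches nothing. Select the attribute itself: //div[@class='pl2']/a/@href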
