Nutch 1.13 form-submission authentication fails; how can I fix it?

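I want to use Nutch to crawl a website that requires login and index it with Solr. When I actually run the crawl, it reports the following error: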

java.lang.IllegalArgumentException: No form exists: lzform
2017-07-07 14:24:52,256 ERROR httpclient.Http - Failed to get protocol output
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: lzform
 at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:505)
 at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:183)
 at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:271)
 at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:327)
Caused by: java.lang.IllegalArgumentException: No form exists: lzform
 at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:219)
 at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
 at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:503)
 ... 3 more

plugin.includes in nutch-site.xml is already configured as follows:

<value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>

httpclient-auth.xml is configured as follows:

<auth-configuration>
   <credentials authMethod="formAuth"
                loginUrl="xxxlogin"
                loginFormId="lzform"
                loginRedirect="true">
     <loginPostData>
       <field name="form_email"
              value="xxxxxxxxxx@gmail.com"/>
       <field name="form_password"
              value="xxxxxxx"/>
     </loginPostData>
     <additionalPostHeaders>
       <field name="User-Agent"
              value="Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko" />
     </additionalPostHeaders>
     <removedFormFields>
       <field name="remember"/>
     </removedFormFields>
     <loginCookie>
       <policy>BROWSER_COMPATIBILITY</policy>
     </loginCookie>
   </credentials>
</auth-configuration>

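Whether or not the page I point loginUrl at contains a form with id lzform, the run always ends with this error. The environment is Ubuntu 16.04 running in a virtual machine.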
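
To rule out a mismatch between what a browser shows and what the crawler receives, it may help to fetch the raw HTML of the login URL outside Nutch and list the forms it actually contains. Below is a minimal sketch in Python using only the standard library. `FormLister`, `list_forms`, and `fetch` are hypothetical helper names for this check and are not part of Nutch, and the sketch assumes the login page can be retrieved with a plain GET.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class FormLister(HTMLParser):
    """Collects the id and name attributes of every <form> start tag."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "form":
            attributes = dict(attrs)
            self.forms.append((attributes.get("id"), attributes.get("name")))


def list_forms(html):
    """Return a list of (id, name) tuples, one per <form> in the HTML text."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


def fetch(url, user_agent):
    """Download a page with the given User-Agent (assumes a plain GET works)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `list_forms(fetch(login_url, user_agent))` with the real loginUrl and the User-Agent from httpclient-auth.xml prints nothing by itself; inspect the returned list. If no entry has lzform as its id or name, the HTML served to a non-browser client does not contain that form, which would be consistent with the "No form exists: lzform" message.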
