Question


PHP: hitting a timed JavaScript redirect when crawling a page

The page content my crawler fetched is:


 <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="cache-control" content="no-store" />
<meta http-equiv="Connection" content="Close" />
<script>
function JumpSelf()
{   
    self.location="/?WebShieldSessionVerify=PIHIFboME3yzpTl2p9T2";
}
</script>
<script>setTimeout("JumpSelf()",700);
</script>
</head>
<body>
</body>
</html>

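My crawler gets status code 200.
Capturing the browser's requests with Fiddler shows status code 302,
with this header: Location: /(e24a2c455vo1xe45nlqfme55)/default2.aspx

Is it because curl fetched the JS page, and since the redirect only fires after a 700 ms timer, curl assumed there was no redirect and stopped? How can I solve this? Should I match the URL with a regular expression?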

php curl data-scraping JavaScript

11 years, 11 months ago

拔哥小树杈


Answer 1


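Do it in two steps. First, fetch this page and extract the link from the JS. Second, fetch the page that link points to.
You may also need to keep cookies in sync across the requests. For details, look up CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE in the curl manual.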
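
A minimal sketch of this two-step approach, under some assumptions: the function names (`extractJsRedirect`, `resolveUrl`, `fetchBehindJsRedirect`) and the cookie file path are made up for illustration, and the regex only covers simple `location = "..."` assignments like the one in the question. Reusing one curl handle for both requests keeps the cookies from the first response available for the second.

```php
<?php
// Illustrative helpers; names and defaults are assumptions, not part of any library.

/** Return the target of a `location = "..."` assignment in the page, or null. */
function extractJsRedirect(string $html): ?string
{
    // Matches self.location="...", window.location='...', location.href="...", etc.
    $pattern = '/\blocation(?:\.href)?\s*=\s*(["\'])(.*?)\1/i';
    return preg_match($pattern, $html, $m) === 1 ? $m[2] : null;
}

/** Turn an absolute, root-relative, or relative target into a full URL. */
function resolveUrl(string $base, string $target): string
{
    if (preg_match('#^https?://#i', $target)) {
        return $target;
    }
    $p = parse_url($base);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($target, '/') === 0) {
        return $origin . $target;
    }
    // Relative target: resolve against the directory of the base path.
    $dir = isset($p['path']) ? preg_replace('#/[^/]*$#', '/', $p['path']) : '/';
    return $origin . $dir . $target;
}

/** Step 1: fetch the page. Step 2: if it contains a JS redirect, fetch its target. */
function fetchBehindJsRedirect(string $url, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow HTTP 3xx redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies, enables the cookie engine
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies when the handle is freed
        CURLOPT_TIMEOUT        => 15,
    ]);

    curl_setopt($ch, CURLOPT_URL, $url);
    $html = (string) curl_exec($ch);

    $target = extractJsRedirect($html);
    if ($target !== null) {
        // Same handle, so cookies set by the first response are sent here too.
        curl_setopt($ch, CURLOPT_URL, resolveUrl($url, $target));
        $html = (string) curl_exec($ch);
    }
    return $html;
}
```

A call might look like `fetchBehindJsRedirect('http://example.com/', '/tmp/cookies.txt')`, where both the URL and the path are placeholders. If the site checks other things as well (for example the Referer or User-Agent header), more options would be needed.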

answered 11 years, 11 months ago

你们就好啦


Answer 2


Because the crawler is not a browser, it does not execute the code inside the script tags.
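
To make the distinction concrete: curl's `CURLOPT_FOLLOWLOCATION` only reacts to HTTP-level redirects (a 3xx status with a `Location` header), while a script-driven redirect arrives as an ordinary 200 response whose body curl never interprets. The hypothetical helper below (`classifyResponse` is an illustrative name, and the pattern is a rough heuristic) shows one way a crawler could tell the two cases apart.

```php
<?php
// Illustrative only: a rough heuristic, not a full JavaScript parser.

/** Classify a response as 'http-redirect', 'js-redirect', or 'content'. */
function classifyResponse(int $status, string $body): string
{
    if ($status >= 300 && $status < 400) {
        return 'http-redirect'; // curl can follow this on its own
    }
    // A 200 page whose script assigns to location needs manual handling.
    $assignsLocation = preg_match(
        '/<script\b[^>]*>.*?\blocation(?:\.href)?\s*=/is',
        $body
    ) === 1;
    if ($status === 200 && $assignsLocation) {
        return 'js-redirect';
    }
    return 'content';
}
```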

answered 11 years, 11 months ago

虚无交响曲

