Background

Enable JavaScript and cookies to continue
us.shein.com needs to review the security of your connection before proceeding.
Ray ID: 78698c2b8b837cdd
Performance & security by Cloudflare
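The response above is Cloudflare's challenge page: the site only serves content after JS rendering. Options for JS rendering include chromedp and Splash (the rendering service from the scrapy ecosystem). This post uses Splash for JS rendering, as an alternative to chromedp.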

Splash runtime environment

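Run Splash with Docker.
# Pull the Splash image
sudo docker pull scrapinghub/splash
# Start the Splash container. It needs to use the host's HTTP proxy service, hence --net host.
# (In host network mode Docker discards published ports, so -p 8050:8050 is redundant here.)
sudo docker run -it --net host -p 8050:8050 --rm scrapinghub/splash
Because docker run is started with --net host, Splash is reachable at http://127.0.0.1:8050.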
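
Before pointing a crawler at the container, it can help to confirm that Splash is actually listening. The following is a minimal sketch, assuming the `/_ping` health endpoint described in the Splash HTTP API documentation is available in the image you run; `PingURL` and `SplashIsUp` are hypothetical helper names, not part of any library.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// PingURL returns the health-check URL for a Splash instance at base.
// Assumption: the instance exposes /_ping (verify against your Splash version).
func PingURL(base string) string {
	return strings.TrimRight(base, "/") + "/_ping"
}

// SplashIsUp reports whether the instance at base answers the ping with HTTP 200.
func SplashIsUp(base string) bool {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(PingURL(base))
	if err != nil {
		return false // connection refused, timeout, etc.
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	base := "http://127.0.0.1:8050" // address used in this post
	fmt.Println("splash reachable:", SplashIsUp(base))
}
```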

Using Splash from Golang

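Use the GET method to have Splash perform the JS rendering.
Send an HTTP request to one of Splash's API endpoints; Splash renders the page with JS, and the endpoint returns the JS-rendered result.
Splash's API endpoints accept a lua_source parameter.
Example lua_source: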
function main(splash)
    return 'hello'
end

Example of calling the API endpoint with curl:

# execute endpoint
curl 'http://127.0.0.1:8050/execute?lua_source=function+main%28splash%29%0D%0A++return+%27hello%27%0D%0Aend'
In Go, use url.QueryEscape from the net/url package to URL-encode lua_source.

A sketch of the code:
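
As a small illustration of the encoding step, the sketch below URL-encodes the hello script and appends it to the execute endpoint. `ExecuteURL` is a hypothetical helper name; the base address is the one used in this post.

```go
package main

import (
	"fmt"
	neturl "net/url"
)

// ExecuteURL returns a Splash /execute URL that carries luaSource
// as the URL-encoded lua_source query parameter.
func ExecuteURL(base, luaSource string) string {
	return base + "/execute?lua_source=" + neturl.QueryEscape(luaSource)
}

func main() {
	script := "function main(splash)\n  return 'hello'\nend"
	fmt.Println(ExecuteURL("http://127.0.0.1:8050", script))
}
```

The printed URL has the same shape as the one in the curl example; the only difference is that the line breaks are encoded as `%0A` rather than `%0D%0A`, because the Go string uses `\n`.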

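package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	neturl "net/url"
)

// GetSplashUrl builds a Splash /run URL. The Lua script routes requests
// through a proxy, sets the User-Agent, loads url, and returns the rendered HTML.
func GetSplashUrl(url string) string {
	luaSourceFmt := `splash:on_request(function(request)
	request:set_proxy{"0.0.0.0",1079}
	end)
	splash:set_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36")
	assert(splash:go("%s"))
	return splash:html()
	`
	// URL-encode the script so it can travel as a query parameter.
	q := neturl.QueryEscape(fmt.Sprintf(luaSourceFmt, url))
	return "http://127.0.0.1:8050/run?lua_source=" + q
}

func main() {
	reqUrl := GetSplashUrl("https://www.example.com")
	// Request reqUrl over HTTP. The response has already been processed
	// by Splash, so the body is the JS-rendered HTML page.
	resp, err := http.Get(reqUrl)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}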

Calling Splash's API endpoints over HTTP

  1. http://localhost:8050/render.html
  2. http://localhost:8050/render.png
  3. http://localhost:8050/execute
  4. http://localhost:8050/run
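
For pages that need no scripting, the first endpoint is usually enough. The sketch below builds a render.html URL, assuming the `url` and `wait` query parameters from the Splash HTTP API documentation; `RenderHTMLURL` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	neturl "net/url"
	"strconv"
)

// RenderHTMLURL builds a /render.html URL that asks Splash to load target
// and wait the given number of seconds before returning the rendered HTML.
func RenderHTMLURL(base, target string, waitSeconds float64) string {
	q := neturl.Values{}
	q.Set("url", target)
	q.Set("wait", strconv.FormatFloat(waitSeconds, 'f', -1, 64))
	return base + "/render.html?" + q.Encode()
}

func main() {
	fmt.Println(RenderHTMLURL("http://localhost:8050", "https://www.example.com", 1.5))
}
```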
lua_source for the execute endpoint (a complete main function is required):
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(1.0))
    return splash:html()
end
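lua_source for the run endpoint (only the body of main is needed; Splash wraps it in function main(splash, args) ... end):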
assert(splash:go(args.url))
assert(splash:wait(1.0))
return splash:html()
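
The relationship between the two forms can be sketched in Go: a body written for the run endpoint becomes a script for the execute endpoint once it is wrapped in a main function. `ForExecute` is a hypothetical helper name, and the wrapping mirrors what the Splash documentation describes for the run endpoint.

```go
package main

import "fmt"

// ForExecute wraps a /run-style script body in the main function
// that the /execute endpoint expects.
func ForExecute(body string) string {
	return "function main(splash, args)\n" + body + "\nend"
}

func main() {
	body := "assert(splash:go(args.url))\nassert(splash:wait(1.0))\nreturn splash:html()"
	fmt.Println(ForExecute(body))
}
```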

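Splash scripting

Methods

splash:go: load a URL in the Splash browser tab
splash:set_user_agent(value): override the User-Agent header
splash:select: return the first element matching a CSS selector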

The request object

The request object is passed to the callback registered with splash:on_request(callback). Its methods include:
request:set_proxy{host, port, username=nil, password=nil, type='HTTP'}
request:set_header(name, value)

splash:on_request

splash:on_request(callback)
-- Add a proxy
splash:on_request(function(request)
    request:set_proxy{
        host = "0.0.0.0",
        port = 8990,
        username = splash.args.username,
        password = splash.args.password,
    }
end)
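
The script above reads the proxy credentials from `splash.args`, so the caller has to supply them with the request. The following is a minimal sketch, assuming Splash accepts a JSON POST body whose fields become `splash.args` (as described in the Splash HTTP API documentation); sending them in a POST body keeps the credentials out of the URL. `ProxyPayload`, `NewProxyRequest`, and `proxyScript` are hypothetical names, and the proxy address is the one from the snippet above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// proxyScript is a /run script body: it sets the proxy on every request,
// loads args.url, and returns the rendered HTML.
const proxyScript = `splash:on_request(function(request)
    request:set_proxy{
        host = "0.0.0.0",
        port = 8990,
        username = splash.args.username,
        password = splash.args.password,
    }
end)
assert(splash:go(args.url))
return splash:html()`

// ProxyPayload returns the JSON fields that Splash exposes to the script as args.
func ProxyPayload(target, username, password string) map[string]string {
	return map[string]string{
		"lua_source": proxyScript,
		"url":        target,
		"username":   username,
		"password":   password,
	}
}

// NewProxyRequest builds (but does not send) a POST request to the /run endpoint.
func NewProxyRequest(base, target, username, password string) (*http.Request, error) {
	body, err := json.Marshal(ProxyPayload(target, username, password))
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/run", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewProxyRequest("http://127.0.0.1:8050", "https://www.example.com", "user", "secret")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// Send it with http.DefaultClient.Do(req) once Splash is running.
}
```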

https://splash.readthedocs.io/en/stable/scripting-tutorial.html
https://splash.readthedocs.io/en/stable/scripting-ref.html#splash-on-request
The request object https://splash.readthedocs.io/en/stable/scripting-request-object.html
Installing and using Splash https://blog.csdn.net/qq_53582111/article/details/121649717
Using the Splash service in depth https://www.5axxw.com/wiki/content/hf16nn