Check if a URL goes to a page containing the text "404"

I have a bash script to check the HTTP status code of a list of urls, but I realize that some, while appearing to be "200", display actually a page containing "error 404". How could I check for that ?

Here's my current script :

#!/bin/bash
while read LINE; do
  curl -o /dev/null --silent --head --write-out '%{http_code}n' "$LINE"
done < url-list.txt

(I got it from a precedent question : script to get the HTTP status code of a list of urls ?)

EDIT There seems to be a bug in the script : it returns "200" but if I wget -o log that same adress I get "404 not found"


为了好玩 - 这里有一个BASH解决方案:

dosomething() {
        code="$1"; url="$2"
        case "$code" in
                200) echo "OK for $url";;
                302) echo "redir for $url";;
                404) echo "notfound for $url";;
                *) echo "other $code for $url";;
        esac
}

#MAIN program
while read url
do
        uri=($(echo "$url" | sed 's~http://([^/][^/]*)(.*)~1 2~'))
        HOST=${uri[0]:=localhost}
        FILE=${uri[1]:=/}
        exec {SOCKET}<>/dev/tcp/$HOST/80
        echo -ne "GET $FILE HTTP/1.1nHost: $HOSTnn" >&${SOCKET}
        res=($(<&${SOCKET} sed '/^.$/,$d' | grep '^HTTP'))
        dosomething ${res[1]} "$url"
done << EOF
http://stackoverflow.com
http://stackoverflow.com/some/bad/url
EOF

Well, you could grok the response body and look for "404", "Error 404", "Not Found", "404 Not Found" etc printed in plaintext, but that is likely to give both false negatives and false positives. Though if the server sends 200 for what's supposed to be a 404 somebody didn't do their job right.

链接地址: http://www.djcxy.com/p/45602.html

上一篇: 数据库的HTTP状态码已关闭

下一篇: 检查网址是否转到包含文字“404”的页面