Weird behavior when downloading HTML using HttpURLConnection

2018-06-28 02:25:58

In my Wikipedia reader app for Android, I download an article's HTML using HttpURLConnection. Some users report that they cannot see articles and instead see some CSS. It seems their carrier is somehow preprocessing the HTML before it is downloaded, while other Wikipedia readers seem to work fine.

Example URL: http://en.m.wikipedia.org/wiki/Black_Moon_(album)

My method:

// Imports needed by the method (AppLog is the app's own logging class).
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public static String downloadString(String url) throws Exception
{
    StringBuilder downloadedHtml = new StringBuilder();

    HttpURLConnection urlConnection = null;
    String line = null;
    BufferedReader rd = null;

    try
    {
        URL targetUrl = new URL(url);

        urlConnection = (HttpURLConnection) targetUrl.openConnection();

        // Follow redirects only for "/special" pages
        urlConnection.setInstanceFollowRedirects(url.toLowerCase().contains("/special"));

        // Read the result from the server; the charset is given explicitly
        // (UTF-8 assumed) instead of relying on the platform default
        rd = new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), "UTF-8"));

        while ((line = rd.readLine()) != null)
            downloadedHtml.append(line).append('\n');
    }
    catch (Exception e)
    {
        AppLog.e("An exception occurred while downloading data.\r\n: " + e);
        e.printStackTrace();
    }
    finally
    {
        // Close the reader before tearing down the connection
        if (rd != null)
            rd.close();

        if (urlConnection != null)
        {
            AppLog.i("Disconnecting the http connection");
            urlConnection.disconnect();
        }
    }

    return downloadedHtml.toString();
}

I'm unable to reproduce this problem, but there must be a way to get around it. I even disabled redirects by calling setInstanceFollowRedirects(false), but it didn't help.

Am I missing something?

Example of what the users are reporting:

http://pastebin.com/1E3Hn2yX

"carrier is somehow preprocessing the html before it's downloaded"

"a way to get around that?"

Use HTTPS. The connection is then encrypted between the app and Wikipedia, so the carrier cannot rewrite pages in transit. (No citation.)
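A minimal sketch of that suggestion, assuming the server accepts HTTPS at the same host and path. The class name HttpsDownload and the helper toHttps are made up for illustration and are not part of the asker's app. The URL is rewritten up front because HttpURLConnection does not follow a redirect that switches from http to https on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HttpsDownload
{
    // Hypothetical helper: turn an http:// URL into https://,
    // leaving any other URL unchanged.
    static String toHttps(String url)
    {
        if (url.regionMatches(true, 0, "http://", 0, 7))
            return "https://" + url.substring(7);
        return url;
    }

    // Downloads the page over HTTPS and returns its body as a string.
    // UTF-8 is assumed for the response encoding.
    static String downloadString(String url) throws IOException
    {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(toHttps(url)).openConnection();
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)))
        {
            StringBuilder html = new StringBuilder();
            String line;
            while ((line = rd.readLine()) != null)
                html.append(line).append('\n');
            return html.toString();
        }
        finally
        {
            conn.disconnect();
        }
    }
}
```

Calling HttpsDownload.downloadString("http://en.m.wikipedia.org/wiki/Black_Moon_(album)") would then request the https:// version of the example URL.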

"Am I missing something?"

Not that I can see.
