We often need shared resources from the internet, and images are one of them. So how do you download the images on a web page in bulk? A page can hold a great many images, and if you want all of them, right-clicking and saving them one at a time is not a good answer. Here is a web crawler written in Java that grabs the images for you. Programmers who like it, remember to bookmark it.
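Prerequisites: you need to know Java development. The core library is the Jsoup jar, which is easy to find and download online. I have already built an online image-grabbing tool with a UI on top of this code; if you are interested, open it at the address shown in the image.
The image below shows the result of scraping a model photo site picked at random from the web.
The implementation code follows: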
/**
 * User-Agent string used to imitate a browser request (referenced below as Const.UserAgent)
 */
public final static String UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.6821.400 QQBrowser/10.3.3040.400";
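// Imports required by the methods below (place them at the top of the class file)
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Collects every image URL on a page and packs the downloaded images into a zip file.
 *
 * @param zfilepath path of the zip file to write
 * @param url       address of the web page
 * @param pp        name of the img attribute that holds the image URL, usually "src"
 * @return true if at least one image URL was found and processed
 */
public static boolean getImgSrc(String zfilepath, String url, String pp) {
    boolean isb = false;
    // Build the Jsoup connection
    Connection connect = Jsoup.connect(url).timeout(5000);
    connect.header("Connection", "Keep-Alive");
    connect.header("Content-Type", "application/x-www-form-urlencoded");
    connect.header("Accept-Encoding", "gzip, deflate, sdch");
    connect.header("Accept", "*/*");
    connect.header("User-Agent", Const.UserAgent);
    ZipOutputStream out = null;
    try {
        // Fetch the page as a Document
        Document document = connect.ignoreContentType(true).timeout(5000).get();
        // Find all img tags
        Elements imgs = document.getElementsByTag("img");
        File zipfile = new File(zfilepath);
        out = new ZipOutputStream(new FileOutputStream(zipfile));
        int i = 1;
        List<String> listimg = new ArrayList<String>();
        for (Element element : imgs) {
            // Read the URL of each img tag; the "abs:" prefix resolves it to an absolute URL
            String imgSrc = element.attr("abs:" + pp);
            // Jsoup returns "" when the attribute is missing or cannot be resolved
            if (!imgSrc.isEmpty()) {
                listimg.add(imgSrc);
            }
        }
        listimg = removeCf(listimg);
        if (listimg != null && listimg.size() > 0) {
            for (int x = 0; x < listimg.size(); x++) {
                long stime = System.currentTimeMillis();
                String imgSrc = listimg.get(x);
                // Print the URL
                System.out.println(imgSrc);
                // Download the image into the zip file
                boolean is = downImages(imgSrc, out);
                long etime = System.currentTimeMillis();
                float alltime = (float) (etime - stime) / 1000;
                // Status record for this image (built for reporting; not sent anywhere in this listing)
                Map<String, String> rest = new HashMap<String, String>();
                rest.put("img", imgSrc);
                rest.put("time", (alltime) + "");
                rest.put("num", i + "");
                rest.put("status", "true");
                if (is) {
                    rest.put("http", "success");
                } else {
                    rest.put("http", "failed");
                }
                i++;
            }
            Map<String, String> rest1 = new HashMap<String, String>();
            rest1.put("status", "true");
            rest1.put("msg", "Packaging finished");
            System.out.println("Download finished");
            isb = true;
        } else {
            Map<String, String> rest1 = new HashMap<String, String>();
            rest1.put("status", "true");
            rest1.put("msg", "No data was captured; the site may be blocking crawlers");
            // The original called client.sendEvent("chatevent", rest1) here, but "client"
            // is not defined in this listing, so the message is printed instead.
            System.out.println(rest1.get("msg"));
        }
    } catch (IOException e) {
        e.printStackTrace();
        Map<String, String> rest = new HashMap<String, String>();
        rest.put("status", "false");
    } finally {
        // The InterruptedException catch was removed: nothing in the try block throws it,
        // so the compiler rejects it.
        try {
            if (out != null) {
                out.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return isb;
}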
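The `abs:` prefix used in `getImgSrc` asks Jsoup to resolve the attribute value against the page's base URI, so a relative path like `../img/a.jpg` comes back as a full URL. The sketch below illustrates that resolution step using only `java.net.URI` from the standard library, not Jsoup itself. The class name `AbsUrlSketch` and the example.com addresses are made up for illustration, and it assumes the attribute value is already a syntactically valid URI reference (no raw spaces).

```java
import java.net.URI;

// Hypothetical helper, not part of the original post: shows what resolving
// an img attribute value against the page address produces.
class AbsUrlSketch {

    // Resolves attrValue (relative or absolute) against the page it was found on.
    static String resolve(String pageUrl, String attrValue) {
        return URI.create(pageUrl).resolve(attrValue).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/gallery/index.html";
        // Relative to the page's directory
        System.out.println(resolve(page, "pic/b.png"));
        // Climbs one directory up before descending
        System.out.println(resolve(page, "../img/a.jpg"));
        // Root-relative path
        System.out.println(resolve(page, "/c.gif"));
        // Already absolute: returned unchanged
        System.out.println(resolve(page, "http://cdn.example.com/d.jpg"));
    }
}
```

Because the URLs are absolute by the time they reach `downImages`, that method can open them directly with `new URL(imgUrl)`.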
/**
 * Downloads one image and writes it into the zip stream as a new entry.
 *
 * @param imgUrl    URL of the image
 * @param outStream zip stream that receives the image
 * @return true if the image was downloaded and written
 */
public static boolean downImages(String imgUrl, ZipOutputStream outStream) {
    boolean is = false;
    // Take the file name from the last path segment of the URL
    String fileName = imgUrl.substring(imgUrl.lastIndexOf('/') + 1);
    try {
        // The file name may contain Chinese characters or spaces, so it has to be encoded.
        // URLEncoder turns a space into "+", so the "+" is then replaced with "%20".
        String urlTail = URLEncoder.encode(fileName, "UTF-8");
        imgUrl = imgUrl.substring(0, imgUrl.lastIndexOf('/') + 1) + urlTail.replaceAll("\\+", "%20");
        // Validate the image format so that dynamically served images are handled too.
        // vidImg is not shown in this post; it returns "" for names that should be skipped.
        fileName = vidImg(fileName);
        if (fileName.equals("")) {
            return is;
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    InputStream in = null;
    try {
        // Build the image URL
        URL url = new URL(imgUrl);
        // Open the connection
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("User-Agent", Const.UserAgent);
        // 10-second connect timeout
        connection.setConnectTimeout(10 * 1000);
        // Read the whole response (readInputStream is not shown in this post)
        in = connection.getInputStream();
        byte[] data = readInputStream(in);
        outStream.putNextEntry(new ZipEntry(fileName));
        outStream.write(data);
        is = true;
        return is;
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            outStream.closeEntry();
            // in stays null when the connection could not be opened
            if (in != null) {
                in.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return is;
}
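`downImages` calls two helpers, `readInputStream` and `vidImg`, that this post does not show. What follows is only a guess at what they might look like, so that the listing can compile. The buffer size, the list of accepted extensions, the stripping of a query string, and the fallback to a `.jpg` name for extension-less (dynamic) URLs are all assumptions, not the author's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the two helpers the post references but does not show.
class DownloadHelpersSketch {

    // Assumed whitelist of image extensions.
    private static final List<String> EXTENSIONS =
            Arrays.asList("jpg", "jpeg", "png", "gif", "bmp", "webp");

    // Reads the stream until end-of-file and returns everything as a byte array.
    static byte[] readInputStream(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Returns a usable zip entry name, or "" when the file should be skipped.
    static String vidImg(String fileName) {
        // Drop a query string such as "?w=200"
        int q = fileName.indexOf('?');
        String name = q >= 0 ? fileName.substring(0, q) : fileName;
        if (name.isEmpty()) {
            return "";
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            // No extension: assume a dynamically served image and give it one
            return name + ".jpg";
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext) ? name : "";
    }
}
```

Reading the whole image into memory matches how `downImages` uses the result (`outStream.write(data)`); for very large files, copying chunk by chunk straight into the zip stream would use less memory.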
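/**
 * Removes duplicate image URLs while keeping their original order.
 *
 * @param list image URLs, possibly containing duplicates
 * @return a new list without duplicates
 */
public static List<String> removeCf(List<String> list) {
    List<String> listTemp = new ArrayList<String>();
    for (int i = 0; i < list.size(); i++) {
        if (!listTemp.contains(list.get(i))) {
            listTemp.add(list.get(i));
        }
    }
    return listTemp;
}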
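`removeCf` calls `contains` on a list for every element, so its cost grows quadratically with the number of URLs. For pages with a very large number of images, a `LinkedHashSet` gives the same order-preserving result in roughly linear time. A minimal sketch; the names `DedupSketch` and `removeDuplicates` are chosen here for illustration and are not from the original code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical alternative to removeCf, not part of the original post.
class DedupSketch {

    // LinkedHashSet drops repeated values but remembers first-insertion order.
    static List<String> removeDuplicates(List<String> urls) {
        return new ArrayList<String>(new LinkedHashSet<String>(urls));
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("a.jpg", "b.jpg", "a.jpg");
        System.out.println(removeDuplicates(urls)); // prints [a.jpg, b.jpg]
    }
}
```

For the few dozen images on a typical page the difference is negligible, so the original loop is fine as it is.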
If you like it, remember to bookmark it.
I have already published this tool; the address is: http://www.yzcopen.com/img/imgdown