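  • PDF to images

    2020-12-22 15:54:36

    I have recently been looking into converting PDFs to images. There are roughly four popular approaches; I tried three of them and share the results here.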

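    Background

    PDFs generally come in two kinds: scanned PDFs, where the text cannot be selected with the mouse, and editable (text-based) PDFs, where the text can be selected and copied. I have 2000+ PDF files on hand, about 700 scanned and 1300+ editable. For the downstream work I need to convert them all to images and then run OCR on the images to extract the text.

    Design

    • Walk the directory that holds the PDFs and collect all PDF file names into a list;
    • Loop over that list and call the pdf2pic function to split each PDF into one image per page, saving the images in a temporary folder and recording the page count;
    • Use the PIL module to stitch the images produced from the same PDF into one long image.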
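
    The first and third steps above can be sketched with the standard library alone. This is a minimal sketch, not the author's code: `list_pdfs` and `stacked_layout` are hypothetical helper names, and the layout helper only computes where each page would go on the long image (the actual pasting is done with PIL in the scripts that follow).

```python
from pathlib import Path
from typing import List, Tuple


def list_pdfs(directory: str) -> List[str]:
    """Return the names of the PDF files directly inside `directory`, sorted by name."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def stacked_layout(sizes: List[Tuple[int, int]]) -> Tuple[Tuple[int, int], List[int]]:
    """Given (width, height) per page, return the canvas size and each page's top offset.

    Pages are stacked vertically, so the canvas is as wide as the widest page
    and as tall as the sum of all page heights.
    """
    width = max((w for w, _ in sizes), default=0)
    tops = []
    y = 0
    for _, h in sizes:
        tops.append(y)  # this page starts where the previous one ended
        y += h
    return (width, y), tops
```

    Summing the page heights, rather than multiplying the first page's height by the page count, keeps the layout correct when pages differ in size.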

    The fitz approach

    # -*- coding: utf-8 -*-
    """
    Project_name: pdf2pic
    Description: render each PDF page to PNG with fitz (PyMuPDF), then stitch the pages into one long image
    Created on Tue Dec  8 08:59:21 2020
    @author: 帅帅de三叔
    """
    import os
    import os.path
    import fitz
    from PIL import Image
    
    pdfpath = r"D:\项目\pdf提取信息\pdf转图片"  # directory of the source PDFs
    temp_imagepath = r"D:\项目\pdf提取信息\pdf转图片\临时图片"  # directory for temporary page images
    imagepath = r"D:\项目\pdf提取信息\pdf转图片\转化后的图片"  # directory for the stitched images
    
    def mergePic(m, temp_imagepath, pdfname):  # stitch the first m page PNGs into one long image
        img_list = []  # PNG file names
        for parent, dirname, filenames in os.walk(temp_imagepath):
            for filename in filenames:
                if filename.lower().endswith(".png"):
                    img_list.append(filename)
        img_list.sort()  # os.walk order is arbitrary; zero-padded names sort by page number
        print(img_list[0:m])
    
        if img_list:
            img_name = img_list[0]
            color_mod = 'RGBA' if img_name.endswith('.png') else 'RGB'  # JPEG does not support RGBA
            pages = [Image.open(temp_imagepath + os.sep + img) for img in img_list[0:m]]
            total_width = max(page.size[0] for page in pages)
            total_height = sum(page.size[1] for page in pages)  # pages may differ in height
            target = Image.new(color_mod, (total_width, total_height))  # canvas for the stitched image
            top = 0
            for page in pages:
                target.paste(page, (0, top))  # paste each page below the previous one
                top += page.size[1]
            target.save(imagepath + os.sep + pdfname[:-4] + '_fitz.png')
            return img_name
    
    def pdf2pic(pdfname):  # render the PDF page by page into PNG files
        pdf = fitz.open(pdfpath + os.sep + pdfname)  # open the PDF
        # These are the method names of the 2020 PyMuPDF API; newer releases
        # rename them to page_count, get_pixmap, Pixmap.save and prerotate.
        for pg in range(0, pdf.pageCount):
            page = pdf[pg]  # page object
            trans = fitz.Matrix(3.0, 3.0).preRotate(0)  # 3x zoom, no rotation
            pm = page.getPixmap(matrix=trans, alpha=False)  # rendered pixmap of the page
            pm.writePNG(temp_imagepath + os.sep + '{:0>3d}.png'.format(pg + 1))  # save into the temporary folder
        pagecount = pdf.pageCount  # total number of pages
        pdf.close()  # close the PDF
        return pagecount
    
    if __name__ == "__main__":
        os.makedirs(temp_imagepath, exist_ok=True)
        os.makedirs(imagepath, exist_ok=True)
        pdfnames = []  # source PDF file names
        for filename in sorted(os.listdir(pdfpath)):  # top level only, since files are opened relative to pdfpath
            if filename.lower().endswith(".pdf"):
                pdfnames.append(filename)
        for idx, pdfname in enumerate(pdfnames):
            print("Processing file %d: %s" % (idx + 1, pdfname))
            pagecount = pdf2pic(pdfname)
            mergePic(pagecount, temp_imagepath, pdfname)
    

    The pdf2image approach

    # -*- coding: utf-8 -*-
    """
    Project_name: pdf2image
    Description: convert PDFs to images with the pdf2image library
    Created on Tue Dec  8 13:19:09 2020
    @author: 帅帅de三叔
    """
    import os
    import os.path
    from PIL import Image
    from pdf2image import convert_from_path
    pdfpath = r"D:\项目\pdf提取信息\pdf转图片"  # directory of the source PDFs
    temp_imagepath = r"D:\项目\pdf提取信息\pdf转图片\临时图片"  # directory for temporary page images
    imagepath = r"D:\项目\pdf提取信息\pdf转图片\转化后的图片"  # directory for the stitched images
    
    
    def pdf2image(pdfname):  # render the PDF page by page into PNG files
        images = convert_from_path(pdfpath + os.sep + pdfname, dpi=300)
        for i, image in enumerate(images):
            image.save(temp_imagepath + os.sep + '{:0>3d}.png'.format(i + 1), "PNG")
        pagecount = len(images)  # total number of pages
        return pagecount
    
    def mergePic(m, temp_imagepath, pdfname):  # stitch the first m page PNGs into one long image
        img_list = []  # PNG file names
        for parent, dirname, filenames in os.walk(temp_imagepath):
            for filename in filenames:
                if filename.lower().endswith(".png"):
                    img_list.append(filename)
        img_list.sort()  # os.walk order is arbitrary; zero-padded names sort by page number
        print(img_list[0:m])
    
        if img_list:
            img_name = img_list[0]
            color_mod = 'RGBA' if img_name.endswith('.png') else 'RGB'  # JPEG does not support RGBA
            pages = [Image.open(temp_imagepath + os.sep + img) for img in img_list[0:m]]
            total_width = max(page.size[0] for page in pages)
            total_height = sum(page.size[1] for page in pages)  # pages may differ in height
            target = Image.new(color_mod, (total_width, total_height))  # canvas for the stitched image
            top = 0
            for page in pages:
                target.paste(page, (0, top))  # paste each page below the previous one
                top += page.size[1]
            target.save(imagepath + os.sep + pdfname[:-4] + '_images.png')
            return img_name
    
    
    if __name__ == "__main__":
        os.makedirs(temp_imagepath, exist_ok=True)
        os.makedirs(imagepath, exist_ok=True)
        pdfnames = []  # source PDF file names
        for filename in sorted(os.listdir(pdfpath)):  # top level only, since files are opened relative to pdfpath
            if filename.lower().endswith(".pdf"):
                pdfnames.append(filename)
    
        for idx, pdfname in enumerate(pdfnames):
            print("Processing file %d: %s" % (idx + 1, pdfname))
            pagecount = pdf2image(pdfname)
            mergePic(pagecount, temp_imagepath, pdfname)
        
    

    The wand approach

    # -*- coding: utf-8 -*-
    """
    Project_name: pdf2imageghosts
    Description: convert PDFs to images with wand (ImageMagick + Ghostscript)
    Created on Tue Dec  8 17:16:00 2020
    @author: 帅帅de三叔
    """
    import os
    import os.path
    from PIL import Image as PILImage
    from wand.image import Image
    pdfpath = r"D:\项目\pdf提取信息\pdf转图片"  # directory of the source PDFs
    temp_imagepath = r"D:\项目\pdf提取信息\pdf转图片\临时图片"  # directory for temporary page images
    if not os.path.exists(temp_imagepath):
        os.mkdir(temp_imagepath)
    imagepath = r"D:\项目\pdf提取信息\pdf转图片\转化后的图片"  # directory for the stitched images
    if not os.path.exists(imagepath):
        os.mkdir(imagepath)
    
    
    def mergePic(m, temp_imagepath, pdfname):  # stitch the first m page JPEGs into one long image
        img_list = []  # JPEG file names
        for parent, dirname, filenames in os.walk(temp_imagepath):
            for filename in filenames:
                if filename.lower().endswith(".jpeg"):
                    img_list.append(filename)
        img_list.sort()  # os.walk order is arbitrary; zero-padded names sort by page number
        print(img_list[0:m])
    
        if img_list:
            img_name = img_list[0]
            color_mod = 'RGBA' if img_name.endswith('.png') else 'RGB'  # JPEG does not support RGBA
            pages = [PILImage.open(temp_imagepath + os.sep + img) for img in img_list[0:m]]
            total_width = max(page.size[0] for page in pages)
            total_height = sum(page.size[1] for page in pages)  # pages may differ in height
            target = PILImage.new(color_mod, (total_width, total_height))  # canvas for the stitched image
            top = 0
            for page in pages:
                target.paste(page, (0, top))  # paste each page below the previous one
                top += page.size[1]
            target.save(imagepath + os.sep + pdfname[:-4] + '_winds.png')
            return img_name
    
    def wind_imagemagick_ghostscript(pdf_path, imgs_dir):
        # Convert the PDF file into JPEG image files
        # pdf_path is the path and name of the PDF file
        # image_pdf = Image(filename='./demo1.pdf', resolution=300)
        image_pdf = Image(filename=pdf_path, resolution=300)
        image_jpeg = image_pdf.convert('png')
    
        # wand has already turned every page of the PDF into a separate image object.
        # Iterate over the sequence and collect each page as a JPEG blob in req_image.
        req_image = []
        for img in image_jpeg.sequence:
            img_page = Image(image=img)
            req_image.append(img_page.make_blob('jpeg'))
    
        # Write each blob in req_image to an image file; zero-padded names keep page order when sorted
        for i, img in enumerate(req_image):
            with open(imgs_dir + os.sep + '{:0>3d}.jpeg'.format(i), 'wb') as ff:
                ff.write(img)
        return len(req_image)
    
    
    if __name__ == '__main__':
        pdfnames = []  # source PDF file names
        for filename in sorted(os.listdir(pdfpath)):  # top level only, since files are opened relative to pdfpath
            if filename.lower().endswith(".pdf"):
                pdfnames.append(filename)
    
        for idx, pdfname in enumerate(pdfnames):
            print("Processing file %d: %s" % (idx + 1, pdfname))
            pdf_path = pdfpath + os.sep + pdfname
            print(pdf_path)
            imgs_dir = temp_imagepath
            req_image = wind_imagemagick_ghostscript(pdf_path, imgs_dir)
            mergePic(req_image, temp_imagepath, pdfname)
    
    

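    Conclusion

    The fitz approach is simple and efficient. The images produced by pdf2image come out somewhat gray, while the wand approach gives higher-contrast images that suit document-style PDFs like these, but wand needs somewhat more setup beforehand. I recommend the fitz approach.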

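  • PDF to images

    2017-08-07 11:39:40

    Requirement: the e-policy system returns the URL of a downloadable PDF. Request that URL, convert the returned PDF stream into images, and save them locally.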

    Required jars: pdfbox-2.0.0.jar, fontbox-2.0.0.jar

    package test;
    
    
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    
    import javax.imageio.ImageIO;
    
    
    import org.apache.commons.lang.StringUtils;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.rendering.PDFRenderer;
    
    
    import bjca.org.apache.log4j.Logger;
    
    
    /**
     * Converts a PDF file to PNG images
     * @author lijc
     *
     */
    public class Pdf2Pic {
    	Logger log = Logger.getLogger(getClass());
    
    
    	/**
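    	 * Fetches the PDF returned by the URL, converts it to images, and saves them locally
    	 * 
    	 * @param urlStr download URL of the PDF
    	 * @param plyseq serial number used in the output file name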
    	 */
    	private void pdfToPng(String urlStr, String plyseq) {
    		// File file = new File("D:\\test.pdf");
    		// Temporary test URL; later it will be read from the database
    		// String urlStr =
    		// "http://58.251.33.182:18080/elec/netSaleQueryElecPlyServlet?c_ply_no=1M1084920171004735&idCard=411722197202132411";
    		PDDocument pdDocument = null;
    		if (StringUtils.isNotEmpty(urlStr)) {
    			try {
    				// Open the input stream of the e-policy URL
    				InputStream input = getPdfInputStream(urlStr);
    				if (input == null) {
    					log.error("Serial number: " + plyseq + " failed to download the PDF!");
    					return;
    				}
    				// Load the PDF
    				pdDocument = PDDocument.load(input);
    				PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
    				int pageCount = pdDocument.getNumberOfPages();
    				String pdfImgDir = "webapps" + File.separator + "pdfImg" + File.separator;
    				for (int i = 0; i < pageCount; i++) {
    					BufferedImage image = pdfRenderer.renderImageWithDPI(i, 296);
    					// One file per page; a single fixed path would keep only the last page
    					ImageIO.write(image, "PNG", new File(pdfImgDir + plyseq + "_" + (i + 1) + ".png"));
    				}
    				input.close();
    				log.info("Serial number: " + plyseq + " PDF converted to PNG images successfully!");
    			} catch (IOException ex) {
    				ex.printStackTrace();
    				log.error("Failed to convert the PDF to PNG images!");
    			} finally {
    				if (pdDocument != null) {
    					try {
    						pdDocument.close();
    					} catch (IOException ex) {
    						ex.printStackTrace();
    					}
    				}
    			}
    		} else {
    			log.error("Serial number: " + plyseq + " e-policy URL is empty!");
    		}
    
    
    	}
    
    
    	/**
    	 * Opens an InputStream for the PDF served at the e-policy URL
    	 * 
    	 * @param urlStr URL of the PDF
    	 * @return the response stream, or null if the request fails
    	 */
    	private static InputStream getPdfInputStream(String urlStr) {
    		URL url = null;
    		try {
    			url = new URL(urlStr);
    			HttpURLConnection httpConnect = (HttpURLConnection) url.openConnection();
    			// Connection timeout
    			httpConnect.setConnectTimeout(30000);
    			// Read timeout
    			httpConnect.setReadTimeout(60000);
    			httpConnect.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    			httpConnect.setRequestProperty("User-Agent",
    					"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36");
    			return httpConnect.getInputStream();
    		} catch (IOException ex) {
    			ex.printStackTrace();
    		}
    		return null;
    	}
    	
    	public static void main(String[] args) {
    		String plyseq = "123456";
    		String urlStr = "http://localhost:18080/elec/netSaleQueryElecPlyServlet?c_ply_no=1M1084920171004735&idCard=411722197202132411";
    		new Pdf2Pic().pdfToPng(urlStr, plyseq);
    	}
    }
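
    The same flow (download the PDF from a URL, then write one image file per page) can be sketched in Python with only the standard library. This is a sketch under assumptions, not part of the original post: `fetch_pdf_bytes`, `looks_like_pdf`, and `page_image_path` are hypothetical helper names, the `%PDF-` check assumes a well-formed file, and the rendering step itself is left out.

```python
import os
import urllib.request


def fetch_pdf_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download the response body of `url`, assumed to be a PDF download link."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def looks_like_pdf(data: bytes) -> bool:
    """A well-formed PDF starts with the marker %PDF-; this rejects HTML error pages early."""
    return data.startswith(b"%PDF-")


def page_image_path(out_dir: str, plyseq: str, page_index: int) -> str:
    """Build a distinct file name per page so later pages do not overwrite earlier ones."""
    return os.path.join(out_dir, "{}_{:03d}.png".format(plyseq, page_index + 1))
```

    Including the page number in the file name matters for multi-page documents: writing every page to one fixed path leaves only the last page on disk.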


    
    

  • PDF to images

    2017-06-09 15:36:30

    This borrows from someone else's implementation. The original post did not include the import statements, so readers could not easily tell which packages to import.


    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.rendering.PDFRenderer;
    import org.icepdf.core.exceptions.PDFException;
    import org.icepdf.core.exceptions.PDFSecurityException;
    import org.icepdf.core.pobjects.Document;
    import org.icepdf.core.util.GraphicsRenderingHints;
    
    import javax.imageio.ImageIO;
    import java.awt.image.BufferedImage;
    import java.awt.image.RenderedImage;
    import java.io.File;
    import java.io.IOException;
    
    /**
     * Created by Cheng Jinquan on 2017/6/9.
     */
    public class Pdf2Img {
    
    
        public static void main(String[] args) {
            new Thread(
                    new Runnable() {
                        @Override
                        public void run() {
                            Pdf2Img.pdf2Img_icepdf("E:\\test\\pdf\\test.pdf", "E:\\test\\pdf\\icepdf");
                        }
                    }
            ).start();
    
            new Thread(
                    new Runnable() {
                        @Override
                        public void run() {
                            Pdf2Img.pdf2Img_pdfbox("E:\\test\\pdf\\test.pdf", "E:\\test\\pdf\\pdfbox");
                        }
                    }
            ).start();
    
            try {
                Thread.sleep(120000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    
    
        public static void pdf2Img_icepdf(String pdfPath, String imgPath) {
            System.out.println("icepdf start...");
            long time = System.currentTimeMillis();
            Document document = new Document();
            try {
                document.setFile(pdfPath);
            } catch (PDFException e) {
                e.printStackTrace();
            } catch (PDFSecurityException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
            float scale = 2.5f; // scale factor
            float rotation = 0f; // rotation angle
            long timeTmp = System.currentTimeMillis();
            for (int i = 0; i < document.getNumberOfPages(); i++) {
                BufferedImage image = (BufferedImage)document.getPageImage(i, GraphicsRenderingHints.SCREEN, org.icepdf.core.pobjects.Page.BOUNDARY_CROPBOX, rotation, scale);
                RenderedImage rendImage = image;
                try {
                    File file = new File(imgPath+"\\iecPDF_" + i + ".png");
                    ImageIO.write(rendImage, "png", file);
                } catch (IOException e) {
                    e.printStackTrace();
                }
                image.flush();
                long time2 = System.currentTimeMillis();
                System.out.println("icepdf "+(time2-timeTmp));
                timeTmp = time2;
            }
            document.dispose();
            System.out.println("icepdf over..."+(System.currentTimeMillis()-time));
        }
    
        /**
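         * The second argument of renderImageWithDPI is the resolution in DPI and can be tuned as needed. The commented-out renderImage call in the loop is the library's other rendering method; its second argument is a scale factor.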
         * @param pdfPath
         * @param imgPath
         */
        public static void pdf2Img_pdfbox(String pdfPath, String imgPath) {
            System.out.println("pdfbox start...");
            long time = System.currentTimeMillis();
            File file = new File(pdfPath);
            // try-with-resources closes the document once rendering finishes
            try (PDDocument doc = PDDocument.load(file)) {
                PDFRenderer renderer = new PDFRenderer(doc);
                int pageCount = doc.getNumberOfPages();
                long timeTmp = System.currentTimeMillis();
                for (int i = 0; i < pageCount; i++) {
                    BufferedImage image = renderer.renderImageWithDPI(i, 144);
                    //BufferedImage image = renderer.renderImage(i, 2.5f);
                    ImageIO.write(image, "PNG", new File(imgPath+"\\pdfbox_image"+i+".png"));
                    long time2 = System.currentTimeMillis();
                    System.out.println("pdfbox " + (time2 - timeTmp));
                    timeTmp = time2;
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            System.out.println("pdfbox over..."+(System.currentTimeMillis()-time));
        }
    
        public void pdf2Img_jpedal() {
            /*PdfDecoder decode_pdf = new PdfDecoder(true);
            try {
                decode_pdf.openPdfFile("c:\\test.pdf"); //file
    //       decode_pdf.openPdfFile("C:/jpedalPDF.pdf", "password"); //encrypted file
    //      decode_pdf.openPdfArray(bytes); //bytes is byte[] array with PDF
    //      decode_pdf.openPdfFileFromURL("http://www.mysite.com/jpedalPDF.pdf",false);
    //      decode_pdf.openPdfFileFromInputStream(in, false);
    
                int start = 1, end = decode_pdf.getPageCount();
                for (int i = start; i < end + 1; i++) {
                    BufferedImage img = decode_pdf.getPageAsImage(i);
                    try {
                        ImageIO.write(img, "png", new File("C:\\jpedal_image.png"));
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                decode_pdf.closePdfFile();
            } catch (PdfException e) {
                e.printStackTrace();
            }*/
        }
    }
    



    <!-- icepdf start -->
    <dependency>
    	<groupId>org.sejda</groupId>
    	<artifactId>sejda-icepdf</artifactId>
    	<version>1.0.0.RELEASE</version>
    </dependency>
    <dependency>
    	<groupId>org.icepdf.os</groupId>
    	<artifactId>icepdf-core</artifactId>
    	<version>6.1.2</version>
    </dependency>
    <dependency>
    	<groupId>org.icepdf.os</groupId>
    	<artifactId>icepdf-viewer</artifactId>
    	<version>6.1.2</version>
    </dependency>
    <!-- icepdf end -->
    <!-- pdfbox start -->
    <dependency>
    	<groupId>org.apache.pdfbox</groupId>
    	<artifactId>fontbox</artifactId>
    	<version>2.0.3</version>
    </dependency>
    <dependency>
    	<groupId>org.apache.pdfbox</groupId>
    	<artifactId>pdfbox</artifactId>
    	<version>2.0.3</version>
    </dependency>
    <!-- pdfbox end -->
    <!-- jpedal pdf start -->
    <dependency>
    	<groupId>org.jpedal</groupId>
    	<artifactId>OpenViewerFX</artifactId>
    	<version>7.2.30</version>
    </dependency>
    <!-- jpedal pdf end -->        


    Source: http://www.cnblogs.com/pcheng/p/5704470.html



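  • PDF to images

    2014-06-03 17:31:16

    A project required converting PDFs to JPG, so I spent some time looking into it. I mainly used the following two approaches:

    1. ImageMagick

    ImageMagick is software for creating, editing, and compositing images. It can read, convert, and write images in many formats, and it supports cropping, color replacement, a range of effects, rotation and composition, and drawing text, lines, polygons, ellipses, and curves onto images that can then be stretched or rotated. ImageMagick is free software: the full source is open and may be freely used, copied, modified, and distributed. Most importantly, it has a supporting library for the .NET platform, Magick.NET, which is fairly stable and simple to use. The main conversion code is as follows: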

    public void TransformToJPG(string filePath, bool isThumbnails) {
        var fileInfo = new FileInfo(filePath);
        var dirInfo = fileInfo.Directory;
        var settings = new MagickReadSettings();
        // Density is the rendering resolution in DPI: low for thumbnails, 300 for full-size pages
        if (isThumbnails) {
            settings.Density = new MagickGeometry(15, 15);
        } else {
            settings.Density = new MagickGeometry(300, 300);
        }
        using (MagickImageCollection images = new MagickImageCollection()) {
            var fileName = "";
            images.Read(filePath, settings);
            int page = 1;
            // Each element of the collection is one page of the PDF
            foreach (MagickImage image in images) {
                if (isThumbnails) {
                    fileName = String.Format("_page-{0}.jpg", page.ToString().PadLeft(4, '0'));
                } else {
                    fileName = String.Format("page-{0}.jpg", page.ToString().PadLeft(4, '0'));
                }
                image.Write(fileName);
                ++page;
            }
        }
    }

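    The main ImageMagick options involved are:

    -trim: crops the blank border around the image;
    -transparent color: removes the specified color from the image;
    -density geometry: sets the DPI of the image;
    -antialias: applies anti-aliasing to the image;
    -quality: sets the image compression level.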
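
    To make the effect of the density (DPI) setting concrete, here is a small worked sketch of the arithmetic. It assumes the default PDF user-space unit of 1/72 inch; `pixel_size` and `dpi_to_scale` are hypothetical helper names, and a real renderer may round slightly differently.

```python
POINTS_PER_INCH = 72  # default PDF user space: 72 units per inch


def dpi_to_scale(dpi: float) -> float:
    """Zoom factor relative to the 72 DPI baseline (for example 216 DPI is 3x)."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt: float, height_pt: float, dpi: float) -> tuple:
    """Approximate pixel dimensions of a page of the given size in points rendered at `dpi`."""
    scale = dpi_to_scale(dpi)
    return round(width_pt * scale), round(height_pt * scale)


if __name__ == "__main__":
    a4 = (595, 842)  # A4 page size in points
    print(pixel_size(*a4, dpi=300))  # full-size rendering
    print(pixel_size(*a4, dpi=15))   # thumbnail rendering
```

    With these numbers an A4 page comes out at roughly 2479 x 3508 pixels at 300 DPI and 124 x 175 pixels at 15 DPI, which is why the low density is enough for thumbnails.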

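    2. An Acrobat interface

    This approach requires Adobe Acrobat to be installed; copying Acrobat.dll out of the installation is enough. In my view it is the best of the approaches, both in conversion speed and in the quality of the resulting images, and it is the one I ultimately adopted. The main implementation is as follows: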

    public static void ConvertPDF2Image(string pdfInputPath, string imageOutputPath,
        string imageName, int startPageNum, int endPageNum, ImageFormat imageFormat, double resolution)
    {
        Acrobat.CAcroPDDoc pdfDoc = null;
        Acrobat.CAcroPDPage pdfPage = null;
        Acrobat.CAcroRect pdfRect = null;
        Acrobat.CAcroPoint pdfPoint = null;
        // Create the document (Can only create the AcroExch.PDDoc object using late-binding)
        // Note using VisualBasic helper functions, have to add reference to DLL
        pdfDoc = (Acrobat.CAcroPDDoc)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.PDDoc", "");
        // validate parameter
        if (!pdfDoc.Open(pdfInputPath)) { throw new FileNotFoundException(); }
        if (!Directory.Exists(imageOutputPath)) { Directory.CreateDirectory(imageOutputPath); }
        if (startPageNum <= 0) { startPageNum = 1; }
        if (endPageNum > pdfDoc.GetNumPages() || endPageNum <= 0) {
            endPageNum = pdfDoc.GetNumPages();
        }
        if (startPageNum > endPageNum) {
            // swap so that startPageNum <= endPageNum
            int tempPageNum = startPageNum;
            startPageNum = endPageNum; endPageNum = tempPageNum;
        }
        if (imageFormat == null) { imageFormat = ImageFormat.Jpeg; }
        if (resolution <= 0) { resolution = 1; }
        // start to convert each page
        for (int i = startPageNum; i <= endPageNum; i++) {
            pdfPage = (Acrobat.CAcroPDPage)pdfDoc.AcquirePage(i - 1);
            pdfPoint = (Acrobat.CAcroPoint)pdfPage.GetSize();
            pdfRect = (Acrobat.CAcroRect)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.Rect", "");
            int imgWidth = (int)((double)pdfPoint.x * resolution);
            int imgHeight = (int)((double)pdfPoint.y * resolution);
            pdfRect.Left = 0;
            pdfRect.right = (short)imgWidth;
            pdfRect.Top = 0;
            pdfRect.bottom = (short)imgHeight;
            // Render to clipboard, scaled by 100 percent (ie. original size)
            // Even though we want a smaller image, better for us to scale in .NET
            // than Acrobat as it would greek out small text
            pdfPage.CopyToClipboard(pdfRect, 0, 0, (short)(100 * resolution));
            IDataObject clipboardData = Clipboard.GetDataObject();
            if (clipboardData.GetDataPresent(DataFormats.Bitmap)) {
                Bitmap pdfBitmap = (Bitmap)clipboardData.GetData(DataFormats.Bitmap);
                // Page number in the file name so that pages do not overwrite each other
                pdfBitmap.Save(Path.Combine(imageOutputPath, imageName + "_" + i) + ".jpg", imageFormat);
                pdfBitmap.Dispose();
            }
        }
        pdfDoc.Close();

        Marshal.ReleaseComObject(pdfPage);
        Marshal.ReleaseComObject(pdfRect);
        Marshal.ReleaseComObject(pdfDoc);
        Marshal.ReleaseComObject(pdfPoint);
    }
    Note: if this code runs in a newly created Thread, the Thread's ApartmentState must be set to ApartmentState.STA.


