精华内容
下载资源
问答
  • 2007Excel转换XML: 直接将后缀xlsx 改为 zip, 打开zip即可 zip包中的xml文件解读:(1)对于docProps目录下这里core是文件的创建时间和修改时间,标题,主题和作者,app是文档的其他属性,文档类型,版本,是否只读...

    2007Excel转换XML: 直接将后缀xlsx 改为 zip, 打开zip即可

    zip包中的xml文件解读:

    (1)对于docProps目录下  这里core是文件的创建时间和修改时间,标题,主题和作者,app是文档的其他属性,文档类型,版本,是否只读,是否共享,安全属性等文档属性信息。

    (2)在xl目录下是文档的具体内容信息

    (3)workbook.xml文件包含一对标签,其中的每个元素都代表Excel 2007文件中的一个,工作表的名称就是其name属性的值,这里有三个sheet。

    (4)xl/_rels/workbook.xml.rels定义每个sheetid对应的sheet内容文件sheet1.xml,共享的单元格内容文件sharedstring.xml,样式文件style.xml是当前单元格的样式字体,颜色等样式的xml配置。Theme存放的是当前的设置导航栏的默认样式。这两个看看大概也就能明白。关键我们看看下面每个sheet的内容格式

    (5)打开一个sheet1.xml看看,

    row标签是表示每一行的数据,r表示第几行,其他几个都是这几行的样式

    c标签表示每个单元格的内容,这里A1 第一行的第一列,r表示位置,s表示这个单元格的样式,

    s=9对应style.xml的的index为9的样式即为这个单元格的样式,

    t=s表示这个单元格有值,里面的v标签即为值的id,id对应到sharedstring.xm里的id对应的值

    没有t属性,表示这个单元格没有值设置

    关于各种数据格式的处理:http://blog.csdn.net/l081307114/article/details/46009015

    大数据量导入:http://blog.csdn.net/lee_guang/article/details/8936178

    http://blog.csdn.net/wenyi_y/article/details/6180739

    http://blog.csdn.net/kingo0/article/details/43191407

    2003大数据量读取:http://www.iteye.com/topic/624967

    2007大数据量读取:http://www.iteye.com/topic/624969

    2003 + 2007 类型处理 http://blog.csdn.net/goodkuang2012/article/details/7350985

    大数据量导出:http://lyjilu.iteye.com/blog/2083106

    http://kimnote.com/2010/04/%E5%A4%A7%E6%95%B0%E6%8D%AE%E9%87%8F%E5%AF%BC%E5%87%BAexcel%E7%9A%84%E6%96%B9%E6%A1%88-java-%E5%A4%A7excel%E6%96%87%E4%BB%B6/

    xml方式:http://blog.csdn.net/u011728105/article/details/47147225

    POI事件处理模型:http://www.iteye.com/topic/1124534

    性能优化:https://www.cnblogs.com/ahhuiyang/p/3871918.html

    展开全文
  • 此方式是将excel数据读取以List<String[]> 形式存放,需要传入excel文件路径,sheetName,excel列数,其中excel列数需要与excel列数一致 package pdata; /* ======================================...

    此方式是将excel数据读取以List<String[]> 形式存放,需要传入excel文件路径,sheetName,excel列数,其中excel列数需要与excel列数一致

    package pdata;
    /* ==================================================================== 
       Licensed to the Apache Software Foundation (ASF) under one or more 
       contributor license agreements.  See the NOTICE file distributed with 
       this work for additional information regarding copyright ownership. 
       The ASF licenses this file to You under the Apache License, Version 2.0 
       (the "License"); you may not use this file except in compliance with 
       the License.  You may obtain a copy of the License at 
     
           http://www.apache.org/licenses/LICENSE-2.0 
     
       Unless required by applicable law or agreed to in writing, software 
       distributed under the License is distributed on an "AS IS" BASIS, 
       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
       See the License for the specific language governing permissions and 
       limitations under the License. 
    ==================================================================== */  
      
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PrintStream;
    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
     
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
     
    import org.apache.poi.hssf.usermodel.HSSFCellStyle;
    import org.apache.poi.hssf.usermodel.HSSFDateUtil;
    import org.apache.poi.hssf.usermodel.HSSFPalette;
    import org.apache.poi.hssf.util.HSSFColor;
    import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.openxml4j.opc.PackageAccess;
    import org.apache.poi.ss.usermodel.BuiltinFormats;
    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.CellStyle;
    import org.apache.poi.ss.usermodel.DataFormatter;
    import org.apache.poi.ss.usermodel.IndexedColors;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
    import org.apache.poi.xssf.eventusermodel.XSSFReader;
    import org.apache.poi.xssf.model.StylesTable;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;
    import org.apache.poi.xssf.usermodel.XSSFCellStyle;
    import org.apache.poi.xssf.usermodel.XSSFColor;
    import org.apache.poi.xssf.usermodel.XSSFRichTextString;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;
      
    /** 
     * 使用CVS模式解决XLSX文件,可以有效解决用户模式内存溢出的问题 
     * 该模式是POI官方推荐的读取大数据的模式,在用户模式下,数据量较大、Sheet较多、或者是有很多无用的空行的情况 
     * ,容易出现内存溢出,用户模式读取Excel的典型代码如下: FileInputStream file=new 
     * FileInputStream("c:\\test.xlsx"); Workbook wb=new XSSFWorkbook(file); 
     *  
     *  
     * @author 山人 
     */  
    public class XLSXCovertCSVReader {  
      
        /** 
         * The type of the data value is indicated by an attribute on the cell. The 
         * value is usually in a "v" element within the cell. 
         */  
        enum xssfDataType {  
            BOOL, ERROR, FORMULA, INLINESTR, SSTINDEX, NUMBER,  
        }  
      
        /** 
         * 使用xssf_sax_API处理Excel,请参考: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api 
         * <p/> 
         * Also see Standard ECMA-376, 1st edition, part 4, pages 1928ff, at 
         * http://www.ecma-international.org/publications/standards/Ecma-376.htm 
         * <p/> 
         * A web-friendly version is http://openiso.org/Ecma/376/Part4 
         */  
        class MyXSSFSheetHandler extends DefaultHandler {  
      
            /** 
             * Table with styles 
             */  
            private StylesTable stylesTable;  
      
            /** 
             * Table with unique strings 
             */  
            private ReadOnlySharedStringsTable sharedStringsTable;  
      
            /** 
             * Destination for data 
             */  
            private final PrintStream output;  
      
            /** 
             * Number of columns to read starting with leftmost 
             */  
            private final int minColumnCount;  
      
            // Set when V start element is seen  
            private boolean vIsOpen;  
      
            // Set when cell start element is seen;  
            // used when cell close element is seen.  
            private xssfDataType nextDataType;  
      
            // Used to format numeric cell values.  
            private short formatIndex;  
            private String formatString;  
            private final DataFormatter formatter;  
      
            private int thisColumn = -1;  
            // The last column printed to the output stream  
            private int lastColumnNumber = -1;  
      
            // Gathers characters as they are seen.  
            private StringBuffer value;  
            private String[] record;  
            private List<String[]> rows = new ArrayList<String[]>();  
            private boolean isCellNull = false;  
      
            /** 
             * Accepts objects needed while parsing. 
             *  
             * @param styles 
             *            Table of styles 
             * @param strings 
             *            Table of shared strings 
             * @param cols 
             *            Minimum number of columns to show 
             * @param target 
             *            Sink for output 
             */  
            public MyXSSFSheetHandler(StylesTable styles,  
                    ReadOnlySharedStringsTable strings, int cols, PrintStream target) {  
                this.stylesTable = styles;  
                this.sharedStringsTable = strings;  
                this.minColumnCount = cols;  
                this.output = target;  
                this.value = new StringBuffer();  
                this.nextDataType = xssfDataType.NUMBER;  
                this.formatter = new DataFormatter();  
                record = new String[this.minColumnCount];  
                rows.clear();// 每次读取都清空行集合  
            }  
      
            /* 
             * (non-Javadoc) 
             *  
             * @see 
             * org.xml.sax.helpers.DefaultHandler#startElement(java.lang.String, 
             * java.lang.String, java.lang.String, org.xml.sax.Attributes) 
             */  
            public void startElement(String uri, String localName, String name,  
                    Attributes attributes) throws SAXException {  
      
                if ("inlineStr".equals(name) || "v".equals(name)) {  
                    vIsOpen = true;  
                    // Clear contents cache  
                    value.setLength(0);  
                }  
                // c => cell  
                else if ("c".equals(name)) {  
                    // Get the cell reference  
                    String r = attributes.getValue("r");  
                    int firstDigit = -1;  
                    for (int c = 0; c < r.length(); ++c) {  
                        if (Character.isDigit(r.charAt(c))) {  
                            firstDigit = c;  
                            break;  
                        }  
                    }  
                    thisColumn = nameToColumn(r.substring(0, firstDigit));  
      
                    // Set up defaults.  
                    this.nextDataType = xssfDataType.NUMBER;  
                    this.formatIndex = -1;  
                    this.formatString = null;  
                    String cellType = attributes.getValue("t");  
                    String cellStyleStr = attributes.getValue("s");  
                    if ("b".equals(cellType))  
                        nextDataType = xssfDataType.BOOL;  
                    else if ("e".equals(cellType))  
                        nextDataType = xssfDataType.ERROR;  
                    else if ("inlineStr".equals(cellType))  
                        nextDataType = xssfDataType.INLINESTR;  
                    else if ("s".equals(cellType))  
                        nextDataType = xssfDataType.SSTINDEX;  
                    else if ("str".equals(cellType))  
                        nextDataType = xssfDataType.FORMULA;  
                    else if (cellStyleStr != null) {  
                        // It's a number, but almost certainly one  
                        // with a special style or format  
                        int styleIndex = Integer.parseInt(cellStyleStr);  
                        XSSFCellStyle style = stylesTable.getStyleAt(styleIndex);  
                        this.formatIndex = style.getDataFormat();  
                        this.formatString = style.getDataFormatString();  
                        if (this.formatString == null)  
                            this.formatString = BuiltinFormats  
                                    .getBuiltinFormat(this.formatIndex);  
                    }  
                }  
      
            }  
      
            /* 
             * (non-Javadoc) 
             *  
             * @see org.xml.sax.helpers.DefaultHandler#endElement(java.lang.String, 
             * java.lang.String, java.lang.String) 
             */  
            public void endElement(String uri, String localName, String name)  
                    throws SAXException {  
      
                String thisStr = null;  
      
                // v => contents of a cell  
                if ("v".equals(name)) {  
                    // Process the value contents as required.  
                    // Do now, as characters() may be called more than once  
                    switch (nextDataType) {  
      
                    case BOOL:  
                        char first = value.charAt(0);  
                        thisStr = first == '0' ? "FALSE" : "TRUE";  
                        break;  
      
                    case ERROR:  
                        thisStr = "\"ERROR:" + value.toString() + '"';  
                        break;  
      
                    case FORMULA:  
                        // A formula could result in a string value,  
                        // so always add double-quote characters.  
                        //thisStr = '"' + value.toString() + '"';  
                        thisStr =value.toString();  
                        break;  
      
                    case INLINESTR:  
                        // TODO: have seen an example of this, so it's untested.  
                        XSSFRichTextString rtsi = new XSSFRichTextString(  
                                value.toString());  
                       // thisStr = '"' + rtsi.toString() + '"';  
                        thisStr = rtsi.toString() ;  
                        break;  
      
                    case SSTINDEX:  
                        String sstIndex = value.toString();  
                        try {  
                            int idx = Integer.parseInt(sstIndex);  
                            XSSFRichTextString rtss = new XSSFRichTextString(  
                                    sharedStringsTable.getEntryAt(idx));  
                            //thisStr = '"' + rtss.toString() + '"';  
                            thisStr =rtss.toString();  
                        } catch (NumberFormatException ex) {  
                            output.println("Failed to parse SST index '" + sstIndex  
                                    + "': " + ex.toString());  
                        }  
                        break;  
      
                    case NUMBER:  
                        String n = value.toString();  
                        // 判断是否是日期格式  
                        if (HSSFDateUtil.isADateFormat(this.formatIndex, n)) {  
                            Double d = Double.parseDouble(n);  
                            Date date=HSSFDateUtil.getJavaDate(d);  
                            thisStr=formateDateToString(date);  
                        } else if (this.formatString != null)  
                            thisStr = formatter.formatRawCellContents(  
                                    Double.parseDouble(n), this.formatIndex,  
                                    this.formatString);  
                        else  
                            thisStr = n;  
                        break;  
      
                    default:  
                        thisStr = "(TODO: Unexpected type: " + nextDataType + ")";  
                        break;  
                    }  
      
                    // Output after we've seen the string contents  
                    // Emit commas for any fields that were missing on this row  
                    if (lastColumnNumber == -1) {  
                        lastColumnNumber = 0;  
                    }  
                    //判断单元格的值是否为空  
                    if (thisStr == null || "".equals(isCellNull)) {  
                        isCellNull = true;// 设置单元格是否为空值  
                    }  
                    record[thisColumn] = thisStr;  
                    // Update column  
                    if (thisColumn > -1)  
                        lastColumnNumber = thisColumn;  
      
                } else if ("row".equals(name)) {  
      
                    // Print out any missing commas if needed  
                    if (minColumns > 0) {  
                        // Columns are 0 based  
                        if (lastColumnNumber == -1) {  
                            lastColumnNumber = 0;  
                        }  
    //                    if (isCellNull == false && record[0] != null  
    //                            && record[1] != null)// 判断是否空行  
                        if (isCellNull == false )// 判断是否空行  
                        {  
                            rows.add(record.clone());  
                            isCellNull = false;  
                            for (int i = 0; i < record.length; i++) {  
                                record[i] = null;  
                            }  
                        }  
                    }  
                    lastColumnNumber = -1;  
                }  
      
            }  
      
            public List<String[]> getRows() {  
                return rows;  
            }  
      
            public void setRows(List<String[]> rows) {  
                this.rows = rows;  
            }  
      
            /** 
             * Captures characters only if a suitable element is open. Originally 
             * was just "v"; extended for inlineStr also. 
             */  
            public void characters(char[] ch, int start, int length)  
                    throws SAXException {  
                if (vIsOpen)  
                    value.append(ch, start, length);  
            }  
      
            /** 
             * Converts an Excel column name like "C" to a zero-based index. 
             *  
             * @param name 
             * @return Index corresponding to the specified name 
             */  
            private int nameToColumn(String name) {  
                int column = -1;  
                for (int i = 0; i < name.length(); ++i) {  
                    int c = name.charAt(i);  
                    column = (column + 1) * 26 + c - 'A';  
                }  
                return column;  
            }  
      
            private String formateDateToString(Date date) {  
                SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");//格式化日期  
                return sdf.format(date);  
      
            }  
      
        }  
      
        // /  
      
        private OPCPackage xlsxPackage;  
        private int minColumns;  
        private PrintStream output;  
        private String sheetName;  
      
        /** 
         * Creates a new XLSX -> CSV converter 
         *  
         * @param pkg 
         *            The XLSX package to process 
         * @param output 
         *            The PrintStream to output the CSV to 
         * @param minColumns 
         *            The minimum number of columns to output, or -1 for no minimum 
         */  
        public XLSXCovertCSVReader(OPCPackage pkg, PrintStream output,  
                String sheetName, int minColumns) {  
            this.xlsxPackage = pkg;  
            this.output = output;  
            this.minColumns = minColumns;  
            this.sheetName = sheetName;  
        }  
      
        /** 
         * Parses and shows the content of one sheet using the specified styles and 
         * shared-strings tables. 
         *  
         * @param styles 
         * @param strings 
         * @param sheetInputStream 
         */  
        public List<String[]> processSheet(StylesTable styles,  
                ReadOnlySharedStringsTable strings, InputStream sheetInputStream)  
                throws IOException, ParserConfigurationException, SAXException {  
      
            InputSource sheetSource = new InputSource(sheetInputStream);  
            SAXParserFactory saxFactory = SAXParserFactory.newInstance();  
            SAXParser saxParser = saxFactory.newSAXParser();  
            XMLReader sheetParser = saxParser.getXMLReader();  
            MyXSSFSheetHandler handler = new MyXSSFSheetHandler(styles, strings,  
                    this.minColumns, this.output);  
            sheetParser.setContentHandler(handler);  
            sheetParser.parse(sheetSource);  
            return handler.getRows();  
        }  
      
        /** 
         * 初始化这个处理程序 将 
         *  
         * @throws IOException 
         * @throws OpenXML4JException 
         * @throws ParserConfigurationException 
         * @throws SAXException 
         */  
        public List<String[]> process() throws IOException, OpenXML4JException,  
                ParserConfigurationException, SAXException {  
      
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(  
                    this.xlsxPackage);  
            XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);  
            List<String[]> list = null;  
            StylesTable styles = xssfReader.getStylesTable();  
            XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader  
                    .getSheetsData();  
            int index = 0;  
            while (iter.hasNext()) {  
                InputStream stream = iter.next();  
                String sheetNameTemp = iter.getSheetName();  
                if (this.sheetName.equals(sheetNameTemp)) {  
                    list = processSheet(styles, strings, stream);  
                    stream.close();  
                    ++index;  
                }  
            }  
            return list;  
        }  
      
        /** 
         * 读取Excel 
         *  
         * @param path 
         *            文件路径 
         * @param sheetName 
         *            sheet名称 
         * @param minColumns 
         *            列总数 
         * @return 
         * @throws SAXException 
         * @throws ParserConfigurationException 
         * @throws OpenXML4JException 
         * @throws IOException 
         */  
        private static List<String[]> readerExcel(String path, String sheetName,  
                int minColumns) throws IOException, OpenXML4JException,  
                ParserConfigurationException, SAXException {  
            OPCPackage p = OPCPackage.open(path, PackageAccess.READ);  
            XLSXCovertCSVReader xlsx2csv = new XLSXCovertCSVReader(p, System.out,  
                    sheetName, minColumns);  
            List<String[]> list = xlsx2csv.process();  
            p.close();  
            return list;  
        }  
        
        public static void main(String[] args) throws Exception {  
        	System.out.println("开始导入数据...");
            List<String[]> list = XLSXCovertCSVReader  
                    .readerExcel(  
                            "D:\\book.xlsx",  
                            "Sheet1", 10);  
            Map map1=new HashMap();
            Map map2=new HashMap();
            for(int i=0;i<list.size();i++){
            	String[] record=list.get(i);
            	if(record[9]!=null&&record[9].equals("已有")){
            		map1.put(record[2], record[2]);
            	}else{
            		map2.put(record[2], record[2]);
            	}
            }
            System.out.println("数据书目数据为"+list.size()+"条");
            System.out.println("查出书目数据为"+map1.size()+"条");
            System.out.println("已有书目数据为"+map2.size()+"条");
            System.out.println("导入数据成功,开始分析数据...");
            boolean bool=createExcelFile(list,map1,map2);
            if(bool){
            	System.out.println("执行成功!");
            }
        }
        
        static String newExcelPath=("d:\\book2.xlsx");
        public static boolean createExcelFile(List list ,Map map1,Map map2) {
        	
        	         boolean isCreateSuccess = false;
        	         SXSSFWorkbook  workbook = null;
        	         try {
        	             // XSSFWork used for .xslx (>= 2007), HSSWorkbook for 03 .xsl
        	             workbook = new  SXSSFWorkbook(500);//HSSFWorkbook();//WorkbookFactory.create(inputStream);
        	         }catch(Exception e) {
        	             System.out.println("It cause Error on CREATING excel workbook: ");
        	             e.printStackTrace();
        	         }
        	         if(workbook != null) {
        	        	 CellStyle  style =workbook.createCellStyle() ;
        	        	 style.setFillForegroundColor(IndexedColors.LIGHT_GREEN.getIndex());
        	        	 HSSFPalette customPalette = workbook.getCustomPalette();  
        	        	// HSSFColor newColor = customPalette.addColor((byte) 153, (byte) 204, (byte) 255);
        	        	 style.setFillPattern(CellStyle.SOLID_FOREGROUND);
        	             Sheet sheet = workbook.createSheet("testdata");
        	             System.out.println(sheet.getColumnWidth(0));
        	             for (int rowNum = 0; rowNum < list.size(); rowNum++) {
        	                 Row row = sheet.createRow(rowNum);
        	                 String[] record=(String[])list.get(rowNum);
        	                 for(int i=0;i<record.length;i++){
        	                	  sheet.setColumnWidth(i, 3000);
        	                	  sheet.setColumnWidth(2, 12000);
        	                	  Cell cell = row.createCell(i, Cell.CELL_TYPE_STRING);
        	                	  cell.setCellValue(record[i]);
        	                	  if(record[9]!=null&&record[9].equals("已有")){
        	                		  if(map2.get(record[2])!=null){
        	                			  row.createCell(10, Cell.CELL_TYPE_STRING).setCellValue("重复");
        	                		  }
        	                	  }else{
        	                		  if(map1.get(record[2])!=null){
        	                			  row.createCell(10, Cell.CELL_TYPE_STRING).setCellValue("重复");
        	                		  }
        	                		  cell.setCellStyle(style);
        	                	  }
        	                 }
        	             }
        	             System.out.println("分析数据完成,开始写入文件...");
        	             try {
        	                 FileOutputStream outputStream = new FileOutputStream(newExcelPath);
        	                 workbook.write(outputStream);
        	                 outputStream.flush();
        	                 outputStream.close();
        	                 isCreateSuccess = true;
        	             } catch (Exception e) {
        	                 System.out.println("It cause Error on WRITTING excel workbook: ");
        	                 e.printStackTrace();
        	             }
        	         }
        	         File sss = new File(newExcelPath);
        	         System.out.println(sss.getAbsolutePath());
        	         return isCreateSuccess;
        	     }
    }

    转自:https://blog.csdn.net/xuyelong1/article/details/46928627
     

    展开全文
  • java读取数据量大的excel文件(包括2003版及2007版)
  • JAVA 读取大量文件

    2015-06-26 11:03:19
    需要写一个程序,对某个目录下7000多个文本文件读取,整个目录大小为100多G,文本文件一行为一条数据。 要求尽快读取数据。   主机配置:24颗逻辑CPU,40G内存   初步设计是:采用24个线程的固定线程池, ...

    需要写一个程序,对某个目录下7000多个文本文件读取,整个目录大小为100多G,文本文件一行为一条数据。 要求尽快读取完数据。

     

    主机配置:24颗逻辑CPU,40G内存

     

    初步设计是:采用24个线程的固定线程池,

                 目录读取任务:1个,负责目录内文件名的读取,读取后交给文件读取线程

                 文件读取任务: 11个,负责读取文件内容,即同时读取11个文件,每个文件读取出一条时,交给行处理线程。

                 行处理任务:12个,仅做输出日志(每处理1000行输出一条日志)。

                                       每一行数据生成一个Runable,丢线线程池。 

     

              每一个任务对应一个Runnabe,存入线程池,  由于行数据太多,生成的Runnabe在线程池队列中过多,从而产生抢占其它线程,即行处理线程数可能会多于12个,其它线程没有时间片可用,所以限制了任务数量,即在线程池中: 

               运行中+队列中:目录读取任务 只有1个。(总共也只有一个)

                                        文件读取任务 只有11个。(总共==文件数)

                                        行处理任务只有12个。(总共==所有文件行数

             程序中控制只有任务数小于应有任务数时,才调用线程池execute(新任务),否则阻塞。 

     

         问题1:

              测试时发现,内存一直增加,越来越高。 

              分析认为: 由于行处理任务,每处理1000行输出一条日志,运行时间太短,基本上0ms,任务退出行,当前线程发现队列中无可用任务,阻塞,当前行处理任务小于6,则添加任务线程变为运行状态,添加行处理任务到队列,线程池中阻塞的线程运行,0ms后再次阻塞。 

     

     

             简单说:就是由于任务运行时间太短,任务量太多,线程不断的在 阻塞-->运行-->阻塞-->运行状态间切换,造成内存消耗过多。 

     

           改造方案:           

                                         目录读取任务 只有1个。(总共也只有一个)

                                        文件读取任务 只有11个。(共享一个文件队列,处理完当前文件后,从队列中取出下一个文件处理,即线程不阻塞,在持续的处理)

                                        行处理任务只有12个。(共享一个行数据队列,处理完当前行后,从队列中取出下一个行数据,即线程不阻塞,在持续的处理

     

          结果:内存到达一个值后不再增加。

     

          

     

        问题2:

                发现程序刚开始时运行很快,后来,越来越慢

                

                用top命令(top后按1)查看时各个cpu 运行情况:

           Tasks: 469 total,   2 running, 466 sleeping,   0 stopped,   1 zombie

    Cpu0  : 39.4%us,  0.7%sy,  0.0%ni,  0.0%id, 59.9%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu1  :  2.7%us,  0.3%sy,  0.0%ni, 54.2%id, 42.9%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu2  :  1.3%us,  0.3%sy,  0.0%ni, 87.7%id, 10.6%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu3  :  0.7%us,  0.0%sy,  0.0%ni, 95.7%id,  3.7%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu4  :  0.3%us,  0.0%sy,  0.0%ni, 97.4%id,  2.3%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu5  :  0.7%us,  0.0%sy,  0.0%ni, 94.4%id,  5.0%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu6  :  1.3%us,  0.3%sy,  0.0%ni, 84.4%id, 13.9%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu7  :  1.3%us,  0.0%sy,  0.0%ni, 90.0%id,  8.6%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu8  :  0.3%us,  0.3%sy,  0.0%ni, 97.0%id,  2.3%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu9  :  0.7%us,  0.3%sy,  0.0%ni, 93.7%id,  5.3%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu10 :  0.3%us,  0.0%sy,  0.0%ni, 97.4%id,  2.3%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu11 : 34.0%us,  0.7%sy,  0.0%ni,  1.0%id, 64.3%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu12 :  8.9%us,  0.3%sy,  0.0%ni, 24.2%id, 66.6%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu13 :  0.7%us,  0.3%sy,  0.0%ni, 96.7%id,  2.3%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu14 :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu15 :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu16 :  1.3%us,  0.3%sy,  0.0%ni, 89.4%id,  8.6%wa,  0.0%hi,  0.3%si,  0.0%st

    Cpu17 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu18 :  1.0%us,  0.7%sy,  0.0%ni, 95.0%id,  3.3%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu19 :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu20 :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu21 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu22 :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

    Cpu23 :  7.3%us,  0.3%sy,  0.0%ni,  0.0%id, 92.4%wa,  0.0%hi,  0.0%si,  0.0%st

     

    发现cpu等待IO时间过长,

    找到运行的java进程,用jstack命令查看:

    jstack 进程号, 发现有11个线程都在文件读取状态:

      java.lang.Thread.State: RUNNABLE

            at sun.nio.ch.FileDispatcher.read0(Native Method)

            at sun.nio.ch.FileDispatcher.read(FileDispatcher.java:26)

            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)

            at sun.nio.ch.IOUtil.read(IOUtil.java:171)

            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:144)

            - locked <0x00000007b47a0d20> (a java.lang.Object)

     

    再用iostat命令查看IO情况:(iostat -x 1 100)每陋一条显示一次io信息

      avg-cpu:  %user   %nice %system %iowait  %steal   %idle

               4.34    0.00    0.21   17.28    0.00   78.17

     

    Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util

    sd1               0.00     7.00    0.00    9.00     0.00   128.00    14.22     0.15   16.78   4.11   3.70

    sd2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

    sd3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

    sd4               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

    sd5               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

    sd6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

    sd7               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

    sd8               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

    sd9               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

    sd10             101.00     0.00  179.00    0.00 71176.00     0.00   397.63    21.37   76.37   5.59 100.00

     

    %util:(上面命令每隔1秒显示一次)1秒内IO时间有多少,即io时间占总时间比例,如100%,即一直在读取硬盘文件。

     

     

    所有需要读取的文件都存在sd10上,即11条线程同时读一块硬盘上数据,造成io过高。

     

    后一来把文件读取任务改成1个线程,整个程序运行时间与11条线程差不多。 

     

     后来在网上查,说持续读硬盘写硬盘,尽量用一条线程。 

     

    如果需要提高处理速度,可以把文件分布在不同的硬盘上,多条线程读取。 跟Cpu类似,密集计算型线程数要与逻辑Cpu数相等。 

     

    另外:感觉可以查看io情况适当的增加线程数,如一条线程硬盘 %utils= 20% ,是否可以增加1条线程读取呢,未测。 

     

    问题三:

       线程数量问题

       

      1个文件读取线程供应不上12条行处理线程的速度,造成 12条行线程有一半在阻塞状态,浪费线程,(原因行处理线程任务时间太短),通过查看线程待处理队列(12个行处理线程共享一个行数据队列,从中取数据,行处理数据队列大小时常为0),适当调整线程数。

     

     

    问题四: 

      任务拆分

      当查看一个任务的CPU的占有率为 90%以上时,是否考虑任务拆分成多个任务,则多个CPU同时处理,以增加处理速度

     

     

     

     

     

     

     

     

             

     

                 

                 

               

     

                

                

    展开全文
  • 参考: https://github.com/apache/poi ... 每个类都是调用 readerExcelInputStream 方法 返回 List<List<String>> xlsx 文件读取 package cn.zhangfusheng.util.base.poi.csv; import org.apache.poi.

    参考:
    https://github.com/apache/poi
    https://blog.csdn.net/rexueqingchun/article/details/89306707?utm_source=app

    每个类都是调用 readerExcelInputStream 方法 返回 List<List<String>>
    注: 文中 arrayList 可替换成 jdk8 的 arrayList

    xlsx 文件读取

    package cn.zhangfusheng.util.base.poi.csv;
    
    import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.openxml4j.opc.PackageAccess;
    import org.apache.poi.ss.usermodel.DataFormatter;
    import org.apache.poi.ss.util.CellAddress;
    import org.apache.poi.ss.util.CellReference;
    import org.apache.poi.util.XMLHelper;
    import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
    import org.apache.poi.xssf.eventusermodel.XSSFReader;
    import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
    import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
    import org.apache.poi.xssf.model.SharedStrings;
    import org.apache.poi.xssf.model.Styles;
    import org.apache.poi.xssf.model.StylesTable;
    import org.apache.poi.xssf.usermodel.XSSFComment;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    
    import javax.xml.parsers.ParserConfigurationException;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.List;
    
    public class XLSX2CSV {
    
        private class SheetToCSV implements SheetContentsHandler {
            private boolean firstCellOfRow;
            private int currentRow = -1;
            private int currentCol = -1;
    
            @Override
            public void startRow(int rowNum) {
                // Prepare for this row
                firstCellOfRow = true;
                currentRow = rowNum;
                currentCol = -1;
            }
    
            @Override
            public void endRow(int rowNum) {
                // Ensure the minimum number of columns
                for (int i = currentCol; i < minColumns; i++) {
                    row.add("");
                }
                allRows.add(new ArrayList<>(row));
                row.clear();
            }
    
            @Override
            public void cell(String cellReference, String formattedValue, XSSFComment comment) {
                if (firstCellOfRow) {
                    firstCellOfRow = false;
                }
                // gracefully handle missing CellRef here in a similar way as XSSFCell does
                if (cellReference == null) {
                    cellReference = new CellAddress(currentRow, currentCol).formatAsString();
                }
    
                // Did we miss any cells?
                int thisCol = (new CellReference(cellReference)).getCol();
                int missedCols = thisCol - currentCol - 1;
                for (int i = 0; i < missedCols; i++) {
                    row.add("");
                }
                currentCol = thisCol;
                row.add(formattedValue);
            }
        }
    
    
        private final OPCPackage xlsxPackage;
    
        /**
         * Number of columns to read starting with leftmost
         */
        private final int minColumns;
        private final int sheetIndex;
    
        private final List<List<String>> allRows = new ArrayList<>();
        private final List<String> row;
    
        public List<List<String>> getRows() {
            return allRows;
        }
    
        /**
         * Creates a new XLSX -> CSV examples
         * @param pkg        The XLSX package to process
         * @param minColumns The minimum number of columns to output, or -1 for no minimum
         */
        public XLSX2CSV(OPCPackage pkg, int minColumns, int sheetIndex) {
            this.xlsxPackage = pkg;
            this.minColumns = minColumns;
            this.row = new ArrayList<>(minColumns);
            this.sheetIndex = sheetIndex;
        }
    
        /**
         * Parses and shows the content of one sheet
         * using the specified styles and shared-strings tables.
         * @param styles           The table of styles that may be referenced by cells in the sheet
         * @param strings          The table of strings that may be referenced by cells in the sheet
         * @param sheetInputStream The stream to read the sheet-data from.
         * @throws java.io.IOException An IO exception from the parser,
         *                             possibly from a byte stream or character stream
         *                             supplied by the application.
         * @throws SAXException        if parsing the XML data fails.
         */
        public void processSheet(
                Styles styles,
                SharedStrings strings,
                SheetContentsHandler sheetHandler,
                InputStream sheetInputStream) throws IOException, SAXException {
            DataFormatter formatter = new DataFormatter();
            InputSource sheetSource = new InputSource(sheetInputStream);
            try {
                XMLReader sheetParser = XMLHelper.newXMLReader();
                ContentHandler handler = new XSSFSheetXMLHandler(
                        styles, null, strings, sheetHandler, formatter, false);
                sheetParser.setContentHandler(handler);
                sheetParser.parse(sheetSource);
            } catch (ParserConfigurationException e) {
                throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
            }
        }
    
        /**
         * Initiates the processing of the XLS workbook file to CSV.
         * @throws IOException  If reading the data from the package fails.
         * @throws SAXException if parsing the XML data fails.
         */
        public void process() throws IOException, OpenXML4JException, SAXException {
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
            XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
            StylesTable styles = xssfReader.getStylesTable();
            XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
            int index = 1;
            while (iter.hasNext()) {
                try (InputStream stream = iter.next()) {
                    if (index++ != sheetIndex) {
                        continue;
                    }
                    processSheet(styles, strings, new SheetToCSV(), stream);
                    if (index >= sheetIndex) {
                        break;
                    }
                }
            }
        }
    
        /**
         * 读取 excel
         * @param file
         * @param minColumns
         * @param sheetIndex
         * @return
         * @throws IOException
         */
        public static List<List<String>> readerExcelInputStream(File file, int minColumns, int sheetIndex) throws Exception {
            try (OPCPackage p = OPCPackage.open(file, PackageAccess.READ)) {
                XLSX2CSV xlsx2csv = new XLSX2CSV(p, minColumns, sheetIndex);
                xlsx2csv.process();
                return xlsx2csv.getRows();
            }
        }
    }
    

    xls 文件读取

    package cn.zhangfusheng.util.base.poi.csv;
    
    import cn.zhangfusheng.util.base.exception.GlobalSystemException;
    import org.apache.poi.hssf.eventusermodel.EventWorkbookBuilder.SheetRecordCollectingListener;
    import org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener;
    import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
    import org.apache.poi.hssf.eventusermodel.HSSFListener;
    import org.apache.poi.hssf.eventusermodel.HSSFRequest;
    import org.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener;
    import org.apache.poi.hssf.eventusermodel.dummyrecord.LastCellOfRowDummyRecord;
    import org.apache.poi.hssf.eventusermodel.dummyrecord.MissingCellDummyRecord;
    import org.apache.poi.hssf.model.HSSFFormulaParser;
    import org.apache.poi.hssf.record.BOFRecord;
    import org.apache.poi.hssf.record.BlankRecord;
    import org.apache.poi.hssf.record.BoolErrRecord;
    import org.apache.poi.hssf.record.BoundSheetRecord;
    import org.apache.poi.hssf.record.FormulaRecord;
    import org.apache.poi.hssf.record.LabelRecord;
    import org.apache.poi.hssf.record.LabelSSTRecord;
    import org.apache.poi.hssf.record.NoteRecord;
    import org.apache.poi.hssf.record.NumberRecord;
    import org.apache.poi.hssf.record.RKRecord;
    import org.apache.poi.hssf.record.SSTRecord;
    import org.apache.poi.hssf.record.StringRecord;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;
    
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    
    /**
     * A XLS -> CSV processor, that uses the MissingRecordAware
     * EventModel code to ensure it outputs all columns and rows.
     * @author Nick Burch
     */
    @SuppressWarnings({"java:S106", "java:S4823"})
    public class XLS2CSV implements HSSFListener {
        private final int minColumns;
        private final POIFSFileSystem fs;
    
        private int lastRowNumber;
        private int lastColumnNumber;
    
        /**
         * Should we output the formula, or the value it has?
         */
        private final boolean outputFormulaValues = true;
    
        /**
         * For parsing Formulas
         */
        private SheetRecordCollectingListener workbookBuildingListener;
        private HSSFWorkbook stubWorkbook;
    
        // Records we pick up as we process
        private SSTRecord sstRecord;
        private FormatTrackingHSSFListener formatListener;
    
        /**
         * So we known which sheet we're on
         */
        private int sheetIndex = -1;
        private final int readSheetIndex;
        private BoundSheetRecord[] sheetRecords;
        private final List<BoundSheetRecord> boundSheetRecords = new ArrayList<>();
    
        // For handling formulas with string results
        private int nextRow;
        private int nextColumn;
        private boolean outputNextStringRecord;
    
        private final List<List<String>> rows = new ArrayList<>();
        private final List<String> record;
    
        public List<List<String>> getRows() {
            return this.rows;
        }
    
        /**
         * Creates a new XLS -> CSV converter
         * @param fs         The POIFSFileSystem to process
         * @param minColumns The minimum number of columns to output, or -1 for no minimum
         */
        public XLS2CSV(POIFSFileSystem fs, int minColumns, int readSheetIndex) {
            this.fs = fs;
            this.minColumns = minColumns;
            this.record = new ArrayList<>(minColumns);
            this.readSheetIndex = readSheetIndex;
        }
    
        /**
         * Creates a new XLS -> CSV converter
         * @param filename   The file to process
         * @param minColumns The minimum number of columns to output, or -1 for no minimum
         */
        public XLS2CSV(String filename, int minColumns, int readSheetIndex) throws IOException, FileNotFoundException {
            this(new POIFSFileSystem(new FileInputStream(filename)), minColumns, readSheetIndex);
        }
    
        public XLS2CSV(File file, int minColumns, int readSheetIndex) throws IOException, FileNotFoundException {
            this(new POIFSFileSystem(new FileInputStream(file)), minColumns, readSheetIndex);
        }
    
        /**
         * Initiates the processing of the XLS file to CSV
         */
        public void process() throws IOException {
            MissingRecordAwareHSSFListener listener = new MissingRecordAwareHSSFListener(this);
            formatListener = new FormatTrackingHSSFListener(listener);
    
            HSSFEventFactory factory = new HSSFEventFactory();
            HSSFRequest request = new HSSFRequest();
    
            if (outputFormulaValues) {
                request.addListenerForAllRecords(formatListener);
            } else {
                workbookBuildingListener = new SheetRecordCollectingListener(formatListener);
                request.addListenerForAllRecords(workbookBuildingListener);
            }
    
            factory.processWorkbookEvents(request, fs);
        }
    
        /**
         * Main HSSFListener method, processes events, and outputs the
         * CSV as the file is processed.
         */
        @Override
        public void processRecord(org.apache.poi.hssf.record.Record record) {
            int thisRow = -1;
            int thisColumn = -1;
            String thisStr = null;
    
            switch (record.getSid()) {
                case BoundSheetRecord.sid:
                    boundSheetRecords.add((BoundSheetRecord) record);
                    break;
                case BOFRecord.sid:
                    BOFRecord br = (BOFRecord) record;
                    if (br.getType() == BOFRecord.TYPE_WORKSHEET) {
                        // Create sub workbook if required
                        if (workbookBuildingListener != null && stubWorkbook == null) {
                            stubWorkbook = workbookBuildingListener.getStubHSSFWorkbook();
                        }
    
                        // Output the worksheet name
                        // Works by ordering the BSRs by the location of
                        //  their BOFRecords, and then knowing that we
                        //  process BOFRecords in byte offset order
                        sheetIndex++;
                        if (sheetIndex != readSheetIndex - 1) {
                            return;
                        }
                        if (sheetRecords == null) {
                            sheetRecords = BoundSheetRecord.orderByBofPosition(boundSheetRecords);
                        }
                    }
                    break;
    
                case SSTRecord.sid:
                    sstRecord = (SSTRecord) record;
                    break;
    
                case BlankRecord.sid:
                    BlankRecord brec = (BlankRecord) record;
    
                    thisRow = brec.getRow();
                    thisColumn = brec.getColumn();
                    thisStr = "";
                    break;
                case BoolErrRecord.sid:
                    BoolErrRecord berec = (BoolErrRecord) record;
    
                    thisRow = berec.getRow();
                    thisColumn = berec.getColumn();
                    thisStr = "";
                    break;
    
                case FormulaRecord.sid:
                    FormulaRecord frec = (FormulaRecord) record;
    
                    thisRow = frec.getRow();
                    thisColumn = frec.getColumn();
    
                    if (outputFormulaValues) {
                        if (Double.isNaN(frec.getValue())) {
                            // Formula result is a string
                            // This is stored in the next record
                            outputNextStringRecord = true;
                            nextRow = frec.getRow();
                            nextColumn = frec.getColumn();
                        } else {
                            thisStr = formatListener.formatNumberDateCell(frec);
                        }
                    } else {
                        thisStr = HSSFFormulaParser.toFormulaString(stubWorkbook, frec.getParsedExpression());
                    }
                    break;
                case StringRecord.sid:
                    if (outputNextStringRecord) {
                        // String for formula
                        StringRecord srec = (StringRecord) record;
                        thisStr = srec.getString();
                        thisRow = nextRow;
                        thisColumn = nextColumn;
                        outputNextStringRecord = false;
                    }
                    break;
    
                case LabelRecord.sid:
                    LabelRecord lrec = (LabelRecord) record;
    
                    thisRow = lrec.getRow();
                    thisColumn = lrec.getColumn();
                    thisStr = lrec.getValue();
                    break;
                case LabelSSTRecord.sid:
                    LabelSSTRecord lsrec = (LabelSSTRecord) record;
    
                    thisRow = lsrec.getRow();
                    thisColumn = lsrec.getColumn();
                    if (sstRecord == null) {
                        throw new GlobalSystemException("No SST Record, can't identify string");
                    } else {
                        thisStr = sstRecord.getString(lsrec.getSSTIndex()).toString();
                    }
                    break;
                case NoteRecord.sid:
                    NoteRecord nrec = (NoteRecord) record;
    
                    thisRow = nrec.getRow();
                    thisColumn = nrec.getColumn();
                    // TODO: Find object to match nrec.getShapeId()
                    thisStr = "(TODO)";
                    break;
                case NumberRecord.sid:
                    NumberRecord numrec = (NumberRecord) record;
    
                    thisRow = numrec.getRow();
                    thisColumn = numrec.getColumn();
    
                    // Format
                    thisStr = formatListener.formatNumberDateCell(numrec);
                    break;
                case RKRecord.sid:
                    RKRecord rkrec = (RKRecord) record;
    
                    thisRow = rkrec.getRow();
                    thisColumn = rkrec.getColumn();
                    thisStr = "(TODO)";
                    break;
                default:
                    break;
            }
            /*if (sheetIndex == readSheetIndex - 1) {
                return;
            }*/
            if (sheetIndex != -1 && sheetIndex == readSheetIndex - 1) {
                // Handle new row
                if (thisRow != -1 && thisRow != lastRowNumber) {
                    lastColumnNumber = -1;
                }
    
                // Handle missing column
                if (record instanceof MissingCellDummyRecord) {
                    MissingCellDummyRecord mc = (MissingCellDummyRecord) record;
                    thisRow = mc.getRow();
                    thisColumn = mc.getColumn();
                    thisStr = "";
                }
    
                // If we got something to print out, do so
                if (thisStr != null) {
                    this.record.add(thisStr);
                }
    
                // Update column and row count
                if (thisRow > -1) {
                    lastRowNumber = thisRow;
                }
                if (thisColumn > -1) {
                    lastColumnNumber = thisColumn;
                }
    
                // Handle end of row
                if (record instanceof LastCellOfRowDummyRecord) {
                    // Print out any missing commas if needed
                    if (minColumns > 0) {
                        // Columns are 0 based
                        if (lastColumnNumber == -1) {
                            lastColumnNumber = 0;
                        }
                        for (int i = lastColumnNumber; i < (minColumns); i++) {
                            this.record.add("");
                        }
                    }
    
                    // We're onto a new row
                    lastColumnNumber = -1;
    
                    // End the row
                    this.rows.add(new ArrayList<>(this.record));
                    this.record.clear();
                }
            }
        }
    
        /**
         * 读取 xls 大文件
         * @param file       文件
         * @param minColumns 列数
         * @param sheetIndex sheet index
         * @return
         * @throws IOException
         */
        public static List<List<String>> readerExcelInputStream(File file, int minColumns, int sheetIndex) throws IOException {
            XLS2CSV xls2csv = new XLS2CSV(file, minColumns, sheetIndex);
            xls2csv.process();
            return xls2csv.getRows();
        }
    }
    
    展开全文
  • 每张表的数据量大概在2000W左右,因为是只需要读一张表指定字段的全部数据,也不需要考虑查询优化,只针对表读取规模进行优化即可。 版本oracle 11G 思路一: 使用oracle语句对数据表进行分页读取,这里应注意到的是...
  • 1、有一个maven工程,或者一个java工程 maven工程的话比较简单 pom文件中直接引入下面依赖即可 <dependencies> <dependency> <groupId>org.apache.poi</groupId> <artifactId>poi...
  • 近日由于相关业务,需要读取一个将近1G的文件,然后将符合条件的数据insert进数据库。而执行完程序之后,最后写入数据库的数据是将近100w条,在linux系统下运行时间将近3个小时,操作还是上了一个相当的规模。由于...
  • 处理大批的excel表格数据只能是 xlsx格式的excel表格常用的读取:FileInputStream fi...然后再获取对应的Sheet、Row和Cell,然后获取excel中的内容,但是这种方式POI会把文件的所有内容都加载到内存中,读取大的ex...
  • 这个虽然算不上大文件,但是考虑到以后如果数据量增加可能到2G、3G、甚至更,所以就不打算用普通读取文件的方式去读取这个文件。因为普通读取文件是将文件全部加载进内存,然后再内存中读取。这种情况下,文件比较...
  • 该代码可以处理100万数据量的excel文件,xlsx文件数据量太大,用普通的读法会报内存溢出错误,所以用官网提供的方法,一条一条的读取大excel文件,本例子从这点出发,组装excel里读取的单条数据为list,在根据需求...
  • 代码如下,做一个数据脚本的备份与还原,从数据库查询所有记录,拼接成插入语句,数据库中有600多万条数据,在自的电脑上试,写到txt文件中足足花了半个小时的时间啊。。。。。不知道哪里有问题,麻烦各位指点一下/*...
  • Java读取超大文本文件

    千次阅读 2011-02-20 13:55:00
    近日由于相关业务,需要读取一个将近1G的文件,然后将符合条件的数据insert进数据库。而执行完程序之后,最后写入数据库的数据是将近100w条,在linux系统下运行时间将近3个小时,操作还是上了一个相当的规模...
  • 读取本地的xml文件,通过DOM进行解析,DOM解析的特点就是把整个xml文件装载入内存中,形成一颗DOM树形结构,树结构是方便遍历和和操纵。DOM解析的特性就是读取xml文件转换为 dom树形结构,通过节点进行遍历。这是W3c...
  • JAVA大数据量Excel

    万次阅读 热门讨论 2014-12-09 13:05:28
    JAVA大数据量Excel,对Excel各种类型的单元格数据进行分类处理,最后封装到一个类中。 只需要传入Excel文件即可进行逐行读取内容
  • ,但是当存在大量数据要获取时,就会使用一个新的properties文件,x=项目启动的时候现将文件加载到内存中。后面使用 private static Map<String, String> map = Maps.newHashMap(); static { ClassLoader ...
  • 数据量小的,可以使用普通api来做, 但是当数据量非常时,普通的就不行了,不过我的这个代码可以完美解决此问题
  • 读取本地的xml文件,通过DOM进行解析,DOM解析的特点就是把整个xml文件装载入内存中,形成一颗DOM树形结构,树结构是方便遍历和和操纵。DOM解析的特性就是读取xml文件转换为 dom树形结构,通过节点进行遍历。这是W3c...
  • 主要步骤必要配置JAVA读取指定文件JAVA连接Oracle数据库 必要配置 操作在eclipse里进行。在进行下面操作之前,注意需要将有关Oracle的数据库连接的jar包。 首先需要右键对应Java项目,选择Properties选项,然后找到...
  • java io读取文件

    2013-07-11 23:26:23
    java io读取文件的技术,最近用到了大数据量文件读取,研究下
  • 主要介绍了Java实现从数据库导出大量数据记录并保存到文件的方法,涉及Java针对数据库的读取文件写入等操作技巧,具有一定参考借鉴价值,需要的朋友可以参考下
  • Java读取Properties文件

    2017-04-22 21:21:00
    1.本篇博文将要介绍的用java读取Properties文件,最近接触到一个新的项目,里面有大量的properties文件,所以萌生了写这篇博文的想法。关于properties,一般用来做存储数据文件,采用的键值对的方式,比如我们经常...

空空如也

空空如也

1 2 3 4 5 ... 20
收藏数 715
精华内容 286
关键字:

java读取大数据量文件

java 订阅