2018-10-21 21:21:58 qq_39591494 阅读数 227

Python操作office之excel

一、什么是Excel?

Microsoft Excel是Microsoft为使用Windows和Apple Macintosh操作系统的电脑编写的一款电子表格软件。直观的界面、出色的计算功能和图表工具,再加上成功的市场营销,使Excel成为最流行的个人计算机数据处理软件。在1993年,作为Microsoft Office的组件发布了5.0版之后,Excel就开始成为所适用操作平台上的电子制表软件的霸主。

二、在Jupyter notebook对Excel进行操作

Excel表格内容如下:
在这里插入图片描述

2.1、安装openpyxl

pip install openpyxl

2.2、导入openpyxl

%cd F:\python_test1\Python_office  # 进入工作目录
# 安装openpyxl
# pip install openpyxl
import openpyxl # 导入openpyxl
wb = openpyxl.load_workbook("yankerp1.xlsx")
# wb是一个object openpyxl.load_workbook就是把yankerp1.xlsx这个文档赋值给wb 那么wb就是那个yankerp1.xlsx这个文件

2.3、导入工作簿

SH = wb.active # 导入工作簿使用active

2.4、取坐标位置的值

SH["A1"].value
SH["A2"].value
SH["A3"].value

2.5、更改坐标值及获取坐标内容:

SH["A3"].value
>>>张三
SH["A3"] = "ZhangSan"
SH["A3"].value
>>>ZhangSan
SH["A1"].coordinate
>>>A1

2.6、获取当前工作簿名称:

SH.title
>>>成绩单

更改工作簿名称:

SH.title = "学生成绩单"
SH.title
>>> 学生成绩单

2.7、保存更改后的Excel到新的Excel

wb.save("学生成绩单")

更改后打开如下:
在这里插入图片描述

三、自我练习

原始表1:

%cd F:\python_test1\Python_office

import openpyxl
from openpyxl.styles import Font
WB = openpyxl.load_workbook("学生.xlsx")

Sheet1 = WB["Sheet1"]
Sheet2 = WB["Sheet2"]

index = 2
for i in Sheet1.rows:
    if i[0].coordinate != "A1" and i[1].value > 60:
        print(i[0].value, i[1].value)
        Sheet2["A" + str(index)] = i[0].value
        Sheet2["B" + str(index)] = i[1].value
        index += 1
    Sheet2.title = "及格同学"
    Sheet1.cell(row=9, column=1).value = "平均分数"
    Sheet1.cell(row=9, column=2).value = "=average(B2,B8)"
    Sheet1.title = "同学平均分数表"
    Sheet1["A2"] = "延凯"
    Sheet1["A3"] = "张三"
    
    fount = Font(bold=True, size=14)
    Sheet1["A9"].font = fount
WB.save("student1.xlsx")

在这里插入图片描述

在这里插入图片描述

2019-06-13 14:51:03 dxdfe 阅读数 568

基于python的office操作

记录一些用python做文档处理的知识以备以后再用。

pdf操作

因为今天要交申报材料,需要一些证书的pdf,在手机上拍照用app扫描全能王做pdf的话会留水印,除非交钱注册vip,另外在网上找的一些pdf在线处理网站也是各种需要注册缴费,干脆自己学下用python处理算了。

首先用扫描全能王得到证书的jpg文档(直接拍也行,但是扫描全能王可以抓边框做锐化,还是挺好的,我要是经常用就注册了),然后用python把jpg转pdf,教程如下(注意照片必须是竖着的,如果照片是横着的,会横着缩放到pdf里,我还没搞明白这是怎么回事,也不能旋转再转pdf,因为转的时候会在图片两边生成黑边然后一起输入pdf,还得裁剪掉黑边,太麻烦还不如一开始就准备好竖着的图片)

https://blog.csdn.net/ycc297876771/article/details/81005298blog.csdn.net

程序如下:

import sys

from reportlab.lib.pagesizes import portrait

from reportlab.pdfgen import canvas

def imgtopdf(input_paths, outputpath):

(maxw, maxh) = [Image.open](https://link.zhihu.com/?target=http%3A//Image.open)(input_paths).size #确认图片尺寸

c = canvas.Canvas(outputpath, pagesize=portrait((maxw, maxh))) #按图片尺寸生成空白pdf

c.drawImage(input_paths, 0, 0, maxw, maxh) #把图片抄到pdf

c.showPage()

[c.save](https://link.zhihu.com/?target=http%3A//c.save)()

imgtopdf(“2b.jpg”, “2b.pdf”) #根据默认文件夹下的图片生成pdf

如果需要合并多个pdf,则采用如下教程:一个用于合并pdf的简单Python脚本如果需要合并多个pdf,则采用如下教程,对默认文件夹下所有pdf进行合并,注意某些pdf合成会失败,不知道是不是版本问题:

一个用于合并pdf的简单Python脚本www.jianshu.com

程序如下:

import PyPDF2

import os

import re

def main():

# find all the pdf files in current directory.

mypath = os.getcwd()

pattern = r"\.pdf$"

file_names_lst = [mypath + "\\" + f for f in os.listdir(mypath) if [re.search](https://link.zhihu.com/?target=http%3A//re.search)(pattern, f, re.IGNORECASE)

and not [re.search](https://link.zhihu.com/?target=http%3A//re.search)(r'Merged.pdf',f)]

# merge the file.

opened_file = [open(file_name,'rb') for file_name in file_names_lst]

pdfFM = PyPDF2.PdfFileMerger()

for file in opened_file:

pdfFM.append(file)

# output the file.

with open(mypath + "\\Merged.pdf", 'wb') as write_out_file:

pdfFM.write(write_out_file)

# close all the input files.

for file in opened_file:

file.close()

if __name__ == '__main__':

	main()

如果要清除某些页面:

#Python#用软件删除PDF中的空白页,竟然收费?!果断用Pythonwww.jianshu.com

程序如下:

import PyPDF2

original = r'1.pdf'

new = r'2.pdf'

original_pdf = PyPDF2.PdfFileReader(original)

page8 = original_pdf.getPage(8)

page16 = original_pdf.getPage(16)

pdfWriter = PyPDF2.PdfFileWriter()

pdfWriter.addPage(page8)

pdfWriter.addPage(page16)

with open(new, 'wb') as f:

pdfWriter.write(f)
2019-03-29 11:29:18 hanli1992 阅读数 200

python无法处理.doc文件,所以先将.doc文件转化为.docx文件,再处理.docx

一、.doc转化为.docx、.pdf、.txt……

python无法处理.doc文件,但可以处理.docx文件,先将.doc转为.docx

from win32com import client as wc
 
#word = wc.Dispatch("Word.Application")
#启动独立的进程,推荐这种
word = wc.DispatchEx("Word.Application")
 
# 后台运行,不显示,不警告
w.Visible = False
w.DisplayAlerts = 0
 
doc = word.Documents.Open(r'D:\Projects\CA\script\CA0218748520031015-DNAv01.DOC')
doc.SaveAs(r'D:\Projects\CA\script\CA0218748520031015-DNAv01.docx', 12, False, '', True, '', False, False, False, False)  #12为docx
doc.Close()
word.Quit()

若将参数“12”换成“2”,就可将.doc转成.txt

根据参数设置可以将word文件转换成任意格式文件

wdFormatDocument 0
wdFormatDocument97 0
wdFormatTemplate 1
wdFormatTemplate97 1
wdFormatText 2
wdFormatTextLineBreaks 3
wdFormatDOSText 4
wdFormatDOSTextLineBreaks 5
wdFormatRTF 6
wdFormatEncodedText 7
wdFormatUnicodeText 7
wdFormatHTML 8
wdFormatWebArchive 9
wdFormatFilteredHTML 10
wdFormatXML 11
wdFormatXMLDocument 12
wdFormatXMLDocumentMacroEnabled 13
wdFormatXMLTemplate 14
wdFormatXMLTemplateMacroEnabled 15
wdFormatDocumentDefault 16
wdFormatPDF 17
wdFormatXPS 18
wdFormatFlatXML 19
wdFormatFlatXMLMacroEnabled 20
wdFormatFlatXMLTemplate 21
wdFormatFlatXMLTemplateMacroEnabled 22

word文件转html有两种格式可选wdFormatHTML、wdFormatFilteredHTML(对应数字8、10),区别是如果是wdFormatHTML格式的话,word文件里面的公式等ole对象将会存储成wmf格式,而选用wdFormatFilteredHTML的话公式图片将存储为gif格式,而且目测可以看出用wdFormatFilteredHTML生成的HTML明显比wdFormatHTML要干净许多。

二、读取.docx

  
#读取.docx
import docx
 
#获取文档对象
file=docx.Document(r'D:\Projects\CA\script\CA0218748520031015-DNAv01.docx')
print("段落数:"+str(len(file.paragraphs)))
 
#输出每一段的内容
for para in file.paragraphs:
    print(para.text)
 
#输出段落编号及段落内容
for i in range(len(file.paragraphs)):
    print("第"+str(i)+"段的内容是:"+file.paragraphs[i].text)

 

2012-12-05 14:22:20 scmsgxz 阅读数 9
这里测试的环境是:windows xp,office 2007,python 2.5.2,pywin32 build 213,原理是利用win32com接口直接调用office API,好处是简单、兼容性好,只要office能处理的,python都可以处理,处理出来的结果和office word里面“另存为”一致。

#!/usr/bin/env python

#coding=utf-8

from win32com import client as wc

word = wc.Dispatch('Word.Application')

doc = word.Documents.Open('d:/labs/math.doc')

doc.SaveAs('d:/labs/math.html', 8 )

doc.Close()

word.Quit()

关键的就是doc.SaveAs(‘d:/labs/math.html’, 8 )这一行,网上很多文章写成:doc.SaveAs(‘d:/labs/math.html’, win32com.client.constants.wdFormatHTML),直接报错:

AttributeError: class Constants has no attribute ‘wdFormatHTML’

当然你也可以用上面的代码将word文件转换成任意格式文件(只要office 2007支持,比如将word文件转换成PDF文件,把8改成17即可),下面是office 2007支持的全部文件格式对应表:

wdFormatDocument = 0
wdFormatDocument97 = 0
wdFormatDocumentDefault = 16
wdFormatDOSText = 4
wdFormatDOSTextLineBreaks = 5
wdFormatEncodedText = 7
wdFormatFilteredHTML = 10
wdFormatFlatXML = 19
wdFormatFlatXMLMacroEnabled = 20
wdFormatFlatXMLTemplate = 21
wdFormatFlatXMLTemplateMacroEnabled = 22
wdFormatHTML = 8
wdFormatPDF = 17
wdFormatRTF = 6
wdFormatTemplate = 1
wdFormatTemplate97 = 1
wdFormatText = 2
wdFormatTextLineBreaks = 3
wdFormatUnicodeText = 7
wdFormatWebArchive = 9
wdFormatXML = 11
wdFormatXMLDocument = 12
wdFormatXMLDocumentMacroEnabled = 13
wdFormatXMLTemplate = 14
wdFormatXMLTemplateMacroEnabled = 15
wdFormatXPS = 18

照着字面意思应该能对应到相应的文件格式,如果你是office 2003可能支持不了这么多格式。word文件转html有两种格式可选wdFormatHTML、wdFormatFilteredHTML(对应数字8、10),区别是如果是wdFormatHTML格式的话,word文件里面的公式等ole对象将会存储成wmf格式,而选用wdFormatFilteredHTML的话公式图片将存储为gif格式,而且目测可以看出用wdFormatFilteredHTML生成的HTML明显比wdFormatHTML要干净许多。

当然你也可以用任意一种语言通过com来调用office API,比如PHP.

原文链接 : [url]http://www.fuchaoqun.com/2009/03/use-python-convert-word-to-html-with-win32com/[/url]
2010-05-14 12:54:00 binger819623 阅读数 1954

Python转换office word文件为HTML

这里测试的环境是:windows xp,office 2007,python 2.5.2,pywin32 build 213,原理是利用win32com接口直接调用office API,好处是简单、兼容性好,只要office能处理的,python都可以处理,处理出来的结果和office word里面“另存为”一致。

#!/usr/bin/env python

 
#coding=utf-8
 
from win32com import client as wc
 
word = wc.Dispatch ( 'Word.Application' )
 
doc = word.Documents .Open ( 'd:/labs/math.doc' )
 
doc.SaveAs ( 'd:/labs/math.html' , 8 )
 
doc.Close ( )
 
word.Quit ( )

关键的就是doc.SaveAs(‘d:/labs/math.html’, 8)这一行,网上很多文章写成:doc.SaveAs(‘d:/labs/math.html’, win32com.client.constants.wdFormatHTML),直接报错:

AttributeError: class Constants has no attribute ‘wdFormatHTML’

当然你也可以用上面的代码将word文件转换成任意格式文件(只要office 2007支持,比如将word文件转换成PDF文件,把8改成17即可),下面是office 2007支持的全部文件格式对应表:

wdFormatDocument                    =  0
wdFormatDocument97 = 0
wdFormatDocumentDefault = 16
wdFormatDOSText = 4
wdFormatDOSTextLineBreaks = 5
wdFormatEncodedText = 7
wdFormatFilteredHTML = 10
wdFormatFlatXML = 19
wdFormatFlatXMLMacroEnabled = 20
wdFormatFlatXMLTemplate = 21
wdFormatFlatXMLTemplateMacroEnabled = 22
wdFormatHTML = 8
wdFormatPDF = 17
wdFormatRTF = 6
wdFormatTemplate = 1
wdFormatTemplate97 = 1
wdFormatText = 2
wdFormatTextLineBreaks = 3
wdFormatUnicodeText = 7
wdFormatWebArchive = 9
wdFormatXML = 11
wdFormatXMLDocument = 12
wdFormatXMLDocumentMacroEnabled = 13
wdFormatXMLTemplate = 14
wdFormatXMLTemplateMacroEnabled = 15
wdFormatXPS = 18

照着字面意思应该能对应到相应的文件格式,如果你是office 2003可能支持不了这么多格式。word文件转html有两种格式可选wdFormatHTML、wdFormatFilteredHTML(对应数字 8、10),区别是如果是wdFormatHTML格式的话,word文件里面的公式等ole对象将会存储成wmf格式,而选用 wdFormatFilteredHTML的话公式图片将存储为gif格式,而且目测可以看出用wdFormatFilteredHTML生成的HTML 明显比wdFormatHTML要干净许多。

当然你也可以用任意一种语言通过com来调用office API,比如PHP.

 

=========================================

使用 python 写 COM

 

2009年09月03日 星期四 下午 07:01

from : http://www.cppblog.com/bigsml/archive/2008/08/14/58851.html

Python 支持Com调用(client com) 以及撰写COM 组件(server com).
1. com 调用示例(使用Windows Media Player 播放音乐)

from win32com.client import Dispatch
mp
= Dispatch( " WMPlayer.OCX " )
tune
= mp.newMedia( " C:/WINDOWS/system32/oobe/images/title.wma " )
mp.currentPlaylist.appendItem(tune)
mp.controls.play()


2. com server 的编写
主要可以参考 <<Python Programming on Win32 之 Chapter 12 Advanced Python and COM http://oreilly.com/catalog/pythonwin32/chapter/ch12.html >>
示例(分割字符串)
- 代码

class PythonUtilities:
     _public_methods_
= [ ' SplitString ' ]
     _reg_progid_
= " PythonDemos.Utilities "
    
# NEVER copy the following ID
     # Use "print pythoncom.CreateGuid()" to make a new one.
     _reg_clsid_ = " {41E24E95-D45A-11D2-852C-204C4F4F5020} "
    
    
def SplitString(self, val, item = None):
        
import string
        
if item != None: item = str(item)
        
return string.split(str(val), item)

# Add code so that when this script is run by
#
Python.exe, it self-registers.
if __name__ == ' __main__ ' :
    
print " Registering COM server "
    
import win32com.server.register
     win32com.server.register.UseCommandLine(PythonUtilities)


- 注册/注销Com

Command-Line Option

Description

 

The default is to register the COM objects.

--unregister

Unregisters the objects. This removes all references to the objects from the Windows registry.

--debug

Registers the COM servers in debug mode. We discuss debugging COM servers later in this chapter.

--quiet

Register (or unregister) the object quietly (i.e., don't report success).


- 使用COM
可以在python 命令行下运行

>>> import win32com.client
>>> s = win32com.client.Dispatch( " PythonDemos.Utilities " )
>>> s.SplitString( " a,b,c " , " , " )
((u
' a ' , u ' a,b,c ' ),)
>>>


3. python server com 原理
其实在注册表中查找到python com 的实现内幕

Windows Registry Editor Version 5.00

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020} ]
@
= " PythonDemos.Utilities "

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020}/Debugging ]
@
= " 0 "

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020}/Implemented Categories ]

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020}/Implemented Categories/{B3EF80D0-68E2-11D0-A689-00C04FD658FF} ]

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020}/InprocServer32 ]
@
= " pythoncom25.dll "
" ThreadingModel " = " both "

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020}/LocalServer32 ]
@
= " D://usr//Python//pythonw.exe / " D://usr//Python//lib//site-packages//win32com//server//localserver.py/ " {41E24E95-D45A-11D2-852C-204C4F4F5020} "

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020}/ProgID ]
@
= " PythonDemos.Utilities "

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020}/PythonCOM ]
@
= " PythonDemos.PythonUtilities "

[ HKEY_CLASSES_ROOT/CLSID/{41E24E95-D45A-11D2-852C-204C4F4F5020}/PythonCOMPath ]
@
= " D:// "

inproc server 是通过pythoncom25.dll 实现
local server 通过localserver.py 实现
com 对应的python 源文件信息在 PythonCOMPath & PythonCOM

4. 使用问题
用PHP 或者 c 调用com 的时候

<? php
$com = new COM( " PythonDemos.Utilities " );
$rs = $com -> SplitString( " a b c " );
foreach ( $rs as $r )
    
echo $r . " /n " ;
?>

会碰到下面的一些错误.
pythoncom error: PythonCOM Server - The 'win32com.server.policy' module could not be loaded.
<type 'exceptions.ImportError'>: No module named server.policy pythoncom error: CPyFactory::CreateInstance failed to create instance. (80004005)


可以通过2种方式解决:
a. 设置环境 PYTHONHOME = D:/usr/Python
另外在c ++ 使用python 的时候, 如果import module 出现错误 'import site' failed; use -v for traceback 的话, 也可以通过设置这个变量解决.

b. 为com 生产exe, dll 可执行文件, setup.py 代码如下 :

from distutils.core import setup
import py2exe

import sys
import shutil

# Remove the build tree ALWAYS do that!
shutil.rmtree( " build " , ignore_errors = True)

# List of modules to exclude from the executable
excludes = [ " pywin " , " pywin.debugger " , " pywin.debugger.dbgcon " , " pywin.dialogs " , " pywin.dialogs.list " ]

# List of modules to include in the executable
includes = [ " win32com.server " ]

# ModuleFinder can't handle runtime changes to __path__, but win32com uses them
try :
    
# if this doesn't work, try import modulefinder
     import py2exe.mf as modulefinder
    
import win32com
    
    
for p in win32com. __path__ [ 1 :]:
         modulefinder.AddPackagePath(
" win32com " , p)
    
    
for extra in [ " win32com.shell " , " win32com.server " ]: # ,"win32com.mapi"
         __import__ (extra)
         m
= sys.modules[extra]
        
for p in m. __path__ [ 1 :]:
             modulefinder.AddPackagePath(extra, p)

except ImportError:
    
# no build path setup, no worries.
     pass

# Set up py2exe with all the options
setup(
     options
= { " py2exe " : { " compressed " : 2 ,
                          
" optimize " : 2 ,
                          
# "bundle_files": 1,
                           " dist_dir " : " COMDist " ,
                          
" excludes " : excludes,
                          
" includes " : includes}},
    
# The lib directory contains everything except the executables and the python dll.
     # Can include a subdirectory name.
     zipfile = None,
     com_server
= [ 'PythonDemos ' ], # 文件名!!
     )



ref:
http://oreilly.com/catalog/pythonwin32/chapter/ch12.html
http://blog.donews.com/limodou/archive/2005/09/02/537571.aspx

 


python操作excel

阅读数 2600

python ecxcel处理

阅读数 51

没有更多推荐了,返回首页