2019-05-11 16:06:57 a_dev Views: 83

Introduction

Python is a good tool for working with data. In ArcGIS Engine development there are plenty of situations that call for data processing, and when the data volume is relatively large, a workable approach is to write and test the data-processing code in Python first and then call it from the development language, such as C++, C#, or Java.
So how exactly can a Python script be called by another language, or in other words, how do the two interact? This article experiments with and explains one approach, using C# and ArcPy.

Writing the Python script

Calling convention

The background for this article is a data quality-inspection project. The requirement was to check GIS data against a set of quality-control rules. For some of the rules, the check needs to perform spatial operations on the data, generating intermediate layers and result layers; through a series of preprocessing steps, the inspection results are identified.

  • Every quality-control rule has a unique code, so the Python script files are named after the rule code and placed together in a designated directory of the application, which makes them easy to identify and tell apart, e.g. "…\Scripts\Rules\B006G001.py".
  • The scripts contain Chinese characters; by convention the GBK character set is used.
  • When a rule finishes normally, the script prints one line of text containing the digit "1" to indicate success; in other cases it may print whatever process information is needed, which is not used as the success criterion.

Writing the code

The data-processing code is written and debugged in PyCharm.

Parameter passing

Parameters are passed as a collection of strings; each string maps, in order, to an argument received by the Python script, as in the minimal sketch below.
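
A minimal sketch of the receiving side (the parameter names here are hypothetical, not from the project):

import sys

# sys.argv[0] is the script path itself; the caller's strings follow in the same order.
params = sys.argv[1:]
workspace, layer_name = params[0], params[1]  # hypothetical parameter names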

Script wrapping

Add the necessary comments, use a main function, catch exceptions, and so on. The concrete processing code is not given in this post; a minimal sketch of the skeleton follows.
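
A minimal sketch of that skeleton, assuming only the conventions above (the file name B006G001.py is just an example; the actual ArcPy processing is omitted):

# -*- coding: gbk -*-
# B006G001.py -- skeleton of a rule script; the real processing logic is omitted.
import sys
import traceback


def main(args):
    # args holds the caller's parameters, in the order they were passed.
    # ... ArcPy-based preprocessing and checks would go here ...
    return True


if __name__ == "__main__":
    try:
        ok = main(sys.argv[1:])
        # By convention a single line containing "1" marks success;
        # anything else printed is treated as diagnostic output only.
        print("1" if ok else "0")
    except Exception:
        traceback.print_exc()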

Writing the C# calling methods

Three methods are wrapped, as follows:

        /// <summary>
        /// Executes a rule.
        /// </summary>
        /// <param name="sRuleCode">Rule code (e.g. B002G026)</param>
        /// <param name="lstParam">Parameter collection</param>
        /// <returns>Whether execution succeeded</returns>
        public bool RunRule(string sRuleCode, List<string> lstParam) => RunScript(GetRulePathByCode(sRuleCode), lstParam);

        /// <summary>
        /// Finds the rule file from the rule code (same name, without extension).
        /// </summary>
        /// <param name="sRuleCode">Rule code (e.g. B002G026)</param>
        /// <returns>Full path (file existence is not checked)</returns>
        private string GetRulePathByCode(string sRuleCode) => Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"Scripts", @"Rules", $"{sRuleCode}.py");

        /// <summary>
        /// Executes a Python script.
        /// Convention: the last line printed by the Python script is a number greater than or equal to 0, which indicates success.
        /// </summary>
        /// <param name="sScriptPath">Script path</param>
        /// <param name="lstParam">Parameter list</param>
        /// <returns>Whether execution succeeded</returns>
        public bool RunScript(string sScriptPath, List<string> lstParam)
        {
            var bResult = false;
            try
            {
                if (!File.Exists(sScriptPath))
                    throw new Exception($"File {sScriptPath} does not exist!");

                var sInterpreterPath = PythonConfigManager.ConfigValueOfInterpreter; // path to the Python interpreter
                var sParam = $"\"{sScriptPath}\""; // quote the script path in case it contains spaces
                if (null != lstParam && 0 < lstParam.Count)
                {
                    var sArgument = "\"" + string.Join("\" \"", lstParam) + "\"";
                    sParam = $"{sParam} {sArgument}";
                }
                LogServices.WriteInfoLog(sParam);
                var start = new ProcessStartInfo
                {
                    WorkingDirectory = Environment.CurrentDirectory,
                    FileName = sInterpreterPath,
                    UseShellExecute = false,
                    ErrorDialog = true,
                    CreateNoWindow = true,
                    RedirectStandardOutput = true,
                    RedirectStandardInput = true,
                    Arguments = sParam
                };

                using (Process process = Process.Start(start))
                {
                    using (StreamReader reader = process.StandardOutput)
                    {
                        var sResult = "";
                        while (!reader.EndOfStream)
                        {
                            sResult = reader.ReadLine();
                            if (!string.IsNullOrEmpty(sResult))
                            {
                                LogServices.WriteInfoLog(sResult);
                            }
                        }
                        sResult = sResult.Trim();
                        if (!string.IsNullOrEmpty(sResult) && int.TryParse(sResult, out int nCode) && nCode >= 0)
                        {
                            bResult = true;
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                LogServices.WriteExceptionLog(ex, "Python script execution exception");
            }
            return bResult;
        }

Possible optimizations

  • Have the Python script print progress information in an agreed format (a sketch of one such convention follows this list)
  • Keep reading the Python script's output on the C# side and use it to track progress
  • Run python.exe through cmd.exe
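
As an illustration of the first two points, the agreed format could be as simple as a PROGRESS: prefix that the calling side watches for. The format below is an assumption for illustration, not part of the original project:

# -*- coding: utf-8 -*-
# Hypothetical progress convention: print "PROGRESS:<percent>" lines for the caller to parse.
import sys
import time


def report_progress(percent):
    sys.stdout.write("PROGRESS:%d\n" % percent)
    sys.stdout.flush()  # flush so the caller reading stdout sees the line immediately


if __name__ == "__main__":
    for step in range(1, 11):
        time.sleep(0.1)  # stand-in for a unit of real work
        report_progress(step * 10)
    print("1")  # final success flag, as per the earlier convention
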
2018-11-13 14:52:12 sgqhappy Views: 681

A Python script for data extraction

Copyright notice: this is the blogger's original article. Please credit the source when reposting: https://blog.csdn.net/sgqhappy/article/details/83988985

We often have to write Hive SQL for data extraction, and every extraction means writing Hive queries all over again. To make this highly repetitive task simpler, more automated, and friendlier, I wrote a Python script that covers data cleaning, data processing, counting and delivery, reading and writing files, saving logs, and so on.

1. Imports

#!/usr/bin/python
#coding:utf-8

#Made by sgqhappy
#Date: 20181113
#function: data extract
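# NOTE: written for Python 2 -- the "commands" module imported below was removed in Python 3.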

from subprocess import Popen,PIPE
import os
import sys
import io
import re
import commands
import logging
from logging import handlers
from re import match

2. Define a class to log the script run

The log is both printed to the console and written to a log file.

class Logger(object):
	def __init__(self,log_file_name,log_level,logger_name):
		self.__logger = logging.getLogger(logger_name);
		self.__logger.setLevel(log_level);
		file_handler = logging.FileHandler(log_file_name);
		console_handler = logging.StreamHandler();
		
		#set log format and show log at console and log_file.
		LOG_FORMAT = "%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s : %(message)s";
		formatter = logging.Formatter(LOG_FORMAT);
		
		file_handler.setFormatter(formatter);
		console_handler.setFormatter(formatter);
		
		self.__logger.addHandler(file_handler);
		self.__logger.addHandler(console_handler);
		
	def get_log(self):
		return self.__logger;

3. Define the file name and file paths

	#This is file name.
	file_name = "%s_%s_%s" % (sys.argv[2],sys.argv[4],sys.argv[11]);
	info_log_path = '/python_test/%s.info.log' % (file_name);
	
	#this is record name and path.
	record_name = "data_extract_record.txt";
	record_path = "/python_test/";
	
	logger = Logger(log_file_name="%s" % (info_log_path),log_level=logging.DEBUG,logger_name="myLogger").get_log();
	
	#this is log path.
	path = '/python_test/%s.desc.log' % (file_name);
	logger.info("\n");
	logger.info("log path: %s" % (path));
	logger.info("\n");

4. Extract the field information and save it

	#function:write all fields to log file.
	hive_cmd_desc = 'beeline -u ip -n username -e "desc %s.%s" >> %s' % (sys.argv[1],sys.argv[2],path);
	logger.info(hive_cmd_desc);
	logger.info("\n");
	status,output = commands.getstatusoutput(hive_cmd_desc);
	logger.info(output);
	logger.info("\n");
	
	#logger.info success or failed information.
	if status ==0:
		logger.info("desc %s to %s successful!" % (sys.argv[2],path));
	else:
		#set color: '\033[;31;40m'+...+'\033[0m'
		logger.error('\033[;31;40m'+"desc %s to %s failed!" % (sys.argv[2],path)+'\033[0m');
		#exit program.
		exit();
	logger.info("\n");

5. String processing

	#this is fields list
	fields_list = [];
	with io.open(path,'r',encoding="utf-8") as f:
		fields = list(f);
		for line in fields:
			#remove start letter "|".
			line_rm_start_letter = line.strip("|");
			logger.info(line_rm_start_letter);
			#remove start and end space.
			pos = line_rm_start_letter.find("|");
			fields_list.append(line_rm_start_letter[0:pos].strip());
	logger.info("\n");
	
	#remove desc.log.
	remove_desc_log = 'rm %s' % (path);
	logger.info(remove_desc_log);
	status,output = commands.getstatusoutput(remove_desc_log);
	
	#logger.info success or failed information.
	if status == 0:
		logger.info("remove %s successful!" % (path));
	else:
		logger.error('\033[;31;40m'+"remove %s failed!" % (path)+'\033[0m');
		exit();
	logger.info("\n");
	
	#remove the first three lines (the header rows of the desc output).
	del fields_list[0:3];
	create = "";
	start_or_etl = "";
	#drop the etl_load_date / start_dt column and everything after it.
	if 'etl_load_date' in fields_list:
		start_or_etl = "etl_load_date";
		end_letter_pos = fields_list.index("etl_load_date");
		del fields_list[end_letter_pos:];
	if 'start_dt' in fields_list:
		start_or_etl = "start_dt";
		end_letter_pos = fields_list.index("start_dt");
		del fields_list[end_letter_pos:];

6. Add extra conditions

	#add condition_field.
	condition_field = "%s" % (sys.argv[3]);
	if condition_field == "0":
		pass;
	else:
		start_or_etl = condition_field;
		
	for i in fields_list:
		#logger.info(len(i));
		logger.info(i);
	logger.info("\n");

7. Splice the fields

	#splice fields.
	fields_splice = "";
	for i in fields_list:
		fields_splice = fields_splice+"nvl(a.\`"+i+"\`,''),'|',";
	#drop the trailing comma so the concat(...) built in the next step is valid.
	fields_splice = fields_splice[:-1];
	logger.info(fields_splice);
	logger.info("\n");

8. Create the table

	#create table command.
	add_conditions = "%s" % (sys.argv[9]);
	if add_conditions == "and 1=1":
		create = "create table if not exists database.%s stored as textfile as select concat(%s) from %s.%s a join %s b on trim(a.\`%s\`)=trim(b.\`%s\`) where b.code='%s' and a.\`%s\`>='%s' and a.\`%s\`<='%s' %s;" % (file_name,fields_splice,sys.argv[1],sys.argv[2],sys.argv[6],sys.argv[7],sys.argv[8],sys.argv[4],start_or_etl,sys.argv[10],start_or_etl,sys.argv[11],sys.argv[9]);
	else:
		create = "create table if not exists database.%s stored as textfile as select concat(%s) from %s.%s a %s;" % (file_name,fields_splice,sys.argv[1],sys.argv[2],sys.argv[9]);
	logger.info(create);
	logger.info("\n");
	
	#execute the command.
	hive_cmd_create = 'beeline -u ip -n username -e "%s"' % (create);
	logger.info(hive_cmd_create);
	logger.info("\n");
	status,output = commands.getstatusoutput(hive_cmd_create);
	logger.info(output);
	logger.info("\n");
	
	#logger.info success or failed information.
	if status ==0:
		logger.info("create database.%s successful!" % (file_name));
	else:
		#set color: '\033[;31;40m'+...+'\033[0m'
		logger.error('\033[;31;40m'+"create database.%s failed!" % (file_name)+'\033[0m');
		#exit program.
		exit();
	logger.info("\n");

9. Count the rows

	#count table_new command.
	count = "select count(*) from database.%s;" % (file_name);
	logger.info(count);
	logger.info("\n");
	
	#execute the command.
	hive_cmd_count = 'beeline -u ip -n username -e "%s"' % (count);
	logger.info(hive_cmd_count);
	logger.info("\n");
	status,output = commands.getstatusoutput(hive_cmd_count);
	
	#logger.info success or failed information.
	if status ==0:
		logger.info("count database.%s successful!" % (file_name));
	else:
		#set color: '\033[;31;40m'+...+'\033[0m'
		logger.error('\033[;31;40m'+"count database.%s failed!" % (file_name)+'\033[0m');
		#exit program.
		exit();
	logger.info("\n");
	logger.info(output);
	logger.info("\n");

10. Extract the count

	#extract number.
	output_split = output.split("\n");
	#the 8th line of this beeline output holds the count (the index depends on the beeline output format).
	number = output_split[7].strip("|").strip();
	result = re.match(r"^\d+$",number);
	if result:
		#logger.info count.
		logger.info("The number matched success!");
		logger.info('\033[1;33;40m'+"The count is : %s" % (number)+'\033[0m');
		logger.info("\n");
	else:
		logger.warning("The number matched failed!");

11. Sample the data to check its correctness

	#show the first five data.
	first_five_data = "select * from database.%s limit 5;" % (file_name);
	logger.info(first_five_data);
	logger.info("\n");
	
	#execute the command.
	hive_first_five_data = 'beeline -u ip -n username -e "%s"' % (first_five_data);
	logger.info(hive_first_five_data);
	logger.info("\n");
	status,output = commands.getstatusoutput(hive_first_five_data);
	
	#logger.info success or failed information.
	if status == 0:
		logger.info("show the first five data of database.%s successful!" % (file_name));
	else:
		#set color: '\033[;31;40m'+...+'\033[0m'
		logger.error('\033[;31;40m'+"show the first five data of database.%s failed!" % (file_name)+'\033[0m');
		#exit program.
		exit();
	logger.info("\n");
	
	#logger.info the first five data.
	logger.info('\033[1;33;40m'+"the first five data are : \n\n%s" % (output)+'\033[0m');
	logger.info("\n");

12. Record the relevant information to a file

	#append to record.txt.
	output = open("%s%s" % (record_path,record_name),'a');
	if add_conditions == "and 1=1":
		output.write("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n" % ('database_name','table_name','code','extract_date','count','rel_tb_name','rel_field_name_pre','rel_field_name_after','date_pre','date_after'));
		output.write("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n" % (sys.argv[1],sys.argv[2],sys.argv[4],sys.argv[5],number,sys.argv[6],sys.argv[7],sys.argv[8],sys.argv[10],sys.argv[11]));
	else:
		output.write("%s\t%s\t%s\t%s\t%s\t%s\n" % ('database_name','table_name','code','extract_date','count','add_conditions'));
		output.write("%s\t%s\t%s\t%s\t%s\t%s\n" % (sys.argv[1],sys.argv[2],sys.argv[4],sys.argv[5],number,sys.argv[9]));
	output.close();
	
	#logger.info the data extraction success information.
	logger.info('\033[1;35;40m'+"*****Data extract success!*****"+'\033[0m');
	logger.info('\033[1;35;40m'+"*****Made by sgqhappy in %s!*****" % (sys.argv[5])+'\033[0m');
	logger.info("\n");

13. Run the Python script

python data.py database table 0 1000 20181101 rel_table rel_field rel_field "and 1=1" 20180731 20180801
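
Reading the sys.argv indices off the code above, the eleven positional arguments line up roughly as follows (the names are descriptive only, inferred from how each index is used):

# sys.argv[1]  database name
# sys.argv[2]  table name
# sys.argv[3]  condition field ("0" = use the auto-detected start_dt / etl_load_date)
# sys.argv[4]  code value filtered on b.code (also part of the output file name)
# sys.argv[5]  extraction date (recorded in data_extract_record.txt)
# sys.argv[6]  related table joined as b
# sys.argv[7]  join field on a
# sys.argv[8]  join field on b
# sys.argv[9]  extra condition ("and 1=1" selects the join + date-range template)
# sys.argv[10] start date of the extraction window
# sys.argv[11] end date of the extraction window (also part of the output file name)
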
2016-12-07 19:37:18 xx5595480 Views: 1156

Recently I have been writing a script for my company. Its main purpose is to grab the data in an existing Excel workbook and re-save it into another workbook in a particular layout. The main problem is that the layout is quite awkward; the other is that there is a lot of data -- the largest sheet has more than 400,000 rows, over a million in total.

Author: 学要
Link: https://www.zhihu.com/question/24651024/answer/134760930
Source: Zhihu
Copyright belongs to the author. Please contact the author for authorization before reposting.

#coding:utf-8
#libraries used
import csv,os,sys
import xlwt,xlrd
from xlrd import open_workbook
from xlutils.copy import copy
#preallocate space for the lists
#(Azimuth, Elevation and qiangdu -- "signal strength" -- hold one value per source row)

Azimuth=[None]*1000000
Elevation=[None]*1000000
qiangdu=[None]*1000000
chushishuju='data_103717.xlsx'  #the source workbook ("chushishuju" means "initial data")

def cn0():
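	#read column index 4 (the signal-strength values) of both sheets into the qiangdu list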
	data = xlrd.open_workbook(chushishuju)
	
	table = data.sheets()[0]
	nrows = table.nrows
	ncols = table.ncols 
	#rows=table.row_values(3)
	#print nrows
#	print table.cell(1,1).value
	c=1
	for a in range (0,nrows-1):
	#for b in range (1,ncols-1):
		try:
			row=table.cell(a,4).value
				#print a
		#	row = row.replace('\r','').replace('\n','').replace('\t','')
			qiangdu[c]=int(row)
			c+=1
				#print c
				#print Azimuth[a]
		except:
			continue

	table = data.sheets()[1]
	nrows = table.nrows
	ncols = table.ncols 
#	c+=1
	#print c
	for a in range (0,nrows-1):
	#for b in range (1,ncols-1):
		try:
			row=table.cell(a,4).value
			#print row
		#	print c
		#	row = row.replace('\r','').replace('\n','').replace('\t','')
			qiangdu[c]=int(row)
			c+=1
			#print qiangdu[c]
		except:
			continue
			
def hengzhou():
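	#read column index 3 (the azimuth values) of both sheets into the Azimuth list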
	data = xlrd.open_workbook(chushishuju)
	
	table = data.sheets()[0]
	nrows = table.nrows
	ncols = table.ncols 
	#rows=table.row_values(3)
	#print nrows
#	print table.cell(1,1).value
	c=1
	for a in range (0,nrows-1):
	#for b in range (1,ncols-1):
		try:
			row=table.cell(a,3).value
			#print row
		#	row = row.replace('\r','').replace('\n','').replace('\t','')
			Azimuth[c]=int(row)
			c+=1
			#print c
			#print Azimuth[a]
		except:
			continue

	table = data.sheets()[1]
	nrows = table.nrows
	#print nrows
	ncols = table.ncols 
#	c+=1
	#print c
	for a in range (0,nrows-1):
	#for b in range (1,ncols-1):
		try:
			row=table.cell(a,3).value
			#print row,a
		#	row = row.replace('\r','').replace('\n','').replace('\t','')
			Azimuth[c]=int(row)
		#	print Azimuth[c]
		#	print c
			c+=1
			#print c 
			#print Azimuth[c]

		except:
			continue

	#print c
			
def zongzhou():
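	#read column index 2 (the elevation values) of both sheets into the Elevation list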
	data = xlrd.open_workbook(chushishuju)
	
	table = data.sheets()[0]
	nrows = table.nrows
	ncols = table.ncols 
	#rows=table.row_values(3)
	#print nrows
#	print table.cell(1,1).value
	c=1
	for a in range (0,nrows-1):
	#for b in range (1,ncols-1):
		try:
			row=table.cell(a,2).value
				#print a
		#	row = row.replace('\r','').replace('\n','').replace('\t','')
			Elevation[c]=int(row)
			c+=1
		#	print c
		#	print Azimuth[a]
		except:
			continue

	table = data.sheets()[1]
	nrows = table.nrows
	ncols = table.ncols 
	#c+=1
	for a in range (0,nrows-1):
	#for b in range (1,ncols-1):
		try:
			row=table.cell(a,2).value
				#print a
		#	row = row.replace('\r','').replace('\n','').replace('\t','')
			Elevation[c]=int(row)
			c+=1

		except:
			continue
			
if __name__=="__main__":
	hengzhou()
	zongzhou()
	cn0()
	outfile='test1.xls'
	print Elevation[10000],Azimuth[10000],qiangdu[10000]
	#open the target workbook once and save once at the end;
	#copying and saving it inside the loop would be far too slow for a million rows
	rb = open_workbook(outfile)
	wb = copy(rb)
	sheet = wb.get_sheet(0)
	for a in range (1,1000000):
		#skip slots that were never filled by the readers above
		if Azimuth[a] is None or Elevation[a] is None or qiangdu[a] is None:
			continue
		#place the strength value in a grid cell indexed by elevation (row) and azimuth (column)
		if Azimuth[a]>179:
			sheet.write((18-(Elevation[a]/5)),((Azimuth[a])/5-35),qiangdu[a])
		else:
			sheet.write((18-(Elevation[a]/5)),((Azimuth[a])/5+38),qiangdu[a])
	wb.save(outfile)


2016-11-16 11:07:23 isinstance Views: 827

When using a Python script to import data, the main difficulties lie in recognizing the data and filtering it.

We mainly used Python's re module to analyse, organize, and filter the data, followed by some handling of the logical structure. The GitHub address is posted below; a small illustrative sketch of this kind of re-based filtering appears at the end of this post.

GitHub address

Suggestions for better algorithms are welcome.
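
The sketch below is purely illustrative and not taken from the linked repository; the line format and field names are assumptions. It only shows the general idea of using re to recognize and filter records:

# -*- coding: utf-8 -*-
# Illustrative only: keep lines of the form "<name> = <number>" and parse them.
import re

LINE_PATTERN = re.compile(r"^\s*(?P<name>\w+)\s*=\s*(?P<value>-?\d+(?:\.\d+)?)")


def parse_lines(lines):
    records = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:  # filter out lines that do not have the expected structure
            records.append((match.group("name"), float(match.group("value"))))
    return records


if __name__ == "__main__":
    sample = ["sensor_01 = 42.5", "garbage line", "sensor_02 = -3"]
    print(parse_lines(sample))  # [('sensor_01', 42.5), ('sensor_02', -3.0)]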

2018-08-26 20:51:02 lbllol365 Views: 1490

Pitfalls I ran into while writing Python data-processing scripts for a mathematical modeling contest

1. When plotting with matplotlib, Chinese text in the x- and y-axis labels is displayed as squares.

Solution:

import matplotlib as mpl
from pylab import *

mpl.rcParams['font.sans-serif'] = ['SimHei']  # use a font that contains Chinese glyphs
mpl.rcParams['axes.unicode_minus'] = False    # keep the minus sign displaying correctly
mpl.rcParams['font.size'] = 10                # adjust the font size

2. Iterating over a DataFrame row by row in pandas

import pandas as pd
a = [1,2,3,4,5]
b = {"a" : a}
c = pd.DataFrame(b)
for indexs in c.index:
    data = c.loc[indexs].values[:]  # take out one row of data
    for data2 in data:
        print(data2)  # iterate over the values in this row

3. Matplotlib

When saving a figure, savefig must be called before show; otherwise the file that gets saved is a newly created blank figure.
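
A minimal illustration of the correct order (the plotted data is arbitrary):

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [4, 5, 6])
plt.savefig("figure.png")  # save first...
plt.show()                 # ...then show; the other way round saves a blank image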

4. Creating a list filled with a specified value

a = [0 for x in range(17)]  # a list filled with 17 zeros (equivalent to a = [0]*17)

5. Before comparing data read from Excel that looks numeric inside an if statement, always check its type with the type function!

'0' != 0
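
For example (a generic illustration, independent of which Excel library is used):

value = "0"          # a cell that looks numeric but was read as a string

print(type(value))   # <class 'str'>
print(value == 0)    # False, because '0' != 0

if int(value) == 0:  # convert explicitly before comparing
    print("matched after conversion")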
