  • 2021-04-03 13:00:25

    Contents

    1. Merging CSV files with concat

    2. Batch-merging CSV files with the glob module


    1. Merging CSV files with concat

    To merge the data of two CSV files with the same structure, use pandas' read_csv and to_csv together with the concat function:

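    # Load third-party libraries
    import pandas as pd
    # Read the files
    df1 = pd.read_csv("文件-1.csv")
    df2 = pd.read_csv("文件-2.csv")
    # Merge (ignore_index=True renumbers the rows instead of repeating 0, 1, ... for each file)
    df = pd.concat([df1, df2], ignore_index=True)
    df = df.drop_duplicates()  # remove duplicate rows; returns a new DataFrame, so reassign it
    # Save the merged file (index=False avoids writing the row index as an extra column)
    df.to_csv('文件.csv', encoding='utf-8', index=False)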
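    The two details that most often surprise people here are that concat keeps each input's own row index and that drop_duplicates does not modify the DataFrame in place. The small sketch below makes both visible. It uses io.StringIO with made-up data in place of real files, purely as an assumption for illustration.

```python
from io import StringIO

import pandas as pd

# Two tiny made-up CSV "files" with the same columns; the row "2,b" appears in both
df1 = pd.read_csv(StringIO("id,name\n1,a\n2,b\n"))
df2 = pd.read_csv(StringIO("id,name\n2,b\n3,c\n"))

stacked = pd.concat([df1, df2])                    # keeps each file's own index: 0, 1, 0, 1
merged = pd.concat([df1, df2], ignore_index=True)  # renumbers the rows 0..3 instead

deduped = merged.drop_duplicates()  # returns a new DataFrame and keeps the first occurrence
print(len(merged), len(deduped))    # 4 3  (merged itself is unchanged)
print(list(stacked.index))          # [0, 1, 0, 1]
print(deduped["name"].tolist())     # ['a', 'b', 'c']
```

    Because the original frame is left untouched, the result has to be assigned back (df = df.drop_duplicates()). Otherwise the duplicates end up in the saved file.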

    You can also add a label column so that the rows from the two files can be told apart after merging:

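    # Load third-party libraries
    import pandas as pd
    # Read the files and label each row with its source ("来自文件" means "source file")
    df1 = pd.read_csv("文件-1.csv")
    df1["来自文件"] = "文件-1"
    df2 = pd.read_csv("文件-2.csv")
    df2["来自文件"] = "文件-2"
    # Merge
    df = pd.concat([df1, df2], ignore_index=True)
    df = df.drop_duplicates()  # remove duplicate rows; returns a new DataFrame, so reassign it
    # Save the merged file without the row index
    df.to_csv('文件.csv', encoding='utf-8', index=False)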
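    With more than two files, the same labelling can be done in a loop. Here is a minimal sketch, assuming a dict that maps each label to its input. Made-up StringIO buffers stand in for the file paths you would normally pass to read_csv. DataFrame.assign adds the label column to a copy of each frame.

```python
from io import StringIO

import pandas as pd

# Made-up stand-ins for the files; in practice each value would be a path passed to read_csv
sources = {
    "文件-1": StringIO("id,name\n1,a\n2,b\n"),
    "文件-2": StringIO("id,name\n3,c\n"),
}

# assign() returns a copy with the extra label column ("来自文件" = "source file")
frames = [pd.read_csv(src).assign(**{"来自文件": label}) for label, src in sources.items()]
df = pd.concat(frames, ignore_index=True)
print(df)
```

    One caveat applies once every row carries a label: two identical rows coming from different files no longer count as duplicates. If cross-file duplicates should be removed, call drop_duplicates before adding the label.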

    2. Batch-merging CSV files with the glob module

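    The concat approach above is fine when only a few files need merging. When many files with the same structure have to be combined, merging them in a batch saves manual work and is more efficient.

    To batch-merge CSV files in Python, the approach introduced here uses the glob module.

    glob is one of the simplest modules in the standard library: it finds all path names that match a Unix shell-style wildcard pattern such as *.csv.

    Use glob to iterate over all the files, read each one, and append its contents to the output file.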
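    A quick way to see what a pattern matches is to try it in a throwaway directory. The sketch below uses made-up file names. glob does not guarantee any particular order (it depends on the file system), so the results are sorted before use. This matters when the merge order should be reproducible.

```python
import glob
import os
import tempfile

# Build a throwaway directory with made-up files to show what the pattern matches
with tempfile.TemporaryDirectory() as d:
    for name in ["b.csv", "a.csv", "notes.txt"]:
        open(os.path.join(d, name), "w").close()
    # '*.csv' matches any name ending in .csv; sort because glob's order is not guaranteed
    matches = sorted(glob.glob(os.path.join(d, "*.csv")))
    names = [os.path.basename(p) for p in matches]

print(names)  # ['a.csv', 'b.csv']
```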

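    import glob
    
    output = '文件合集.csv'  # merged output file
    # Exclude the output file itself; otherwise a second run would merge it into itself
    csv_list = [p for p in glob.glob('*.csv') if p != output]
    print('Found %s CSV files' % len(csv_list))
    print('Processing............')
    # Open the output once in 'w' mode so re-running the script does not duplicate data
    with open(output, 'w', encoding='utf-8') as f:
        for idx, path in enumerate(csv_list):
            with open(path, 'r', encoding='utf-8') as src:
                lines = src.readlines()
            if idx > 0:
                lines = lines[1:]  # keep the header row from the first file only
            for line in lines:
                # a file without a trailing newline must not run into the next file's first row
                f.write(line if line.endswith('\n') else line + '\n')
    print('Merge complete!')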
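    The loop above assumes that every file has exactly the same header. The sketch below is a slightly stricter variant using the standard csv module. The function name merge_same_header is hypothetical. It writes the header once, streams the data rows, and raises an error instead of silently mixing files whose columns differ.

```python
import csv
import glob
import os
import tempfile


def merge_same_header(paths, output):
    """Append the data rows of every file in `paths` to `output`, writing the header once.

    Raises ValueError if a file's header differs from the first file's header.
    """
    header = None
    with open(output, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.writer(f_out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f_in:
                reader = csv.reader(f_in)
                file_header = next(reader, None)
                if file_header is None:  # empty file: nothing to copy
                    continue
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path}: header {file_header} != {header}")
                writer.writerows(reader)  # stream the remaining rows without loading the file


# Demo with made-up files in a temporary directory
with tempfile.TemporaryDirectory() as d:
    for name, text in [("1.csv", "id,name\n1,a\n"), ("2.csv", "id,name\n2,b\n")]:
        with open(os.path.join(d, name), "w", encoding="utf-8", newline="") as f:
            f.write(text)
    out = os.path.join(d, "merged.out")  # not *.csv, so the glob below cannot pick it up
    merge_same_header(sorted(glob.glob(os.path.join(d, "*.csv"))), out)
    with open(out, newline="", encoding="utf-8") as f:
        merged_text = f.read()
```

    Note that csv.writer ends rows with \r\n by default. Pass lineterminator="\n" to csv.writer if plain \n line endings are preferred.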
    

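    The methods above merge CSV files. Excel files can be merged the same way with pandas by reading them with pd.read_excel instead of pd.read_csv. The plain-text append in section 2, however, only works for text formats such as CSV, not for binary .xlsx files.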

  • Script for merging CSV files

    2018-06-14 15:10:27
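    Merges CSV files that share the same header; the files to be merged must be in the same directory.
  • Batch-merge CSV files into one Excel file with Python
  • This simple utility builds a UI for merging CSV files and renaming their columns.
  • Merging CSV files with pandas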

    2022-01-29 10:54:54
    Merging multiple CSV files with pandas

    Scenario:

    In a project, delivering several separate CSV files was inconvenient, so pandas.merge was used to combine them into a single CSV file.


    Code:

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    
    # Read the files. sep=r"\s+" treats any run of whitespace as the delimiter,
    # so these files are assumed to be whitespace-separated despite the .csv extension
    a = pd.read_table("a.csv", sep=r"\s+")
    b = pd.read_table("b.csv", sep=r"\s+")
    c = pd.read_table("c.csv", sep=r"\s+")
    d = pd.read_table("d.csv", sep=r"\s+")
    e = pd.read_table("e.csv", sep=r"\s+")
    f = pd.read_table("f.csv", sep=r"\s+")
    
    # Outer-join the files one by one on the shared 'topic' column.
    # suffixes are only applied to column names that clash with a name already in t
    t = pd.merge(a, b, on='topic', how='outer', suffixes=('_a', '_b'))
    t = pd.merge(t, c, on='topic', how='outer', suffixes=('', '_c'))
    t = pd.merge(t, d, on='topic', how='outer', suffixes=('', '_d'))
    t = pd.merge(t, e, on='topic', how='outer', suffixes=('', '_e'))
    t = pd.merge(t, f, on='topic', how='outer', suffixes=('', '_f'))
    t.to_csv('需求2.csv', index=False)  # index=False: do not write the row index
    
    

    Hope this helps.
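    With many files, the chain of pd.merge calls can be folded into a loop with functools.reduce. The sketch below makes two assumptions. First, the inputs are given as a dict from label to DataFrame. Second, the helper merge_on_key is hypothetical. Unlike the code above, it renames every non-key column with its file's label before merging, so names never clash. As a result, c's columns get "_c" too instead of keeping their bare names.

```python
from functools import reduce
from io import StringIO

import pandas as pd


def merge_on_key(frames, key):
    """Outer-join labelled DataFrames on `key`.

    Every non-key column gets its frame's label as a suffix (an assumed naming scheme),
    so no two columns can clash and pd.merge never needs the `suffixes` argument.
    """
    renamed = [
        df.rename(columns={c: f"{c}_{label}" for c in df.columns if c != key})
        for label, df in frames.items()
    ]
    return reduce(lambda left, right: pd.merge(left, right, on=key, how="outer"), renamed)


# Made-up whitespace-separated data standing in for a.csv and b.csv
frames = {
    "a": pd.read_csv(StringIO("topic value\nx 1\ny 2\n"), sep=r"\s+"),
    "b": pd.read_csv(StringIO("topic value\ny 3\nz 4\n"), sep=r"\s+"),
}
result = merge_on_key(frames, "topic")
print(result)
```

    Topics missing from a file come out as NaN in that file's columns, which is what how="outer" is for.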

  • Reading CSV files with pandas and merging them: # -*- coding:utf-8 -*- import csv as csv import numpy as np # ------------- # read tabular data with csv # ------------- ''' csv_file_object = csv.reader(c
  • I have hundreds of large CSV files that I would like to merge into one. However, not all CSV files contain all columns. Therefore, I need to merge files based on column name, not column position.

    Just to be clear: in the merged CSV, values should be empty for a cell coming from a line which did not have the column of that cell.

    I cannot use the pandas module, because it makes me run out of memory.

    Is there a module that can do that, or some easy code?

    Solution

    The csv.DictReader and csv.DictWriter classes should work well (see Python docs). Something like this:

    import csv

    inputs = ["in1.csv", "in2.csv"]  # etc

    # First determine the field names from the top line of each input file
    # Comment 1 below
    fieldnames = []
    for filename in inputs:
        with open(filename, "r", newline="") as f_in:
            reader = csv.reader(f_in)
            headers = next(reader)
            for h in headers:
                if h not in fieldnames:
                    fieldnames.append(h)

    # Then copy the data
    with open("out.csv", "w", newline="") as f_out:  # Comment 2 below
        writer = csv.DictWriter(f_out, fieldnames=fieldnames)
        writer.writeheader()  # write the union of all headers as the first row
        for filename in inputs:
            with open(filename, "r", newline="") as f_in:
                reader = csv.DictReader(f_in)  # Uses the field names in this file
                for line in reader:
                    # Comment 3 below
                    writer.writerow(line)  # missing columns are written as "" (restval)

    Comments from above:

    1. You need to pass all the possible field names to DictWriter in advance, so you need to loop through all your CSV files twice: once to find all the headers, and once to read the data. There is no better solution, because all the headers must be known before DictWriter can write the first line. This part would be more efficient with sets instead of lists (the in operator on a list is comparatively slow), but it won't make much difference for a few hundred headers. Sets would also lose the deterministic ordering of a list: your columns could come out in a different order each time you ran the code.

    2. The above code is for Python 3. Without newline="", the csv module misbehaves: newlines embedded in quoted fields are not interpreted correctly, and on platforms that use \r\n line endings an extra \r is added when writing. Remove this argument for Python 2, whose open() does not accept it.

    3. At this point, line is a dict with the field names as keys and the column data as values. You can specify what to do with blank or unknown values in the DictReader and DictWriter constructors (for example the restkey and restval parameters of DictReader, and restval and extrasaction of DictWriter).
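    To illustrate the DictWriter options mentioned in comment 3, here is a small sketch. It writes to an in-memory buffer with made-up rows. restval is the value written for a field missing from a row. extrasaction="ignore" drops keys that are not in fieldnames; the default, "raise", raises ValueError instead.

```python
import csv
import io

buf = io.StringIO()
# restval fills columns missing from a row; extrasaction="ignore" drops keys not in fieldnames
writer = csv.DictWriter(buf, fieldnames=["NAME", "MIDDLENAME", "AGE"],
                        restval="N/A", extrasaction="ignore", lineterminator="\n")
writer.writeheader()
writer.writerow({"NAME": "Fred", "AGE": "Unknown"})          # no MIDDLENAME -> "N/A"
writer.writerow({"NAME": "Jason", "MIDDLENAME": "Noname",
                 "AGE": "16", "SURNAME": "Scarry"})          # SURNAME is silently dropped
print(buf.getvalue())
```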

    This method should not run out of memory, because it never has the whole file loaded at once.
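    On the list-versus-set trade-off in comment 1, there is a middle ground: dict keys are unique and, since Python 3.7, keep insertion order. So dict.fromkeys acts as an ordered set, giving fast membership checks without losing the column order. The helper union_headers below is a hypothetical name, and the two headers are made up.

```python
import csv
import io


def union_headers(header_rows):
    """Return every column name once, in first-seen order (dict.fromkeys as an ordered set)."""
    return list(dict.fromkeys(h for headers in header_rows for h in headers))


# Made-up stand-ins for two input files whose headers overlap
files = [io.StringIO("NAME,SURNAME,AGE\n"), io.StringIO("NAME,MIDDLENAME,SURNAME,AGE\n")]
fieldnames = union_headers(next(csv.reader(f)) for f in files)
print(fieldnames)  # ['NAME', 'SURNAME', 'AGE', 'MIDDLENAME']
```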

  • My first CSV file looks like this, with the header included (the header appears only at the top, not after every entry):

    NAME,SURNAME,AGE
    Fred,Krueger,Unknown
    .... n records

    My second file might look like this:

    NAME,MIDDLENAME,SURNAME,AGE
    Jason,Noname,Scarry,16
    .... n records with this header template

    The merged file should look like this:

    NAME,SURNAME,AGE,MIDDLENAME
    Fred,Krueger,Unknown,
    Jason,Scarry,16,Noname
    ....

    Basically, if the headers don't match, all new header titles (columns) should be appended after the original header, with their values placed in that order.

    UPDATE:

    The CSVs above were shortened so I could illustrate what I want to achieve. In reality, the CSV files are generated in the step just before this merge and can have up to 100 columns.

    Does anyone have an idea how I can do this? I'd appreciate any help.

    Solution

    I'd create a model for the 'bigger' format (a simple class with four fields, plus a collection for instances of this class) and implement two parsers, one for the first model and one for the second. Create records for all rows of both CSV files and implement a writer that outputs the CSV in the correct format. In brief:

    public void convert(File output, File... input) {
        List<Record> records = new ArrayList<>();
        for (File file : input) {
            // isThreeColumnFormat, the two parsers and CsvWriter are placeholder helpers
            if (isThreeColumnFormat(file)) {
                records.addAll(ThreeColumnFormatParser.parse(file));
            } else {
                records.addAll(FourColumnFormatParser.parse(file));
            }
        }
        CsvWriter.write(output, records);
    }

    From your comment I see that you have a lot of different CSV formats with some common columns.

    You could define the model for any row in the various CSV files like this:

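    import java.util.HashMap;
    import java.util.Map;

    public class Record {
        Object id;                                     // some sort of unique identifier
        Map<String, String> values = new HashMap<>();  // all key/values of a single row

        public Record(Object id) { this.id = id; }

        public void put(String key, String value) {
            values.put(key, value);
        }

        // Returns null if this row's file did not have the column
        public String get(String key) {
            return values.get(key);
        }
    }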

    To parse any file, you would first read the header and add the column headers to a global keystore (it will be needed later for the output), then create records for all rows, like:

    //...
    List<Record> records = new ArrayList<>();
    for (File file : getAllFiles()) {
        List<String> keys = getColumnsHeaders(file);
        KeyStore.addAll(keys); // the store is a Set (a LinkedHashSet keeps the column order)
        int lineNo = 0;
        for (String line : file.getLines()) { // pseudo-helper: data rows after the header
            String[] values = line.split(DELIMITER);
            Record record = new Record(file.getName() + lineNo++); // as an example for id
            for (int i = 0; i < values.length; i++) {
                record.put(keys.get(i), values[i]);
            }
            records.add(record);
        }
    }
    // ...

    Now the keystore has all used column header names and we can iterate over the collection of all records, get all values for all keys (and get null if the file for this record didn't use the key), assemble the csv lines and write everything to a new file.
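    The keystore approach can be sketched in Python with the standard csv module, using the two sample files from the question as in-memory stand-ins. A dict plays the role of the ordered key set. This is an illustration under those assumptions, not the answerer's code. Like the Java version, it keeps all records in memory, so for very large files a streaming DictWriter approach is the safer choice.

```python
import csv
import io

# The two sample files from the question, as in-memory stand-ins
file_texts = [
    "NAME,SURNAME,AGE\nFred,Krueger,Unknown\n",
    "NAME,MIDDLENAME,SURNAME,AGE\nJason,Noname,Scarry,16\n",
]

keystore = {}  # insertion-ordered "set" of every column name seen so far
records = []   # one dict per data row (column name -> value)
for text in file_texts:
    reader = csv.DictReader(io.StringIO(text))
    keystore.update(dict.fromkeys(reader.fieldnames))  # new columns go to the end
    records.extend(reader)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(list(keystore))  # header: all keys in first-seen order
for record in records:
    # record.get returns None for a column this row's file lacked; csv writes None as ""
    writer.writerow([record.get(key) for key in keystore])
print(out.getvalue())
```

    The output matches the merged layout asked for in the question: NAME,SURNAME,AGE,MIDDLENAME, with Fred's middle name left empty.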

  • Batch-merging CSV files with Python

    2021-07-07 16:10:51
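    1. When there are fewer than 10 CSV files and each one is small: ... Traverse and merge the files in a folder :param path: folder path :param col_name: column name :param file_type: file type :return: """ data = pd.DataFrame()
  • Merging CSV files with VBA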

    2021-04-09 09:08:47
    Sub 合并当前目录下所有工作簿的全部工作表() Dim MyPath, MyName, AWbName Dim Wb As Workbook, WbN As String Dim G As Long Dim Num As Long Dim BOX As String Application.ScreenUpdating = False MyPath = ...
  • Reading and merging CSV files

    2016-01-13 15:18:57
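    Code for merging CSV files; easy to understand, and both the merged content and the documents can be customized.
  • Merge multiple CSV files without repeating the CSV header: import pandas as pd import glob def merge_csv(all_csv): result_csv = pd.concat(all_csv) result_csv.to_csv(result_csv_path,index=0,sep=',') if __name__ == '__main...
  • Batch-merge CSV files while keeping a single header row. The tool is packaged as ready-to-use software with a fairly good interface, and lets you choose the folder to merge and the save path.
  • CSV merging exercise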

    2021-02-24 07:13:30
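    An R exercise demonstrating data-cleaning and merging techniques: data from many datasets has to be adjusted until it is uniform, so that it can be merged for analysis.
  • Organizing files comes up in many tasks; this post shares how to combine multiple local CSV files into a single
  • csv文件合并.bat (a batch file for merging CSV files)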

    2020-09-02 10:54:40
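    Merges multiple CSV files into one file. CSV stands for Comma-Separated Values, a general-purpose, simple, and widely used tabular data format. It is stored as plain text, with values separated by a delimiter. Its layout resembles a database table: in each row, the fields are separated by a delimiter,...
  • I recently tried processing CSV data with Python. Because there were several CSV files that had to be joined before processing, I ran into the problem of merging files in Python. I tried two methods: calling a CMD command, and writing a Python program. After trying both, I felt that the...
  • Merging CSV files with different columns in Python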

    2020-12-20 08:35:56
    The csv.DictReader and csv.DictWriter classes should work well (see the Python docs). Something like this: import csv inputs = ["in1.csv", "in2.csv"] # etc# First determine the field names from the top line of each input file# ...
  • Merging CSV files

    2019-12-05 20:28:14
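    Contents: Case 1: merging CSV files with identical column names; Case 2: merging CSV files with different column names. Case 1: merging CSV files with identical column names # @Purpose: # @Parameter: # @Time: 18:08 # @Author: Emma import pandas as pd import numpy...
  • 合并csv文件.py (a Python script for merging CSV files)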

    2022-01-22 11:45:37
  • Merging CSV files on Linux

    2019-05-10 10:22:01
    cat *.csv > full.csv
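  • Merging CSV files with Python

    2019-07-09 20:51:39
    Merging two CSV files in Python: pandas provides the concat function for merging two or more CSV files. 1. Row-wise merge f1 = pd.read_csv('../data/train_acute.csv') f2 = pd.read_csv('../data/train_health.csv') file = [f1,f2] train = pd....
  • Merging CSV, Excel, and other files in Python. First, import the required library: import pandas as pd. Then enter the following code: # create an output file writer = pd.ExcelWriter('E:/Test/test.xlsx') data = pd.read_table('E:/Test/test1.csv',sep=',',...
  • When many Excel files need to be merged, this software can help: data with the same header is combined into one sheet, and each different header gets a new sheet.
  • Suppose a folder contains n CSV files that need to be merged and saved into a new CSV file. To read the CSV files in batch, only a small change to the code below is needed.
  • Merging CSV files with Python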

    2021-02-03 18:45:16
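    Learn a little, summarize a little; it all adds up ^_^ Requirement: there are two CSV files that need to be merged column-wise. For example: a.csv:column1 column2 column3a1 a21 a22 a31 a32b1 b21 b2 b3 b32b.csv:column1 column2 column3a1 ...
  • Merges n CSV files into n columns by file name; simple and easy to use, you only need to change the file path.
  • os.listdir() # read the first CSV file, including its header df = pd.read_csv(Folder_Path + '\\' + file_list[0], encoding='gb2312') # the encoding is gb2312; change it if the text comes out garbled # write the first CSV file to the merged output file df.to_...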
