  • sqoop2
    2022-03-10 15:04:21

    1. Running a script with sqoop2-shell

    vim /opt/temp/sqoop2_shell.sh

    #!/bin/bash
    # Environment variables
    source /opt/temp/bigdata_env
    # Kerberos authentication
    kinit -kt /opt/temp/user.keytab userName
    
    /opt/sqoop2/sqoop2-shell /opt/temp/sqoop2_shell_job.sh
    
    

    vim /opt/temp/sqoop2_shell_job.sh

    # Show the "create connection" parameters
    create connection -c 1 --help
    # Create the connection
    create connection -c 1 -name oracle-connection --connector-connection-connectionString jdbc:oracle:thin:@192.168.0.1:1521:schemaName  --connector-connection-jdbcDriver oracle.jdbc.driver.OracleDriver  --connector-connection-username root  --connector-connection-password root123
    # Delete the connection
    #delete connection --name oracle-connection
    
    # Show the "create job" parameters
    create job -t import -xn oracle_test --help
    # Create the job
    create job -t import -xn oracle_test --connector-table-sql select * from schemaName.tableName where \${CONDITIONS} --connector-table-needPartition false --fields-terminated-by '\\\t' --framework-output-outputDirectory /user/hive/warehouse/ods.db/tableName --framework-output-storageType HDFS --framework-throttling-extractors 3 --framework-output-fileType TEXT_FILE --queue root.default --name oracle_job_tableName
    # Delete the job
    #delete job --name oracle_job_tableName
    
    # Start the job
    start job -n oracle_test -s
    
    

    Alternatively, a single command can be executed with the -c option:

    # Show the "create connection" parameters
    /opt/sqoop2/sqoop2-shell  -c "create connection -c 1 --help"
    
    # Create the connection
    /opt/sqoop2/sqoop2-shell  -c "create connection -c 1 -name oracle-connection --connector-connection-connectionString jdbc:oracle:thin:@192.168.0.1:1521:schemaName  --connector-connection-jdbcDriver oracle.jdbc.driver.OracleDriver  --connector-connection-username root  --connector-connection-password root123"
    
    # Delete the connection
    #/opt/sqoop2/sqoop2-shell  -c "delete connection --name oracle-connection"
    
    

    2. Running a job with loader-shell

    Use the job created in step 1 as a template and run tables serially in batch; every parameter value can be passed into the shell script dynamically.
    vim loader-shell.sh

    #!/bin/bash
    # Environment variables
    source /opt/temp/bigdata_env
    # Kerberos authentication
    kinit -kt /opt/temp/user.keytab userName
    
    /opt/sqoop2/shell-client/submit_job.sh -n "oracle_job_tableName" -u y -jobType import -connectorType rdb -sql "select * from schemaName.tableName where \${CONDITIONS}" -frameworkType hdfs -extractors 3 -outputDirectory "/user/hive/warehouse/ods.db/tableName"
    
    
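The single-table submit_job.sh call above extends naturally to a batch driver that loops over a table list. A minimal sketch; the table names and HDFS prefix are illustrative assumptions, and echo is used as a dry run so the generated commands can be reviewed before replacing echo with real execution:

```shell
#!/bin/bash
# Hypothetical batch driver: generate one Loader job command per table, serially.
# TABLES and OUT_PREFIX are assumptions for illustration.
TABLES="tableA tableB tableC"
OUT_PREFIX="/user/hive/warehouse/ods.db"

count=0
for t in $TABLES; do
  # Dry run: print the command instead of executing it (drop echo to run for real).
  echo "/opt/sqoop2/shell-client/submit_job.sh -n oracle_job_${t} -u y" \
       "-jobType import -connectorType rdb" \
       "-sql \"select * from schemaName.${t} where \${CONDITIONS}\"" \
       "-frameworkType hdfs -extractors 3 -outputDirectory ${OUT_PREFIX}/${t}"
  count=$((count + 1))
done
echo "generated ${count} job commands"
```

Because the loop runs the jobs serially, a failure of one table does not interleave with the others; parallelism, if wanted, would come from the `-extractors` setting inside each job rather than from the driver.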
  • Sqoop is an open-source tool mainly used to transfer data between Hadoop and traditional databases (MySQL, Oracle, etc.): it can import data from a relational database into HDFS, and export data from HDFS back into a relational database. Having previously covered the deployment of sqoop1, this article summarizes the deployment of sqoop2 and the pitfalls encountered.

    1. Download the Sqoop package

       https://downloads.apache.org/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz

     

    2. Extract and rename the package

       tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz

       mv sqoop-1.99.7-bin-hadoop200 sqoop2

    3. Configure environment variables: vim /etc/profile

      Sqoop needs the Hadoop environment variables: $HADOOP_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME

    export HADOOP_HOME=/usr/local/hadoop-3.1.2
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common
    export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs
    export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce
    export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    
    export HIVE_HOME=/usr/local/hive-3.1.2
    export PIG_HOME=/usr/local/pig-0.17.0
    export PIG_CLASSPATH=$HADOOP_HOME/conf
    export SQOOP_HOME=/usr/local/sqoop2
    export SQOOP_SERVER_EXTRA_LIB=/usr/local/sqoop2/extlib/
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$PIG_HOME/bin:$SQOOP_HOME/bin

      Note: these variables can also be defined directly in the sqoop.sh file:

    HADOOP_HOME=/usr/local/hadoop-3.1.2/
    HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common
    HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs
    HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce
    HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn

    4. Configure Hadoop's core-site.xml and container-executor.cfg

    <!-- Append the proxy user settings to core-site.xml -->
    <property>
      <name>hadoop.proxyuser.sqoop2.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.sqoop2.groups</name>
      <value>*</value>
    </property>
    
    
    # In container-executor.cfg
    allowed.system.users=sqoop2  # for multiple users: sqoop2,hadoop,root

    5. Edit the Sqoop configuration file conf/sqoop.properties in the Sqoop directory

    1. Change @LOGDIR@ to /usr/local/sqoop2/logs, and create the logs directory under the sqoop2 directory.

    2. Change @BASEDIR@ to /usr/local/sqoop2.
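Both placeholder edits can be scripted with sed. The sketch below demonstrates the substitution on a scratch copy; applying the same sed expressions to /usr/local/sqoop2/conf/sqoop.properties (and running mkdir -p /usr/local/sqoop2/logs) performs the real change:

```shell
#!/bin/bash
# Demonstrate the @LOGDIR@ / @BASEDIR@ substitution on a scratch file.
props=$(mktemp)
printf '%s\n' \
  'org.apache.sqoop.log4j.appender.file.File=@LOGDIR@/sqoop.log' \
  'org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/db;create=true' \
  > "$props"

# Same expressions as the manual edit described above.
sed -i 's|@LOGDIR@|/usr/local/sqoop2/logs|g; s|@BASEDIR@|/usr/local/sqoop2|g' "$props"

cat "$props"
```

Using `|` as the sed delimiter avoids having to escape the slashes in the replacement paths.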

    6. Add the external JARs sqoop2 needs

    Create a directory (mkdir /usr/local/sqoop2/extlib/) and put the MySQL driver JAR and derby.jar in it.

    Here the pitfalls begin. Following the official documentation, add the environment variable export SQOOP_SERVER_EXTRA_LIB=/usr/local/sqoop2/extlib/
    
    then run source /etc/profile to make it take effect. That completes the official deployment steps, but running sqoop2-tool upgrade immediately throws an exception.

       First problem: the org.apache.sqoop.submission.engine.mapreduce.configuration.directory property (the Hadoop configuration directory) was not set.
    
           Fix: org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop-3.1.2/etc/hadoop/

      Second problem: the derby.jar driver cannot be found:

    2020-04-13 10:53:44,825 ERROR [org.apache.sqoop.tools.tool.UpgradeTool.runToolWithConfiguration(UpgradeTool.java:55)] Can't finish upgrading RepositoryManager, Driver and ConnectionManager:
    org.apache.sqoop.common.SqoopException: JDBCREPO_0007:Unable to lease link
    	at org.apache.sqoop.repository.JdbcRepositoryTransaction.begin(JdbcRepositoryTransaction.java:63)
    	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:86)
    	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:62)
    	at org.apache.sqoop.repository.JdbcRepository.createOrUpgradeRepository(JdbcRepository.java:128)
    	at org.apache.sqoop.repository.RepositoryManager.initialize(RepositoryManager.java:128)
    	at org.apache.sqoop.tools.tool.UpgradeTool.runToolWithConfiguration(UpgradeTool.java:39)
    	at org.apache.sqoop.tools.ConfiguredTool.runTool(ConfiguredTool.java:35)
    	at org.apache.sqoop.tools.ToolRunner.main(ToolRunner.java:72)
    Caused by: java.sql.SQLException: No suitable driver found for jdbc:derby:/usr/local/sqoop2/repository/db;create=true
    	at java.sql.DriverManager.getConnection(DriverManager.java:689)
    	at java.sql.DriverManager.getConnection(DriverManager.java:208)
    	at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:78)
    	at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
    	at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
    	at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
    	at org.apache.sqoop.repository.JdbcRepositoryTransaction.begin(JdbcRepositoryTransaction.java:61)

    Fix: edit the Sqoop configuration to point explicitly at the derby.jar.
    
    Set the JAR load directory in sqoop.properties: org.apache.sqoop.connector.external.loadpath=/usr/local/sqoop2/extlib/
    
    The full configuration is as follows:

    
    org.apache.sqoop.log4j.debug=false
    org.apache.sqoop.log4j.rootLogger=INFO, file
    org.apache.sqoop.log4j.category.org.apache.sqoop=INFO
    org.apache.sqoop.log4j.appender.file=org.apache.log4j.RollingFileAppender
    org.apache.sqoop.log4j.appender.file.File=/usr/local/sqoop2/logs/sqoop.log
    org.apache.sqoop.log4j.appender.file.MaxFileSize=25MB
    org.apache.sqoop.log4j.appender.file.MaxBackupIndex=5
    org.apache.sqoop.log4j.appender.file.layout=org.apache.log4j.PatternLayout
    org.apache.sqoop.log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%l] %m%n
    # Audit logger for default configuration of FileAuditLogger
    org.apache.sqoop.log4j.logger.audit=INFO, audit
    org.apache.sqoop.log4j.appender.audit=org.apache.log4j.RollingFileAppender
    org.apache.sqoop.log4j.appender.audit.File=/usr/local/sqoop2/logs/audit.log
    org.apache.sqoop.log4j.appender.audit.MaxFileSize=25MB
    org.apache.sqoop.log4j.appender.audit.MaxBackupIndex=5
    org.apache.sqoop.log4j.appender.audit.layout=org.apache.log4j.PatternLayout
    org.apache.sqoop.log4j.appender.audit.layout.ConversionPattern=%d{ISO8601} %-5p [%l] %m%n
    
    #
    # Audit Loggers Configuration
    # Multiple audit loggers could be given here. To specify an
    # audit logger, you should at least add org.apache.sqoop.
    # auditlogger.[LoggerName].class. You could also provide
    # more configuration options by using org.apache.sqoop.
    # auditlogger.[LoggerName] prefix, then all these options
    # are parsed to the logger class.
    #
    org.apache.sqoop.auditlogger.default.class=org.apache.sqoop.audit.FileAuditLogger
    org.apache.sqoop.auditlogger.default.logger=audit
    
    #
    # Repository configuration
    # The Repository subsystem provides the special prefix which
    # is "org.apache.sqoop.repository.sysprop". Any property that
    # is specified with this prefix is parsed out and set as a
    # system property. For example, if the built in Derby repository
    # is being used, the sysprop prefixed properties can be used
    # to affect Derby configuration at startup time by setting
    # the appropriate system properties.
    #
    
    # Repository provider
    org.apache.sqoop.repository.provider=org.apache.sqoop.repository.JdbcRepositoryProvider
    
    # Repository upgrade
    # If set to true, the Sqoop repository schema will not be upgraded; by default the upgrade is initiated on server start-up
    org.apache.sqoop.repository.schema.immutable=false
    
    # JDBC repository provider configuration
    #org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler
    #org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
    #org.apache.sqoop.repository.jdbc.maximum.connections=10
    #org.apache.sqoop.repository.jdbc.url=jdbc:mysql://192.168.15.50:3306/sqoop?createDatabaseIfNotExist=true
    #org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver
    #org.apache.sqoop.repository.jdbc.user=kfyw
    #org.apache.sqoop.repository.jdbc.password=123456
    
    org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
    org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
    org.apache.sqoop.repository.jdbc.maximum.connections=10
    org.apache.sqoop.repository.jdbc.url=jdbc:derby:/usr/local/sqoop2/repository/db;create=true
    org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
    org.apache.sqoop.repository.jdbc.user=sa
    org.apache.sqoop.repository.jdbc.password=
    
    # System properties for embedded Derby configuration
    org.apache.sqoop.repository.sysprop.derby.stream.error.file=/usr/local/sqoop2/logs/derbyrepo.log
    
    #
    # Sqoop Connector configuration
    # If set to true will initiate Connectors config upgrade during server startup
    #
    org.apache.sqoop.connector.autoupgrade=false
    
    #
    # Sqoop Driver configuration
    # If set to true will initiate the Driver config upgrade during server startup
    #
    org.apache.sqoop.driver.autoupgrade=false
    
    # Sleeping period for reloading configuration file (once a minute)
    org.apache.sqoop.core.configuration.provider.properties.sleep=60000
    
    #
    # Submission engine configuration
    #
    
    # Submission engine class
    org.apache.sqoop.submission.engine=org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine
    
    # Number of milliseconds, submissions created before this limit will be removed, default is one day
    #org.apache.sqoop.submission.purge.threshold=
    
    # Number of milliseconds for purge thread to sleep, by default one day
    #org.apache.sqoop.submission.purge.sleep=
    
    # Number of milliseconds for update thread to sleep, by default 5 minutes
    #org.apache.sqoop.submission.update.sleep=
    
    #
    # Configuration for Mapreduce submission engine (applicable if it's configured)
    #
    
    # Hadoop configuration directory
    org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop-3.1.2/etc/hadoop/
    
    # Log level for Sqoop Mapper/Reducer
    org.apache.sqoop.submission.engine.mapreduce.configuration.loglevel=INFO
    
    #
    # Execution engine configuration
    #
    org.apache.sqoop.execution.engine=org.apache.sqoop.execution.mapreduce.MapreduceExecutionEngine
    
    #
    # Authentication configuration
    #
    org.apache.sqoop.security.authentication.type=SIMPLE
    org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
    org.apache.sqoop.security.authentication.anonymous=true
    #org.apache.sqoop.security.authentication.type=KERBEROS
    #org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.KerberosAuthenticationHandler
    #org.apache.sqoop.security.authentication.kerberos.principal=sqoop/_HOST@NOVALOCAL
    #org.apache.sqoop.security.authentication.kerberos.keytab=/home/kerberos/sqoop.keytab
    #org.apache.sqoop.security.authentication.kerberos.http.principal=HTTP/_HOST@NOVALOCAL
    #org.apache.sqoop.security.authentication.kerberos.http.keytab=/home/kerberos/sqoop.keytab
    #org.apache.sqoop.security.authentication.enable.doAs=true
    #org.apache.sqoop.security.authentication.proxyuser.#USER#.users=*
    #org.apache.sqoop.security.authentication.proxyuser.#USER#.groups=*
    #org.apache.sqoop.security.authentication.proxyuser.#USER#.hosts=*
    
    # Default user, default value is "sqoop.anonymous.user"
    #org.apache.sqoop.security.authentication.default.user=
    
    #
    # Authorization configuration
    #
    #org.apache.sqoop.security.authorization.handler=org.apache.sqoop.security.authorization.DefaultAuthorizationHandler
    #org.apache.sqoop.security.authorization.access_controller=org.apache.sqoop.security.authorization.DefaultAuthorizationAccessController
    #org.apache.sqoop.security.authorization.validator=org.apache.sqoop.security.authorization.DefaultAuthorizationValidator
    #org.apache.sqoop.security.authorization.authentication_provider=org.apache.sqoop.security.authorization.DefaultAuthenticationProvider
    #org.apache.sqoop.security.authorization.server_name=SqoopServer1
    
    #
    # SSL/TLS configuration
    #
    #org.apache.sqoop.security.tls.enabled=false
    #org.apache.sqoop.security.tls.protocol="TLSv1.2"
    #org.apache.sqoop.security.tls.keystore=
    #org.apache.sqoop.security.tls.keystore_password=
    
    #
    # Repository Encryption
    #
    
    #org.apache.sqoop.security.repo_encryption.enabled=true
    #org.apache.sqoop.security.repo_encryption.password=
    #org.apache.sqoop.security.repo_encryption.password_generator=
    #org.apache.sqoop.security.repo_encryption.hmac_algorithm=HmacSHA256
    #org.apache.sqoop.security.repo_encryption.cipher_algorithm=AES
    #org.apache.sqoop.security.repo_encryption.cipher_key_size=16
    #org.apache.sqoop.security.repo_encryption.cipher_spec=AES/CBC/PKCS5Padding
    #org.apache.sqoop.security.repo_encryption.initialization_vector_size=16
    #org.apache.sqoop.security.repo_encryption.pbkdf2_algorithm=PBKDF2WithHmacSHA1
    #org.apache.sqoop.security.repo_encryption.pbkdf2_rounds=4000
    
    
    # External connectors load path
    # "/path/to/external/connectors/": Add all the connector JARs in the specified folder
    #
    org.apache.sqoop.connector.external.loadpath=/usr/local/sqoop2/extlib/
    
    # Sqoop application classpath
    # ":" separated list of jars to be included in sqoop.
    #
    org.apache.sqoop.classpath.extra=
    
    # Sqoop extra classpath to be included with all jobs
    # ":" separated list of jars to be included in map job classpath.
    #
    org.apache.sqoop.classpath.job=
    
    #
    # Jetty Server configuration
    #
    #org.apache.sqoop.jetty.thread.pool.worker.max=500
    #org.apache.sqoop.jetty.thread.pool.worker.min=5
    #org.apache.sqoop.jetty.thread.pool.worker.alive.time=60
    #org.apache.sqoop.jetty.port=12000
    
    # Blacklisted Connectors
    # ":" separated list of connector names as specified in their
    # sqoopconnector.properties file
    org.apache.sqoop.connector.blacklist=

    After this configuration, sqoop2-tool upgrade and sqoop2-tool verify now succeed:

    Verification was successful.
    Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly

    7. Supplement: to use MySQL instead of Derby as the repository, change the configuration as follows:

    org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler
    org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
    org.apache.sqoop.repository.jdbc.maximum.connections=10
    org.apache.sqoop.repository.jdbc.url=jdbc:mysql://192.168.15.50:3306/sqoop?createDatabaseIfNotExist=true
    org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver
    org.apache.sqoop.repository.jdbc.user=kfyw
    org.apache.sqoop.repository.jdbc.password=123456
    
    org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop-3.1.2/etc/hadoop/
    org.apache.sqoop.connector.external.loadpath=/usr/local/sqoop2/extlib/

     Of course, even with this configuration, sqoop2-tool upgrade and sqoop2-tool verify will still report an error:

    2020-04-13 11:15:22,916 INFO  [org.apache.sqoop.repository.mysql.MySqlRepositoryHandler.initialize(MySqlRepositoryHandler.java:68)] MySqlRepositoryHandler initialized.
    2020-04-13 11:15:22,918 INFO  [org.apache.sqoop.repository.JdbcRepositoryProvider.initializeRepositoryHandler(JdbcRepositoryProvider.java:160)] JdbcRepositoryProvider initialized
    2020-04-13 11:15:22,918 INFO  [org.apache.sqoop.repository.JdbcRepositoryProvider.initialize(JdbcRepositoryProvider.java:67)] JdbcRepository initialized.
    2020-04-13 11:15:22,918 INFO  [org.apache.sqoop.repository.RepositoryManager.initialize(RepositoryManager.java:127)] Creating or updating respository at bootup
    2020-04-13 11:15:23,230 INFO  [org.apache.sqoop.repository.JdbcRepository$1.doIt(JdbcRepository.java:131)] Creating repository schema objects
    2020-04-13 11:15:23,238 INFO  [org.apache.sqoop.repository.mysql.MySqlRepositoryHandler.createOrUpgradeRepository(MySqlRepositoryHandler.java:122)] Detected repository version: 0
    2020-04-13 11:15:23,259 ERROR [org.apache.sqoop.repository.common.CommonRepositoryHandler.runQuery(CommonRepositoryHandler.java:2685)] Can't execute query: CREATE DATABASE IF NOT EXISTS"SQOOP"
    com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"SQOOP"' at line 1
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    	at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
    	at com.mysql.jdbc.Util.getInstance(Util.java:383)
    	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1062)
    	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4208)
    	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4140)
    	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2597)
    	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2758)
    	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2826)
    	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2082)
    	at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1302)
    	at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    	at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    	at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    	at org.apache.sqoop.repository.common.CommonRepositoryHandler.runQuery(CommonRepositoryHandler.java:2672)
    	at org.apache.sqoop.repository.mysql.MySqlRepositoryHandler.createOrUpgradeRepository(MySqlRepositoryHandler.java:129)
    	at org.apache.sqoop.repository.JdbcRepository$1.doIt(JdbcRepository.java:132)
    	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:91)
    	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:62)
    	at org.apache.sqoop.repository.JdbcRepository.createOrUpgradeRepository(JdbcRepository.java:128)
    	at org.apache.sqoop.repository.RepositoryManager.initialize(RepositoryManager.java:128)
    	at org.apache.sqoop.tools.tool.UpgradeTool.runToolWithConfiguration(UpgradeTool.java:39)
    	at org.apache.sqoop.tools.ConfiguredTool.runTool(ConfiguredTool.java:35)
    	at org.apache.sqoop.tools.ToolRunner.main(ToolRunner.java:72)
    2020-04-13 11:15:23,263 INFO  [org.apache.sqoop.repository.JdbcRepositoryTransaction.close(JdbcRepositoryTransaction.java:111)] Attempting transaction commit
    2020-04-13 11:15:23,265 ERROR [org.apache.sqoop.tools.tool.UpgradeTool.runToolWithConfiguration(UpgradeTool.java:55)] Can't finish upgrading RepositoryManager, Driver and ConnectionManager:
    org.apache.sqoop.common.SqoopException: COMMON_0000:Unable to run specified query - CREATE DATABASE IF NOT EXISTS"SQOOP"
    	at org.apache.sqoop.repository.common.CommonRepositoryHandler.runQuery(CommonRepositoryHandler.java:2686)
    	at org.apache.sqoop.repository.mysql.MySqlRepositoryHandler.createOrUpgradeRepository(MySqlRepositoryHandler.java:129)
    	at org.apache.sqoop.repository.JdbcRepository$1.doIt(JdbcRepository.java:132)
    	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:91)
    	at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:62)
    	at org.apache.sqoop.repository.JdbcRepository.createOrUpgradeRepository(JdbcRepository.java:128)
    	at org.apache.sqoop.repository.RepositoryManager.initialize(RepositoryManager.java:128)
    	at org.apache.sqoop.tools.tool.UpgradeTool.runToolWithConfiguration(UpgradeTool.java:39)
    	at org.apache.sqoop.tools.ConfiguredTool.runTool(ConfiguredTool.java:35)
    	at org.apache.sqoop.tools.ToolRunner.main(ToolRunner.java:72)
    Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"SQOOP"' at line 1
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    	at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
    	at com.mysql.jdbc.Util.getInstance(Util.java:383)
    	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1062)
    	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4208)
    	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4140)
    	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2597)
    	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2758)
    	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2826)
    	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2082)
    	at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1302)
    	at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    	at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    	at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
    	at org.apache.sqoop.repository.common.CommonRepositoryHandler.runQuery(CommonRepositoryHandler.java:2672)
    	... 9 more

    Solution: in MySQL, set the global variable (set global sql_mode='ANSI_QUOTES';) or add sql_mode=ANSI_QUOTES to the MySQL configuration file.

     

    Part 2: Sqoop usage notes

    [hadoop@hadoop-01 sbin]$ sqoop.sh client   # log in to the client
    Setting conf dir: /usr/local/sqoop2/bin/../conf
    Sqoop home directory: /usr/local/sqoop2
    Sqoop Shell: Type 'help' or '\h' for help.
    
    sqoop:000> set server --host 127.0.0.1 --port 12000 --webapp sqoop  # connect to the server
    Server is set successfully
    sqoop:000> set option --name verbose --value true   # enable verbose error output
    Verbose option was changed to true
    sqoop:000> show connector    # list connectors
    0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    +------------------------+---------+------------------------------------------------------------+----------------------+
    |          Name          | Version |                           Class                            | Supported Directions |
    +------------------------+---------+------------------------------------------------------------+----------------------+
    | generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
    | kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
    | oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
    | ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
    | hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
    | kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
    | sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
    +------------------------+---------+------------------------------------------------------------+----------------------+
    sqoop:000> show link  # list links
    +----------------+------------------------+---------+
    |      Name      |     Connector Name     | Enabled |
    +----------------+------------------------+---------+
    | sx_hadoop      | generic-jdbc-connector | true    |
    | sx_hadoop_hdfs | hdfs-connector         | true    |
    +----------------+------------------------+---------+
    sqoop:000> show job  # list jobs
    +----+----------------+------------------------------------+---------------------------------+---------+
    | Id |      Name      |           From Connector           |          To Connector           | Enabled |
    +----+----------------+------------------------------------+---------------------------------+---------+
    | 3  | clean_user_job | sx_hadoop (generic-jdbc-connector) | sx_hadoop_hdfs (hdfs-connector) | true    |
    +----+----------------+------------------------------------+---------------------------------+---------+

    1. Create the MySQL link

       sqoop:000>create link -connector generic-jdbc-connector
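The shell then prompts interactively for the link settings. A sketch of the exchange, from memory of the 1.99.7 shell (prompt wording may differ slightly by version, and the connection values are placeholders):

```
sqoop:000> create link -connector generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql-link

Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.15.50:3306/test
Username: root
Password: ******
...
New link was successfully created with validation status OK and name mysql-link
```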

      

  • Sqoop2 introduction, Sqoop1/Sqoop2 comparison, and architecture

    Introduction
    
    Sqoop is an open-source ETL tool. With Sqoop, data can be imported and exported between relational databases (such as Oracle and MySQL) and HDFS.
    
    Sqoop uses Hadoop MapReduce to import relational data into HDFS; it also supports exporting data from HDFS to a relational database.
    
    On the relational side, Sqoop supports the common databases reachable through a JDBC driver, such as Oracle, MySQL, Microsoft SQL Server, and PostgreSQL.
    
    On the Hadoop side, the latest Sqoop only supports importing into HDFS; it does not import directly into Hive or HBase (see the Sqoop1/Sqoop2 comparison below).

    Sqoop1 vs. Sqoop2
    
    Sqoop1 refers to the Sqoop 1.4.x series, which is deployed as a standalone tool.
    
    Sqoop2 refers to the Sqoop 1.99.x series, which is deployed in a client/server (C/S) model.
    
    Comparison 1: supported relational databases
    
    Little difference; both support the common relational databases.
    
    Comparison 2: Kerberos integration
    
    Sqoop1 supports Kerberos authentication; Sqoop2 did not support it as of 1.99.3 (support was added by 1.99.6).
    
    Comparison 3: data exchange between relational databases and Hive/HBase
    
    Sqoop1 supports it; Sqoop2 does not. With Sqoop2, stage the data through HDFS first; the HDFS-to-Hive/HBase step can then be done with commands such as Hive's LOAD DATA.

    Sqoop2 architecture
    
    Architecture diagrams
    
    My own architecture diagram: (image)
    
    An architecture diagram from the web: (image)

    Architecture description
    
    The Sqoop Server exposes a REST API; the Sqoop Client is internally implemented on top of this REST API.
    
    Sqoop also provides a Java API that can connect to the Server.
    
    After startup, the Sqoop Server listens on two ports, 12000 and 12001. Port 12000 serves the REST API; 12001 is said to be for management, though its exact use is unclear.
    
    Sqoop shortcomings
    
    The Sqoop Server has no built-in time-based job scheduling. It does support running commands from batch scripts, so periodic scheduling can be implemented by combining batch scripts with a scheduler such as crontab.
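For example, a batch script such as the sqoop2_shell.sh from the first section could be scheduled nightly with a crontab entry (paths are assumptions):

```
# crontab -e: run the sqoop2 batch script every day at 02:00, logging output
0 2 * * * /bin/bash /opt/temp/sqoop2_shell.sh >> /opt/temp/sqoop2_cron.log 2>&1
```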

    Sqoop only imports from relational databases and only handles structured data; unstructured data is not supported.
    
    As an ETL tool, Sqoop provides Extract and Load, but essentially no Transformation; for example, it cannot reformat the extracted data.
    
    Sqoop's metadata is stored in a Derby database; storing metadata in MySQL is not currently supported.

  • Sqoop2 installation (sqoop-1.99.7)

    Download

    https://306t.com/file/24704272-456283295

    The download yields the file sqoop-1.99.7-bin-hadoop200.tar.gz.
    
    For the differences between sqoop1 and sqoop2, see
    重拾初心——Sqoop1和Sqoop2的刨析对比
    http://blog.csdn.net/gamer_gyt/article/details/55225700

    Installation
    
    The installation steps follow the official documentation:
    http://sqoop.apache.org/docs/1.99.7/admin/Installation.html

    Extract the tarball
    
    Upload sqoop-1.99.7-bin-hadoop200.tar.gz to the server; here it is installed into /usr/local/hadoop/sqoop-1.99.7.

    # Copy the tarball to /usr/local/hadoop/
    cp sqoop-1.99.7-bin-hadoop200.tar.gz /usr/local/hadoop/
    # Extract the tarball
    tar -xvf sqoop-1.99.7-bin-hadoop200.tar.gz
    # Rename the directory
    mv sqoop-1.99.7-bin-hadoop200 sqoop-1.99.7

    Configure sqoop2
    
    1. Make sure the environment variables sqoop2 depends on are set.
       sqoop2 needs the Hadoop dependencies (HADOOP_COMMON, HADOOP_HDFS, HADOOP_MAPRED, HADOOP_YARN), so your environment must contain either:
    export HADOOP_HOME=/..

    or

    export HADOOP_COMMON_HOME=/...
    export HADOOP_HDFS_HOME=/...
    export HADOOP_MAPRED_HOME=/...
    export HADOOP_YARN_HOME=/...

    If HADOOP_HOME is configured, Sqoop automatically looks for the corresponding directories such as $HADOOP_HOME/share/common.

    2. Configure sqoop2's environment variables.
       Add the following to /etc/profile, noting the location of SQOOP2_HOME:
    # Sqoop configuration
    export SQOOP2_HOME=/usr/local/hadoop/sqoop-1.99.7
    export CATALINA_BASE=$SQOOP2_HOME/server
    # Path for external Sqoop JARs; from now on, put mysql-*.jar, odbc*.jar, etc. into extra-lib
    export SQOOP_SERVER_EXTRA_LIB=$SQOOP2_HOME/server/lib/extra-lib
    export PATH=$PATH:$SQOOP2_HOME/bin
    3. Configure Sqoop's access to HDFS.
       Because Sqoop submits MapReduce work through a proxy user, Hadoop's core-site.xml must list the accepted proxy users and groups. Add the following to core-site.xml; the user name in the property key (root here) is the system user that runs Hadoop:

    <property>
      <name>hadoop.proxyuser.root.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.root.groups</name>
      <value>*</value>
    </property>

    Configure the location of the Hadoop configuration files Sqoop reads.
    In /usr/local/hadoop/sqoop-1.99.7/conf/sqoop.properties, set the following property to your Hadoop configuration directory:

    org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/hadoop-2.7.4/etc/hadoop

     

    Initialize the Sqoop repository
    
    Because sqoop2 stores metadata, the repository must be initialized on first install:

    sqoop2-tool upgrade

    After initialization, run a verification:

    sqoop2-tool verify

    When "Verification was successful." appears, initialization succeeded.

    Start Sqoop
    
    Run sqoop2-server start to start Sqoop; likewise, sqoop2-server stop stops it.

    sqoop2-server start

    After running the command above, the startup messages are printed.
    
    Then run jps to check which processes started; the Sqoop server runs only the SqoopJettyServer process.

    Start the Sqoop client
    
    Since this is a pseudo-distributed setup, the Sqoop server and client are on the same machine; run sqoop2-shell to start the client.
    The sqoop2 version can be checked with show version --all:

    sqoop:000> show version --all
    client version:
      Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
      Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
    0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    server version:
      Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
      Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
    API versions:
      [v1]
    sqoop:000>

     
