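Data Exchange with Sqoop

SQOOP = "SQl to HadOOP"

1. Getting Started with Sqoop

1.1 Introduction to Sqoop

Why use Sqoop?

Import: move data from an RDBMS into HDFS

Export: move data from HDFS into an RDBMS

We have entered the big-data era, but a lot of data is still stored in RDBMSs. How do we migrate it?

This is where Sqoop comes in, like using a scoop to dig Häagen-Dazs into a cone.

Sqoop imports data from an RDBMS into HDFS and other big-data stores,

and exports data from HDFS and similar stores back into an RDBMS.

Why would we need the second direction?

Some small companies have no big-data stack but still want to use our data. What then?

We use Sqoop to move the data we keep in HDFS into their RDBMS.

1.2 Installing Sqoop

This section assumes the virtual machine is already running and everything is configured.

$ mysql -u root -p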

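2. Sqoop Tools

List all databases

$ sqoop list-databases \
--connect jdbc:mysql://127.0.0.1:3306 \
--username root \
--password 123456 \
--verbose

sqoop list-databases: lists all databases on the MySQL server
--connect jdbc:mysql://127.0.0.1:3306: every VM has localhost, so you can write either localhost or 127.0.0.1 here
In these big-data commands - and -- often work the same as option prefixes, but I later settled on -- for options and - for hyphens, which keeps the two easy to tell apart
--verbose: prints more detailed log output. It does not encrypt anything: a password given with --password stays visible in plain text on the command line and in the shell history
Type the commands yourself instead of copy-pasting them; pasted text can contain look-alike characters such as an en dash in place of -, which Sqoop does not recognise as an option prefix
MapReduce is for processing tabular data, not for managing databases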

Prompt for the password on the command line

$ sqoop list-databases \
--connect jdbc:mysql://127.0.0.1:3306 \
--username root -P \
--verbose

-P: the password is not written in the command. Sqoop prompts for it after the command starts and masks what you type.
This still is not risk-free: someone sitting next to you can watch you type the password.

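Save the password in a password file and reference it when needed

Remember: the file may contain only a single line holding the password itself, with nothing extra.

I once spent ages on a failing command before discovering the file had an extra blank line...

$ echo -n 'Root123!' > /tools/.mysql.password

# echo prints its argument
# -n suppresses the trailing newline
# > writes the output to the file at the path that follows
# A .password file is just an ordinary text file; the extension does not matter (.txt works too)
# A file could hold several passwords for your own notes, but --password-file reads the whole file as one password
# So for Sqoop keep exactly one password per file, which also keeps lookups simple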
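Here is a minimal sketch of creating and checking such a file, assuming bash on Linux. It writes the file with printf '%s' rather than echo -n, because echo -n does not behave the same in every shell. It then checks that no trailing newline slipped in. A temporary file from mktemp stands in for /tools/.mysql.password so the sketch can run anywhere.

```shell
#!/usr/bin/env bash
# Sketch: create a Sqoop password file with no trailing newline, then verify it.
# Assumption: a throwaway temp file stands in for /tools/.mysql.password.
set -euo pipefail

pwfile=$(mktemp)

# printf '%s' writes the string exactly as given, without appending a newline
printf '%s' 'Root123!' > "$pwfile"

# Only the owner should be able to read the password
chmod 400 "$pwfile"

# Byte count should equal the password length (Root123! is 8 bytes)
size=$(( $(wc -c < "$pwfile") ))
echo "password file size: $size bytes"

# $(...) strips trailing newlines, so an empty result means the last byte was a newline
if [ -z "$(tail -c 1 "$pwfile")" ]; then
  echo "warning: $pwfile ends with a newline" >&2
fi
```

To use it for real, point pwfile at the path you pass to --password-file instead of the temporary file.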

Read the password from the password file, so you never have to type it and it stays off the command line

$ sqoop list-databases \
--connect jdbc:mysql://127.0.0.1:3306  \
--username root \
--password-file file:///tools/.mysql.password

In file:///tools/.mysql.password, the first two slashes belong to the file:// prefix and the third one starts the absolute path /tools.

By the way:

$ mysql -uroot -p123456   # the password is written directly in the command, so MySQL does not prompt for it

3. The Sqoop Import Tool

3.1 Importing data from MySQL to HDFS


3.1.1 Table without a primary key

Log in to MySQL first

$ mysql -u root -p

Create the database and the table, and insert data

create database niit;
use niit;
create table author(author_name varchar(65), total_no_of_articles int, phone_no int, address varchar(65));
insert into author values("santy",10,123456789,"Gwalior");

Start Hadoop

$ start-all.sh

Start Hive in another terminal window

$ hive

Create a database in Hive (there is no need to create a table)

create database niitbd3;
use niitbd3;

Run the following Sqoop command on the Hadoop machine, in another terminal window

$ sqoop import --connect \
jdbc:mysql://127.0.0.1:3306/niit \
 --username root --password 123456 \
 --table author \
 --hive-import --hive-table niitbd3.author_hive \
 --m 1

jdbc:mysql://127.0.0.1:3306/niit: niit is the MySQL database name
--username root --password 123456: MySQL user name and password
--table author: the table to import
--hive-import --hive-table niitbd3.author_hive: the target Hive database.table. We have not created this table in Hive, so the name is up to us.
--m 1: how many mappers (MapReduce map tasks) to start; here 1

Check the tables in Hive; the import succeeded

show tables;

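3.1.2 Table with a primary key

Log in to MySQL first

$ mysql -u root -p

Create the database (skip this if niit already exists from 3.1.1) and the table, and insert data

create database niit;
use niit;
create table author_test(id int primary key, name varchar(64), age int, major varchar(64));
insert into author_test values(1, "santy", 10, "CS");
insert into author_test values(2, "Andrew", 10, "CS");

Start Hadoop

$ start-all.sh

Start Hive in another terminal window

$ hive

Create the database in Hive if it does not exist yet (there is no need to create a table)

create database niitbd3;
use niitbd3;

Run the following Sqoop command on the Hadoop machine, in another terminal window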

$ sqoop import --connect \
jdbc:mysql://127.0.0.1:3306/niit \
 --username root --password 123456 \
--table author_test \
--split-by  id \
--hive-import --create-hive-table \
--hive-table niitbd3.author_test

jdbc:mysql://127.0.0.1:3306/niit: the MySQL database name
--username root --password 123456: MySQL user name and password
--table author_test: the MySQL table to import
--split-by id: id is the primary key; Sqoop uses this column to divide the rows among the mappers
--hive-import --create-hive-table: create the Hive table as part of the import (the job fails if the table already exists)
--hive-table niitbd3.author_test: the target Hive database.table

Check the tables in Hive; the import succeeded

show tables;

Sqoop ran successfully

Verified in Hive

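3.1.3 Notes

A table without a primary key needs either --m 1 (a single mapper, so nothing has to be split) or --split-by <column>; with neither, Sqoop reports an error. The two can also be combined, as the later examples with --split-by id and --m 3 do.
--split-by id names the column Sqoop uses to divide the rows among the mappers; a primary key such as id is the usual choice.
--m sets how many mappers run in parallel (the default is 4); it does not have to match the number of rows.
When importing into Hive, the table is created automatically; we only need to choose its name.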
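To make the difference between --split-by and -m concrete, here is a simplified sketch of how the value range of a numeric split column could be cut into one range per mapper. It is an approximation, not Sqoop's actual splitter. Sqoop first queries the column's minimum and maximum and then computes its own boundaries, which may not line up exactly with these.

```shell
#!/usr/bin/env bash
# Simplified sketch: divide [min, max] of a numeric --split-by column into
# contiguous, non-overlapping ranges, one per mapper (-m).
# Assumption: illustration only; Sqoop computes its real boundaries itself.
split_ranges() {
  local col=$1 min=$2 max=$3 mappers=$4
  local total=$(( max - min + 1 ))   # number of distinct integer values
  local i lo hi
  for (( i = 0; i < mappers; i++ )); do
    lo=$(( min + i * total / mappers ))
    hi=$(( min + (i + 1) * total / mappers - 1 ))
    # With more mappers than values some ranges are empty; skip them
    if (( lo > hi )); then continue; fi
    printf 'mapper %d: %s >= %d AND %s <= %d\n' "$i" "$col" "$lo" "$col" "$hi"
  done
}

# ids 1..10 split across -m 3
split_ranges id 1 10 3
```

A column whose values are spread evenly, such as an auto-increment primary key, gives each mapper a similar share of the rows.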

3.2 Import data from Windows MySQL

Import from MySQL on Windows into Hive on Linux

(By this point the operating system hardly matters anyway.)

3.2.1 Allow the Linux machine to connect to MySQL on Windows

First install the MySQL 8.0 service on Windows

USE mysql;
CREATE USER 'root'@'Linux_IP' IDENTIFIED BY 'Windows_Mysql_密码' ;
GRANT ALL ON *.* TO 'root'@'Linux_IP';
FLUSH PRIVILEGES;

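USE mysql: switch to the built-in mysql system database
CREATE USER 'root'@'Linux_IP' ...: creates a root account that may connect from the host Linux_IP, with the given password
GRANT ALL ON *.* ...: grants that account all privileges on every database
FLUSH PRIVILEGES: reloads the privilege tables so the changes take effect (not strictly required after CREATE USER and GRANT, but harmless)

If you get Caused by: java.sql.SQLException: Access denied for user 'root'@'bd' (using password: YES) here, see my other article.

3.2.2 Running the `import`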

First create two tables with primary keys in MySQL and insert data (id is the primary key of both tables)

create database test;
use test;

create table stu_coll(
    id int primary key,
    name varchar(65),
    job varchar(65)
);
insert into stu_coll values(1,"Mobitor","BD");
insert into stu_coll values(2,"Student","Java");

create table stu_per(
    id int primary key,
    name varchar(65)
);
insert into stu_per values(1,"Andrew");
insert into stu_per values(2,"Tim");

stu_per

stu_coll

$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --query 'select a.id, a.name, b.job from stu_per a join stu_coll b on a.id = b.id where $CONDITIONS' \
 --split-by a.id \
 --hive-import \
 --hive-table niitbd3.import_join_test \
 --target-dir /test \
 --m 1

--connect jdbc:mysql://192.168.1.1:3305/test: 192.168.1.1 is how the Windows host is reached from the VM. In my setup the address of the Windows side ends in .1 (my network prefix is 192.168.1); check yours with ipconfig on Windows. 3305 is the port of my MySQL 8.0 instance.
--query: takes a SELECT statement wrapped in single quotes
where $CONDITIONS is mandatory and must be left as is: Sqoop replaces this placeholder with a different range condition for each mapper. It must be written in uppercase, otherwise you get an error.
--split-by a.id: the column Sqoop uses to divide the query results among the mappers (this is not Hive partitioning). Split by a.id rather than b.id, because b.id does not appear in the select list; if it did, b.id would work too.
--hive-table niitbd3.import_join_test: the Hive table is created automatically when the command runs; you only choose the name
--target-dir /test: the HDFS directory where the MapReduce job writes its output before it is loaded into Hive; a free-form --query import requires it
--m 1: how many mappers to start
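The quoting rule for --query is easy to check in a plain bash session. The sketch below shows three things. Single quotes pass the literal $CONDITIONS token through to Sqoop. Double quotes need the token escaped as \$CONDITIONS. An unescaped token inside double quotes is expanded away by the shell. fill_conditions is a hypothetical helper that only imitates the per-mapper substitution; the predicate text Sqoop really generates may look different.

```shell
#!/usr/bin/env bash
# Sketch: why the --query string is wrapped in single quotes.
# Single quotes keep $CONDITIONS literal so Sqoop receives the placeholder;
# unescaped inside double quotes, the shell expands it (normally to nothing).
unset CONDITIONS

single='select a.id, a.name from stu_per a where $CONDITIONS'
escaped="select a.id, a.name from stu_per a where \$CONDITIONS"
broken="select a.id, a.name from stu_per a where $CONDITIONS"

printf '%s\n' "$single" "$escaped" "$broken"

# Hypothetical helper that mimics the substitution Sqoop performs per mapper;
# the exact predicate text Sqoop generates may differ.
fill_conditions() {
  local query=$1 predicate=$2
  printf '%s\n' "${query//\$CONDITIONS/$predicate}"
}

fill_conditions "$single" '( a.id >= 1 ) AND ( a.id <= 2 )'
```

The third line of output ends right after "where", which is the malformed query Sqoop would receive if the double-quoted form were used without the backslash.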

Screenshot of the successful run

Checking in Hive

3.3 Import-all-tables command

import-all-tables requires every table to have a primary key; otherwise it fails with an error

Import all tables from MySQL into HDFS

$ sqoop import-all-tables \
 --connect jdbc:mysql://127.0.0.1:3306/test \
 --username root -P

Import every table except the listed ones

$ sqoop import-all-tables \
 --connect jdbc:mysql://127.0.0.1:3306/test \
 --username root -P \
 --exclude-tables <table1>,<table2>

--exclude-tables: the tables to skip. It is usually used to leave out tables without a primary key, which would otherwise cause an error. It is only valid with import-all-tables.

This is the error you get when a table has no primary key:

3.4 Other Import

3.4.1 Compress Import


$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table stu_per \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.BZip2Codec

The output goes to the default HDFS location /user/root (the current user's home directory), in a subdirectory named after the table.
--compress: enables compressed output
--compression-codec org.apache.hadoop.io.compress.BZip2Codec: the compression codec, BZip2 here; with --compress alone Sqoop uses gzip

Location of the compressed output

3.4.2 Bulk Import


$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table stu_per \
 --direct

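This import writes to the same output directory as the compressed import (/user/root/stu_per), so it fails because that directory already exists; rename or delete it before importing again. The import method is different, and so is the engine doing the work.
--direct: fast mode; for MySQL, Sqoop uses the native mysqldump tool instead of JDBC, so mysqldump must be available on the nodes that run the mappers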

3.4.3 Incremental Import

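Suppose we imported a table, the table was then updated, and now we want to import the new data as well.

Importing the rows we already have a second time would waste resources, so we use an incremental import.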

$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table stu_coll \
 --incremental append \
 --check-column id \
 --last-value 1

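Incremental import does not run into the output-directory conflict of the fast and compressed imports; the new rows are written as additional files next to the existing ones.
--incremental append: append mode, for tables where new rows get increasing ids
--check-column id: Sqoop compares this column with --last-value to decide which rows are new
--last-value 1: only rows with id greater than 1 are imported; at the end of the run Sqoop logs the --last-value to use next time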
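The rule behind --incremental append can be simulated on a small CSV file. A row is new when its check column is greater than --last-value, and the largest value seen becomes the next --last-value. The file and its third row are hypothetical; real Sqoop evaluates this comparison as SQL against the MySQL table.

```shell
#!/usr/bin/env bash
# Sketch of --incremental append: rows with check-column > last-value are new,
# and the largest value seen becomes the next last-value.
# Assumption: a hypothetical CSV file whose first field is the id column.
new_rows() {
  local last=$1 file=$2
  awk -F',' -v last="$last" '$1 + 0 > last + 0' "$file"
}

next_last_value() {
  local last=$1 file=$2
  awk -F',' -v last="$last" 'BEGIN { max = last + 0 }
    $1 + 0 > max { max = $1 + 0 }
    END { print max }' "$file"
}

data=$(mktemp)
cat > "$data" <<'EOF'
1,Mobitor,BD
2,Student,Java
3,Newcomer,Python
EOF

new_rows 1 "$data"          # rows with id > 1
next_last_value 1 "$data"   # value to pass as --last-value next time
```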

3.4.4 Custom Boundary Query Import


$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --query 'select a.id, stu_coll.job from stu_per a join stu_coll using(id) where $CONDITIONS' \
 --split-by id \
 --target-dir /bd31 \
 --boundary-query 'select min(id), max(id) from stu_per'

--query: the join whose results are imported
--split-by id: the column used to divide the rows among the mappers
--target-dir /bd31: the HDFS output directory
--boundary-query 'select min(id), max(id) from stu_per': instead of computing the minimum and maximum of the --split-by column itself, Sqoop takes the split boundaries (inclusive) from this query

3.5 File Import

Importing into different file formats

3.5.1 Sqoop data in text file format

Store a table from MySQL on Windows in HDFS as text files (the import runs as a MapReduce job)

$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --query 'select * from ttable WHERE $CONDITIONS' \
 --m 2 \
 --as-textfile \
 --target-dir /bd3/tbl \
 --split-by id \
 --fields-terminated-by ',' \
 --lines-terminated-by ' '

--as-textfile: store the data as plain text (this is also Sqoop's default format)
--target-dir /bd3/tbl: the HDFS output directory
--fields-terminated-by ',': the character that separates columns (a comma here)
--lines-terminated-by ' ': the character that separates rows (a space here); it cannot be empty, but a space is allowed

$ hdfs dfs -ls /bd3/tbl
$ hadoop fs -text hdfs_path > local_file.txt

The Hive table must match the layout of the text files (column order, types and delimiter)

Choose the column types according to the data

If the file is on the local Windows or Linux file system, load it with LOAD DATA

If it is already in HDFS, point the table at it with LOCATION

create external table ttable(id int, name string) row format delimited fields terminated by ',' location '/bd3/tbl';

3.5.2 Sqoop Data in Avro file format

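avrofile

An Avro data file contains sync markers that can be used to split the dataset into subsets suitable for MapReduce processing.

Some data-exchange services use a code generator to interpret the data definition and generate code for accessing the data; Avro does not need this step, which makes it well suited to scripting languages.

.avro (HDFS): the table data

.avsc (Linux): the table schema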

$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --query 'select * from stu_coll where $CONDITIONS' \
 --m 3 \
 --split-by id \
 --as-avrodatafile \
 --outdir /tools/tblA \
 --target-dir /test/tblA

--as-avrodatafile: write Avro data files
--outdir /tools/tblA: a local (Linux) directory for the generated files, created automatically; since the option is --outdir, the path is a directory
--target-dir /test/tblA: the output directory in HDFS

Check the file contents

$ cat /tools/tblA/QueryResult.avsc
$ hdfs dfs -cat /test/tblA/part-m-00000.avro

cat /tools/tblA/QueryResult.avsc: reads the .avsc (Avro schema) file on Linux, which shows the table structure
hdfs dfs -cat /test/tblA/part-m-00000.avro: reads the .avro data file in HDFS; the records are binary-encoded, so apart from the schema in the file header the output is mostly unreadable
A successful import leaves two files: the .avsc schema on Linux and the .avro data file in HDFS. The .avro file holds the serialized data, which is deserialized using that schema.

Create a Hive table that reads the .avro files

create external table tbla 
row format SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
stored as 
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' 
location '/test/tblA' 
TBLPROPERTIES ('avro.schema.url'='file:///tools/tblA/QueryResult.avsc');

No field delimiter is needed for an Avro table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe': the Avro serializer/deserializer
INPUTFORMAT / OUTPUTFORMAT: the Avro container input and output format classes
LOCATION '/test/tblA': the HDFS directory holding the data
'avro.schema.url': where the schema file is, here on the local Linux file system

3.5.3 Sqoop Data in Parquet File Format

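parquetfile

Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance.

--query cannot be used here; only --table <table name> works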

$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table stu_coll \
 --m 3 \
 --split-by id \
 --as-parquetfile \
 --outdir /tools/tblP \
 --target-dir /test/tblP


create external table tblP(id int, age int, city string) row format delimited fields terminated by ',' stored as parquetfile location '/test/tblP';

3.5.4 Sqoop Data in Sequential File format

Import as a binary file format

sequencefile

$ sqoop import \
 --connect jdbc:mysql://localhost:3306/test \
 --username root --password 123456 \
 --table ttable2 \
 --m 3 \
 --split-by id \
 --as-sequencefile \
 --outdir /tools/tblS \
 --target-dir /test/tblS


create external table tblS(id int, age int, city string) row format delimited fields terminated by ',' stored as sequencefile location '/test/tblS';

3.6 Import subset of RDBMS Table to HDFS

3.6.1 Import partial columns

$ sqoop import --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password dcef1396dnf \
 --table emp \
 --columns name,id,dept \
 --hive-import \
 --hive-table niitbd3.emp \
 --target-dir /sqoopIquery \
 --m 1

--columns name,id,dept: import only these MySQL columns into Hive; no spaces after the commas
--hive-import --hive-table niitbd3.emp: with a Hive import, the mapper files written to --target-dir are moved into Hive after the import, so they disappear from that directory
--target-dir /sqoopIquery: Sqoop first writes the data as text files (the default file type) into mapper files under this HDFS path

Sqoop command succeeded

Import succeeded

3.6.2 Import data via custom query statement using --query option

$ sqoop import --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --query 'select a.name, a.salary, b.street from emp a join emp_addr b on a.id = b.id where $CONDITIONS and a.salary >= 30000 ' \
 --hive-import \
 --hive-table niitbd3.empS \
 --target-dir /sqoopIquery \
 --m 1

--query '... where $CONDITIONS and a.salary >= 30000 ': extra filters go after $CONDITIONS, joined with and
--hive-table niitbd3.empS: with a Hive import, the mapper files in --target-dir are moved into Hive afterwards

Sqoop command succeeded

Import succeeded

3.6.3 Incremental Import

Incremental import: import only the rows newly added to the table

$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table emp \
 --incremental append \
 --check-column id \
 --last-value 1105 \
 --target-dir /sqoopIquery \
 --m 1

--incremental append: append mode, i.e. incremental import
--check-column id: the reference column
--last-value 1105: the last value already imported; every row with id greater than 1105 is treated as new
No --hive-import is used here, so the mapper files in --target-dir are not moved away after the import; the data stays in those part-m files

Sqoop command succeeded

The mapper files are not deleted

The data is stored in the mapper files

3.6.4 Incremental data import using lastmodified mode

lastmodified mode works from a last-modified time: rows changed after that time are treated as new

$ sqoop import --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table emp \
 --incremental lastmodified \
 --check-column hire_date --last-value '2019-05-02' \
 --target-dir /sqoopIquery \
 --m 1 \
 --append

--incremental lastmodified: incremental import in lastmodified mode
--check-column hire_date --last-value '2019-05-02': rows whose hire_date is after this value count as new
No --hive-import is used here, so the mapper files are kept and hold the data
--append: add the new rows to the existing --target-dir as extra files; in lastmodified mode Sqoop will not write into an existing target directory unless --append or --merge-key is given

Sqoop command succeeded

View the mapper files

3.6.5 Use --merge-key option to import

--merge-key merges the part-m-* files under the target directory by key: for each key only the newest record is kept, and the merged result is written as part-r-* files

$ sqoop import --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table emp \
 --incremental lastmodified \
 --check-column hire_date --last-value '2019-05-02' \
 --target-dir /sqoopIquery \
 --m 1 \
 --merge-key id

--merge-key id: the key column used for merging

Sqoop command succeeded

The mapper files are merged

The contents are merged

4. The Sqoop Export Tool

4.1 Exporting data from HDFS to MySQL

Start Hadoop

$ start-all.sh

Start Hive

$ hive

Switch to the database in Hive

use niitbd3;

Create a table in Hive

create table export_test(name string, age int, city string) row format delimited fields terminated by ',';

Insert data in Hive

insert into export_test values('Andrew', 20, 'HaiKou');

Start MySQL in another terminal window

$ mysql -u root -p

Switch to the database in MySQL

use niit;

Create a table in MySQL

create table export_test(name varchar(64), age int, city varchar(64));

Run the following Sqoop command on the Hadoop machine, in another terminal window

$ sqoop export \
 --connect jdbc:mysql://127.0.0.1:3306/niit \
 --table export_test \
 --username root --password 123456 \
 --export-dir /user/hive/warehouse/niitbd3.db/export_test \
 --m 1 \
 --driver com.mysql.jdbc.Driver \
 --input-fields-terminated-by ','

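jdbc:mysql://127.0.0.1:3306/niit: the MySQL database to export into
--table export_test: the MySQL table that receives the rows
--username root --password 123456: MySQL user name and password
--export-dir /user/hive/warehouse/niitbd3.db/export_test: follows the pattern /user/hive/warehouse/<Hive database>.db/<Hive table>. /user/hive/warehouse is Hive's default warehouse directory in HDFS (set by hive.metastore.warehouse.dir), and each Hive database is stored in a subdirectory with a .db suffix.
--driver com.mysql.jdbc.Driver: the JDBC driver class (with MySQL Connector/J 8.x the class is com.mysql.cj.jdbc.Driver; the old name still works but is deprecated)
--input-fields-terminated-by ',': the field delimiter of the Hive files, matching the table's fields terminated by ','

Sqoop ran successfully

Verified in MySQL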

4.2 Export data to Windows MySQL

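Move data from HDFS on Linux into MySQL on Windows

First run the following statements in MySQL on Windows

USE mysql; # mysql is a built-in system database that holds user accounts and similar data; no need to create it, just switch to it
CREATE USER 'root'@'192.168.1.130' IDENTIFIED BY 'dcef1396dnf' ;
GRANT ALL ON *.* TO 'root'@'192.168.1.130'; # grant all privileges
FLUSH PRIVILEGES; # reload the privilege tables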

What the statements mean

USE mysql;
CREATE USER 'root'@'Linux_IP_address' IDENTIFIED BY 'Windows_MySQL_password' ;
GRANT ALL ON *.* TO 'root'@'Linux_IP_address';
FLUSH PRIVILEGES;

Create a table in Hive and insert data (the export_test table from 4.1 can be reused)

use niitbd3;
create table export_test(name string, age int, city string) row format delimited fields terminated by ',';

Create a table in MySQL with the same layout as the Hive table being exported

create database test;
use test;
create table stu(name varchar(64), age int, city varchar(64));

Run the sqoop command on the Hadoop machine

$ sqoop export \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --table stu \
 --username root --password 123456 \
 --export-dir /user/hive/warehouse/niitbd3.db/export_test \
 --m 1 \
 --driver com.mysql.jdbc.Driver \
 --input-fields-terminated-by ','

jdbc:mysql://192.168.1.1:<Windows MySQL port>/<Windows MySQL database>: my MySQL 8.0 listens on port 3305 (3306 is my MySQL 5.7). In my setup the Windows host is always 192.168.x.1 as seen from the VM, so the last number is 1.
--table stu: the table in MySQL on Windows
--username root --password 123456: the MySQL user and password on Windows
--export-dir /user/hive/warehouse/niitbd3.db/export_test: /user/hive/warehouse/<Linux Hive database>.db/<Linux Hive table>
--m 1: how many map tasks to start
--driver com.mysql.jdbc.Driver: the JDBC driver class
--input-fields-terminated-by ',': the comma delimiter used in the Hive files

Sqoop ran successfully

Export to MySQL succeeded

4.3 Simple Export from HDFS txt to MySQL table

First create two .txt files, then copy them into the VM

cust1.txt

cust001,Sumit Kumar,sumit@hotmail.com,+91-1111122222,Patna,BR
cust002,Debopam Mitra,dev@hotmail.com,+91-9999988888,Siliguri,WB
cust003,Neha Ladia,neha@hotmail.com,+91-8777799999,Siliguri,WB

cust2.txt

cust004,Madhuri Achanala,madhuri@hotmail.com,+91-1234123412,Bengaluru,KA
cust005,Anjana Sharma,anjana@hotmail.com,+91-8989898989,Bengaluru,KA
cust006,Rohit Shankla,rohit@hotmail.com,+91-3232325454,Jaipur,RJ
cust007,Pradeep Patidar,pradeep@hotmail.com,+91-1200120000,Indore,MP

Create the table in MySQL; the column types must match the data in the .txt files

create table tbl_export_customers(
    customerid varchar(50),
    name varchar(50),
    email varchar(50),
    phoneno varchar(50),
    city varchar(50),
    state varchar(50),
    Primary Key(customerid)
);

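Then create the directories in HDFS

$ hdfs dfs -mkdir /sqoop
$ hdfs dfs -mkdir /sqoop/data

Copy the .txt files from Linux into HDFS

$ hdfs dfs -put /tools/sqoop_export_test/cust1.txt /sqoop/data
$ hdfs dfs -put /tools/sqoop_export_test/cust2.txt /sqoop/data

The sqoop export command

This first version lacks the two extra options used further down, and the exported rows may end up with their values in the wrong columns.

As far as I can tell, the cause is that the .txt files are plain text while the files the mappers work with are binary, which can lead to this mismatch.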
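Whatever the exact cause, one cheap precaution before exporting is to check that every line splits into as many fields as the target table has columns. The sketch below counts the lines that do not. count_bad_lines is a hypothetical helper, and the demo file holds two rows copied from cust1.txt.

```shell
#!/usr/bin/env bash
# Sketch: count input lines whose field count differs from the table's column count,
# so values cannot slide into the wrong columns during export.
# Assumption: count_bad_lines is a hypothetical helper; 6 = columns of tbl_export_customers.
count_bad_lines() {
  local expected=$1 sep=$2
  shift 2
  # Report each bad line on stderr, print the total on stdout
  awk -F"$sep" -v n="$expected" '
    NF != n { bad++; printf "%s:%d: %d fields\n", FILENAME, FNR, NF > "/dev/stderr" }
    END { print bad + 0 }' "$@"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
cust001,Sumit Kumar,sumit@hotmail.com,+91-1111122222,Patna,BR
cust002,Debopam Mitra,dev@hotmail.com,+91-9999988888,Siliguri,WB
EOF

count_bad_lines 6 ',' "$demo"   # prints 0 when every line has 6 fields
```

On the real files this would be count_bad_lines 6 ',' /tools/sqoop_export_test/cust1.txt /tools/sqoop_export_test/cust2.txt. A non-zero count points at lines with stray or missing commas.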

$ sqoop-export \
--connect jdbc:mysql://192.168.1.1:3305/test \
--username root --password 123456 \
--table tbl_export_customers \
--export-dir /sqoop/data \
--m 1

Adding --columns maps the fields to the table columns explicitly, which avoids the problem above

$ sqoop-export \
--connect jdbc:mysql://192.168.1.1:3305/test \
--username root --password 123456 \
--table tbl_export_customers \
--export-dir /sqoop/data \
--m 1 \
--columns 'customerid,name,email,phoneno,city,state'

With --validate, the last line of the output reports data successfully validated, so you do not have to check in MySQL separately

$ sqoop-export \
--connect jdbc:mysql://192.168.1.1:3305/test \
--username root --password 123456 \
--table tbl_export_customers \
--export-dir /sqoop/data \
--m 1 \
--columns 'customerid,name,email,phoneno,city,state' \
--validate

sqoop-export: the export tool
--export-dir /sqoop/data: this cannot be /sqoop/data/cust1.txt, because --export-dir takes a directory, not a file, so every file in the directory gets exported. (Everything here assumes the target table has a primary key; big-data work without one hardly makes sense.)
--m 1: one mapper
--columns 'customerid,name,email,phoneno,city,state': maps the fields to these columns, in order
--validate: validate the export

Run succeeded

Query in MySQL

4.4 Export from HDFS txt to MySQL table after updating the txt file on HDFS on Linux

A .txt file in HDFS cannot be edited in place; there is no command like the following:

$ hdfs dfs -vi /sqoop/cust1.txt

Instead, edit the .txt file on Linux

vi /tools/sqoop_export_test/cust1.txt

Create a new directory in HDFS

$ hdfs dfs -mkdir /sqoop_updated

Put the edited .txt file into the new HDFS directory

$ hdfs dfs -put /tools/sqoop_export_test/cust1.txt /sqoop_updated

Run the sqoop command

$ sqoop-export \
--connect jdbc:mysql://192.168.1.1:3305/test \
--username root \
--password 123456 \
--table tbl_export_customers \
--export-dir /sqoop_updated \
--update-key customerid \
--m 1 \
--update-mode allowinsert \
--columns 'customerid,name,email,phoneno,city,state'

--table tbl_export_customers: the MySQL table to update
--export-dir /sqoop_updated: the HDFS directory holding the file
--update-key customerid: the key used to match rows; rows whose customerid already exists in MySQL are updated with the new values
--update-mode allowinsert: rows whose customerid does not exist yet are inserted (in the default updateonly mode they would be skipped)
--columns 'customerid,name,email,phoneno,city,state': map the fields to these columns

Run succeeded

Update complete

4.5 Exporting Data in Call Mode on MySQL

Export by calling a stored procedure

First copy the two jar files into this path

The two jar files

Create two tables in MySQL

create table user(id int NOT NULL, name VARCHAR(20) NOT NULL,PRIMARY KEY(id));

CREATE TABLE goods (id int NOT NULL,name VARCHAR(50) NOT NULL,price int,PRIMARY KEY(id));

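Create a stored procedure in MySQL

The procedure works like a function: Sqoop calls it once for each input record, passing the record's fields as the parameters.

I was wondering how a single sqoop command could do something this complex; this is the answer.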

delimiter $$
CREATE PROCEDURE insert_tables (
  IN c1 INT(11),
  IN c2 VARCHAR(20),
  IN c3 INT(11),
  IN c4 VARCHAR(50),
  IN c5 INT(11)
)
BEGIN
  INSERT INTO user(id, name) VALUES(c1, c2)
    ON DUPLICATE KEY UPDATE name=VALUES(name);
  INSERT INTO goods(id, name,price) VALUES(c3, c4, c5)
    ON DUPLICATE KEY UPDATE name=VALUES(name),price=VALUES(price);
END$$
delimiter ;

You can find the procedure at this location

Create an order.txt file

10001|Tom|1023|ITEM23|23
10002|Jerry|1023|ITEM23|23
10002|Jerry|1024|ITEM24|24
10003|Spike|1023|ITEM23|23
10003|Spike|1024|ITEM24|24
10003|Spike|1025|ITEM25|25

Then copy it into HDFS

$ hdfs dfs -put /tools/sqoop_export_test/order.txt /sqoop

Run the sqoop command

$ sqoop-export --connect jdbc:mysql://192.168.1.1:3305/test --username root --password dcef1396dnf --call insert_tables --export-dir /sqoop --fields-terminated-by '|'

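Conceptually, call mode turns each record of the export files into one call of the stored procedure, with the fields bound in order to c1 through c5. The sketch below prints equivalent CALL statements for two records of order.txt to make that mapping visible. Treat it as an illustration under an assumption: Sqoop binds the values through JDBC rather than generating SQL text like this. print_calls is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of what --call does conceptually: one procedure call per record,
# with the |-separated fields bound in order to c1..c5.
# Assumption: Sqoop binds values through JDBC; printing CALL text is only a visualisation.
print_calls() {
  local file=$1 c1 c2 c3 c4 c5
  while IFS='|' read -r c1 c2 c3 c4 c5; do
    if [ -z "$c1" ]; then continue; fi   # skip blank lines
    printf "CALL insert_tables(%s, '%s', %s, '%s', %s);\n" "$c1" "$c2" "$c3" "$c4" "$c5"
  done < "$file"
}

orders=$(mktemp)
printf '%s\n' '10001|Tom|1023|ITEM23|23' '10002|Jerry|1024|ITEM24|24' > "$orders"
print_calls "$orders"
```

Because the procedure uses ON DUPLICATE KEY UPDATE, the repeated ids in order.txt (10002, 10003) update the existing user row instead of failing on the primary key.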

4.6 Other Export

4.6.1 Exporting files under HDFS directory to a table

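Export the files under an HDFS directory into a table. Here cities is a relative HDFS path, resolved against the current user's home directory (for example /user/root/cities).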

$ sqoop export \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table cities \
 --export-dir cities


4.6.2 Batch Inserts Export

Batch inserts: --batch executes the underlying INSERT statements in JDBC batch mode

$ sqoop export \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table cities \
 --export-dir cities \
 --batch


4.6.3 Updating existing Data set

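Update existing rows: with --update-key id, each record updates the existing row with the same id; records whose id is not in the table are not inserted (the default update mode is updateonly)

$ sqoop export \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table cities \
 --export-dir cities \
 --update-key id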


4.6.4 Upsert Export

Upsert: update the rows whose key already exists and insert the rest

$ sqoop export \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table cities \
 --export-dir cities \
 --update-key id \
 --update-mode allowinsert


4.6.5 Column Export

Export only the listed columns

$ sqoop export \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password 123456 \
 --table cities \
 --export-dir cities \
 --columns country,city


5. Other Sqoop Tools

5.1 Sqoop Job Tool

5.1.1 Create Job using the job tool

How to create a job

$ sqoop job \
 --create myjob \
 -- import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password-file file:///tools/.mysql_password.password \
 --table actor \
 --target-dir /sqoopjob1

sqoop job: use the job tool
--create myjob: create a saved job named myjob; everything after the standalone -- is the job's definition
The job imports the actor table into HDFS under /sqoopjob1
-- import: note the space between -- and import
--target-dir /sqoopjob1: without a target directory, the output goes under the default location /user/root

Run succeeded

5.1.2 list all saved jobs

$ sqoop job --list

All saved jobs

5.1.3 inspect the configuration of a job with the show action

$ sqoop job --show myjob

If the job does not use a password file, the password does not appear in the configuration,

and Sqoop asks for the password every time the job runs.

All configuration settings

5.1.4 run the job with --exec

exec is short for execute

$ sqoop job --exec myjob

Run succeeded

5.1.5 Check the output on HDFS (/user/root or specified target dir)

$ hdfs dfs -cat /sqoopjob1/part-m-*

The job ran successfully

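5.1.6 Create Job using the job tool for incremental update

Create a job for incremental import. No --last-value is given: a saved job records the last imported value after each run, so later runs import only the newer rows.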

$ sqoop job \
 --create actor_append_job \
 -- import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password-file file:///tools/.mysql_password.password \
 --table actor \
 --target-dir /sqoopjob1_append \
 --incremental append \
 --check-column actor_id \
 --m 1

Job created successfully

5.1.7 list all saved jobs

$ sqoop job --list

All jobs

5.1.8 inspect the configuration of a job with the show action

$ sqoop job --show actor_append_job

Viewing this job's configuration again

5.1.9 run the job with --exec

--exec is short for execute

$ sqoop job --exec actor_append_job

Job ran successfully

5.1.10 Check the output on HDFS (specified target dir)

$ hdfs dfs -cat /sqoopjob1_append/part-m-*

Check succeeded

5.1.11 Add some new records

$ sqoop-eval \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password-file file:///tools/.mysql_password.password \
 --query "insert into actor(first_name,last_name) value('JOE','SWANK'),('CHRISTION','GABLE'),('ZERO','CAGE')"

Records added

5.1.12 run the job with --exec

$ sqoop job --exec actor_append_job

Incremental import ran successfully

5.1.13 Check the output on HDFS (specified target dir)

$ hdfs dfs -cat /sqoopjob1_append/part-m-*

Import succeeded

5.1.14 Set the saved job as a scheduled task

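About crontab

A quick explanation of crontab in Linux

crontab is a standard Linux facility, not something specific to Sqoop or Hadoop

It lets us run the commands we schedule periodically at set times (month, day of week, day of month, hour and minute)

The crontab format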

{minute} {hour} {day-of-month} {month} {day-of-week} {full-path-to-shell-script}

minute: 0 - 59
hour: 0 - 23
day-of-month: 1 - 31
month: 1 - 12
day-of-week: 0 - 7 (both 0 and 7 mean Sunday), or the first three letters of the English day name
full-path-to-shell-script: the full command to run
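The ranges above can be checked mechanically. This simplified sketch validates only the five schedule fields. It handles just '*', plain numbers and three-letter day names in the day-of-week field; real cron also accepts lists, ranges, steps and month names. check_cron_schedule is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Simplified sketch: check the five schedule fields of a crontab line against
# the ranges listed above. Handles '*', plain numbers and three-letter day names
# (day-of-week only); lists, ranges, steps and month names are not supported.
check_cron_schedule() {
  local -a f
  read -r -a f <<< "$1"
  if (( ${#f[@]} != 5 )); then echo "expected 5 fields, got ${#f[@]}" >&2; return 1; fi
  local -a lo=(0 0 1 1 0) hi=(59 23 31 12 7)   # minute hour dom month dow
  local days='^(sun|mon|tue|wed|thu|fri|sat)$'
  local i v
  for i in 0 1 2 3 4; do
    v=${f[i]}
    if [[ $v == '*' ]]; then continue; fi
    if (( i == 4 )) && [[ ${v,,} =~ $days ]]; then continue; fi
    if ! [[ $v =~ ^[0-9]+$ ]]; then echo "field $((i + 1)): unsupported value '$v'" >&2; return 1; fi
    # 10# forces base 10 so values like 08 are not read as octal
    if (( 10#$v < lo[i] || 10#$v > hi[i] )); then
      echo "field $((i + 1)): $v out of range ${lo[i]}-${hi[i]}" >&2
      return 1
    fi
  done
  return 0
}

check_cron_schedule '30 03 * * tue' && echo "schedule ok"
```

The command part of the line is not checked. Keep in mind that cron starts jobs with a minimal environment (a short PATH and few variables), so a command such as sqoop-job may need its full path to be found.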

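Edit the crontab file

$ crontab -e

Show the crontab file

$ crontab -l

Remove the crontab file

$ crontab -r

Remove the crontab file, asking for confirmation first

$ crontab -ir

Schedule the saved job (this entry runs it every Tuesday at 03:30)

$ crontab -e

30 03 * * tue sqoop-job --exec actor_append_job

Editing the crontab file

Now look at the file after the change

$ crontab -l

Viewing the crontab

View the system-wide crontab file /etc/crontab

$ less /etc/crontab

Details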

5.2 Sqoop Merge Tool

The merge tool lets us merge selected (or all) mapper output files in an HDFS directory into a single reducer output file

5.2.1 check the files in HDFS path /sqoopjob2

Before we start, import the table into HDFS

$ sqoop import \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password-file file:///tools/.mysql_password.password \
 --table actor \
 --target-dir /sqoopjob2 \
 --m 3

Import succeeded

Then see which files are in this HDFS path

$ hdfs dfs -ls /sqoopjob2

Import succeeded

5.2.2 generate the corresponding jar file


$ sqoop-codegen \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password-file file:///tools/.mysql_password.password \
 --table actor \
 --bindir /tools/codegen

sqoop-codegen: generates the .java, .class and .jar files for the table
--bindir /tools/codegen: the output directory for the generated files; it is created automatically

Trial run

The two generated files

5.2.3 generated code files can be found in /tools/codegen

$ ls /tools/codegen

5.2.4 Use sqoop-merge tool to merge the newly incrementally imported file with the old one


$ sqoop-merge \
 --new-data /sqoopjob2/part-m-00001 \
 --onto /sqoopjob2/part-m-00000 \
 --target-dir /sqoopjob2-merged \
 --jar-file /tools/codegen/actor.jar \
 --class-name actor \
 --merge-key actor_id

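--new-data /sqoopjob2/part-m-00001: the newer map output file,
--onto /sqoopjob2/part-m-00000: is merged onto this older one
--target-dir /sqoopjob2-merged: the merged result is written to this HDFS directory
--jar-file /tools/codegen/actor.jar: the jar generated by codegen; it contains the record class describing the table's structure, so make sure it was generated from this table. The merge job uses it to parse the records.
--class-name actor: the class name (of the .java/.class file)
--merge-key actor_id: the column used to match records (usually the primary key); when both files contain the same key, the record from --new-data is kept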

The merge job ran successfully.

5.2.5 Verification

Check whether the merge succeeded:

$ hdfs dfs -cat /sqoopjob2-merged/part-r-*

The merge succeeded.
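A quick extra check after a merge is to make sure no key appears twice in the output. The sketch below assumes comma-separated text output with the key in the first column. Comma is Sqoop's default field delimiter for text imports. `dup_keys` is a hypothetical helper: it reads rows on stdin and prints any key that occurs more than once, so empty output means every key is unique.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read comma-separated rows on stdin and print each
# first-column key that occurs more than once. No output = keys are unique.
dup_keys() {
  cut -d',' -f1 | sort | uniq -d
}

# Usage against the merged output on HDFS (not run here). The single quotes
# hand the glob to HDFS instead of letting the local shell expand it:
#   hdfs dfs -cat '/sqoopjob2-merged/part-r-*' | dup_keys

# Local example with made-up rows: key 2 is duplicated.
printf '1,PENELOPE\n2,NICK\n2,NICK\n3,ED\n' | dup_keys
# 2
```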

5.3 Sqoop Eval Tool

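sqoop-eval lets us run SQL statements against the database directly from a Sqoop command, and it prints the results to the console.

That's right: ordinary MySQL statements such as SHOW, CREATE TABLE, INSERT and SELECT all work, so you can finally ditch Navicat (doge). Keep in mind, though, that eval is meant for quick, simple queries, not for production workflows.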

5.3.1 List all databases using the eval tool

$ sqoop-eval \
 --connect jdbc:mysql://192.168.1.1:3305 \
 --username root --password-file file:///tools/.mysql_password.password \
 --query "show databases;"

# sqoop-eval: use the eval tool
# --connect: no database name in the URL here, since we only want to list the databases
# --query: wrap the SQL statement in double quotes so the shell passes it as a single argument

All databases are listed.
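Why the quotes matter: the shell splits unquoted text on whitespace and treats `;` as the end of a command. Without quotes, --query would receive only the first word of the statement, and the remaining words would become stray arguments. The throwaway function `show_args` below is hypothetical and only for illustration. It prints how many arguments it receives and what they are.

```shell
#!/usr/bin/env bash
# Throwaway helper: print the number of arguments received, then each one in brackets.
show_args() {
  printf '%s\n' "$#"
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Quoted: the whole statement is one argument.
show_args --query "show databases;"
# 2
# [--query]
# [show databases;]

# Unquoted: the statement is split into words, and the shell consumes the
# trailing ';' as a command separator.
show_args --query show databases;
# 3
# [--query]
# [show]
# [databases]
```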

5.3.2 Create a table using the sqoop-eval tool

$ sqoop-eval \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password-file file:///tools/.mysql_password.password \
 --query "CREATE TABLE actor (actor_id smallint(5) unsigned NOT NULL AUTO_INCREMENT, first_name varchar(45) NOT NULL, last_name varchar(45) NOT NULL, last_update timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,PRIMARY KEY (actor_id)) ENGINE=InnoDB;"

# --query: a MySQL CREATE TABLE statement

Running the command

The table was created.
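The one-line CREATE TABLE statement above is hard to read and edit. One option is to store the statement in a shell variable filled from a quoted heredoc, then pass that variable to --query. MySQL accepts line breaks inside a statement. In the minimal sketch below, `CREATE_ACTOR_SQL` and the `DRY_RUN` switch are hypothetical names. By default the script only prints the statement; set DRY_RUN=0 to actually send it through sqoop-eval with the same connection settings as above.

```shell
#!/usr/bin/env bash
# Keep a long statement readable: a quoted heredoc ('EOF') stops the shell
# from expanding anything inside the SQL.
CREATE_ACTOR_SQL=$(cat <<'EOF'
CREATE TABLE actor (
  actor_id    smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  first_name  varchar(45) NOT NULL,
  last_name   varchar(45) NOT NULL,
  last_update timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
              ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (actor_id)
) ENGINE=InnoDB;
EOF
)

# DRY_RUN is a hypothetical switch for this sketch: by default it only prints
# the statement; set DRY_RUN=0 to run it through sqoop-eval.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "$CREATE_ACTOR_SQL"
else
  # The double quotes keep the multi-line statement as one --query argument.
  sqoop-eval \
    --connect jdbc:mysql://192.168.1.1:3305/test \
    --username root --password-file file:///tools/.mysql_password.password \
    --query "$CREATE_ACTOR_SQL"
fi
```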

5.3.3 Insert data into a table using the sqoop-eval tool

$ sqoop-eval \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password-file file:///tools/.mysql_password.password \
 --query "INSERT INTO actor(first_name,last_name) VALUES ('PENELOPE', 'GUINESS'), ('NICK', 'WAHLBERG'), ('ED', 'CHASE')"

# --query: insert three rows; actor_id is filled in by AUTO_INCREMENT and last_update defaults to CURRENT_TIMESTAMP

Running the command

The rows were inserted.

5.3.4 Query data using the sqoop-eval tool

$ sqoop-eval \
 --connect jdbc:mysql://192.168.1.1:3305/test \
 --username root --password-file file:///tools/.mysql_password.password \
 --query "SELECT * FROM actor;"

# --query: an ordinary SELECT statement; the result is printed to the console

The query results are displayed.
