Compression:

Adding -z or --compress to a sqoop command compresses the imported data.
The default codec is gzip, and the resulting files in HDFS get the .gz suffix by default.

sqoop import \
--connect jdbc:mysql://localhost:3306/horus \
--username root --password ****** \
--table data --fields-terminated-by "\0001" \
--lines-terminated-by '\n' \
--null-string '\\N' \
--null-non-string '\\N' \
--hive-table s_data \
--hive-import \
--hive-overwrite \
--delete-target-dir \
--hive-drop-import-delims \
--compress \
-m 2
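
To confirm that the imported table really consists of gzip files, you can copy its directory out of HDFS (for example with hdfs dfs -get /user/hive/warehouse/s_data ./s_data; that path assumes the default Hive warehouse location and the default database) and check each part file locally. The sketch below is only an illustration: check_gz_parts is a hypothetical helper name, and it assumes the part files follow the usual part-*.gz naming and hold one record per line (which --hive-drop-import-delims helps ensure).

```shell
#!/usr/bin/env bash
# Verify gzip part files that were copied out of HDFS, e.g. with
#   hdfs dfs -get /user/hive/warehouse/s_data ./s_data
# (that warehouse path is an assumption; it depends on hive.metastore.warehouse.dir).
# check_gz_parts is a hypothetical helper name, not part of Sqoop or Hadoop.

check_gz_parts() {
  local dir="$1" total=0 n f
  for f in "$dir"/part-*.gz; do
    [ -e "$f" ] || continue                 # the glob matched nothing
    if ! gzip -t "$f"; then                 # integrity check of the gzip stream
      echo "not a valid gzip file: $f" >&2
      return 1
    fi
    n=$(( $(gzip -dc "$f" | wc -l) ))       # row count, assuming one record per line
    total=$((total + n))
    echo "$(basename "$f") $n"
  done
  echo "total $total"
}

# Usage: bash check_gz_parts.sh ./s_data
if [ "$#" -gt 0 ]; then
  check_gz_parts "$1"
fi
```

The function stops at the first file that fails gzip -t, so a partially written or non-gzip part file is reported instead of being silently counted.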

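With the default gzip codec, the data can be imported directly into Hive.
The default gzip codec works with -m 2, i.e. the job split across two map tasks. A different codec can be chosen with --compression-codec, but some codecs do not support splitting the job with -m.
The .gz files produced by the default codec can also be exported (with sqoop export).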
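
A minimal sketch of choosing the codec explicitly with --compression-codec, assuming a plain HDFS import rather than --hive-import. The codec values are Hadoop codec class names such as org.apache.hadoop.io.compress.GzipCodec, BZip2Codec or SnappyCodec (Snappy needs the native Hadoop libraries). build_sqoop_import is a hypothetical helper that only prints the arguments, one per line, so the command can be inspected before it is run; the target directory is a placeholder, and -P (prompt for the password) replaces the password on the command line. Whether a given codec works with -m 2 in your own setup still has to be tested.

```shell
#!/usr/bin/env bash
# Print the arguments of a Sqoop import that uses an explicit --compression-codec,
# one argument per line, so they can be inspected or read into an array.
# build_sqoop_import is a hypothetical helper; the target directory passed in
# is a placeholder, and the connection details are those of the example above.

build_sqoop_import() {
  local codec="$1" target_dir="$2"
  printf '%s\n' sqoop import \
    --connect jdbc:mysql://localhost:3306/horus \
    --username root -P \
    --table data \
    --target-dir "$target_dir" \
    --delete-target-dir \
    --compress \
    --compression-codec "$codec" \
    -m 2
}

# Usage (needs bash 4+ for mapfile and sqoop on the PATH):
#   mapfile -t cmd < <(build_sqoop_import org.apache.hadoop.io.compress.BZip2Codec /tmp/sqoop/data_bz2)
#   "${cmd[@]}"
```

Printing the arguments one per line and reading them back with mapfile keeps each argument intact, so a codec class name or path never gets split on whitespace.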
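Compression ratio:
500,000 rows: a little over 43 MB uncompressed, a little over 3 MB compressed.
10,000 rows: about 1 MB uncompressed, about 90 KB compressed.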
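
Turning these figures into ratios is only a rough calculation: the sizes above are approximate, and 1 MB is taken here as 1024 KB. Under those assumptions gzip shrinks this data by roughly 14:1 and 11:1, i.e. it saves about 90% of the space.

```latex
r = \frac{S_{\text{uncompressed}}}{S_{\text{gzip}}}, \qquad
r_{500{,}000} \approx \frac{43\ \text{MB}}{3\ \text{MB}} \approx 14, \qquad
r_{10{,}000} \approx \frac{1024\ \text{KB}}{90\ \text{KB}} \approx 11

\text{space saved} = 1 - \frac{1}{r} \approx 93\%\ (500{,}000\ \text{rows}), \quad \approx 91\%\ (10{,}000\ \text{rows})
```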

--as-avrodatafile and --as-sequencefile

Test results so far:

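When either of these options is added directly to a sqoop command, it cannot be combined with --hive-import. Using them together, i.e. importing straight into Hive, fails immediately with the following error: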

Hive import is not compatible with importing into AVRO format.

or:

Hive import is not compatible with importing into SequenceFile format.

The official documentation says:

Delimited text is appropriate for most non-binary data types. It also readily supports further manipulation by other tools, such as Hive.

Together with other clues, my guess is that this means only delimited text can be imported directly into Hive.

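My own tests:
(1) Importing into HDFS first with --as-avrodatafile works; the files have the .avro suffix.
(2) Loading that data from Hive into a previously created table produces garbled output.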
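
A likely cause of the garbling (an assumption, not something these tests confirmed) is that LOAD DATA only moves files into the table's directory without converting them, so a text-format table ends up reading Avro binary. One possible workaround is to declare the Hive table itself as Avro-backed and point it at the imported directory. The helper below, avro_table_ddl, is a hypothetical name and only prints the HiveQL; STORED AS AVRO requires Hive 0.14 or later, and the table name, directory and column list in the usage lines are placeholders that must match the Avro schema Sqoop actually generated.

```shell
#!/usr/bin/env bash
# Emit HiveQL for an external, Avro-backed table over an existing HDFS directory.
# avro_table_ddl is a hypothetical helper; STORED AS AVRO needs Hive 0.14+.
avro_table_ddl() {
  local table="$1" location="$2" columns="$3"
  cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS ${table} (${columns})
STORED AS AVRO
LOCATION '${location}';
EOF
}

# Usage sketch (directory, table name and columns are placeholders):
#   sqoop import --connect jdbc:mysql://localhost:3306/horus --username root -P \
#     --table data --target-dir /tmp/sqoop/data_avro --as-avrodatafile -m 2
#   hive -e "$(avro_table_ddl s_data_avro /tmp/sqoop/data_avro 'id INT, name STRING')"
```

Because the table is EXTERNAL and reads the Avro files in place, nothing is moved or reinterpreted as text, which is the step suspected of causing the garbled output.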
