July 2020 – xlucas分享站

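As data volumes keep growing, many log files need to be stored in compressed form. Can compressed files still be processed with MapReduce (MR) in a big-data environment? Below I use Hive to process gzip files.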

Table definition (note that the storage format is plain text):

create table hive_gzip_formate_example(
username string COMMENT 'user name',
occupation string COMMENT 'occupation',
age string COMMENT 'age')
COMMENT 'hive_gzip_formate_example'
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\u0001',
'serialization.format'='\u0001')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

Upload the .gz files to the table directory:

[xlucas@localhost ~]$ hadoop fs -ls /user/hive/warehouse/xlucas.db/hive_gzip_formate_example/
-rw-r-----   3 xlucas xlucas         66 2020-07-03 15:40 /user/hive/warehouse/xlucas.db/hive_gzip_formate_example/GzipFormateTest.gz
-rw-r-----   3 xlucas xlucas         66 2020-07-03 15:45 /user/hive/warehouse/xlucas.db/hive_gzip_formate_example/aa.gz
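
The post does not show how the .gz files were produced, so the following is only a minimal sketch of one way to do it. It assumes the sample rows seen in the query output, fields separated by the \001 (Ctrl-A) character to match 'field.delim'='\u0001' in the table definition, and a scratch directory created with mktemp. The upload step is left commented out because it needs a configured Hadoop client; the target path is the one from the listing above.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory (an assumption for this sketch).
workdir=$(mktemp -d)
cd "$workdir"

# Two sample rows; \001 is the field delimiter the table expects.
printf 'zhangsan\001it\00120\nlisi\001sale\00130\n' > GzipFormateTest

# Compress in place; this produces GzipFormateTest.gz.
gzip -f GzipFormateTest

# Sanity check: decompress and show the delimiter as '|' for readability.
gzip -dc GzipFormateTest.gz | tr '\001' '|'

# Upload to the table directory (requires a working Hadoop client):
# hadoop fs -put GzipFormateTest.gz /user/hive/warehouse/xlucas.db/hive_gzip_formate_example/
```

Because the table is declared with TextInputFormat, no extra table property is needed here: the input format picks the gzip codec based on the .gz file extension.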

Query the data; it is read back without any problem:

hive (xlucas)> select * from hive_gzip_formate_example;
OK
username      occupation    age
zhangsan        it      20
lisi            sale    30
zhangsan        it      20
lisi            sale    30



Processing Compressed Files with MapReduce