Will Sqoop export create duplicates when the number of mappers is higher than the number of blocks in the source HDFS location?
The number of mappers has no relationship to the number of blocks, so a higher mapper count does not by itself create duplicates.
Only if the HDFS file itself contains duplicate records will they show up in the target, because Sqoop cannot filter out duplicates.
Suppose you saw duplicate rows in the target table, adding a PK constraint failed with a PK violation, and the source has no duplicate rows. One possible scenario is that your target table already had records, maybe from a previous incomplete Sqoop job. Please check whether the target table has keys that are also present in the source.
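The check described above can be sketched in a few lines of Python. This is only an illustration: the function names and sample keys are hypothetical, and it assumes you have already pulled the key column out of the HDFS file and out of the target table by whatever means you normally use.

```python
from collections import Counter


def duplicate_keys(keys):
    """Return keys that occur more than once in a single key list."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


def overlapping_keys(source_keys, target_keys):
    """Return keys present in both the source file and the target table."""
    return sorted(set(source_keys) & set(target_keys))


# Hypothetical sample data: keys read from the HDFS export directory and
# keys already sitting in the target table before the export runs.
source_keys = [101, 102, 103, 104]
target_keys = [103, 104, 205]

print(duplicate_keys(source_keys))                 # [] -> source is clean
print(overlapping_keys(source_keys, target_keys))  # [103, 104] -> already loaded
```

An empty first list together with a non-empty second list points to leftover rows in the target rather than to the mapper count.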
One workaround for this scenario is to run the export in upsert mode. In your export command, add these parameters: --update-key (followed by the key column name) and --update-mode allowinsert. This ensures that if the key is already present in the table the record gets updated, and if the key is not present Sqoop does an insert.
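As a rough sketch of what such a command could look like, the Python snippet below only assembles the argument list and prints it; it does not run Sqoop. The connection string, table, export directory and key column are made-up placeholders, credentials are left out, and whether allowinsert is supported depends on the target database and connector, so treat this as an assumption to verify for your setup.

```python
import shlex


def build_export_command(connect, table, export_dir, update_key, num_mappers=4):
    """Assemble a sqoop export command that updates existing keys and inserts new ones."""
    return [
        "sqoop", "export",
        "--connect", connect,
        "--table", table,
        "--export-dir", export_dir,
        "--update-key", update_key,      # column used to match existing rows
        "--update-mode", "allowinsert",  # insert when no matching row exists
        "--num-mappers", str(num_mappers),
    ]


# All values below are hypothetical placeholders, not taken from the question.
cmd = build_export_command(
    connect="jdbc:mysql://dbhost/sales",
    table="orders",
    export_dir="/user/etl/orders",
    update_key="order_id",
)
print(shlex.join(cmd))
```

Building the command as a list keeps each flag next to its value, which makes it harder to forget the column name after --update-key.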