ERROR:
RetryInvocationHandler: Exception while invoking ConsistencyCheckerS3FileSystem.mkdirs over null. Retrying after sleeping for 10000ms.
Root cause:
The problem occurs because of the following issues:
- Retry logic in Spark and Hadoop.
- A file creation on S3 failed, but the corresponding entry was already written to the EMRFS metadata table in DynamoDB.
- Manual deletion of files and directories from the S3 console, which leaves stale entries behind in the metadata.
- When Hadoop retries the operation, the entry is already present in DynamoDB, so it throws the consistency error.
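To confirm that the metadata and S3 have actually diverged for a given path, the EMRFS CLI offers a diff subcommand; a quick check (with <bucket>/path as a placeholder) looks like this:

emrfs diff s3://<bucket>/path

It lists entries that exist only in the DynamoDB metadata or only in S3, which is exactly the mismatch that triggers the error above.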
Solution:
Try re-running your Spark job after cleaning up the EMRFS metadata in DynamoDB.
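If you first want to see which DynamoDB table backs the EMRFS metadata before cleaning it up, the EMRFS CLI provides a describe-metadata subcommand; run it on the cluster's master node:

emrfs describe-metadata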
Follow these steps to clean up and restore the intended directory in the S3 bucket:
1. Delete the metadata entries for all objects in the path. Because emrfs delete looks up records by hash, it may delete unwanted entries as well, which is why we import and sync in the subsequent steps:

emrfs delete s3://<bucket>/path

2. Retrieve the metadata for the objects that are physically present in S3 back into DynamoDB:

emrfs import s3://<bucket>/path

3. Sync the data between S3 and the metadata:

emrfs sync s3://<bucket>/path
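Putting the three commands together, here is a minimal shell sketch of the cleanup. It assumes it is run on the EMR master node, where the emrfs CLI is available; BUCKET and PREFIX are placeholders for your actual bucket and path:

#!/bin/bash
# Minimal sketch: rebuild the EMRFS metadata for a single S3 path.
# Assumes: executed on the EMR master node (emrfs CLI installed).
# BUCKET and PREFIX are placeholders; substitute your own values.
set -euo pipefail

BUCKET="my-bucket"         # placeholder
PREFIX="path/to/directory" # placeholder

# 1. Delete the (possibly stale) metadata entries for the path.
emrfs delete "s3://${BUCKET}/${PREFIX}"

# 2. Re-import metadata for the objects that actually exist in S3.
emrfs import "s3://${BUCKET}/${PREFIX}"

# 3. Sync so the metadata and S3 agree.
emrfs sync "s3://${BUCKET}/${PREFIX}"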
Hopefully this will resolve the issue.