51

I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes.

The crawler takes roughly 20 seconds to run, and the logs show that it completed successfully. The CloudWatch log shows:

  • Benchmark: Running Start Crawl for Crawler
  • Benchmark: Classification Complete, writing results to DB
  • Benchmark: Finished writing to Catalog
  • Benchmark: Crawler has finished running and is in ready state

I am at a loss as to why the tables in the Data Catalog are not being created. The AWS docs are not much help for debugging.

2
  • Did you find an answer to this? Commented Mar 15, 2018 at 14:37
  • I am facing the same issue with the root user, which has access to all services. I don't understand what is wrong! Commented Jan 31, 2022 at 23:26

9 Answers

61

Check the IAM role associated with the crawler. Most likely it doesn't have the correct permissions.

When you create the crawler, if you choose to have Glue create an IAM role (the default setting), it creates a policy scoped only to the S3 path you specified at the time. If you later edit the crawler and change only the S3 path, the role associated with the crawler won't have permission to the new S3 path.
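
If you would rather fix the role in place than recreate it, one option is to attach an inline policy that covers the new path. Here is a minimal boto3 sketch; the role name and bucket (AWSGlueServiceRole-MyCrawler, my-new-bucket) are placeholders to replace with your own:

import json
import boto3

iam = boto3.client("iam")

# Placeholders -- substitute your crawler's role and the new S3 location.
role_name = "AWSGlueServiceRole-MyCrawler"
new_bucket = "my-new-bucket"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing needs the bucket ARN itself...
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{new_bucket}"],
        },
        {
            # ...while object reads need the /* object ARN.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{new_bucket}/*"],
        },
    ],
}

# Attach the policy inline so the existing crawler role can reach the new path.
iam.put_role_policy(
    RoleName=role_name,
    PolicyName="GlueCrawlerNewS3Path",
    PolicyDocument=json.dumps(policy),
)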

8 Comments

The default Glue service role includes only S3 prefixes like glue-public; I needed to change it to include the bucket I wanted to crawl.
Any idea why this incorrect permission doesn't appear as an exception in the logs?
This worked for me: I deleted the old role, edited the crawler, and created a new one; the tables were then created in the catalog. Appreciate the tip!
Thanks for this one. I spent 30 minutes checking logs and failed to understand what was happening. This was on point... <3
Wow. Reason number 953 why AWS is the opposite of easy to use. How difficult is this to fix?

6

I had the same issue. As advised by others, I tried to revise the existing IAM role to include the new S3 bucket as the resource, but for some reason it did not work. Then I created a completely new role from scratch, and this time it worked. Also, one big question I have for AWS: why does an access-denied error caused by a wrongly attached IAM policy not show up in the CloudWatch log? That makes it difficult to debug.
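
If you want to see the denial without waiting on the crawler, you can ask IAM directly whether the role may read the path. A boto3 sketch; the role ARN and object ARN are placeholders to swap for your crawler's role and a real object in your bucket:

import boto3

iam = boto3.client("iam")

# Placeholders -- use the crawler's role ARN and an object the crawl should read.
response = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::123456789012:role/AWSGlueServiceRole-MyCrawler",
    ActionNames=["s3:GetObject"],
    ResourceArns=["arn:aws:s3:::my-bucket/my-prefix/data.csv"],
)

for result in response["EvaluationResults"]:
    # "allowed" means the role can read the object; anything else explains
    # the silent failure that never surfaces in the crawler logs.
    print(result["EvalActionName"], "->", result["EvalDecision"])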

2

If you have existing tables in the target database, the crawler may associate your new files with an existing table rather than create a new one.

This occurs when there are similarities in the data, or a folder structure that Glue may interpret as partitioning.

Also, on occasion I have needed to refresh the table listing of a database to get new tables to show up.
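
One way to confirm this from the API is to list each table's partitions; a folder that seems to have vanished often turns up as a partition of an existing table. A boto3 sketch with a placeholder database name:

import boto3

glue = boto3.client("glue")

# Placeholder -- substitute the catalog database the crawler writes to.
database = "my_database"

for table in glue.get_tables(DatabaseName=database)["TableList"]:
    partitions = glue.get_partitions(
        DatabaseName=database, TableName=table["Name"]
    )["Partitions"]
    # A growing partition count on an old table hints that new folders were
    # absorbed as partitions instead of becoming their own table.
    print(table["Name"], len(partitions), "partitions")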

2

I had a similar IAM issue to the one Ray mentioned. But in my case, I had not added an asterisk (*) after the bucket name, which meant the crawler did not go into the subfolders, and no table was created.

Wrong:

{
    "Statement": [
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::bucket-name"
            ]
        }
    ],
    "Version": "2012-10-17"
}

Correct:

{
    "Statement": [
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::bucket-name*"
            ]
        }
    ],
    "Version": "2012-10-17"
}
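
As an aside, arn:aws:s3:::bucket-name* works because the wildcard also matches the slash in object keys, but the more conventional split is to grant object actions (s3:GetObject, s3:PutObject) on arn:aws:s3:::bucket-name/* and s3:ListBucket on the bucket ARN arn:aws:s3:::bucket-name itself.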

1

You can try excluding some files in the S3 bucket; the excluded files should then appear in the log. I find this helpful for debugging what the crawler is doing.
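
If you manage the crawler through the API, the exclude patterns live on the S3 target. A boto3 sketch; the crawler name, path, and glob patterns are placeholders:

import boto3

glue = boto3.client("glue")

glue.update_crawler(
    Name="my-crawler",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://my-bucket/data/",
                # Glob patterns; objects they match are skipped, and the skip
                # is reported in the crawl log, which helps with debugging.
                "Exclusions": ["**.tmp", "archive/**"],
            }
        ]
    },
)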

1

In my case, the problem was the setting Crawler source type > Repeat crawls of S3 data stores, which I had set to Crawl new folders only because I thought it would crawl everything on the first run and then continue to discover only new data.

After setting it to Crawl all folders, it discovered all the tables.
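
For reference, this console setting corresponds to the crawler's RecrawlPolicy in the API. A boto3 sketch with a placeholder crawler name:

import boto3

glue = boto3.client("glue")

glue.update_crawler(
    Name="my-crawler",
    # CRAWL_EVERYTHING rescans all folders on the next run;
    # CRAWL_NEW_FOLDERS_ONLY is the "Crawl new folders only" behavior above.
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVERYTHING"},
)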

0

Here is my sample policy JSON for the role, which allows Glue to access S3 and create a table.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteTags",
                "ec2:CreateTags"
            ],
            "Resource": [
                "arn:aws:ec2:*:*:instance/*",
                "arn:aws:ec2:*:*:security-group/*",
                "arn:aws:ec2:*:*:network-interface/*"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": "aws-glue-service-resource"
                }
            }
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "cloudwatch:PutMetricData",
                "ec2:DeleteNetworkInterface",
                "s3:ListBucket",
                "s3:GetBucketAcl",
                "logs:PutLogEvents",
                "ec2:DescribeVpcAttribute",
                "glue:*",
                "ec2:DescribeSecurityGroups",
                "ec2:CreateNetworkInterface",
                "s3:GetObject",
                "s3:PutObject",
                "logs:CreateLogStream",
                "s3:ListAllMyBuckets",
                "ec2:DescribeNetworkInterfaces",
                "logs:AssociateKmsKey",
                "ec2:DescribeVpcEndpoints",
                "iam:ListRolePolicies",
                "s3:DeleteObject",
                "ec2:DescribeSubnets",
                "iam:GetRolePolicy",
                "s3:GetBucketLocation",
                "ec2:DescribeRouteTables"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "s3:CreateBucket",
            "Resource": "arn:aws:s3:::aws-glue-*"
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "*"
        }
    ]
}

0

Encountered the same problem. I created a new crawler and a new IAM role but still used the same database, and it worked!

1 Comment

PS. You can also try adjusting the maximum table threshold for the crawler. I adjusted that too.

0

FWIW, I was trying to use a JDBC connection to an RDS instance as the source of my crawl. I was putting in what I thought was a direct path to the source table (e.g. postgres/table_name). However, I forgot that the table was nested in the public schema. Setting my source value to postgres/% fixed the issue for me.
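
For anyone scripting this, the include path sits on the crawler's JDBC target. A boto3 sketch with placeholder crawler and connection names:

import boto3

glue = boto3.client("glue")

glue.update_crawler(
    Name="my-rds-crawler",
    Targets={
        "JdbcTargets": [
            {
                "ConnectionName": "my-rds-connection",
                # For PostgreSQL the path is database/schema/table, so
                # postgres/% matches every schema and table in the database.
                "Path": "postgres/%",
            }
        ]
    },
)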
