You could use Branch and Bound.
You'd branch on "for task i, which set do I pick?", picking the biggest set first to cause failure as high up in the tree as possible to save work. For the initial solution, you can flip that around to find a reasonable (but not optimal) solution quickly, which ends up saving work by pruning more in the real search later.
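A rough sketch of that search, assuming tasks are given as a list of collections of `frozenset`s of resource ids (the function name and data layout are my own, not from the question), with the trivial "every remaining task gets scheduled" bound as a placeholder for the better bounds below:

```python
def branch_and_bound(tasks):
    """Maximize the number of tasks scheduled; each scheduled task uses
    exactly one of its resource sets, and each resource at most once."""
    best = [0]  # incumbent: best count found so far, in a mutable cell

    def solve(i, used, count):
        if i == len(tasks):
            best[0] = max(best[0], count)
            return
        # Optimistic placeholder bound: every remaining task gets scheduled.
        if count + (len(tasks) - i) <= best[0]:
            return  # prune: this branch cannot beat the incumbent
        # Try the biggest sets first so conflicts show up high in the tree.
        for s in sorted(tasks[i], key=len, reverse=True):
            if used.isdisjoint(s):
                solve(i + 1, used | s, count + 1)
        solve(i + 1, used, count)  # also try leaving task i unscheduled

    solve(0, frozenset(), 0)
    return best[0]
```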
You could also branch on whichever s[q,t] in the following model comes out closest to 0.5 (in a sense, the choice the relaxation is "least sure about").
The bound could be based on the linear relaxation of this ILP model:
maximize sum of x[t] over all t
variables:
0 <= x[t] <= 1 ; x[t] decides whether task t is scheduled or not
0 <= r[i,t] <= 1 ; r[i,t] decides whether task t uses resource i
0 <= s[q,t] <= 1 ; s[q,t] decides whether set q of task t is used
constraints:
1. for all tasks t: (sum of s[q,t] over all valid q) - x[t] = 0
   (forces exactly 1 set of resources to be associated with any task that is chosen)
2. for all tasks t and sets q: (sum of r[i,t] over all i in set q) - size_of_set_q * s[q,t] >= 0
   (forces the resources in set q to be used by task t when set q is chosen; >= 0 rather than = 0 because a task's sets may overlap)
3. for all resources i: sum of r[i,t] over all t <= 1
   (forces every resource to be used no more than once)
I may have made mistakes in the model, and I'm not sure how tight it is. Anyway, solve it with linear programming (no doubt there's a library for it in Python) and then apply a couple of Gomory cuts for good measure (they may look scary, but they're actually pretty simple to program). Don't do too many, though: trying to reach an all-integer solution with only Gomory cuts often converges very slowly. Doing a few of them is a cheap way to improve the bound.
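As one way to set this up in Python (a sketch, not the only way; I'm using `scipy.optimize.linprog` here, and the function name and data layout are my own), the model above translates roughly to:

```python
import numpy as np
from scipy.optimize import linprog

def lp_bound(tasks):
    """LP-relaxation upper bound on the number of schedulable tasks.
    `tasks` is a list of collections of frozensets of resource ids."""
    # Variable layout: x[t] first, then s[t,q], then r[t,i].
    nT = len(tasks)
    s_idx, r_idx, n = {}, {}, nT
    for t, sets in enumerate(tasks):
        for q in range(len(sets)):
            s_idx[t, q] = n; n += 1
        for i in set().union(*sets):
            r_idx[t, i] = n; n += 1

    c = np.zeros(n); c[:nT] = -1.0            # maximize sum x[t] (linprog minimizes)

    A_eq, b_eq = [], []
    for t, sets in enumerate(tasks):          # (1) sum_q s[q,t] - x[t] = 0
        row = np.zeros(n); row[t] = -1.0
        for q in range(len(sets)):
            row[s_idx[t, q]] = 1.0
        A_eq.append(row); b_eq.append(0.0)

    A_ub, b_ub = [], []
    for t, sets in enumerate(tasks):          # (2) |q| * s[q,t] - sum_{i in q} r[i,t] <= 0
        for q, st in enumerate(sets):
            row = np.zeros(n); row[s_idx[t, q]] = len(st)
            for i in st:
                row[r_idx[t, i]] = -1.0
            A_ub.append(row); b_ub.append(0.0)
    resources = set().union(*(st for sets in tasks for st in sets))
    for i in resources:                       # (3) sum_t r[i,t] <= 1
        row = np.zeros(n)
        for t in range(nT):
            if (t, i) in r_idx:
                row[r_idx[t, i]] = 1.0
        A_ub.append(row); b_ub.append(1.0)

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0.0, 1.0)] * n)
    return -res.fun
```

For two tasks that both need the same single resource, this correctly returns a bound of 1; the Gomory cuts mentioned above would be applied on top of this basic relaxation.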
This will give you an estimate that will let you prune some of the search space. How much it actually prunes depends on how close it gets to the actual solution. I predict that it will tend to select several sets belonging to a task with some factor between 0 and 1, because selecting a set "only a bit" lets it reuse the same resource across multiple tasks. It then has to pick several sets, since the sets chosen for a task must total 1, but fractional choices also give it more freedom in which resources to use, so it can. Linear Programming is sneaky that way, always trying to give you, in a sense, the most annoying answer :)
Of course, in this model you'd exclude any options that are no longer possible (already-allocated resources, the sets that contain them, and tasks left with zero possible sets), and skip tasks that are already scheduled.
If this is too complicated, you can calculate a much simpler bound: for each task t, take the size of its smallest set. Count how many of these you can take before the total exceeds the number of unallocated resources (take the smallest, add the next smallest, and so on - sort them or use a min-heap).
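That cheaper bound is only a few lines (again my own naming; `free_resources` is the count of still-unallocated resources):

```python
def simple_bound(tasks_remaining, free_resources):
    """Optimistic bound: how many remaining tasks could possibly fit if
    each used only its smallest set and none of those sets overlapped."""
    smallest = sorted(min(len(s) for s in sets) for sets in tasks_remaining)
    count, budget = 0, free_resources
    for size in smallest:
        if size > budget:
            break  # even the next-smallest task no longer fits
        budget -= size
        count += 1
    return count
```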
Of course, if the resources allocated so far have left so many tasks without any possible set that in total (including the tasks already scheduled) you can't beat the best solution so far, you can give up on the current branch of the recursion tree.
For the initial solution, you could try the greedy algorithm you described, but with one change: take the smallest set that contains only unallocated resources. That way it tries to "keep out of the way" of later tasks, though obviously you can construct cases where this is worse than picking the first possible set.
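That modified greedy could look like this (a sketch with my own naming, same data layout as before):

```python
def greedy_initial(tasks):
    """Greedy start: for each task, pick the smallest set made up entirely
    of still-unallocated resources, to stay out of later tasks' way."""
    used = set()
    scheduled = []  # (task index, chosen set) pairs
    for t, sets in enumerate(tasks):
        candidates = [s for s in sets if used.isdisjoint(s)]
        if candidates:
            pick = min(candidates, key=len)
            used |= pick
            scheduled.append((t, pick))
    return scheduled
```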
edit: and of course, if the collection of sets of a task contains a set that is a superset of another set in that collection, just delete it. Using the superset can't ever be better. That happens to fix the example given in OP, but in general it won't.
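As a one-off preprocessing step per task (my own naming; assumes the sets are `frozenset`s so duplicates can be removed too):

```python
def drop_supersets(sets):
    """Keep only the sets that have no strict subset in the collection;
    choosing a superset of another available set can never be better."""
    sets = set(sets)  # also removes exact duplicates
    return [s for s in sets if not any(other < s for other in sets)]
```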