Recently, I needed to create a tool to scrape a page's source so I could parse data out of a public database for a project I'm working on. Python seemed like an easy solution, but it was a pain getting it up and running, and currently I have it half working (it saves the source to a file instead of returning it). When I run my C++ code I get a strange error:
Exception ignored in: <module 'threading' from 'C:\\Python34\\Lib\\threading.py'>
Traceback (most recent call last):
  File "C:\Python34\Lib\threading.py", line 1293, in _shutdown
    t = _pickSomeNonDaemonThread()
  File "C:\Python34\Lib\threading.py", line 1300, in _pickSomeNonDaemonThread
    for t in enumerate():
  File "C:\Python34\Lib\threading.py", line 1270, in enumerate
    return list(_active.values()) + list(_limbo.values())
TypeError: an integer is required (got type NoneType)
My Python code:
import urllib.request
import sys

def run(a):
    # Fetch the page at URL a
    req = urllib.request.Request(a)
    res = urllib.request.urlopen(req)
    # str() on bytes gives the b'...' repr, not decoded text
    d = str(res.read())
    # Save the source to a file (nothing is returned)
    with open('temp.dat', 'w') as outfile:
        outfile.write(d)
The above code runs without errors on its own, so I suspect the mistake is somewhere in my C++ implementation. It is worth mentioning that it does successfully save the website's source (the page at the URL passed as parameter a) to the 'temp.dat' file; I'm just trying to get rid of the error report.
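Since the end goal is to have the function return the page source rather than write it to a file, a returning variant could look roughly like the following. This is a minimal sketch that has not been tried against the C++ caller: the name fetch_source is hypothetical, and it assumes the page is text that decodes with the charset declared in the response headers, falling back to UTF-8.

```python
import urllib.request


def fetch_source(url):
    """Return the body at `url` as a decoded string (hypothetical helper)."""
    with urllib.request.urlopen(url) as res:
        raw = res.read()  # raw bytes of the response body
        # Assumption: trust the charset from the response headers, else UTF-8.
        charset = res.headers.get_content_charset() or "utf-8"
    # Undecodable bytes are replaced rather than raising.
    return raw.decode(charset, errors="replace")
```

A caller embedding Python would then receive a str object as the call result instead of None.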
My C++ code:
#include <Python.h>
#include <cstdio>
#include <string>
using std::string;

void pyCall(string url, string outfile, const char* mod = "Scrape", const char* dat = "run")
{
    PyObject *pName, *pModule, *pDict, *pFunc;
    PyObject *pArgs, *pValue, *pOutfile, *pURL;
    int i;
    Py_Initialize();
    /* Make modules in the current directory importable */
    PyObject* sysPath = PySys_GetObject((char*)"path");
    PyList_Append(sysPath, PyUnicode_FromString("."));
    pName = PyUnicode_FromString(mod);
    /* Error checking of pName left out */
    pModule = PyImport_Import(pName);
    Py_DECREF(pName);
    if (pModule != NULL)
    {
        pFunc = PyObject_GetAttrString(pModule, dat);
        /* pFunc is a new reference */
        if (pFunc && PyCallable_Check(pFunc))
        {
            /* Build a one-element argument tuple holding the URL */
            pArgs = Py_BuildValue("(s)", url.c_str());
            pValue = PyObject_CallObject(pFunc, pArgs);
            Py_DECREF(pArgs);
            if (pValue != NULL)
            {
                printf("Result of call: %ld\n", PyLong_AsLong(pValue));
                Py_DECREF(pValue);
            }
        }
        Py_XDECREF(pFunc);
        Py_DECREF(pModule);
    }
    Py_Finalize();
}
Now this code is pretty standard and is a 'cookie cutter' version of the example in the Python embedding docs at https://docs.python.org/3.5/extending/embedding.html. The only differences are the way I pass the arguments and the path append at the beginning.
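One lead that may be worth checking, stated as an assumption rather than a confirmed diagnosis: run has no return statement, so it returns None, while the C++ side hands the result to PyLong_AsLong, which expects an integer. The TypeError text in the traceback is consistent with that mismatch. A sketch of run that returns an integer instead is below; returning the number of characters written is an arbitrary choice, and the out_path parameter is added here only for illustration.

```python
import urllib.request


def run(a, out_path="temp.dat"):
    """Save the source of URL `a` to `out_path` and return an int."""
    with urllib.request.urlopen(a) as res:
        d = str(res.read())  # same text the original version writes
    with open(out_path, "w") as outfile:
        written = outfile.write(d)  # write() returns the character count
    # Returning an int gives the caller's integer conversion a valid input.
    return written
```

The alternative on the caller side would be to skip the integer conversion when the result is None, since the function is only called for its side effect.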
Any help would be greatly appreciated.