
I am currently working on a project that analyses the web logs of a website using machine learning. I am cleaning the data and want to identify unique visitors to the site.

I don't have much experience with web logs, but it is clear that a single visit retrieves several files (for example, the records in the cs.uri.stem column shown below).

My question: what about when a user goes through several pages (for example, goes to page B via a link on page A)? How can I trace their behaviour on the site?

Additionally, can anyone suggest a good Python library for analysing web logs?

Much appreciated!!!

         date     time        s.ip cs.method cs.uri.stem                                                               cs.uri.query s.port cs.username         c.ip sc.status sc.substatus sc.win32.status time.taken device            os          browser
1  2014-08-05 00:00:03 10.130.0.12       GET /                                                                                    -     80           - 67.205.67.76       200            0               0       1391 Spider         Other   PingdomBot_1.4
2  2014-08-05 00:00:11 10.130.0.12       GET /about-the-hotel.aspx                                                                -     80           -  70.56.59.43       200            0               0       1194     PC Mac_OS_X_10.8     Firefox_31.0
3  2014-08-05 00:00:11 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/a-hotel-unlike-any-others.ashx            -     80           -  70.56.59.43       200            0               0        976     PC Mac_OS_X_10.8     Firefox_31.0
4  2014-08-05 00:00:12 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/0713-ExComTeam.ashx                       -     80           -  70.56.59.43       200            0               0       1620     PC Mac_OS_X_10.8     Firefox_31.0
5  2014-08-05 00:00:12 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/vivienne-tam.ashx                    -     80           -  70.56.59.43       200            0               0       1713     PC Mac_OS_X_10.8     Firefox_31.0
6  2014-08-05 00:00:12 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/william-lim.ashx                     -     80           -  70.56.59.43       200            0               0       2387     PC Mac_OS_X_10.8     Firefox_31.0
7  2014-08-05 00:00:14 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/barney-cheng.ashx                    -     80           -  70.56.59.43       200            0               0       2180     PC Mac_OS_X_10.8     Firefox_31.0
8  2014-08-05 00:00:14 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/tommy-li.ashx                        -     80           -  70.56.59.43       200            0               0       1146     PC Mac_OS_X_10.8     Firefox_31.0
9  2014-08-05 00:00:14 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/yang-rutherford.ashx                 -     80           -  70.56.59.43       200            0               0        869     PC Mac_OS_X_10.8     Firefox_31.0
10 2014-08-05 00:00:14 10.130.0.12       GET /~/media/Images/Hotel_ICON_revamp/about+us/icon/justin_wong_img1.ashx                -     80           -  70.56.59.43       200            0               0        845     PC Mac_OS_X_10.8     Firefox_31.0
  • You can identify them by the combination of IP, OS and browser. Commented Apr 3, 2017 at 4:51
  • Can you be more specific? Commented Apr 3, 2017 at 4:52
  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow. Commented Apr 3, 2017 at 4:52
  • @AdamLeo "how about when a user goes through several pages?" You can change your log format to include the referrer, or check the user's IP. If the same IP, with the same OS and browser, requested one page and then another, the user may have reached the second page from a link on the first. Commented Apr 3, 2017 at 4:59
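The heuristic from the last comment can be sketched with pandas. This is only an illustration under several assumptions: a visitor is approximated by the (c.ip, os, browser) combination, requests ending in .ashx are treated as embedded media rather than page views, and a gap of more than 30 minutes starts a new session. The sample rows are hand-made in the shape of the log above; `/rooms.aspx` and the later timestamps are hypothetical and do not come from the real log.

```python
import pandas as pd

# Illustrative rows shaped like the log in the question.
# "/rooms.aspx" and the 00:02:40 / 00:50:00 timestamps are made up.
logs = pd.DataFrame({
    "date": ["2014-08-05"] * 5,
    "time": ["00:00:11", "00:00:11", "00:02:40", "00:50:00", "00:00:03"],
    "c.ip": ["70.56.59.43"] * 4 + ["67.205.67.76"],
    "os": ["Mac_OS_X_10.8"] * 4 + ["Other"],
    "browser": ["Firefox_31.0"] * 4 + ["PingdomBot_1.4"],
    "cs.uri.stem": [
        "/about-the-hotel.aspx",
        "/~/media/Images/Hotel_ICON_revamp/about+us/0713-ExComTeam.ashx",
        "/rooms.aspx",
        "/",
        "/",
    ],
})

# Assumption: this column combination approximates one visitor.
visitor_cols = ["c.ip", "os", "browser"]

logs["timestamp"] = pd.to_datetime(logs["date"] + " " + logs["time"])

# Assumption: .ashx requests are media fetched by a page, not page views.
pages = logs[~logs["cs.uri.stem"].str.endswith(".ashx")].copy()
pages = pages.sort_values(visitor_cols + ["timestamp"])

# Time since the same visitor's previous page view (NaT for the first one).
gap = pages.groupby(visitor_cols)["timestamp"].diff()

# A new session starts at a visitor's first request or after a long pause.
new_session = gap.isna() | (gap > pd.Timedelta(minutes=30))
pages["session_id"] = new_session.cumsum()

# Ordered list of pages viewed in each session.
paths = pages.groupby("session_id")["cs.uri.stem"].agg(list)
print(paths)
```

Each entry in `paths` is the ordered sequence of pages for one session. Without a referrer field this only shows the order of requests; it cannot prove that page B was reached through a link on page A.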

1 Answer


It may be a good idea to look at the pandas library. Once you have loaded the data using pandas (see example here), it should be straightforward to find the unique elements conditioned on one or multiple columns (example here).
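As a minimal sketch of that idea, the snippet below counts distinct visitors with `drop_duplicates` and `groupby`. It assumes the log is already in a DataFrame with the column names shown in the question, and that a visitor can be approximated by the (c.ip, os, browser) combination, which is a heuristic and not a true user identity. The rows are copied from the first four records in the question.

```python
import pandas as pd

# First four records from the question, reduced to the relevant columns.
logs = pd.DataFrame({
    "c.ip": ["67.205.67.76", "70.56.59.43", "70.56.59.43", "70.56.59.43"],
    "device": ["Spider", "PC", "PC", "PC"],
    "os": ["Other", "Mac_OS_X_10.8", "Mac_OS_X_10.8", "Mac_OS_X_10.8"],
    "browser": ["PingdomBot_1.4", "Firefox_31.0", "Firefox_31.0", "Firefox_31.0"],
    "cs.uri.stem": [
        "/",
        "/about-the-hotel.aspx",
        "/~/media/Images/Hotel_ICON_revamp/about+us/a-hotel-unlike-any-others.ashx",
        "/~/media/Images/Hotel_ICON_revamp/about+us/0713-ExComTeam.ashx",
    ],
})

# Assumption: this column combination approximates one visitor.
visitor_cols = ["c.ip", "os", "browser"]

# One row per distinct (c.ip, os, browser) combination.
unique_visitors = logs.drop_duplicates(subset=visitor_cols)[visitor_cols]
n_visitors = len(unique_visitors)

# Optionally exclude crawlers flagged in the "device" column.
humans = logs[logs["device"] != "Spider"]
n_human_visitors = humans.groupby(visitor_cols).ngroups

print(unique_visitors)
print(n_visitors, n_human_visitors)
```

Users behind a shared IP with identical browsers will be merged by this key, and one user on two devices will be counted twice, so the result is an estimate.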
