
I'm running a LAMP web server.

I'd like to include my script files on my page with:

<script src="http://domain.com/script.js"></script>

I would like visiting http://domain.com/script.js to display either an error, or a blank page.

I've seen other similar questions, of which the answer was "just obfuscate it", or "security by obfuscation is bad".

This isn't for the sake of security. I want to stop bots from pulling my code automatically. I'm OK with human users getting the code. I would simply like this as an alternative to obfuscation.

I've already attempted this using base64-encoded $_GET and $_SESSION parameters. I'm wondering if there's a more elegant solution out there.


CLARIFICATION:

I am aware that Javascript is still available to the user. I am perfectly fine with the code being accessible via Firebug, Chrome's developer tools, etc. I simply want the code accessible via my tags, and inaccessible directly. This is not for security, and not to "hide" my code.


Clarification 2:

The reason I need this is because our company recently found a competitor running scripts to scrape data off of our site. I would like to be able to prevent the data from being scraped via their script, and force them to do it manually.

  • If you feel you must do this, look through these questions on preventing hotlinking of image files and apply the same methodology to .js: stackoverflow.com/search?q=htaccess+prevent+hotlink These methods are easily defeated by forging referer headers, though. Commented Nov 7, 2014 at 14:18
  • I don't think you really understand the architecture of how web servers serve up content, but what you are asking for is not really possible. Commented Nov 7, 2014 at 14:19
  • No matter what you do, the JavaScript will still be available to the user. Commented Nov 7, 2014 at 14:20
  • @Mark That was a different question--I would like the same file that is included via the <script> tag to also be inaccessible directly. Commented Nov 7, 2014 at 14:20
  • Having just seen clarification 2, I would say that you are probably best following a different approach. Scrapers are quite easy to identify, because they hit all your endpoints very quickly. Better to identify the offending IPs and either block them entirely or, if you are feeling devious, send them old/incorrect data. Other suggestions are either useless (a scraper will probably request the HTML before the JS, like a regular browser; it might even be an automated browser) or inconvenience your real users (captchas are a pain). Commented Nov 7, 2014 at 19:57

5 Answers


With your clarification #2 in mind, you might consider using PHP sessions.

You could first have the user hit a page that requires a captcha to proceed. Once the captcha is submitted and verified, a PHP session is started (or updated) with a boolean $isHuman that shows you are indeed dealing with a human.

Requests for scripts are directed to a php page that serves the script only if a session exists and $isHuman is true.
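A minimal sketch of that flow, where the captcha page would set $_SESSION['isHuman'] = true on success (the flag name and file paths are placeholders, not the answer's actual code):

```php
<?php
// serve-script.php (hypothetical name): only hands out the JS once
// the captcha page has marked the session as human.

function isHuman(array $session): bool
{
    // The captcha handler is assumed to set this flag after verification.
    return !empty($session['isHuman']);
}

if (PHP_SAPI !== 'cli') {              // serving logic only makes sense under a web server
    session_start();
    if (!isHuman($_SESSION)) {
        http_response_code(403);       // bot (or human) that never solved the captcha
        exit;
    }
    header('Content-Type: application/javascript');
    readfile(__DIR__ . '/protected/script.js');  // keep the real file outside the docroot
}
```

The page would then include it with `<script src="serve-script.php"></script>` instead of the raw .js URL.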


1 Comment

This is much more likely to stop data scraping than what you (OP) are actually asking for.

I'm asking, quite specifically, how to make the file accessible via my <script> tag, and inaccessible directly.

Two options come to mind:

"Prevent hotlinking" Solutions

As @MichaelBerkowski pointed out, this is very similar to the common requirement of not allowing hotlinking of images, and the same sorts of solutions apply, with the same caveats and pitfalls. Basically, it's either of the following or both in combination:

  1. Checking REFERER (sic) headers on requests for your JavaScript files and denying those requests if REFERER doesn't refer to one of your pages.

  2. Remembering the IP addresses of machines that request your HTML pages for a brief time (say, up to a minute), and only allowing those IP addresses to download the JavaScript files, denying attempts from all other IP addresses.

The first is trivial, but also trivially bypassed. The second is a lot less trivial, but also readily bypassed (by simply issuing a request for the HTML and then disregarding the result), but does at least require that the request be made.
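As a rough illustration of option 1, a mod_rewrite rule along these lines in .htaccess would deny .js requests without a matching Referer (domain.com is a placeholder):

```apache
# Block .js requests whose Referer is missing or not from our site.
# Trivially bypassed by forging the header, as noted above, and it
# also denies legitimate users whose browsers strip the Referer.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/ [NC]
RewriteRule \.js$ - [F]
```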

Embedding the JavaScript in the HTML

An alternative is to use an Apache module to minify the script and inject it into your HTML file at the point where you have your <script src="myfile.js"></script> tag, resulting in a <script>codehere</script> tag instead. Then there's no JavaScript file to request. This has the downside that the same JavaScript on multiple pages doesn't benefit from caching, but it has the upsides of A) not requiring a separate HTTP request, and B) making it impossible for people to download your JavaScript files (as you simply wouldn't host them in an externally visible location at all).
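For illustration, the same inlining can be done in plain PHP without an Apache module (the path below is a placeholder for a file kept outside the docroot):

```php
<?php
// Hypothetical sketch of the inlining approach: paste the file's
// contents into the page so no separately requestable .js URL exists.
function inlineScript(string $path): string
{
    return "<script>\n" . file_get_contents($path) . "\n</script>";
}

// In the page template:
// echo inlineScript('/var/www/private/js/main.js');
```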


Neither of the above means people can't get access to your code, because fundamentally that's impossible (the best you can do is obfuscate, and de-obfuscators are pretty good; fundamentally if the browser can run your script, anyone can see it), but it's clear from the comments on the question that you understand that.

6 Comments

@Mark: Indeed not. I think the comments made that abundantly clear, I figured the answer didn't benefit from going into it further.
Thank you for actually answering my question seriously. This unfortunately doesn't solve my issue though--I've already got a test script that prints out the $_SERVER superglobal. Whether it's included, or accessed directly, the values are identical.
@user3191820: What does the $_SERVER superglobal have to do with it? I'm talking about the REFERER header, and/or an in-memory DB of IP addresses, and/or embedding the script in the HTML.
@T.J.Crowder the $_SERVER superglobal contains the HTTP_REFERER and the REMOTE_ADDR (IP address) values. They were identical in both cases (values being 'localhost' and '127.0.0.1', respectively, on my local machine).
@user3191820: That's your local machine. Try it in a real production environment. A hotlink to your script file will have a different REFERER, and the REMOTE_ADDR should (provided all the layers are set up correctly) have the remote IP, not your local one.

As several people have tried to explain in the comments, this isn't really possible, because the server has no way to know whether a JS file is being requested as part of an HTML page or on its own.

The closest you are going to get to achieving this is by creating a random string, appending it to your script URL when the HTML is generated, and then checking for that string when the JS is requested.

This is how CAPTCHAs work, BTW.

In your HTML

<?php
session_start(); //start session

// Function to generate random str, borrowed from here: http://stackoverflow.com/questions/4356289/php-random-string-generator
function randStr() {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $randomString = '';
    for ($i = 0; $i < 10; $i++) $randomString .= $characters[rand(0, strlen($characters) - 1)];
    return $randomString;
}

// set random str to session variable
$_SESSION['JS_STR'] = randStr();

// append random string to the JS file, which will have to have a .php extension
echo "<script src='myjavascriptfile.php?str=".$_SESSION['JS_STR']."'></script>";

?>

In your JS

You will have to change your .js file to .php

<?php
session_start();

//check make sure session variable matches the appended string
if(!isset($_SESSION['JS_STR']) || !isset($_GET['str']) || $_GET['str'] !== $_SESSION['JS_STR']) die("you don't have permission to view this");

//tell the browser you're serving some JS
header('Content-Type: application/javascript');

?>
window.alert("your JS goes here...");

5 Comments

All this means is that the bot has to get the first page before asking for the script. That's not how a captcha works. Captchas require human intervention, and there is none here.
captchas work by creating a session variable on one page and verifying it on the next. save for lack of human interaction, yes, that is how a captcha works.
but yes, as I and everyone else says, this is a waste of time.
Yes, but here you're giving the answer away right after generating it.
which is why it's a waste of time. the only good this will do is prevent someone from viewing the file with only the filename.

I opted to just pursue the $_SESSION/$_GET/$_POST-gated script I had started before visiting StackOverflow.

The solution's not perfect, but it suits my needs, in that the scripts are accessible via my tags, but inaccessible directly. This is a simplified version of what I am doing:

File 1 is the PHP file generating the HTML page the user sees. This file creates a random value, and sets the value to the session. The script file File 2 is included using this random value as a GET parameter.

File 1:

<?php
session_start();
$gate['first_gate'] = crypt((time() * mt_rand()) . 'salt');
$gate['second_gate'] = null;
$_SESSION['gate'] = json_encode($gate);
?>
<html>
    ...
    <!--this is just the HTML page including the script-->
    <script src="file_2.php?gate=<?=base64_encode(json_encode($gate))?>"></script>
    ...
</html>

File 2 is the PHP file functioning as a gate for the actual JavaScript code. It verifies that the randomized session variable is equal to the GET parameter, then grabs the code from File 3 using a POST request.

File 2:

 <?php
 session_start();
 $session_gate = json_decode($_SESSION['gate']);
 $get_gate = json_decode(base64_decode($_GET['gate']));
 //Exit if the session value != the get value
 if($get_gate->first_gate != $session_gate->first_gate) exit;

 //Set first gate to null to prevent re-visit
 $session_gate->first_gate = null;
 $session_gate->second_gate = crypt((time() * mt_rand()) . 'salt');
 $_SESSION['gate'] = json_encode($session_gate);
 header('Content-Type: application/javascript');
 ?>
 //This is visible via "view source" (then clicking on the script's URL)
 //Grab the actual JS file, hidden behind a POST "wall", and execute it
 $.post("file_3.php", { gate: '<?=base64_encode($_SESSION['gate'])?>' }, function (js) {
     $.globalEval(js);
 });

File 3 is inaccessible when directly viewing the page, as it exits without the POST data from File 2. Bots will still be able to ping it with a POST request, so some additional safety measures should be added here.

File 3:

 <?php
 session_start();
 //Exit without a POST request. Use a more specific value, other than
 //the $_POST superglobal by itself (just using $_POST for illustrative purposes)
 if(!$_POST || !isset($_POST['gate'])) exit; //or print an error message
 $session_gate = json_decode($_SESSION['gate']);
 $post_gate = json_decode(base64_decode($_POST['gate']));
 //Exit if the session value != the POST value
 if($post_gate->second_gate != $session_gate->second_gate) exit;

 //Set both gates to null to prevent re-visit
 $session_gate->first_gate = null;
 $session_gate->second_gate = null;
 $_SESSION['gate'] = json_encode($session_gate);
 //Additional safety measures (such as IP address/HOST check) here, if desired
 header('Content-Type: application/javascript');
 ?>
 //Javascript code here

1 Comment

+1 for answering your own question. This could be simplified and improved however - see my answer

Following your answer, this is a simplified version of your solution:

<?php
//file1
session_start();
$token = uniqid();
$_SESSION['token'] = $token;
?>
<!--page html here-->
<script src="/js.php?t=<?php echo $token;?>"></script>

In js.php:

<?php
//js.php
session_start();
header('Content-Type: application/javascript');
$token = isset($_GET['t'])? $_GET['t'] : null;
if(!isset($_SESSION['token']) || $_SESSION['token'] != $token){
    //let's mess with them and inject some random js, in this case a random chunk of compressed jquery
    die('n=function(a,b){return new n.fn.init(a,b)},o=/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,p=/^-ms-/,q=/-([\da-z])/gi,r=function(a,b){return b.toUpperCase()};n.fn=n.prototype={jquery:m,constructor:n,selector:"",length:0,toArray:function(){return d.call(this)},get:function(a){return null!=a?0>a?this[a+this.length]:this[a]:d.call(this)},pushStack:function(a){var b=n.merge(this.constructor(),a);return b.prevObject=this,b.context=this.context,b},each:function(a,b){return n.each(this,a,b)},map:function(a){return this.pushStack(n.map(this,function(b,c){return a.call(b,c,b)}))},slice:function(){return this.pushStack(d.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(a){var b=this.length,c=+a+(0>a?b:0);return this.pushStack(c>=0&&b>c?[this[c]]:[])},end:function(){return this.prevObject||this.constructor(null)}');
}
//regenerate token, this invalidates current token
$token = uniqid();
$_SESSION['token'] = $token;
?>
$.getScript('js2.php?t=<?php echo $token;?>');

And in js2.php:

<?php
//js2.php
//much the same as before
session_start();
header('Content-Type: application/javascript');
$token = isset($_GET['t'])? $_GET['t'] : null;
if(!isset($_SESSION['token']) || $_SESSION['token'] != $token){
    //let's mess with them and inject some random js, in this case a random chunk of compressed jquery
    die('n=function(a,b){return new n.fn.init(a,b)},o=/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,p=/^-ms-/,q=/-([\da-z])/gi,r=function(a,b){return b.toUpperCase()};n.fn=n.prototype={jquery:m,constructor:n,selector:"",length:0,toArray:function(){return d.call(this)},get:function(a){return null!=a?0>a?this[a+this.length]:this[a]:d.call(this)},pushStack:function(a){var b=n.merge(this.constructor(),a);return b.prevObject=this,b.context=this.context,b},each:function(a,b){return n.each(this,a,b)},map:function(a){return this.pushStack(n.map(this,function(b,c){return a.call(b,c,b)}))},slice:function(){return this.pushStack(d.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(a){var b=this.length,c=+a+(0>a?b:0);return this.pushStack(c>=0&&b>c?[this[c]]:[])},end:function(){return this.prevObject||this.constructor(null)}');
}
unset($_SESSION['token']);
//get the actual js file from a folder outside of the webroot, so it is never directly accessible, even if the filename is known
readfile('../js/main.js');

Note the main changes are:

  1. Simplifying the token system. As the token is in the page source, all it needs to do to function is be unique; attempts to make it 'more secure' with encoding, salts, etc. achieve nothing.

  2. The actual js file is saved outside the web root, so it's not possible to access it directly even if you know the filename.

Please note that I still stand by my comment about IP banning bots. This solution will make scraping a lot harder, but not impossible, and could have unforeseen consequences for genuine visitors.
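To sketch that IP-based approach (the window, threshold, and in-memory storage here are assumptions; a production setup would use fail2ban, a reverse proxy, or a shared cache such as memcached instead):

```php
<?php
// Hypothetical rate-limit check: count requests per IP within a sliding
// window and flag IPs that exceed the threshold, since scrapers hit
// endpoints far faster than human visitors.
function isRateLimited(array &$hits, string $ip, int $now, int $window = 60, int $max = 100): bool
{
    // Drop timestamps older than the window, then record this request.
    $hits[$ip] = array_filter($hits[$ip] ?? [], fn($t) => $t > $now - $window);
    $hits[$ip][] = $now;
    return count($hits[$ip]) > $max;
}

// A flagged IP could then be blocked outright, or served stale data.
```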
