I was trying to make a variable inaccessible for a project I'm doing, and I ran across the SO post "Does Python have “private” variables in classes?". For me, it raised some interesting questions that, to try and make this answerable, I'll label Q1, Q2, etc. I've looked around, but I didn't find answers to the questions I'm asking, especially those about sensitive data.

I found useful stuff in that post, but it seems that the general consensus was something like if you see a variable with a _ before it, act like an adult and realize you shouldn't be messing with it. The same kind of idea was put forward for variables preceded by __. There, I got the general idea that you trust people not to use tricks like those described here and (in more detail) here. I also found some good information at this SO post.

This is all very good advice when you're talking about good coding practices.

I posted some thoughts in comments to the posts I've shared. My main question was posted as a comment.

I'm surprised there hasn't been more discussion of those who want to introduce malicious code. This is a real question: Is there no way in Python to prevent a black-hat hacker from accessing your variables and methods and inserting code/data that could deny service or reveal personal (or proprietary company) information (Q1)? If Python doesn't allow this type of security, should it ever be used for sensitive data (Q2)?

Am I totally missing something: could a malicious coder even access variables and methods to insert code/data that could deny service or reveal sensitive data (Q3)?

I imagine I could be misunderstanding a concept, missing something, putting a problem in a place where it doesn't belong, or just being completely ignorant on what computer security is. However, I want to understand what's going on here. If I'm totally off the mark, I want an answer that tells me so, but I would also like to know how I'm totally off the mark and how to get back on it.

Another part of the question I'm asking here is from another comment I made on those posts/answers. @SLott said (somewhat paraphrased)

... I've found that private and protected are very, very important design concepts. But as a practical matter, in tens of thousands of lines of Java and Python, I've never actually used private or protected. ... Here's my question "protected [or private] from whom?"

To try and find out whether my concerns are anything to be concerned about, I commented on that post. Here it is, edited.

Q: "protected from whom?" A: "From malicious, black-hat hackers who would want to access variables and functions so as to be able to deny service, to access sensitive info, ..." It seems the A._no_touch = 5 approach would cause such a malicious coder to laugh at my "please don't touch this". My A.__get_SSN(self) seems to be just wishful hoping that B.H. (Black Hat) doesn't know the x = A(); x._A__get_SSN() trick (trick by @Zorf).

I could be putting the problem in the wrong place, and if so, I'd like someone to tell me so, but also to explain why. Are there ways of being secure with a class-based approach (Q4)? What other non-class-and-variable solutions are there for handling sensitive data in Python (Q5)?

Here's some code that shows why the answers to these questions make me wonder whether Python should ever be used for sensitive data (Q2). It's not complete code (why would I put these private values and methods down without using them anywhere?), but I hope it shows the type of thing I'm trying to ask about. I typed and ran all this at the Python interactive console.

## Type this into the interpreter to define the class.
class A():
  def __init__(self):
    self.name = "Nice guy."
    self.just_a_4 = 4
    self.my_number = 4
    self._this_needs_to_be_pi = 3.14
    self.__SSN = "I hope you do not hack this..."
    self.__bank_acct_num = 123
  def get_info(self):
    print("Name, SSN, bank account.")
  def change_my_number(self, another_num):
    self.my_number = another_num
  def _get_more_info(self):
    print("Address, health problems.")
  def send_private_info(self):
    print(self.name, self.__SSN, self.__bank_acct_num)
  def __give_20_bucks_to(self, ssn):
    self.__SSN += " has $20"
  def say_my_name(self):
    print("my name")
  def say_my_real_name(self):
    print(self.name)
  def __say_my_bank(self):
    print(str(self.__bank_acct_num))
>>> my_a = A()
>>> my_a._this_needs_to_be_pi
3.14
>>> my_a._this_needs_to_be_pi=4 # I just ignored begins-with-`_` 'rule'.
>>> my_a._this_needs_to_be_pi
4

## This next method could actually be setting up some kind of secure connection,  
## I guess, which could send the private data. I just print it, here.
>>> my_a.send_private_info()
Nice guy. I hope you do not hack this... 123

## Easy access and change a "private" variable
>>> my_a.__SSN
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute '__SSN'
>>> my_a.__dict__
{'name': 'Nice guy.', 'just_a_4': 4, 'my_number': 4, '_this_needs_to_be_pi': 4, 
'_A__SSN': 'I hope you do not hack this...', '_A__bank_acct_num': 123}
>>> my_a._A__SSN
'I hope you do not hack this...'

# (maybe) potentially more dangerous
>>> def give_me_your_money(self, bank_num):
      print("I don't know how to inject code, but I can")
      print("access your bank account number:")
      print(self._A__bank_acct_num)
      print("and use my bank account number:")
      print(bank_num)
>>> give_me_your_money(my_a,345)
I don't know how to inject code, but I can
access your bank account number:
123
and use my bank account number:
345

At this point, I re-entered the class definition, which probably wasn't necessary.

>>> this_a = A()
>>> this_a.__give_20_bucks_to('unnecessary param')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute '__give_20_bucks_to'
>>> this_a._A__give_20_bucks_to('unnecessary param')
>>> this_a._A__SSN
'I hope you do not hack this... has $20'

## Adding a fake "private" variable, `this_a.__SSN`
>>> this_a.__SSN = "B.H.'s SSN"
>>> this_a.__dict__
{'name': 'Nice guy.', 'just_a_4': 4, 'my_number': 4, '_this_needs_to_be_pi': 3.14, 
'_A__SSN': 'I hope you do not hack this... has $20', '_A__bank_acct_num': 123, 
'__SSN': "B.H.'s SSN"}
>>> this_a.__SSN
"B.H.'s SSN"

## Now, changing the real one and "sending/stealing the money"
>>> this_a._A__SSN = "B.H.'s SSN"
>>> this_a._A__give_20_bucks_to('unnecessary param')
>>> this_a._A__SSN
"B.H.'s SSN has $20"

I've actually done some work at a previous contracting job with sensitive data - not SSNs and bank account numbers, but things like people's ages, addresses, phone numbers, personal history, marital and other relationship history, criminal records, etc. I wasn't involved in the programming to secure this data; I helped with trying to extract useful information by helping to ground-truth the data as preparation for machine learning. We had permission and legal go-aheads to work with such data. Another main question is this: How, in Python, could one collect, manage, analyze, and draw useful conclusions from this sensitive data (Q6)? From what I've discussed here, it doesn't seem that classes (or any of the other data structures, which I didn't go into here, but which seem to have the same problems) would allow this to be done securely (privately or in a protected manner). I imagine that a class-based solution probably has something to do with compilation. Is this true (Q7)?

Finally, since it wasn't security, but code reliability that brought me here, I'll post another post I found and comment I made to complete my questions.

@Marcin posted,

[In response to the OP's words,] "The problem is simple. I want private variables to be accessed and changed only inside the class." [Marcin responded] So, don't write code outside the class that accesses variables starting with __. Use pylint or the like to catch style mistakes like that.

My goal with the following reply comment was to see whether my thoughts represent actual coding concerns. I hope it didn't come across as rude.

It seems this answer would be nice if you wrote code only for your own personal enjoyment and never had to hand it on to someone else to maintain it. Any time you're in a collaborative coding environment (any post-secondary education and/or work experience), the code will be used by many. Someone down the line will want to use an easy way to change your __you_really_should_not_touch_this variable. They may have a good reason for doing so, but it's possible you set up your code such that their "easy way" is going to break things.

Is mine a valid point, or do most coders respect the double underscore (Q8)? Is there a better way, using Python, to protect the integrity of the code - better than the __ strategy (Q9)?

  • 2
    To protect the integrity of the code don't let everyone change it on your production machine. Commented Aug 20, 2019 at 7:10
  • 7
    You are mixing up several levels of security. Once someone can touch your code, you are insecure whether or not your language enforces private attributes. If someone you don't trust maintains your code, you can't trust your code any more, whatever your programming language does. Private access is only here to protect from other parts of your codebase, mostly to enforce correct patterns. Unfortunately it includes libraries - the only attack vector that this leaves open. If you vet your libraries, this is not an issue. If you are using libraries whose integrity you don't trust, it's a problem. Commented Aug 20, 2019 at 7:17
  • 1
    I can only speak for me (and the companies I'm working for): If anyone changes the code he/she is a developer and has the right to that. What would I achieve with forced protection? If I can access the codebase I can just change private to public attributes. Commented Aug 20, 2019 at 7:18
  • 1
    "Protected from whom" is indeed the correct question. You are already protected against a random black hat hacker who cannot access your machine due to physical security, firewalls, proper PAM setup, good passwords, any exposed services being coded against good security practices, and all that crud. If the black hat hacker accesses your machine, it's not because Python has no private variables, it's because someone at your company screwed up on physical or logical security, or they managed to hire someone malicious. And then not even Java or C++ with strong access checks will help you. Commented Aug 20, 2019 at 7:21
  • 2
    You have to think how code can be injected. The main ways are buffer overflows (on binary code that is not properly protected against it - not applicable to Python), the infamous eval function against unchecked user input, writing user input to unchecked user-specified file path, inserting unchecked user input into raw SQL queries. Not using eval, and being careful about user input is all you need. (eval has valid uses, but chances are you'll almost never run into one. Whenever tempted to use eval think if it can be done some other way, and almost certainly you'll find it can.) Commented Aug 20, 2019 at 7:36

1 Answer

private and protected do not exist for security. They exist to enforce contracts within your code, namely logical encapsulation. If you mark a piece as protected or private, it means that it is a logical implementation detail of the implementing class, and no other code should touch it directly, since other code may not [be able to] use it correctly and may mess up state.

E.g., if your logical rule is that whenever you change self._a you must also update self._b with a certain value, then you don't want external code to modify those variables, as your internal state may get messed up if the external code does not follow this rule. You want only your one class to handle this internally since that localises the potential points of failure.
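A minimal sketch of that rule, assuming a made-up class `Pair` whose invariant is that `_b` is always twice `_a` (both the name and the rule are hypothetical, chosen only for illustration). A property keeps the two attributes in sync, so outside code never needs to touch them directly:

```python
class Pair:
    """Hypothetical class: the invariant is that _b always equals 2 * _a."""

    def __init__(self, a):
        self._a = a
        self._b = 2 * a

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, value):
        # Both attributes are updated together, so the invariant holds.
        self._a = value
        self._b = 2 * value

    @property
    def b(self):
        # Read-only: no setter is defined for b.
        return self._b

    def is_consistent(self):
        return self._b == 2 * self._a


pair = Pair(1)
pair.a = 5                      # goes through the setter
ok_after_setter = pair.is_consistent()

pair._a = 100                   # bypasses the setter; nothing stops this
ok_after_bypass = pair.is_consistent()
```

Here `ok_after_setter` is true and `ok_after_bypass` is false. The underscore does not prevent the second assignment; it only signals that code doing this is responsible for the state it breaks.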

In the end all this gets compiled into a big ball of bytes anyway, and all the data is stored in memory at runtime. At that point there is no protection of individual memory offsets within the application's scope; it's all just byte soup. protected and private are constraints the programmer imposes on their own code to keep their own logic straight. For this purpose, more or less informal conventions like _ are perfectly adequate.

An attacker cannot attack at the level of individual properties. The running software is a black box to them; whatever goes on internally doesn't matter. If an attacker is in a position to actually access individual memory offsets, or actually inject code, then it's pretty much game over either way. protected and private don't matter at that point.

2 Comments

That perfectly covers all the things I've been learning from people's comments. I'm a physicist who learned to code for research, enjoyed it, and have gone on to become a developer and researcher. As a result, I run into basic concepts like these that weren't part of my education. I'm not sure where I got the idea that private, protected, and underscores were part of computer security (maybe from the word "protected"), but your answer very concisely fills in the gaps in my knowledge. Thanks.
I think many a programmer has had the same misunderstanding; languages which do sport actual protected keywords also usually throw errors if you do try to access properties marked as such. This may reinforce the notion that you "can't" access them. (Actually, you always can, you just have to be really deliberate about it.) In this sense I do like Python's approach; it doesn't even pretend that these attributes aren't accessible, it just wants you to be an adult about accessing them.
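As a small illustration of what the double underscore does in Python (the classes `Base` and `Child` are hypothetical), name mangling keeps a subclass from accidentally overwriting an attribute of the same name in its parent. It does not hide anything:

```python
class Base:
    def __init__(self):
        self.__token = "base"      # stored as _Base__token

    def base_token(self):
        return self.__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"     # stored as _Child__token, so no clash

    def child_token(self):
        return self.__token


c = Child()
names = sorted(vars(c))            # both mangled names are plainly visible
deliberate = c._Base__token        # access still works if you spell it out
```

Both attributes coexist on the instance, and either can be read from outside by anyone willing to write the mangled name.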
