Data Structures
Data structures are memory constructs used to store and organize data in an efficient manner. Adopting the right data structure, so that the needed information can be accessed efficiently, is fundamental to building usable and scalable products.
Big-O Notation and Asymptotic Analysis
To evaluate the efficiency of a data structure we need to estimate the time and memory required to execute its operations. Since the run-time depends on the input size, we focus on the performance of the data structure when the inputs are very large. Asymptotic notation is the mathematical tool used to perform this analysis; specifically, we will focus on the Big-O notation, which describes the behaviour of an algorithm in the worst-case scenario. Thus, it indicates the complexity of an algorithm for inputs of size \(N\) as \(N \to \infty\). In this context constant factors are ignored, as they are dominated by \(N\).
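For example, an algorithm that takes \(3N^2 + 5N + 20\) steps is simply \(O(N^2)\): as \(N \to \infty\) the quadratic term dominates, so the lower-order terms and the constant factor 3 are dropped.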
Notation | Name | Example |
---|---|---|
\(O(1)\) | constant | Determining if a binary number is even or odd; Using a constant-size lookup table |
\(O(\log N)\) | logarithmic | Finding an item in a sorted array with a binary search or a balanced search tree as well as all operations in a binomial heap. |
\(O(N)\) | linear | Finding an item in an unsorted list or in an unsorted array; adding two n-bit integers by ripple carry. |
\(O(N\log N)\) | linearithmic | Performing a Fast Fourier Transform or Merge sort. |
\(O(N^{2})\) | quadratic | Simple sorting algorithms such as Bubble Sort or Selection Sort. |
\(O(N!)\) | factorial | Solving the Travelling Salesman problem via brute-force. |
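To make the logarithmic row concrete, below is a minimal binary search sketch: each comparison halves the remaining range, so a sorted array of \(N\) items is searched in at most about \(\log_2 N\) steps.

def binary_search(sorted_array, target):
    lo, hi = 0, len(sorted_array) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_array[mid] == target:
            return mid          # found: return its index
        elif sorted_array[mid] < target:
            lo = mid + 1        # discard the left half
        else:
            hi = mid - 1        # discard the right half
    return -1                   # not found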
Data Structures
Array
Arrays are collections of items stored at contiguous memory locations. This property makes arrays easy to traverse and generally provides random access to any element in constant time.
Generally speaking, arrays have a fixed size and new elements can't be added once the array is full. However, it is possible to implement dynamic arrays at the expense of a memory overhead (unused memory is reserved for items that will be added later on). Dynamic arrays achieve constant time complexity for append and delete operations in the general case, but if a resize is needed, a new copy of the current array has to be created, which requires linear time and memory. The dynamic structure is obtained by creating a new array of double the size of the original and copying every element from the previous array to the new one.
Name | Description | Complexity |
---|---|---|
Append | Add an element to the end of the array | Time and Space: \(O(1)\) (amortized) |
Insert | Insert an element at the i-th position of the array | Time: \(O(N)\) and Space: \(O(1)\) |
Remove | Remove the i-th element of the array | Time: \(O(N)\) and Space: \(O(1)\) |
Remove Last | Remove the last element of the array | Time and Space: \(O(1)\) (amortized) |
Search | Check if an element is present in the array | Time: \(O(N)\) and Space: \(O(1)\) |
Get | Get the i-th element of the array | Time: \(O(1)\) and Space: \(O(1)\) |
Sort | Sort the elements of the array | Time: \(O(N \log N)\) and Space: \(O(N)\) |
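As a minimal sketch of the doubling strategy described above (the class name DynamicArray is ours for illustration; Python's built-in list already works this way):

class DynamicArray:
    def __init__(self):
        self.capacity = 1                    # size of the reserved buffer
        self.length = 0                      # number of elements actually stored
        self.buffer = [None] * self.capacity

    def append(self, item):
        # When the buffer is full, allocate one twice as large and
        # copy every element over: O(N) for this particular call,
        # but O(1) amortized over a sequence of appends.
        if self.length == self.capacity:
            self.capacity *= 2
            new_buffer = [None] * self.capacity
            for i in range(self.length):
                new_buffer[i] = self.buffer[i]
            self.buffer = new_buffer
        self.buffer[self.length] = item
        self.length += 1

    def get(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.buffer[i]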
Hash Tables
Hash tables are one of the most important data structures built upon arrays. By organising data in (key, value) pairs, they allow for fast insertion, lookup, and access to data. A hash table is composed of an array, and the position of each key in this array is determined by the function:
\[ idx = hash(key) \bmod size(hash\_table) \]
Python provides a native implementation of hash tables in the dict class.
Name | Description | Complexity |
---|---|---|
Insert | Add an element to the dictionary | Time and Space: \(O(1)\) (amortized) |
Remove | Remove a key from the dictionary | Time: \(O(1)\) and Space: \(O(1)\) |
Search | Check if a key is present in the dictionary | Time: \(O(1)\) and Space: \(O(1)\) |
Get | Get the value of a given key in the dictionary | Time: \(O(1)\) and Space: \(O(1)\) |
Iterate | Iterate over all elements of the dictionary | Time: \(O(N)\) and Space: \(O(1)\) |
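For illustration, here is a minimal hash-table sketch that resolves collisions by chaining: each bucket holds the (key, value) pairs whose keys map to the same index. (Note that CPython's dict actually uses open addressing rather than chaining; this is only a sketch of the indexing formula above.)

class HashTable:
    def __init__(self, size=16):
        self.size = size
        # one bucket per index; each bucket is a list of
        # (key, value) pairs that hash to that index
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # idx = hash(key) % size(hash_table)
        return hash(key) % self.size

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)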
Linked List
A linked list is a linear data structure made of a series of connected nodes. Usually every node is composed of a data field that contains some value and a pointer to the next element (if there is one). While arrays are contiguous in memory, linked lists allow for dynamic memory management, where nodes can be scattered across memory and simply point to each other. Linked lists are the fundamental backbone of other data structures such as stacks and queues.
Name | Description | Complexity |
---|---|---|
Insert | Add an element to the end of the list | Time and Space: \(O(1)\) |
Remove | Remove the element at the front of the list | Time and Space: \(O(1)\) |
Search | Search an element in the list | Time: \(O(N)\) Space: \(O(1)\) |
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    # Initialize an empty list, keeping
    # track of both the head and the tail.
    def __init__(self):
        self.head = None
        self.tail = None

    # Check if the list is empty
    def isEmpty(self):
        return self.head is None

    # Insert at the end
    def insert(self, value):
        node = Node(value)
        if self.isEmpty():
            self.head = node
            self.tail = node
        else:
            self.tail.next = node
            self.tail = self.tail.next

    # Remove from the beginning
    def remove(self):
        if self.isEmpty():
            raise Exception("Remove from an empty list")
        node = self.head
        if self.head == self.tail:
            # single-element list: the tail becomes None too
            self.tail = self.tail.next
        self.head = self.head.next
        return node

    # Search for the node with a given value
    def search(self, value):
        node = self.head
        while node is not None:
            if node.value == value:
                break
            node = node.next
        return node
Floyd’s Cycle Finding Algorithm
One of the most famous algorithms for linked lists is Floyd's cycle-finding algorithm, which is used to detect a loop in a linked list. It uses two pointers, one moving twice as fast as the other: the faster one is called the fast pointer and the other the slow pointer. While traversing the linked list, one of two things will occur:
- the fast pointer reaches the end (None), which shows that there is no loop in the linked list;
- the fast pointer catches up with the slow pointer at some point, therefore a loop exists in the linked list.
def detectLoop(llist):
    slow_pointer = llist.head
    fast_pointer = llist.head

    while (slow_pointer is not None
           and fast_pointer is not None
           and fast_pointer.next is not None):
        slow_pointer = slow_pointer.next
        fast_pointer = fast_pointer.next.next
        if slow_pointer == fast_pointer:
            return 1
    return 0
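A quick usage sketch, reusing the LinkedList class above and manually closing a cycle for the test:

llist = LinkedList()
for value in [1, 2, 3, 4]:
    llist.insert(value)
print(detectLoop(llist))       # 0: no loop

llist.tail.next = llist.head   # manually create a cycle
print(detectLoop(llist))       # 1: loop detected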
Stack
The stack is a special kind of linked list that follows the LIFO (last in, first out) principle. Intuitively, it is like a deck of cards where the top card of the deck (the last element added) is the first card to be picked (the first element to be removed).
Name | Description | Complexity |
---|---|---|
Push | Add an element to the top of a stack | Time and Space: \(O(1)\) |
Pop | Remove an element from the top of a stack | Time and Space: \(O(1)\) |
Peek | Get the value of the top element without removing it | Time and Space: \(O(1)\) |
IsEmpty | Check if the stack is empty | Time and Space: \(O(1)\) |
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None

class Stack:
    # Initialize an empty stack.
    def __init__(self):
        self.head = None
        self.size = 0

    # Get the current size of the stack
    def getSize(self):
        return self.size

    # Check if the stack is empty
    def isEmpty(self):
        return self.size == 0

    # Get the top item of the stack
    def peek(self):
        # Sanity check to see if we
        # are peeking an empty stack.
        if self.isEmpty():
            raise Exception("Peeking from an empty stack")
        return self.head.value

    # Push a value onto the stack.
    def push(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
        else:
            node.prev = self.head
            self.head.next = node
            self.head = node
        self.size += 1

    # Remove a value from the stack and return it.
    def pop(self):
        if self.isEmpty():
            raise Exception("Popping from an empty stack")
        remove = self.head
        self.head = self.head.prev
        if self.head is not None:
            # drop the dangling forward pointer
            self.head.next = None
        self.size -= 1
        return remove.value
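A short usage sketch of the class above:

stack = Stack()
stack.push(1)
stack.push(2)
stack.push(3)
print(stack.peek())     # 3: the last element pushed
print(stack.pop())      # 3
print(stack.pop())      # 2
print(stack.getSize())  # 1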
Queue
Queues are an implementation of linked lists that follows the FIFO (first in, first out) principle, similar to a ticket queue outside a cinema hall, where the first person to enter the queue is the first person to get a ticket.
Name | Description | Complexity |
---|---|---|
Enqueue | Add an element to the end of the queue | Time and Space: \(O(1)\) |
Dequeue | Remove an element from the front of the queue | Time and Space: \(O(1)\) |
Peek | Get the value of the front of the queue without removing it | Time and Space: \(O(1)\) |
IsEmpty | Check if the queue is empty | Time and Space: \(O(1)\) |
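No implementation is given above, so here is a minimal sketch built on the same linked-node approach used for the linked list: enqueue at the tail and dequeue at the head, so both run in \(O(1)\).

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class Queue:
    # Track both ends: enqueue at the tail,
    # dequeue at the head.
    def __init__(self):
        self.head = None
        self.tail = None

    def isEmpty(self):
        return self.head is None

    # Add an element to the end of the queue
    def enqueue(self, value):
        node = Node(value)
        if self.isEmpty():
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    # Remove and return the element at the front of the queue
    def dequeue(self):
        if self.isEmpty():
            raise Exception("Dequeue from an empty queue")
        node = self.head
        self.head = self.head.next
        if self.head is None:
            self.tail = None   # the queue is now empty
        return node.value

    # Get the front value without removing it
    def peek(self):
        if self.isEmpty():
            raise Exception("Peek from an empty queue")
        return self.head.value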
Binary Trees
A binary tree is a tree data structure in which each parent node can have at most two children. Each node of a binary tree consists of three items:
- the value of the node,
- the address of the left child,
- the address of the right child.
Name | Description | Complexity |
---|---|---|
Construct | Construct a binary tree | Time and Space: \(O(N)\) |
Traverse | Traverse a binary tree | Time: \(O(N)\) and Space: \(O(height)\) |
Binary trees are generally represented by linked node structures.
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    # Visit the node, then its left and right subtrees
    def pre_order_traverse(self):
        print(self.value)
        if self.left:
            self.left.pre_order_traverse()
        if self.right:
            self.right.pre_order_traverse()

    # Visit the left subtree, then the node, then the right subtree
    def in_order_traverse(self):
        if self.left:
            self.left.in_order_traverse()
        print(self.value)
        if self.right:
            self.right.in_order_traverse()

    # Visit the left and right subtrees, then the node
    def post_order_traverse(self):
        if self.left:
            self.left.post_order_traverse()
        if self.right:
            self.right.post_order_traverse()
        print(self.value)

# Build a tree from its level-order (array) representation
def build(array):
    n = len(array)

    def add_child(idx):
        if idx < n:
            node = Node(array[idx])
            node.left = add_child(idx * 2 + 1)
            node.right = add_child(idx * 2 + 2)
            return node

    root = add_child(0)
    return root
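For example, building the tree for [1, 2, 3, 4, 5, 6] places 1 at the root, 2 and 3 as its children, and so on level by level:

root = build([1, 2, 3, 4, 5, 6])
root.in_order_traverse()   # prints 4 2 5 1 6 3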
However, binary trees can also be represented using arrays, in which:
- the root node is stored at index 0;
- the left child is stored at index \((i \cdot 2) + 1\), where \(i\) is the index of the parent;
- the right child is stored at index \((i \cdot 2) + 2\), where \(i\) is the index of the parent.
class Tree:
    def __init__(self, array):
        self.array = array

    # Return the left child of the node at parent_idx, if any
    def left(self, parent_idx):
        if (parent_idx * 2) + 1 < len(self.array):
            return self.array[(parent_idx * 2) + 1]

    # Return the right child of the node at parent_idx, if any
    def right(self, parent_idx):
        if (parent_idx * 2) + 2 < len(self.array):
            return self.array[(parent_idx * 2) + 2]

    def set_left(self, val, parent_idx):
        self.array[(parent_idx * 2) + 1] = val

    def set_right(self, val, parent_idx):
        self.array[(parent_idx * 2) + 2] = val

    # In-order traversal over the array representation
    def in_order(self, parent_idx=0):
        if self.left(parent_idx):
            self.in_order((parent_idx * 2) + 1)
        print(self.array[parent_idx])
        if self.right(parent_idx):
            self.in_order((parent_idx * 2) + 2)
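The same example in the array representation prints the identical in-order sequence:

tree = Tree([1, 2, 3, 4, 5, 6])
tree.in_order()   # prints 4 2 5 1 6 3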
Binary Search Tree
A binary search tree (BST) is a data structure that allows us to maintain a sorted list of numbers and search through it quickly.
The properties that separate a binary search tree from a regular binary tree are:
- all nodes of the left subtree are less than the root node;
- all nodes of the right subtree are greater than the root node;
- both subtrees of each node are also BSTs, i.e. they satisfy the two properties above.
Searching is extremely efficient, as it can be done in \(O(\log N)\) time (on a balanced tree) and constant space. Intuitively, searching is so efficient because we only need to examine one of the two subtrees, depending on the relation between the current node's value and the value we are looking for. Thus, at every step we halve the search space.
Name | Description | Complexity |
---|---|---|
Insertion | Insert a node into the tree | Time: \(O(\log N)\) (\(O(N)\) worst case) and Space: \(O(height)\) |
Search | Search for an element in the tree | Time: \(O(\log N)\) (\(O(N)\) worst case) and Space: \(O(height)\) |
Deletion | Remove an element from the tree | Time: \(O(\log N)\) (\(O(N)\) worst case) and Space: \(O(height)\) |
class Node:
    def __init__(self, value):
        self.value = value
        self.left, self.right = None, None

    def search(self, value):
        if self.value == value:
            return True
        elif value < self.value and self.left is not None:
            return self.left.search(value)
        elif value > self.value and self.right is not None:
            return self.right.search(value)
        else:
            return False

    @staticmethod
    def insert(node, value):
        if node is None:
            return Node(value)
        if value < node.value:
            node.left = Node.insert(node.left, value)
        else:
            node.right = Node.insert(node.right, value)
        return node

    @staticmethod
    def remove(node, value):
        if node is None:
            return None
        if value < node.value:
            node.left = Node.remove(node.left, value)
        elif value > node.value:
            node.right = Node.remove(node.right, value)
        else:
            # case 1: it is a leaf node
            if node.left is None and node.right is None:
                return None
            # case 2: there is only one child,
            # so the child replaces the node
            elif node.right is None:
                return node.left
            elif node.left is None:
                return node.right
            # case 3: two children; replace the value with the
            # in-order successor (the minimum of the right
            # subtree) and unlink the successor node
            else:
                # get the new min value from the right subtree
                temp = node.right
                prev = node
                while temp.left is not None:
                    prev, temp = temp, temp.left
                node.value = temp.value
                if prev != node:
                    prev.left = temp.right
                else:
                    node.right = temp.right
        return node
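A short usage sketch of the class above:

root = None
for value in [8, 3, 10, 1, 6, 14]:
    root = Node.insert(root, value)

print(root.search(6))    # True
print(root.search(7))    # False

root = Node.remove(root, 8)   # delete the root: its in-order
print(root.value)             # successor, 10, takes its place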