# Variables and assigning values

In :
```python
# Give variable "x" a value of 10
x = 10
x
```
Out:
`10`
In :
```python
# Give variable "a" a value of "GATTACA"
# Single quotes and double quotes both create a string
a = 'GATTACA'
a = "GATTACA"
a
```
Out:
`'GATTACA'`
In :
```python
# Give i a value of 100
i = 100
print('Value of i:', i)

# Give j a value of 5
j = 5
print('Value of j:', j)
```
```
Value of i: 100
Value of j: 5
```
In :
```python
# Give k the product of i and j
k = i * j
print('Value of k:', k)
```
```
Value of k: 500
```
In :
```python
# k is the product of i and j,
# if the value of i or j changes, does k also change?
print('Value of i, j, k:', i, j, k)

# Change i to 200
i = 200

print('Value of i, j, k:', i, j, k)
```
```
Value of i, j, k: 100 5 500
Value of i, j, k: 200 5 500
```
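Assignment stores the value that exists at that moment, so `k` keeps 500 after `i` changes until it is assigned again. A minimal sketch of two ways to get an up-to-date product: recomputing `k`, or wrapping the calculation in a function. The function name `product` is only illustrative.

```python
i = 100
j = 5
k = i * j              # k is computed once, from the current i and j

i = 200                # changing i afterwards does not update k
print('k after changing i:', k)         # k after changing i: 500

k = i * j              # recompute to pick up the new value of i
print('k after recomputing:', k)        # k after recomputing: 1000


def product(x, y):
    """Return x * y for the values passed in at call time."""
    return x * y


print('product(i, j):', product(i, j))  # product(i, j): 1000
```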

# String and list operations

In :
```python
# Variable s has a value "GATTACA"
s = 'GATTACA'
```
In :
```python
# Print only the first character (positions start at 0)
print(s[0])

# Assign the first character to a variable
s0 = s[0]
print('Value of s0:', s0)
```
```
G
Value of s0: G
```
In :
```python
# Assign the last character to a variable
sn1 = s[-1]
print('Value of sn1:', sn1)
```
```
Value of sn1: A
```
In :
```python
# Print the middle three characters
# s[2:5] covers positions 2, 3 and 4; the end position 5 is not included
s_mid = s[2:5]  # GA[TTA]CA
print('Value of s_mid:', s_mid)
```
```
Value of s_mid: TTA
```
In :
```python
start = 2
end = 5
s_mid_v = s[start:end]  # GA[TTA]CA
print('Value of s_mid_v:', s_mid_v)
```
```
Value of s_mid_v: TTA
```
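Slices can also count from the end with negative positions, leave out the start or end, and take a step. A few variations on the same string, shown as a short sketch:

```python
s = 'GATTACA'

print(len(s))     # 7
print(s[-3:])     # ACA  (the last three characters)
print(s[:3])      # GAT  (an omitted start means position 0)
print(s[::2])     # GTAA (every second character)
print(s[::-1])    # ACATTAG (a step of -1 walks backwards)
```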
In :
```python
# Split a string into a list of characters
l = list('GATTACA')
l
```
Out:
`['G', 'A', 'T', 'T', 'A', 'C', 'A']`
In :
```python
# This doesn't split into characters:
# split() with no argument splits on whitespace, and GATTACA has none
'GATTACA'.split()
```
Out:
`['GATTACA']`
In :
```python
# This also doesn't work
'GATTACA'.split('')  # not allowed
```
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-3829490077ac> in <module>()
      1 # This also doesn't work
----> 2 'GATTACA'.split('')  # not allowed

ValueError: empty separator
```
In :
```python
# This will work
'G A T T A C A'.split()  # split by space
```
Out:
`['G', 'A', 'T', 'T', 'A', 'C', 'A']`
In :
```python
# Split a string in variable s into a list of characters
l = list(s)
l
```
Out:
`['G', 'A', 'T', 'T', 'A', 'C', 'A']`
In :
```python
# Change the 3rd character in the list to C
l[2] = 'C'  # positions count 0, 1, 2 -> 2 is the 3rd character's position
l
```
Out:
`['G', 'A', 'C', 'T', 'A', 'C', 'A']`
In :
```python
# Change it back to T
l[2] = 'T'
l
```
Out:
`['G', 'A', 'T', 'T', 'A', 'C', 'A']`
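These cells change the list `l` instead of the string `s` because Python strings are immutable: assigning to a position in a string raises a `TypeError`. A short sketch of the error and of the list round trip that works around it:

```python
s = 'GATTACA'

try:
    s[2] = 'C'                 # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

l = list(s)                    # lists are mutable, so convert first
l[2] = 'C'
s_new = ''.join(l)             # then join the list back into a new string
print(s, '->', s_new)          # GATTACA -> GACTACA (s itself is unchanged)
```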
In :
```python
# Get the first codon from the list
codon = l[0:3]
codon
```
Out:
`['G', 'A', 'T']`
In :
```python
# Get the first codon from the string
codon = s[0:3]
codon
```
Out:
`'GAT'`
In :
```python
# Make the list l into a string again
''.join(l)
```
Out:
`'GATTACA'`
In :
```''.join(l)
```
Out:
`'GATTACA'`
In :
```python
# This doesn't work: str() gives the printed form of the list,
# including the brackets, quotes and commas
str(l)
```
Out:
`"['G', 'A', 'T', 'T', 'A', 'C', 'A']"`
In :
```python
# Split the sequence GATTACAAA into codons of 3 characters
s = 'GATTACAAA'
codon0 = s[0:3]
codon1 = s[3:6]
codon2 = s[6:9]

print('Value of s:', s)
print('Values of codons 0,1,2:', codon0, codon1, codon2)
```
```
Value of s: GATTACAAA
Values of codons 0,1,2: GAT TAC AAA
```
In :
```python
# Split the sequence GATTACAAA into codons of 3 characters
# using variables to specify positions
s = 'GATTACAAA'
start = 0
end = start + 3
codon0 = s[start:end]  # s[0:3]

start = start + 3
end = start + 3
codon1 = s[start:end]  # s[3:6]

start = start + 3
end = start + 3
codon2 = s[start:end]  # s[6:9]

print('Value of s:', s)
print('Values of codons 0,1,2:', codon0, codon1, codon2)
```
```
Value of s: GATTACAAA
Values of codons 0,1,2: GAT TAC AAA
```

# Loops

In :
```python
# Let's use a LOOP to make this easier
s = 'GATTACAAA'
for start in range(0, len(s), 3):
    print(s[start:start+3])
```
```
GAT
TAC
AAA
```
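If the sequence length is not a multiple of 3, the last slice in this loop is shorter than a codon, because slicing past the end of a string stops quietly at the end. A sketch, assuming you only want complete codons, that shortens the range to skip the leftover bases:

```python
s = 'GATTACAA'   # 8 characters: two full codons plus a leftover 'AA'

# Slicing past the end of a string does not raise an error;
# it returns whatever characters are left
for start in range(0, len(s), 3):
    print(s[start:start+3])            # GAT, TAC, then AA

# Keep only full codons by ending the range at the last complete codon
full_len = len(s) - len(s) % 3         # 8 - 2 = 6
codons = []
for start in range(0, full_len, 3):
    codons.append(s[start:start+3])

print(codons)                           # ['GAT', 'TAC']
```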
In :
```python
# Store the result of each loop in a list
s = 'GATTACAAA'

codon_list = []  # create an empty list

for start in range(0, len(s), 3):
    codon_list.append(s[start:start+3])  # add the result to the end of the list

    # Let's look at what happens to the list
    print('Value of codon_list:', codon_list)
```
```
Value of codon_list: ['GAT']
Value of codon_list: ['GAT', 'TAC']
Value of codon_list: ['GAT', 'TAC', 'AAA']
```
In :
```python
# You can also use a variable first, before appending to the list
# Store the result of each loop in a list
s = 'GATTACAAA'

codon_list = []  # create an empty list

for start in range(0, len(s), 3):
    codon = s[start:start+3]  # codon will hold the result at first

    codon_list.append(codon)  # the result is added to the end of the
                              # list using the codon variable

    # Let's look at what happens to the list
    print('Value of codon_list:', codon_list)
```
```
Value of codon_list: ['GAT']
Value of codon_list: ['GAT', 'TAC']
Value of codon_list: ['GAT', 'TAC', 'AAA']
```
In :
```python
# Convert a string into a list manually using a loop
s = 'GATTACAAA'

l = []

for i in range(len(s)):
    l.append(s[i])

    # Let's look at what happens to the list
    print('Value of l:', l)
```
```
Value of l: ['G']
Value of l: ['G', 'A']
Value of l: ['G', 'A', 'T']
Value of l: ['G', 'A', 'T', 'T']
Value of l: ['G', 'A', 'T', 'T', 'A']
Value of l: ['G', 'A', 'T', 'T', 'A', 'C']
Value of l: ['G', 'A', 'T', 'T', 'A', 'C', 'A']
Value of l: ['G', 'A', 'T', 'T', 'A', 'C', 'A', 'A']
Value of l: ['G', 'A', 'T', 'T', 'A', 'C', 'A', 'A', 'A']
```
In :
```python
# Another way to convert a string into a list manually using a loop
s = 'GATTACAAA'

l = []

for char in s:
    l.append(char)

    # Let's look at what happens to the list
    print('Value of l:', l)
```
```
Value of l: ['G']
Value of l: ['G', 'A']
Value of l: ['G', 'A', 'T']
Value of l: ['G', 'A', 'T', 'T']
Value of l: ['G', 'A', 'T', 'T', 'A']
Value of l: ['G', 'A', 'T', 'T', 'A', 'C']
Value of l: ['G', 'A', 'T', 'T', 'A', 'C', 'A']
Value of l: ['G', 'A', 'T', 'T', 'A', 'C', 'A', 'A']
Value of l: ['G', 'A', 'T', 'T', 'A', 'C', 'A', 'A', 'A']
```

# Control structures (if-then-else)

In :
```python
# Compare if two sequences are the same
s1 = 'GATTACAAA'
s2 = 'GATTANAAA'

s1 == s2
```
Out:
`False`
In :
```python
if s1 == s2:
    print('same')
else:
    print('not the same')
```
```
not the same
```
In :
```python
# Compare if two sequences are the same
# Find the position/s and the character/s that are different
s1 = 'GATTACAAA'
s2 = 'GATTANAAA'

for i in range(0, len(s1)):
    if s1[i] != s2[i]:
        print(i, s1[i], s2[i])
```
```
5 C N
```
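This loop assumes `s1` and `s2` have the same length; if `s2` were shorter, `s2[i]` would raise an `IndexError`. A sketch of an alternative that pairs characters with the built-in `zip` (which stops at the end of the shorter string) and also counts the mismatches:

```python
s1 = 'GATTACAAA'
s2 = 'GATTANAAA'

if len(s1) != len(s2):
    print('Warning: lengths differ, only the overlapping part is compared')

num_diff = 0
# zip pairs up characters and stops at the end of the shorter string;
# enumerate adds the position i to each pair
for i, (c1, c2) in enumerate(zip(s1, s2)):
    if c1 != c2:
        num_diff += 1
        print(i, c1, c2)                        # 5 C N

print('Number of differences:', num_diff)       # 1
```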
In :
```python
# Compare if two sequences are the same
# For each position, print the position, then
# print "same" if they are the same,
# print "different" if they are different, and what the characters are

s1 = 'GATTACAAA'
s2 = 'GATTANAAA'

for i in range(0, len(s1)):
    if s1[i] == s2[i]:
        print(i, 'same')
    else:
        print(i, 'different', s1[i], s2[i])
```
```
0 same
1 same
2 same
3 same
4 same
5 different C N
6 same
7 same
8 same
```
In :
```python
# Compare if two sequences are the same
# For each position, print the position, then
# print "same" if they are the same,
# print "different" if they are different, and what the characters are
# print "maybe the same" if one of the characters is an N

s1 = 'GATTACAAA'
s2 = 'GATTANAAA'

for i in range(0, len(s1)):
    # True if s1[i] is 'N' OR s2[i] is 'N' (or both); otherwise False
    if (s1[i] == 'N') or (s2[i] == 'N'):
        print(i, 'maybe the same')
    else:
        if s1[i] == s2[i]:
            print(i, 'same')
        else:
            print(i, 'different', s1[i], s2[i])
```
```
0 same
1 same
2 same
3 same
4 same
5 maybe the same
6 same
7 same
8 same
```
In :
```python
# Count the number of A's in the sequence
s = 'GATTACAAA'

# Is the character an "A"?
print(s[0], s[0] == 'A')
print(s[1], s[1] == 'A')
print(s[2], s[2] == 'A')
print(s[3], s[3] == 'A')
print(s[4], s[4] == 'A')
print(s[5], s[5] == 'A')
print(s[6], s[6] == 'A')
print(s[7], s[7] == 'A')
print(s[8], s[8] == 'A')
```
```
G False
A True
T False
T False
A True
C False
A True
A True
A True
```
In :
```python
# Count the number of A's in the sequence
s = 'GATTACAAA'

num_a = 0

for i in range(len(s)):
    nucleotide = s[i]

    # Test if nucleotide is A
    if nucleotide == 'A':
        num_a = num_a + 1

    # Let's look at what happens
    print('Nucleotide %s at position %i' % (nucleotide, i))
    print('Is it an A?:', nucleotide == 'A')
    print('Number of A so far:', num_a)
    print('\n')

print('Number of A\'s:', num_a)
```
```
Nucleotide G at position 0
Is it an A?: False
Number of A so far: 0

Nucleotide A at position 1
Is it an A?: True
Number of A so far: 1

Nucleotide T at position 2
Is it an A?: False
Number of A so far: 1

Nucleotide T at position 3
Is it an A?: False
Number of A so far: 1

Nucleotide A at position 4
Is it an A?: True
Number of A so far: 2

Nucleotide C at position 5
Is it an A?: False
Number of A so far: 2

Nucleotide A at position 6
Is it an A?: True
Number of A so far: 3

Nucleotide A at position 7
Is it an A?: True
Number of A so far: 4

Nucleotide A at position 8
Is it an A?: True
Number of A so far: 5

Number of A's: 5
```
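The loop above gives the same result as the built-in string method `str.count`, which is a quick way to check a hand-written count:

```python
s = 'GATTACAAA'

num_a = s.count('A')           # counts non-overlapping occurrences of 'A'
print("Number of A's:", num_a)  # Number of A's: 5

# str.count also works for longer substrings, such as a codon
print('Number of AAA:', s.count('AAA'))  # Number of AAA: 1
```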
In :
```python
# Count the number of A's in the sequence
# Shortcut: loop over the characters directly
s = 'GATTACAAA'

num_a = 0

for nucleotide in s:
    # Test if nucleotide is A
    if nucleotide == 'A':
        num_a = num_a + 1

    # Let's look at what happens
    # This loop has no position variable, so only the nucleotide is printed
    # (using i here would show a stale value left over from an earlier cell)
    print('Nucleotide %s' % nucleotide)
    print('Is it an A?:', nucleotide == 'A')
    print('Number of A so far:', num_a)
    print('\n')

print('Number of A\'s:', num_a)
```
```
Nucleotide G
Is it an A?: False
Number of A so far: 0

Nucleotide A
Is it an A?: True
Number of A so far: 1

Nucleotide T
Is it an A?: False
Number of A so far: 1

Nucleotide T
Is it an A?: False
Number of A so far: 1

Nucleotide A
Is it an A?: True
Number of A so far: 2

Nucleotide C
Is it an A?: False
Number of A so far: 2

Nucleotide A
Is it an A?: True
Number of A so far: 3

Nucleotide A
Is it an A?: True
Number of A so far: 4

Nucleotide A
Is it an A?: True
Number of A so far: 5

Number of A's: 5
```
In :
```python
# Count the number of each nucleotide in the sequence
s = 'GATTACAAA'

num_t = 0
num_c = 0
num_a = 0
num_g = 0

for i in range(len(s)):
    nucleotide = s[i]

    if nucleotide == 'T':
        num_t += 1           # this means num_t = num_t + 1
    elif nucleotide == 'C':
        num_c += 1           # this means num_c = num_c + 1
    elif nucleotide == 'A':
        num_a += 1           # this means num_a = num_a + 1
    elif nucleotide == 'G':
        num_g += 1           # this means num_g = num_g + 1

    # Let's look at what happens
    print(i, nucleotide)
    print('Counts of T,C,A,G:', num_t, num_c, num_a, num_g)
    print('\n')

print('Final counts of T,C,A,G:', num_t, num_c, num_a, num_g)
print('Length equal to the sum of counts?', len(s) == (num_t + num_c + num_a + num_g))
```
```
0 G
Counts of T,C,A,G: 0 0 0 1

1 A
Counts of T,C,A,G: 0 0 1 1

2 T
Counts of T,C,A,G: 1 0 1 1

3 T
Counts of T,C,A,G: 2 0 1 1

4 A
Counts of T,C,A,G: 2 0 2 1

5 C
Counts of T,C,A,G: 2 1 2 1

6 A
Counts of T,C,A,G: 2 1 3 1

7 A
Counts of T,C,A,G: 2 1 4 1

8 A
Counts of T,C,A,G: 2 1 5 1

Final counts of T,C,A,G: 2 1 5 1
Length equal to the sum of counts? True
```
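The `if`/`elif` chain above silently skips any character that is not T, C, A or G, such as the ambiguous base N used earlier; only the length check would reveal it. A minimal sketch that adds an `else` branch to count everything else, using a made-up sequence that contains an N:

```python
s = 'GATTANAAA'    # made-up example with one ambiguous base N

num_t = num_c = num_a = num_g = 0
num_other = 0      # anything that is not T, C, A or G

for nucleotide in s:
    if nucleotide == 'T':
        num_t += 1
    elif nucleotide == 'C':
        num_c += 1
    elif nucleotide == 'A':
        num_a += 1
    elif nucleotide == 'G':
        num_g += 1
    else:
        num_other += 1

print('Counts of T,C,A,G:', num_t, num_c, num_a, num_g)   # 2 0 5 1
print('Other characters:', num_other)                      # 1
print('Length equal to the sum of counts?',
      len(s) == num_t + num_c + num_a + num_g + num_other)  # True
```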
In :
```python
# Count the number of each nucleotide in the sequence
# Using a list to store counts
s = 'GATTACAAA'

cnts = [0, 0, 0, 0]  # counts of T, C, A, G

for i in range(len(s)):
    nucleotide = s[i]

    if nucleotide == 'T':
        cnts[0] += 1         # position 0 holds the T count
    elif nucleotide == 'C':
        cnts[1] += 1         # position 1 holds the C count
    elif nucleotide == 'A':
        cnts[2] += 1         # position 2 holds the A count
    elif nucleotide == 'G':
        cnts[3] += 1         # position 3 holds the G count

    # Let's look at what happens
    print(i, nucleotide)
    print('Counts of T,C,A,G:', cnts)
    print('\n')

print('Final counts of T,C,A,G:', cnts)
print('Length equal to the sum of counts?', len(s) == sum(cnts))
```
```
0 G
Counts of T,C,A,G: [0, 0, 0, 1]

1 A
Counts of T,C,A,G: [0, 0, 1, 1]

2 T
Counts of T,C,A,G: [1, 0, 1, 1]

3 T
Counts of T,C,A,G: [2, 0, 1, 1]

4 A
Counts of T,C,A,G: [2, 0, 2, 1]

5 C
Counts of T,C,A,G: [2, 1, 2, 1]

6 A
Counts of T,C,A,G: [2, 1, 3, 1]

7 A
Counts of T,C,A,G: [2, 1, 4, 1]

8 A
Counts of T,C,A,G: [2, 1, 5, 1]

Final counts of T,C,A,G: [2, 1, 5, 1]
Length equal to the sum of counts? True
```
In :
```python
# Count the number of each nucleotide in the sequence
# Using a dictionary to store counts
s = 'GATTACAAA'

# cnts = [0, 0, 0, 0]  # counts of T, C, A, G
cnts = {'T':0, 'C':0, 'A':0, 'G':0}

for i in range(len(s)):
    nucleotide = s[i]

    if nucleotide == 'T':
        cnts['T'] += 1       # the key 'T' holds the T count
    elif nucleotide == 'C':
        cnts['C'] += 1       # the key 'C' holds the C count
    elif nucleotide == 'A':
        cnts['A'] += 1       # the key 'A' holds the A count
    elif nucleotide == 'G':
        cnts['G'] += 1       # the key 'G' holds the G count

    # Let's look at what happens
    print(i, nucleotide)
    print('Counts of T,C,A,G:', cnts)
    print('\n')

print('Final counts of T,C,A,G:', cnts)
print('Length equal to the sum of counts?', len(s) == sum(cnts.values()))
```
```
0 G
Counts of T,C,A,G: {'T': 0, 'C': 0, 'A': 0, 'G': 1}

1 A
Counts of T,C,A,G: {'T': 0, 'C': 0, 'A': 1, 'G': 1}

2 T
Counts of T,C,A,G: {'T': 1, 'C': 0, 'A': 1, 'G': 1}

3 T
Counts of T,C,A,G: {'T': 2, 'C': 0, 'A': 1, 'G': 1}

4 A
Counts of T,C,A,G: {'T': 2, 'C': 0, 'A': 2, 'G': 1}

5 C
Counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 2, 'G': 1}

6 A
Counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 3, 'G': 1}

7 A
Counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 4, 'G': 1}

8 A
Counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 5, 'G': 1}

Final counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 5, 'G': 1}
Length equal to the sum of counts? True
```
In :
```python
# Count the number of each nucleotide in the sequence
# Using a dictionary to store counts
s = 'GATTACAA'

cnts = {'T':0, 'C':0, 'A':0, 'G':0}

for i in range(len(s)):
    nucleotide = s[i]

    cnts[nucleotide] += 1

    # Let's look at what happens
    print(i, nucleotide)
    print('Counts of T,C,A,G:', cnts)
    print('\n')

print('Final counts of T,C,A,G:', cnts)
print('Length equal to the sum of counts?', len(s) == sum(cnts.values()))
```
```
0 G
Counts of T,C,A,G: {'T': 0, 'C': 0, 'A': 0, 'G': 1}

1 A
Counts of T,C,A,G: {'T': 0, 'C': 0, 'A': 1, 'G': 1}

2 T
Counts of T,C,A,G: {'T': 1, 'C': 0, 'A': 1, 'G': 1}

3 T
Counts of T,C,A,G: {'T': 2, 'C': 0, 'A': 1, 'G': 1}

4 A
Counts of T,C,A,G: {'T': 2, 'C': 0, 'A': 2, 'G': 1}

5 C
Counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 2, 'G': 1}

6 A
Counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 3, 'G': 1}

7 A
Counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 4, 'G': 1}

Final counts of T,C,A,G: {'T': 2, 'C': 1, 'A': 4, 'G': 1}
Length equal to the sum of counts? True
```
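Indexing with `cnts[nucleotide]` only works for keys that already exist, so a character such as N would raise a `KeyError`. One common alternative, sketched here with a made-up sequence, is to start from an empty dictionary and use `dict.get` with a default of 0:

```python
s = 'GATTANAAA'    # made-up example: 'N' is not one of T, C, A, G

cnts = {}          # start empty and add keys as they appear
for nucleotide in s:
    # get() returns 0 for a missing key instead of raising KeyError
    cnts[nucleotide] = cnts.get(nucleotide, 0) + 1

print('Final counts:', cnts)  # {'G': 1, 'A': 5, 'T': 2, 'N': 1}
print('Length equal to the sum of counts?', len(s) == sum(cnts.values()))
```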
In [ ]:
```python
# Count the GC content of the sequence
# Using dictionary, but even shorter method
s = 'GATTACAAA'

cnts = {'T':0, 'C':0, 'A':0, 'G':0}

for nucleotide in s:
    cnts[nucleotide] += 1

print('Final counts of T,C,A,G:', cnts)

gc_cnt = cnts['G'] + cnts['C']
gc_pct = 100 * gc_cnt / len(s)  # fraction of G+C, times 100 for a percentage

print('GC count:', gc_cnt)
print('Percent GC:', gc_pct)
```
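Dividing the G+C count by the sequence length gives a fraction between 0 and 1; multiplying by 100 turns it into a percentage. A short sketch that reports both, using `str.count` as a shortcut for the two counts and an f-string to round the printed values:

```python
s = 'GATTACAAA'

gc_cnt = s.count('G') + s.count('C')   # 1 G + 1 C = 2
gc_frac = gc_cnt / len(s)              # fraction between 0 and 1
gc_pct = 100 * gc_frac                 # the same value as a percentage

# Format to a fixed number of decimal places for display
print(f'GC fraction: {gc_frac:.3f}')   # GC fraction: 0.222
print(f'Percent GC: {gc_pct:.1f}%')    # Percent GC: 22.2%
```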
In :
```python
# Given the genetic code, translate the sequence into amino acids
# Go to goo.gl/Hby7WL to copy this dictionary
GENETIC_CODE = {
'AAA': 'K',
'AAC': 'N',
'AAG': 'K',
'AAT': 'N',
'ACA': 'T',
'ACC': 'T',
'ACG': 'T',
'ACT': 'T',
'AGA': 'R',
'AGC': 'S',
'AGG': 'R',
'AGT': 'S',
'ATA': 'I',
'ATC': 'I',
'ATG': 'M',
'ATT': 'I',
'CAA': 'Q',
'CAC': 'H',
'CAG': 'Q',
'CAT': 'H',
'CCA': 'P',
'CCC': 'P',
'CCG': 'P',
'CCT': 'P',
'CGA': 'R',
'CGC': 'R',
'CGG': 'R',
'CGT': 'R',
'CTA': 'L',
'CTC': 'L',
'CTG': 'L',
'CTT': 'L',
'GAA': 'E',
'GAC': 'D',
'GAG': 'E',
'GAT': 'D',
'GCA': 'A',
'GCC': 'A',
'GCG': 'A',
'GCT': 'A',
'GGA': 'G',
'GGC': 'G',
'GGG': 'G',
'GGT': 'G',
'GTA': 'V',
'GTC': 'V',
'GTG': 'V',
'GTT': 'V',
'TAA': '*',
'TAC': 'Y',
'TAG': '*',
'TAT': 'Y',
'TCA': 'S',
'TCC': 'S',
'TCG': 'S',
'TCT': 'S',
'TGA': '*',
'TGC': 'C',
'TGG': 'W',
'TGT': 'C',
'TTA': 'L',
'TTC': 'F',
'TTG': 'L',
'TTT': 'F',
}
```
In :
```python
# Translate nucleotide sequence into amino acid sequence
s = 'GATTACAAA'
aa_list = []

for start in range(0, len(s), 3):
    codon = s[start:start+3]  # get the codon sequence from the nucleotide sequence
    aa = GENETIC_CODE[codon]  # use GENETIC_CODE to look up the corresponding amino acid

    aa_list.append(aa)        # add the amino acid to the list

    # Let's look at what happens
    print(codon, aa)
    print('Amino acid sequence:', aa_list)
    print('\n')

# Convert list to string
aa = ''.join(aa_list)

print('Amino acid sequence:', aa)
```
```
GAT D
Amino acid sequence: ['D']

TAC Y
Amino acid sequence: ['D', 'Y']

AAA K
Amino acid sequence: ['D', 'Y', 'K']

Amino acid sequence: DYK
```
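In `GENETIC_CODE` the stop codons `TAA`, `TAG` and `TGA` map to `'*'`. If you want translation to end at the first stop codon, one option is to `break` out of the loop. This sketch uses a made-up sequence and, to stay self-contained, a small hypothetical dictionary `GENETIC_CODE_SUBSET` holding only the codons it needs:

```python
# Hypothetical subset of GENETIC_CODE with only the codons used below
GENETIC_CODE_SUBSET = {'ATG': 'M', 'GAT': 'D', 'TAC': 'Y', 'TAA': '*', 'AAA': 'K'}

s = 'ATGGATTACTAAAAA'   # made-up sequence: ATG GAT TAC TAA AAA
aa = ''

for start in range(0, len(s), 3):
    codon = s[start:start+3]
    amino_acid = GENETIC_CODE_SUBSET[codon]
    if amino_acid == '*':      # stop codon: end translation here
        break
    aa += amino_acid

print('Amino acid sequence:', aa)   # Amino acid sequence: MDY
```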
In :
```python
# Translate nucleotide sequence into amino acid sequence
# Don't use a list
s = 'GATTACAAA'
aa = ''

for start in range(0, len(s), 3):
    codon = s[start:start+3]  # get the codon sequence from the nucleotide sequence

    # use GENETIC_CODE to look up the corresponding single-letter amino acid
    # directly add to the end of the aa string
    aa += GENETIC_CODE[codon]

    # Let's look at what happens
    print(codon)
    print('Amino acid sequence:', aa)
    print('\n')

print('Amino acid sequence:', aa)
```
```
GAT
Amino acid sequence: D

TAC
Amino acid sequence: DY

AAA
Amino acid sequence: DYK

Amino acid sequence: DYK
```
In :
```python
# Translate nucleotide sequence into amino acid sequence
# Using while loop
s = 'GATTACAAA'
aa = ''

c = 0  # counter
while c < len(s):
    codon = s[c:c+3]
    aa += GENETIC_CODE[codon]

    # Increment the counter by 3 to move 3 characters right
    c += 3

    print(codon)
    print('Amino acid sequence:', aa)
    print('Counter:', c)
    print('\n')

print('Amino acid sequence:', aa)
```
```
GAT
Amino acid sequence: D
Counter: 3

TAC
Amino acid sequence: DY
Counter: 6

AAA
Amino acid sequence: DYK
Counter: 9

Amino acid sequence: DYK
```
In :
```python
# Translate nucleotide sequence into amino acid sequence
# Using while loop to "consume" s
s = 'GATTACAAA'
aa = ''

while len(s) > 0:
    codon = s[:3]
    s = s[3:]
    aa += GENETIC_CODE[codon]

    print(codon)
    print('Amino acid sequence:', aa)
    print('Sequence:', s)
    print('\n')

print('Amino acid sequence:', aa)
```
```
GAT
Amino acid sequence: D
Sequence: TACAAA

TAC
Amino acid sequence: DY
Sequence: AAA

AAA
Amino acid sequence: DYK
Sequence:

Amino acid sequence: DYK
```

# Bonus

In [ ]:
```python
# Count the number of each nucleotide in the sequence
# Using a dictionary to store counts
# Even shorter
from collections import Counter

s = 'GATTACAAA'

# cnts = {'T':0, 'C':0, 'A':0, 'G':0}
cnts = Counter(s)

print('Final counts of T,C,A,G:', cnts)
print('Length equal to the sum of counts?', len(s) == sum(cnts.values()))
```
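A `Counter` behaves like a dictionary with a few extras. For example, `most_common()` lists the counts from largest to smallest, and looking up a character that never appeared returns 0 instead of raising `KeyError`:

```python
from collections import Counter

s = 'GATTACAAA'
cnts = Counter(s)

print(cnts.most_common())   # (character, count) pairs, most common first
print(cnts['A'])            # 5
print(cnts['N'])            # 0: missing keys count as zero
```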
In [ ]:
```python
# Count the GC content of the sequence
# Using list comprehension
s = 'GATTACAAA'

# This will create a list [1, 0, 0, 0, 0, 1, 0, 0, 0]
# Then get the sum
gc_cnt = sum([1 if nucleotide in ['G', 'C'] else 0 for nucleotide in s])

gc_pct = 100 * gc_cnt / len(s)  # as a percentage

print('GC count:', gc_cnt)
print('Percent GC:', gc_pct)
```
In [ ]:
```python
# Count the GC content of the sequence
# Using list comprehension
s = 'GATTACAAA'

# This will create a list [1, 1]
# Then get the sum
gc_cnt = sum([1 for nucleotide in s if nucleotide in ['G', 'C']])

gc_pct = 100 * gc_cnt / len(s)  # as a percentage

print('GC count:', gc_cnt)
print('Percent GC:', gc_pct)
```
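Because `True` and `False` count as 1 and 0 in arithmetic, the intermediate list can be skipped entirely by summing the membership tests with a generator expression:

```python
s = 'GATTACAAA'

# Each test gives True (1) or False (0); sum adds them up
gc_cnt = sum(nucleotide in 'GC' for nucleotide in s)
print('GC count:', gc_cnt)   # GC count: 2
```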
In [ ]:
```python
# Translate nucleotide sequence into amino acid sequence
# Shortcut
s = 'GATTACAAA'
aa_list = [GENETIC_CODE[s[start:start+3]] for start in range(0, len(s), 3)]
aa = ''.join(aa_list)

print('Amino acid sequence:', aa)
```