Final report WIP

Min 5 years ago
parent
commit
59dcd32b49

File diff suppressed because it is too large
+ 1 - 1
docs/final_report/5-methods.tex


+ 50 - 24
docs/final_report/6-results.tex

@@ -96,28 +96,14 @@ A number of and programs have been written to test both processors. These involv
 	\item[$\bullet$ Printing:] Sends data to UART. It includes waiting until UART is available for transmission. 
 	\item[$\bullet$ Printing unsigned integer:] Uses the binary-coded decimal algorithm to convert an 8 or 16bit binary value to a decimal value and print it. 
 	\item[$\bullet$ 16bit multiplication:] Uses simple matrix multiplication. 
-	\item[$\bullet$ 16bit division:] Uses Long division algorithm to divide two 16bit numbers, including reminder.
-	\item[$\bullet$ 16bit modulus:] Uses "Russian Peasant Multiplication" algorithm to perform Modulo operation with two 16bit numbers.
-	\item[$\bullet$ Prime number calculator:] Uses Sieve of Atkins algorithm to calculate primer numbers up to from number 5 to $2^{16}$. 
+	\item[$\bullet$ 16bit division:] Uses the long division algorithm to divide two 16bit numbers; the result includes a remainder.
+	\item[$\bullet$ 16bit modulo:] Uses the ``Russian Peasant Multiplication'' algorithm to perform the modulo operation on two 16bit numbers.
+	\item[$\bullet$ Prime number calculator:] Uses the Sieve of Atkin algorithm to calculate prime numbers; it operates on 16bit numbers and utilises the 16bit multiplication and modulo functions. 
 \end{description}
 
-\subsubsection{Performance}
-This subsection investigates time and clock cycles to run benchmark programs. Simulation was sued to find a number of cycles required to execute each function. 
-
-Print 16bit decimal and modulus were executed with different arguments to show the worst and the best case scenarios as algorithms length depend on inputs. This is not the case for 16bit multiplication as this it has no branching. 
-
-Results are shown in Figure \ref{fig:cycles}. In most cases, OISC requires around 55-67\% more instruction, with some exceptions. These results can be better explained in following subsection \ref{subsec:instr_comp}.
-
-\begin{colfigure}
-	\centering
-	\includegraphics[width=\linewidth]{../tests/cycles.eps}
-	\captionof{figure}{Simulated results of cycles that taken to perform function.}
-	\label{fig:cycles}
-\end{colfigure}
-
-
 
 \subsubsection{Instruction composition}\label{subsec:instr_comp}
+
 This test is performed to investigate instruction composition of each function to see how similar it is between RISC and OISC processors. 
 \begin{description}
 	\item[$\bullet$ MOVE] - All instructions that move data around internal processor registers.
@@ -212,6 +198,13 @@ Each function was ran on simulated processor, program counter and instruction be
 	\end{lstlisting}
 \end{blockpage}
 
+\begin{figure*}[ht!]
+	\centering
+	\includegraphics[width=\linewidth]{../tests/instr_comp.eps}
+	\caption{Graph of instruction composition for every benchmark program.}
+	\label{fig:instr_comp}
+\end{figure*}
+
 \begin{blockpage}
 	\begin{lstlisting}[frame=single, caption={OISC assembly frame for executing tests}, emph={setup,start,done}, label=asm_oisc_test]
 	setup:
@@ -231,13 +224,43 @@ Each function was ran on simulated processor, program counter and instruction be
 
 The file recorded for each function was then analysed further and its instructions were grouped. The recorded program counter was used to find the effective program space: the unique program counter values were identified and the instruction size at each of them was summed. For RISC, the dynamic instruction size was taken into account. 
 
-Results or each function composition are represented in figure \ref{fig:instr_comp}. 
-\begin{figure*}[t]
+A few key differences can be seen in the results in Figure \ref{fig:instr_comp}. Across every test, OISC executes many more instructions in the \textit{BRANCH} destination and \textit{MOVE} source groups. The \textit{BRANCH} group can be explained by the emulated \texttt{CALL}, \texttt{RET} and \texttt{JUMP} instructions described in section \ref{subsec:oisc_pc}.
+The high number of \textit{MOVE} source group instructions may be explained by OISC using immediate values as a separate source, whereas RISC integrates the immediate into instructions such as \texttt{ADDI}. In most cases the \textit{ALU} group count is also higher for OISC than for RISC. This shows lower OISC ALU efficiency, mostly due to the need to move data to separate accumulators.
+
+\subsubsection{Performance}
+This subsection investigates the time and the number of clock cycles needed to run the benchmark programs. Simulation was used to find the number of cycles required to execute each function. Note that the prime number calculator was not simulated because the dynamic behaviour of the program is too complex. 
+
+The 16bit decimal print and the modulo operation were executed with different arguments to show the worst and the best case scenarios, as the length of these algorithms depends on their inputs. This is not the case for 16bit multiplication, as it has no branching. 
+
+Results are shown in Figure \ref{fig:cycles}. In most cases, OISC requires around 55-67\% more instructions, with some exceptions. These results are better explained by the instruction composition in the preceding subsection \ref{subsec:instr_comp}.
+
+\begin{colfigure}
 	\centering
-	\includegraphics[width=\linewidth]{../tests/instr_comp.eps}
-	\caption{Graph of instruction composition for every benchmark program.}
-	\label{fig:instr_comp}
-\end{figure*}
+	\includegraphics[width=\linewidth]{../tests/cycles.eps}
+	\captionof{figure}{Simulated number of cycles taken to perform each function.}
+	\label{fig:cycles}
+\end{colfigure}
+
+Another set of benchmarks was performed on both processors once they had been implemented on an FPGA. The time taken to perform each benchmark was recorded over a UART connection: a single character was sent to indicate the start and the stop of a benchmark. To avoid slight timing variation due to the low baud rate of the UART, each benchmark was run for many iterations. Figure \ref{fig:timing} shows the results.
+
+\begin{colfigure}
+	\centering
+	\includegraphics[width=\linewidth]{../tests/timing.eps}
+	\captionof{figure}{Time taken to perform each benchmark on the FPGA.}
+	\label{fig:timing}
+\end{colfigure}
+
+The results indicate that, on average, OISC takes about 71\% longer to execute the same benchmark. This is close to the results found with simulation. The prime number calculator took 3.26 times longer.
+
+The benchmarks include:
+\begin{description}
+	\item[$\bullet$ Prime Numbers:] Calculates every prime number between 5 and $2^{16}$.  
+	\item[$\bullet$ Multiply:] 16bit multiplication iterated 65536 times.
+	\item[$\bullet$ Modulo 0010h:] 16bit modulo \textit{0010h} applied to every number between 0 and 65536.
+	\item[$\bullet$ Modulo FFFFh:] 16bit modulo \textit{FFFFh} applied to every number between 0 and 65536.
+	\item[$\bullet$ BCD:] Encodes a 16bit binary value as an ASCII decimal number, without printing.
+\end{description}
+
 
 \subsubsection{Program space}
 
@@ -249,6 +272,9 @@ Figure \ref{fig:program_size} represents effective program size for each test fu
 	\label{fig:program_size}
 \end{colfigure}
 
+
+
+
 \subsection{Maximum clock frequency}
 To find the maximum clock frequency, the processors were loaded with a basic print string function and 16bit multiplication. The frequency was then steadily increased until the resulting output through UART was no longer correct. 
 

BIN
docs/final_report/index.pdf


+ 6 - 6
docs/final_report/index.toc

@@ -90,18 +90,18 @@
 \defcounter {refsection}{0}\relax 
 \contentsline {subsection}{\numberline {5.3}Benchmark Programs}{19}{subsection.5.3}% 
 \defcounter {refsection}{0}\relax 
-\contentsline {subsubsection}{\numberline {5.3.1}Performance}{19}{subsubsection.5.3.1}% 
+\contentsline {subsubsection}{\numberline {5.3.1}Instruction composition}{19}{subsubsection.5.3.1}% 
 \defcounter {refsection}{0}\relax 
-\contentsline {subsubsection}{\numberline {5.3.2}Instruction composition}{20}{subsubsection.5.3.2}% 
+\contentsline {subsubsection}{\numberline {5.3.2}Performance}{22}{subsubsection.5.3.2}% 
 \defcounter {refsection}{0}\relax 
-\contentsline {subsubsection}{\numberline {5.3.3}Program space}{21}{subsubsection.5.3.3}% 
+\contentsline {subsubsection}{\numberline {5.3.3}Program space}{23}{subsubsection.5.3.3}% 
 \defcounter {refsection}{0}\relax 
-\contentsline {subsection}{\numberline {5.4}Maximum clock frequency}{21}{subsection.5.4}% 
+\contentsline {subsection}{\numberline {5.4}Maximum clock frequency}{23}{subsection.5.4}% 
 \defcounter {refsection}{0}\relax 
 \contentsline {subsection}{\numberline {5.5}Future work}{23}{subsection.5.5}% 
 \defcounter {refsection}{0}\relax 
 \contentsline {section}{\numberline {6}Conclusion}{23}{section.6}% 
 \defcounter {refsection}{0}\relax 
-\contentsline {section}{\numberline {7}Appendix}{25}{section.7}% 
+\contentsline {section}{\numberline {7}Appendix}{26}{section.7}% 
 \defcounter {refsection}{0}\relax 
-\contentsline {subsection}{\numberline {7.1}Processor instruction set tables}{25}{subsection.7.1}% 
+\contentsline {subsection}{\numberline {7.1}Processor instruction set tables}{26}{subsection.7.1}% 

+ 21 - 0
docs/tests/cycles.m

@@ -54,6 +54,27 @@ legend("RISC", "OISC");
 grid on
 %set(gcf, 'Color', 'None')
 
+%%
+data = [
+    20.316 3.474 26.438 2.033 14.705
+    66.126 3.998 48.040 3.542 23.044
+]';
+% After the transpose, each row is a benchmark and the columns are RISC and OISC times in seconds.
+figure
+B = bar(1:size(data,1), data);
+x_labels = [
+    {'Prime Numbers'}
+    {'Multiply'}
+    {'Modulo 0010h'}
+    {'Modulo FFFFh'}
+    {'BCD'}
+];
+set(gca,'XTickLabel', x_labels);
+ylabel('Time (s)')
+title("Time taken for each benchmark")
+grid on
+legend("RISC", "OISC");
+xtickangle(30);
 
 %%
 figure

File diff suppressed because it is too large
+ 2584 - 2369
docs/tests/instr_comp.eps


File diff suppressed because it is too large
+ 1497 - 0
docs/tests/instr_mul_u16.eps


+ 133 - 28
docs/tests/parts.m

@@ -1,41 +1,146 @@
 close all; clc
-load 'risc8.mat' risc8
-load 'oisc8.mat' oisc8
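+reportrisc = readtable('report_risc.csv');  % assumed loader: first column holds log names, last column holds size
+reportoisc = readtable('report_oisc.csv');  % assumed loader: the script below expects both reports as tables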
+% load 'risc8.mat' risc8
+% load 'oisc8.mat' oisc8
 
-data = table2array(risc8(:,2:end));
-names = table2array(risc8(:,1));
-gnames = risc8.Properties.VariableNames(2:end);
+data = table2array(reportrisc(:,2:end-1));
+names = table2array(reportrisc(:,1));
+gnames = reportrisc.Properties.VariableNames(2:end-1);
+data = data-data(6,:);
+data(6,:) = [];
+data(1,:) = [];
 
-data2 = table2array(oisc8(:,2:end));
-names2 = table2array(oisc8(:,1));
-gnames2 = oisc8.Properties.VariableNames(2:end);
+data2 = table2array(reportoisc(:,2:end-1));
+data2 = data2-data2(6,:); data2(data2<0)=0;
+data2(6,:) = [];
+data2(1,:) = [];
+
+names2 = table2array(reportoisc(:,1));
+gnames2 = reportoisc.Properties.VariableNames(2:end-1);
 gnames2_dst = erase(gnames2(1:2:end),"dst");
 gnames2_src = erase(gnames2(2:2:end),"src");
 
+namesf = {
+%     "16bit division 0001h / 0001h";
+    "16bit Modulo 0001h % FFFFh";
+    "16bit Modulo FFFFh % 0001h";
+    "16bit Modulo FFFFh % FFFFh";
+    "16bit Multiplication";
+%     "test functions";
+    "Print Character";
+    "Print 16bit unsigned int FFFFh";
+    "Print 8bit unsigned int 00h";
+    "Print 8bit unsigned int FFh";
+};
+
+d3names = {'Mod 0001h % FFFFh' 'Mod FFFFh % 0001h' ...
+    'Mod FFFFh % FFFFh' '16bit multiply' ...
+    'Print char' 'Print uint16 FFFFh' ...
+    'Print uint8 00h' 'Print uint8 FFh'};
+x2 = categorical(d3names);
+x2 = reordercats(x2,d3names); 
+data3 = [table2array(reportrisc(:,end))'; table2array(reportoisc(:,end))']';
+data3(6,:) = [];
+data3(1,:) = [];
+bar(x2, data3, 1);
+grid on
+ylabel('Program size in bits')
+legend('RISC', 'OISC')
+xtickangle(60)
+title('Benchmark functions effective program size')
+%%
+
 x = categorical(gnames);
 x = reordercats(x,gnames);
-
+% 
 % t = tiledlayout(5,2);
-for i=1:10
-    i=4;
-    d0 = data(6,:);
+[ha, pos] = tight_subplot(4,2,[.05 .05],[.15 .05],[.07 .01]);
+
+for i=1:8
+    axes(ha(i));
+%     subplot(4,2,i)
+    d0 = data(i,:);
     d1 = data2(i,:);
     d_src = d1(1:2:end);
     d_dst = d1(2:2:end);
-    
-    figure;
-    pie(d0(~d0==0))
-    legend(gnames(~d0==0), "interpreter", "None")
-    title("RISC 'multiply 16bit' function instruction composition");
-    
-    figure;
-    pie(d_src(~d_src==0))
-    legend(gnames2_src(~d_src==0), "interpreter", "None")
-    title("OISC 'multiply 16bit' function src. instruction composition");
-    
-    figure
-    pie(d_dst(~d_dst==0))
-    legend(gnames2_dst(~d_dst==0), "interpreter", "None")
-    title("OISC 'multiply 16bit' function dest. instruction composition");
-    break
+    B = bar(x, [d0; d_src; d_dst]', 1);
+    if mod(i,2)==1
+        ylabel('Instructions')
+    end
+    grid on
+    title(namesf{i})
+%     set(gcf,'Position',[100 100 500 300])
 end
+set(ha(1:6),'XTickLabel','');
+legend({'RISC', 'OISC Destination', 'OISC Source'})
+
+
+
+
+%%
+
+OISCF = 1705;
+RISCF = 3218;
+% ALU, MEM
+x = categorical({'RISC', 'OISC'});
+y0 = [
+ 293 RISCF-2845 RISCF-2563 RISCF-((RISCF-2845) + (RISCF-2563))-293;
+ 293 486 225 701;
+];
+
+bar(x, y0, 'stacked');
+legend('COMMON', 'ALU', 'MEMORY', 'OTHER')
+grid on
+ylabel("Logic elements")
+title("Processors FPGA logic element composition")
+
+figure
+y1 = [
+ 170 1 407-315 144;
+ 170 142 86 328;
+];
+bar(x, y1, 'stacked');
+legend('COMMON', 'ALU', 'MEMORY', 'OTHER')
+grid on
+ylabel("Registers")
+title("Processors FPGA register usage composition")
+
+
+% EMPTY Processor
+%Total logic elements	293 / 22,320 ( 1 % )  // 294?? 
+%Total registers	170
+
+% RISC FULL
+%Total logic elements	3,218 / 22,320 ( 14 % )
+%Total registers	407
+
+% RISC without ALU
+%Total logic elements	2,845 / 22,320 ( 13 % )
+%Total registers	406
+
+% RISC without Memory
+%Total logic elements	2,563 / 22,320 ( 11 % )
+%Total registers	315
+
+%
+
+%OISC without rom
+%Total logic elements	291 / 22,320 ( 1 % )
+%Total registers	170
+
+% OISC FULL
+%Total logic elements	1,705 / 22,320 ( 8 % )
+%Total registers	726
+%Total memory bits	93,184 / 608,256 ( 15 % )
+%Embedded Multiplier 9-bit elements	1 / 132 ( < 1 % )
+
+% OISC without ALU
+%Total logic elements	1,219 / 22,320 ( 5 % )
+%Total registers	584
+%Total memory bits	93,184 / 608,256 ( 15 % )
+
+% OISC without mem/stack logic
+%Total logic elements	1,480 / 22,320 ( 7 % )
+%Total registers	640
+%Total memory bits	93,184 / 608,256 ( 15 % )

+ 11 - 3
docs/tests/power_tests.m

@@ -1,7 +1,7 @@
 % matrix of [shunt volt mean, shunt volt std, supply volt, supply std]
 shunt=1.020;  %ohms
 data = [
-    80.599e-3   49e-6     4.0162  2e-3      % Empty FPGA
+%     80.599e-3   49e-6     4.0162  2e-3      % Empty FPGA
     89.47e-3    36e-6     4.0243  2.214e-3  % Empty socket test
     89.849e-3   33.1e-6   4.026   6e-3      % OISC8 mult 16bit loop
     89.968e-3   35.6e-6   4.0222  2.47e-3   % RISC8 mult 16bit loop
@@ -11,6 +11,14 @@ I=data(:,1)*shunt;  % current vector
 P=(data(:,3)-data(:,1)).*I;  % power in W
 Pstd=data(:,2).*data(:,4);
 
-bar(1:4,P)                
+xnames = {'Auxiliary' 'OISC' 'RISC'};
+x = categorical(xnames);
+x = reordercats(x,xnames); 
+bar(x,P)   
+
 hold on
-er = errorbar(1:4,P,-Pstd./2,+Pstd./2); 
+er = errorbar(x,P,Pstd./2,Pstd./2);  % lower and upper bar lengths are both given as positive values
+er.Color = [0 0 0];                            
+er.LineStyle = 'none';  
+grid on
+

+ 11 - 5
docs/tests/report_oisc.csv

@@ -1,5 +1,11 @@
-,ALU_ACC-dst,ALU_ACC-src,ALU-dst,ALU-src,REGS-dst,REGS-src,BRP_REG-dst,BRP_REG-src,MEMP_REG-dst,MEMP_REG-src,MEMORY-dst,MEMORY-src,STACK-dst,STACK-src,COM-dst,COM-src,BRANCH-dst,BRANCH-src,OTHER-dst,OTHER-src,IMMEDIATE-dst,IMMEDIATE-src
-oisc8_mod_u16.log,356,2,0,272,39,101,338,0,70,0,66,186,38,38,0,0,169,0,0,0,0,477
-oisc8_sieve.log,116207,8688,0,76418,13480,27516,90230,0,30208,11540,22468,36268,17720,17716,0,0,45673,0,0,0,0,157840
-oisc8_none.log,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,3,0,0,0,0,9
-oisc8_mul_u16.log,21,3,0,16,6,6,4,0,4,0,8,6,10,10,0,0,2,0,0,0,0,14
+,MOVE-dst,MOVE-src,ALU-dst,ALU-src,MEMORY-dst,MEMORY-src,STACK-dst,STACK-src,COM-dst,COM-src,BRANCH-dst,BRANCH-src,OTHER-dst,OTHER-src,size
+oisc8_div_u16_0001_0001.log,52,479,262,163,79,51,51,51,0,0,300,0,0,0,1794
+oisc8_mod_u16_0001_ffff.log,4,28,13,11,4,3,1,3,0,0,23,0,0,0,507
+oisc8_mod_u16_ffff_0001.log,38,586,356,272,134,186,38,38,0,0,516,0,0,0,1521
+oisc8_mod_u16_ffff_ffff.log,8,66,28,18,16,13,8,8,0,0,45,0,0,0,1261
+oisc8_mul_u16.log,6,23,21,16,12,6,10,10,0,0,6,0,0,0,689
+oisc8_none.log,0,9,0,0,0,0,0,0,0,0,9,0,0,0,104
+oisc8_print_char.log,1,22,3,2,0,0,2,2,3,1,18,0,0,0,338
+oisc8_print_u16_ffff.log,51,324,173,107,14,60,43,43,15,5,243,0,0,0,1898
+oisc8_print_u8_00.log,6,39,9,5,0,0,6,6,3,1,27,0,0,0,650
+oisc8_print_u8_ff.log,8,80,27,15,0,0,12,12,9,3,54,0,0,0,1105

+ 11 - 11
docs/tests/report_risc.csv

@@ -1,11 +1,11 @@
-,MOVE,COPY,ALU,MEM_LOAD,MEM_SAVE,CI,STACK,COM,BRANCH,CALL,OTHER
-risc8_mod_u16_0001_ffff.log,0,4,4,0,2,2,10,0,6,3,0
-risc8_mod_u16_ffff_ffff.log,0,4,11,2,4,8,16,0,13,3,0
-risc8_div_u16_0001_0001.log,326,170,1380,64,34,48,1786,748,872,1518,0
-risc8_print_u16_ffff.log,9,12,14,3,3,5,32,8,15,27,0
-risc8_print_u8_00.log,1,2,5,0,0,0,12,2,7,6,0
-risc8_mul_u16.log,0,4,18,8,8,0,8,0,4,3,0
-risc8_none.log,0,0,0,0,0,0,0,0,4,0,0
-risc8_print_u8_ff.log,2,6,14,0,0,1,18,6,10,12,0
-risc8_print_char.log,0,1,2,0,0,0,2,2,5,3,0
-risc8_mod_u16_ffff_0001.log,0,4,176,32,34,115,136,0,150,3,0
+,MOVE,ALU,MEMORY,STACK,COM,BRANCH,OTHER,size
+risc8_div_u16_0001_0001.log,544,1380,98,1786,748,2390,0,3512
+risc8_mod_u16_0001_ffff.log,6,4,2,10,0,9,0,416
+risc8_mod_u16_ffff_0001.log,119,176,66,136,0,153,0,992
+risc8_mod_u16_ffff_ffff.log,12,11,6,16,0,16,0,864
+risc8_mul_u16.log,4,18,16,8,0,7,0,792
+risc8_none.log,0,0,0,0,0,4,0,72
+risc8_print_char.log,1,2,0,2,2,8,0,224
+risc8_print_u16_ffff.log,26,14,6,32,8,42,0,1136
+risc8_print_u8_00.log,3,5,0,12,2,13,0,448
+risc8_print_u8_ff.log,9,14,0,18,6,22,0,680

File diff suppressed because it is too large
+ 1304 - 0
docs/tests/timing.eps