Chapter 13: Real-Time BeagleBone Interfacing

Chapter 15: Real-Time Interfacing with the PRU-ICSS

2019-01-11T16:13:51+00:00


This is the chapter web page to support the content in Chapter 15 of the book: Exploring BeagleBone – Tools and Techniques for Building with Embedded Linux. The summary introduction to the chapter is as follows:

In this chapter you are introduced to real-time interfacing with the Beagle boards. The AM335x has two programmable real-time units (PRUs) that can be used for certain real-time operations, and these are the focus of this chapter, which describes input and output examples that help explain the operation of the PRUs and their encompassing industrial communication subsystem (PRU-ICSS). Finally, the real-time capabilities of the AM335x are demonstrated using two applications—the first generates a custom waveform on a GPIO, and the second uses a low-cost ultrasonic distance sensor that requires precise timing in order to communicate the distance to an obstacle.

Learning Outcomes

After completing this chapter, you should be able to do the following:
  • Describe real-time kernel and hardware solutions that can be used on the Beagle boards.
  • Use tools such as the PRU Debugger and Texas Instruments’ PRU Code Generation Tools (CGT) for PRU-ICSS application development.
  • Write a PRU program that can flash an LED and transfer it to the PRU-ICSS using remoteproc.
  • Describe the important features of the PRU-ICSS, such as its structure, registers, memory addressing, and assembly language instructions.
  • Write a PRU program that shares memory with a Linux host application.
  • Write a PRU program that interfaces to regular GPIOs that are in Linux host space.
  • Write a PRU program that generates PWM signals, and adapt it to output user-defined analog waveforms on a GPIO pin.
  • Apply the PRU to sensor interfacing applications for which time measurement is important, such as interfacing to ultrasonic distance sensors.

Digital Media Resources

This section provides the digital resources that are referred to in the chapter. These include high-resolution versions of some of the important figures, and links to the videos, resources, and websites that are described in the chapter.


The PRU Instruction Set Summary

This summary sheet was created with permission from information that is courtesy of Texas Instruments. It is a version of Figure 13-7 of the book:

Additional Content

When possible I am adding additional content to this website to support the book by tackling different topics, in particular those topics:

  • that could not be covered within the time frame of the development of the book due to their complexity or specificity,
  • that are subject to so much change that a book could not reasonably capture those changes and remain up to date,
  • that could not be covered within the page numbers available in order to keep the price of the book reasonable.

In this section two main topics are covered in detail: High-speed Analog-to-Digital Conversion using the PRU-ICSS (an advanced topic) and Clock-Signal Generator Circuits using the PRU-ICSS.

High-Speed Analog to Digital Conversion (ADC) using the PRU-ICSS

The MCP3008 8-Channel 10-bit ADC (100kSps max)

[Added February 2015] Chapter 13 in the book describes the operation of the PRU-ICSS and provides several different examples for reading and writing from/to GPIOs, including high-speed digital interfacing examples.  However, one topic that is not covered is high-speed analog sampling. As described in the chapter, the PRU-ICSS is a very powerful device but it does not have straightforward access to the AM335x’s on-board ADC.

This is a complex topic and it should not be your first interaction with the PRU-ICSS. Please review the earlier examples in Chapter 13 before tackling this topic. It is the most challenging PRU-ICSS code that I have written to date – the final outcome may seem reasonably straightforward, but there were several incorrect intermediate versions (several!).

This example utilizes the work that is described in Chapter 8 – in particular, the additional content that was added in January 2015 on an “SPI Analog to Digital Converter (ADC) Example”. I have purposefully structured it this way, so that you can build, test and become familiar with the limitations of the circuit in Chapter 8 before progressing to this, the high-speed version.

The MCP3008 is used again for the first version of this circuit. It is a low-cost ADC with eight selectable channels, and its PDIP package allows for a breadboard implementation. More advanced ICs are available, generally in surface-mount packages, such as the ADS7883.

Here are the features of the solution that is presented in this discussion:

  • It has a configurable sampling rate – up to 100kSps with this IC. The sampling rate can be configured from within Linux userspace. Higher sample rates are possible with alternative ADCs (see the ADS7883 example below).
  • The samples can be captured free from jitter. Both PRUs are employed in order to achieve this.
  • The input channel on the MCP3008 and the single-ended/differential inputs can be chosen and configured from Linux userspace.
  • The quantity of data to be captured can be configured from Linux userspace, and it is not limited by the relatively small PRU memory space. The current solution is limited by the amount of unused DDR memory.
  • The PRU programs automatically determine Linux memory addresses and size limitations.
  • The program supports 10-bit, 12-bit and 16-bit ADCs (the MCP3008 is a 10-bit ADC).
  • A custom device tree overlay (DTO) is made available for this example.

Here are the current downsides of the code example that is presented in this solution:

  • The sample duration is currently limited by the amount of DDR memory made available to the PRU-ICSS (this can be many megabytes). I have not written the userspace code to “consume” the samples as they arrive in Linux userspace (I don’t think it is overly problematic – that task is marked TODO).
  • The code has not been overly optimized (for pedagogical reasons) – it’s still fast enough!
  • The sampling rate is regular (jitter-free), but the rate isn’t necessarily precise (e.g., 100kHz might be 100.01kHz and will always be 100.01kHz) – if you have a very specific rate in mind for your application, then you can tweak the code to achieve a very precise rate (with 5ns period increments).

The Circuit

The circuit is configured as shown in Figure 13.A1. Four lines are required for this IC, as follows:

  • P9_27 pr1_pru0_pru_r30_5 SPI_CS0 (Chip Select) – The chip select is used to initiate communication with the MCP3008. This is an active-low line that must be brought low in order to capture a sample, and it must be set high between samples.
  • P9_30 pr1_pru0_pru_r30_2 SPI_SCLK (CLK) – This is the data transfer clock (this is NOT the ADC sample clock). The BeagleBone PRU code generates this data transfer clock, which synchronizes communication between the BBB and the MCP3008.
  • P9_29 pr1_pru0_pru_r30_1 SPI_D1 (MOSI) – Master out, Slave in. This line is used by the BBB PRU code to configure the MCP3008 (i.e., to select the input channel and the single-ended/differential input type; for example, to select single-ended mode on Channel 0, we can send 0x01 0x80 0x00).
  • P9_28 pr1_pru0_pru_r31_3 SPI_D0 (MISO) – Master in, Slave out. This line is used by the MCP3008 to transfer a 10-bit sample back to the BBB. The MOSI and MISO lines communicate data simultaneously.

Any PRU input/output pins can be used for this task – there is nothing unique about the particular PRU pins that are chosen in this example (i.e., there is no special SPI communication functionality on these pins). The colors of the lines are kept consistent throughout this example.
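The three MOSI command bytes quoted above follow the MCP3008 command format:

  • The first byte carries the start bit.
  • The top nibble of the second byte holds the SGL/DIFF bit and the three channel-select bits (D2 to D0).
  • The third byte is padding, during which the result is clocked back.

The following is a small C sketch that builds this 24-bit word for any channel. The function name is hypothetical and is not part of the book's code.

```c
#include <stdint.h>

/* Hypothetical helper: build the 24-bit MCP3008 command word that is
   shifted out on MOSI, MSB first.
   Byte 0: 0x01 -> start bit
   Byte 1: SGL/DIFF in bit 7, channel select D2..D0 in bits 6..4
   Byte 2: 0x00 -> padding clocks while the 10-bit result is returned */
uint32_t mcp3008_command(unsigned channel, int single_ended) {
    uint32_t config = ((single_ended ? 1u : 0u) << 3) | (channel & 0x07u);
    return (0x01u << 16) | (config << 12);
}
```

For example, mcp3008_command(0, 1) gives 0x018000. Shifted left by 8 bits, this is the 0x01800000 value that pruadc places in PRU0 memory in Step 3.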

Figure 13.A1: The PRU-ICSS ADC circuit

A device tree overlay (DTO) is available to configure the pins correctly (see Listing 13.A1). The pins are configured by using the tables in Chapter 6. There is no requirement for pull-up/down resistors in this case, so they have not been enabled. Remember from the discussion in Chapter 13 that pru0-designated pins are accessible from PRU0, and pru1-designated pins are accessible from PRU1. In this solution the SPI code is executed on PRU0 and the timer code is executed on PRU1. Also, remember that r30 refers to an output and r31 refers to an input; hence, pin Modes 5 and 6 are chosen as follows:

  • 0x1a4 0x0d // CS   P9_27 pr1_pru0_pru_r30_5, MODE5 | OUTPUT | DIS  00001101=0x0d
  • 0x19c 0x2e // MISO P9_28 pr1_pru0_pru_r31_3, MODE6 | INPUT  | DIS  00101110=0x2e
  • 0x194 0x0d // MOSI P9_29 pr1_pru0_pru_r30_1, MODE5 | OUTPUT | DIS  00001101=0x0d
  • 0x198 0x0d // CLK  P9_30 pr1_pru0_pru_r30_2, MODE5 | OUTPUT | DIS  00001101=0x0d
  • 0x0a4 0x0d // SAMP P8_46 pr1_pru1_pru_r30_1, MODE5 | OUTPUT | DIS  00001101=0x0d

The last entry in the device tree overlay is purely for testing and can be removed. This test output can be used to test that the code on PRU1 is working correctly by connecting P8_46 to an oscilloscope or logic analyzer in order to validate the clock pulse signal.
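The hexadecimal values in the overlay follow from the bit layout of the AM335x pad control registers:

  • bits 2..0 hold the mux mode;
  • bit 3 disables the pull-up/down;
  • bit 4 selects pull-up;
  • bit 5 enables the input receiver;
  • bit 6 selects a slow slew rate.

The following C sketch composes these values so that you can check entries such as 0x0d and 0x2e. The names are illustrative only.

```c
#include <stdint.h>

/* AM335x pad control register bits (low byte). */
enum {
    PAD_MUXMODE_MASK = 0x07, /* bits 2..0: pin mode 0-7                */
    PAD_PULL_DISABLE = 0x08, /* bit 3: 1 = pull-up/down disabled        */
    PAD_PULL_UP      = 0x10, /* bit 4: 1 = pull-up, 0 = pull-down       */
    PAD_RX_ENABLE    = 0x20, /* bit 5: 1 = input receiver enabled       */
    PAD_SLEW_SLOW    = 0x40  /* bit 6: 1 = slow slew rate (unused here) */
};

/* Hypothetical helper: compose a pinmux value for the overlay. */
uint8_t pad_config(unsigned mode, int input, int pull_enabled, int pull_up) {
    uint8_t value = (uint8_t)(mode & PAD_MUXMODE_MASK);
    if (input) {
        value |= PAD_RX_ENABLE;       /* r31 inputs need the receiver on */
    }
    if (!pull_enabled) {
        value |= PAD_PULL_DISABLE;    /* no pull-up/down, as in this DTO */
    } else if (pull_up) {
        value |= PAD_PULL_UP;
    }
    return value;
}
```

pad_config(5, 0, 0, 0) yields the 0x0d used for the r30 outputs. pad_config(6, 1, 0, 0) yields the 0x2e used for the MISO input.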

Listing 13.A1: The Device Tree Overlay for this Example

The Programs

This example uses six different key steps, with four different code examples. The architecture of this solution requires the use of both PRUs, which are controlled from Linux userspace using a separate program. The architecture is described as follows, and is illustrated in Figure 13.A2 below:

1. Load the device tree overlay (virtual cape) as above.
2. Allocate DDR external RAM for the sample data using the Linux kernel module tools from userspace.
3. The main Linux executable (pruadc). This program loads the two PRU programs into the PRU-ICSS, transfers the configuration to the PRU memory spaces, and starts the execution of both PRU programs. The source code is in PRUADC.c.
4. The PRU ADC code (bin). This program is placed in PRU0 and performs the sampling role by communicating with the MCP3008 via SPI. It also transfers the sample data to the DDR external RAM that is accessible from Linux userspace. The source code is in PRUADC.p.
5. The PRU sample clock (bin). This program acts as an internal sample clock whose frequency can be configured from Linux userspace. If you wish to free this PRU for other tasks, you could replace this functionality with an external crystal oscillator. The source code is in PRUClock.p.
6. The Memory -> File program (mem2file). This Linux userspace program takes the samples from DDR external RAM and writes them to the standard output, which can be redirected to a file. The source code is in mem2file.c (this program is based on the devmem2 program that is discussed throughout the book).

Figure 13.A2: The structure and interaction between the various programs

The steps/programs are identified in Figure 13.A2 above and are now described below.

Step 1. The DTO must be loaded for this code to be executed. As before, please disable the HDMI overlay using the steps in the chapter. Then use the following steps to check that the overlay has loaded correctly:

Step 2. The PRU-ICSS has a UIO driver that exports host event out interrupts, L3 RAM and DDR RAM to Linux userspace so that applications can interact with the PRU-ICSS. This driver is automatically loaded when the device tree overlay is loaded in Step 1. You can see this by using the lsmod command:

The module can be unloaded from the Linux kernel using the rmmod application so that we can reload it and alter its behavior:

The modprobe application can then be used to add the module back to the Linux kernel, but with the DDR external RAM size updated to a larger value. In this example a pool of 2,000,000 bytes is allocated for the sample data (i.e., 1 million 16-bit samples); 2,000,000 is 0x1E8480 in hexadecimal. The source code of uio_pruss.c can be used to identify the module parameters, of which there are two:

  • sram_pool_sz – The SRAM pool size to allocate (default 16K).
  • extram_pool_sz – The external RAM pool size to allocate (default 256K).

The external RAM pool size can be modified as follows:

This modification can be tested using sysfs, as follows:

You can see that the size is set correctly and that the base address is 0x9f600000 (this will vary). The pruadc.c program loads these values automatically using sysfs and transfers them to PRU0 memory (0x00000004 and 0x00000008). This allows the PRUADC program on PRU0 to write directly to this external RAM pool, which is accessible from Linux userspace.
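The following is a minimal sketch of how a host program could read these sysfs values in C. The function names are illustrative.

It assumes that the DDR pool is exported as map2 of uio0, that is, at /sys/class/uio/uio0/maps/map2/addr and .../size. That map index is an assumption rather than something stated above, so check the actual paths on your board.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed sysfs location of the uio_pruss DDR (external RAM) pool. */
#define DDR_ADDR_PATH "/sys/class/uio/uio0/maps/map2/addr"
#define DDR_SIZE_PATH "/sys/class/uio/uio0/maps/map2/size"

/* Parse a sysfs value such as "0x9f600000\n"; base 0 accepts the 0x prefix. */
unsigned long parse_sysfs_hex(const char *text) {
    return strtoul(text, NULL, 0);
}

/* Read one value from a sysfs file; returns 0 on failure (no valid pool is 0). */
unsigned long read_sysfs_value(const char *path) {
    char buffer[64];
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return 0;
    }
    if (fgets(buffer, sizeof buffer, fp) == NULL) {
        fclose(fp);
        return 0;
    }
    fclose(fp);
    return parse_sysfs_hex(buffer);
}
```

A host program would call read_sysfs_value(DDR_ADDR_PATH) and read_sysfs_value(DDR_SIZE_PATH). It would then copy the two results into PRU0 data RAM at offsets 0x4 and 0x8.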

Please note that Step 2 was motivated and informed by the excellent work by Elias Bakken at Hipstercircuits.

Step 3. The pruadc program can be executed. This program loads the two PRU binaries (PRUClock.bin and PRUADC.bin) into PRU1 and PRU0, respectively. The source code for pruadc.c is provided in Listing 13.A2 below. You can see that the program also loads configuration values into PRU memory as follows:

  • The SPI Command String (4 bytes in PRU0 memory 0x00000000) – this value is the command that is sent from the BBB to the MCP3008. In this example it is 0x01800000, of which only the three most significant bytes (the six most significant hex digits, 0x018000) are used. Note that this exact same command is used in the example in the additional materials for Chapter 8.
  • The DDR Address (4 bytes in PRU0 memory 0x00000004) – this value is the base address of the DDR external RAM pool so that the PRU0 can write directly to this memory space in order to store sample data. This avoids the tight limits in PRU memory space. In this example case the address is 0x9f600000 and this is determined automatically using sysfs (as above).
  • The DDR Size (4 bytes in PRU0 memory 0x00000008) – this value is the size of the DDR external RAM pool. This is determined automatically using sysfs (as above) and has the size 0x1E8480 in this example (i.e., 2,000,000 bytes).
  • The Clock Frequency (4 bytes in PRU1 memory 0x00002000) – despite its name, this value is a delay that sets the sample clock period (larger values give lower frequencies). There is a struct in pruadc.c that provides some sample periods (e.g., FREQ_100kHz).
  • The Clock Running flag (4 bytes in PRU1 memory 0x00002004) – the two LSBs of this value allow the clock to be turned on/off or updated from Linux userspace, as described in the PRU-based Clock Signal Generators section below.
  • The Sample Clock value (4 bytes in PRU shared memory 0x00010000) – this value is shared between PRU0 and PRU1 and allows the PRU0 to capture a sample whenever the clock that is driven by PRU1 generates a rising edge.
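The list above amounts to a small memory map. The following C sketch records the offsets and writes the configuration words in one place. The names are hypothetical, and the mapping of PRU-ICSS memory into the process (for example, via the UIO device or /dev/mem) is not shown.

The offsets are treated as relative to the PRU0 data RAM base, with PRU1 data RAM at 0x2000 and shared RAM at 0x10000, as the addresses in the list suggest.

```c
#include <stdint.h>

/* Byte offsets of the configuration words listed above. */
enum {
    OFF_SPI_COMMAND  = 0x00000, /* PRU0: SPI command string          */
    OFF_DDR_ADDRESS  = 0x00004, /* PRU0: DDR pool base address       */
    OFF_DDR_SIZE     = 0x00008, /* PRU0: DDR pool size in bytes      */
    OFF_CLOCK_PERIOD = 0x02000, /* PRU1: sample clock delay value    */
    OFF_CLOCK_FLAGS  = 0x02004, /* PRU1: two LSBs control the clock  */
    OFF_SAMPLE_CLOCK = 0x10000  /* shared: sample clock state (LSB), written by PRU1 */
};

/* Store a 32-bit word at a byte offset; all offsets above are 4-byte aligned. */
static void pruss_write32(volatile uint32_t *pruss, uint32_t byte_offset, uint32_t value) {
    pruss[byte_offset / 4] = value;
}

/* Hypothetical host-side setup mirroring the list above. */
void pru_adc_configure(volatile uint32_t *pruss, uint32_t spi_command,
                       uint32_t ddr_address, uint32_t ddr_size,
                       uint32_t clock_delay, uint32_t clock_flags) {
    pruss_write32(pruss, OFF_SPI_COMMAND, spi_command);
    pruss_write32(pruss, OFF_DDR_ADDRESS, ddr_address);
    pruss_write32(pruss, OFF_DDR_SIZE, ddr_size);
    pruss_write32(pruss, OFF_CLOCK_PERIOD, clock_delay);
    pruss_write32(pruss, OFF_CLOCK_FLAGS, clock_flags & 0x3u); /* two LSBs only */
}
```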

When this program is executed, the PRUADC program captures 1 million 16-bit samples at a sample rate of 100kSps; therefore, it takes about 10 seconds to execute. The program can be executed as follows:

Listing 13.A2: The PRUADC.c Program Listing

Step 4. The PRUClock PRU program (in Listing 13.A3) is executed automatically. It continues to output a clock signal at the chosen clock frequency. This signal is output on pr1_pru1_pru_r30_1 (P8_46) for debugging purposes (note that you can de-allocate this pin if required and modify the code slightly). More importantly, the clock state is also written to the LSB of the Sample Clock value in PRU shared memory (0x00010000).

Listing 13.A3: The PRUClock.p Program Listing

Please note that there is a full discussion on PRU-based clocks in the next section below (PRU-based Clock Signal Generators).

Step 5. The PRUADC PRU program (in Listing 13.A4) is executed automatically when the pruadc program executes. The source code writes 24 bits to the MOSI pin (P9_29) on the rising edge of the data clock pulse (P9_30). It simultaneously reads 24 bits from the MISO pin (P9_28) on the falling edge of the data clock pulse. The CS pin (P9_27) is pulled low to initiate a sample request, and it must go high between samples. Figure 13.A3 illustrates the data transaction that takes place between the PRUADC program and the MCP3008. Again, this is based on the code that is presented in the additional material in Chapter 8.

Listing 13.A4: The PRUADC.p Program Listing

Figure 13.A3: The MCP3008 Data Communications Transaction
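To make the Step 5 transaction concrete, here is a C sketch of the same 24-bit full-duplex, bit-banged exchange. This is an illustration, not the book's PRU assembly.

The pin callbacks and the fake_adc test double are hypothetical names introduced here. On the PRU, each pin operation would correspond to setting or clearing a bit of r30, or testing a bit of r31.

```c
#include <stdint.h>

/* Hypothetical pin-level interface (on the PRU: r30 outputs, r31 input). */
typedef struct {
    void (*set_cs)(void *ctx, int level);
    void (*set_clk)(void *ctx, int level);
    void (*set_mosi)(void *ctx, int level);
    int  (*get_miso)(void *ctx);
    void *ctx;
} spi_pins;

/* Full-duplex 24-bit transfer, MSB first: MOSI is set while CLK is low so
   it is stable on the rising edge; MISO is read after the falling edge. */
uint32_t spi_transfer24(const spi_pins *p, uint32_t command) {
    uint32_t received = 0;
    p->set_cs(p->ctx, 0);                        /* CS low: request a sample */
    for (int bit = 23; bit >= 0; bit--) {
        p->set_mosi(p->ctx, (int)((command >> bit) & 1u));
        p->set_clk(p->ctx, 1);                   /* rising edge              */
        p->set_clk(p->ctx, 0);                   /* falling edge             */
        received = (received << 1) | (uint32_t)(p->get_miso(p->ctx) & 1);
    }
    p->set_cs(p->ctx, 1);                        /* CS high between samples  */
    return received;
}

/* Software stand-in for the ADC, used only to exercise the loop: it records
   MOSI on each rising edge and plays back a fixed 24-bit response on MISO. */
typedef struct {
    int clk, cs;
    uint32_t mosi_level, captured_command, response;
    int bits_sent;
} fake_adc;

static void fake_set_cs(void *c, int level)   { ((fake_adc *)c)->cs = level; }
static void fake_set_mosi(void *c, int level) { ((fake_adc *)c)->mosi_level = (uint32_t)level; }
static void fake_set_clk(void *c, int level) {
    fake_adc *a = (fake_adc *)c;
    if (level && !a->clk && !a->cs) {            /* rising edge while selected */
        a->captured_command = (a->captured_command << 1) | a->mosi_level;
    }
    a->clk = level;
}
static int fake_get_miso(void *c) {
    fake_adc *a = (fake_adc *)c;
    int bit = (int)((a->response >> (23 - a->bits_sent)) & 1u);
    a->bits_sent++;
    return bit;
}
```

With the fake ADC primed with the response 0x00008C (as in Figure 13.A4), sending 0x018000 returns 0x00008C. The fake ADC also sees exactly the 0x018000 command on MOSI.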

Figure 13.A4 is a capture from the Analog Discovery logic analyzer, which is using an SPI interpreter to decode the data being transmitted and received on the SPI bus (P9_27, P9_30, P9_29, and P9_28). In this example the ADC reference voltage is 3.3V, and the Channel 0 input voltage is set at 453mV. The command 0x018000 is transmitted (i.e., 24 bits), and the response from the MCP3008 is 0x00008C. Only the last 12 bits are used and the remainder are ignored, giving a value of 0x8C = 140 decimal. Then 3.3V × (140/1024) = 0.451V (451mV), which confirms correct operation of, and communication with, the MCP3008.

Figure 13.A4: Capture of a live data transaction between the PRUADC program and the MCP3008 IC
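The worked conversion above can be expressed in C as follows. The helper names are hypothetical.

Masking with 0x3FF keeps the MCP3008's 10-bit result. For the 0x00008C response, this gives the same value (140) as masking the last 12 bits. The voltage uses the same 1024-step scaling as the calculation above.

```c
#include <stdint.h>

/* Keep the MCP3008's 10-bit result from the last bits of the 24-bit frame. */
unsigned mcp3008_sample(uint32_t response) {
    return (unsigned)(response & 0x3FFu);
}

/* Convert a raw code to volts: vref * code / 2^bits. */
double adc_to_volts(unsigned code, double vref, unsigned bits) {
    return vref * (double)code / (double)(1u << bits);
}
```

adc_to_volts(mcp3008_sample(0x00008C), 3.3, 10) evaluates to approximately 0.451V.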

Step 6. Once the pruadc application has executed, the sample data has been transferred to the DDR external memory pool. It remains there unless the program is executed again or the uio_pruss kernel module is unloaded/reloaded. The mem2file program is a simple program that takes the data from memory and outputs it to the standard output, so that it can be stored in a flat text-format file. The data is stored in DDR external memory with two bytes for each sample. The ADC program therefore has a maximum of 16 bits of resolution per sample, which is sufficient for most low-cost ADCs and most applications. There is a script, plot (in Listing 13.A5), that captures the data from memory to a file and plots it to a PostScript file, which is further converted into a PDF format file.

Listing 13.A5: The plot script

It can be executed as follows:

You can see some results from this application below in Figure 13.A5 and in the repository directory /chp13/adc/examples. In this example the Analog Discovery Waveform Generator applies a 500Hz sine wave (1.65V amplitude, +1.65V offset) to the Channel 0 input. Figure 13.A5 displays the sampled version of this wave. It is clear from this figure that the sample rate is very regular, although the sample frequency is slightly different from 100kSps. A PDF version of this figure is available here: plot_2000_samples_500Hz_input.

Figure 13.A5: An example output of the PRU SPI ADC application


While the structure for this application is slightly complex, it functions without any external oscillator for the sample clock and captures data at good sample rates, considering the overall cost of the hardware configuration. The only limitation of the current implementation is that the sample size is limited by the available DDR memory pool size. That limitation can be addressed by streaming the data to the eMMC or to the network connection; however, it will require additional development work. If you use this work in your research, please cite this book.
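The mem2file program from Step 6 is not listed on this page. The following is only a rough sketch of its core, under several assumptions:

  • the DDR pool has already been mapped into the process, for example with mmap() on /dev/mem at the sysfs-reported address (not shown);
  • samples are stored as little-endian 16-bit values;
  • output is one decimal value per line, which may differ from the book's mem2file output.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read sample i from the DDR pool: two bytes per sample, least significant
   byte first (assumed little-endian storage). */
uint16_t sample_at(const volatile uint8_t *pool, size_t i) {
    return (uint16_t)(pool[2 * i] | (pool[2 * i + 1] << 8));
}

/* Write one sample per line, in decimal, so the output can be redirected
   to a file. Returns the number of samples written. */
size_t dump_samples(const volatile uint8_t *pool, size_t count, FILE *out) {
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        if (fprintf(out, "%u\n", (unsigned)sample_at(pool, i)) < 0) {
            break;                    /* stop on an output error */
        }
        written++;
    }
    return written;
}
```

Because dump_samples() writes to any FILE *, the same loop could also serve as a starting point for streaming samples elsewhere rather than only to standard output.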

The ADS7883 Single-Channel 12-bit ADC (1MSps max)

The MCP3008 can be used to sample up to 100kSps at 3.3V. For higher sampling rates there are alternative SPI ADC solutions available. Unfortunately, many are only available in surface-mount packages.

The ADS7883 (see the datasheet) is one such example. It is a 12-bit ADC that is capable of sampling at rates of up to 2MSps at 3.3V. Unfortunately, it is only available in a 6-pin SOT23 package, which is very small for manual soldering. SOT to DIP form (0.1″) adapter boards are available, but even so, it is a very small format IC. The ADS7883 is capable of sampling at 2MSps, but the configuration presented in this discussion can only drive it at approximately 1MSps. This is because the software-controlled serial data clock frequency is approaching the limit of what is possible to generate using the PRU-ICSS.

The Circuit

The circuit configuration is illustrated in Figure 13.A6 and is very similar to the one that is used for the MCP3008 (in Figure 13.A1), with the exception that there is no requirement for the MOSI line, as there is no channel selection option on this IC. Rather, this IC samples and begins transmitting sample data as soon as the slave select (CS) line is pulled low by the BeagleBone. The serial clock line (CLK) is used for conversion and for synchronizing the serial data output.

Figure 13.A6: The PRU-ADC circuit for the ADS7883 SPI ADC

The circuit uses the same device tree overlay as is used in the MCP3008 example above, and can be loaded in the same way. In addition, the program code has the exact same form as that described in Figure 13.A2 above. The only difference is that there are two versions of the PRUADC.p program:

  • PRUADC_fixed_1MHz.p — this program can be used to sample at 1MHz (or a value close to that, which can be configured using the sample clock value in PRUADC.c). Replace the PRUADC.p program with this version and then use the build script.
  • PRUADC_variable_rate.p — this program can be configured to sample at any frequency up to 500kSps. You can set the desired clock frequency in the PRUADC.c program.

All of the source code is available in the GitHub repository directory: /chp13/adc/ADS7883/

Using the Example Code

The device tree overlay can be loaded, and the DDR memory can be configured to store the sample data, as follows. The sample capacity can be configured to contain up to 8MB of data. In this case 8,000,000 bytes (0x7A1200 hex) are allocated, which is sufficient to capture 4 million 12-bit data samples (each sample is stored in 16 bits, which is the default data size for this code structure):

The programs will automatically detect the available memory space using sysfs and will capture data until this buffer is filled with 2-byte samples (containing 12-bits in this case). At a sample rate of 1MSps, this buffer will store four seconds of data.
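Both capture durations quoted on this page follow from the same arithmetic. Dividing the pool size by 2 bytes per sample gives the sample count, and dividing that count by the sample rate gives the duration:

  • 2,000,000 bytes at 100kSps gives 10 seconds;
  • 8,000,000 bytes at 1MSps gives 4 seconds.

Here is a small C sketch with illustrative helper names.

```c
#include <stddef.h>

/* Number of samples that fit in the DDR pool. */
size_t pool_capacity_samples(size_t pool_bytes, size_t bytes_per_sample) {
    return pool_bytes / bytes_per_sample;
}

/* Time taken to fill the pool at a given sample rate, in seconds. */
double capture_seconds(size_t pool_bytes, size_t bytes_per_sample, double sample_rate_hz) {
    return (double)pool_capacity_samples(pool_bytes, bytes_per_sample) / sample_rate_hz;
}
```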

The data can then be plotted using the plot script. Plotting 4 million sample points to the PostScript file takes some time, and converting it into a PDF file on the BeagleBone takes even longer. You can transfer this PDF file to your desktop machine using sftp so that it can be viewed. In addition, the number of samples to plot can be configured by modifying the first line of the plot script. Figure 13-A7 illustrates the capture of 4 million samples at 1MSps of a 1Hz sine wave (for this example, a 10nF capacitor was placed between the Vin line and GND to reduce high-frequency impulse noise).

Figure 13-A7: Four million samples at 1MSps of a 1Hz input sine wave

Data Communication

Figure 13-A8 captures a data exchange on the PRU pseudo-SPI bus as it is sampling at a rate of 1MSps. There is no data transmitted from the PRU to the ADS7883 on the MOSI line (orange), only from the ADS7883 on the MISO line (green). In this figure you can also see the sample clock that is generated using PRU1, on the bottom row of the figure. The rising edge of the sample clock (using the PRU shared memory address) causes PRU0 to pull the CS line low (as in the MCP3008 example), which triggers the ADS7883 to take a sample. The data communications serial clock (SCLK) is then toggled, and the data is captured and transmitted on the rising edge of the data clock pulse.

Note: I spent quite some time on this, as the datasheet does not make it clear that the data is centered on the rising edge! Connecting the IC to a scope and using true SPI confirmed this. Also, pay careful attention to the figures in the datasheet, as GND and Vin are transposed in some of them.

Figure 13-A8: The PRU SPI communications data transactions at 1MSps

You can see a single data sample transaction on the pseudo-SPI bus in Figure 13-A9 below. The data clock (CLK) uses the PRU instructions to transfer 16 bits in approximately 750ns, an effective rate of approximately 21.3MHz. Unfortunately, this is getting close to the physical limit of the PRUs. Each PRU instruction takes exactly 5ns, and the code to receive and capture the data must be intermixed with the code that toggles the data clock signal. The only way to achieve much faster sample rates would be to use a parallel ADC, which would in turn require many more GPIOs. The rate could be pushed to approximately 1.5MSps with this application, but I don’t think that much higher sample rates are possible with my code structure; perhaps 2MSps at most!

Figure 13-A9: The PRU SPI communications data transaction at 1MSps (1 data transaction)

In this example, the data that is transmitted on the MISO line (green) in Figure 13-A9 is 0x1ACC, which is 0001 1010 1100 1100 in binary. For this IC, the first two leading bits are always zero, as are the last two bits. The remaining bits, 0110 1011 0011, form the 12-bit data sample, which is 0x6B3, or 1715 in decimal out of a maximum of 4095. The PRU code shifts the received value right by two positions and ANDs it with 0x00000FFF to ensure that only the 12-bit data sample is retained. It then stores the sample data in external DDR memory.
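The same shift-and-mask step can be written in C. The function name is illustrative. The frame layout (two leading zeros, 12 data bits, two trailing zeros) is as described above.

```c
#include <stdint.h>

/* ADS7883 16-bit frame: 2 leading zeros, 12 data bits (MSB first), 2 trailing
   zeros. Shift out the trailing zeros, then AND with 0xFFF to keep the sample. */
uint16_t ads7883_sample(uint16_t frame) {
    return (uint16_t)((frame >> 2) & 0x0FFFu);
}
```

For the frame in Figure 13-A9, ads7883_sample(0x1ACC) returns 0x6B3 (1715).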


The circuit can be tested by applying input signals and measuring the resulting sampled data. The datasheet for the ADS7883 recommends the use of decoupling capacitors on the supply to reduce the sampling noise (see Figure 25 in the datasheet, noting that Vin and GND are transposed), and they are certainly necessary. Additionally, the datasheet recommends buffering the supply voltage (see Figure 27 in the datasheet), which is likely necessary if you are using the BeagleBone supply. With decoupling capacitors alone, there is an unusual noise response on the input that appears only when the device is sampling. This can be observed in Figure 13.A10.

Figure 13.A10: Example signal noise that is present on the ADS7883 Vin input during sampling

The noise occurs every 10µs and is impulsive in nature. To counter this noise, a small capacitor can be placed between Vin and GND (e.g., 10nF; please choose this to suit the impedance of your application’s sensor type). You can choose this capacitor to suit the sampling rate required for your application. At 1MSps the Nyquist frequency is 500kHz, so a high-frequency low-pass filter on the input can greatly improve the signal quality. The shared analog/digital GND and Vref/Vcc likely reduce the overall IC cost, but they appear to affect the sample quality.
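When choosing the capacitor, it can help to treat it, together with the impedance of the source driving Vin, as a first-order RC low-pass filter with cutoff f_c = 1/(2πRC).

The following C sketch assumes a purely resistive source. The resistance values in the note after it are illustrative and were not measured in this example.

```c
/* First-order RC low-pass cutoff: f_c = 1 / (2 * pi * R * C).
   R: assumed source impedance driving Vin (ohms); C: Vin-to-GND capacitor (F). */
double rc_cutoff_hz(double r_ohms, double c_farads) {
    const double pi = 3.14159265358979323846;
    return 1.0 / (2.0 * pi * r_ohms * c_farads);
}
```

With 10nF, an assumed 50Ω source gives a cutoff of roughly 318kHz, which is below the 500kHz Nyquist frequency at 1MSps. An assumed 1kΩ source gives roughly 15.9kHz, which would strongly attenuate a 100kHz input.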

(a) 1 million samples at 1MSps — 5Hz input with noise (no Vin cap)

(b) 1 million samples at 1MSps — 5Hz input with Vin cap (10nF)

(c) first 1,000 of 1 million samples at 1MSps — 10kHz input with Vin cap (10nF)

(d) first 1,000 of 1 million samples at 1MSps — 100kHz input with Vin cap (10nF)

Figure 13.A11: Sample results with input signals of different frequencies

The results in Figure 13.A11 illustrate some example outputs from this application. The first sample (a) illustrates the impact of the impulse noise on the captured signal. The second sample (b) illustrates the impact of adding a 10nF capacitor across Vin/GND. In (c), only the first 1,000 samples of the signal that was sampled at 1MSps are displayed, and you can see that the sampling rate is regular. Finally, in (d) the first 1,000 samples of a 100kHz input signal are displayed. At this rate there are only 10 samples for each period of the input signal, which produces the repeating sampling pattern. The regularity of this pattern indicates that there is very low jitter.

Rebuilding the uio_pruss.ko Kernel Module

It may be necessary for you to rebuild the uio_pruss kernel module for your application. For example, you could alter the default external memory allocation so that you would not have to remove the module and reload it with the use of an argument that defines the size of memory to allocate.

The first step is to ensure that your BBB is set up to compile kernel modules. To do that you need to install the Linux-headers for your exact distribution. You can find them at Robert Nelson’s website — for example, at: http://rcn-ee.net/deb/precise-armhf/. Use uname -a to determine your exact distribution, and then download and install those Linux-headers on your BBB using the website — for example:

Under the 3.8.13 bone50 BBB Debian distribution you may have to create an empty file timex.h (i.e., touch timex.h) in the directory /usr/src/linux-headers-3.8.13-bone50/arch/arm/include/mach.

Then, using the source code in the GitHub repository you can alter the uio_pruss.c file to suit your application. In this source code example the default external memory size is set for 512K, rather than 256K:

Once the module is built, it will appear as uio_pruss.ko in the current directory. You can load this module and test it as follows:

You can see that the uio_pruss kernel module external memory now has a default size of 0x80000 —  524,288 bytes, which is 512Kbytes (note the big K) as specified in the new uio_pruss.c example code. To replace the default module, copy the uio_pruss.ko file to the bottom location below (remembering to make a backup of the original):

PRU-based Clock Signal Generators

[Added Feb, 2015] In this example a real-time variable-frequency clock is built using the PRU-ICSS. In Chapter 13, a PWM example is developed over a number of sections, and the implementation in this discussion is relatively straightforward in comparison. However, a PRU clock is a useful addition to the range of PRU examples available. It is also used to explain how to develop a PRU program that can be executed “permanently” on a PRU while remaining configurable from Linux userspace. The circuit required for this example is very straightforward. It uses the same device tree overlay that is described in the chapter to output the clock signal on pin P9_27, which is pr1_pru0_pru_r30_5. The HDMI cape has been disabled, as discussed at the beginning of the chapter. In all examples within this section, the Analog Discovery oscilloscope is connected to P9_27 (and to GND, e.g., pin P9_2) in order to measure the output response.

A Fixed-Frequency PRU-based Clock

It is useful to begin with a straightforward example before continuing to the variable-frequency clock example. In fact, if all you require is a fixed-frequency clock, then the following example will suffice. The first code example demonstrates how a clock signal with a 50% duty cycle can be configured by hard-coding a delay into the PRU program code. Remember that with the PRU-ICSS, each instruction takes exactly 5ns to execute. Using this fact, the following program outputs a 20MHz square wave clock signal on P9_27. You must load the PRU overlay EBB-PRU in advance, and can execute the program using the following steps:

The source code for this example follows below. The full project is available in the /chp13/fixedPRUClock directory of the GitHub repository, but the important code is presented here along with comments describing its operation. Note that a duplicated instruction is required to ensure that the output clock signal has a 50% duty cycle (i.e., on for 50% of the time and off for 50% of the time).

An oscilloscope can be attached to the output pin P9_27 and it will provide you with output clock signals as captured below in Figure 13.B1.

  Figure 13.B1: Clock signal on P9_27: (a) a 1MHz signal using a hard-coded delay value of 48, and (b) a 100kHz signal using a hard-coded delay value of 498

One important feature of these signals is that they do not suffer from jitter, as they are generated independently of the load on the Linux host processor. The downside is that (without careful programming) you can only have two such signals, each of which occupies a valuable PRU. An alternative configuration is to use an external crystal, but that is not necessary if your embedded application does not utilize both PRUs.
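The delay values quoted on this page fit a simple linear model: period = 20ns × (delay + overhead). The overhead is 2 for the fixed clock (48 gives 1MHz and 498 gives 100kHz). It is 3 for the variable-frequency clock in the next section (47 gives 1MHz and 497 gives 100kHz).

This is consistent with each delay step adding a two-instruction loop iteration (10ns) to each half period. However, the model is inferred from the quoted values rather than taken from the PRU listings, so check it against the code before relying on it. The C sketch below uses illustrative names.

```c
/* Inferred model: period = 20 ns * (delay + overhead), i.e., each delay step
   adds 10 ns (two 5 ns instructions) to each half period. */
#define PRU_NS_PER_DELAY_STEP 20.0

/* Output frequency for a given delay value. */
double pru_clock_hz(unsigned delay, unsigned overhead) {
    return 1e9 / (PRU_NS_PER_DELAY_STEP * (double)(delay + overhead));
}

/* Nearest delay value for a target frequency (0 if the target is too high). */
unsigned pru_clock_delay(double target_hz, unsigned overhead) {
    double steps = 1e9 / (PRU_NS_PER_DELAY_STEP * target_hz);
    long delay = (long)(steps + 0.5) - (long)overhead;
    return delay > 0 ? (unsigned)delay : 0u;
}
```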

A Variable-Frequency PRU-based Clock

The example in the last section is improved in this section so that the frequency of the clock can be adjusted from the Linux userspace, even while the PRU program is running. For this example, the devmem2 application is utilized again; however, a C/C++ program could be written in its place. Once the PRU overlay is loaded and the oscilloscope is attached to P9_27 as before, the program can be executed as follows:

The program outputs a 1MHz clock signal by default. The program also identifies the PRU memory base address as above (at 0x4a300000) at which the period value can be placed (in 4 bytes). To output a clock signal at 1MHz a value of 47 is used, which is 0x2f in hexadecimal. The devmem2 program can be used to query this address, which will result in the following output:

The same devmem2 program can be used to write a new value (word in this case, using w) to the address — in this case 497, which is 0x1F1 in hexadecimal. This corresponds to an output clock signal frequency of approximately 100kHz:

However, the output will not be updated until the value at the subsequent memory address 0x4a300004 is updated to 3 (i.e., both LSBs are set to 1). It has been programmed this way to ensure that it is possible to update the clock frequency (see the PRU code):