more register-level STM32: SPI, DMA, DShot
This is my second attempt at this post, so it’ll probably be a bit shorter than the first time.
I first realized that the HAL_Init() function in the code autogenerated by STM32CubeMX does more than initialize the HAL: it also sets up the flash interface and the SysTick interrupt.
When setting up the flash, I needed to set the flash latency to match the CPU clock speed and supply voltage. For a 168MHz HCLK in the 2.7V – 3.6V range, the table gives 5 wait states as the correct option.
| Wait States (Latency) | 2.7V – 3.6V | 2.4V – 2.7V | 2.1V – 2.4V | 1.8V – 2.1V Prefetch OFF |
|---|---|---|---|---|
| 0 WS (1 CPU cycle) | 0 < HCLK ≤ 30 | 0 < HCLK ≤ 24 | 0 < HCLK ≤ 22 | 0 < HCLK ≤ 20 |
| 1 WS (2 CPU cycles) | 30 < HCLK ≤ 60 | 24 < HCLK ≤ 48 | 22 < HCLK ≤ 44 | 20 < HCLK ≤ 40 |
| 2 WS (3 CPU cycles) | 60 < HCLK ≤ 90 | 48 < HCLK ≤ 72 | 44 < HCLK ≤ 66 | 40 < HCLK ≤ 60 |
| 3 WS (4 CPU cycles) | 90 < HCLK ≤ 120 | 72 < HCLK ≤ 96 | 66 < HCLK ≤ 88 | 60 < HCLK ≤ 80 |
| 4 WS (5 CPU cycles) | 120 < HCLK ≤ 150 | 96 < HCLK ≤ 120 | 88 < HCLK ≤ 110 | 80 < HCLK ≤ 100 |
| 5 WS (6 CPU cycles) | 150 < HCLK ≤ 168 | 120 < HCLK ≤ 144 | 110 < HCLK ≤ 132 | 100 < HCLK ≤ 120 |
| 6 WS (7 CPU cycles) | — | 144 < HCLK ≤ 168 | 132 < HCLK ≤ 154 | 120 < HCLK ≤ 140 |
| 7 WS (8 CPU cycles) | — | — | 154 < HCLK ≤ 168 | 140 < HCLK ≤ 160 |
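To make the table lookup concrete, here is a minimal sketch of the 2.7V – 3.6V column as a function. The name `flash_wait_states` is my own hypothetical helper, not something from the HAL or CMSIS, and it only covers that one voltage column, where each wait state spans a 30MHz band.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: wait states for the 2.7V - 3.6V column only.
 * Each band is 30MHz wide and the upper bound of a band is inclusive. */
uint32_t flash_wait_states(uint32_t hclk_mhz)
{
    assert(hclk_mhz > 0u && hclk_mhz <= 168u); /* outside the table's range */
    return (hclk_mhz - 1u) / 30u;
}
```

With a 168MHz HCLK this lands in the last band, which matches the 5 WS row.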
Enabling instruction prefetch, along with the instruction and data caches, is essentially free performance for my use case.
FLASH->ACR =
// Set the flash latency to 5 wait states to account for the 168MHz clock
// [RM0090 3.5.1 & Table 11]
FLASH_ACR_LATENCY_5WS
// Enable CPU instruction prefetch, instruction cache, and data cache
// [RM0090 3.5.2]
| FLASH_ACR_PRFTEN | FLASH_ACR_ICEN | FLASH_ACR_DCEN;
I then looked into NVIC priority grouping. The following table shows how the priority bits can be split between group priorities and subpriorities. Because “Only the group priority determines preemption of interrupt exceptions,” and because 16 groups were more than I needed, 16 groups and no subpriorities was the right choice.
| PRIGROUP [2:0] | Binary Point | Group Priority Bits | Subpriority Bits | Group Priorities | Subpriorities |
|---|---|---|---|---|---|
| 0b0xx | 0bxxxx | [7:4] | None | 16 | None |
| 0b100 | 0bxxx.y | [7:5] | [4] | 8 | 2 |
| 0b101 | 0bxx.yy | [7:6] | [5:4] | 4 | 4 |
| 0b110 | 0bx.yyy | [7] | [6:4] | 2 | 8 |
| 0b111 | 0b.yyyy | None | [7:4] | None | 16 |
// Set the NVIC priority grouping to 16 group priorities and no subpriorities
// [PM0214 2.3.6, 4.4.5, Table 51, Table 48]
// This allows unique preemption levels for all the used interrupts
NVIC_SetPriorityGrouping(0);
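As a sanity check on the table, the split between group and subpriority bits can be sketched on the host. This is only an illustration: `PRIO_BITS`, `group_priority_bits`, and `sub_priority_bits` are hypothetical names of mine, and the sketch assumes the 4 implemented priority bits of the STM32F4 (what CMSIS calls `__NVIC_PRIO_BITS`).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 priority bits are implemented, as on the STM32F4. */
#define PRIO_BITS 4u

/* Number of group (preemption) priority bits for a PRIGROUP value. */
uint32_t group_priority_bits(uint32_t prigroup)
{
    assert(prigroup <= 7u); /* PRIGROUP is a 3-bit field */
    uint32_t bits = 7u - prigroup;
    return bits > PRIO_BITS ? PRIO_BITS : bits;
}

/* Subpriority bits are whatever the group field does not use. */
uint32_t sub_priority_bits(uint32_t prigroup)
{
    return PRIO_BITS - group_priority_bits(prigroup);
}
```

For PRIGROUP 0 this gives 4 group bits and 0 subpriority bits, so 16 preemption levels, which is the first row of the table.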
The 1ms SysTick interrupt is not timing-critical, unlike much of the DShot protocol, and it is even less critical than the SPI interrupts. Counterintuitively, lower priority values mean higher priority, so I set the SysTick interrupt priority to 15, the highest value (and therefore the lowest priority) available with my chosen grouping.
NVIC_SetPriority(SysTick_IRQn, 15);
I also quickly enabled the SYSCFG (system configuration controller) clock, which I forgot to include before.
RCC->APB2ENR |= RCC_APB2ENR_SYSCFGEN;
(void)RCC->APB2ENR;
I then added the DMA controller clock enable lines to this block from last week.
// Enable Peripheral Clocks
RCC->AHB1ENR |=
// Enable GPIO [DS8626 Table 9 & Figure 12, RM0090 6.3.5]
RCC_AHB1ENR_GPIOAEN // For TIM1
| RCC_AHB1ENR_GPIOBEN // For SPI2
| RCC_AHB1ENR_GPIOCEN // For TIM8
| RCC_AHB1ENR_GPIOHEN // For External Oscillator
// Enable DMA Controller Clocks
| RCC_AHB1ENR_DMA1EN // For SPI2
| RCC_AHB1ENR_DMA2EN; // For TIM1 & TIM8
(void)RCC->AHB1ENR;
We then tell the CPU to respond to the RX DMA transfer complete interrupt, whose handler is responsible for checking the SPI hardware CRC result, unpacking the DShot data bits, and calculating each DShot frame’s CRC. This is lower priority than the DShot interrupts themselves, so we assign it a priority of 8.
// SPI Transaction completed RX [RM0090 Table 43]
// Priority 8 because it shouldn't interrupt timing-critical DShot interrupts
NVIC_SetPriority(DMA1_Stream3_IRQn, 8);
// Tell CPU to react to the interrupt flag
NVIC_EnableIRQ(DMA1_Stream3_IRQn);
We then enable the SPI peripheral clock and complete the basic SPI configuration, notably enabling the RX and TX DMA requests, before enabling the SPI peripheral itself.
// Enable SPI2 Clock
RCC->APB1ENR |= RCC_APB1ENR_SPI2EN;
(void)RCC->APB1ENR;
SPI2->CR1 =
// Set data capture on first clock edge
(0 << SPI_CR1_CPHA_Pos)
// Set idle low
| (0 << SPI_CR1_CPOL_Pos)
// Set slave mode
| (0 << SPI_CR1_MSTR_Pos)
// Set MSB first
| (0 << SPI_CR1_LSBFIRST_Pos)
// Enable software slave management (SSM)
| SPI_CR1_SSM
// Keep SSI cleared: in slave mode the internal NSS must be low to stay selected
| (0 << SPI_CR1_SSI_Pos)
// Set full duplex
| (0 << SPI_CR1_RXONLY_Pos);
// Enable RX and TX DMA requests when RXNE and TXE flags are set
SPI2->CR2 = SPI_CR2_RXDMAEN | SPI_CR2_TXDMAEN;
// Enable SPI peripheral
SPI2->CR1 |= SPI_CR1_SPE;
Note that I have not finished the polynomial configuration for the hardware CRC.
I then set the pins to alternate function mode with the correct mapping according to this table:
| Port | … | AF5 (SPI1/SPI2/I2S2/I2S2ext) | … |
|---|---|---|---|
| … | … | … | … |
| PB13 | … | SPI2_SCK, I2S2_CK | … |
| PB14 | … | SPI2_MISO | … |
| PB15 | … | SPI2_MOSI, I2S2_SD | … |
| … | … | … | … |
// Set SPI pins to alternate function mode (0b10)
// [RM0090 8.3.7 & Figure 26, DS8626 Table 7]
GPIOB->MODER &=
~(GPIO_MODER_MODE13_Msk | GPIO_MODER_MODE14_Msk | GPIO_MODER_MODE15_Msk);
GPIOB->MODER |=
(2U << GPIO_MODER_MODE13_Pos)
| (2U << GPIO_MODER_MODE14_Pos)
| (2U << GPIO_MODER_MODE15_Pos);
// Set SPI pins to correct alternate function (AF5)
// [RM0090 8.3.7 & Figure 26, DS8626 Table 7]
GPIOB->AFR[1] &=
~(GPIO_AFRH_AFSEL13_Msk | GPIO_AFRH_AFSEL14_Msk | GPIO_AFRH_AFSEL15_Msk);
GPIOB->AFR[1] |=
(5U << GPIO_AFRH_AFSEL13_Pos)
| (5U << GPIO_AFRH_AFSEL14_Pos)
| (5U << GPIO_AFRH_AFSEL15_Pos);
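Multi-bit fields like these need a clear-then-set sequence, because OR-ing a new value into a field that still holds old bits can leave a mix of both. A minimal sketch of that pattern, using a hypothetical `set_field` helper of my own (not part of CMSIS) that works on a plain value so it can be tried on the host:

```c
#include <stdint.h>

/* Hypothetical helper: replace one bit field and leave the rest untouched.
 * mask selects the field, pos is the bit offset of its lowest bit. */
uint32_t set_field(uint32_t reg, uint32_t mask, uint32_t pos, uint32_t value)
{
    return (reg & ~mask) | ((value << pos) & mask);
}
```

On the target this could be used as `GPIOB->MODER = set_field(GPIOB->MODER, GPIO_MODER_MODE13_Msk, GPIO_MODER_MODE13_Pos, 2U);`, assuming the usual `_Msk`/`_Pos` pairs from the device header.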
Each pin also has a corresponding OSPEEDR value. Higher settings enable higher I/O speed but also make the pin more vulnerable to signal noise. Given the format of the table I won’t include it here, but I settled on the medium value, which at my supply voltage, even with a high load capacitance, supports 25MHz SPI and gives me healthy headroom.
// Set SPI pins OSPEEDR value to medium (0b01) [DS8626 Table 50]
GPIOB->OSPEEDR &= ~(GPIO_OSPEEDR_OSPEED13_Msk | GPIO_OSPEEDR_OSPEED14_Msk |
GPIO_OSPEEDR_OSPEED15_Msk);
GPIOB->OSPEEDR |= (1U << GPIO_OSPEEDR_OSPEED13_Pos) |
(1U << GPIO_OSPEEDR_OSPEED14_Pos) |
(1U << GPIO_OSPEEDR_OSPEED15_Pos);
I then set up the RX DMA and mostly finished the TX DMA, although the latter is not included here.
// buffer is 9 bytes (11 bit data * 8 motors + 1 CRC byte)
volatile uint8_t cmd_buf[9];
volatile uint8_t erpm_buf[9];
// Reset stream and wait until it is actually disabled
DMA1_Stream3->CR = 0;
while (DMA1_Stream3->CR & DMA_SxCR_EN);
DMA1_Stream3->CR =
DMA_SxCR_TCIE // Enable transfer complete interrupt
| (0 << DMA_SxCR_DIR_Pos) // Peripheral to Memory
| DMA_SxCR_MINC // Enable memory increment mode
| (0 << DMA_SxCR_PSIZE_Pos) // Set 8-bit peripheral data size
| (0 << DMA_SxCR_MSIZE_Pos) // Set 8-bit memory data size
// Set priority level to medium. Doesn't actually matter since TIMs and
// IDR are on DMA2
| (1 << DMA_SxCR_PL_Pos)
| (0 << DMA_SxCR_CHSEL_Pos); // Set channel 0 [RM0090 Table 43]
// Transfer 9 bytes
DMA1_Stream3->NDTR = 9;
// Set source peripheral pointer
DMA1_Stream3->PAR = (uint32_t)&SPI2->DR;
// Set buffer pointers
DMA1_Stream3->M0AR = (uint32_t)cmd_buf;
DMA1_Stream3->FCR = 0; // Enable direct mode (no FIFO)
I am glossing over a lot of details here, like determining which channel to select. I am going to use the transfer complete interrupt, enabled here and registered earlier, to check whether the hardware CRC passed, unpack the data values, calculate each frame’s CRC, and then prepare the values for each pin’s CCR.
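The per-frame CRC step is small enough to sketch now. This is not the final interrupt code, just the standard DShot checksum under a few assumptions: `dshot_frame` is a hypothetical name, the 11-bit value arrives already unpacked, and the telemetry request bit is passed in separately. Bidirectional DShot inverts the checksum, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-bit DShot frame: 11-bit throttle, 1 telemetry bit, 4-bit CRC.
 * The CRC is the XOR of the three nibbles of the 12-bit payload. */
uint16_t dshot_frame(uint16_t throttle, uint8_t telemetry)
{
    assert(throttle < 2048u); /* must fit in 11 bits */
    uint16_t value = (uint16_t)((throttle << 1) | (telemetry & 1u));
    uint16_t crc = (uint16_t)((value ^ (value >> 4) ^ (value >> 8)) & 0x0Fu);
    return (uint16_t)((value << 4) | crc);
}
```

For a throttle of 1046 with no telemetry request, the payload is 0x82C, the checksum works out to 0x6, and the frame is 0x82C6.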