Calc sum in PySpark over date range

Is there any way, using lag or some other method, to achieve this so that I don’t have to go through complex operations?

Yes, you can use a lag function to achieve this. Lag gives you access to a value from a previous row in an ordered dataset, so you can retrieve the value you need directly instead of building more complex operations.

For example, in SQL, you can use the LAG function to retrieve the value from the previous row in a specific column. Here’s an example query:

SELECT column_name, LAG(column_name) OVER (ORDER BY ordering_column) AS previous_value
FROM table_name;

Replace column_name with the column you want to retrieve the previous value from, ordering_column with the column used to determine the order of the rows, and table_name with the name of your table.
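
Since the question is about PySpark, here is a minimal sketch of the same idea using pyspark.sql.functions.lag over a window ordered by a date column. The DataFrame and the column names (sale_date, amount) are made up for illustration, so adjust them to your own data:

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: one amount per date
df = spark.createDataFrame(
    [("2023-01-01", 10.0), ("2023-01-02", 15.0), ("2023-01-03", 20.0)],
    ["sale_date", "amount"],
)

# Window ordered by the date column; lag() pulls the previous row's value
w = Window.orderBy("sale_date")
result = df.withColumn("previous_amount", F.lag("amount").over(w))

# The same window can also drive a running total, if a cumulative sum
# over the dates is what you are after
result = result.withColumn("running_sum", F.sum("amount").over(w))

result.show()

Note that a window ordered without a partitionBy moves all rows to a single partition, which is fine for small data but worth partitioning on larger datasets.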

In other programming languages you can get the same behaviour: in Python, pandas provides DataFrame.shift() as its equivalent of lag, and in R, dplyr provides lag().
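
For instance, pandas' shift() plays the role of lag; here is a minimal sketch with the same made-up columns:

import pandas as pd

# Hypothetical data; shift(1) returns each row's previous value
df = pd.DataFrame({
    "sale_date": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "amount": [10.0, 15.0, 20.0],
})

df = df.sort_values("sale_date")
df["previous_amount"] = df["amount"].shift(1)
print(df)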

Remember to adapt the syntax to the specific programming language and tool you are using.