REACH Within
Preliminary Findings from a Review of Data Systems and Youth Assessments
Submitted to REACH Within, November 2013

For more information, please contact:

Alexandra Clay, B.A.
[email protected] 303-839-9422 ext. 178
Pallavi Visvanathan, PhD
[email protected] 303-839-9422 ext. 194
For General Inquiries/Questions
OMNI Institute
899 Logan Street, Suite 600
Denver, CO 80203
p. 303-839-9422
f. 303-839-9420
www.omni.org
OMNI Contributors: Anu Atre, Maddie Frost, Chandra Winder, Alexis Zimmerman
Table of Contents

Introduction
Methods
    Measure
    Data Preparation
    Reliability Analyses
Results
    Participant Characteristics
    Outcomes
Review of Findings
Recommendations
Conclusion
Appendix A
Appendix B

Figures

Figure 1: Map Display of Participating Locations
Figure 2: Age of Participants
Figure 3: Gender of Participants
Figure 4: Change in SDQ Scale Scores
Figure 5: Change in SDQ Scale Scores - Males (M) vs. Females (F)

Tables

Table 1: Data Submitted and Matched by Site
Table 2: Time Between SDQ Completions
Table 3: Summary of Movement in SDQ Classifications
Introduction

The Bartholomew J. Lawson Foundation for Children, a US-based 501(c)(3), developed REACH Grenada in 2008 as a principal program to improve the health and wellbeing of Grenada's most vulnerable youth. The program's philosophy originates from the belief that children's healthy relationships are born of self-respect and of meaningful, consistent attachments to adults and other children, and that fulfillment in school, work, and play depends on a sense of belonging and on feeling safe and comfortable in these contexts. Since its establishment in 2008, the programming has developed into a multi-faceted program that uses yoga and mindfulness techniques to improve self-regulation, increase emotional literacy, and enhance social skills. Programming takes place in a group setting and is currently provided to formerly maltreated youth living in residential care facilities in Grenada. The program teaches valuable skills to support children in leading empowered lives and, ultimately, in overcoming adversity. Figure 1 below provides an overview of the residential care facilities in which programming has been implemented.
Figure 1: Map Display of Participating Locations
Prepared by OMNI Institute
1
In 2013, REACH contracted with OMNI Institute (OMNI), a nonprofit research and evaluation firm in Denver, to:

- Analyze data from the Strengths and Difficulties Questionnaire (SDQ) collected on youth between September 2010 and December 2012, and
- Review evaluation methods and refine current data entry processes used by REACH staff.

OMNI is a Colorado-based, nonprofit social science agency with the mission of advancing the public and nonprofit sectors through integrated evaluation research, capacity building, and technology solutions. Since 1976, OMNI has served a range of foundations, governmental agencies, and nonprofits in addressing critical social problems that cut across the fields of public and behavioral health; education and early childhood; youth development; justice; and systems improvement. OMNI takes a client-centered, collaborative approach to managing its projects and recognizes the importance of working in partnership with clients to obtain the best possible outcomes, allowing flexibility in responding to emerging issues and ensuring that each client's needs are fully met.
Methods This section of the report provides information regarding the evaluation measure and the methods used to prepare the data.
MEASURE

To assess the emotional and psychological well-being of the youth residing in residential care facilities in Grenada, the Strengths and Difficulties Questionnaire (SDQ) was completed by program facilitators, psychologists, yoga instructors, and other onsite staff. Program staff rated youth on a 3-point scale based on behavior they had observed over the past 6 months. The SDQ is a screening assessment used to identify behavioral and emotional strengths and difficulties in children and adolescents aged 3 to 17. The assessment consists of 25 items and measures the following constructs:

- Emotional Difficulties (5 items): Measured by the Emotional Symptoms subscale of the SDQ, this construct generally reflects youths' expression of negative emotion, such as worrying, unhappiness/tearfulness, nervousness, and fearfulness.
- Behavioral Difficulties (5 items): Measured by the Conduct Problems subscale of the SDQ, this construct generally reflects youths' problem behaviors, such as temper tantrums, disobedience, fighting, lying, and stealing.
- Hyperactivity (5 items): The items within this subscale generally assess behaviors indicative of hyperactivity and inattention, such as restlessness, fidgeting, or being easily distracted.
- Peer Relationship Difficulties (5 items): Measured by the Peer Problems subscale of the SDQ, this construct generally reflects observed relationships and interactions between youth and their peers, such as a tendency toward solitariness and being picked on or bullied by other children.
- Prosocial Behavior (5 items): Measured by the Prosocial subscale of the SDQ, this construct generally reflects positive and helpful behaviors observed in youth, such as being considerate of other people's feelings, sharing with other children, and volunteering to help others.

Scores for each subscale range from 0 to 10. Additionally, a Total Difficulties scale score, ranging from 0 to 40, is calculated by summing the scores from all scales except Prosocial Behavior. The assessment tool is highly reliable and has been validated across many languages and cultural contexts. There are three versions of the SDQ (parent-report, teacher-report, and self-report), each of which is scored differently and designed to assess different age groups of youth. The teacher version, used for the current evaluation, has three forms for different age groups; the form designed for four- to ten-year-olds can be found in Appendix A. For more information about the SDQ or to view the measure for other age groups, visit http://www.sdqinfo.org/.
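As a concrete illustration, the scoring described above can be sketched in a few lines of Python. The item-to-subscale mapping below follows the published SDQ scoring key but is shown here for illustration only; reverse-scored items are assumed to have been handled upstream, and the official rules are at sdqinfo.org.

```python
# Illustrative sketch of SDQ scale scoring. Assumes item responses are
# already coded 0-2, with reverse-scored items recoded upstream.
# Item-to-subscale mapping follows the published SDQ scoring key.

SUBSCALES = {
    "emotional":     ["item3", "item8", "item13", "item16", "item24"],
    "conduct":       ["item5", "item7", "item12", "item18", "item22"],
    "hyperactivity": ["item2", "item10", "item15", "item21", "item25"],
    "peer":          ["item6", "item11", "item14", "item19", "item23"],
    "prosocial":     ["item1", "item4", "item9", "item17", "item20"],
}

def score_sdq(responses):
    """Return subscale scores (each 0-10) and the Total Difficulties
    score (0-40), which excludes the prosocial subscale."""
    scores = {name: sum(responses[item] for item in items)
              for name, items in SUBSCALES.items()}
    scores["total_difficulties"] = sum(
        scores[s] for s in ("emotional", "conduct", "hyperactivity", "peer"))
    return scores
```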
DATA PREPARATION

All available SDQ data collected on youth between August 2010 and December 2012 were submitted to OMNI in September 2013. Because the way SDQ data were managed internally by REACH staff had changed over the past few years, data were submitted in three separate forms:

- Summary reports populated from SDQs that were hand-entered into the online scoring tool (www.youthinmind.info).
- Paper versions of the SDQ completed by hand.
- Excel files containing item-by-item results for each site at different time points.
The data listed above were combined by hand-entering them into a single Excel file. Data cleaning processes were implemented to ensure accurate reporting, and an initial review of the data identified the following issues:

- Missing birth date, gender, and administration date.
- Inconsistent formatting of administration date, leading to confusion about the time point at which data were collected.
- Records containing inconsistent spellings of first and/or last names, making it difficult to match data across time points.
- Inconsistent birth dates for records that appeared to belong to the same youth, making it difficult to determine whether data across time points were for the same youth.
- Data files that were named differently but contained identical data.

To address these issues, discrepant records and data files were reviewed by REACH staff, data files were resubmitted to OMNI, and OMNI staff re-entered the data to combine information for analysis and check it once more. Once these issues had been fixed, data were pulled into IBM SPSS software for further cleaning and preparation for analyses. In SPSS, data were reformatted and scored based on the scoring rules provided on the SDQ information website (see Appendix B or visit http://www.sdqinfo.org/). Decisions were made regarding data to include, exclude, and recode; these decisions are summarized below.

- In the fall of 2010, multiple staff were asked to complete the SDQ for each of the youth with whom they had worked. As a result, many children had behavioral ratings provided by up to four staff members. For consistency, OMNI researchers retained data from only one rater, selected based on the rater's experience with the SDQ and their role within the REACH program. For example, raters who had administered a greater number of SDQs were selected over those with less experience.
- Since the SDQ was designed to assess youth aged 3-17, youth outside this age range were removed from the analyses.
- Since the purpose of this evaluation was to assess program impact by looking at change over time, youth with only one time point of data were removed from the analyses. The majority of these youth came from the Government of Grenada Emergency Shelter (ES), where the length of stay is typically very short, unlike at the other program sites.
- Assessments completed less than 5 months after the earliest assessment were removed from the analyses, as the SDQ was designed to be administered at 6-month intervals.

Once the data were cleaned, OMNI researchers matched data across time points and identified the appropriate time points to include in the analyses. Twenty-two youth had two assessments recorded and nine youth had three. Because the number of youth with three recorded assessments was small, it was not possible to examine change across more than two time points. Therefore, for the current analyses, change was examined from the first recorded assessment to the next assessment completed at least 5 months later. Table 1 below provides an overview of the data submitted to OMNI, including the number of assessments submitted, the number of youth who had at least one time point of data upon submission, and the number of youth included in our analyses (i.e., those with valid, clean, and matched data for two time points). As shown in Table 1, a total of 213 surveys across 71 REACH program participants were collected between September 2010 and December 2012. Of the data submitted to OMNI, only 31 youth (44%) had two or more valid assessments per the data cleaning steps described above.
Table 1: Data Submitted and Matched by Site

Site                                             # of Surveys   # of Youth   Matched Youth
Bel Air Home for Children and Adolescents (BA)         56           22             14
Father Mallaghan's Home for Boys (FM)                  15           10              5
Government of Grenada Emergency Shelter (ES)           15           15              0
Queen Elizabeth Home (QEH)                            127           24             12
Total                                                 213           71             31
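The matching rule described above (each youth's first assessment plus the next assessment completed at least 5 months later) can be sketched as follows. The function name, data shapes, and the 150-day approximation of "5 months" are illustrative assumptions, not REACH's actual processing code.

```python
from datetime import date

MIN_GAP_DAYS = 150  # roughly 5 months; illustrative threshold

def match_time_points(assessments):
    """assessments: dict mapping youth id -> list of administration dates.
    Returns a dict of (time1, time2) date pairs for youth who have a valid
    follow-up; youth with only one usable time point are dropped."""
    matched = {}
    for youth, dates in assessments.items():
        dates = sorted(dates)
        t1 = dates[0]
        # first follow-up at least MIN_GAP_DAYS after the earliest assessment
        follow_ups = [d for d in dates[1:] if (d - t1).days >= MIN_GAP_DAYS]
        if follow_ups:
            matched[youth] = (t1, follow_ups[0])
    return matched
```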
RELIABILITY ANALYSES

Although the SDQ is a widely used and well-validated measure, reliability tests were conducted to measure the consistency of responses to items within the Total Difficulties scale and each of the five subscales. Data are combined into a scale because a scale, as opposed to individual items, provides stronger and more valid results. However, assessing concepts with a scale is only appropriate if participants respond consistently to each of the related items. A high reliability coefficient, or score, indicates that respondents answered related questions in a similar way. A low reliability coefficient suggests that there is little consistency in how respondents answered related questions, and can be due to one or more factors, including a small
number of respondents, respondents not understanding one or more of the questions, or use of questions created for a different population than the one being examined. When reliability coefficients are lower than recommended cutoffs, it is not advisable to use and interpret scale scores. Results from the reliability analyses indicated adequate reliability for the total scale and three of the five subscales (alphas > .70). The Emotional Difficulties scale demonstrated somewhat low reliability at pre (alpha = .66) but adequate reliability at post; this scale was retained in the analyses, but findings should be interpreted cautiously. The Peer Relationship Difficulties scale demonstrated low reliability (pre alpha = .47, post alpha = .57), suggesting variation in how SDQ raters think about or interpret these items; this scale was therefore excluded from further analyses. Note that item-level data were available for only 19 of the 31 SDQs representing the first time point, so the reliability coefficients may be either overestimated or underestimated. Suggestions for addressing these issues in future data collection appear in the Recommendations section.
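The reliability coefficient referenced here is Cronbach's alpha, which compares the summed variance of the individual items to the variance of the total scale score. A minimal sketch of the computation, assuming complete item-level data with no missing responses:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of items.
    item_scores: list of per-item score lists, one inner list per item,
    all the same length (one entry per respondent)."""
    k = len(item_scores)
    # total scale score for each respondent
    totals = [sum(resp) for resp in zip(*item_scores)]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))
```

When respondents answer all items in a perfectly consistent way, alpha reaches its maximum of 1.0; values below roughly .70 are commonly treated as inadequate, as in the analyses above.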
Results

To determine how much time elapsed between the first and second SDQ administrations, the average, minimum, and maximum number of months between assessments were calculated. The average was measured using the median, which is typically used when data are not symmetrically distributed; because the median is unaffected by outliers, it provides a more accurate reflection of the "true" average value. As shown in Table 2 below, SDQs were, on average, completed 11 months apart, with the time between assessments ranging from 11 to 13 months.
Table 2: Time Between SDQ Completions

          Months
Min           11
Max           13
Median     11.00
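A small illustration of why the median was preferred as the average: with hypothetical assessment gaps that include one very late follow-up, the mean is pulled upward while the median stays near the typical value.

```python
from statistics import mean, median

# Hypothetical gaps between assessments, in months; 30 is an outlier.
gaps_in_months = [11, 11, 11, 12, 13, 30]

print(mean(gaps_in_months))    # about 14.67, inflated by the outlier
print(median(gaps_in_months))  # 11.5, closer to the typical gap
```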
PARTICIPANT CHARACTERISTICS

Figures 2 and 3 below summarize the characteristics of youth included in our analyses. Participants' age was calculated using birth date and administration date, and represents how old
the participant was at the time of their first assessment. The age of REACH participants included in our analyses ranged from three to seventeen years, with an average age of eight. For ease of reporting, ages were grouped into three categories reflecting the age groups for which the SDQ forms were intended (i.e., 3-year-olds, 4-10-year-olds, and 11-17-year-olds). As shown in Figure 2 below, most participants fell within the 4-10 age range and about a third were 11-17 (58% and 35%, respectively).
Figure 2: Age of Participants
[Bar chart: 3 years old, 6%; 4-10 years old, 58%; 11-17 years old, 35%]
Participants' gender was also assessed; results are shown in Figure 3 below. Overall, male and female participants were nearly equally represented, with slightly more males than females.
Figure 3: Gender of Participants
[Pie chart: Male, 55%; Female, 45%]
OUTCOMES

For each youth, SDQ scale scores were classified as normal, borderline, or abnormal, and cross-tabulation analyses were conducted to identify the proportion of youth who stayed within the same classification as well as the proportion who moved into a new classification, indicating either improvement or regression. Results are presented in Table 3 below.

- The "Improved" category included youth who moved from abnormal to either borderline or normal, as well as youth who moved from borderline to normal.
- The "Regressed" category included youth who moved from normal to either borderline or abnormal, as well as youth who moved from borderline to abnormal.
- The "Stayed Normal" category included youth who started in the normal category and did not move out of it.
- The "Stayed Borderline/Abnormal" category included youth who started as either borderline or abnormal and stayed within these categories.

Overall, results indicated that the majority of youth improved or stayed within the normal classification one year later (combined percentages ranged from 55% on the Hyperactivity subscale to 84% on the Emotional Difficulties subscale). Between 16% (Emotional Difficulties) and 45% (Hyperactivity) stayed within the borderline/abnormal classifications or regressed.
Table 3: Summary of Movement in SDQ Classifications Scale Emotional Difficulties Behavioral Difficulties Hyperactivity Prosocial Behavior Total Difficulties Scale
Total N 31 31 31 31 31
Improved n 7 11 9 10 12
% 23% 35% 29% 32% 39%
Stayed Normal n 19 9 8 14 8
% 61% 29% 26% 45% 26%
Stayed Borderline/ Abnormal n 1 9 9 4 7
% 3% 29% 29% 13% 23%
Regressed n 4 2 5 3 4
% 13% 6% 16% 10% 13%
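The four movement categories defined above can be expressed as a simple mapping from a youth's (time 1, time 2) classification pair. This sketch assumes the normal/borderline/abnormal classifications have already been derived from the scale scores.

```python
# Order classifications from least to most severe so that any move toward
# a lower value counts as improvement and any move upward as regression.
ORDER = {"normal": 0, "borderline": 1, "abnormal": 2}

def movement(t1, t2):
    """Map a (time 1, time 2) classification pair to a movement category."""
    if ORDER[t2] < ORDER[t1]:
        return "Improved"
    if ORDER[t2] > ORDER[t1]:
        return "Regressed"
    return "Stayed Normal" if t1 == "normal" else "Stayed Borderline/Abnormal"
```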
Paired-sample t-tests were also conducted on the matched sample to detect changes in psychosocial well-being over time and to determine whether these changes were statistically significant. Overall, the Total Difficulties scale score decreased across time; this change was in the desired direction and marginally significant (p = .07). Scale-level findings, presented in Figure 4 below, illustrate that trends for the four subscales presented were in the desired direction. Children demonstrated a significant improvement in prosocial behavior and a significant reduction in behavioral difficulties (p-values < .01). Changes in emotional difficulties and hyperactivity were not statistically significant (p-values > .10).
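The report's analyses were run in SPSS; for readers interested in the underlying computation, a paired-sample t statistic is simply the mean of the difference scores divided by its standard error. A minimal sketch, returning the t statistic and degrees of freedom without the p-value lookup:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(time1, time2):
    """Paired-sample t statistic for two matched score lists.
    Returns (t, degrees of freedom); the p-value would then come from
    the t distribution with n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(time1, time2)]
    n = len(diffs)
    # t = mean difference / standard error of the difference
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```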
Figure 4: Change in SDQ Scale Scores
[Bar chart of Time 1 and Time 2 mean scores for the four retained subscales; ** = statistically significant (p < .05)]
Analyses were also conducted to test whether results varied as a function of gender (i.e., in 2x2 ANOVAs with gender as a between-subjects factor). Results indicated no interaction with gender for total difficulties, prosocial behavior, or behavioral difficulties; in other words, changes in these three domains did not vary by gender. However, for emotional difficulties and hyperactivity, there was a marginally significant interaction between time and gender (p = .06 and p = .08, respectively), suggesting that change over time differed for males versus females. In particular, males, but not females, demonstrated a significant decrease in emotional difficulties, and females, but not males, demonstrated a significant decrease in hyperactivity (p-values < .05). Figure 5 below depicts Time 1 and Time 2 scores for males and females on these two scales.
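One way to see what the time-by-gender interaction tests: in a 2 (time, within-subjects) x 2 (gender, between-subjects) design, the interaction asks whether the mean change from Time 1 to Time 2 differs between males and females. The scores below are hypothetical and for illustration only.

```python
from statistics import mean

def mean_change(time1, time2):
    """Average Time 2 - Time 1 change across a group of matched youth."""
    return mean(t2 - t1 for t1, t2 in zip(time1, time2))

# Hypothetical emotional difficulties scores, for illustration only.
males_t1, males_t2 = [6, 7, 5], [4, 5, 3]        # males improve (scores drop)
females_t1, females_t2 = [5, 6, 4], [5, 6, 4]    # females unchanged

# A nonzero difference in mean change between groups is what drives
# the time-by-gender interaction term in the ANOVA.
interaction_effect = (mean_change(males_t1, males_t2)
                      - mean_change(females_t1, females_t2))
```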
Figure 5: Change in SDQ Scale Scores - Males (M) vs. Females (F)
[Bar chart of Time 1 and Time 2 mean Emotional Difficulties and Hyperactivity scores, shown separately for males and females; ** = statistically significant (p < .05)]
Review of Findings

Results of this evaluation provide tentative support for the positive impact of the REACH program on the psychosocial well-being of the youth served. On average, children demonstrated improvement in prosocial behavior and a reduction in behavioral difficulties. Additionally, girls served by REACH showed improvements in hyperactive and inattentive behavior, and boys showed a reduction in emotional difficulties. Results should be considered tentative and exploratory in light of a number of data limitations. First, multiple data points were available for only 44% of the children included in the dataset provided to OMNI; thus, the current results may not accurately reflect the experience of all, or even most, children served by the program. Second, there is likely a fair amount of variation in the proximity of the Time 1 assessment to the start of service provision. In other words, the Time 1 assessment may have captured behavioral and emotional challenges close to the time of admission into the REACH program for some and several months after admission into the REACH
program for others. As a result, the change in scores documented here may not represent the true change that would be seen if all children had their Time 1 assessment at the time of admission into REACH. Third, assessments were often conducted by different raters with varying levels of experience with the SDQ and/or understanding of children's developmental and behavioral needs. Although this may be unavoidable due to staff turnover, having different raters may introduce inconsistency in how behavioral and psychosocial difficulties are rated from child to child, and even for the same child across time. The poor reliability coefficients obtained for the Peer Relationship Difficulties and Emotional Difficulties scales also suggest that staff may not be rating children's behavioral and emotional difficulties in the same way. Finally, there were a few different sources of inconsistency in how data were tracked (see the Methods section); although many of these were resolved, results may have been influenced by remaining issues.[1]
Recommendations

A review of the data provided for analysis surfaced multiple opportunities for improvement. Suggestions for collecting high-quality data and recommendations for future evaluation are summarized below. We hope these recommendations will help ensure that results from future evaluations are representative of all children served by the program and that the data capture the impact of the REACH Within program across multiple domains.

Background Information
o Collecting appropriate demographic (e.g., age, DOB, race, gender, grade level, education) and background (e.g., concurrent behavioral or mental health information, family history, trauma history) information would enable a better understanding of the characteristics of those served and would allow future analyses to explore outcomes across different groups of individuals (e.g., males and females, age groups).
o Data for this component would be collected once, upon intake into the residential care facility.

Outcome Measurement
o It is recommended that the SDQ be reviewed to assess whether it adequately captures change in the domains specifically targeted by program services. The SDQ is designed to provide a broad assessment across multiple psychosocial and behavioral arenas but may not capture improvements in skill acquisition that result
[1] For example, the SDQ has slightly differing versions of the questionnaire for different age groups, but it appears that these versions may not have been accurately applied with REACH participants.
from targeted interventions. If determined to be necessary, it may be helpful to develop an instrument tailored to the program curriculum in order to measure the specific outcomes of the desired intervention.
o Any evaluation implemented at the Government of Grenada Emergency Shelter will likely need to be structured differently than at other sites, given the many differences in the duration that children are served in this setting, the nature of the services provided, and the expected outcomes of those services. It is therefore recommended that evaluation methods be reviewed and targeted to better fit this particular setting.
o It is also recommended that assessments be appropriately designed and/or implemented for youth of different age groups. A single instrument is unlikely to be appropriate for both very young children and adolescents, and data quality may suffer when survey items not designed for the targeted age group are used.

Pre-/Post-Evaluation Method
o The current method of completing the SDQ at annual intervals introduces a great deal of variation in the proximity of the assessment to the start of children's participation with REACH.
o To better assess the impact of programming on outcomes, it is recommended that all evaluation measures be administered upon intake into the REACH program and at regular intervals thereafter. Alternatively, measures should be administered immediately before any services are delivered and immediately after the services end.

Program Fidelity
o Although minor modifications to the curriculum are expected at each site to ensure appropriateness for the population being served (e.g., different activities may be more appropriate for certain age groups), such modifications may affect the effectiveness of the curriculum. As the new curriculum is implemented across participating sites, it will be important to examine the effect of fidelity to the original curriculum design on child outcomes.
o This can be achieved by developing a Fidelity Checklist that program facilitators complete after each session to rate the degree to which targeted concepts and skills were covered.

Program Satisfaction
o To monitor program performance, it is critical to receive feedback from the youth receiving services as well as the staff facilitating the program. It is therefore suggested that a "satisfaction" survey be developed for both youth and facilitators, to be completed directly following the program.
o Collecting structured and direct feedback could yield important information regarding the strengths and challenges experienced, which could ultimately help direct program improvement.
o The youth satisfaction survey should be brief and age-appropriate, and should assess constructs such as whether children liked the program, what they liked or disliked about it, and whether they perceived the program activities to be helpful.
o The facilitator satisfaction survey should also be brief and would give facilitators an opportunity to share their perceptions of the program's impact, the components they believe are most effective and why, and any components that are challenging to implement, along with suggestions for improvement.
o The information obtained from this component could be used to continually monitor and improve ongoing efforts and would support findings (such as program effectiveness) from the outcome evaluation.

Services Received
o Tracking the specific services each youth receives could be useful, as funders often request information such as the types of services delivered and the number of youth reached by those services in a given year.

Program Dosage
o When conducting analyses, it would be useful to know whether children attended all, most, or only a few of the program sessions, as the expected impact of the program will likely depend on this factor. This can be achieved by tracking attendance at each session offered. With this information, analyses could identify the minimum number of sessions (i.e., dose) required for the program to have a positive impact on participating youth.
o Additionally, by tracking enrollment and discharge dates, analyses could determine whether the amount of time youth spend housed in residential care facilities has a positive impact on participating youth.

Recommendations regarding the process by which information is collected, tracked, and managed on an ongoing basis are provided below:

Consistent Rater Across Time
o It is important to consider which staff member is designated to complete the assessment, as the staff member's position and role within the program likely affect the type and level of interaction they have with youth enrolled in the program. For example, an onsite psychologist may have more knowledge of the psychological trauma youth are experiencing, whereas teachers, yoga instructors, and program facilitators would have a better sense of behaviors such as peer interaction.
o It is therefore recommended that the staff member completing the survey for a given youth remain consistent across time. This is understandably not feasible with annual administrations, as the program may experience staff turnover from year to year; however, the pre/post data collection approach suggested above may make it more achievable.
o Additionally, it is suggested that the staff member's position and level of interaction with the youth be consistent across sites and cohorts.

Survey Administration Training
o It is important that staff members completing the survey have a clear understanding of the measurement tool and what each question means. This can be achieved by offering training on the measure to ensure that staff are adequately and consistently trained across participating sites.
o As mentioned in the Methods section, low reliability was found for the Peer Relationship Difficulties scale, which could be attributed to staff having different levels of understanding of these questions. If the SDQ is used in future evaluations, it will be important to provide further explanation, clarification, and training to staff regarding the items on this scale.

Refinement of Data Entry and Management System
o As illustrated in the Methods section, many steps were required to clean and prepare the data for analysis. While data cleaning is essential, the time and resources spent on it can be reduced by creating a consistent, systematic data entry system. Such a system should provide a single place in which all program data are entered and should allow data to be pulled directly into statistical analysis software (i.e., one line per youth, as opposed to one tab per youth).

Protection of Youth Information
o It is important to ensure that adequate steps are taken to preserve the rights and confidentiality of the children, staff, and families participating in the evaluation.
o A few examples of how this could be achieved in future evaluations, if deemed appropriate:
  - Use of consent/assent forms.
  - De-identifying information sent to the evaluator (e.g., using IDs instead of first and last names).
  - Sending sensitive data through a protected and secure medium (e.g., Dropbox or Egnyte instead of email) and ensuring documents are password protected.
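The "one line per youth" layout recommended above might look like the following sketch, with one row per youth carrying a de-identified ID, site code, and time-point columns so the file can be read directly into analysis software. All column names and values here are hypothetical.

```python
import csv
import io

# Hypothetical one-row-per-youth layout; column names are illustrative.
HEADER = ["youth_id", "site", "t1_date", "t1_total_difficulties",
          "t2_date", "t2_total_difficulties"]
rows = [
    ["001", "BA", "2010-09-15", "18", "2011-09-20", "12"],
    ["002", "QEH", "2010-10-01", "22", "2011-10-05", "19"],
]

# Write the sheet to an in-memory CSV, as a data entry system might export it.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(HEADER)
writer.writerows(rows)
csv_text = buf.getvalue()
```

Because each youth occupies exactly one row, such a file can be loaded into SPSS or any statistical package without the per-youth restructuring described in the Methods section.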
Conclusion

A review of existing evaluation data provides preliminary support for the positive impact of the REACH program on the children served and also suggests that effects in some domains vary by gender. A review of the data collected thus far, and of the data systems used to collect it, suggests that the REACH Within program would benefit from greater systematization and consistency in data collection, to better document both the services provided and the impact of those services on the children aided by the program.
Appendix A
Appendix B